Cromfs filesystem structure / format description.

cromfs - Copyright (C) 1992,2007 Bisqwit (http://iki.fi/bisqwit/)
License: GPL3
Homepage: http://bisqwit.iki.fi/source/cromfs.html

VERSION 03
	Used in 1.5.4
	However, this version will still change.
	Backward compatibility will be maintained.

--------------------------------
Note: Inode 1 is assumed to be the root directory by the kernel.

Types:
	u64 = unsigned 64-bit little-endian integer
	u32 = unsigned 32-bit little-endian integer
	u24 = unsigned 24-bit little-endian integer
	u16 = unsigned 16-bit little-endian integer
	char = unsigned 8-bit byte
	smth[] = zero or more instances of smth (a contiguous array)
	Other types are STRUCTs, and are explained in the document.

OVERALL STRUCTURE

OBJECT: SUPERBLOCK
	Offset	Type	Meaning
	0000	u64	CROMFS SIGNATURE "CROMFS03"
	0008	u64	Location of BLKDATA within the filesystem
	0010	u64	Location of FBLKTAB within the filesystem (must be last)
	0018	u64	Location of the Inotab inode within the filesystem
	0020	u64	Location of the Root directory within the filesystem
	0028	u32	FSIZE Maximum size of uncompressed FBLOCK
	002C	u32	BSIZE Size of uncompressed blocks
	0030	u64	Bytes of file data on disk (only there for statvfs)
	0038    u64     Size of Rootdir inode in bytes (only in CROMFS03 when growth is enabled)
	0040    u64     Size of Inotab inode in bytes (only in CROMFS03 when growth is enabled)
	0048    u64     Size of BLKDATA in bytes (only in CROMFS03 when growth is enabled)
	....	INODE	ROOTDIR (root directory)
	....	INODE	INOTAB (only the "list of blocks" is used)
	....	BLKDATA	LZMA-compressed array of BLOCK entries.
	....	FBLOCK[] FBLKTAB = compressed storage

Since CROMFS03, padding is allowed between elements, using sparse files.
Since CROMFS02, ROOTDIR and INOTAB are stored compressed. (Before: uncompressed)

If the "size" fields are not present in the header, the order
of ROOTDIR, INOTAB, BLKDATA and FBLKTAB must be exactly as
indicated with no padding in between.
In any case, FBLKTAB must be the last item in the filesystem.

In CROMFS03, the INOTAB inode contains flag bits in the inode's "mode" field:
      byte 3   byte 2   byte 1   byte 0
      00000000 000000mb 0000vk23 0000000f
      f:
      	1 = fblocks are stored sparsely (padded to FSIZE)
      	      (this also causes inotab to be stored sparsely)
      	0 = fblocks are stored tightly (no padding)
      23:
        00 = blocknums in inodes are 32-bit
        01 = blocknums in inodes are 24-bit
        10 = blocknums in inodes are 16-bit
        11 = undefined
      k:
        1 = BLOCKs are packed
        0 = BLOCks are not packed
      v:
        1 = Each inode has an individual block size setting (variable block size)
        0 = The superblock's block size setting is global
      m:
        1 = Using MTF (move-to-front) filtering, 0 = not
            Note: MTF is no longer supported (since version 1.5.3). Don't use.
      b:
        1 = Using BWT (burrows-wheeler-transform) filtering, 0 = not
            Note: BWT is no longer supported (since version 1.5.3). Don't use.


Technically, ROOTDIR could be placed into INOTAB just like all the other inodes.
However, the reason it was not initially done so, was because ROOTDIR must have
inode number 1, and for writing inode 1, one must know the length of the inode
before being able to write other inodes, but in earlier versions of mkcromfs,
the ROOTDIR contents were made aware of last.

---------------------------

STRUCT: FBLOCK (size: 13 + n)
	0000	u32	length of compressed data
	0004	char[]	LZMA-compressed data, length indicated in above field
	0009	u64	length of uncompressed data (this field is part of the LZMA stream)
	(Note: Since cromfs version 1.0.5, FBLOCKs are variable-length.
	 Previously they were all of FSIZE size.)
	(Note: Since cromfs version 1.1.0, FSIZE indicates the maximum size of
	 uncompressed FBLOCKs. Previously it indicated the maximum size of compressed
	 FBLOCKs.)
	If MTF or BWT filters are enabled, they are operated on
	the decompressed FBLOCK data. The order when extracting
	is to first decode MTF, then decode BWT. In compressing
	the order is the opposite.

STRUCT: BLKDATA (LZMA-compressed) (size: 8*n)
	0000	BLOCK[]  BLKTAB = all blocks of the filesystem (indexed by block number)
	(Note: To handle BLKDATA effeciently, it must be decompressed entirely
	into the RAM when the block lists are needed. This typically might consume
	several megabytes of RAM. However, cromfs-driver deallocates the
	data periodically when the filesystem has been idle for some time.)

STRUCT: BLOCK (size: 8) (when BLOCKs are not packed)
	0000	u32	FBLOCK number (0=first FBLOCK, 1=second FBLOCK, etc)
	0004	u32	starting offset within the _uncompressed data_ for this data
	(Note: different BLOCKs may utilize the same data from same FBLOCK.
	 The regions which they use may overlap partially or completely.
	 They do not need to be aligned.)
	(This record refers to at most BSIZE bytes of data, but may actually
	 refer to less data if this is the last block of the file.)

STRUCT: BLOCK (size: 4) (when BLOCKs are packed)
	0000	u32	datavalue
	datavalue is decoded into FBLOCK number and starting offset as follows:
		offset    = datavalue MOD FSIZE
		fblocknum = datavalue /   FSIZE
	For meanings, see the "not packed" explanation above.

STRUCT: INODE (size: 24 + b*n) where b = block number size in bytes
                               size is rounded up to 4-byte boundary.
	0000	u32	mode
	0004	u32	mtime
	0008	u32	rdev, if a device, hardlink count otherwise
	000C	u16	uid
	000E	u16	gid
	0010	u64	size in bytes
	
	Only when variable block sizes are enabled:
	0018	u32	block size for this inode
		N = 001C
	else, 	N = 0018
	
	when block numbers are 32-bit:
	N	u32[]	(data locators: indexes to BLKTAB, 0=first BLOCK,1=second BLOCK,...)

	when block numbers are 24-bit:
	N	u24[]	(data locators: indexes to BLKTAB, 0=first BLOCK,1=second BLOCK,...)

	when block numbers are 16-bit:
	N	u16[]	(data locators: indexes to BLKTAB, 0=first BLOCK,1=second BLOCK,...)

	(Note: Location of uid&gid and rdev have been changed in version 1.1.2)

STRUCT: ENTRY (size: 9 + n)
	0000	u64	inode number
	0008	char[]	file name, nul-terminated

ALL FILES ARE COMPRESSED (the content is spread across different FBLOCKs)

FILE CONTENT WHEN: INOTAB (size: n)
	0000	INODE[] all inodes of the filesystem (note: INODE is variable-length).
	                the beginning of each inode is at offset ((inodenumber-2) * 4)
	                inodenumber 1 is the ROOTDIR (not stored in INOTAB),
	                and inodenumber 0 is error.

FILE CONTENT WHEN: DIRECTORY (size: 4 + (4+entrysize) * n)
	0000	u32	number of files in directory
	0004	u32[]	index into each file entry (from directory entry beginning)
	0004	ENTRY[]	each file entry (variable length)
	(Note: The files in a directory are sorted in an asciibetical order.
	 This enables an implementation of lookup() using a binary search,
	 minimizing the amount of data accessed (and thus yielding fast access
	 in case of heavy fragmentation).
	)
	(Note 2: Currently the algorithm in read_dir() assumes that the directory
	 entry pointers are in numeric order. If that does not hold, it will fail.
	)

FILE CONTENT WHEN: SYMLINK (size: n)
	0000	char[]	link text, not nul-terminated

FILE CONTENT WHEN: REGULAR FILES (size: n)
	0000	content	file content

For other types of files, file has no content (size=0, blocktab=empty)

--------------------------------
