docs/concepts/filesystems/random-access-compression.md - fuchsia - Git at Google

 # Random access compression in BlobFS

 The BlobFS filesystem in Fuchsia transparently compresses files in order to save
 disk space. BlobFS supports a number of different compression strategies, with
 [zstd](https://facebook.github.io/zstd/) being the default.

 One downside of file compression is that it can prevent *random access* into
 files. For most compression algorithms, the entire contents must be read from
 disk and decompressed to access a single byte.

 For BlobFS, this is a particular challenge when files are demand paged. Demand
 paging allows a file to be partially loaded into memory, which saves system
 memory. Full-file compression prevents a file from being partially loaded.

 The [chunked-compression](/src/lib/chunked-compression/) format and library in
 Fuchsia breaks compressed files up into frames that can be independently
 decompressed. This allows BlobFS to implement demand paging of compressed files,
 since the file can be partially loaded and decompressed.

 This document describes the design of the chunked-compression format and
 explains its use in Fuchsia.

 ## Design goals and non-goals

 The following are the goals that motivate the design of the chunked-compression
 format and library:

 *   **Random access decompression**. It must be possible to independently
     decompress frames of data without needing to decompress the entire file.

 *   **Flexible decompression API**. The library was designed to give clients
     control over the seek table, so clients have fine grained control over which
     frames they decompress. This supports use cases like demand paging, where
     the client (BlobFS) has more information about access patterns and can
     control read-ahead and prefetch precisely by decompressing specific frames.

     This is in contrast to a more managed API that hides the details of which
     frames contain which decompressed bytes.

 *   **Configurable frame sizes**. It must be possible to adjust the sizes of
     decompressed frames of data, to suit different use cases and different
     requirements.

 *   **Flexible compression layout**. The format supports more exotic frame
     layouts, such as having non-uniform frame sizes or having aligned frame
     starts, in order to accommodate future use cases that require more
     flexibility. For example, non-uniform frame sizes could be used to improve
     data locality by grouping together data that is read together into smaller
     or larger frames.

 *   **Comparable compression ratio to [zstd](https://facebook.github.io/zstd)**.
     Since zstd was the current default compression algorithm for BlobFS at the
     time of this document's writing, zstd's compression ratio is the baseline
     which the chunked-compression library is benchmarked against. The overhead
     due to framing and additional metadata must be minimal.

 *   **Configurable compression aggressiveness**. It must be possible to trade
     off slower compression speed in favor of better compression ratios.

 *   **Cross-platform library**. The library used to compress and decompress
     chunked compression archives must be usable both on Fuchsia and on the
     compilation host (e.g. Linux), to enable the use of the library in the build
     toolchain. This is necessary to compress files at build-time, for example
     when generating a base BlobFS image.

 The following are non-goals:

 *   **Format-level compatibility with zstd**. The chunked compression archive is
     not intended to be format-compatible with zstd, and regular zstd tooling
     will not work with chunked compression archives.

 ## Chunked compression

 ### Archive format

 A chunked archive consists of a header followed by zero or more
 [zstd](https://facebook.github.io/zstd/) frames.

 #### Header

 The header describes the format of the archive and contains a seek table that
 maps the compressed frames to decompressed space.

 ```
       0     1     2     3     4     5     6     7
    +-----+-----+-----+-----+-----+-----+-----+-----+
  0 |                 Magic Number                  |
    +-----+-----+-----+-----+-----+-----+-----+-----+
  8 |  Version  |  Reserved |       Num Frames      |  // Reserved must be zero.
    +-----+-----+-----+-----+-----+-----+-----+-----+
 16 |    Header CRC32       |        Reserved       |  // Reserved must be zero.
    +-----+-----+-----+-----+-----+-----+-----+-----+
 24 |                    Reserved                   |  // Reserved must be zero.
    +-----+-----+-----+-----+-----+-----+-----+-----+
 32 |                                               |
 40 |                   Seek Table                  |
 48 |                     Entry                     |
 56 |                                               |
    +-----+-----+-----+-----+-----+-----+-----+-----+
 .. |                                               |
 .. |                   Seek Table                  |
 .. |                     Entry                     |
 .. |                                               |
    +-----+-----+-----+-----+-----+-----+-----+-----+
 ```

 The "Header CRC32" is computed based on the entire header, including the seek
 table.

 ##### Seek table

 Each seek table entry describes a contiguous range of data in the compressed
 space, and where in the decompressed data it expands to.

 ```
    +-----+-----+-----+-----+-----+-----+-----+-----+
  0 |               Decompressed Offset             |
    +-----+-----+-----+-----+-----+-----+-----+-----+
  8 |                Decompressed Size              |
    +-----+-----+-----+-----+-----+-----+-----+-----+
 16 |                Compressed Offset              |
    +-----+-----+-----+-----+-----+-----+-----+-----+
 24 |                 Compressed Size               |
    +-----+-----+-----+-----+-----+-----+-----+-----+
 ```

 Seek table entries are *contiguous* in decompressed space, but may be
 *discontiguous* in compressed space. This is to support adding alignment and
 padding to output files to improve storage access efficiency.

 A seek table can hold up to 1023 entries (which results in a 32KiB header) and
 must contain at least 1 entry (which results in a 64 byte header). Typically,
 compressed frames immediately follow the seek table (but the format supports the
 compressed frames starting at any offset past the end of the seek table).

 ##### Seek table invariants

 *   I0: The first seek table entry must have decompressed offset 0.

 *   I1: The first seek table entry must have compressed offset greater than or
     equal to the size of the header.

 *   I2: Each entry's decompressed offset must be equal to the end of the
     previous frame (i.e. to the previous frame's decompressed offset+length).

 *   I3: Each entry's compressed offset must be greater than or equal to the end
     of the previous frame (i.e. to the previous frame's compressed
     offset+length).

 *   I4: Each entry must have a non-zero decompressed and compressed length.

 *   I5: No compressed frame may exceed the end of the file.

 #### Compressed frames

 Each compressed frame in the file is a regular zstd compressed frame. A given
 frame will map to some contiguous chunk of bytes in the decompressed file.

 Any ranges of bytes in the file not covered by the seek table are ignored.

 It is not a requirement that each frame of data is the same decompressed size,
 but the current implementation of compression splits the input data into
 equal-sized frames. The size of the frames is configurable during compression.

 ### Random-access decompression

 The seek table is used to look up the frames that contain a given decompressed
 range. When a given range in the decompressed file is requested, one or more
 compressed frames must be loaded and decompressed.

 For example, consider the following file with three frames:

 ```
 Decompressed Space            Compressed Space
 +-----------------+           +--------------+
 |                 | <-------- |              |
 |                 |           |     +--------|
 |                 |           +-----+        |
 +-----------------+   +------ |              |
 |                 | <-+       +--------------+
 |                 |       +-- |              |
 |                 |       |   +--------------=
 +-----------------+       |
 |                 | <-----+
 |       +---------|
 +-------+
 ```

 Accessing a byte range within a single frame only requires decompressing its
 corresponding compressed frame:

 ```
 Decompressed Space            Compressed Space
 +-----------------+           +--------------+
 |                 | <-------- |xxxxxxxxxxxxxx|
 |  xxxxxxxx       |           |xxxxx+--------|
 |                 |           +-----+        |
 +-----------------+   +------ |              |
 |                 | <-+       +--------------+
 |                 |       +-- |              |
 |                 |       |   +--------------=
 +-----------------+       |
 |                 | <-----+
 |       +---------|
 +-------+
 ```

 A range that spans several decompressed frames will require decompressing
 multiple compressed frames:

 ```
 Decompressed Space            Compressed Space
 +-----------------+           +--------------+
 |                 | <-------- |xxxxxxxxxxxxxx|
 |                 |           |xxxxx+--------|
 |              xxx|           +-----+xxxxxxxx|
 +-----------------+   +------ |xxxxxxxxxxxxxx|
 |xxxx             | <-+       +--------------+
 |                 |       +-- |              |
 |                 |       |   +--------------=
 +-----------------+       |
 |                 | <-----+
 |       +---------|
 +-------+
 ```

 ## Use in BlobFS

 In BlobFS, files are compressed using the chunked compression library to
 facilitate random access and demand paging.

 Currently, random access is only used when demand paging is enabled. With demand
 paging disabled, BlobFS will load and decompress the entire file on first
 access, buffering the file in memory while there are handles to the file.

 With demand paging enabled, BlobFS lazily loads in portions of files as they are
 accessed. BlobFS registers itself as the
 [pager](/docs/reference/kernel_objects/pager.md) for the VMO using the
 [zx_pager_create](/reference/syscalls/pager_create.md) syscall. When a
 non-present page is accessed in the VMO, a page fault occurs, which BlobFS
 handles.

 BlobFS looks up the decompressed frames containing the target page(s), and
 decompresses each frame. After decompressing each frame, the data is verified,
 and committed to the pager-backed VMO through the
 [zx_pager_supply_pages](/reference/syscalls/pager_supply_pages.md) syscall.
	# Random access compression in BlobFS

	The BlobFS filesystem in Fuchsia transparently compresses files in order to save
	disk space. BlobFS supports a number of different compression strategies, with
	[zstd](https://facebook.github.io/zstd/) being the default.

	One downside of file compression is that it can prevent random access into
	files. For most compression algorithms, the entire contents must be read from
	disk and decompressed to access a single byte.

	For BlobFS, this is a particular challenge when files are demand paged. Demand
	paging allows a file to be partially loaded into memory, which saves system
	memory. Full-file compression prevents a file from being partially loaded.

	The [chunked-compression](/src/lib/chunked-compression/) format and library in
	Fuchsia breaks compressed files up into frames that can be independently
	decompressed. This allows BlobFS to implement demand paging of compressed files,
	since the file can be partially loaded and decompressed.

	This document describes the design of the chunked-compression format and
	explains its use in Fuchsia.

	## Design goals and non-goals

	The following are the goals that motivate the design of the chunked-compression
	format and library:

	* Random access decompression. It must be possible to independently
	decompress frames of data without needing to decompress the entire file.

	* Flexible decompression API. The library was designed to give clients
	control over the seek table, so clients have fine grained control over which
	frames they decompress. This supports use cases like demand paging, where
	the client (BlobFS) has more information about access patterns and can
	control read-ahead and prefetch precisely by decompressing specific frames.

	This is in contrast to a more managed API that hides the details of which
	frames contain which decompressed bytes.

	* Configurable frame sizes. It must be possible to adjust the sizes of
	decompressed frames of data, to suit different use cases and different
	requirements.

	* Flexible compression layout. The format supports more exotic frame
	layouts, such as having non-uniform frame sizes or having aligned frame
	starts, in order to accommodate future use cases that require more
	flexibility. For example, non-uniform frame sizes could be used to improve
	data locality by grouping together data that is read together into smaller
	or larger frames.

	* Comparable compression ratio to [zstd](https://facebook.github.io/zstd).
	Since zstd was the current default compression algorithm for BlobFS at the
	time of this document's writing, zstd's compression ratio is the baseline
	which the chunked-compression library is benchmarked against. The overhead
	due to framing and additional metadata must be minimal.

	* Configurable compression aggressiveness. It must be possible to trade
	off slower compression speed in favor of better compression ratios.

	* Cross-platform library. The library used to compress and decompress
	chunked compression archives must be usable both on Fuchsia and on the
	compilation host (e.g. Linux), to enable the use of the library in the build
	toolchain. This is necessary to compress files at build-time, for example
	when generating a base BlobFS image.

	The following are non-goals:

	* Format-level compatibility with zstd. The chunked compression archive is
	not intended to be format-compatible with zstd, and regular zstd tooling
	will not work with chunked compression archives.

	## Chunked compression

	### Archive format

	A chunked archive consists of a header followed by zero or more
	[zstd](https://facebook.github.io/zstd/) frames.

	#### Header

	The header describes the format of the archive and contains a seek table that
	maps the compressed frames to decompressed space.

	```
	0 1 2 3 4 5 6 7
	+-----+-----+-----+-----+-----+-----+-----+-----+
	0 \| Magic Number \|
	+-----+-----+-----+-----+-----+-----+-----+-----+
	8 \| Version \| Reserved \| Num Frames \| // Reserved must be zero.
	+-----+-----+-----+-----+-----+-----+-----+-----+
	16 \| Header CRC32 \| Reserved \| // Reserved must be zero.
	+-----+-----+-----+-----+-----+-----+-----+-----+
	24 \| Reserved \| // Reserved must be zero.
	+-----+-----+-----+-----+-----+-----+-----+-----+
	32 \| \|
	40 \| Seek Table \|
	48 \| Entry \|
	56 \| \|
	+-----+-----+-----+-----+-----+-----+-----+-----+
	.. \| \|
	.. \| Seek Table \|
	.. \| Entry \|
	.. \| \|
	+-----+-----+-----+-----+-----+-----+-----+-----+
	```

	The "Header CRC32" is computed based on the entire header, including the seek
	table.

	##### Seek table

	Each seek table entry describes a contiguous range of data in the compressed
	space, and where in the decompressed data it expands to.

	```
	+-----+-----+-----+-----+-----+-----+-----+-----+
	0 \| Decompressed Offset \|
	+-----+-----+-----+-----+-----+-----+-----+-----+
	8 \| Decompressed Size \|
	+-----+-----+-----+-----+-----+-----+-----+-----+
	16 \| Compressed Offset \|
	+-----+-----+-----+-----+-----+-----+-----+-----+
	24 \| Compressed Size \|
	+-----+-----+-----+-----+-----+-----+-----+-----+
	```

	Seek table entries are contiguous in decompressed space, but may be
	discontiguous in compressed space. This is to support adding alignment and
	padding to output files to improve storage access efficiency.

	A seek table can hold up to 1023 entries (which results in a 32KiB header) and
	must contain at least 1 entry (which results in a 64 byte header). Typically,
	compressed frames immediately follow the seek table (but the format supports the
	compressed frames starting at any offset past the end of the seek table).

	##### Seek table invariants

	* I0: The first seek table entry must have decompressed offset 0.

	* I1: The first seek table entry must have compressed offset greater than or
	equal to the size of the header.

	* I2: Each entry's decompressed offset must be equal to the end of the
	previous frame (i.e. to the previous frame's decompressed offset+length).

	* I3: Each entry's compressed offset must be greater than or equal to the end
	of the previous frame (i.e. to the previous frame's compressed
	offset+length).

	* I4: Each entry must have a non-zero decompressed and compressed length.

	* I5: No compressed frame may exceed the end of the file.

	#### Compressed frames

	Each compressed frame in the file is a regular zstd compressed frame. A given
	frame will map to some contiguous chunk of bytes in the decompressed file.

	Any ranges of bytes in the file not covered by the seek table are ignored.

	It is not a requirement that each frame of data is the same decompressed size,
	but the current implementation of compression splits the input data into
	equal-sized frames. The size of the frames is configurable during compression.

	### Random-access decompression

	The seek table is used to look up the frames that contain a given decompressed
	range. When a given range in the decompressed file is requested, one or more
	compressed frames must be loaded and decompressed.

	For example, consider the following file with three frames:

	```
	Decompressed Space Compressed Space
	+-----------------+ +--------------+
	\| \| <-------- \| \|
	\| \| \| +--------\|
	\| \| +-----+ \|
	+-----------------+ +------ \| \|
	\| \| <-+ +--------------+
	\| \| +-- \| \|
	\| \| \| +--------------=
	+-----------------+ \|
	\| \| <-----+
	\| +---------\|
	+-------+
	```

	Accessing a byte range within a single frame only requires decompressing its
	corresponding compressed frame:

	```
	Decompressed Space Compressed Space
	+-----------------+ +--------------+
	\| \| <-------- \|xxxxxxxxxxxxxx\|
	\| xxxxxxxx \| \|xxxxx+--------\|
	\| \| +-----+ \|
	+-----------------+ +------ \| \|
	\| \| <-+ +--------------+
	\| \| +-- \| \|
	\| \| \| +--------------=
	+-----------------+ \|
	\| \| <-----+
	\| +---------\|
	+-------+
	```

	A range that spans several decompressed frames will require decompressing
	multiple compressed frames:

	```
	Decompressed Space Compressed Space
	+-----------------+ +--------------+
	\| \| <-------- \|xxxxxxxxxxxxxx\|
	\| \| \|xxxxx+--------\|
	\| xxx\| +-----+xxxxxxxx\|
	+-----------------+ +------ \|xxxxxxxxxxxxxx\|
	\|xxxx \| <-+ +--------------+
	\| \| +-- \| \|
	\| \| \| +--------------=
	+-----------------+ \|
	\| \| <-----+
	\| +---------\|
	+-------+
	```

	## Use in BlobFS

	In BlobFS, files are compressed using the chunked compression library to
	facilitate random access and demand paging.

	Currently, random access is only used when demand paging is enabled. With demand
	paging disabled, BlobFS will load and decompress the entire file on first
	access, buffering the file in memory while there are handles to the file.

	With demand paging enabled, BlobFS lazily loads in portions of files as they are
	accessed. BlobFS registers itself as the
	[pager](/docs/reference/kernel_objects/pager.md) for the VMO using the
	[zx_pager_create](/reference/syscalls/pager_create.md) syscall. When a
	non-present page is accessed in the VMO, a page fault occurs, which BlobFS
	handles.

	BlobFS looks up the decompressed frames containing the target page(s), and
	decompresses each frame. After decompressing each frame, the data is verified,
	and committed to the pager-backed VMO through the
	[zx_pager_supply_pages](/reference/syscalls/pager_supply_pages.md) syscall.