| # Random access compression in BlobFS |
| |
| The BlobFS filesystem in Fuchsia transparently compresses files in order to save |
| disk space. BlobFS supports a number of different compression strategies, with |
| [zstd](https://facebook.github.io/zstd/) being the default. |
| |
| One downside of file compression is that it can prevent *random access* into |
| files. For most compression algorithms, the entire contents must be read from |
| disk and decompressed to access a single byte. |
| |
| For BlobFS, this is a particular challenge when files are demand paged. Demand |
| paging allows a file to be partially loaded into memory, which saves system |
| memory. Full-file compression prevents a file from being partially loaded. |
| |
| The [chunked-compression](/src/lib/chunked-compression/) format and library in |
| Fuchsia breaks compressed files up into frames that can be independently |
| decompressed. This allows BlobFS to implement demand paging of compressed files, |
| since the file can be partially loaded and decompressed. |
| |
| This document describes the design of the chunked-compression format and |
| explains its use in Fuchsia. |
| |
| ## Design goals and non-goals |
| |
| The following are the goals that motivate the design of the chunked-compression |
| format and library: |
| |
| * **Random access decompression**. It must be possible to independently |
| decompress frames of data without needing to decompress the entire file. |
| |
| * **Flexible decompression API**. The library was designed to give clients |
| control over the seek table, so clients have fine grained control over which |
| frames they decompress. This supports use cases like demand paging, where |
| the client (BlobFS) has more information about access patterns and can |
| control read-ahead and prefetch precisely by decompressing specific frames. |
| |
| This is in contrast to a more managed API that hides the details of which |
| frames contain which decompressed bytes. |
| |
| * **Configurable frame sizes**. It must be possible to adjust the sizes of |
| decompressed frames of data, to suit different use cases and different |
| requirements. |
| |
| * **Flexible compression layout**. The format supports more exotic frame |
| layouts, such as having non-uniform frame sizes or having aligned frame |
| starts, in order to accommodate future use cases that require more |
| flexibility. For example, non-uniform frame sizes could be used to improve |
| data locality by grouping together data that is read together into smaller |
| or larger frames. |
| |
| * **Comparable compression ratio to [zstd](https://facebook.github.io/zstd)**. |
| Since zstd was the current default compression algorithm for BlobFS at the |
| time of this document's writing, zstd's compression ratio is the baseline |
| which the chunked-compression library is benchmarked against. The overhead |
| due to framing and additional metadata must be minimal. |
| |
| * **Configurable compression aggressiveness**. It must be possible to trade |
| off slower compression speed in favor of better compression ratios. |
| |
| * **Cross-platform library**. The library used to compress and decompress |
| chunked compression archives must be usable both on Fuchsia and on the |
| compilation host (e.g. Linux), to enable the use of the library in the build |
| toolchain. This is necessary to compress files at build-time, for example |
| when generating a base BlobFS image. |
| |
| The following are non-goals: |
| |
| * **Format-level compatibility with zstd**. The chunked compression archive is |
| not intended to be format-compatible with zstd, and regular zstd tooling |
| will not work with chunked compression archives. |
| |
| ## Chunked compression |
| |
| ### Archive format |
| |
| A chunked archive consists of a header followed by zero or more |
| [zstd](https://facebook.github.io/zstd/) frames. |
| |
| #### Header |
| |
| The header describes the format of the archive and contains a seek table that |
| maps the compressed frames to decompressed space. |
| |
| ``` |
| 0 1 2 3 4 5 6 7 |
| +-----+-----+-----+-----+-----+-----+-----+-----+ |
| 0 | Magic Number | |
| +-----+-----+-----+-----+-----+-----+-----+-----+ |
| 8 | Version | Reserved | Num Frames | // Reserved must be zero. |
| +-----+-----+-----+-----+-----+-----+-----+-----+ |
| 16 | Header CRC32 | Reserved | // Reserved must be zero. |
| +-----+-----+-----+-----+-----+-----+-----+-----+ |
| 24 | Reserved | // Reserved must be zero. |
| +-----+-----+-----+-----+-----+-----+-----+-----+ |
| 32 | | |
| 40 | Seek Table | |
| 48 | Entry | |
| 56 | | |
| +-----+-----+-----+-----+-----+-----+-----+-----+ |
| .. | | |
| .. | Seek Table | |
| .. | Entry | |
| .. | | |
| +-----+-----+-----+-----+-----+-----+-----+-----+ |
| ``` |
| |
| The "Header CRC32" is computed based on the entire header, including the seek |
| table. |
| |
| ##### Seek table |
| |
| Each seek table entry describes a contiguous range of data in the compressed |
| space, and where in the decompressed data it expands to. |
| |
| ``` |
| +-----+-----+-----+-----+-----+-----+-----+-----+ |
| 0 | Decompressed Offset | |
| +-----+-----+-----+-----+-----+-----+-----+-----+ |
| 8 | Decompressed Size | |
| +-----+-----+-----+-----+-----+-----+-----+-----+ |
| 16 | Compressed Offset | |
| +-----+-----+-----+-----+-----+-----+-----+-----+ |
| 24 | Compressed Size | |
| +-----+-----+-----+-----+-----+-----+-----+-----+ |
| ``` |
| |
| Seek table entries are *contiguous* in decompressed space, but may be |
| *discontiguous* in compressed space. This is to support adding alignment and |
| padding to output files to improve storage access efficiency. |
| |
| A seek table can hold up to 1023 entries (which results in a 32KiB header) and |
| must contain at least 1 entry (which results in a 64 byte header). Typically, |
| compressed frames immediately follow the seek table (but the format supports the |
| compressed frames starting at any offset past the end of the seek table). |
| |
| ##### Seek table invariants |
| |
| * I0: The first seek table entry must have decompressed offset 0. |
| |
| * I1: The first seek table entry must have compressed offset greater than or |
| equal to the size of the header. |
| |
| * I2: Each entry's decompressed offset must be equal to the end of the |
| previous frame (i.e. to the previous frame's decompressed offset+length). |
| |
| * I3: Each entry's compressed offset must be greater than or equal to the end |
| of the previous frame (i.e. to the previous frame's compressed |
| offset+length). |
| |
| * I4: Each entry must have a non-zero decompressed and compressed length. |
| |
| * I5: No compressed frame may exceed the end of the file. |
| |
| #### Compressed frames |
| |
| Each compressed frame in the file is a regular zstd compressed frame. A given |
| frame will map to some contiguous chunk of bytes in the decompressed file. |
| |
| Any ranges of bytes in the file not covered by the seek table are ignored. |
| |
| It is not a requirement that each frame of data is the same decompressed size, |
| but the current implementation of compression splits the input data into |
| equal-sized frames. The size of the frames is configurable during compression. |
| |
| ### Random-access decompression |
| |
| The seek table is used to look up the frames that contain a given decompressed |
| range. When a given range in the decompressed file is requested, one or more |
| compressed frames must be loaded and decompressed. |
| |
| For example, consider the following file with three frames: |
| |
| ``` |
| Decompressed Space Compressed Space |
| +-----------------+ +--------------+ |
| | | <-------- | | |
| | | | +--------| |
| | | +-----+ | |
| +-----------------+ +------ | | |
| | | <-+ +--------------+ |
| | | +-- | | |
| | | | +--------------= |
| +-----------------+ | |
| | | <-----+ |
| | +---------| |
| +-------+ |
| ``` |
| |
| Accessing a byte range within a single frame only requires decompressing its |
| corresponding compressed frame: |
| |
| ``` |
| Decompressed Space Compressed Space |
| +-----------------+ +--------------+ |
| | | <-------- |xxxxxxxxxxxxxx| |
| | xxxxxxxx | |xxxxx+--------| |
| | | +-----+ | |
| +-----------------+ +------ | | |
| | | <-+ +--------------+ |
| | | +-- | | |
| | | | +--------------= |
| +-----------------+ | |
| | | <-----+ |
| | +---------| |
| +-------+ |
| ``` |
| |
| A range that spans several decompressed frames will require decompressing |
| multiple compressed frames: |
| |
| ``` |
| Decompressed Space Compressed Space |
| +-----------------+ +--------------+ |
| | | <-------- |xxxxxxxxxxxxxx| |
| | | |xxxxx+--------| |
| | xxx| +-----+xxxxxxxx| |
| +-----------------+ +------ |xxxxxxxxxxxxxx| |
| |xxxx | <-+ +--------------+ |
| | | +-- | | |
| | | | +--------------= |
| +-----------------+ | |
| | | <-----+ |
| | +---------| |
| +-------+ |
| ``` |
| |
| ## Use in BlobFS |
| |
| In BlobFS, files are compressed using the chunked compression library to |
| facilitate random access and demand paging. |
| |
| Currently, random access is only used when demand paging is enabled. With demand |
| paging disabled, BlobFS will load and decompress the entire file on first |
| access, buffering the file in memory while there are handles to the file. |
| |
| With demand paging enabled, BlobFS lazily loads in portions of files as they are |
| accessed. BlobFS registers itself as the |
| [pager](/docs/reference/kernel_objects/pager.md) for the VMO using the |
| [zx_pager_create](/reference/syscalls/pager_create.md) syscall. When a |
| non-present page is accessed in the VMO, a page fault occurs, which BlobFS |
| handles. |
| |
| BlobFS looks up the decompressed frames containing the target page(s), and |
| decompresses each frame. After decompressing each frame, the data is verified, |
| and committed to the pager-backed VMO through the |
| [zx_pager_supply_pages](/reference/syscalls/pager_supply_pages.md) syscall. |