blob: 2a880ac4d785b30a98af0b695afc88d9f7a82808 [file] [log] [blame] [view]
# Random access compression in BlobFS
The BlobFS filesystem in Fuchsia transparently compresses files in order to save
disk space. BlobFS supports a number of different compression strategies, with
[zstd](https://facebook.github.io/zstd/) being the default.
One downside of file compression is that it can prevent *random access* into
files. For most compression algorithms, the entire contents must be read from
disk and decompressed to access a single byte.
For BlobFS, this is a particular challenge when files are demand paged. Demand
paging allows a file to be partially loaded into memory, which saves system
memory. Full-file compression prevents a file from being partially loaded.
The [chunked-compression](/src/lib/chunked-compression/) format and library in
Fuchsia breaks compressed files up into frames that can be independently
decompressed. This allows BlobFS to implement demand paging of compressed files,
since the file can be partially loaded and decompressed.
This document describes the design of the chunked-compression format and
explains its use in Fuchsia.
## Design goals and non-goals
The following are the goals that motivate the design of the chunked-compression
format and library:
* **Random access decompression**. It must be possible to independently
decompress frames of data without needing to decompress the entire file.
* **Flexible decompression API**. The library was designed to give clients
control over the seek table, so clients have fine grained control over which
frames they decompress. This supports use cases like demand paging, where
the client (BlobFS) has more information about access patterns and can
control read-ahead and prefetch precisely by decompressing specific frames.
This is in contrast to a more managed API that hides the details of which
frames contain which decompressed bytes.
* **Configurable frame sizes**. It must be possible to adjust the sizes of
decompressed frames of data, to suit different use cases and different
requirements.
* **Flexible compression layout**. The format supports more exotic frame
layouts, such as having non-uniform frame sizes or having aligned frame
starts, in order to accommodate future use cases that require more
flexibility. For example, non-uniform frame sizes could be used to improve
data locality by grouping together data that is read together into smaller
or larger frames.
* **Comparable compression ratio to [zstd](https://facebook.github.io/zstd)**.
Since zstd was the current default compression algorithm for BlobFS at the
time of this document's writing, zstd's compression ratio is the baseline
which the chunked-compression library is benchmarked against. The overhead
due to framing and additional metadata must be minimal.
* **Configurable compression aggressiveness**. It must be possible to trade
off slower compression speed in favor of better compression ratios.
* **Cross-platform library**. The library used to compress and decompress
chunked compression archives must be usable both on Fuchsia and on the
compilation host (e.g. Linux), to enable the use of the library in the build
toolchain. This is necessary to compress files at build-time, for example
when generating a base BlobFS image.
The following are non-goals:
* **Format-level compatibility with zstd**. The chunked compression archive is
not intended to be format-compatible with zstd, and regular zstd tooling
will not work with chunked compression archives.
## Chunked compression
### Archive format
A chunked archive consists of a header followed by zero or more
[zstd](https://facebook.github.io/zstd/) frames.
#### Header
The header describes the format of the archive and contains a seek table that
maps the compressed frames to decompressed space.
```
0 1 2 3 4 5 6 7
+-----+-----+-----+-----+-----+-----+-----+-----+
0 | Magic Number |
+-----+-----+-----+-----+-----+-----+-----+-----+
8 | Version | Reserved | Num Frames | // Reserved must be zero.
+-----+-----+-----+-----+-----+-----+-----+-----+
16 | Header CRC32 | Reserved | // Reserved must be zero.
+-----+-----+-----+-----+-----+-----+-----+-----+
24 | Reserved | // Reserved must be zero.
+-----+-----+-----+-----+-----+-----+-----+-----+
32 | |
40 | Seek Table |
48 | Entry |
56 | |
+-----+-----+-----+-----+-----+-----+-----+-----+
.. | |
.. | Seek Table |
.. | Entry |
.. | |
+-----+-----+-----+-----+-----+-----+-----+-----+
```
The "Header CRC32" is computed based on the entire header, including the seek
table.
##### Seek table
Each seek table entry describes a contiguous range of data in the compressed
space, and where in the decompressed data it expands to.
```
+-----+-----+-----+-----+-----+-----+-----+-----+
0 | Decompressed Offset |
+-----+-----+-----+-----+-----+-----+-----+-----+
8 | Decompressed Size |
+-----+-----+-----+-----+-----+-----+-----+-----+
16 | Compressed Offset |
+-----+-----+-----+-----+-----+-----+-----+-----+
24 | Compressed Size |
+-----+-----+-----+-----+-----+-----+-----+-----+
```
Seek table entries are *contiguous* in decompressed space, but may be
*discontiguous* in compressed space. This is to support adding alignment and
padding to output files to improve storage access efficiency.
A seek table can hold up to 1023 entries (which results in a 32KiB header) and
must contain at least 1 entry (which results in a 64 byte header). Typically,
compressed frames immediately follow the seek table (but the format supports the
compressed frames starting at any offset past the end of the seek table).
##### Seek table invariants
* I0: The first seek table entry must have decompressed offset 0.
* I1: The first seek table entry must have compressed offset greater than or
equal to the size of the header.
* I2: Each entry's decompressed offset must be equal to the end of the
previous frame (i.e. to the previous frame's decompressed offset+length).
* I3: Each entry's compressed offset must be greater than or equal to the end
of the previous frame (i.e. to the previous frame's compressed
offset+length).
* I4: Each entry must have a non-zero decompressed and compressed length.
* I5: No compressed frame may exceed the end of the file.
#### Compressed frames
Each compressed frame in the file is a regular zstd compressed frame. A given
frame will map to some contiguous chunk of bytes in the decompressed file.
Any ranges of bytes in the file not covered by the seek table are ignored.
It is not a requirement that each frame of data is the same decompressed size,
but the current implementation of compression splits the input data into
equal-sized frames. The size of the frames is configurable during compression.
### Random-access decompression
The seek table is used to look up the frames that contain a given decompressed
range. When a given range in the decompressed file is requested, one or more
compressed frames must be loaded and decompressed.
For example, consider the following file with three frames:
```
Decompressed Space Compressed Space
+-----------------+ +--------------+
| | <-------- | |
| | | +--------|
| | +-----+ |
+-----------------+ +------ | |
| | <-+ +--------------+
| | +-- | |
| | | +--------------=
+-----------------+ |
| | <-----+
| +---------|
+-------+
```
Accessing a byte range within a single frame only requires decompressing its
corresponding compressed frame:
```
Decompressed Space Compressed Space
+-----------------+ +--------------+
| | <-------- |xxxxxxxxxxxxxx|
| xxxxxxxx | |xxxxx+--------|
| | +-----+ |
+-----------------+ +------ | |
| | <-+ +--------------+
| | +-- | |
| | | +--------------=
+-----------------+ |
| | <-----+
| +---------|
+-------+
```
A range that spans several decompressed frames will require decompressing
multiple compressed frames:
```
Decompressed Space Compressed Space
+-----------------+ +--------------+
| | <-------- |xxxxxxxxxxxxxx|
| | |xxxxx+--------|
| xxx| +-----+xxxxxxxx|
+-----------------+ +------ |xxxxxxxxxxxxxx|
|xxxx | <-+ +--------------+
| | +-- | |
| | | +--------------=
+-----------------+ |
| | <-----+
| +---------|
+-------+
```
## Use in BlobFS
In BlobFS, files are compressed using the chunked compression library to
facilitate random access and demand paging.
Currently, random access is only used when demand paging is enabled. With demand
paging disabled, BlobFS will load and decompress the entire file on first
access, buffering the file in memory while there are handles to the file.
With demand paging enabled, BlobFS lazily loads in portions of files as they are
accessed. BlobFS registers itself as the
[pager](/docs/reference/kernel_objects/pager.md) for the VMO using the
[zx_pager_create](/reference/syscalls/pager_create.md) syscall. When a
non-present page is accessed in the VMO, a page fault occurs, which BlobFS
handles.
BlobFS looks up the decompressed frames containing the target page(s), and
decompresses each frame. After decompressing each frame, the data is verified,
and committed to the pager-backed VMO through the
[zx_pager_supply_pages](/reference/syscalls/pager_supply_pages.md) syscall.