# `chunked-compression`
`chunked-compression` is a C++ library for compressing and decompressing data in
chunks, which enables random-access decompression of subsets of the file.
## Building
This library should be automatically included in builds.
## Using
Include the `//src/lib/chunked-compression` build target to make the library
available in your C++ application.
### Compression
The library supports both streaming and one-shot compression, through the
`StreamingChunkedCompressor` and `ChunkedCompressor` objects respectively.
#### Streaming
Streaming compression requires the size of the input stream to be known ahead of
time, but lets data be compressed in chunks, so clients do not need to buffer
the entire input before compressing.
```c++
// Assuming these methods are provided:
// Returns the size of the input stream
size_t InputDataSize();
// Reads up to |buf_len| bytes of the stream into |buf|, returning bytes read
size_t ReadInput(uint8_t* buf, size_t buf_len);
// Returns the max size of bytes to read at a time from the stream
size_t ReadBufferSize();
// The following code implements streaming compression:
size_t input_data_sz = InputDataSize();
StreamingChunkedCompressor compressor;
size_t output_limit = compressor.ComputeOutputSizeLimit(input_data_sz);
fbl::Array<uint8_t> output(new uint8_t[output_limit], output_limit);
compressor.Init(input_data_sz, output.get(), output.size());
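// Allocate the read buffer on the heap; a variable-length stack array is not
// standard C++.
size_t read_buf_sz = ReadBufferSize();
fbl::Array<uint8_t> input(new uint8_t[read_buf_sz], read_buf_sz);
size_t bytes_in;
while ((bytes_in = ReadInput(input.get(), input.size())) > 0) {
  compressor.Update(input.get(), bytes_in);
}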
size_t compressed_size;
compressor.Final(&compressed_size);
```
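The snippet above leaves `InputDataSize()` and `ReadInput()` to the client. As a minimal sketch of what such a source could look like, the following in-memory stream reports its size up front and hands out data in bounded pieces. `MemoryStream` and `DrainInPieces` are hypothetical names used only for illustration; they are not part of the library.

```c++
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <utility>
#include <vector>

// Hypothetical in-memory input stream, for illustration only.
class MemoryStream {
 public:
  explicit MemoryStream(std::vector<std::uint8_t> data) : data_(std::move(data)) {}

  // Total size of the stream, known before any data is read.
  std::size_t Size() const { return data_.size(); }

  // Copies up to |buf_len| bytes into |buf| and returns the number of bytes
  // copied. Returns 0 once the stream is exhausted.
  std::size_t Read(std::uint8_t* buf, std::size_t buf_len) {
    assert(buf != nullptr || buf_len == 0);
    const std::size_t n = std::min(buf_len, data_.size() - pos_);
    if (n > 0) {
      std::memcpy(buf, data_.data() + pos_, n);
    }
    pos_ += n;
    return n;
  }

 private:
  std::vector<std::uint8_t> data_;
  std::size_t pos_ = 0;
};

// Drains |stream| in pieces of at most |read_buffer_size| bytes and returns the
// total number of bytes seen. In the streaming example, each piece would be
// passed to the compressor's Update() call instead of just being counted.
inline std::size_t DrainInPieces(MemoryStream& stream, std::size_t read_buffer_size) {
  std::vector<std::uint8_t> buf(read_buffer_size);
  std::size_t total = 0;
  std::size_t n;
  while ((n = stream.Read(buf.data(), buf.size())) > 0) {
    total += n;
  }
  return total;
}
```

The total returned by `DrainInPieces` equals `Size()`. A real input source should satisfy the same property, since the input size is passed to `Init` before any data is read.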
#### One-Shot
For simplicity, a one-shot compression API also exists if the entire input can
be buffered.
```c++
const void* input = Input();
size_t input_len = InputLength();
ChunkedCompressor compressor;
size_t output_limit = compressor.ComputeOutputSizeLimit(input_len);
fbl::Array<uint8_t> output(new uint8_t[output_limit], output_limit);
size_t compressed_size;
compressor.Compress(input, input_len, output.get(), output.size(),
                    &compressed_size);
```
#### Tuning Compression
There are a number of tunable parameters available to control compression
behavior.
| Parameter | Explanation | Range |
| --------------- | ----------------------------- | -------------------------- |
| Level | Controls how aggressive compression will be.<br><br>Higher values result in better compression ratios, but result in an exponential increase in compression time. | 1-21 (Default = 3) |
| Chunk Size | Controls the size of input data frames.<br><br>Larger frames result in better compression ratios, but allow for less granular access during decompression. | 128K - 1M (Default = 128K) |
| Header Checksum | Enables a checksum for the file header and seek table. | 0/1 |
| Frame Checksum | Enables a per-frame checksum for each frame of data. | 0/1 |
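
To make the chunk size trade-off concrete, the sketch below computes how many frames an input of a given size would be split into. It assumes that every frame except possibly the last covers exactly one chunk of input; `FrameCount` is a hypothetical helper, not a function provided by the library.

```c++
#include <cassert>
#include <cstddef>

// Hypothetical helper, not part of the library: the number of frames needed
// for |input_size| bytes when each frame covers up to |chunk_size| bytes of
// input. This is a ceiling division written to avoid overflow.
inline std::size_t FrameCount(std::size_t input_size, std::size_t chunk_size) {
  assert(chunk_size > 0);
  return input_size / chunk_size + (input_size % chunk_size != 0 ? 1 : 0);
}
```

Under that assumption, a 1 MiB input with the default 128K chunk size yields 8 frames, while a 1M chunk size yields a single frame. Reading any byte requires decompressing the whole frame that contains it, so smaller chunks mean less wasted work per random access.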
### Decompression
Specific chunks of data can be decompressed independently of other chunks. A
seek table stored at the head of the file allows either decompressed or
compressed offsets to be looked up.
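
As a rough illustration of what a seek table lookup involves, the sketch below maps a decompressed offset to the entry whose range contains it. The `Entry` struct is a hypothetical stand-in: its size and `compressed_offset` fields mirror the names used in the example below, while `decompressed_offset` and the sorted, non-overlapping layout are assumptions. The library's own lookup may be implemented differently.

```c++
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in for a seek table entry, for illustration only.
struct Entry {
  std::size_t decompressed_offset;
  std::size_t decompressed_size;
  std::size_t compressed_offset;
  std::size_t compressed_size;
};

// Returns the index of the entry whose decompressed range contains |offset|,
// or std::nullopt if no entry covers it. A linear scan keeps the sketch
// simple; a sorted table could use binary search instead.
inline std::optional<std::size_t> FindEntryForDecompressedOffset(
    const std::vector<Entry>& entries, std::size_t offset) {
  for (std::size_t i = 0; i < entries.size(); ++i) {
    const Entry& e = entries[i];
    if (offset >= e.decompressed_offset &&
        offset - e.decompressed_offset < e.decompressed_size) {
      return i;
    }
  }
  return std::nullopt;
}

// Returns the position of |offset| inside the decompressed frame described by
// |entry|. The caller must have checked that |entry| covers |offset|.
inline std::size_t OffsetWithinFrame(const Entry& entry, std::size_t offset) {
  assert(offset >= entry.decompressed_offset);
  assert(offset - entry.decompressed_offset < entry.decompressed_size);
  return offset - entry.decompressed_offset;
}
```

Once the entry is known, only `compressed_size` bytes starting at `compressed_offset` need to be loaded, and the requested byte sits at `OffsetWithinFrame` in the decompressed output for that frame.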
#### Single-Frame Decompression
```c++
// Assumes the following methods are implemented:
constexpr size_t kHeaderLength = 8192;
// Load the first kHeaderLength bytes into memory. (Header may be <8192 bytes,
// needs to be parsed to get size.)
const void* InputDataHeader();
// Returns the length of the compressed file.
size_t InputLength();
// Returns the byte offset of the decompressed data we wish to read.
size_t TargetOffset();
// Loads |size| bytes from the compressed file at |offset|.
const uint8_t* LoadCompressedData(size_t offset, size_t size);
// The following implements random-access decompression:
const void* header = InputDataHeader();
size_t compressed_length = InputLength(); // Assume >kHeaderLength
HeaderReader reader;
SeekTable table = reader.Parse(header, kHeaderLength, compressed_length);
ChunkedDecompressor decompressor;
size_t target_offset = TargetOffset();
// Look up the frame that contains the target offset.
unsigned table_index = table.EntryForDecompressedOffset(target_offset).get();
const SeekTableEntry& entry = table.Entries()[table_index];
fbl::Array<uint8_t> output(new uint8_t[entry.decompressed_size],
                           entry.decompressed_size);
// Only the compressed bytes of that one frame need to be loaded.
size_t input_frame_size = entry.compressed_size;
const uint8_t* input_frame =
    LoadCompressedData(entry.compressed_offset, input_frame_size);
size_t bytes_written;
decompressor.DecompressFrame(table, table_index, input_frame, input_frame_size,
                             output.get(), output.size(), &bytes_written);
```
#### One-Shot Decompression
For simplicity, a one-shot decompression API also exists if the entire input can
be buffered.
```c++
const void* input = InputData();
size_t input_len = InputLength();
HeaderReader reader;
SeekTable table = reader.Parse(input, input_len, input_len);
size_t output_size = table.DecompressedSize();
fbl::Array<uint8_t> output(new uint8_t[output_size], output_size);
size_t bytes_written;
ChunkedDecompressor decompressor;
decompressor.Decompress(table, input, input_len, output.get(), output.size(),
                        &bytes_written);
```
## Testing
Include the `//src/lib/chunked-compression:tests` target in your build. (This
is automatically included as part of `//src/lib:tests`.)
To run the tests:
```
$ fx run-test chunked-compression-unittests
```