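Storage implements the persistent representation of the data held in Ledger. Such data include commits, value and tree node objects, the lists of head and merge commits, references between objects, synchronization state, and page usage information.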
All data and metadata added to storage are persisted using LevelDB. A separate LevelDB instance is created for each page, in a dedicated filesystem path of the form:
{repo_dir}/{serialization_version}/ledgers/{ledger_dir}/{page_id_base64}/leveldb
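The per-page path can be assembled as in the sketch below. The base64 variant is an assumption: this document only says "base64", and URL-safe base64 is used here because the standard alphabet contains '/', which would split the page id into two path components. The argument values in the usage line are illustrative only.

```python
import base64
from pathlib import PurePosixPath


def page_leveldb_path(repo_dir: str, serialization_version: str,
                      ledger_dir: str, page_id: bytes) -> PurePosixPath:
    """Build {repo_dir}/{serialization_version}/ledgers/{ledger_dir}/{page_id_base64}/leveldb.

    Assumption: the page id is encoded with URL-safe base64 so that it never
    contains '/', which would otherwise add an extra directory level.
    """
    page_id_base64 = base64.urlsafe_b64encode(page_id).decode("ascii")
    return (PurePosixPath(repo_dir) / serialization_version / "ledgers"
            / ledger_dir / page_id_base64 / "leveldb")


# Illustrative values; a 16-byte page id of 0xff bytes would contain '/'
# under the standard base64 alphabet.
path = page_leveldb_path("/data/repo", "v1", "my_ledger", b"\xff" * 16)
```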
Additionally, metadata about all pages of a single user are persisted in a separate LevelDB instance. This includes information such as the last time a page was used.
The rest of the document describes the key and value representation for each row created in LevelDB to store each data type.
Each commit is stored in a row of the form:
Key: commits/{commit_id}
Value: the commit's timestamp, generation (i.e. the length of the longest path to the first commit created for this page), root_node_id and parent_ids, serialized using Flatbuffers (see commit.fbs).
In storage there is no difference between the representation of value objects and tree node objects. Since both types of objects can be quite large, they may be split into multiple pieces. Each piece is serialized in LevelDB as follows:
Key: objects/{object_digest}
Value: {object_content}
For value objects, {object_content} is the actual user-defined content, while for tree nodes it is the list of node entries (key, object_id and priority), the references to child nodes, and the node level. Tree node content is serialized using Flatbuffers (see tree_node.fbs).
Note that when a Ledger user inserts a key-value pair, the key is stored in a tree node entry, while the value is stored separately as a value object, as described above.
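To make the split between keys and values concrete, here is a minimal sketch of the two objects/ rows produced when a user inserts one key-value pair into an empty page. Several things are assumptions, not Ledger's actual encoding: the digest is taken to be the hex SHA-256 of the content, JSON stands in for the Flatbuffers tree node format, and the node is a single leaf with no children.

```python
import hashlib
import json


def object_row(content: bytes) -> tuple[str, str, bytes]:
    """Return (digest, key, value) for an objects/{object_digest} row.

    Assumption: the digest is the hex SHA-256 of the content.
    """
    digest = hashlib.sha256(content).hexdigest()
    return digest, f"objects/{digest}", content


def put(key: str, value: bytes, priority: str = "eager") -> dict[str, bytes]:
    """Rows produced by inserting one key-value pair into an empty page (sketch)."""
    rows: dict[str, bytes] = {}
    # The value is stored on its own, as a value object.
    value_digest, value_key, value_content = object_row(value)
    rows[value_key] = value_content
    # The key lives in a tree node entry that refers to the value object.
    # JSON is a stand-in for the Flatbuffers encoding (see tree_node.fbs).
    node = {
        "level": 0,
        "entries": [{"key": key, "object_id": value_digest, "priority": priority}],
        "children": [],  # a single leaf node has no child references
    }
    _, node_key, node_content = object_row(json.dumps(node, sort_keys=True).encode())
    rows[node_key] = node_content
    return rows


rows = put("name", b"Alice")
```

Because the tree node only holds the value's identifier, the same value object can be shared by several entries, and a large value can be split into pieces without touching the node.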
Value or tree node objects may be split into smaller pieces that are stored separately. When processing a large object, its content is fed into a rolling hash that determines how the object is split into chunks. For each chunk, whose size is always between 4KiB and 64KiB, an identifier is computed and added to a list. Based on the rolling hash algorithm, this list is in turn split into index files, which contain references (identifiers) to either chunks of the original object's content or other index files. At the end of the algorithm, the data chunks and index files form a tree, with the content of the object stored in the leaves.
See also split.h and split.cc for more details.
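The splitting described above is a form of content-defined chunking. The sketch below is a generic illustration of that technique, not a port of split.cc: the polynomial rolling hash, its 64-byte window, the bit masks that choose cut points, the rule of at least two entries per index file, and the SHA-256 identifiers are all assumptions. Only the 4KiB and 64KiB bounds come from this document.

```python
import hashlib
import random

MIN_CHUNK = 4 * 1024          # 4KiB lower bound, as stated above
MAX_CHUNK = 64 * 1024         # 64KiB upper bound, as stated above
WINDOW = 64                   # hypothetical rolling-hash window, in bytes
BASE, MOD = 257, 1 << 32      # hypothetical polynomial hash parameters
POW = pow(BASE, WINDOW, MOD)  # weight of the byte leaving the window
CHUNK_MASK = (1 << 13) - 1    # hypothetical: cut when the low 13 bits are all ones
INDEX_MASK = 0x07             # hypothetical: ~1 in 8 identifiers closes an index file


def _next_boundary(data: bytes, start: int) -> int:
    """Return the end offset of the chunk that starts at `start`."""
    limit = min(start + MAX_CHUNK, len(data))
    h = 0
    for i in range(start, limit):
        h = (h * BASE + data[i]) % MOD
        if i - start >= WINDOW:
            # Remove the byte that just fell out of the window.
            h = (h - data[i - WINDOW] * POW) % MOD
        if i - start + 1 >= MIN_CHUNK and (h & CHUNK_MASK) == CHUNK_MASK:
            return i + 1
    return limit  # forced cut at MAX_CHUNK, or end of data


def split_chunks(data: bytes) -> list[bytes]:
    """Content-defined chunking; in this sketch the final chunk may be shorter than MIN_CHUNK."""
    chunks, start = [], 0
    while start < len(data):
        end = _next_boundary(data, start)
        chunks.append(data[start:end])
        start = end
    return chunks


def identifier(content: bytes) -> bytes:
    # Hypothetical identifier: SHA-256 of the content.
    return hashlib.sha256(content).digest()


def build_index_tree(ids: list[bytes]) -> bytes:
    """Group identifiers into index files, level by level, until one root remains."""
    level = ids or [identifier(b"")]
    while len(level) > 1:
        parents, current = [], []
        for ident in level:
            current.append(ident)
            # Requiring two entries per index file guarantees each level shrinks.
            if len(current) >= 2 and ident[-1] & INDEX_MASK == 0:
                parents.append(identifier(b"".join(current)))
                current = []
        if current:
            parents.append(identifier(b"".join(current)))
        level = parents
    return level[0]


data = random.Random(0).randbytes(300_000)
chunks = split_chunks(data)
root = build_index_tree([identifier(c) for c in chunks])
```

Because the hash at each position depends only on the last WINDOW bytes, an edit near the start of a large object tends to move only the cut points around the edit, so most chunk identifiers, and the pieces they name, can be reused.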
The list of head commits is maintained in storage. A separate row is created for each head:
Key: heads/{commit_id}
Value: {creation_timestamp}
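A minimal sketch of keeping the heads/ rows current as commits are added, assuming that a head is a commit without children: the new commit becomes a head and its parents stop being heads. A plain dict stands in for LevelDB, and the sketch assumes parents are always added before their children.

```python
def add_commit(heads: dict[str, int], commit_id: str,
               parent_ids: list[str], creation_timestamp: int) -> None:
    """Update the heads/{commit_id} -> {creation_timestamp} rows for a new commit.

    Assumption: heads are commits with no children, and parents are added first.
    """
    for parent in parent_ids:
        # A parent that gains a child is no longer a head.
        heads.pop(f"heads/{parent}", None)
    heads[f"heads/{commit_id}"] = creation_timestamp


heads: dict[str, int] = {}
add_commit(heads, "c0", [], 100)
add_commit(heads, "c1", ["c0"], 200)
add_commit(heads, "c2", ["c0"], 210)       # concurrent change: two heads
add_commit(heads, "m", ["c1", "c2"], 300)  # merge commit: one head again
```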
For each pair of parent commits, the list of their merge commits is maintained in storage. A separate row is created for each merge commit with id {merge_commit_id} and parents {parent1_id} and {parent2_id}. The parent ids are sorted so that {parent1_id} is less than {parent2_id} (this may differ from their order in the commit object). The row is then:
Key: merges/{parent1_id}/{parent2_id}/{merge_commit_id}
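Sorting the parent ids means both orders of the same pair map to one key prefix, so all merges of a pair can be found with a single prefix scan. The sketch below assumes string ids that compare lexicographically and contain no '/'; a linear scan over a Python set stands in for a LevelDB iterator positioned at the prefix.

```python
def merge_row_key(parent_ids: tuple[str, str], merge_commit_id: str) -> str:
    """Key of a merges/ row; parents are sorted regardless of their order in the commit."""
    parent1_id, parent2_id = sorted(parent_ids)
    return f"merges/{parent1_id}/{parent2_id}/{merge_commit_id}"


def find_merges(rows: set[str], a: str, b: str) -> list[str]:
    """Return the ids of merge commits whose parents are a and b, in either order."""
    parent1_id, parent2_id = sorted((a, b))
    prefix = f"merges/{parent1_id}/{parent2_id}/"
    return sorted(key[len(prefix):] for key in rows if key.startswith(prefix))


rows = {
    merge_row_key(("bb", "aa"), "m1"),
    merge_row_key(("aa", "bb"), "m2"),
    merge_row_key(("aa", "cc"), "m3"),
}
```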
For the purpose of garbage collecting stale local objects, Ledger keeps a list of references between objects, as well as from commits to objects.
For each reference from a piece or object with digest source_id to a piece or object with digest destination_id, a separate row is created. We define type as either lazy, for references from a BTree node to a lazy value, or eager otherwise. The row is then:
Key: refcounts/{destination_id}/object/{type}/{source_id}
For each reference from a commit with commit id source_id to a BTree node with digest destination_id, a separate row is created:
Key: refcounts/{destination_id}/commit/{source_id}
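Because both row formats start with refcounts/{destination_id}/, every incoming reference to an object shares one key prefix. One plausible way a garbage collector could use this, sketched below under the assumption of '/'-free ids and with a Python set standing in for LevelDB, is to treat an object with no rows under that prefix as unreferenced.

```python
def object_ref_key(source_id: str, destination_id: str, lazy: bool) -> str:
    """Row for a reference from one piece or object to another."""
    ref_type = "lazy" if lazy else "eager"
    return f"refcounts/{destination_id}/object/{ref_type}/{source_id}"


def commit_ref_key(commit_id: str, root_node_id: str) -> str:
    """Row for a reference from a commit to a BTree node."""
    return f"refcounts/{root_node_id}/commit/{commit_id}"


def is_referenced(rows: set[str], object_id: str) -> bool:
    """True if any commit or object still refers to object_id (prefix scan)."""
    prefix = f"refcounts/{object_id}/"
    return any(key.startswith(prefix) for key in rows)


rows = {
    commit_ref_key("c1", "root"),
    object_ref_key("root", "val", lazy=False),
}
```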
A row is added for each commit that has been created locally but is not yet synced to the cloud:
Key: unsynced/commits/{commit_id}
Value: {generation}
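From the definition of generation given for commit rows, the page's first commit has generation 0 and any other commit has one more than the largest generation among its parents, so a parent always has a smaller generation than its child. This document does not say how the stored value is used; the sketch below shows one plausible use, ordering unsynced commits so that parents come before children. The dict stands in for the unsynced/ rows.

```python
def new_generation(parent_generations: list[int]) -> int:
    """Length of the longest path back to the page's first commit."""
    return 1 + max(parent_generations) if parent_generations else 0


def upload_order(unsynced: dict[str, int]) -> list[str]:
    """Commit ids from unsynced/commits/{commit_id} rows, lowest generation first."""
    prefix = "unsynced/commits/"
    ordered = sorted(unsynced.items(), key=lambda kv: (kv[1], kv[0]))
    return [key[len(prefix):] for key, _ in ordered]


g_root = new_generation([])
g_a = new_generation([g_root])
g_b = new_generation([g_root])
g_m = new_generation([g_a, g_b])
unsynced = {
    "unsynced/commits/m": g_m,
    "unsynced/commits/a": g_a,
    "unsynced/commits/b": g_b,
}
```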
Each piece, i.e. part of a value or tree node object, can be in one of the following states: transient, local, or synced. For each piece, a status row is stored, where {status} is the piece's current state:
Key: {status}/objects/{object_identifier}
{object_identifier} is serialized such that it has {object_digest} as a prefix.
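Since every serialized identifier starts with its digest, all status rows for a given digest can be found with one prefix scan per status. The sketch assumes a piece has exactly one status row at a time, and the suffix bytes appended after the digest are hypothetical, since this document does not describe the rest of the identifier's serialization. A Python set stands in for LevelDB.

```python
STATUSES = ("transient", "local", "synced")


def status_key(status: str, object_identifier: bytes) -> bytes:
    """Key of the {status}/objects/{object_identifier} row."""
    if status not in STATUSES:
        raise ValueError(f"unknown status: {status}")
    return status.encode() + b"/objects/" + object_identifier


def set_status(rows: set[bytes], object_identifier: bytes, status: str) -> None:
    """Move a piece to a new state (assumes one status row per piece)."""
    for other in STATUSES:
        rows.discard(status_key(other, object_identifier))
    rows.add(status_key(status, object_identifier))


def statuses_for_digest(rows: set[bytes], object_digest: bytes) -> dict[bytes, str]:
    """Map each stored identifier that starts with object_digest to its status."""
    found: dict[bytes, str] = {}
    for status in STATUSES:
        header = status.encode() + b"/objects/"
        for key in rows:
            if key.startswith(header + object_digest):
                found[key[len(header):]] = status
    return found


digest = b"\x01\x02"
id1 = digest + b"\x00"  # hypothetical suffix after the digest
id2 = digest + b"\x07"  # hypothetical suffix after the digest
rows = {status_key("local", id1), status_key("synced", id2),
        status_key("transient", b"\x09\x09\x00")}
```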
The cloud sync component persists some metadata in storage, in rows of the form:
Key: sync_metadata/{metadata_type}
Value: {metadata_value}
Currently, cloud sync stores only a single such row, which contains the server-side timestamp of the last commit fetched from the cloud.
In addition to user-created content and metadata about this content, Ledger persists information on page usage, such as the timestamp of when each page was last used. This information is used for page eviction, i.e. removing local copies of pages to free up device storage when necessary.
Page usage information is stored using LevelDB, in a dedicated path:
{repo_dir}/page_usage_db
For each page that is stored locally on the device, a row is created in the underlying database:
Key: opened/{ledger_name}{page_id}
Value: {timestamp} or {0}
{timestamp} is the time at which the given page was last closed. If the page is currently open, the value is a 0 timestamp.
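The key concatenates {ledger_name} and {page_id} without a separator, so splitting it back apart only works if one part has a known length. The sketch below assumes a hypothetical fixed page id length of 16 bytes, and a least-recently-closed-first eviction policy; this document only says the timestamps are used for eviction, not which policy applies. Open pages (value 0) are never offered for eviction.

```python
PAGE_ID_SIZE = 16  # hypothetical fixed page id length, in bytes
PREFIX = b"opened/"


def opened_key(ledger_name: bytes, page_id: bytes) -> bytes:
    """Key of the opened/{ledger_name}{page_id} row."""
    if len(page_id) != PAGE_ID_SIZE:
        raise ValueError("page id must be PAGE_ID_SIZE bytes")
    return PREFIX + ledger_name + page_id


def parse_opened_key(key: bytes) -> tuple[bytes, bytes]:
    """Split a key back into (ledger_name, page_id) using the fixed page id length."""
    body = key[len(PREFIX):]
    return body[:-PAGE_ID_SIZE], body[-PAGE_ID_SIZE:]


def eviction_candidates(rows: dict[bytes, int]) -> list[tuple[bytes, bytes]]:
    """Closed pages (non-zero timestamp), least recently closed first."""
    closed = sorted((ts, key) for key, ts in rows.items() if ts != 0)
    return [parse_opened_key(key) for _, key in closed]


p1, p2, p3 = bytes(16), b"\x01" * 16, b"\x02" * 16
rows = {
    opened_key(b"notes", p1): 500,
    opened_key(b"notes", p2): 0,     # currently open
    opened_key(b"photos", p3): 100,
}
```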