| # Data in Storage |
| |
| Storage implements persistent representation of data held in Ledger. Such data |
| include: |
| |
| - commit, value and tree node objects |
| - journal entries and metadata |
| - information on head commits |
| - information on which objects have been synced to the cloud |
| - other synchronization metadata, such as the time of the last synchronization |
| to the cloud |
| |
| All data and metadata added in storage are persisted using LevelDB. For each |
| [page](data_organization.md#Pages) a separate LevelDB instance is created in a |
| dedicated filesystem path of the form: |
| `{repo_dir}/{serialization_version}/ledgers/{ledger_dir}/{page_id_base64}/leveldb`. |
| |
| Additionally, metadata about all pages of a single user are persisted in a |
| separate LevelDB instance. This includes information such as the last time a |
| page was used. |
| |
| The rest of the document describes the key and value representation for each row |
| created in LevelDB to store each data type. |
| |
| [TOC] |
| |
| # Per Page data storage |
| |
| ## Commit objects |
| |
| - Row key: `commits/{commit_id}` |
| - Row value: Contains the creation `timestamp`, `generation` (i.e. length of the |
| longest path to the first commit created for this page), `root_node_id` and |
| `parent_ids`, as serialized by Flatbuffers (see [commit.fbs]). |
| |
| ## Value and tree node objects |
| |
| In storage there is no difference in the representation of value and node |
| objects. Since these two types of objects can be quite big, they might be split |
| in multiple pieces. Each **piece** is serialized in LevelDB as follows: |
| |
| - Row key: `objects/{object_digest}` |
| - Row value: `{object_content}` |
| |
| For value objects, `{object_content}` is the actual user defined content, while |
| for tree nodes, it is the list of node entries (key, object_id and priority), |
| references to child nodes, and the node level. Tree node content is serialized |
| using Flatbuffers (see [tree_node.fbs]). |
| |
| Note that when a Ledger user inserts a key-value pair, the key is stored in a |
| tree node, while the value is stored separately, as described above. |
| |
| ### Splitting objects in pieces |
| |
| Value or tree node objects might be split into smaller pieces that are stored |
| separately. When processing a large object, its content is fed into a rolling |
| hash which determines how the object should be split into chunks. For each such |
| chunk, whose size is always between 4KiB and 64KiB, an identifier is computed |
| and added in a list. Based on the rolling hash algorithm this list is split into |
| index files, containing references (identifiers) towards either chunks of the |
| original object's content, or other index files. At the end of the algorithm, |
| the data chunks and index files form a tree, where the content of the object is |
| stored on the leaves. |
| |
| See also [split.h] and [split.cc] for more details. |
| |
| ## Journals |
| |
| Changes in Page (Put entry, Delete entry, Clear page) that have not yet been |
| committed are organized in [journals]. A journal can be explicit, when it is |
| part of an explicitly created transaction or part of a merge commit, or |
| implicit, for any other case. On a system crash all explicit journals are |
| considered invalid and once the system restarts they are removed from the |
| storage. Implicit ones on the other hand, are immediately committed on system |
| restart. |
| |
| A common prefix for all explicit journal entries (`journals/E`) helps remove |
| them all together when necessary, and an additional metadata row, for implicit |
| journals only, helps retrieve the ids of the not-yet-committed journals. |
| |
| Journal entry keys (for both implicit and explicit journals) are serialized in |
| LevelDB as: |
| |
| Row key: `journals/{journal_id}/entry/{user_defined_key}` |
| |
| `{journal_id}` has an `E` prefix if the journal is explicit or an `I` prefix if |
| it's implicit. |
| |
| - If the journal entry is about adding a new or updating an existing Ledger |
| key-value pair, then: |
| |
| Row value: `A{priority_byte}{object_identifier}` |
| |
| Where `{priority_byte}` is either `E` if the priority is Eager, or `L` if it's |
| Lazy. |
| |
| - If the journal entry is about removing and existing key-value pair, the value |
| is: |
| |
| Row value: `D` |
| |
| Moreover, if a journal contains a page clear operation, a row with an empty |
| value is added to the journal. If it is present, when the journal is commited, |
| the previous state of the page must be discarded. |
| |
| - Row key: `journals/{journal_id}/C` |
| - Row value: (empty value) |
| |
| ### Metadata row for implicit journals |
| |
| For every implicit journal an additional row is kept in LevelDB: |
| - Row key: `journals/implicit_metadata/{journal-id}` |
| - Row value: `{base_commit_id}` |
| |
| `{base_commit_id}` is the parent commit of this journal. Note that implicit |
| journals always have a single parent (merge commits cannot be implicit |
| journals). |
| |
| ## Head commits |
| |
| The list of head commits is updated and maintained in storage. For each head a |
| separate row is created: |
| - Row key: `heads/{commit_id}` |
| - Row value: `{creation_timestamp}` |
| |
| ## Synchronization status |
| |
| ### Commits |
| A row is added for each commit that has been created locally, but is not yet |
| synced to the cloud: |
| - Row key: `unsynced/commits/{commit_id}` |
| - Row value: `{generation}` |
| |
| ### Value and Tree Node Objects |
| Each piece, i.e. part of a value or tree node object, can be in any of the |
| following states: |
| |
| - transient: the object piece has been used in a journal that is not yet committed |
| - local: the object piece has been used in a commit, but is still not synced |
| - synced: the object piece has been synced to the cloud |
| |
| For each piece, a status row is stored: |
| |
| - Row key: `{status}/object_digests/{object_piece_identifier}` |
| - Row value: (empty value) |
| |
| Where status is one of `transient`, `local`, or `synced`. |
| |
| ## Cloud sync metadata |
| |
| The cloud sync component persists in storage rows with some metadata. |
| |
| - Row key: `sync_metadata/{metadata_type}` |
| - Row value: `{metadata_value}` |
| |
| Currently, cloud sync only stores a single such line, which contains the |
| server-side timestamp of the last commit fetched from the cloud. |
| |
| |
| # Pages metadata storage |
| |
| Additionally to user-created content and metadata on this content, Ledger |
| persists information on Page usage, such as the timestamp of when each page was |
| last used. This information is used for page eviction, i.e. removing local |
| copies of pages, in order to free up device storage when that is necessary. |
| |
| Page usage information is stored in a dedicated path: `{repo_dir}/page_usage_db` |
| using LevelDB. |
| |
| ## Timestamp of last usage |
| For each page that is locally stored on the device a row is created in the |
| underlying database: |
| |
| - Row key: `opened/{ledger_name}{page_id}` |
| - Row value: `{timestamp}` or `{0}` |
| |
| `{timestamp}` is the timestamp from when the given page was last closed. If the |
| page is currently open, the value is a 0 timestamp. |
| |
| # See also |
| |
| For more information see also: |
| - [Life of a Put](life_of_a_put.md) |
| - [Ledger Architecture - Storage](architecture.md#Storage) |
| |
| |
| [commit.fbs]: /bin/ledger/storage/impl/commit.fbs |
| [split.cc]: /bin/ledger/storage/impl/split.cc |
| [split.h]: /bin/ledger/storage/impl/split.h |
| [tree_node.fbs]: /bin/ledger/storage/impl/btree/tree_node.fbs |
| [journals]: life_of_a_put.md#Journals |