{% set rfcid = "RFC-0217" %}
{% include "docs/contribute/governance/rfcs/_common/_rfc_header.md" %}
{# Fuchsia RFCs use templates to display various fields from _rfcs.yaml. View the #}
{# fully rendered RFCs at https://fuchsia.dev/fuchsia-src/contribute/governance/rfcs #}
Improve the developer experience when running tests or restarting modified ephemeral components by improving package garbage collection (GC).
Common developer workflows, such as:
are frequently interrupted by package resolution errors due to insufficient storage space. This breaks developer concentration, lowers confidence in the platform, and requires developers to manually trigger GC, possibly after rebooting the device.
These interruptions and workarounds are necessary because the current GC implementation:
The goal is to improve GC so that these workflows can be made to just work without developers needing to think about GC or storage space at all.
Facilitator:
Reviewers:
Consulted:
Socialization:
An early draft of this RFC was shared with members of the Software Development and Component Framework teams.
GC as a whole must:
There is a strong preference for approaches that are:
Out of scope:
The Software Delivery stack has not yet been upgraded to conform to the definitions and behaviors detailed in the Package Sets RFC, so the description below uses the deprecated terms to more accurately describe the current and proposed behavior of the system.
data/static_packages file.
data/cache_packages.json file.
meta/package file, usually the same as the path of the URL used to resolve the package) to the hash of the package that was most recently resolved for said path.
Br
Bp, which are the package blobs of the:
Br - Bp
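The set difference Br - Bp can be illustrated with a minimal Python sketch. It assumes blobs are identified by hash strings and that the caller supplies the blob sets of the protected packages. The function and variable names are illustrative and are not taken from pkg-cache.

```python
def blobs_to_delete(resident_blobs, protected_packages):
    """Return Br - Bp: resident blobs that no protected package references.

    resident_blobs: set of blob hashes currently in storage (Br).
    protected_packages: iterable of sets, each holding the blob hashes of
        one protected package; their union is Bp.
    """
    protected_blobs = set()
    for package_blobs in protected_packages:
        protected_blobs |= package_blobs
    return resident_blobs - protected_blobs


# Hypothetical data: two protected packages that share blob "b".
resident = {"a", "b", "c", "d"}
protected = [{"a", "b"}, {"b", "c"}]
garbage = blobs_to_delete(resident, protected)
```

Only blob "d" is unreferenced here, so it is the only candidate for deletion. A blob shared between packages stays protected as long as any one of those packages is protected.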
Create an “Open Package Index” that tracks which packages have [sub]directories with open fuchsia.io/[Node|Directory] connections. See https://fxrev.dev/817432 for a possible implementation (incidentally, this implementation deduplicates the data structures used to serve package directories, which should save at least one MB of memory). Use the open package index in pkg-cache (the component that serves the package directories of all ephemeral packages).
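As a rough illustration of the idea, and not the implementation linked above, an open package index can be modeled as a per-hash count of open connections. The class and method names below are hypothetical.

```python
from collections import Counter


class OpenPackageIndex:
    """Tracks packages (by hash) that have at least one open connection."""

    def __init__(self):
        self._open_connections = Counter()

    def connection_opened(self, package_hash):
        self._open_connections[package_hash] += 1

    def connection_closed(self, package_hash):
        if self._open_connections[package_hash] == 0:
            raise ValueError(f"no open connection for {package_hash}")
        self._open_connections[package_hash] -= 1
        if self._open_connections[package_hash] == 0:
            # Last connection closed: the package is no longer protected.
            del self._open_connections[package_hash]

    def protected_packages(self):
        return set(self._open_connections)


index = OpenPackageIndex()
index.connection_opened("hash-test-a")  # package directory
index.connection_opened("hash-test-a")  # e.g. a subdirectory connection
index.connection_closed("hash-test-a")
still_protected = index.protected_packages()
index.connection_closed("hash-test-a")
after_last_close = index.protected_packages()
```

In this model the package stays protected while either connection is open and drops out of the index as soon as the last one closes.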
Have pkg-resolver expose an additional fuchsia.pkg/PackageResolver capability called fuchsia.pkg.PackageResolver-ota. Route this capability to system-updater (and only to system-updater) instead of the current fuchsia.pkg.PackageResolver capability. Packages resolved by this capability must be in the retained index prior to resolution and will be excluded from the open package index (by adding a flag to fuchsia.pkg/PackageCache.[Open|Get]).
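The intended split between OTA and non-OTA resolves might look like the following Python sketch. The `is_ota` parameter stands in for the proposed flag, whose actual name and shape are not specified in this RFC. `record_resolve` and `NotRetainedError` are hypothetical names.

```python
class NotRetainedError(Exception):
    """Raised when an OTA resolve targets a package outside the retained index."""


def record_resolve(package_hash, is_ota, retained_index, open_index):
    """Return the name of the index that protects a newly resolved package.

    is_ota is an assumed stand-in for the flag proposed for
    fuchsia.pkg/PackageCache.[Open|Get].
    """
    if is_ota:
        # OTA resolves must already be protected by the retained index.
        if package_hash not in retained_index:
            raise NotRetainedError(package_hash)
        return "retained"
    # Non-OTA resolves are tracked through their open connections instead.
    open_index.add(package_hash)
    return "open"


retained_index = {"hash-update-pkg"}
open_index = set()
ota_result = record_resolve("hash-update-pkg", True, retained_index, open_index)
test_result = record_resolve("hash-test-pkg", False, retained_index, open_index)
```

The OTA resolve leaves the open package index untouched, so the retained index alone decides how long system-updater's packages are protected.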
Create a “Writing Package Index” that tracks which packages are currently being written to storage. This is effectively the dynamic index except that it stops tracking packages once they are resolved (at which point they would be covered by the open package index or the retained index).
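One way to picture the writing package index is as a scope that covers the duration of a resolve. The Python sketch below is illustrative only: it assumes a single writer per package hash, and the names are hypothetical.

```python
from contextlib import contextmanager


class WritingPackageIndex:
    """Tracks packages whose blobs are currently being written to storage."""

    def __init__(self):
        self._writing = set()

    @contextmanager
    def writing(self, package_hash):
        # Protect the package for as long as its resolution is in progress.
        self._writing.add(package_hash)
        try:
            yield
        finally:
            # Stop tracking once resolution ends, whether or not it succeeded.
            self._writing.discard(package_hash)

    def protected_packages(self):
        return set(self._writing)


index = WritingPackageIndex()
with index.writing("hash-test-a"):
    during_write = index.protected_packages()
after_write = index.protected_packages()
```

Once the scope exits, protection has to come from the open package index or the retained index, as described above.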
Use the same GC algorithm, but replace the dynamic index with the writing and open package indices, so the protected blobs are now the blobs of the:
This satisfies the requirements:
Maintain the storage usage and forward progress guarantees of the OTA process with the modifications from RFC 170.
All packages resolved during the OTA process are excluded from the open package index (similar to how OTA resolves are currently excluded from the dynamic index by first being added to the retained index), so the storage usage and forward progress requirements will still be met.
Allow running multiple tests (each of which individually fits on the device and that all come from different packages) consecutively without running out of space.
When running tests from multiple different packages consecutively, GC can now be triggered to prevent out-of-space errors. Previously, the dynamic index protected the most recently resolved version of each test package (packages are added by path when resolved and removed only on reboot), so previously run tests were never GC'd. Now the open package index stops protecting a test package once its last connection is closed.
Not remove packages that are being consumed by components, which includes packages that contain running components.
Previously, if a component was launched and then a different version of the backing package was resolved, the component's package would be evicted from the dynamic index regardless of whether the component was still running. Now, since running components hold a connection to their package directory, the component's package will be protected from GC by the open package index.
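The difference can be seen in a small Python model. The dictionary and set below are simplified stand-ins for the two indices, and the path and hash values are made up for illustration.

```python
# Dynamic index (current behavior): one protected hash per package path.
dynamic_index = {}
dynamic_index["my-component/0"] = "hash-v1"  # v1 resolved, component launched
dynamic_index["my-component/0"] = "hash-v2"  # v2 resolved, v1 entry replaced
protected_by_dynamic = set(dynamic_index.values())

# Open package index (proposed): keyed by hash, driven by open connections.
open_index = set()
open_index.add("hash-v1")  # the running component holds its package directory
open_index.add("hash-v2")  # the resolver of v2 holds the new package directory
protected_by_open = set(open_index)
```

Under the dynamic index, `hash-v1` loses protection the moment v2 is resolved for the same path. Under the open package index it stays protected until the running component closes its connection.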
| Index | Package Addition Action | Package Removal Action |
|---|---|---|
| Base | Product Assembly | never |
| Cache | Product Assembly | never |
| Retained | system-updater sets during OTA | system-updater updates/clears during OTA |
| Writing | package resolution begins | package resolution ends |
| Open | non-OTA package resolution ends | last connection to package directory closes |
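The lifecycle in the table can be walked through with a simplified Python model. It works at package granularity rather than blob granularity and assumes the protected set is the union of the five indices in the table. All names and values are hypothetical.

```python
def run_gc(resident, indices):
    """Return the resident packages that survive GC (those in any index)."""
    protected = set().union(*indices.values())
    return resident & protected


indices = {
    "base": {"base-pkg"},
    "cache": {"cache-pkg"},
    "retained": set(),
    "writing": set(),
    "open": set(),
}
resident = {"base-pkg", "cache-pkg"}

# Non-OTA resolution of a test package begins, then ends.
indices["writing"].add("test-a")
resident.add("test-a")
indices["writing"].discard("test-a")
indices["open"].add("test-a")
survivors_during_test = run_gc(resident, indices)

# The test finishes and the last connection to its package directory closes.
indices["open"].discard("test-a")
survivors_after_test = run_gc(resident, indices)
```

While the test runs, its package survives GC. After the last connection closes, only the base and cache packages remain protected, which frees space for the next test package.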
The dynamic index uses memory proportional to the number of different ephemeral packages resolved since boot (grouped by package path). The open package index uses memory proportional to the number of ephemeral packages with open connections (grouped by package hash). These memory footprints are both small and similar in size due to how ephemeral packages are currently used (the open package index will be smaller in the case where many different test packages are run, since the dynamic index effectively leaks these entries). Any difference in memory footprint should be smaller than the memory savings unlocked by deduplicating the data structures used to serve package directories.
Replacing the dynamic index with the open package index makes the system easier to understand and operate:
meta/package file). Users are generally not aware of package path as a concept (they are frequently aware of the path component of the package URL, but the meta/package path can be different), and now GC behavior will no longer depend on it. This fixes the issue where unrelated packages from separate repositories but with the same package path would compete for GC protection (by evicting each other from the dynamic index). This also removes one of the last remaining dependencies on package path.
No impact.
No impact.
There is extensive testing of the interplay between package resolution and GC in general and OTA and GC specifically. These tests will be checked to make sure they are still meaningful and complete.
The existing GC documentation will be updated.
The writing and open package indices will be exposed with Inspect. The base and cache packages and the retained index are already included in pkg-cache's Inspect data.
We believe this solution makes GC strictly more correct than it currently is, based on the requirements. However, there are some unknowns and drawbacks remaining.
The current implementation attempts to protect ephemeral packages that are not currently in use but are expected to be resolved again, to avoid redownloading the blobs later. The proposed implementation does not have any such protection. This should not break any workflows for two reasons. First, even in the current implementation, ephemeral resolution still requires network access to check the repository metadata, meaning that the device should still be able to redownload the blobs. Second, GC is triggered rarely, so needing to redownload blobs should also be rare. Additionally, the proposed approach does not prevent re-adding predictive protection in the future.
A consequence of the previous drawback is that care must now be taken when triggering GC in the middle of a workflow that depends on multiple packages but that is not holding open connections to those package directories. In theory on-target workflows should be able to hold open all required packages, but workflows orchestrated on-host may find this more difficult.
As opposed to the current implementation, cache packages will still be protected when a different version of the package (as identified by the path in meta/package) is resolved. This means that, after resolving a different version and triggering GC, the cache fallback (used if, for example, the network is no longer available) will still succeed. This is undesirable if, for example, the non-cache version edited config files in an unexpected way. It is acceptable because this problem can already occur today whenever GC is not triggered, and GC is rarely triggered.
Instead of providing system-updater with a special fuchsia.pkg/PackageResolver capability that is excluded from open package tracking, have the system-updater close the connection to the intermediate packages before it triggers GC.
The open package index is updated asynchronously (whenever it notices that a connection was closed), and system-updater has no way to know when this has occurred. We could create an API to query the open package index, but the goal is not for system-updater to unconditionally GC the intermediate packages. The goal is for system-updater to GC an intermediate package only if the sole open connection to it was held by system-updater (consider the case where an intermediate package is also a base package of the current system), and the package serving machinery does not know who holds the client ends of the connections. Additionally, system-updater is already manually tracking its resolved packages via the retained index, so it is reasonable for its resolves to be excluded from automatic tracking.
Instead of providing system-updater with a special fuchsia.pkg/PackageResolver capability that is excluded from open package tracking, continue to provide system-updater with the standard capability and exclude retained packages from open package tracking.
This approach can result in packages for running components getting GC'd. Consider the following:
Open package tracking protects any package with an open connection. There may be components holding on to package directory handles longer than we expect. This would cause issues similar to the ones seen by developers now when package resolution fails due to out-of-space errors. Any such instances will need to be found (which is generally straightforward using console commands like k zx ch) and fixed. This will not be a problem for user devices because on user devices only the system-updater uses ephemeral resolution.