Memory Sampler

memory_sampler is a continuous fleetwide sampling heap profiler:

Continuous: continuously capture profiles from a running process.
Fleetwide: designed to run on devices in the field; profiles are attached to crash reports, and can be accessed in the same way as crash reports.
Sampling: uses a Poisson process to randomly sample allocations based on size; this reduces the overhead of the profiler, and gives a view of the more interesting allocations without capturing every single allocation of a process.
Heap profiler: produces (pprof compatible) profiles of the heap allocations and deallocations of the instrumented process.
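
As a rough illustration of size-based Poisson sampling, here is a minimal Python sketch. It is not memory_sampler's implementation: the ByteSampler class, the sampled_fraction helper, and the 4096-byte mean interval are all assumptions made for this example.

```python
import random


class ByteSampler:
    """Toy Poisson-process sampler over allocated bytes (illustrative only)."""

    def __init__(self, mean_interval_bytes, seed=0):
        # Hypothetical parameter: average number of bytes between two samples.
        self.mean_interval_bytes = mean_interval_bytes
        self._rng = random.Random(seed)
        self._bytes_until_sample = self._draw_interval()

    def _draw_interval(self):
        # Gaps between events of a Poisson process are exponentially distributed.
        return self._rng.expovariate(1.0 / self.mean_interval_bytes)

    def should_sample(self, size):
        """Return True if an allocation of `size` bytes gets sampled."""
        self._bytes_until_sample -= size
        if self._bytes_until_sample > 0:
            return False
        # By memorylessness, the gap from the end of this allocation to the
        # next sample point is again exponentially distributed.
        self._bytes_until_sample = self._draw_interval()
        return True


def sampled_fraction(size, count, mean_interval_bytes, seed=0):
    """Fraction of `count` same-sized allocations that get sampled."""
    sampler = ByteSampler(mean_interval_bytes, seed)
    return sum(sampler.should_sample(size) for _ in range(count)) / count


# Small allocations are rarely sampled; large ones almost always are.
print(sampled_fraction(16, 20000, 4096))
print(sampled_fraction(65536, 2000, 4096))
```

The instrumented process only pays the cost of recording metadata for the allocations that cross a sample point, which is where the overhead reduction comes from.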

memory_sampler comes with three parts:

fuchsia.memory.sampler.Sampler: a FIDL API which describes an interface to communicate allocation metadata between a process and a profiler.
memory_sampler: a Rust component which provides an implementation for this API.
libsampler_instrumentation: a shared library that one can link against to automatically instrument the allocator and communicate allocation metadata through the FIDL API.

memory_sampler supports profiling arbitrary processes that use the platform default allocator, Scudo. This includes any C, C++ or Rust program built via the SDK that doesn't configure an alternative allocator.

Getting started

memory_sampler can be used to continuously profile in-tree components. Out-of-tree components are not supported yet, but will be once the relevant parts have been added to the Fuchsia SDK.

In-tree components

To profile a component with memory_sampler:

Add the memory_sampler Fuchsia package target to your build (e.g. fx set ... --with //src/performance/memory/sampler:memory_sampler).
Add this package to an appropriate realm. See //src/performance/memory/sampler/meta/memory_sampler.core_shard.cml for an example shard that adds memory_sampler to the core realm manifest as a core component. memory_sampler's URL is fuchsia-pkg://fuchsia.com/memory_sampler#meta/memory_sampler.cm, and it depends on the fuchsia.feedback.CrashReporter and fuchsia.feedback.CrashReportingProductRegister capabilities, both offered by the feedback core component.
Route the fuchsia.memory.sampler.Sampler capability from memory_sampler to the instrumented process.
Add the //src/performance/memory/sampler/instrumentation:lib shared library as a dependency of your binary.

When your component starts, component_manager will ensure the configured instance of memory_sampler is running, and your process will regularly communicate allocation information to memory_sampler.

At most once an hour, memory_sampler will file a crash report with the feedback service that contains a pprof-compatible profile. This profile can be symbolized and visualized by running the fx pprof -flame <path_to_the_profile> command.

Note: on core, feedback writes the filed crash reports to temporary storage on the device. They can be found via the fx shell "find /tmp /data -name <your_program_name>*" command.

Out-of-tree components

Currently, out-of-tree components won't have access to memory_sampler because it is entirely internal. However, we do plan to make it generally available via the SDK; stay tuned.

Memory Profiles

Content

The profiles produced by memory_sampler contain 6 types of samples:

residual_allocated_objects: a count of (sampled) allocations that are still alive at the time of the production of the profile.
residual_allocated_space: the size of (sampled) allocations that are still alive at the time of the production of the profile.
allocated_objects: a count of (sampled) allocations observed over the duration of the profile (both alive and dead).
allocated_space: the size of (sampled) allocations observed over the duration of the profile (both alive and dead).
deallocated_objects: a count of (sampled) deallocations observed over the duration of the profile.
deallocated_space: the size of (sampled) deallocations observed over the duration of the profile.
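
To make the relationship between these six values concrete, here is a small Python sketch that derives them from a list of sampled allocations. The SampledAllocation record and the summarize function are hypothetical and do not reflect the pprof encoding or memory_sampler's internal data model; the sketch also assumes a single profile that covers every listed allocation.

```python
from dataclasses import dataclass


@dataclass
class SampledAllocation:
    size: int    # size of the sampled allocation, in bytes
    freed: bool  # True if its deallocation was observed during the profile


def summarize(allocations):
    """Compute the six sample values for one (simplified) profile."""
    live = [a for a in allocations if not a.freed]
    dead = [a for a in allocations if a.freed]
    return {
        "residual_allocated_objects": len(live),
        "residual_allocated_space": sum(a.size for a in live),
        "allocated_objects": len(allocations),
        "allocated_space": sum(a.size for a in allocations),
        "deallocated_objects": len(dead),
        "deallocated_space": sum(a.size for a in dead),
    }


samples = [
    SampledAllocation(size=64, freed=True),
    SampledAllocation(size=128, freed=False),
    SampledAllocation(size=32, freed=True),
]
print(summarize(samples))
```

In this simplified model, each residual value equals the corresponding allocated value minus the deallocated one.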

Note: because memory_sampler is a sampling profiler, it only observes a (randomly selected) subset of allocations. Moreover, the selection is based on the average amount of memory allocated between samples, which skews the distribution towards larger allocations. For this reason, counts and space are always underestimated, but by how much depends on the allocation profile of the instrumented process (e.g. a process that only does large allocations is more likely to produce an accurate profile than a process that performs a lot of very small allocations). Nevertheless, outliers are still likelier to get sampled; chances are that if your process suffers from an unforeseen pathological allocation pattern, it will tend to show up on profiles.
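
The skew towards larger allocations can be quantified under the assumption that the sampler behaves like an ideal Poisson process over allocated bytes: an allocation of s bytes then contains at least one sample point with probability 1 - exp(-s / interval). The Python sketch below evaluates this formula; the function name and the 4096-byte interval are illustrative assumptions, not values taken from memory_sampler.

```python
import math


def sampling_probability(size, mean_interval_bytes):
    """Chance that an allocation of `size` bytes is sampled (ideal model)."""
    return 1.0 - math.exp(-size / mean_interval_bytes)


# Hypothetical mean sampling interval, chosen only for this example.
INTERVAL = 4096

for size in (16, 256, 4096, 1 << 20):
    print(size, round(sampling_probability(size, INTERVAL), 4))
```

Under this model, allocations much smaller than the interval are sampled roughly in proportion to their size, while allocations much larger than the interval are almost always sampled.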

Profile filing frequency

memory_sampler produces profiles under roughly three conditions:

A partial profile once the recorded allocation data reaches a certain size, to reduce memory use. This can be somewhat unpredictable, because it depends on the (de)allocation patterns and call sites.
A partial profile at least once every 12 hours.
A final profile when the instrumented process exits.
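
The three conditions above can be sketched as a single decision function. This is a hedged Python illustration rather than memory_sampler's actual logic: the function name, its arguments, and the RECORDED_ENTRIES_THRESHOLD value are hypothetical, and only the 12-hour bound comes from the text above.

```python
from datetime import timedelta

# Hypothetical size limit on recorded allocation data (illustrative value).
RECORDED_ENTRIES_THRESHOLD = 10_000
# Upper bound on the time between two partial profiles, as described above.
MAX_DURATION_BETWEEN_PROFILES = timedelta(hours=12)


def profile_to_file(recorded_entries, since_last_profile, process_exited):
    """Return "final", "partial", or None if no profile is due yet."""
    if process_exited:
        return "final"
    if recorded_entries >= RECORDED_ENTRIES_THRESHOLD:
        # Flush early to bound the profiler's memory use.
        return "partial"
    if since_last_profile >= MAX_DURATION_BETWEEN_PROFILES:
        return "partial"
    return None


print(profile_to_file(12_000, timedelta(minutes=5), False))
print(profile_to_file(10, timedelta(hours=13), False))
print(profile_to_file(10, timedelta(minutes=5), True))
```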

Partial vs final profiles

memory_sampler regularly files partial profiles during the lifetime of the instrumented process (depending on observed allocations), as well as a single final profile at the end of the process. Both kinds of profiles have the same shape, but different semantics:

Partial profiles: residual allocations are either leaks or allocations that have simply not been deallocated yet. Dead allocations are allocations that were deallocated within the period covered by this profile. In particular, if an allocation was deallocated in profile X, it won't appear in the list of dead allocations of profile X+N. This restricts the growth of partial profiles. Partial profiles also come with an iteration number. This number is meaningless except for ordering: profiles are numbered in capture order (i.e. profile 1000 was captured earlier than profile 9999, whatever their actual capture times).
Final profiles: residual allocations are allocations that were never deallocated within the lifetime of the process. If an instrumented process has memory leaks, they are very likely to appear here (if they were sampled). Dead allocations correspond to allocations that have been deallocated between the last partial profile and the end of the process, not over the entire lifetime of the process. This limitation is the result of a memory optimization.

Note: If one needs a picture of the overall allocations of a process over its lifetime, one should look into every single profile captured during the lifetime of the process. It's possible to get an accurate summary by merging every single partial profile (discarding live allocations) with the final profile, but we don't yet provide a script to perform this task.
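
The merge described in this note could look roughly like the following Python sketch. It works on a hypothetical in-memory representation (one dict per profile, mapping call-site names to sampled bytes) rather than on real pprof files, so the data layout and the merge_profiles function are assumptions made for illustration.

```python
from collections import Counter


def merge_profiles(partials, final):
    """Merge partial profiles with the final one into a lifetime summary.

    Dead allocations are summed across every profile, since each profile
    only lists allocations deallocated within its own period. Live
    allocations of partial profiles are discarded; only the final
    profile's residual allocations are kept.
    """
    dead = Counter()
    for profile in [*partials, final]:
        dead.update(profile["dead"])
    return {"dead": dict(dead), "live": dict(final["live"])}


# Hypothetical call sites and sizes, in bytes.
partials = [
    {"dead": {"parse": 100}, "live": {"cache": 50}},
    {"dead": {"parse": 20, "render": 5}, "live": {"cache": 50, "leak": 8}},
]
final = {"dead": {"cache": 50}, "live": {"leak": 8}}

print(merge_profiles(partials, final))
```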

Performance tuning

Currently, memory_sampler hard-codes all its performance parameters; on a private build, we encourage you to tune them to your liking. Some available performance knobs follow:

memory_sampler::Recorder::kSamplingIntervalBytes: the average number of bytes allocated between two samples. Reducing this value increases the accuracy of the profiler, at the expense of the performance of the instrumented process. Note that if the value becomes so small that memory_sampler is unable to handle the number of messages it receives, the kernel will kill the instrumented process (because of a buffer exhaustion in the FIDL channel).
memory_sampler::sampler_service::DEAD_ALLOCATIONS_PROFILE_THRESHOLD: the number of observed allocations before filing a partial report. Reducing this number decreases the memory footprint of memory_sampler, at the expense of storage space and bandwidth (because more, smaller profiles get filed as a result).
memory_sampler::sampler_service::MAX_DURATION_BETWEEN_PROFILES: the maximum elapsed time between two partial profiles. Reducing this duration will increase the frequency of partial profiles, which should in turn reduce the memory consumption of the profiler.

Note also that memory_sampler comes with built-in throttling of filed profiles, to limit the rate of filing crash reports. In an eng build, feedback does not upload any profile, so it is safe to modify memory_sampler::crash_reporter::setup_crash_reporter to file profiles more often. Doing so reduces the latency between capture and consumption of profiles, and allows a higher sampling rate without significantly increasing memory_sampler's memory footprint.