memory_sampler
is a continuous fleetwide sampling heap profiler:
pprof
compatible) profiles of the heap allocations and deallocations of the instrumented process.memory_sampler
comes with three parts:
fuchsia.memory.sampler.Sampler
: a FIDL API which describes an interface to communicate allocation metadata between a process and a profiler.memory_sampler
: a Rust component which provides an implementation for this API.libsampler_instrumentation
: a shared library that one can link against to automatically instrument the allocator and communicate allocation metadata through the FIDL API.memory_sampler
supports profiling arbitrary processes that use the platform default allocator, Scudo. This includes any C, C++ and Rust program built via the SDK that don't configure an alternative allocator.
memory_sampler
can be used to continuously profile in-tree components. Out-of-tree components are not supported yet, but will once the relevant parts have been added to the Fuchsia SDK.
To profile a component with memory_profiler
:
Add the memory_sampler
Fuchsia package target to your build (e.g. fx set ... --with //src/performance/memory/sampler:memory_sampler
).
Add this package in an appropriate realm. See //src/performance/memory/sampler/meta/memory_sampler.core_shard.cml
for an example shard to include in the core
realm manifest, to include memory_sampler
as a core
component. memory_sampler
's URL is fuchsia-pkg://fuchsia.com/memory_sampler#meta/memory_sampler.cm
, and it depends on the fuchsia.feedback.CrashReporter
and fuchsia.feedback.CrashReportingProductRegister
capabilities, both offered by the feedback
core
component.
Route the fuchsia.memory.sampler.Sampler
capability from memory_sampler
to the instrumented process.
Add the //src/performance/memory/sampler/instrumentation:lib
shared library as a dependency of your binary.
When your component starts, component_manager
will ensure the configured instance of memory_sampler
is running, and your process will regularly communicate allocation information to memory_sampler
.
At most once an hour, memory_sampler
will file a crash report with the feedback
service that contains a pprof
-compatible profile. This profile can be symbolized and visualized by running the fx pprof -flame <path_to_the_profile>
command.
Note: on core
, feedback
will write the filed crash reports in a temporary storage on the device. They can be found via the fx shell "find /tmp /data -name <your_program_name>*"
command.
Currently, out-of-tree components won't have access to memory_sampler
because it is entirely internal. However, we do plan to make it generally available via the SDK; stay tuned.
The profiles produced by memory_sampler
contain 6 types of samples:
residual_allocated_objects
: a count of (sampled) allocations that are still alive at the time of the production of the profile.residual_allocated_space
: the size of (sampled) allocations that are still alive at the time of the production of the profile.allocated_objects
: a count of (sampled) allocations observed over the duration of the profile (both alive and dead).allocated_space
: the size of (sampled) allocations observed over the duration of the profile (both alive and dead).deallocated_objects
: a count of (sampled) deallocations observed over the duration of the profile.deallocated_space
: the size of (sampled) deallocations observed over the duration of the profile.Note: because memory_sampler
is a sampling profiler, it only observes a (randomly selected) subset of allocations. Moreover, the selection is based on average memory allocated between samples: it skews the distribution towards larger allocations. For this reason, counts and space are always underestimated, but how much depends on the allocation profile of the instrumented process (e.g. a process that only does large allocations is more likely to produce an accurate profile than a process that performs a lot of very small allocations). Nevertheless, outliers are still likelier to get sampled; chances are that if your process suffers from an unforeseen pathological allocation pattern, they will tend to show up on profiles.
memory_sampler
produces profiles under roughly three conditions:
A partial profile once the recorded allocation data reaches a certain size, to reduce memory use. This can be somewhat unpredictable, because it depends on the (de)allocation patterns and call sites.
A partial profile at least once every 12 hours.
A final profile when the instrumented process exits.
memory_sampler
regularly files partial profiles during the lifetime of the instrumented process (depending on observed allocations), as well as a single final profile at the end of the process. Both kind of profiles have the same shape, but different semantics:
Note: If one needs a picture of the overall allocations of a process over its lifetime, one should look into every single profile captured during the lifetime of the process; it‘s possible to get an accurate summary by merging every single partial profile (discarding live allocations) with a final profile, but we don’t provide (yet) a script to perform this task.
Currently, memory_sampler
hard-codes all its performance parameters; on a private build, we encourage you to tune them to your liking. Some available performance knobs follow:
memory_sampler::Recorder::kSamplingIntervalBytes
: the average count of bytes allocated between two samples. Reducing this value increases the accuracy of the profiler, at the expense of the performance of the instrumented process. Note that if the value becomes so small that memory_sampler
is unable to handle the amount of messages it receives, this would cause the kernel to kill the instrumented process (because of a buffer exhaustion in the FIDL channel).memory_sampler::sampler_service::DEAD_ALLOCATIONS_PROFILE_THRESHOLD
: the amount of observed allocations before filing a partial report. Reducing this number decreases the memory footprint memory_sampler
, at the expense of storage space and bandwidth (because more, smaller profiles get filed as a result).memory_sampler::sampler_service::MAX_DURATION_BETWEEN_PROFILES
: the maximum elpased time between two partial profiles. Reducing this duration will increase the frequency of partial profiles, which should in turn reduce the memory consumption of the profiler.Note also that memory_sampler
comes with built-in throttling of filed profiles, to limit the rate of filing crash reports; in an eng
build, feedback
does not upload any profile, so it is safe to modify memory_sampler::crash_reporter::setup_crash_reporter
to file profiles more often (both to reduce latency between capture and consumption of profiles, as well as to increase the sampling rate without significantly increasing memory_sampler
's memory footprint).