
 # Memory Sampler

 `memory_sampler` is a continuous fleetwide sampling heap profiler:

   * Continuous: continuously captures profiles from a running process.
   * Fleetwide: designed to run on devices in the field; profiles are attached to
     crash reports and can be accessed the same way.
   * Sampling: uses a Poisson process to randomly sample allocations based on
     size; this reduces the overhead of the profiler, and makes it suitable to
     get a view of the more interesting allocations without capturing every
     single allocation of a process.
   * Heap profiler: produces `pprof`-compatible profiles of the heap
     allocations and deallocations of the instrumented process.
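
The sampling strategy above can be illustrated with a minimal, hypothetical Rust sketch of size-based Poisson sampling: sample points are spread over the stream of allocated bytes with exponentially distributed gaps (the mean gap plays the role of a sampling interval), and an allocation is reported if it contains at least one point. `PoissonSampler`, `should_sample`, and the bundled xorshift generator are illustrative assumptions, not code from `libsampler_instrumentation` or `memory_sampler`.

```rust
/// Minimal xorshift64 generator so the sketch needs no external crates.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Uniform value in (0, 1], built from the top 53 bits.
    fn next_unit(&mut self) -> f64 {
        ((self.next_u64() >> 11) + 1) as f64 / (1u64 << 53) as f64
    }
}

/// Draws from an exponential distribution by inverting its CDF.
fn exponential(rng: &mut XorShift64, mean: f64) -> f64 {
    -mean * rng.next_unit().ln()
}

/// Hypothetical size-based Poisson sampler: sample points fall on allocated
/// bytes with exponentially distributed gaps of mean `mean_interval_bytes`,
/// and an allocation is sampled if it contains at least one point.
struct PoissonSampler {
    mean_interval_bytes: f64,
    bytes_until_next_point: f64,
    rng: XorShift64,
}

impl PoissonSampler {
    fn new(mean_interval_bytes: f64, seed: u64) -> Self {
        let mut rng = XorShift64(seed.max(1)); // xorshift state must be nonzero
        let first_gap = exponential(&mut rng, mean_interval_bytes);
        Self { mean_interval_bytes, bytes_until_next_point: first_gap, rng }
    }

    /// Returns true if an allocation of `size` bytes should be reported.
    fn should_sample(&mut self, size: usize) -> bool {
        self.bytes_until_next_point -= size as f64;
        if self.bytes_until_next_point > 0.0 {
            return false;
        }
        // The process is memoryless, so a fresh gap can start after this allocation.
        self.bytes_until_next_point = exponential(&mut self.rng, self.mean_interval_bytes);
        true
    }
}

fn main() {
    // Hypothetical 1 KiB mean interval, one million 16-byte allocations.
    let mut sampler = PoissonSampler::new(1024.0, 42);
    let sampled = (0..1_000_000).filter(|_| sampler.should_sample(16)).count();
    println!("sampled {sampled} of 1000000 allocations");
}
```

In this sketch the common, unsampled path costs one subtraction and one comparison per allocation, which illustrates why sampling keeps a profiler's overhead low.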

 `memory_sampler` comes with three parts:

   * `fuchsia.memory.sampler.Sampler`: a FIDL API which describes an interface to
     communicate allocation metadata between a process and a profiler.
   * `memory_sampler`: a Rust component which provides an implementation for this
     API.
   * `libsampler_instrumentation`: a shared library that one can link against to
     automatically instrument the allocator and communicate allocation metadata
     through the FIDL API.

 `memory_sampler` supports profiling arbitrary processes that use the platform
 default allocator, Scudo. This includes any C, C++, or Rust program built with
 the SDK that does not configure an alternative allocator.

 ## Getting started

 `memory_sampler` can be used to continuously profile in-tree
 components. Out-of-tree components are not supported yet, but will be once the
 relevant parts have been added to the Fuchsia SDK.

 ### In-tree components

 To profile a component with `memory_sampler`:

   1. Add the `memory_sampler` Fuchsia package target to your build (e.g. `fx set
      ... --with //src/performance/memory/sampler:memory_sampler`).

   2. Add this package to an appropriate realm. See
      `//src/performance/memory/sampler/meta/memory_sampler.core_shard.cml` for
      an example shard which, when included in the `core` realm manifest, adds
      `memory_sampler` as a `core` component. `memory_sampler`'s URL is
      `fuchsia-pkg://fuchsia.com/memory_sampler#meta/memory_sampler.cm`, and it
      depends on the `fuchsia.feedback.CrashReporter` and
      `fuchsia.feedback.CrashReportingProductRegister` capabilities, both offered
      by the `feedback` `core` component.

   3. Route the `fuchsia.memory.sampler.Sampler` capability from `memory_sampler`
      to the instrumented process.

   4. Add the `//src/performance/memory/sampler/instrumentation:lib` shared
      library as a dependency of your binary.

 When your component starts, `component_manager` will ensure the configured
 instance of `memory_sampler` is running, and your process will regularly
 communicate allocation information to `memory_sampler`.

 At most once an hour, `memory_sampler` will file a crash report with the
 `feedback` service that contains a `pprof`-compatible profile. This profile can
 be symbolized and visualized by running the `fx pprof -flame
 <path_to_the_profile>` command.

 Note: on `core`, `feedback` writes the filed crash reports to temporary
 storage on the device. They can be found via the `fx shell "find /tmp /data
 -name <your_program_name>*"` command.

 ### Out-of-tree components

 Currently, out-of-tree components won't have access to `memory_sampler` because
 it is entirely internal. However, we do plan to make it generally available via
 the SDK; stay tuned.

 ## Memory Profiles

 ### Content

 The profiles produced by `memory_sampler` contain 6 types of samples:

   * `residual_allocated_objects`: a count of (sampled) allocations that are
     still alive at the time of the production of the profile.
   * `residual_allocated_space`: the size of (sampled) allocations that are still
     alive at the time of the production of the profile.
   * `allocated_objects`: a count of (sampled) allocations observed over the
     duration of the profile (both alive and dead).
   * `allocated_space`: the size of (sampled) allocations observed over the
     duration of the profile (both alive and dead).
   * `deallocated_objects`: a count of (sampled) deallocations observed over the
     duration of the profile.
   * `deallocated_space`: the size of (sampled) deallocations observed over the
     duration of the profile.

 Note: because `memory_sampler` is a sampling profiler, it only observes a
 (randomly selected) subset of allocations. Moreover, the selection is driven by
 the average number of bytes allocated between samples, which skews the
 distribution towards larger allocations. For this reason, counts and space are
 always underestimated, but by how much depends on the allocation profile of the
 instrumented process (e.g. a process that only does large allocations is more
 likely to produce an accurate profile than a process that performs a lot of
 very small allocations). Nevertheless, outliers are still likelier to get
 sampled: if your process suffers from an unforeseen pathological allocation
 pattern, chances are that it will show up in profiles.
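
The skew towards larger allocations can be quantified, assuming sample points form a Poisson process over allocated bytes with mean gap `T`: an allocation of `s` bytes contains at least one sample point with probability `1 - exp(-s / T)`. The hypothetical helper below computes it; the 128 KiB interval in `main` is a placeholder, not the actual value of `kSamplingIntervalBytes`.

```rust
/// Probability that an allocation of `size` bytes is sampled when sample
/// points form a Poisson process over allocated bytes with mean gap
/// `mean_interval_bytes`: 1 - exp(-size / mean_interval_bytes).
fn sample_probability(size: f64, mean_interval_bytes: f64) -> f64 {
    // exp_m1 keeps precision when size is much smaller than the interval.
    -(-size / mean_interval_bytes).exp_m1()
}

fn main() {
    // Placeholder interval, chosen only for illustration.
    let interval = 128.0 * 1024.0;
    for size in [16.0, 1024.0, 64.0 * 1024.0, 1024.0 * 1024.0] {
        let p = sample_probability(size, interval);
        println!("{size:>9} B: sampled with probability {p:.5}");
    }
}
```

For allocations much smaller than `T`, the probability is roughly `s / T`, so a 16-byte allocation is about 64 times less likely to be sampled than a 1 KiB one, while allocations several times larger than `T` are almost always sampled.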

 ### Profile filing frequency

 `memory_sampler` produces profiles under roughly three conditions:

   1. A partial profile once the recorded allocation data reaches a
      certain size, to reduce memory use. This can be somewhat
      unpredictable, because it depends on the (de)allocation patterns
      and call sites.

   2. A partial profile at least once every 12 hours.

   3. A final profile when the instrumented process exits.
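
The three conditions can be sketched as a single decision function. This is a hypothetical illustration: the size trigger is modeled as a count of recorded allocations with a made-up threshold, the 12-hour limit comes from the list above, and the filing throttle described elsewhere in this README is ignored.

```rust
use std::time::Duration;

// Hypothetical stand-in for the size trigger; loosely modeled on
// DEAD_ALLOCATIONS_PROFILE_THRESHOLD, but this value is made up.
const HYPOTHETICAL_ALLOCATION_THRESHOLD: usize = 10_000;
// "At least once every 12 hours".
const MAX_TIME_BETWEEN_PARTIAL_PROFILES: Duration = Duration::from_secs(12 * 60 * 60);

#[derive(Debug, PartialEq)]
enum ProfileKind {
    Partial,
    Final,
}

/// Decides whether a profile should be produced now, and of which kind.
fn profile_to_file(
    recorded_allocations: usize,
    since_last_profile: Duration,
    process_exited: bool,
) -> Option<ProfileKind> {
    if process_exited {
        // Condition 3: the instrumented process has exited.
        Some(ProfileKind::Final)
    } else if recorded_allocations >= HYPOTHETICAL_ALLOCATION_THRESHOLD {
        // Condition 1: bound the memory used by recorded data.
        Some(ProfileKind::Partial)
    } else if since_last_profile >= MAX_TIME_BETWEEN_PARTIAL_PROFILES {
        // Condition 2: do not let too much time pass between profiles.
        Some(ProfileKind::Partial)
    } else {
        None
    }
}

fn main() {
    let decision = profile_to_file(42, Duration::from_secs(3600), false);
    println!("{decision:?}");
}
```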

 ### Partial vs final profiles

 `memory_sampler` regularly files partial profiles during the lifetime of the
 instrumented process (depending on observed allocations), as well as a single
 final profile at the end of the process. Both kinds of profiles have the same
 shape, but different semantics:

   * Partial profiles: residual allocations are either leaks *or* allocations
     that have simply not been deallocated yet. Dead allocations correspond to
     allocations that have been deallocated *within the period covered by this
     profile*. In particular, if an allocation was reported as deallocated in
     profile X, it won't appear in the list of dead allocations of any later
     profile X+N. This is done to restrict the growth of partial profiles.
     Partial profiles also come with an iteration number; this number carries no
     meaning other than order: profiles are numbered in increasing order, so
     profile 1000 was captured earlier than profile 9999, but the gap between
     two iteration numbers says nothing about the time elapsed between them.
   * Final profiles: residual allocations are allocations that were never
     deallocated within the lifetime of the process. If an instrumented process
     has memory leaks, they are very likely to appear here (if they were
     sampled). Dead allocations correspond to allocations that have been
     deallocated *between the last partial profile and the end of the process*,
     not over the entire lifetime of the process. This limitation is the result
     of a memory optimization.
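
The difference between residual and dead allocations can be illustrated with a hypothetical bookkeeping sketch: live allocations persist from one profile to the next, whereas dead allocations are forgotten as soon as a profile has reported them. It assumes sampled allocations are keyed by address and aggregates counts over all call stacks; `Bookkeeper` and `Summary` are not `memory_sampler` types.

```rust
use std::collections::HashMap;

/// Counts reported by one profile (a subset of the six sample types,
/// aggregated over all call stacks for brevity).
#[derive(Debug, PartialEq)]
struct Summary {
    residual_objects: usize,
    residual_space: usize,
    deallocated_objects: usize,
    deallocated_space: usize,
}

/// Hypothetical bookkeeping: live allocations persist across profiles, while
/// dead allocations are dropped once a profile has reported them.
#[derive(Default)]
struct Bookkeeper {
    live: HashMap<u64, usize>, // address -> size of sampled, still-live allocations
    dead: Vec<usize>,          // sizes of sampled allocations freed since the last profile
}

impl Bookkeeper {
    fn on_allocate(&mut self, address: u64, size: usize) {
        self.live.insert(address, size);
    }

    fn on_deallocate(&mut self, address: u64) {
        // Only sampled allocations are tracked; other frees are ignored.
        if let Some(size) = self.live.remove(&address) {
            self.dead.push(size);
        }
    }

    /// Produces a profile; the dead allocations it reports are discarded,
    /// so they will not appear in any later profile.
    fn take_profile(&mut self) -> Summary {
        let dead = std::mem::take(&mut self.dead);
        Summary {
            residual_objects: self.live.len(),
            residual_space: self.live.values().sum(),
            deallocated_objects: dead.len(),
            deallocated_space: dead.iter().sum(),
        }
    }
}

fn main() {
    let mut book = Bookkeeper::default();
    book.on_allocate(0x1000, 100);
    book.on_allocate(0x2000, 200);
    let first = book.take_profile(); // partial: both allocations still live
    book.on_deallocate(0x1000);
    let second = book.take_profile(); // partial: 0x1000 reported as dead here
    let last = book.take_profile(); // final: 0x2000 was never freed
    println!("{first:?}\n{second:?}\n{last:?}");
}
```

In `main`, the first allocation is reported as dead only in the second profile, and the second allocation, never freed, remains residual in the last (final) profile.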

 Note: to get a picture of the overall allocations of a process over its
 lifetime, one needs to look at every profile captured during the lifetime of
 the process; it's possible to get an accurate summary by merging every partial
 profile (discarding live allocations) with the final profile, but we do not
 provide a script to perform this task yet.
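
A minimal sketch of such a merge, assuming each profile has already been decoded into a map from a call-stack key to its six sample values (decoding `pprof` protobufs is out of scope here), and assuming, without confirmation from the source, that an allocation contributes to the `allocated_*` values of only one profile. `Profile`, `accumulate`, and `merge_lifetime` are hypothetical names.

```rust
use std::collections::HashMap;

/// Values for one call stack, in the order the six sample types are listed:
/// residual_allocated_objects, residual_allocated_space, allocated_objects,
/// allocated_space, deallocated_objects, deallocated_space.
type Values = [i64; 6];
/// Hypothetical decoded profile: call-stack key -> sample values.
type Profile = HashMap<String, Values>;

const RESIDUAL_OBJECTS: usize = 0;
const RESIDUAL_SPACE: usize = 1;

/// Adds `from` into `into`, optionally skipping the residual (live) columns.
fn accumulate(into: &mut Profile, from: &Profile, keep_residual: bool) {
    for (stack, values) in from {
        let entry = into.entry(stack.clone()).or_default();
        for (column, value) in values.iter().enumerate() {
            let is_residual = column == RESIDUAL_OBJECTS || column == RESIDUAL_SPACE;
            if keep_residual || !is_residual {
                entry[column] += *value;
            }
        }
    }
}

/// Merges partial profiles (dropping their live allocations) with the final one.
fn merge_lifetime(partials: &[Profile], final_profile: &Profile) -> Profile {
    let mut merged = Profile::new();
    for partial in partials {
        accumulate(&mut merged, partial, false);
    }
    accumulate(&mut merged, final_profile, true);
    merged
}

fn main() {
    let partial: Profile = HashMap::from([("main;alloc_a".to_string(), [1, 100, 2, 200, 1, 100])]);
    let final_profile: Profile = HashMap::from([
        ("main;alloc_a".to_string(), [1, 100, 0, 0, 0, 0]),
        ("main;alloc_b".to_string(), [2, 64, 2, 64, 0, 0]),
    ]);
    let merged = merge_lifetime(&[partial], &final_profile);
    println!("{merged:?}");
}
```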

 ## Performance tuning

 Currently, `memory_sampler` hard-codes all its performance parameters; on a
 private build, we encourage you to tune them to your liking. Some available
 performance knobs follow:

   * `memory_sampler::Recorder::kSamplingIntervalBytes`: the average number of
     bytes allocated between two samples. Reducing this value increases the
     accuracy of the profiler, at the expense of the performance of the
     instrumented process. Note that if the value becomes so small that
     `memory_sampler` is unable to handle the number of messages it receives,
     this would cause the kernel to kill the instrumented process (because the
     FIDL channel's buffer would be exhausted).
   * `memory_sampler::sampler_service::DEAD_ALLOCATIONS_PROFILE_THRESHOLD`: the
     number of observed allocations before filing a partial profile. Reducing
     this number decreases the memory footprint of `memory_sampler`, at the
     expense of storage space and bandwidth (because more, smaller profiles get
     filed as a result).
   * `memory_sampler::sampler_service::MAX_DURATION_BETWEEN_PROFILES`:
     the maximum elapsed time between two partial profiles. Reducing
     this duration will increase the frequency of partial profiles,
     which should in turn reduce the memory consumption of the
     profiler.
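
The trade-off behind `kSamplingIntervalBytes` can be estimated with simple arithmetic: if sample points form a Poisson process over allocated bytes, a process allocating `R` bytes per second produces on average `R / interval` sample points per second, which bounds the rate of sampled allocations from above (an allocation containing several points is still sampled once). The workload and intervals below are hypothetical, and it is an assumption here that the message rate to `memory_sampler` grows proportionally to the sample rate.

```rust
/// Upper bound on the expected number of sampled allocations per second:
/// there are on average `bytes_per_second / sampling_interval_bytes` sample
/// points per second, and each point marks at most one new sampled allocation.
fn max_expected_samples_per_second(bytes_per_second: f64, sampling_interval_bytes: f64) -> f64 {
    bytes_per_second / sampling_interval_bytes
}

fn main() {
    // Hypothetical workload: 64 MiB allocated per second.
    let rate = 64.0 * 1024.0 * 1024.0;
    // Hypothetical intervals, not memory_sampler's defaults.
    for interval in [256.0 * 1024.0, 128.0 * 1024.0, 64.0 * 1024.0] {
        let samples = max_expected_samples_per_second(rate, interval);
        println!("interval {interval} B -> up to {samples} samples/s");
    }
}
```

Halving the interval doubles the expected sample rate, and with it, presumably, the load on the FIDL channel.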

 Note also that `memory_sampler` comes with built-in throttling of filed
 profiles, to limit the rate at which it files crash reports. In an `eng` build,
 `feedback` does not upload any profile, so it is safe to modify
 `memory_sampler::crash_reporter::setup_crash_reporter` to file profiles more
 often, both to reduce the latency between the capture and consumption of
 profiles, and to increase the sampling rate without significantly increasing
 `memory_sampler`'s memory footprint.
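
A one-filing-per-interval throttle, matching the "at most once an hour" behavior stated earlier in this README, can be sketched as follows. `FilingThrottle` is hypothetical and is not the logic configured by `setup_crash_reporter`; in particular, the sketch simply refuses a throttled filing, which may not be what `memory_sampler` does.

```rust
use std::time::{Duration, Instant};

/// Hypothetical throttle: permits at most one filing per `min_gap`.
struct FilingThrottle {
    min_gap: Duration,
    last_filed: Option<Instant>,
}

impl FilingThrottle {
    fn new(min_gap: Duration) -> Self {
        Self { min_gap, last_filed: None }
    }

    /// Returns true, and records the filing, if a report may be filed at `now`.
    fn try_file(&mut self, now: Instant) -> bool {
        match self.last_filed {
            Some(last) if now.duration_since(last) < self.min_gap => false,
            _ => {
                self.last_filed = Some(now);
                true
            }
        }
    }
}

fn main() {
    let start = Instant::now();
    let mut throttle = FilingThrottle::new(Duration::from_secs(60 * 60));
    for minutes in [0, 30, 59, 60, 90, 120] {
        let allowed = throttle.try_file(start + Duration::from_secs(minutes * 60));
        println!("t = {minutes:>3} min: filed = {allowed}");
    }
}
```

Shrinking `min_gap` in this sketch corresponds to filing profiles more often.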