# Potential use cases for performance tests
## Background
This page describes various use cases that people may have for
automated performance tests.

One reason for describing these use cases is to document how well they
are served by Fuchsia's tools. Some of these use cases are better
served by Fuchsia's current tools than others.

A broader reason for describing these use cases is to help us
communicate about testing. The use cases one person has in mind for
testing may be different from the use cases another person has in
mind.

If a person has some use cases in mind when writing a test, it may be
useful to state those use cases explicitly when sending the test out
for code review or when asking for help.

The use cases for a test are related to how we treat regressions in
its results, which in turn is affected by limitations in how we
measure performance and by limitations of the statistical tests we use
for detecting regressions and improvements.

For example, a large set of tests that is useful for *comparing cases*
(see below) may produce a high rate of false regression warnings due
to the [**multiple comparisons
problem**](https://en.wikipedia.org/wiki/Multiple_comparisons_problem)
in statistics. This set of tests might therefore not be very useful
for *detecting regressions*. We might find that only large regressions
are actionable, while small regression warnings are usually spurious
and can be ignored.
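
To illustrate the multiple comparisons effect, here is a minimal
sketch of the arithmetic, assuming (hypothetically) that each metric's
comparison independently has a 5% chance of producing a spurious
regression warning. The metric counts are made up for illustration and
do not correspond to any particular Fuchsia test suite.

```python
# Sketch: chance of at least one spurious regression warning when a
# statistical comparison is applied to many metrics independently.
# The 5% per-metric rate and the metric counts are hypothetical.

def family_wise_rate(num_metrics: int, per_metric_rate: float = 0.05) -> float:
    """Probability that at least one metric triggers a false warning,
    assuming the per-metric comparisons are independent."""
    return 1 - (1 - per_metric_rate) ** num_metrics

for num_metrics in (1, 10, 100, 500):
    print(f"{num_metrics:4d} metrics -> "
          f"{family_wise_rate(num_metrics):.0%} chance of a spurious warning")
```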
Some of the use cases described here overlap -- they are not meant to
be mutually exclusive.
## Use cases
* **Detecting regressions**, either post-commit or pre-commit: We
  mostly do this post-commit using [Chromeperf and the culprit
  finder tools](fuchsiaperf_consumers.md). Pre-commit detection is
  opt-in only, via [perfcompare](perfcompare.md) -- it is not
  applied by default.
* **Detecting gradual regressions (creep)**, resulting from the
  cumulative effect of many changes. We currently do not have
  any automated tools for doing this. Chromeperf's detection
  algorithm only looks for regressions introduced by a single
  revision or at a single point in time.
* **Testing potential improvements**: That is, testing the effect of
  changes that attempt to improve performance. This can be done
  using [perfcompare](perfcompare.md).
* **Comparing cases**: That is, comparing the relative performance
  of related test cases.

  This can be used to look for cases that perform unexpectedly badly
  relative to others, because we may want to fix those cases. As an
  example, the performance tests for FIDL encoding and decoding have
  this as a use case.

  This can also be used to measure the costs of operations or
  subsystems without using profiling. For example, the [IPC round
  trip microbenchmarks] measure the round trip time between threads
  or processes using various kernel and userland IPC operations. By
  testing this with and without FIDL, we can estimate the overhead
  that FIDL and other userland libraries add on top of the kernel
  IPC primitives. Similarly, by testing IPC between processes and
  between threads within a process, we can estimate the cost of a
  context switch that switches between address spaces. (A sketch of
  this estimation arithmetic appears after this list.)
* **Providing clues about other regressions**: A regression in
  metric A might not be something we care about as such, but it
  might be useful in providing a clue about the cause of a
  regression in metric B. This use case is similar to profiling, but
  more general.

  For example, if the *frames per second* metric has regressed, we
  can look at the *frame build time* metric to see whether that also
  changed.
* **Profiling**: That is, analyzing the breakdown of time or memory
  usage within a test.

  While Linux has tools such as [perf] and [OProfile] for doing
  statistical profiling of CPU time usage, Fuchsia currently has no
  equivalent tools.

  It is common to use Fuchsia's tracing system for examining the
  breakdown of time usage, for either automated tests or manual
  tests. (For this, automated tests have the benefit over manual
  tests of being more reproducible and less work to run.) However,
  Fuchsia's tracing system has two differences from statistical
  profiling tools like perf and OProfile:

  * Tracing only records time usage for code that has been annotated
    to produce trace events.
  * The typical uses for traces are to inspect them manually or to
    extract from them a fixed set of metrics (such as *frames per
    second* and *frame build times*). We don't yet have tools for
    generating more open-ended sets of statistics of the kind
    usually produced by profiling tools.

  Note that the infrastructure around `fuchsiaperf` files is not
  well suited to recording profiling data, that is, to recording
  large numbers of metrics describing the breakdown of time or
  memory usage.
* **Informing design decisions**: The performance characteristics of
  a subsystem inform how we use it. If the subsystem is slow, we
  might avoid it, build a layer on top of it (such as caching), or
  work around it in some other way.

  An example of this use case is the "Latency numbers every
  programmer should know" table. See [this recent
  version][latency-numbers-pdf] of the table.

  An early version of this table appears in a [talk by Jeff
  Dean][talk] ([Stanford CS295 class lecture, spring
  2007][talk-link-context]) in which he advocates writing
  microbenchmarks for building intuition about performance and for
  use as a basis for performance estimates. Various updated versions
  of this table exist; see [this Stack Exchange
  question][stack-exchange-page] for further discussion.
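
To make the cost-estimation idea from the *comparing cases* item above
concrete, here is a minimal sketch of the differencing arithmetic. The
round-trip times and case names below are hypothetical placeholders,
not measurements or identifiers from the [IPC round trip
microbenchmarks]; only the structure of the calculation is meant to
carry over.

```python
# Sketch of estimating costs by differencing related benchmark cases.
# All numbers and names are hypothetical placeholders.

# Median round-trip times, in microseconds, for related cases.
channel_between_processes = 10.0  # raw Zircon channel IPC, two processes
fidl_between_processes = 14.0     # same IPC path, but going through FIDL
channel_between_threads = 6.0     # raw Zircon channel IPC, one process

# Overhead that FIDL and other userland libraries add on top of the
# kernel IPC primitive, per round trip.
fidl_overhead = fidl_between_processes - channel_between_processes

# Extra cost, over one round trip, attributable to context switches
# that also switch address spaces (cross-process vs. cross-thread).
address_space_switch_cost = channel_between_processes - channel_between_threads

print(f"Estimated FIDL overhead per round trip: {fidl_overhead:.1f} us")
print(f"Estimated address-space-switching cost per round trip: "
      f"{address_space_switch_cost:.1f} us")
```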
[IPC round trip microbenchmarks]: /src/tests/microbenchmarks/round_trips.cc
[perf]: https://en.wikipedia.org/wiki/Perf_(Linux)
[OProfile]: https://en.wikipedia.org/wiki/OProfile
[latency-numbers-pdf]: https://static.googleusercontent.com/media/sre.google/en//static/pdf/rule-of-thumb-latency-numbers-letter.pdf
[talk]: https://static.googleusercontent.com/media/research.google.com/en/us/people/jeff/stanford-295-talk.pdf#page=13
[talk-link-context]: https://research.google/people/jeff/
[stack-exchange-page]: https://softwareengineering.stackexchange.com/questions/312485/how-can-jeff-deans-latency-numbers-every-programmer-should-know-be-accurate-i