| # Perfcompare: Performance comparison try builder |
| |
| The "perfcompare" try builder is an optional CQ try builder for |
| measuring the performance impact of a change without landing it |
(i.e. for pre-submit performance testing). It runs performance tests
both with and without a CL applied and compares the results to detect
any performance regressions or improvements.
| |
| Googlers can refer to the [Google-internal perfcompare |
| docs][internal-doc] for some additional documentation. |
| |
| ## How to use it |
| |
| ### For fuchsia.git CLs |
| |
| To run perfcompare on a Gerrit CL, do the following: |
| |
| * **Start a build:** Select "Choose tryjobs" in the Gerrit Web UI, |
| and select one or more of the perfcompare builders from the list |
| of builders. A quick way to do that is to type "perfcompare" into |
| the search field, which will filter the list to display the |
| available perfcompare builders. |
| * **Get the results:** A link to the try builder's results page will |
| appear on the CL in Gerrit. When the builder run is finished, the |
| results will be under **"compare perf test results without and |
| with CL" -> "stdout" (or "raw")** on the build page. |
| |
The following perfcompare builders are currently available and
supported for running `fuchsia.git`'s performance tests:
| |
| * `terminal.x64-release-perfcompare` |
| ([recent builds](https://ci.chromium.org/p/fuchsia/builders/try/terminal.x64-release-perfcompare)): |
| This runs `fuchsia.git`'s performance tests on Intel NUCs |
| (x64). This is the perfcompare version of the |
| `terminal.x64-release` builder (i.e. it runs the same set of |
| performance tests as that builder). |
| |
| * `terminal.vim3-release-perfcompare` |
| ([recent builds](https://ci.chromium.org/p/fuchsia/builders/try/terminal.vim3-release-perfcompare)): |
| This runs `fuchsia.git`'s performance tests on VIM3s (ARM64). This |
| is the perfcompare version of the `terminal.vim3-release` |
| builder. Note that `terminal.vim3-release` is not run by the CQ by |
| default, so it is more likely to be broken or have higher flake |
| rates than other builders. |
| |
| ### For integration.git CLs |
| |
| **Perfcompare is not supported yet for `integration.git` CLs.** |
| |
| Specifically, CLs that change dependencies in Jiri manifest files or |
| `jiri.lock` files or that use `patches.json` are not yet supported by |
| perfcompare. This includes CLs that change prebuilt packages, such as |
| toolchain roll CLs. |
| |
In these cases, perfcompare does not know how to check out the source
and prebuilt binaries from before and after the CL, so it will give
wrong results: it will report that there is no change in performance
even when the CL does change performance.
| |
| ## Example output |
| |
| Here is part of the output from a perfcompare run on a simple [test |
| CL]: |
| |
| [test CL]: <https://fuchsia-review.googlesource.com/c/fuchsia/+/482567> |
| |
| ```none |
| Summary counts: |
| 2939 test cases in total |
| 2938 test cases had no significant difference (no_sig_diff) |
| 1 test case got faster |
| 0 test cases got slower |
| 0 test cases added |
| 0 test cases removed |
| |
| Results from test cases with differences: |
| |
| Test case Improve/regress? Factor change Mean before Mean after |
| ---------------------------------------- ---------------- ------------- ------------------ ----------------- |
| fuchsia.microbenchmarks: ExampleNoOpLoop faster 0.143-0.145 405.36 +/- 0.39 ns 58.49 +/- 0.30 ns |
| |
| Results from all test cases: |
| |
| Test case Improve/regress? Factor change Mean before Mean after |
| --------------------------------------------- ---------------- ------------- ----------------- ----------------- |
| ... |
| fuchsia.microbenchmarks: Syscall/ManyArgs no_sig_diff 0.986-1.008 92.94 +/- 0.66 ns 92.65 +/- 0.40 ns |
| fuchsia.microbenchmarks: Syscall/Null no_sig_diff 0.993-1.007 84.33 +/- 0.40 ns 84.31 +/- 0.19 ns |
| fuchsia.microbenchmarks: Thread/CreateAndJoin no_sig_diff 0.950-1.034 34229 +/- 711 ns 33935 +/- 739 ns |
| fuchsia.microbenchmarks: TicksGet no_sig_diff 0.981-1.022 19.77 +/- 0.19 ns 19.81 +/- 0.21 ns |
| ... |
| ``` |
| |
| ## Interpreting the results |
| |
| * `no_sig_diff` means that no [statistically |
| significant](https://en.wikipedia.org/wiki/Statistical_significance) |
| difference was found in the metric. It does not mean that there |
| was no difference, just that any difference was too small |
| (relative to the amount of variation in the metric) to be |
| detected. |
| |
* `ci_too_wide` is shown in the "Factor change" column if the
  confidence intervals in "Mean before" and "Mean after" are so wide
  that the lower bound goes negative. This happens when a metric has
  a large amount of variation.
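
* The "Factor change" column gives an interval estimate for the ratio
  of "Mean after" to "Mean before", so values below 1 mean the test
  case got faster. As a check against the example output above:
  58.49 / 405.36 ≈ 0.144, which falls within the reported range of
  0.143-0.145.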
| |
| ## Testing CL stacks versus individual CLs |
| |
| The perfcompare builder measures the performance impact of individual |
| CLs, **not** stacks of CLs. |
| |
| As an example, suppose you have a series of CLs: P1, P2, P3, P4, P5, |
| where P1 is the oldest (that is, all the other CLs depend on it). If |
| you run perfcompare on P3, the "with CL" build will include P1+P2+P3, |
| while the "without CL" build will include just P1+P2. |
| |
| * This provides a way to measure effects on test cases that haven't |
| been landed yet. You can have one CL that adds a new performance |
| test, and a follow-on CL that changes the |
| software-under-test. Running perfcompare on the second CL will |
| show how that CL affects the new test. |
* If you do want to measure the overall effect of a patch stack, one
  way to do that is to squash the changes into a single Git commit
  (such as with `git merge --squash`), upload that to Gerrit, and
  run perfcompare on that, as sketched below.
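
A minimal sketch of that squash workflow, assuming the stack is on a
local branch named `my-stack` based on `origin/main` (both names are
hypothetical):

```shell
# Create a branch at the revision the stack is based on.
git checkout -b squashed-stack origin/main

# Stage the combined changes from the whole stack.
git merge --squash my-stack

# Commit them as a single commit (Gerrit's commit-msg hook adds a
# Change-Id).
git commit -m "Squashed stack for perfcompare"

# Upload the squashed commit to Gerrit (`jiri upload` also works from
# a Jiri checkout), then run perfcompare on the resulting CL.
git push origin HEAD:refs/for/main
```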
| |
| ### The "with CL" and "without CL" builds |
| |
| The perfcompare builder applies the following steps sequentially to |
| produce the "with CL" and "without CL" builds: |
| |
| 1. Check out Fuchsia from the current tip-of-tree revision of |
| `integration.git`. |
| 2. Apply the CL series to the checkout, up to and including the CL |
| being tested. This uses `jiri patch`, which uses `git rebase`. |
| 3. Build Fuchsia. This gives the "with CL" build. |
| 4. Unapply the topmost CL from the checkout (leaving earlier CLs in |
| the CL series, if any, applied). This works by running `git |
| checkout HEAD^` in the Git repo where the CL series was applied. |
| 5. Build Fuchsia again, doing an incremental build. This gives the |
| "without CL" build. |
| |
| Steps 1-3 are the same as for non-perfcompare Fuchsia try builders. |
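
As a rough illustration, steps 4 and 5 correspond to something like
the following shell commands (a sketch only: the builder runs these
steps through its recipe, and `fx build` here stands in for the
builder's build step):

```shell
# Step 4: unapply the topmost CL, keeping any earlier CLs in the
# series applied. Run this in the Git repo where `jiri patch` applied
# the CL series.
git checkout HEAD^

# Step 5: do an incremental rebuild to get the "without CL" build.
fx build
```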
| |
| ## Limitations |
| |
| * CLs that use `patches.json` or that change dependencies in Jiri |
| manifest files are not supported yet, as mentioned above. |
| |
| ## How to run performance comparisons locally |
| |
| The perfcompare builders use |
| [`perfcompare.py`](/src/testing/perfcompare/perfcompare.py) to compare |
| performance results. It is possible to use `perfcompare.py` to run |
| performance tests locally (that is, not using Fuchsia Infra) and |
| compare their results. See the |
| [documentation](/src/testing/perfcompare/README.md). |
| |
| ## How to download the raw performance results |
| |
| <!-- Allow lines to be wrapped in the code blocks below. For single |
| shell commands, wrapping can be more convenient for |
| copy-and-pasting than using backslashes for splitting up |
| lines. --> |
| <style> |
| pre.wrapped { |
| white-space: pre-wrap; |
| } |
| </style> |
| |
| It is possible to download the raw performance test results produced |
| by a perfcompare try builder run. This is useful if you want to modify |
| the analysis that `perfcompare.py` performs. To do that, use the |
| following steps: |
| |
1. Find the values of the `cas_instance` and
   `perfcompare_dataset_digest` fields from the output properties of
   the perfcompare build. These can be found on the build page, which
   is reachable from the "Checks" tab in the Gerrit code
   review. Examples of typical values are:
| |
| * `cas_instance="projects/chromium-swarm/instances/default_instance"` |
| * `perfcompare_dataset_digest="3ff389154e02490f29e379564f7e70b3df66f74c3116ed50172cceec1e9d9888/165"` |
| |
| For downloading results data from non-perfcompare builds, the |
| field name to use is `perf_dataset_digest` rather than |
| `perfcompare_dataset_digest`. |
| |
| 2. Download the dataset by running the following command (using the |
| prebuilt `cas` tool from the Fuchsia checkout): |
| |
| ```shell {:.wrapped} |
| ./prebuilt/tools/cas/cas download -cas-instance $CAS_INSTANCE -digest $DIGEST -dir $DEST_DIR |
| ``` |
| |
| 3. Run `perfcompare.py` on the downloaded dataset: |
| |
| ```shell {:.wrapped} |
| python3 src/testing/perfcompare/perfcompare.py compare_perf $DEST_DIR/without_cl/ $DEST_DIR/with_cl/ |
| ``` |
| |
Note that the RBE-CAS system keeps the data for only a limited time
(the current default time-to-live (TTL) in RBE-CAS is 90 days), so
the download command will fail if the build was not run recently.
| |
| |
| [internal-doc]: <https://goto.google.com/fuchsia-perfcompare-internal> |