# Perfcompare: Performance comparison try builder
The "perfcompare" try builder is an optional CQ try builder for
measuring the performance impact of a change without landing it
(i.e. for pre-submit performance testing). It runs performance tests
both with and without a CL applied and compares their results to see
if there were any performance regressions or improvements.
Googlers can refer to the [Google-internal perfcompare
docs][internal-doc] for some additional documentation.
## How to use it
### For fuchsia.git CLs
To run perfcompare on a Gerrit CL, do the following:
* **Start a build:** Select "Choose tryjobs" in the Gerrit Web UI,
  then pick one or more of the perfcompare builders from the list. A
  quick way to find them is to type "perfcompare" into the search
  field, which filters the list down to the available perfcompare
  builders.
* **Get the results:** A link to the try builder's results page will
appear on the CL in Gerrit. When the builder run is finished, the
results will be under **"compare perf test results without and
with CL" -> "stdout" (or "raw")** on the build page.
These perfcompare builders are currently available and supported for
running `fuchsia.git`'s performance tests:
* `terminal.x64-release-perfcompare`
([recent builds](https://ci.chromium.org/p/fuchsia/builders/try/terminal.x64-release-perfcompare)):
This runs `fuchsia.git`'s performance tests on Intel NUCs
(x64). This is the perfcompare version of the
`terminal.x64-release` builder (i.e. it runs the same set of
performance tests as that builder).
* `terminal.vim3-release-perfcompare`
([recent builds](https://ci.chromium.org/p/fuchsia/builders/try/terminal.vim3-release-perfcompare)):
This runs `fuchsia.git`'s performance tests on VIM3s (ARM64). This
is the perfcompare version of the `terminal.vim3-release`
builder. Note that `terminal.vim3-release` is not run by the CQ by
default, so it is more likely to be broken or have higher flake
rates than other builders.
### For integration.git CLs
**Perfcompare is not supported yet for `integration.git` CLs.**
Specifically, CLs that change dependencies in Jiri manifest files or
`jiri.lock` files or that use `patches.json` are not yet supported by
perfcompare. This includes CLs that change prebuilt packages, such as
toolchain roll CLs.
Perfcompare does not know how to check out the source and prebuilt
binaries from before and after the CL in these cases, so it will give
wrong results: it will report that there is no change in performance
even when the CL does change performance.
## Example output
Here is part of the output from a perfcompare run on a simple [test
CL]:
[test CL]: <https://fuchsia-review.googlesource.com/c/fuchsia/+/482567>
```none
Summary counts:
2939 test cases in total
2938 test cases had no significant difference (no_sig_diff)
1 test case got faster
0 test cases got slower
0 test cases added
0 test cases removed
Results from test cases with differences:
Test case Improve/regress? Factor change Mean before Mean after
---------------------------------------- ---------------- ------------- ------------------ -----------------
fuchsia.microbenchmarks: ExampleNoOpLoop faster 0.143-0.145 405.36 +/- 0.39 ns 58.49 +/- 0.30 ns
Results from all test cases:
Test case Improve/regress? Factor change Mean before Mean after
--------------------------------------------- ---------------- ------------- ----------------- -----------------
...
fuchsia.microbenchmarks: Syscall/ManyArgs no_sig_diff 0.986-1.008 92.94 +/- 0.66 ns 92.65 +/- 0.40 ns
fuchsia.microbenchmarks: Syscall/Null no_sig_diff 0.993-1.007 84.33 +/- 0.40 ns 84.31 +/- 0.19 ns
fuchsia.microbenchmarks: Thread/CreateAndJoin no_sig_diff 0.950-1.034 34229 +/- 711 ns 33935 +/- 739 ns
fuchsia.microbenchmarks: TicksGet no_sig_diff 0.981-1.022 19.77 +/- 0.19 ns 19.81 +/- 0.21 ns
...
```
## Interpreting the results
* `no_sig_diff` means that no [statistically
significant](https://en.wikipedia.org/wiki/Statistical_significance)
difference was found in the metric. It does not mean that there
was no difference, just that any difference was too small
(relative to the amount of variation in the metric) to be
detected.
* `ci_too_wide` is shown in the "Factor change" column if the
  confidence intervals for "Mean before" and "Mean after" are so wide
  that a lower bound goes negative. This can happen when a metric has
  a large amount of variation.
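* The "Factor change" column gives a range rather than a single
  ratio. Judging from the example output above, the range appears to
  come from dividing the bounds of the two confidence intervals: for
  the `ExampleNoOpLoop` row, (58.49 - 0.30) / (405.36 + 0.39) ≈ 0.143
  and (58.49 + 0.30) / (405.36 - 0.39) ≈ 0.145, which matches the
  reported range of 0.143-0.145.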
## Testing CL stacks versus individual CLs
The perfcompare builder measures the performance impact of individual
CLs, **not** stacks of CLs.
As an example, suppose you have a series of CLs: P1, P2, P3, P4, P5,
where P1 is the oldest (that is, all the other CLs depend on it). If
you run perfcompare on P3, the "with CL" build will include P1+P2+P3,
while the "without CL" build will include just P1+P2.
* This provides a way to measure effects on test cases that haven't
been landed yet. You can have one CL that adds a new performance
test, and a follow-on CL that changes the
software-under-test. Running perfcompare on the second CL will
show how that CL affects the new test.
* If you do want to measure the overall effect of a patch stack, one
  way to do that is to squash the changes into a single Git commit
  (for example with `git merge --squash`), upload that commit to
  Gerrit, and run perfcompare on it (see the sketch after this list).
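Here is a minimal sketch of that squash-and-upload flow, assuming the
top of the stack is on a local branch and the upstream base is
`origin/main` (the branch names `p5` and `squashed` are placeholders):

```shell
# Create a new branch at the upstream base of the CL stack.
git checkout -b squashed origin/main

# Stage the combined changes from P1..P5 and commit them as one commit.
git merge --squash p5
git commit

# Upload the squashed commit to Gerrit (for example with "jiri upload",
# or a plain Gerrit push), then run perfcompare on the uploaded CL:
#   git push origin HEAD:refs/for/main
```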
### The "with CL" and "without CL" builds
The perfcompare builder applies the following steps sequentially to
produce the "with CL" and "without CL" builds:
1. Check out Fuchsia from the current tip-of-tree revision of
`integration.git`.
2. Apply the CL series to the checkout, up to and including the CL
being tested. This uses `jiri patch`, which uses `git rebase`.
3. Build Fuchsia. This gives the "with CL" build.
4. Unapply the topmost CL from the checkout (leaving earlier CLs in
the CL series, if any, applied). This works by running `git
checkout HEAD^` in the Git repo where the CL series was applied.
5. Build Fuchsia again, doing an incremental build. This gives the
"without CL" build.
Steps 1-3 are the same as for non-perfcompare Fuchsia try builders.
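For intuition, these steps correspond roughly to the following local
sequence. This is only a sketch: the builder drives the process
through `jiri patch` and its recipes, `fx build` below stands in for
whatever build invocation the builder actually uses, and
`<repo-with-the-CL>` is a placeholder for the Git repository the CL
applies to.

```shell
# 1. Check out Fuchsia at the current tip-of-tree of integration.git.
jiri update

# 2. Apply the CL series, up to and including the CL under test
#    (the builder uses "jiri patch", which rebases the CLs onto the
#    checked-out revision).

# 3. Build Fuchsia -- this is the "with CL" build.
fx build

# 4. Unapply the topmost CL, keeping any earlier CLs in the series.
git -C <repo-with-the-CL> checkout HEAD^

# 5. Rebuild incrementally -- this is the "without CL" build.
fx build
```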
## Limitations
* CLs that use `patches.json` or that change dependencies in Jiri
manifest files are not supported yet, as mentioned above.
## How to run performance comparisons locally
The perfcompare builders use
[`perfcompare.py`](/src/testing/perfcompare/perfcompare.py) to compare
performance results. It is possible to use `perfcompare.py` to run
performance tests locally (that is, not using Fuchsia Infra) and
compare their results. See the
[documentation](/src/testing/perfcompare/README.md).
## How to download the raw performance results
<!-- Allow lines to be wrapped in the code blocks below. For single
shell commands, wrapping can be more convenient for
copy-and-pasting than using backslashes for splitting up
lines. -->
<style>
pre.wrapped {
white-space: pre-wrap;
}
</style>
It is possible to download the raw performance test results produced
by a perfcompare try builder run. This is useful if you want to modify
the analysis that `perfcompare.py` performs. To do that, use the
following steps:
1. Find the values of the `cas_instance` and
`perfcompare_dataset_digest` fields from the output properties of
the perfcompare build. These can be found on the build page for
the build (which is reachable from the "Checks" tab in the Gerrit
code review). Examples of typical values are:
* `cas_instance="projects/chromium-swarm/instances/default_instance"`
* `perfcompare_dataset_digest="3ff389154e02490f29e379564f7e70b3df66f74c3116ed50172cceec1e9d9888/165"`
For downloading results data from non-perfcompare builds, the
field name to use is `perf_dataset_digest` rather than
`perfcompare_dataset_digest`.
2. Download the dataset by running the following command, using the
   prebuilt `cas` tool from the Fuchsia checkout (a worked example
   with the values filled in appears after these steps):
```shell {:.wrapped}
./prebuilt/tools/cas/cas download -cas-instance $CAS_INSTANCE -digest $DIGEST -dir $DEST_DIR
```
3. Run `perfcompare.py` on the downloaded dataset:
```shell {:.wrapped}
python3 src/testing/perfcompare/perfcompare.py compare_perf $DEST_DIR/without_cl/ $DEST_DIR/with_cl/
```
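As a concrete example, using the sample property values shown in step
1, the download and compare commands above could be run as follows
(`$DEST_DIR` is an arbitrary local directory of your choosing):

```shell {:.wrapped}
CAS_INSTANCE="projects/chromium-swarm/instances/default_instance"
DIGEST="3ff389154e02490f29e379564f7e70b3df66f74c3116ed50172cceec1e9d9888/165"
DEST_DIR="$HOME/perfcompare_dataset"
./prebuilt/tools/cas/cas download -cas-instance $CAS_INSTANCE -digest $DIGEST -dir $DEST_DIR
python3 src/testing/perfcompare/perfcompare.py compare_perf $DEST_DIR/without_cl/ $DEST_DIR/with_cl/
```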
Note that the RBE-CAS system keeps the data for only about 2-3 months,
so the download command will fail if the build was not run
recently. (The current default for the time-to-live (TTL) in
RBE-CAS is 90 days.)
[internal-doc]: <https://goto.google.com/fuchsia-perfcompare-internal>