 # Testing for flakiness in CQ

 To test for flakiness in CQ, the infrastructure can run a test multiple times
 and fail the overall build if any run fails. This happens automatically when
 the infrastructure determines that the commit being tested affects only a
 small number of tests (according to the build graph).

 ## Format

 A change author can tell the infrastructure to run a specific test many times by
 adding a `Multiply` footer to the commit message:

 ```txt
 Multiply: test_selector
 ```

 `test_selector` can be a test name, a substring of a test name, or an
 [re2 regular expression](https://github.com/google/re2/wiki/Syntax) that matches
 a test name.

 Fuchsia component tests are referenced by package URL:

 ```txt
 Multiply: fuchsia-pkg://fuchsia.com/foo_tests#meta/foo_tests.cm
 ```

 Host tests are referenced by path:

 ```txt
 Multiply: host_x64/obj/src/bar_tests.sh
 ```

 Substrings of test names are also accepted:

 ```txt
 Multiply: foo_tests
 Multiply: bar_tests
 ```

 Multipliers may be combined into a single comma-separated line:

 ```txt
 Multiply: foo_tests, bar_tests
 ```

 All-caps `MULTIPLY` is also accepted.

 Example uses of `Multiply` from real changes:

 - [`Multiply: driver_development_test`](https://fuchsia-review.googlesource.com/c/fuchsia/+/677686)
 - [`Multiply: ffx_daemon_target_lib_test`](https://fuchsia-review.googlesource.com/c/fuchsia/+/678622)
 - [`Multiply: virtual-keyboard-test`](https://fuchsia-review.googlesource.com/c/fuchsia/+/677607)
 - [`Multiply: text_manager_integration_test: 10`](https://fuchsia-review.googlesource.com/c/fuchsia/+/671465)

 ## Run count

 By default, the infrastructure uses historical test duration data to calculate a
 number of runs. The number of runs is chosen to produce a single multiplied test
 shard whose duration is similar to the expected duration of the other shards, up
 to a maximum of 2000 test runs. Slower tests will run fewer times, while faster
 tests will run more times.
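
As a rough illustration of the relationship described above, the following sketch estimates a run count from a test's average duration. It is not the infrastructure's actual algorithm: the function name, the `target_shard_seconds` parameter, the rounding, and the minimum of one run are all assumptions. Only the cap of 2000 comes from this document.

```python
MAX_RUNS = 2000  # Upper bound stated in this document.


def estimate_default_runs(avg_test_seconds, target_shard_seconds):
    """Estimate how many runs fit in one shard of the target duration.

    Hypothetical helper: both parameters and the rounding are assumptions.
    """
    if avg_test_seconds <= 0:
        # No usable duration history; falling back to one run is an
        # assumption of this sketch.
        return 1
    runs = int(target_shard_seconds // avg_test_seconds)
    # Clamp to at least one run and at most the documented maximum.
    return max(1, min(MAX_RUNS, runs))
```

With a hypothetical 600-second target, a 2-second test would get 300 runs, while a 30-second test would get 20.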

 It's sometimes desirable to override the default number of runs (for example,
 because the default is too high and causes timeouts). In this case, you can
 explicitly specify a number of runs. For example:

 ```txt
 Multiply: foo_tests: 100
 ```
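
To make the footer syntax concrete, here is a minimal sketch that extracts selectors and optional run counts from a commit message. It is not the infrastructure's parser: `parse_multiply_footers` is a hypothetical helper, and splitting entries on commas (and allowing a run count on each comma-separated entry) is an assumption.

```python
import re

# Hypothetical patterns; the real infrastructure may parse footers differently.
_FOOTER = re.compile(r"^(?:Multiply|MULTIPLY):\s*(?P<body>.+)$")
# A trailing ": <digits>" is treated as the run count. The lazy selector group
# keeps the "://" inside package URLs as part of the selector.
_ENTRY = re.compile(r"^(?P<selector>.+?)(?::\s*(?P<runs>\d+))?$")


def parse_multiply_footers(commit_message):
    """Return a list of (selector, run_count_or_None) pairs."""
    multipliers = []
    for line in commit_message.splitlines():
        footer = _FOOTER.match(line.strip())
        if footer is None:
            continue
        # Assumption: entries are split on commas, so a regular expression
        # selector that itself contains a comma is not handled here.
        for entry in footer.group("body").split(","):
            entry = entry.strip()
            if not entry:
                continue
            match = _ENTRY.match(entry)
            runs = match.group("runs")
            selector = match.group("selector").strip()
            multipliers.append((selector, int(runs) if runs else None))
    return multipliers
```

A run count of `None` means the footer did not specify one, so the default described above would apply.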

 ## Limitations

 {% dynamic if user.is_googler %}

 ### Internal tests

 Multiplying internal tests on public changes is not allowed, to avoid leaking
 confidential information. If you want to multiply an internal test, use `fx
 make-integration-patch` to create an internal CL that patches your CL into the
 integration repository. Then add the necessary `Multiply` line to the
 integration CL instead of the original public CL, and CQ+1 the integration CL.

 {% dynamic endif %}

 ### Validation

 If there is a typo in your `Multiply` clause, or if your `Multiply` selector
 doesn't match any tests on any builders, no tests are multiplied and no error
 is reported.

 Therefore, it's important to manually verify that the `Multiply` took effect.
 For every builder on which your `Multiply` takes effect, a comment of the
 following form will be added to your change in Gerrit:

 ```txt
 A builder created multiplier shards. Click the following link for more details:
 ```

 The comment will include a link to the build that runs the multiplied tests
 ([example](https://fuchsia-review.googlesource.com/c/fuchsia/+/671465/5#message-e1f2b9db0dfcd1bf8436205c9eff6da0735e10b7)).

 If no such comment appears, the footer probably has a syntax error or the test
 does not run in any of the regular CQ builders. In the latter case, either add
 the test to the build graph so that one of the builders runs it, or, if the
 test runs in an optional builder, manually choose the tryjob that runs it.

 If the linked build is completed, you should see a step like `multiplied:<shard
 name>-<test name>` under one of the `passes`, `flakes`, or `failures` steps. If
 the build is not yet completed, you can click on the link under the `build` step
 named `<builder name>-subbuild`, which will take you to the subbuild build page
 where you should see a similar `multiplied` step. The comment doesn't say
 which tests were multiplied, so if you multiplied more than one test, check
 the build pages to confirm each one.

 For example:

 ![multiplied shard screenshot](multiplied-shard-screenshot.png)
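
The comment check described in this section can also be scripted. The sketch below assumes you have already collected the change's comment messages as plain strings (how you fetch them is up to you); `multiplier_comments` is a hypothetical helper, and the marker text is copied from the comment shown above.

```python
# Text of the Gerrit comment quoted in this document.
MARKER = "A builder created multiplier shards."


def multiplier_comments(messages):
    """Return the change messages that report multiplier shards."""
    return [message for message in messages if MARKER in message]
```

An empty result suggests that the `Multiply` footer did not take effect on any builder.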

 ### No more than five matching tests

 A single multiplier is not allowed to match more than five tests, to prevent
 change authors from accidentally multiplying a huge number of tests and
 overwhelming the testing infrastructure.

 If you get a tryjob failure as a result of a `Multiply` statement that matches
 too many tests, simply edit your commit message locally or in the Gerrit UI to
 make your test selector more specific. Then retry CQ.
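
Before uploading, you can approximate this check locally. The following sketch matches a selector against a list of test names and rejects selectors that match more than five tests. It is only an approximation: it uses Python's `re` module as a stand-in for RE2 (the two syntaxes are not identical), and the helper names and the way you obtain the test names are assumptions.

```python
import re

MAX_MATCHES = 5  # Limit stated in this document.


def matching_tests(selector, test_names):
    """Return the names matched by substring or by regular expression."""
    try:
        pattern = re.compile(selector)
    except re.error:
        # Not a valid pattern; fall back to substring matching only.
        pattern = None
    return [
        name
        for name in test_names
        if selector in name or (pattern is not None and pattern.search(name))
    ]


def check_selector(selector, test_names):
    """Return the matches, or raise if the selector is too broad."""
    matches = matching_tests(selector, test_names)
    if len(matches) > MAX_MATCHES:
        raise ValueError(
            f"{selector!r} matches {len(matches)} tests; the limit is {MAX_MATCHES}"
        )
    return matches
```

If `check_selector` raises, narrow the selector, for example by using the full package URL or host test path instead of a short substring.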

 ### Changing `Multiply` after a CQ dry run passes

 If all tryjobs have already passed a CQ dry run and you add or edit a `Multiply`
 clause without making any code changes, subsequent CQ+1 or CQ+2 attempts within
 24 hours of the dry run will not re-run the builders and the updated `Multiply`
 clause will not be respected.

 This is because the CQ service treats commit message updates as "trivial" and
 does not invalidate past CQ attempts on the patchset.

 To work around this, do one of the following:

 - Manually retry a subset of tryjobs using the **Choose Tryjobs** menu and wait
   for them to pass before submitting.
 - Retry all tryjobs by making a non-functional code change (e.g. adding a
   comment to some code) and uploading a new patchset to invalidate the old
   tryjob results. Then retry CQ with the `Multiply` footer present.

 ### Timeouts

 The default run count for a multiplied test is based on the historical duration
 of the test. If your change increases the duration of a multiplied test, the
 default run count may be too high and cause the task running the test to time
 out and not report any results.

 In this case, you should override the default run count by manually specifying a
 lower run count, e.g.:

 ```txt
 Multiply: foo_tests: 30
 ```

 ### No test case multipliers

 `Multiply` only supports multiplication of top-level suites (Fuchsia test
 packages and host test executables). All test cases within a multiplied test
 suite will be multiplied.

 There is no way to multiply a single test case within a test suite.