| # Testing best practices |
| |
| As you write tests for Fuchsia, you want to make sure that you are familiarized |
| with the [testing principles][testing-principles] and the |
| [testing scope][test-scope] for writing tests. |
| |
| ## Desirable properties of tests |
| |
| The properties below are generally good to have, provided that they don't |
| conflict with other goals of the test. |
| |
| - **Isolated**: tests should be isolated from code, systems, and details that |
| are outside the scope of the tests. Different test cases should be isolated |
| from each other. On Fuchsia we use the isolation guarantees given by the |
| Component Framework to isolate tests. A useful outcome of isolated tests is |
| that tests can be run in parallel or in a different order and their result is |
| the same. |
| - **Hermetic**: the result of a test is defined by the contents of the test. On |
| Fuchsia, a unit test or an integration test is hermetic if its result is |
| defined by the contents of the test’s package. If a test’s package hasn’t |
| changed then it can be assumed that its behavior hasn’t changed, and it’s |
| still passing or still failing. This property can be used to select which |
| tests to run in order to validate a given change. In the absence of |
| hermeticity guarantees, the next best alternative is to run all the tests, |
| which is costly. |
| - **Reproducible**: re-running the same test should produce the same result. |
| Isolation and hermeticity improve reproducibility. The larger the scope of the |
| test, the more difficult it is for the test to be reproducible. |
| - **Proximity to the code under test**: tests should focus on a particular unit |
| and test, and control for what’s not under test. |
| - **Resilient**: tests shouldn’t need to change when the code under test |
| changes, unless the change is important to the code’s purpose. Tests that |
| continue to work after benign changes to the code are more resilient. Tests |
| that exercise the code’s public APIs or other forms of contracts tend to be |
| more resilient. This also happens when you focus on testing behavior, not |
| implementation. |
| - **Easy to troubleshoot**: when tests fail, they should produce clear |
| actionable errors that identify the defect. Tests that isolate errors, or |
| otherwise have a smaller scope, are usually easier to troubleshoot when they |
| fail. |
| - **Fast**: tests are often part of the developer feedback loop. A faster |
| feedback loop is a better feedback loop that makes developers more productive. |
| - **Reliable**: test failure should indicate a real defect with the code under |
| test. Reliable tests give more confidence when they pass and produce fewer |
| false failures that are costly to maintain. |
| - **Flexible**: the fewer constraints there are on running tests, the easier it |
| is to run them. On Fuchsia we particularly appreciate if tests can run on |
| emulators, when possible. |
| |
| ## Undesirable properties of tests |
| |
| The properties below are generally bad to have. Depending on circumstances they |
| may be the downsides of a tradeoff that was made for the purpose of the test, |
| which as a whole is a net positive. |
| |
| - **Flaky**: tests that produce false failures, then pass when they’re retried, |
| are flaky. Flaky tests are costlier to maintain, slower to run due to retries, |
| and provide lower confidence results. |
| - **Slow**: tests that take longer to run create less efficient feedback loops |
| and make developers less productive. The bigger the scope of the test, the |
| slower it is usually to run. |
| - **Difficult to troubleshoot**: tests that fail with errors that are not |
| immediately actionable or otherwise don’t indicate the root cause of the |
| failure are more difficult to troubleshoot. Developers have to look elsewhere |
| other than the test failure itself, such as at system logs or internal system |
| state, to troubleshoot the test failure. |
| - **Change detectors**: tests that are coupled too closely with implementation |
| details that aren’t important for functionality will often fail when the code |
| under test changes in ways that are benign to the external observer. Change |
| detector tests are more costly to maintain. |
| |
| ## Test against interfaces and contracts |
| |
| <span class="compare-better">Recommended</span>: Test using public APIs and |
| other interfaces and contracts offered to the client of the code under test. |
| These tests are more resilient to benign changes. |
| |
| <span class="compare-worse">Not recommended</span>: Don’t test implementation |
| details that are not important to the client. Such tests often break when the |
| code under test changes in benign ways. |
| |
| Further reading: |
| |
| - [Change-Detector Tests Considered Harmful](https://testing.googleblog.com/2015/01/testing-on-toilet-change-detector-tests.html){:.external} |
| - [Prefer Testing Public APIs Over Implementation-Detail Classes](https://testing.googleblog.com/2015/01/testing-on-toilet-prefer-testing-public.html){:.external} |
| - [Test Behavior, Not Implementation](https://testing.googleblog.com/2013/08/testing-on-toilet-test-behavior-not.html){:.external} |
| - [Testing State vs. Testing Interactions](https://testing.googleblog.com/2013/03/testing-on-toilet-testing-state-vs.html){:.external} |
| |
| ## Write readable test code |
| |
| Consider readability as you write tests, the same as you do when you write the |
| code under test. |
| |
| - A test is **complete** if the body of the test contains all the information |
| you need to know in order to understand it. |
| - A test is **concise** if the test doesn’t contain any other distracting |
| information. |
| |
| <span class="compare-better">Recommended</span>: Write test cases that are |
| complete and concise. Prefer writing more test individual test cases, each with |
| a narrow focus on specific circumstances and concerns. |
| |
| <span class="compare-worse">Not recommended</span>: Don’t combine multiple |
| scenarios into fewer test cases in order to produce shorter tests with fewer |
| test cases. |
| |
| Further reading: |
| |
| - [What Makes a Good Test?](https://testing.googleblog.com/2014/03/testing-on-toilet-what-makes-good-test.html){:.external} |
| - [Keep Tests Focused](https://testing.googleblog.com/2018/06/testing-on-toilet-keep-tests-focused.html){:.external} |
| |
| ## Write reproducible, deterministic tests |
| |
| Tests should be deterministic, meaning every run of the test against the same |
| revision of code produces the same result. If not, the test may become costly |
| to maintain. |
| |
| Threaded or time-dependent code, random number generators (RNGs), and |
| cross-component communication are common sources of nondeterminism. |
| |
| <span class="compare-better">Recommended</span>: Use these tips to write |
| deterministic tests: |
| |
| - For time-dependent tests, use fake or mocked clocks to provide |
| determinism. See [`fuchsia_async::Executor::new_with_fake_time`] and |
| [fake-clock]. |
| - Threaded code must always use the proper synchronization primitives to |
| avoid flakes. Whenever possible, prefer single-threaded tests. |
| - Always provide a mechanism to inject seeds for RNGs and use them in |
| tests. |
| - Use mocks in component integration tests. See [Realm Builder][realm-builder]. |
| - When working with tests that are sensitive to flaky behavior, |
| consider running tests multiple times to ensure that they consistently pass. |
| You can use repeat flags, such as[`--gtest_repeat`][gtest_test_flags]{:.external} |
| in GoogleTest and [`--test.count`][go_test_flags]{:.external} in Go, to do this. |
| Aim for at least 100-1000 runs locally if your test is prone to flakes before |
| merging. |
| |
| <span class="compare-worse">Not recommended</span>: Never use `sleep` in tests |
| as a means of weak synchronization. You may use short sleeps when polling in a |
| loop, between loop iterations. |
| |
| ## Test doubles: stubs, mocks, fakes |
| |
| Test doubles stand in for a real dependency of the code under test during a |
| test. |
| |
| - A **stub** is a test double that returns a given value and contains no logic. |
| - A **mock** is a test double that has expectations about how it should be |
| called. Mocks are useful for testing interactions. |
| - A **fake** is a lightweight implementation of the real object. |
| |
| <span class="compare-better">Recommended</span>: Create fakes for code that you |
| own so that your clients can use that as a test double in their own tests. For |
| integration testing, consider making it possible to run an instance of your real |
| component in a test realm in isolation from the rest of the system, and document |
| this behavior. |
| |
| <span class="compare-worse">Not recommended</span>: Don’t overuse mocks in your |
| tests, as you might create lower-quality tests that are less readable and more |
| costly to maintain while providing less confidence when they pass. Avoid mocking |
| dependencies that you don’t own. |
| |
| Further reading: |
| |
| - [Know Your Test Doubles](https://testing.googleblog.com/2013/07/testing-on-toilet-know-your-test-doubles.html){:.external} |
| - [Don’t Overuse Mocks](https://testing.googleblog.com/2013/05/testing-on-toilet-dont-overuse-mocks.html){:.external} |
| - [Don’t Mock Types You Don’t Own](https://testing.googleblog.com/2020/07/testing-on-toilet-dont-mock-types-you.html){:.external} |
| |
| ## Use end-to-end tests appropriately |
| |
| <span class="compare-better">Recommended</span>: Use end-to-end tests to test |
| critical user journeys. Such tests should exercise the journey as a user, for |
| instance by automating user interactions and examining user interface state |
| changes. |
| |
| <span class="compare-worse">Not recommended</span>: Don’t use end-to-end tests |
| to cover for missing tests at other layers or smaller scopes, since when those |
| end-to-end tests catch errors they will be very difficult to troubleshoot. |
| |
| <span class="compare-better">Recommended</span>: Use end-to-end tests sparingly, |
| as part of a balanced testing strategy that leans more heavily on smaller-scoped |
| tests that run quickly and produce precise and actionable results. |
| |
| <span class="compare-worse">Not recommended</span>: Don’t rely on end-to-end |
| tests in your development feedback cycle, because they typically take a long |
| time to run and often produce more flaky results than smaller-scoped tests. |
| |
| Further reading: |
| |
| - [Testing UI Logic? Follow the User!](https://testing.googleblog.com/2020/10/testing-on-toilet-testing-ui-logic.html){:.external} |
| - [Just Say No to More End-to-End Tests](https://testing.googleblog.com/2015/04/just-say-no-to-more-end-to-end-tests.html){:.external} |
| - [Test Flakiness - One of the main challenges of automated testing](https://testing.googleblog.com/2020/12/test-flakiness-one-of-main-challenges.html){:.external} |
| - [Test Flakiness - One of the main challenges of automated testing (Part II)](https://testing.googleblog.com/2021/03/test-flakiness-one-of-main-challenges.html){:.external} |
| - [Avoiding Flakey Tests](https://testing.googleblog.com/2008/04/tott-avoiding-flakey-tests.html){:.external} |
| - [Where do our flaky tests come from?](https://testing.googleblog.com/2017/04/where-do-our-flaky-tests-come-from.html){:.external} |
| |
| [test-scope]: /docs/contribute/testing/scope.md |
| [testing-principles]: /docs/contribute/testing/principles.md |
| [realm-builder]: /docs/development/testing/components/realm_builder.md |
| [`fuchsia_async::Executor::new_with_fake_time`]: https://fuchsia.googlesource.com/fuchsia/+/a874276/src/lib/fuchsia-async/src/executor.rs#345 |
| [fake-clock]: https://fuchsia.googlesource.com/fuchsia/+/a874276/src/lib/fake-clock |
| [go_test_flags]: https://golang.org/cmd/go/#hdr-Testing_flags |
| [gtest_test_flags]: https://github.com/google/googletest/blob/main/docs/advanced.md#repeating-the-tests |