Status: Approved
Authors: payamm@google.com, rlb@google.com
Last Updated: 2020-01-30
Ensure that Zircon display drivers are robust, fast, and modifiable.
A robust driver is:
A fast driver is:
A modifiable driver:
The diagram below shows the overall structure of the components involved in drawing pixels on a given display. The components that are relevant to this document are colored in blue.
The display stack can be divided into two main components: Core Display and Device Driver. Core Display is the hardware-independent layer sitting between display clients and display drivers. The Device Driver is the component that communicates with the hardware to drive pixels onto a screen.
Each device driver can be further divided into “control” code that organizes independent hardware functions into useful features and “driver” code that programs those independent hardware functions.
Most software projects with good testing discipline organize tests into phases, with increasing cost and (typically) decreasing precision at each phase. All tests are a tradeoff in maintenance effort, accuracy (false-positive rates), and precision (size of code under test). Tests for drivers are no different.
That said, unit testing device drivers is notoriously hard because hardware has limited specifications and misbehaves in myriad ways, making reproducibility hard and simulation effectively impossible. A test that directly verifies that code is issuing the correct MMIO sequence neither covers a large fraction of the failure modes nor continues to work in the presence of small changes to the codebase.
To address these concerns:
Only “control” code should be exercised in unit tests.
In addition to the MMIO/register poking work of a driver, display and GPU drivers also contain a large amount of code for OS functions. They are responsible for managing power, video modes, and OS resources (e.g. zx::event signaling). They also handle firmware loading, CPU-side state tracking, etc.
This “control” code is the source of many bugs and can benefit from unit tests with the accompanying ASAN/TSAN coverage. Separating this code from hardware interaction improves code coverage, makes tests deterministic, and tightens the feedback loop on most bugs.
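A minimal sketch of this separation, assuming a hypothetical RegisterIo seam between the two kinds of code. None of the names below (RegisterIo, FakeRegisterIo, PowerController, kPowerReg) come from the Fuchsia tree; they only illustrate how control logic can be unit tested against a fake instead of real hardware.

```cpp
#include <cstdint>
#include <map>

// Hypothetical seam: "driver" code implements this against real MMIO,
// while unit tests substitute an in-memory fake.
class RegisterIo {
 public:
  virtual ~RegisterIo() = default;
  virtual void Write32(uint32_t offset, uint32_t value) = 0;
  virtual uint32_t Read32(uint32_t offset) = 0;
};

// In-memory fake used by unit tests; unwritten registers read as zero.
class FakeRegisterIo : public RegisterIo {
 public:
  void Write32(uint32_t offset, uint32_t value) override { regs_[offset] = value; }
  uint32_t Read32(uint32_t offset) override {
    auto it = regs_.find(offset);
    return it == regs_.end() ? 0u : it->second;
  }

 private:
  std::map<uint32_t, uint32_t> regs_;
};

// Hypothetical "control" code: tracks CPU-side power state and only
// touches the hardware when the state actually changes.
class PowerController {
 public:
  static constexpr uint32_t kPowerReg = 0x10;  // Illustrative offset.

  explicit PowerController(RegisterIo* io) : io_(io) {}

  // Returns false if the controller is already in the requested state.
  bool SetPower(bool on) {
    if (on == powered_) {
      return false;
    }
    io_->Write32(kPowerReg, on ? 1u : 0u);
    powered_ = on;
    return true;
  }

  bool powered() const { return powered_; }

 private:
  RegisterIo* io_;
  bool powered_ = false;
};
```

A unit test would construct a PowerController over a FakeRegisterIo and assert on the state transitions (for example, that a redundant SetPower call is rejected), rather than on the exact register sequence.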
We will follow the same strategy for the common display controller code in src/graphics/display/drivers/coordinator.
Sometimes it is not possible to separate these two types of code, e.g. when testing self-contained hardware functionality like TLB management. In those cases, a test fixture can reset the hardware between test cases. For now, this can be achieved by creating an in-driver test suite that runs after Bind but before MakeVisible.
“Driver” code will be exercised in conformance tests running on target hardware.
src/graphics/display/drivers/coordinator/test currently contains test fixtures and helper classes for exercising the display core and a driver. At the moment, only the fake-display driver can be used.
In order to reduce the scope of tests and improve their accuracy and precision, we will create a conformance test suite in the core display controller that verifies that the display-impl is working correctly. This allows us to test display-core separately with high confidence.
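One way to picture such a suite is a set of checks written once against the interface and run unchanged against every implementation. The sketch below assumes a heavily simplified, hypothetical DisplayImpl interface and two illustrative rules; the real display-impl protocol and its actual contract are not reproduced here.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a display configuration.
struct DisplayConfig {
  uint32_t width;
  uint32_t height;
};

// Hypothetical, simplified stand-in for the driver-facing interface.
class DisplayImpl {
 public:
  virtual ~DisplayImpl() = default;
  virtual bool CheckConfig(const DisplayConfig& config) = 0;
  virtual bool ApplyConfig(const DisplayConfig& config) = 0;
};

// Runs the same checks against any implementation and returns the names
// of the checks that failed. Both rules are illustrative assumptions.
std::vector<std::string> RunConformance(DisplayImpl& impl) {
  std::vector<std::string> failures;
  const DisplayConfig empty{0, 0};
  const DisplayConfig candidate{1280, 720};

  // Assumed rule 1: a zero-sized configuration is never accepted.
  if (impl.CheckConfig(empty)) {
    failures.push_back("zero-sized config passed CheckConfig");
  }
  // Assumed rule 2: a configuration that passes the check must apply.
  if (impl.CheckConfig(candidate) && !impl.ApplyConfig(candidate)) {
    failures.push_back("checked config failed to apply");
  }
  return failures;
}

// Minimal fake implementation that satisfies both rules.
class FakeDisplay : public DisplayImpl {
 public:
  bool CheckConfig(const DisplayConfig& config) override {
    return config.width > 0 && config.height > 0;
  }
  bool ApplyConfig(const DisplayConfig& config) override {
    return CheckConfig(config);
  }
};
```

Because the checks depend only on the interface, the same function could run against a fake in a host test and against a real driver on target hardware.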
Integration [...] tests will be used to ensure that drivers are not depending on undefined behavior.
Normally integration tests focus entirely on making sure that the component under test is using APIs correctly. A Fuchsia system is effectively a distributed system in a box, so integration tests have an additional function: FIDL and Banjo services can be used as points of fault injection.
Single-process integration tests are well-contained and thus offer a good starting point for aggregate performance tests. We can build benchmarks for common client workflows by profiling test execution and restricting samples to display driver code.
Stress tests are a form of integration test that is helpful for kernel-adjacent code with many complex interactions. Test accuracy must be weighed against test latency, but most tests fall short of the optimal tradeoff, so accuracy and latency can be improved at the same time by increasing the stresses per second: deliberately injecting faults, process crashes, and load can turn a 5-minute test into an accurate release qualifier.
Concretely, the core display controller and the various display-impl drivers will be subjected to integration tests that delay messages and pretend that other processes have died or produced invalid inputs. Once there are established patterns, a shared set of FlakyFoo classes will be created as testing fakes.
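As a rough illustration of what a FlakyFoo-style fake could look like, the sketch below wraps a message sink and deterministically drops and delays messages. FlakySink, its parameters, and the Message and Sink aliases are hypothetical names chosen for this example; real fakes would wrap FIDL or Banjo endpoints instead.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <utility>

using Message = std::string;
using Sink = std::function<void(const Message&)>;

// Hypothetical fault-injecting wrapper around a message sink.
// Faults are deterministic so that failures are reproducible.
class FlakySink {
 public:
  // drop_every_n: drop every Nth message sent (0 disables dropping).
  // delay_depth: number of messages held back before delivery begins.
  FlakySink(Sink downstream, size_t drop_every_n, size_t delay_depth)
      : downstream_(std::move(downstream)),
        drop_every_n_(drop_every_n),
        delay_depth_(delay_depth) {}

  void Send(const Message& message) {
    ++sent_;
    if (drop_every_n_ != 0 && sent_ % drop_every_n_ == 0) {
      ++dropped_;  // Simulate a lost message.
      return;
    }
    pending_.push_back(message);
    // Deliver only once more than delay_depth_ messages are queued,
    // which simulates a slow peer while preserving order.
    while (pending_.size() > delay_depth_) {
      downstream_(pending_.front());
      pending_.pop_front();
    }
  }

  // Delivers everything still held back, e.g. at the end of a test.
  void Flush() {
    while (!pending_.empty()) {
      downstream_(pending_.front());
      pending_.pop_front();
    }
  }

  size_t dropped() const { return dropped_; }

 private:
  Sink downstream_;
  size_t drop_every_n_;
  size_t delay_depth_;
  size_t sent_ = 0;
  size_t dropped_ = 0;
  std::deque<Message> pending_;
};
```

Keeping the fault schedule deterministic, rather than random, is a design choice made here so that a failing stress run can be replayed exactly.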
Resource leaks will be detected by introspecting the process during test shutdown.
End-to-end tests will verify that applications continue to work.
The tests described so far will provide high confidence, but there will be missed cases and imperfect tests. Here we list applications that are either directly involved in the Fuchsia UX or are simple enough to treat as test cases for the whole graphics stack.
Inspect:
The manual tests above rely on human judgment or actuation to validate the stack. The large variety of target devices means we cannot rely on OEM-style camera captures. For now, we will not have automated end-to-end tests.
In the future, tests can be automated by using Chamelium.
Fuzz tests will ensure that drivers are robust to misbehaving clients.
TBD. Once there are fuzz tests for sysmem, we can build upon them. For now, there are some integration tests verifying that the display layer doesn't crash in the face of naive client mistakes.
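Until that design exists, the following is only a sketch of the general shape such a test could take, using the standard libFuzzer entry point LLVMFuzzerTestOneInput. The LayerRequest layout, kMaxDimension, and HandleClientRequest are hypothetical stand-ins for real client-facing code.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical wire layout for a client request.
struct LayerRequest {
  uint32_t layer_id;
  uint32_t width;
  uint32_t height;
};

constexpr uint32_t kMaxDimension = 8192;  // Assumed limit, for illustration.

// Hypothetical request handler. Returns true when the request is
// accepted; malformed input must be rejected without crashing.
bool HandleClientRequest(const uint8_t* data, size_t size) {
  if (size < sizeof(LayerRequest)) {
    return false;  // Truncated request.
  }
  LayerRequest request;
  std::memcpy(&request, data, sizeof(request));  // Avoids unaligned reads.
  if (request.width == 0 || request.height == 0) {
    return false;
  }
  if (request.width > kMaxDimension || request.height > kMaxDimension) {
    return false;
  }
  return true;
}

// libFuzzer entry point: feed arbitrary bytes to the handler. The fuzzer
// reports crashes and sanitizer failures; the return value is always 0.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  HandleClientRequest(data, size);
  return 0;
}
```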