This document explains how to write a stress test using the Rust stress test library. The library can be found at //src/sys/lib/stress-test. It implements the test loop and the concurrency and synchronization primitives required for running these tests.
Define rustc_binary, fuchsia_component, and fuchsia_test_package GN build targets for your test:
rustc_binary("filesystem-stressor-bin") {
  deps = [
    ...
    "//src/sys/lib/stress-test",
    ...
  ]
  sources = [ ... ]
}

fuchsia_component("filesystem-stressor") {
  deps = [ ":filesystem-stressor-bin" ]
  manifest = "meta/filesystem-stressor.cmx"
  testonly = true
}

fuchsia_test_package("filesystem-stress-tests") {
  test_components = [ ":filesystem-stressor" ]
}
Every actor must implement the Actor trait. The trait has a single method, perform(), that is invoked by an ActorRunner. When invoked, the actor must perform exactly one operation and return its result to the runner. An actor must store all the connections necessary to perform its operations.
#[async_trait]
pub trait Actor: Sync + Send + 'static {
    // ActorRunner invokes this function, instructing the actor
    // to perform exactly one operation and return the result.
    async fn perform(&mut self) -> Result<(), ActorError>;
}
An actor can indicate the following with the return result:
Ok(()): The operation succeeded and is added to the global operation count.
Err(ActorError::DoNotCount): The operation must not be counted towards the global operation count.
Err(ActorError::ResetEnvironment): The environment must be reset and the operation must not be counted towards the global operation count.
When an actor encounters an unexpected error, it should panic, thus stopping the test.
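The result values above can be modeled with a small, std-only sketch. The ActorError enum and the tally function here are hypothetical stand-ins written for illustration; they are not the stress-test library's own types or its runner logic. The sketch only shows how a runner could turn each result into an operation count and a number of reset requests.

```rust
/// Hypothetical stand-in for the library's error type (illustration only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ActorError {
    DoNotCount,
    ResetEnvironment,
}

/// Hypothetical helper: returns (operations counted, resets requested).
pub fn tally(results: &[Result<(), ActorError>]) -> (u64, u64) {
    let mut counted = 0;
    let mut resets = 0;
    for result in results {
        match result {
            // Success is the only outcome that increments the count.
            Ok(()) => counted += 1,
            // Neither error variant is counted as an operation.
            Err(ActorError::DoNotCount) => {}
            Err(ActorError::ResetEnvironment) => resets += 1,
        }
    }
    (counted, resets)
}
```

Unexpected errors have no variant in this model because, as described above, the actor is expected to panic in that case.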
Since actors are operating on the same environment, it is possible that their operations will collide. For example, for a filesystem stress test, actors may operate on the same set of files. If such collisions are desirable, you must set up actors to handle them gracefully. If they are not, the actor should panic, causing the test to stop.
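As one way to handle a collision gracefully, the sketch below classifies the outcome of a file deletion. It assumes that a "not found" error means another actor removed the file first, and chooses to report that as DoNotCount; both the classify_delete helper and the local ActorError enum are hypothetical and exist only for this illustration.

```rust
use std::io;

/// Hypothetical stand-in for the library's error type (illustration only).
#[derive(Debug, PartialEq, Eq)]
pub enum ActorError {
    DoNotCount,
    ResetEnvironment,
}

/// Hypothetical helper: maps the outcome of a file deletion to an actor result.
pub fn classify_delete(outcome: io::Result<()>) -> Result<(), ActorError> {
    match outcome {
        Ok(()) => Ok(()),
        // Assumed collision: another actor already deleted the file.
        Err(e) if e.kind() == io::ErrorKind::NotFound => Err(ActorError::DoNotCount),
        // Anything else is unexpected, so stop the test.
        Err(e) => panic!("unexpected filesystem error: {}", e),
    }
}
```

Whether a collision should be skipped with DoNotCount or still counted as a successful operation is a design choice for each test.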
An actor can intentionally break the system-under-test, requiring the environment to be reset. For example, for a filesystem stress test, an actor can randomly sever the connection between the filesystem and the underlying block device. In this example, other actors should request a new environment with ActorError::ResetEnvironment, and the environment will re-establish connections for all of the actors.
Note: The mutable connections of the actor should be marked public, so that the environment can update them during reset.
pub struct FilesystemActor {
    /// Store a connection to the root of the filesystem here
    pub root_directory: Directory,
    ...
}

impl FilesystemActor {
    pub fn new(root_directory: Directory) -> Self {
        ...
    }
}

#[async_trait]
impl Actor for FilesystemActor {
    async fn perform(&mut self) -> Result<(), ActorError> {
        // Choose exactly one operation to do on the filesystem
        // using the root_directory
        self.root_directory.delete_all_files();

        // Report success so that the operation is counted
        Ok(())
    }
}
The environment provides the basic configuration for the stress test: the exit criteria, the actors, and a reset method.
#[async_trait]
pub trait Environment: Send + Sync + Debug {
    /// Returns the target number of operations to complete before exiting
    fn target_operations(&self) -> Option<u64>;

    /// Returns the number of seconds to wait before exiting
    fn timeout_seconds(&self) -> Option<u64>;

    /// Returns the runners for all the actors
    fn actor_runners(&mut self) -> Vec<ActorRunner>;

    /// Resets the environment when an actor requests it
    async fn reset(&mut self);
}
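The two exit criteria can be read as optional limits. The following std-only sketch shows one plausible way to evaluate them, assuming the test stops as soon as either limit is reached and that None means "no limit". The should_exit function is a hypothetical helper for illustration, not part of the library.

```rust
use std::time::Duration;

/// Hypothetical helper: decides whether the test loop should stop.
/// `None` means "no limit" for that criterion.
pub fn should_exit(
    target_operations: Option<u64>,
    timeout_seconds: Option<u64>,
    completed_operations: u64,
    elapsed: Duration,
) -> bool {
    // True only when an operation limit exists and has been reached.
    let ops_reached = target_operations.map_or(false, |target| completed_operations >= target);
    // True only when a time limit exists and has been reached.
    let timed_out = timeout_seconds.map_or(false, |limit| elapsed.as_secs() >= limit);
    ops_reached || timed_out
}
```

Under these assumptions, an environment that returns None for both criteria never satisfies should_exit, which matches a test that runs until it is stopped externally or panics.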
An environment can store additional configuration for the test. You can provide this configuration through the command line with the argh crate.
An actor is shared between a runner and the environment, so it must be wrapped as an Arc<Mutex<dyn Actor>>. Runners hold the lock while an actor is performing an operation. This means that the environment can only acquire an actor's lock between operations.
An environment is instructed to reset when an actor determines that the current instance of the system-under-test has been broken. The environment is expected to create a new instance of the system-under-test and lock the actors to update their connections to the new instance.
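The locking and reset flow described above can be sketched with the standard library alone. This model uses the blocking std::sync::Mutex as a substitute for an async mutex, and every type and function in it (SketchActor, SketchEnvironment, run_one) is hypothetical. It is meant to show the ordering of lock acquisition, not the library's implementation.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for the library's error type (illustration only).
#[derive(Debug, PartialEq, Eq)]
pub enum ActorError {
    ResetEnvironment,
}

/// Hypothetical actor whose "connection" is just an instance id.
pub struct SketchActor {
    /// Public so that the environment can replace it during a reset.
    pub connection: Option<u32>,
}

impl SketchActor {
    pub fn perform(&mut self) -> Result<(), ActorError> {
        match self.connection {
            Some(_) => Ok(()),
            // A missing connection models a broken system-under-test.
            None => Err(ActorError::ResetEnvironment),
        }
    }
}

/// Hypothetical environment that shares the actor with the runner.
pub struct SketchEnvironment {
    pub actor: Arc<Mutex<SketchActor>>,
    pub instance: u32,
}

impl SketchEnvironment {
    pub fn reset(&mut self) {
        // Model the creation of a new instance of the system-under-test.
        self.instance += 1;
        // Blocks until any in-flight operation releases the lock.
        let mut actor = self.actor.lock().unwrap();
        actor.connection = Some(self.instance);
        // The lock is released when `actor` goes out of scope.
    }
}

/// Hypothetical runner step: returns true if the operation was counted.
pub fn run_one(env: &mut SketchEnvironment) -> bool {
    // The lock guard is a temporary, so it is dropped at the end of this statement.
    let result = env.actor.lock().unwrap().perform();
    match result {
        Ok(()) => true,
        Err(ActorError::ResetEnvironment) => {
            env.reset();
            false
        }
    }
}
```

Because the runner releases the lock before the reset runs, the environment never tries to lock an actor that is still in the middle of an operation on the same thread.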
The environment must also implement the Debug trait. Stress tests log the environment when the test starts and if the test panics. It is common practice to print out parameters that are valuable for reproducing the test, such as the random seed used.
#[derive(Debug)]
pub struct FilesystemEnvironment {
    fs_actor: Arc<Mutex<FilesystemActor>>,
    seed: u64,
    ...
}

impl FilesystemEnvironment {
    pub fn new() -> Self {
        ...
    }
}

#[async_trait]
impl Environment for FilesystemEnvironment {
    fn target_operations(&self) -> Option<u64> {
        // By specifying None here, the test will run without an operation limit
        None
    }

    fn timeout_seconds(&self) -> Option<u64> {
        // By specifying None here, the test will run without a time limit
        None
    }

    fn actor_runners(&mut self) -> Vec<ActorRunner> {
        vec![
            ActorRunner::new(
                "filesystem_actor",    // debug name
                60,                    // delay (in seconds) between each operation (0 means no delay)
                self.fs_actor.clone(), // actor
            ),
        ]
    }

    async fn reset(&mut self) {
        // If the actor is performing an operation, this will remain
        // locked until the operation is complete.
        let mut actor = self.fs_actor.lock().await;

        // Now the environment can update the actor before it is run again.
        actor.root_directory = ...;

        // Releasing the lock will resume the runner.
    }
}
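To illustrate why logging the seed helps with reproduction, here is a std-only sketch. It assumes that all random choices in a test derive from one seed stored in the environment. SeededEnvironment, XorShift64, and plan_operations are hypothetical names, and the small xorshift generator stands in for whatever seeded RNG a real test would use.

```rust
/// Hypothetical environment that records the seed for reproduction.
#[derive(Debug)]
pub struct SeededEnvironment {
    pub seed: u64,
}

/// Tiny xorshift64 generator, a stand-in for a real seeded RNG.
pub struct XorShift64(u64);

impl XorShift64 {
    pub fn new(seed: u64) -> Self {
        // The xorshift state must be non-zero, so substitute a fixed constant for 0.
        XorShift64(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed })
    }

    pub fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Picks `count` operation indices in `0..num_operations`.
/// `num_operations` must be non-zero.
pub fn plan_operations(seed: u64, num_operations: u64, count: usize) -> Vec<u64> {
    let mut rng = XorShift64::new(seed);
    (0..count).map(|_| rng.next_u64() % num_operations).collect()
}
```

Running plan_operations twice with the seed taken from a logged environment yields the same sequence of choices, which is what makes a failing run repeatable in this model.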
The main function of a stress test is straightforward, since most of the logic is implemented in the Environment and Actors. Use the main function to collect command-line arguments (if any), initialize logging and set log severity.
Note: The stress test library offers a StdoutLogger that prints all logs to stdout. This functionality can be used by any stress test that runs as a legacy (cmx) component.
#[fuchsia_async::run_singlethreaded]
async fn main() {
    // Print all logs to stdout.
    stress_test::StdoutLogger::init();

    // Create the environment
    let env = FilesystemEnvironment::new();

    // Run the test.
    // Depending on the exit criteria, this may never return.
    stress_test::run_test(env).await;
}
Since a stress test is a part of a fuchsia_test_package, one of the easiest ways to run it is with the fx test command:
fx test filesystem-stress-tests
Note: The stress test runs with the command line arguments defined in the component's manifest.
To run the test with custom command line arguments, use fx shell run:
fx shell run fuchsia-pkg://fuchsia.com/filesystem-stress-tests#meta/filesystem-stressor.cmx <args>
A stress test is identified by infrastructure through the stress-tests tag that is attached to the fuchsia_test_package or fuchsia_unittest_package GN build target.
fuchsia_test_package("filesystem-stress-tests") {
  test_components = [ ":filesystem-stressor" ]
  test_specs = {
    environments = [
      {
        dimensions = {
          device_type = "QEMU"
        }
        tags = [ "stress-tests" ]
      },
    ]
  }
}
A dedicated core.qemu-x64-stress builder identifies these tests and runs each test component in the package for a maximum of 22 hours.
Note: On infra bots, a stress test is required to show “signs of life” which is usually some form of output to show that the test is still running and has not hung.
Note: Stress tests are currently restricted to the QEMU device type, since they run for long periods of time.
The framework uses the Rust log crate to log messages. The test logs the environment object at start and if the test panics.
--------------------- stressor is starting -----------------------
Environment {
    seed: 268479717856254664270968796173957499835,
    filesystem_actor: { ... }
    ...
}
------------------------------------------------------------------
If debug logging is enabled, individual actor operations and operation counts are also logged.
DEBUG: [0][filesystem_actor][389] Sleeping for 2 seconds
DEBUG: [0][filesystem_actor][389] Performing...
DEBUG: [0][filesystem_actor][389] Done!
DEBUG: Counters -> [total:403] {"filesystem_actor": 403}