| # Rust stress test library |
| |
| This document explains how to write a [stress test](stress_tests.md) using the Rust stress test |
| library. The library can be found at `//src/sys/lib/stress-test`. It implements the test loop and |
| the concurrency and synchronization primitives required for running these tests. |
| |
| ### Writing a stress test |
| |
| #### Define the GN build targets |
| |
| Define `rustc_binary`, `fuchsia_component` and `fuchsia_test_package` GN build targets |
| for your test: |
| |
| ``` |
| rustc_binary("filesystem-stressor-bin") { |
|   deps = [ |
|     ... |
|     "//src/sys/lib/stress-test", |
|     ... |
|   ] |
|   sources = [ |
|     ... |
|   ] |
| } |
| |
| fuchsia_component("filesystem-stressor") { |
|   deps = [ ":filesystem-stressor-bin" ] |
|   manifest = "meta/filesystem-stressor.cml" |
|   testonly = true |
| } |
| |
| fuchsia_test_package("filesystem-stress-tests") { |
|   test_components = [ |
|     ":filesystem-stressor" |
|   ] |
| } |
| ``` |
| |
| #### Write an actor |
| |
| Every actor must implement the `Actor` trait. The `Actor` trait has a single method, `perform()`, |
| which is invoked by an `ActorRunner`. When invoked, the actor must perform exactly one operation |
| and return its result to the runner. An actor must store all the connections necessary to perform |
| its operations. |
| |
| ```rust |
| #[async_trait] |
| pub trait Actor: Sync + Send + 'static { |
|     // ActorRunner invokes this function, instructing the actor |
|     // to perform exactly one operation and return the result. |
|     async fn perform(&mut self) -> Result<(), ActorError>; |
| } |
| ``` |
| |
| An actor can indicate the following with the return result: |
| |
| * `Ok(())`: Operation succeeded and is added to the global operation count. |
| |
| * `Err(ActorError::DoNotCount)`: The operation must not be counted towards the global operation |
| count. |
| |
| * `Err(ActorError::ResetEnvironment)`: The environment must be reset and the operation must not be |
| counted towards the global operation count. |
| |
| When an actor encounters an unexpected error, it should panic, thus stopping the test. |
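
The way a runner might interpret these return values can be sketched as follows. This is a simplified, synchronous model that uses only the standard library. The local `ActorError` enum, the `Tally` struct and the `tally` function are hypothetical stand-ins for illustration and are not the library's actual implementation.

```rust
/// Local stand-in for the library's `ActorError` (illustration only).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ActorError {
    DoNotCount,
    ResetEnvironment,
}

/// Hypothetical summary of what a runner observed.
#[derive(Debug, Default, PartialEq)]
pub struct Tally {
    /// Operations counted towards the global operation count.
    pub counted: u64,
    /// Number of environment resets that were requested.
    pub resets: u64,
}

/// Folds a sequence of `perform()` results into a tally.
pub fn tally(results: &[Result<(), ActorError>]) -> Tally {
    let mut t = Tally::default();
    for result in results {
        match result {
            // Success: the operation is counted.
            Ok(()) => t.counted += 1,
            // Not counted, and nothing else happens.
            Err(ActorError::DoNotCount) => {}
            // Not counted, and the environment must be reset.
            Err(ActorError::ResetEnvironment) => t.resets += 1,
        }
    }
    t
}

fn main() {
    let results = [
        Ok(()),
        Err(ActorError::DoNotCount),
        Ok(()),
        Err(ActorError::ResetEnvironment),
    ];
    println!("{:?}", tally(&results));
}
```

Only the `Ok(())` results contribute to the count in this model. A panic is deliberately absent from the enum, because an unexpected error is meant to stop the test rather than be reported to the runner.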
| |
| Since actors are operating on the same environment, it is possible that their operations will |
| collide. For example, for a filesystem stress test, actors may operate on the same set of files. |
| If such collisions are desirable, you must set up actors to handle such collisions gracefully. |
| If not, the actor should panic, causing the test to stop. |
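
As a rough illustration of graceful collision handling, the sketch below deletes a file from the host filesystem with `std::fs`. It assumes that a "file not found" error means another actor got there first, so it reports `DoNotCount`; any other failure is treated as unexpected and panics. The `ActorError` enum is a local stand-in and `delete_file` is a hypothetical helper, neither is taken from the library.

```rust
use std::fs;
use std::io::ErrorKind;
use std::path::Path;

/// Local stand-in for the library's `ActorError` (illustration only).
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
pub enum ActorError {
    DoNotCount,
    ResetEnvironment,
}

/// Deletes one file. If the file is already gone (for example because another
/// actor removed it), the collision is tolerated and the operation is not
/// counted. Any other failure is unexpected, so this panics to stop the test.
pub fn delete_file(path: &Path) -> Result<(), ActorError> {
    match fs::remove_file(path) {
        Ok(()) => Ok(()),
        Err(e) if e.kind() == ErrorKind::NotFound => Err(ActorError::DoNotCount),
        Err(e) => panic!("unexpected error deleting {}: {}", path.display(), e),
    }
}

fn main() {
    let name = format!("stress-collision-{}.tmp", std::process::id());
    let path = std::env::temp_dir().join(name);
    fs::write(&path, b"data").expect("failed to create scratch file");

    // The first delete finds the file; the second simulates a collision.
    println!("{:?}", delete_file(&path));
    println!("{:?}", delete_file(&path));
}
```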
| |
| An actor can intentionally break the system-under-test, requiring the environment to be reset. For |
| example, for a filesystem stress test, an actor can randomly sever the connection between the |
| filesystem and the underlying block device. In this example, other actors should request a new |
| environment with `ActorError::ResetEnvironment`, and the environment will re-establish connections |
| for all of the actors. |
| |
| Note: The mutable connections of the actor should be marked public, so that the environment can |
| update them during reset. |
| |
| ```rust |
| pub struct FilesystemActor { |
|     /// Store a connection to the root of the filesystem here |
|     pub root_directory: Directory, |
|     ... |
| } |
| |
| impl FilesystemActor { |
|     pub fn new(root_directory: Directory) -> Self { |
|         ... |
|     } |
| } |
| |
| #[async_trait] |
| impl Actor for FilesystemActor { |
|     async fn perform(&mut self) -> Result<(), ActorError> { |
|         // Choose exactly one operation to do on the filesystem |
|         // using the root_directory |
|         self.root_directory.delete_all_files(); |
| |
|         // Report success so that the operation is counted |
|         Ok(()) |
|     } |
| } |
| ``` |
| |
| #### Write an Environment |
| |
| The environment provides the basic configuration for the stress test: the exit criteria, |
| the actors and a reset method. |
| |
| ```rust |
| #[async_trait] |
| pub trait Environment: Send + Sync + Debug { |
|     /// Returns the target number of operations to complete before exiting |
|     fn target_operations(&self) -> Option<u64>; |
| |
|     /// Returns the number of seconds to wait before exiting |
|     fn timeout_seconds(&self) -> Option<u64>; |
| |
|     /// Returns the runners for all the actors |
|     async fn actor_runners(&mut self) -> Vec<ActorRunner>; |
| |
|     /// Resets the environment when an actor requests it |
|     async fn reset(&mut self); |
| } |
| ``` |
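
One way to read the two exit criteria is sketched below. `should_exit` is a hypothetical helper, not part of the library. It assumes that the test stops as soon as either limit is reached and keeps running when both are `None`; the library's actual loop may evaluate these values differently.

```rust
use std::time::Duration;

/// Hypothetical helper: returns true once either configured limit is reached.
/// A limit of `None` is treated as "no limit".
pub fn should_exit(
    completed_operations: u64,
    elapsed: Duration,
    target_operations: Option<u64>,
    timeout_seconds: Option<u64>,
) -> bool {
    let hit_target =
        target_operations.map_or(false, |target| completed_operations >= target);
    let timed_out =
        timeout_seconds.map_or(false, |secs| elapsed >= Duration::from_secs(secs));
    hit_target || timed_out
}

fn main() {
    let elapsed = Duration::from_secs(30);

    // No limits configured: the test keeps running.
    println!("{}", should_exit(1_000, elapsed, None, None));
    // The operation target has been reached.
    println!("{}", should_exit(1_000, elapsed, Some(1_000), None));
    // The timeout has not been reached yet.
    println!("{}", should_exit(10, elapsed, None, Some(60)));
}
```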
| |
| An environment can store additional configuration for the test. You can provide this configuration |
| through the command line with the `argh` crate. |
| |
| An actor is shared between a runner and the environment and hence it must be wrapped as an |
| `Arc<Mutex<dyn Actor>>`. Runners hold the lock while an actor is performing an operation. |
| This means that the environment can only acquire an actor's lock between operations. |
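
The locking discipline described above can be modelled with the standard library alone. The sketch below uses `std::sync::Mutex` and a thread in place of the asynchronous mutex and runner that the library uses, and `CounterActor`, `run_operations` and `replace_connection` are hypothetical names chosen for illustration.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical actor holding a "connection" that the environment may replace.
pub struct CounterActor {
    pub connection_id: u32,
    pub operations: u64,
}

impl CounterActor {
    /// Performs exactly one operation.
    pub fn perform(&mut self) {
        self.operations += 1;
    }
}

/// Runner side: holds the lock for the whole duration of each operation.
pub fn run_operations(actor: Arc<Mutex<CounterActor>>, count: u64) {
    for _ in 0..count {
        let mut guard = actor.lock().unwrap();
        guard.perform();
        // The guard is dropped at the end of each iteration, which releases
        // the lock between operations.
    }
}

/// Environment side: the lock can only be acquired between operations.
pub fn replace_connection(actor: &Arc<Mutex<CounterActor>>, new_id: u32) {
    let mut guard = actor.lock().unwrap();
    guard.connection_id = new_id;
}

fn main() {
    let actor = Arc::new(Mutex::new(CounterActor { connection_id: 1, operations: 0 }));

    let runner = thread::spawn({
        let actor = actor.clone();
        move || run_operations(actor, 1_000)
    });

    // Runs concurrently with the runner, but never in the middle of an operation.
    replace_connection(&actor, 2);
    runner.join().unwrap();

    let guard = actor.lock().unwrap();
    println!("connection {} after {} operations", guard.connection_id, guard.operations);
}
```

Because both sides go through the same mutex, the connection update can never interleave with a half-finished operation, regardless of when the environment decides to reset.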
| |
| An environment is instructed to reset when an actor determines that the current instance of the |
| system-under-test has been broken. The environment is expected to create a new instance for the |
| system-under-test and lock the actors to update their connections to the new instance. |
| |
| The environment must also implement the `Debug` trait. Stress tests log the environment |
| when the test starts and if the test panics. It is common practice to print out parameters that are |
| valuable for reproducing the test, such as the random seed used. |
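
To illustrate why logging the seed helps, the sketch below derives `Debug` on a hypothetical `SeededEnvironment` and drives a small xorshift generator from the seed. The generator is only there to keep the example free of external crates; a real test would more likely use a seeded RNG from a crate. All names in this block are assumptions made for illustration.

```rust
/// Hypothetical environment that records the seed it was created with.
#[derive(Debug)]
pub struct SeededEnvironment {
    pub seed: u64,
    pub target_operations: Option<u64>,
}

/// Tiny xorshift64 generator, used only to avoid external dependencies.
pub struct XorShift64(u64);

impl XorShift64 {
    pub fn new(seed: u64) -> Self {
        // Xorshift must not start from zero, so map 0 to a fixed non-zero value.
        XorShift64(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed })
    }

    pub fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Derives a sequence of operation choices (each in 0..4) from the seed.
pub fn operation_choices(seed: u64, count: usize) -> Vec<u64> {
    let mut rng = XorShift64::new(seed);
    (0..count).map(|_| rng.next_u64() % 4).collect()
}

fn main() {
    let env = SeededEnvironment { seed: 42, target_operations: None };

    // The Debug output includes the seed, so it ends up in the test log.
    println!("{:#?}", env);

    // Re-running with the logged seed replays the same sequence of choices.
    assert_eq!(operation_choices(env.seed, 8), operation_choices(42, 8));
}
```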
| |
| ```rust |
| #[derive(Debug)] |
| pub struct FilesystemEnvironment { |
|     fs_actor: Arc<Mutex<FilesystemActor>>, |
|     seed: u64, |
|     ... |
| } |
| |
| impl FilesystemEnvironment { |
|     pub fn new() -> Self { |
|         ... |
|     } |
| } |
| |
| #[async_trait] |
| impl Environment for FilesystemEnvironment { |
|     fn target_operations(&self) -> Option<u64> { |
|         // By specifying None here, the test will run without an operation limit |
|         None |
|     } |
| |
|     fn timeout_seconds(&self) -> Option<u64> { |
|         // By specifying None here, the test will run without a time limit |
|         None |
|     } |
| |
|     async fn actor_runners(&mut self) -> Vec<ActorRunner> { |
|         vec![ |
|             ActorRunner::new( |
|                 "filesystem_actor",    // debug name |
|                 60,                    // delay (in seconds) between each operation (0 means no delay) |
|                 self.fs_actor.clone(), // actor |
|             ), |
|         ] |
|     } |
| |
|     async fn reset(&mut self) { |
|         // If the actor is performing an operation, this will remain |
|         // locked until the operation is complete. |
|         let mut actor = self.fs_actor.lock().await; |
| |
|         // Now the environment can update the actor before it is run again. |
|         actor.root_directory = ...; |
| |
|         // Releasing the lock will resume the runner. |
|     } |
| } |
| ``` |
| |
| #### Write the main function |
| |
| The main function of a stress test is straightforward, since most of the logic is |
| implemented in the Environment and Actors. Use the main function to collect command-line |
| arguments (if any), initialize logging and set log severity. |
| |
| ```rust |
| #[fuchsia::main] |
| async fn main() { |
|     // Create the environment |
|     let env = FilesystemEnvironment::new(); |
| |
|     // Run the test. |
|     // Depending on the exit criteria, this may never return. |
|     stress_test::run_test(env).await; |
| } |
| ``` |
| |
| #### Running stress tests locally |
| |
| Since a stress test is a part of a `fuchsia_test_package`, one of the easiest ways to run it |
| is with the `fx test` command: |
| |
| ``` |
| fx test filesystem-stress-tests |
| ``` |
| |
| Note: The stress test runs with the command line arguments defined in the component's manifest. |
| |
| To run the test with custom command line arguments, use `fx shell run`: |
| |
| ``` |
| fx shell run fuchsia-pkg://fuchsia.com/filesystem-stress-tests#meta/filesystem-stressor.cm <args> |
| ``` |
| |
| #### Running stress tests on infrastructure |
| |
| A stress test is identified by infrastructure through the `stress-tests` tag that is attached to |
| the `fuchsia_test_package` or `fuchsia_unittest_package` GN build target. |
| |
| ``` |
| fuchsia_test_package("filesystem-stress-tests") { |
|   test_components = [ |
|     ":filesystem-stressor" |
|   ] |
|   test_specs = { |
|     environments = [ |
|       { |
|         dimensions = { |
|           device_type = "QEMU" |
|         } |
|         tags = [ "stress-tests" ] |
|       }, |
|     ] |
|   } |
| } |
| ``` |
| |
| A dedicated `core.x64-stress` builder identifies these tests and runs each test component in the |
| package for a maximum of 22 hours. |
| |
| Note: On infra bots, a stress test is required to show "signs of life", which usually means |
| producing some form of output to show that the test is still running and has not hung. |
| |
| Note: Stress tests are currently restricted to the `QEMU` device type, since they run for long |
| periods of time. |
| |
| |
| ## Debugging a stress test |
| |
| The framework uses the Rust `log` crate to log messages. The test logs the environment object |
| when the test starts and again if the test panics. |
| |
| ``` |
| --------------------- stressor is starting ----------------------- |
| Environment { |
| seed: 268479717856254664270968796173957499835, |
| filesystem_actor: { ... } |
| ... |
| } |
| ------------------------------------------------------------------ |
| ``` |
| |
| If debug logging is enabled, individual actor operations and operation counts are also logged. |
| |
| ``` |
| DEBUG: [0][filesystem_actor][389] Sleeping for 2 seconds |
| DEBUG: [0][filesystem_actor][389] Performing... |
| DEBUG: [0][filesystem_actor][389] Done! |
| DEBUG: Counters -> [total:403] {"filesystem_actor": 403} |
| ``` |