| # Thread safe asynchronous code |
| |
| Writing correct asynchronous programs with multiple threads requires care in |
| C++. Here we describe a particular pattern that helps avoid errors, and which |
| will integrate well with the C++ FIDL bindings and component runtime. |
| |
| ## Background |
| |
| ### Asynchronous runtimes |
| |
| The [async][async-readme] library defines the *interface* |
| for initiating asynchronous operations on Fuchsia. It defines an opaque |
| `async_dispatcher_t` type, and associated functions. |
| |
| There are several *implementations* of this dispatcher interface. A popular one |
| is [`async_loop_t`][async-loop] and its C++ wrapper |
| [`async::Loop`][async-loop-cpp]. Libraries that performs asynchronous work |
| generally should not know what is the concrete implementation. Instead they |
| would call functions over the `async_dispatcher_t*` interface. |
| |
| ### Thread safety |
| |
| The reader should familiarize themselves with the terminology around |
| [thread safety][thread-safety] if needed. |
| |
| A program that upholds thread safety avoids data races: broadly, reading and |
| writing the same data without a defined ordering between those operations (see |
| precise definition of a [data race][data-race] in the C++ standard). These races |
| are a source of errors because they lead to undefined behavior at run-time. |
| |
| An individual C++ type also has categorizations around thread-safety. Referring |
| common practice interpretations from [abseil][abseil-thread-safety]: |
| |
| - A C++ object is *thread-safe* if concurrent usages does not cause data races. |
| - A C++ object is *thread-unsafe* if any concurrent usage may cause data races. |
| |
| One may wrap a thread-unsafe type with *synchronization primitives* e.g. mutexes |
| to make it thread-safe. This is called adding *external synchronization*. Doing |
| so adds overhead, and not all users will use that type concurrently. Hence it's |
| common for a library to be thread-unsafe by default, and require the user to add |
| synchronization if desired. Such types may have comments like the following: |
| |
| ```c++ |
| // This class is thread-unsafe. Methods require external synchronization. |
| class SomeUnsafeType { /* ... */ }; |
| ``` |
| |
| ## Achieving thread safety in asynchronous code |
| |
| Achieving thread safety gets more subtle in asynchronous code due to the |
| presence of callbacks. Consider the following snippet: |
| |
| ```c++ |
| // |CsvParser| asynchronously reads from a file, and parses the contents as |
| // comma separated values. |
| class CsvParser{ |
| public: |
| void Load() { |
| reader_.AsyncRead([this] (std::string data) { |
| values_ = Parse(data); |
| }); |
| } |
| |
| std::vector<std::string> Parse(const std::string& data); |
| |
| private: |
| FileReader reader_; |
| std::vector<std::string> values_; |
| }; |
| ``` |
| |
| `AsyncRead` will complete the work in the background, then call the lambda |
| specified as the callback function when the work completes. Because the lambda |
| captures `this`, it is commonly referred to as an "upcall": the `reader_` that |
| is owned by an instance of `CsvParser` makes a call to the owner. |
| |
| Let's consider how to avoid races between this callback and the destruction of |
| `CsvParser`. Adding a mutex in `CsvParser` won't help, because the mutex would |
| be destroyed if `CsvParser` is destroyed. One may require that `CsvParser` must |
| always be reference counted, but that results in an opinionated API and tends to |
| recursively cause everything referenced by `CsvParser` to also be reference |
| counted. |
| |
| If we ensure that there is always a defined ordering between the destruction of |
| `CsvParser` and the invocation of the callback, then the race condition is |
| avoided. On Fuchsia, the callback is typically scheduled on an |
| `async_dispatcher_t` object. A common pattern is to use a single threaded |
| dispatcher: |
| |
| - Use an `async::Loop` as the dispatcher implementation. |
| - Only run one thread to service the loop. |
| - Only destroy upcall targets on that thread. For example, destroy the |
| `CsvParser` within a task posted to that dispatcher. |
| |
| Since the same thread invokes asynchronous callbacks and destroys the instance, |
| there must be a defined ordering between those operations. |
| |
| This scenario is common across Fuchsia C++ because FIDL server components are |
| strongly encouraged to be concurrent and asynchronous. See |
| [C++ FIDL threading guide][cpp-threading-guide] for a concrete discussion of |
| this scenario when using FIDL bindings. |
| |
| ### Sequences |
| |
| More generally, if a dispatcher promises that tasks posted on that dispatcher |
| always run with a defined ordering, it is safe to destroy upcall targets on a |
| dispatcher task, and synchronization with upcalls is guaranteed. Such |
| dispatchers are said to support *sequences*: sequential execution domains which |
| runs a series of tasks with strict mutual exclusion, but where the underlying |
| execution may hop from one thread to another. |
| |
| Synchronized driver dispatchers ([`fdf::Dispatcher`][fdf-dispatcher] created |
| with the `FDF_DISPATCHER_OPTION_SYNCHRONIZED` option) are an example of sequence |
| supporting dispatchers. |
| |
| When using dispatchers supporting sequences, a common pattern for ensuring |
| thread safety is to use the object from a single sequence. |
| |
| ### Enforce with runtime checks {#mutual-exclusion-guarantee} |
| |
| We provide libraries for enforcing the above common patterns at runtime. In |
| particular, thread-unsafe types may check that an instance is always used from |
| an asynchronous dispatcher that ensures ordering between operations. Here "used |
| from" covers construction, destruction, and calling instance methods. We call |
| this *mutual exclusion guarantee*. Specifically: |
| |
| - If the `async_dispatcher_t` supports [*sequences*](#sequences), then code |
| running on tasks posted to that dispatcher are ordered with one another. |
| - If the `async_dispatcher_t` does not support sequences, then code running on |
| tasks posted to that dispatcher are ordered if that dispatcher is only |
| serviced by a single thread, for example, a single-threaded `async::Loop`. |
| |
| In short, either the dispatcher supports sequences in which case the object must |
| be used on that sequence, or the code runs on a single dispatcher thread and the |
| object must be used on that thread. |
| |
| The async library offers a BasicLockable type, |
| [`async::synchronization_checker`](/zircon/system/ulib/async/include/lib/async/cpp/sequence_checker.h). |
| You may call `.lock()` or lock the checker using a `std::lock_guard` whenever a |
| function requires mutual exclusion. Doing so checks that the function is called |
| from a dispatcher with such a guarantee, without actually taking any locks. Here |
| is a full example: |
| |
| ```cpp |
| {% includecode gerrit_repo="fuchsia/fuchsia" gerrit_path="examples/cpp/synchronization_checker/main.cc" region_tag="synchronization_checker" adjust_indentation="auto" exclude_regexp="^TEST|^}" %} |
| ``` |
| |
| `fidl::Client` is another example of types that check for mutual exclusion |
| guarantee at runtime: destroying a `fidl::Client` on a non-dispatcher thread |
| will lead to a panic. |
| |
| ### Discard callbacks during destruction |
| |
| You may have noticed that for the `ChannelReader` example above to work, the |
| callback passed to `wait_.Begin(...)` must be silently discarded, instead of |
| called with some error, if `ChannelReader` is destroyed. Indeed the |
| [documentation][async-wait] on `async::WaitOnce` mentions that it "automatically |
| cancels the wait when it goes out of scope". |
| |
| During destruction, some C++ objects would discard the registered callbacks if |
| those have yet to be called. These kind of APIs are said to guarantee *at most |
| once delivery*. `async::Wait` and `async::Task` are examples of such objects. |
| This style works well when the callback references a single receiver that owns |
| the wait/task, i.e. the callback is an upcall. These APIs are typically also |
| thread-unsafe and requires the aforementioned mutual exclusion guarantee. |
| |
| Other objects will always call the the registered callback exactly once, even |
| during destruction. Those calls would typically provide an error or status |
| indicating cancellation. They are said to guarantee *exactly once delivery*. |
| |
| One should consult the corresponding documentation when using an |
| asynchronous API to understand the cancellation semantics. |
| |
| It is possible to convert an *exactly once* API into an *at most once* API by |
| discarding the upcall if the object making the upcalls is already destroyed. |
| [`closure-queue`][closure-queue] is a library that implements this idea; |
| destroying a `ClosureQueue` will discard unexecuted callbacks scheduled on that |
| queue. |
| |
| ### Use an object with different mutual exclusion requirements |
| |
| To maintain the mutual exclusion guarantees, one may manage and use a group of |
| objects on the same sequence (if supported) or single threaded dispatcher. Those |
| objects can synchronously call into one another without breaking the mutual |
| exclusion runtime checks. A special case of this is an application that runs |
| everything on a single `async::Loop` with a single thread, typically called the |
| main thread. |
| |
| More complex applications may have multiple sequences or multiple single |
| threaded dispatchers. When individual objects must be used from their |
| corresponding sequence or single threaded dispatcher, a question arises: how |
| does one object call another object if they are associated with different |
| dispatchers? |
| |
| A time-tested approach is to have the objects send messages between one another, |
| as opposed to synchronously calling their instance methods. Concretely, this |
| could mean that if object `A` needs to do something to object `B`, `A` would |
| post an asynchronous task to `B`'s dispatcher using `async::PostTask`. The task |
| (usually a lambda function) may then synchronously use `B` because it is already |
| running under `B`'s mutual exclusion guarantee. |
| |
| When tasks are posted to a different dispatcher, it's harder to safely discard |
| them when the receiver object goes out of scope. Here are some approaches: |
| |
| - One may shutdown the dispatcher before destroying the object, if that |
| dispatcher serves exactly that object. For example, `B` may own an |
| `async::Loop` as the last member field. When `B` destructs, the `async::Loop` |
| would be destroyed, which silently discards any unexecuted tasks posted to |
| `B`. |
| - One may reference count the objects, and pass a weak pointer to the posted |
| task. The posted task should do nothing if the pointer is expired. |
| |
| Golang is a popular [example][golang] that baked this principle into their |
| language design. |
| |
| ## Prior arts |
| |
| Lightweight mechanisms of ensuring a set of tasks execute one after the other, |
| without necessarily starting operating system threads, is a recurring theme: |
| |
| - The Chromium project defines a similar sequence concept: [Threading and |
| tasks in Chrome][chrome]. |
| - The Java Platform added [virtual threads][java]. |
| |
| [async-readme]: /zircon/system/ulib/async/README.md |
| [async-loop]: /zircon/system/ulib/async-loop/include/lib/async-loop/loop.h |
| [async-loop-cpp]: /zircon/system/ulib/async-loop/include/lib/async-loop/cpp/loop.h |
| [async-wait]: /zircon/system/ulib/async/include/lib/async/cpp/wait.h |
| [fdf-dispatcher]: /sdk/lib/driver/runtime/include/lib/fdf/cpp/dispatcher.h |
| [thread-safety]: https://en.wikipedia.org/wiki/Thread_safety |
| [data-race]: http://eel.is/c++draft/intro.races#21 |
| [abseil-thread-safety]: https://abseil.io/blog/20180531-regular-types#data-races-and-thread-safety-properties |
| [cpp-threading-guide]: /docs/development/languages/fidl/tutorials/cpp/topics/threading.md |
| [closure-queue]: /zircon/system/ulib/closure-queue/include/lib/closure-queue/closure_queue.h |
| [chrome]: https://chromium.googlesource.com/chromium/src/+/master/docs/threading_and_tasks.md |
| [java]: https://openjdk.org/jeps/425 |
| [golang]: https://go.dev/blog/codelab-share |