# Thread safe asynchronous code
Writing correct asynchronous programs with multiple threads requires care in
C++. Here we describe a particular pattern that helps avoid errors, and which
will integrate well with the C++ FIDL bindings and component runtime.
## Background
### Asynchronous runtimes
The [async][async-readme] library defines the *interface*
for initiating asynchronous operations on Fuchsia. It defines an opaque
`async_dispatcher_t` type, and associated functions.
There are several *implementations* of this dispatcher interface. A popular one
is [`async_loop_t`][async-loop] and its C++ wrapper
[`async::Loop`][async-loop-cpp]. Libraries that perform asynchronous work
generally should not know what the concrete implementation is. Instead, they
call functions over the `async_dispatcher_t*` interface.
### Thread safety
The reader should familiarize themselves with the terminology around
[thread safety][thread-safety] if needed. See [CppCon 2018: Geoffrey Romer
“What do you mean "thread-safe"?”] for a definition of thread safety that is
endorsed by Google's C++ team.
A program that upholds thread safety avoids data races: broadly, reading and
writing the same data without a defined ordering between those operations (see
the precise definition of a [data race][data-race] in the C++ standard). These races
are a source of errors because they lead to undefined behavior at run-time.
An individual C++ type also has categorizations around thread-safety. Referring
to common practice interpretations from [abseil][abseil-thread-safety]:
- A C++ object is *thread-safe* if concurrent usage does not cause data races.
- A C++ object is *thread-unsafe* if any concurrent usage may cause data races.
One may wrap a thread-unsafe type with *synchronization primitives*, such as
mutexes, to make it thread-safe. This is called adding *external
synchronization*. Doing
so adds overhead, and not all users will use that type concurrently. Hence it's
common for a library to be thread-unsafe by default, and require the user to add
synchronization if desired. Such types may have comments like the following:
```c++
// This class is thread-unsafe. Methods require external synchronization.
class SomeUnsafeType { /* ... */ };
```
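As an illustration of external synchronization, here is a minimal sketch in
standard C++. `UnsafeLog`, `SynchronizedLog`, and `AppendConcurrently` are
hypothetical names invented for this example and are not part of any Fuchsia
library. The wrapper takes a mutex around every access to the thread-unsafe
object, so concurrent callers are ordered with respect to one another.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical thread-unsafe type. Methods require external synchronization.
class UnsafeLog {
 public:
  void Append(std::string line) { lines_.push_back(std::move(line)); }
  std::size_t size() const { return lines_.size(); }

 private:
  std::vector<std::string> lines_;
};

// Hypothetical wrapper that adds external synchronization: every access to
// the wrapped object happens while holding |mutex_|.
class SynchronizedLog {
 public:
  void Append(std::string line) {
    std::lock_guard<std::mutex> guard(mutex_);
    log_.Append(std::move(line));
  }

  std::size_t size() const {
    std::lock_guard<std::mutex> guard(mutex_);
    return log_.size();
  }

 private:
  mutable std::mutex mutex_;
  UnsafeLog log_;
};

// Appends from |num_threads| threads concurrently and returns the final size.
std::size_t AppendConcurrently(std::size_t num_threads, std::size_t per_thread) {
  SynchronizedLog log;
  std::vector<std::thread> threads;
  for (std::size_t t = 0; t < num_threads; ++t) {
    threads.emplace_back([&log, per_thread] {
      for (std::size_t i = 0; i < per_thread; ++i) {
        log.Append("line");
      }
    });
  }
  for (std::thread& thread : threads) {
    thread.join();
  }
  return log.size();
}
```

Every caller pays for the lock, including callers that only ever use the log
from one thread. That cost is why the unsafe type is left as the default.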
## Thread safety in asynchronous code
Achieving thread safety gets more subtle in asynchronous code due to the
presence of callbacks. Consider the following snippet:
```c++
// |CsvParser| asynchronously reads from a file, and parses the contents as
// comma separated values.
class CsvParser {
 public:
  void Load() {
    reader_.AsyncRead([this] (std::string data) {
      values_ = Parse(data);
    });
  }
  std::vector<std::string> Parse(const std::string& data);

 private:
  FileReader reader_;
  std::vector<std::string> values_;
};
```
`AsyncRead` will complete the work in the background, then call the lambda
specified as the callback function when the work completes. Because the lambda
captures `this`, it is commonly referred to as an "upcall": the `reader_` that
is owned by an instance of `CsvParser` makes a call to the owner.
Let's consider how to avoid races between this callback and the destruction of
`CsvParser`. Adding a mutex in `CsvParser` won't help, because the mutex would
be destroyed if `CsvParser` is destroyed. One may require that `CsvParser` must
always be reference counted, but that results in an opinionated API and tends to
recursively cause everything referenced by `CsvParser` to also be reference
counted.
If we ensure that there is always a defined ordering between the destruction of
`CsvParser` and the invocation of the callback, then the race condition is
avoided. On Fuchsia, the callback is typically scheduled on an
`async_dispatcher_t` object (termed *dispatcher* in short). A common pattern is
to use a single threaded dispatcher:
- Use an `async::Loop` as the dispatcher implementation.
- Only run one thread to service the loop.
- Only destroy upcall targets on that thread, and cancel future upcalls at the
same time. For example, destroy the `CsvParser` within a task posted to that
dispatcher.
Since the same thread invokes asynchronous callbacks and destroys the instance,
there must be a defined ordering between those operations.
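The pattern above can be sketched without any Fuchsia dependencies. In the
following example, `ToyLoop` and `ToyFileReader` are hypothetical stand-ins for
a single-threaded `async::Loop` and for a reader that cancels pending callbacks
in its destructor. `RunParser` is likewise invented for illustration. The
sketch assumes everything runs on the one thread that calls `RunUntilIdle()`.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <memory>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a dispatcher serviced by exactly one thread: tasks
// run in posting order on whichever thread calls RunUntilIdle().
class ToyLoop {
 public:
  void Post(std::function<void()> task) {
    assert(task);
    tasks_.push_back(std::move(task));
  }

  void RunUntilIdle() {
    while (!tasks_.empty()) {
      std::function<void()> task = std::move(tasks_.front());
      tasks_.pop_front();
      task();
    }
  }

 private:
  std::deque<std::function<void()>> tasks_;
};

// Hypothetical reader whose destructor cancels any callback not yet delivered.
class ToyFileReader {
 public:
  explicit ToyFileReader(ToyLoop* loop) : loop_(loop) {}
  ~ToyFileReader() { *canceled_ = true; }

  void AsyncRead(std::function<void(std::string)> callback) {
    loop_->Post([canceled = canceled_, callback = std::move(callback)] {
      if (!*canceled) callback("a,b,c");
    });
  }

 private:
  ToyLoop* loop_;
  // Only touched from loop tasks and the destructor, which share one thread.
  std::shared_ptr<bool> canceled_ = std::make_shared<bool>(false);
};

class CsvParser {
 public:
  CsvParser(ToyLoop* loop, int* upcalls) : reader_(loop), upcalls_(upcalls) {}

  void Load() {
    reader_.AsyncRead([this](std::string data) {
      ++*upcalls_;
      values_ = Parse(data);
    });
  }

  static std::vector<std::string> Parse(const std::string& data) {
    std::vector<std::string> values;
    std::istringstream stream(data);
    for (std::string field; std::getline(stream, field, ',');) {
      values.push_back(field);
    }
    return values;
  }

 private:
  ToyFileReader reader_;
  int* upcalls_;
  std::vector<std::string> values_;
};

// Returns how many upcalls ran. Destruction is itself a task on the loop, so it
// is always ordered either before or after the read callback.
int RunParser(bool destroy_before_read_completes) {
  ToyLoop loop;
  int upcalls = 0;
  auto parser = std::make_unique<CsvParser>(&loop, &upcalls);
  if (destroy_before_read_completes) {
    loop.Post([&parser] { parser.reset(); });
    parser->Load();
  } else {
    parser->Load();
    loop.Post([&parser] { parser.reset(); });
  }
  loop.RunUntilIdle();
  return upcalls;
}
```

With this setup, `RunParser(false)` delivers one upcall before the parser is
destroyed, and `RunParser(true)` delivers none because the reader's destructor
ran first. In neither case does the lambda touch a destroyed `CsvParser`.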
The general case of the above pattern is to ensure *synchronized access*: every
access (including construction and destruction) of an object will observe the
side-effects of previous accesses. In other literature about threading, you may
see the term *synchronized access* always associated with locking, for example
taking a mutex lock before accessing an object. In Fuchsia C++, locks alone
would not be sufficient as discussed above, and we use properties of the async
dispatcher to achieve synchronized access, such that user code does not have to
take locks. The next section will go into detail.
## Synchronized dispatchers {#synchronized-dispatcher}
A *synchronized dispatcher* is one where posted tasks are run in order, and each
task will observe the side-effects from previous tasks.
Because objects dealing with asynchronous logic are accessed from dispatchers,
one cannot also access the object from arbitrary threads, as the dispatcher
might be concurrently accessing the same object, resulting in data races. In
fact, one must always access the object from a single dispatcher associated with
that object. The dispatcher must also ensure ordering between operations. We
call such dispatchers *synchronized dispatchers*. There are two ways for a
dispatcher to qualify as *synchronized*:
### Support sequences
A dispatcher may promise that tasks posted on that dispatcher always run with a
strict ordering. Such dispatchers are said to support *sequences*: sequential
execution domains that run a series of tasks, where each task observes all
side-effects from previous tasks, but where the underlying execution may hop
from one thread to another.
Synchronized driver dispatchers (e.g.
[`fdf::SynchronizedDispatcher`][fdf-dispatcher]) are an example of dispatchers
that support sequences. See [driver threading model][driver-threading-model]. On
the other hand, `async::Loop` does not support sequences, as the user may call
`Loop::StartThread` many times to introduce multiple threads that race to
execute tasks in that loop.
### Stay single threaded
If the dispatcher does not support sequences, then code running in tasks posted
to that dispatcher is ordered if that dispatcher is serviced by only a single
thread, for example, a single-threaded `async::Loop`.
In summary, either the dispatcher supports sequences in which case the object
must be used on that sequence, or the code runs on a single dispatcher thread
and the object must be used on that thread. *Use* covers construction,
destruction, and calling instance methods.
Synchronized dispatchers are a unit of [concurrency][concurrency]: tasks posted
to the same synchronized dispatcher are never run concurrently alongside one
another. Tasks posted to different synchronized dispatchers may potentially run
concurrently alongside one another.
### Check for synchronized dispatchers {#check-synchronized}
The `async` library offers a `BasicLockable` type,
[`async::synchronization_checker`][synchronization-checker]. You may call
`.lock()` or lock the checker using a `std::lock_guard` whenever a function
requires *synchronized access*. Doing so checks that the function is called from
a dispatcher with such a guarantee, without actually taking any locks. If the
check fails, the program will panic. It is recommended that thread-unsafe types
check for synchronization at runtime by carrying a checker. Here is a full
example:
```cpp
{% includecode gerrit_repo="fuchsia/fuchsia" gerrit_path="examples/cpp/synchronization_checker/main.cc" region_tag="synchronization_checker" adjust_indentation="auto" exclude_regexp="^TEST|^}" %}
```
`fidl::Client` is another example of a type that checks for synchronized access
at runtime: destroying a `fidl::Client` on a non-dispatcher thread will lead to a
panic. There are other C++ classes in the Fuchsia code base that do the same.
They will usually highlight this with a comment such as the following:
```c++
// This class is thread-unsafe. Instances must be used and managed from a
// synchronized dispatcher.
class SomeAsyncType { /* ... */ };
```
See [C++ FIDL threading guide][cpp-threading-guide] for a concrete discussion of
this scenario when using FIDL bindings.
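To make the "check without locking" idea from this section concrete, here is a
simplified sketch in standard C++. `ToyThreadChecker` is a hypothetical
stand-in that only compares thread IDs; it is not the
`async::synchronization_checker` API, which is associated with a dispatcher.
`Counter` and the two helper functions are likewise invented for illustration.

```cpp
#include <cassert>
#include <cstdlib>
#include <mutex>
#include <thread>

// Hypothetical checker that satisfies BasicLockable. lock() verifies that the
// caller is on the thread that created the checker; it never blocks and no
// lock is actually taken.
class ToyThreadChecker {
 public:
  ToyThreadChecker() : owner_(std::this_thread::get_id()) {}

  bool is_synchronized() const { return std::this_thread::get_id() == owner_; }

  void lock() {
    if (!is_synchronized()) std::abort();  // Analogous to a panic.
  }
  void unlock() {}

 private:
  const std::thread::id owner_;
};

// This class is thread-unsafe. It carries a checker to catch misuse at runtime.
class Counter {
 public:
  void Increment() {
    std::lock_guard<ToyThreadChecker> guard(checker_);
    ++value_;
  }
  int value() const { return value_; }
  bool IsSynchronized() const { return checker_.is_synchronized(); }

 private:
  ToyThreadChecker checker_;
  int value_ = 0;
};

// Uses the counter on the thread that created it, which passes the check.
int IncrementTwiceOnOwningThread() {
  Counter counter;
  counter.Increment();
  counter.Increment();
  return counter.value();
}

// Asks a second thread whether it would pass the check, without aborting.
bool OtherThreadPassesCheck() {
  Counter counter;
  bool result = true;
  std::thread other([&counter, &result] { result = counter.IsSynchronized(); });
  other.join();
  return result;
}
```

Calling `Increment()` from the second thread would abort the program. The
helper queries `IsSynchronized()` instead so that the outcome can be observed
safely.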
### Discard callbacks during destruction
You may have noticed that for the `ChannelReader` example above to work, the
callback passed to `wait_.Begin(...)` must be silently discarded, instead of
called with some error, if `ChannelReader` is destroyed. Indeed, the
[documentation][async-wait] on `async::WaitOnce` mentions that it "automatically
cancels the wait when it goes out of scope".
During destruction, some C++ objects discard any registered callbacks that have
yet to be called. These kinds of APIs are said to guarantee *at most once
delivery*. `async::Wait` and `async::Task` are examples of such objects. This
style works well when the callback references a single receiver that owns the
wait/task, i.e. the callback is an upcall. These APIs are typically also
thread-unsafe and require the aforementioned *synchronized access*.
Other objects will always call the registered callback exactly once, even
during destruction. Those calls would typically provide an error or status
indicating cancellation. They are said to guarantee *exactly once delivery*.
One should consult the corresponding documentation when using an
asynchronous API to understand the cancellation semantics.
It is possible to convert an *exactly once* API into an *at most once* API by
discarding the upcall if the object making the upcalls is already destroyed.
[`closure-queue`][closure-queue] is a library that implements this idea;
destroying a `ClosureQueue` will discard unexecuted callbacks scheduled on that
queue.
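The two delivery guarantees, and the conversion from one to the other, can be
sketched as follows. `ExactlyOnceOp`, `AtMostOnceAdapter`, and the helper
functions are hypothetical and written for this example only. They do not
reflect how `ClosureQueue` or any Fuchsia API is implemented, and the sketch
assumes all calls happen on one thread.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>

enum class Status { kOk, kCanceled };

// Hypothetical exactly-once API: the callback always runs once, with
// kCanceled if the operation is destroyed before it completes.
class ExactlyOnceOp {
 public:
  explicit ExactlyOnceOp(std::function<void(Status)> callback)
      : callback_(std::move(callback)) {}
  ~ExactlyOnceOp() { Finish(Status::kCanceled); }

  void Complete() { Finish(Status::kOk); }

 private:
  void Finish(Status status) {
    if (!callback_) return;  // Already delivered.
    std::function<void(Status)> callback = std::move(callback_);
    callback_ = nullptr;
    callback(status);
  }

  std::function<void(Status)> callback_;
};

// Hypothetical adapter that turns the exactly-once API into an at-most-once
// one: once the adapter is being destroyed, the upcall is discarded.
class AtMostOnceAdapter {
 public:
  explicit AtMostOnceAdapter(std::function<void()> callback)
      : alive_(std::make_shared<bool>(true)),
        op_([alive = alive_, callback = std::move(callback)](Status) {
          if (*alive) callback();
        }) {}
  // The body runs before |op_| is destroyed, so the cancellation is swallowed.
  ~AtMostOnceAdapter() { *alive_ = false; }

  void Complete() { op_.Complete(); }

 private:
  std::shared_ptr<bool> alive_;
  ExactlyOnceOp op_;
};

struct Deliveries {
  int ok = 0;
  int canceled = 0;
};

// Counts what an exactly-once operation delivers over its lifetime.
Deliveries RunExactlyOnce(bool complete) {
  Deliveries deliveries;
  {
    ExactlyOnceOp op([&deliveries](Status status) {
      (status == Status::kOk ? deliveries.ok : deliveries.canceled)++;
    });
    if (complete) op.Complete();
  }
  return deliveries;
}

// Counts how many times the at-most-once callback runs over its lifetime.
int RunAtMostOnce(bool complete) {
  int calls = 0;
  {
    AtMostOnceAdapter op([&calls] { ++calls; });
    if (complete) op.Complete();
  }
  return calls;
}
```

Destroying the exactly-once operation early still produces one call, carrying
`kCanceled`. Destroying the adapter early produces no call at all.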
### Use an object belonging to a different synchronized dispatcher
To maintain synchronized access, one may manage and use a group of objects on
the same synchronized dispatcher. Those objects can synchronously call into one
another without breaking the synchronization checks. A special case of this is
an application that runs everything on a single `async::Loop` with a single
thread, typically called the "main thread".
More complex applications may have multiple synchronized dispatchers. When
individual objects must be used from their corresponding synchronized
dispatcher, a question arises: how does one object call another object if they
are associated with different dispatchers?
A time-tested approach is to have the objects send messages between one another,
as opposed to synchronously calling their instance methods. Concretely, this
could mean that if object `A` needs to do something to object `B`, `A` would
post an asynchronous task to `B`'s dispatcher. The task (usually a lambda
function) may then synchronously use `B` because it is already running under
`B`'s dispatcher and will be synchronized with other tasks that use `B`.
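Here is a minimal sketch of that message passing approach. `ToyLoop` is a
hypothetical stand-in for a synchronized dispatcher that is run manually, and
`Producer` and `Inventory` play the roles of `A` and `B`. All of these names
are assumptions made for illustration.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical stand-in for a synchronized dispatcher: tasks run one at a
// time, in posting order, when RunUntilIdle() is called.
class ToyLoop {
 public:
  void Post(std::function<void()> task) {
    assert(task);
    tasks_.push_back(std::move(task));
  }

  void RunUntilIdle() {
    while (!tasks_.empty()) {
      std::function<void()> task = std::move(tasks_.front());
      tasks_.pop_front();
      task();
    }
  }

 private:
  std::deque<std::function<void()>> tasks_;
};

// Object B: thread-unsafe, must only be used from tasks on its own loop.
class Inventory {
 public:
  void Add(int count) { total_ += count; }
  int total() const { return total_; }

 private:
  int total_ = 0;
};

// Object A: associated with some other dispatcher. It never calls methods on
// the inventory directly.
class Producer {
 public:
  Producer(ToyLoop* inventory_loop, Inventory* inventory)
      : inventory_loop_(inventory_loop), inventory_(inventory) {}

  void Produce(int count) {
    // Send a message instead of calling inventory_->Add(count) here. This
    // sketch assumes the inventory outlives every posted task.
    inventory_loop_->Post([inventory = inventory_, count] {
      inventory->Add(count);
    });
  }

 private:
  ToyLoop* inventory_loop_;
  Inventory* inventory_;
};

// Returns the inventory total after two messages were sent to it.
int ProduceAndMaybeRun(bool run_inventory_loop) {
  ToyLoop inventory_loop;
  Inventory inventory;
  Producer producer(&inventory_loop, &inventory);
  producer.Produce(3);
  producer.Produce(4);
  if (run_inventory_loop) inventory_loop.RunUntilIdle();
  return inventory.total();
}
```

The inventory only changes when its own loop runs the posted tasks, so every
use of it stays ordered with the other tasks on that loop. The raw pointer
capture is safe here only because of the stated lifetime assumption.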
When tasks are posted to a different dispatcher, it's harder to safely discard
them when the receiver object goes out of scope. Here are some approaches:
- One may use [`async_patterns::DispatcherBound`][dispatcher-bound] to both own
and make calls to a child object that lives on a different synchronized
dispatcher.
- One may use [`async_patterns::Receiver`][receiver] to let other objects make
calls on their objects, without forcing an ownership relationship. The calls
are silently canceled if the receiver is destroyed.
<!-- TODO(fxbug.dev/119641): Document other async_patterns helpers when they
land. -->
- One may reference count the objects, and pass a weak pointer to the posted
task. The posted task should do nothing if the pointer is expired.
Go is a popular [example][golang] of a language that baked the message passing
principle into its design.
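The weak pointer approach from the list above can be sketched in standard C++.
`ToyLoop`, `Mailbox`, `PostToMailbox`, and `DeliveredCount` are hypothetical
names used only for this example. The sketch runs everything on one thread; in
a real program the receiver would be released on its own synchronized
dispatcher.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical stand-in for the receiver's synchronized dispatcher.
class ToyLoop {
 public:
  void Post(std::function<void()> task) {
    assert(task);
    tasks_.push_back(std::move(task));
  }

  void RunUntilIdle() {
    while (!tasks_.empty()) {
      std::function<void()> task = std::move(tasks_.front());
      tasks_.pop_front();
      task();
    }
  }

 private:
  std::deque<std::function<void()>> tasks_;
};

// Reference counted receiver object that counts the messages it handles.
class Mailbox {
 public:
  explicit Mailbox(int* delivered) : delivered_(delivered) {}
  void Receive() { ++*delivered_; }

 private:
  int* delivered_;
};

// Posts a task holding only a weak pointer. If the mailbox is gone by the time
// the task runs, the task does nothing.
void PostToMailbox(ToyLoop& loop, std::weak_ptr<Mailbox> weak_mailbox) {
  loop.Post([weak_mailbox = std::move(weak_mailbox)] {
    if (std::shared_ptr<Mailbox> mailbox = weak_mailbox.lock()) {
      mailbox->Receive();
    }
  });
}

// Returns how many of two posted messages were delivered.
int DeliveredCount(bool destroy_mailbox_first) {
  ToyLoop loop;
  int delivered = 0;
  auto mailbox = std::make_shared<Mailbox>(&delivered);
  PostToMailbox(loop, mailbox);
  PostToMailbox(loop, mailbox);
  if (destroy_mailbox_first) mailbox.reset();
  loop.RunUntilIdle();
  return delivered;
}
```

Because the tasks never hold a strong reference while queued, dropping the last
`std::shared_ptr` is enough to turn every pending task into a no-op.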
## Prior art
Lightweight mechanisms for ensuring that a set of tasks execute one after the
other, without necessarily starting operating system threads, are a recurring
theme:
- The Chromium project defines a similar sequence concept: [Threading and
tasks in Chrome][chrome].
- The Java Platform added [virtual threads][java].
[async-readme]: /zircon/system/ulib/async/README.md
[async-loop]: /zircon/system/ulib/async-loop/include/lib/async-loop/loop.h
[async-loop-cpp]: /zircon/system/ulib/async-loop/include/lib/async-loop/cpp/loop.h
[async-wait]: /zircon/system/ulib/async/include/lib/async/cpp/wait.h
[concurrency]: https://slikts.github.io/concurrency-glossary/?id=concurrent-order-independent-vs-sequential
[dispatcher-bound]: /sdk/lib/async_patterns/cpp/dispatcher_bound.h
[receiver]: /sdk/lib/async_patterns/cpp/receiver.h
[driver-threading-model]: /docs/concepts/drivers/driver-dispatcher-and-threads.md#threading-model
[fdf-dispatcher]: /sdk/lib/driver/runtime/include/lib/fdf/cpp/dispatcher.h
[thread-safety]: https://en.wikipedia.org/wiki/Thread_safety
[data-race]: http://eel.is/c++draft/intro.races#21
[abseil-thread-safety]: https://abseil.io/blog/20180531-regular-types#data-races-and-thread-safety-properties
[cpp-threading-guide]: /docs/development/languages/fidl/tutorials/cpp/topics/threading.md
[closure-queue]: /zircon/system/ulib/closure-queue/include/lib/closure-queue/closure_queue.h
[chrome]: https://chromium.googlesource.com/chromium/src/+/master/docs/threading_and_tasks.md
[java]: https://openjdk.org/jeps/425
[golang]: https://go.dev/blog/codelab-share
[synchronization-checker]: /zircon/system/ulib/async/include/lib/async/cpp/sequence_checker.h
[CppCon 2018: Geoffrey Romer “What do you mean "thread-safe"?”]: https://youtube.com/watch?v=s5PCh_FaMfM