| # LLCPP threading guide |
| |
| LLCPP bindings accommodate a diverse set of threading models. Depending on the |
| architecture of your application, there are different classes and usage styles |
| to choose from. This document covers the tools and techniques to use FIDL in |
| non-trivial threading environments. |
| |
| ## Background: life of a FIDL connection |
| |
| Within the lifetime of a FIDL connection, these occurrences are significant from |
| the perspective of thread-safety and preventing use-after-free: |
| |
| |
| - **To-binding calls**: these are calls made by user code on a FIDL messaging |
| object, i.e. inbound from the perspective of the FIDL runtime. For example: |
| |
| - Making a FIDL method call on a client is a to-binding call. |
| - Making a reply from a server implementation using completers is also a |
| to-binding call. |
| |
| - **To-user calls**: these are calls made by the FIDL runtime on user objects |
| (including callbacks provided by the user), i.e. outbound from the perspective |
| of the FIDL runtime. For example: |
| |
| - A server message dispatcher invoking FIDL method handlers on a server |
| implementation is a to-user call. |
| - A FIDL client delivering the response to a two-way FIDL method to the user |
| via a callback is also a to-user call. |
| - Error handlers are also to-user calls. |
| |
| To-user calls are also sometimes called "upcalls" since the user objects are |
| one layer above the FIDL bindings from the bindings' perspective. |
| |
| - **Teardown**: actions that stop the message dispatch. In particular, when |
| teardown is complete, no more to-user calls will be made by the bindings; |
| to-binding calls will fail or produce void/trivial effects. Examples: |
| |
| - An error happening during dispatch. |
| - Destroying a `fidl::WireClient`. |
| - Calling `fidl::WireSharedClient::AsyncTeardown()`. |
| |
| Teardown usually involves closing the client/server endpoint. The exception is |
| `fidl::WireSharedClient` where the client endpoint is only closed after |
| destroying the client object itself. |
| |
| - **Unbind**: actions that stop the message dispatch, and additionally recover |
| the client/server endpoint that was used to send and receive messages. Doing |
| so necessarily involves teardown. Examples: |
| |
| - Calling `fidl::ServerBindingRef::Unbind()`. |
| |
| ### Use-after-free risks during teardown |
| |
| Unless otherwise explicitly stated, all **to-binding** calls in LLCPP are safe |
| to invoke from multiple threads, including threads that do not belong to the |
| async dispatcher associated with the particular FIDL binding. |
| |
| However, this thread-safety property does not prevent all forms of memory |
| corruption when using FIDL bindings. When destroying a set of related objects |
| including FIDL clients or servers, care must be taken to order their destruction |
| such that **to-user** calls made by the FIDL bindings runtime do not end up |
| calling into a destroyed object. |
| |
| To give a concrete example, suppose a `MyDevice` object owns a FIDL client and |
| makes a number of two-way FIDL calls, passing a lambda that captures `this` as |
| the result callback every time. It is unsafe to destroy `MyDevice` while the |
| client could still be dispatching messages in the meantime. This is often the |
| case when the user destroys `MyDevice` (or other business objects) from a |
| non-dispatcher thread, i.e. not the thread that is monitoring and dispatching |
| messages for the current FIDL binding. |
| |
| Similar use-after-free risks exist at destruction time when handling events and |
| when handling method calls from a server. |
| |
| There are a few solutions to this problem, all in the spirit of adding mutual |
| exclusion between the destruction of user objects and **to-user** calls: |
| |
| <ol> |
| <li id="solution_1_scheduling"> |
| <b>Scheduling</b>: ensure that the destruction of relevant user objects |
| is never scheduled in parallel with any to-user calls. |
| </li> |
| <li id="solution_2_ref_counting"> |
| <b>Reference-counting</b>: reference-count the user objects such that |
| they are not destroyed until the binding teardown is complete. |
| </li> |
| <li id="solution_3_two_phase_shutdown"> |
| <b>Two-phase shutdown</b>: provide a notification when binding teardown |
| is complete, such that the user could arrange the business objects to |
| destruct after that. |
| </li> |
| </ol> |
| |
| LLCPP natively supports all of the above approaches. Reference-counting is |
| inappropriate in some situations, so it is opt-in when using the bindings. |
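
To make the reference-counting approach concrete, here is a minimal, library-free sketch. `ToyBinding` and `Device` are hypothetical stand-ins invented for this illustration and are not LLCPP types; the only point is that a callback which co-owns the user object keeps it alive until the binding releases the callback at teardown.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a binding: it stores to-user callbacks until the
// binding is torn down. Not a real FIDL type.
class ToyBinding {
 public:
  void AddPendingCall(std::function<void()> callback) {
    pending_.push_back(std::move(callback));
  }

  // Teardown: drop every pending callback without invoking it. Any references
  // captured by the callbacks are released here.
  void Teardown() { pending_.clear(); }

 private:
  std::vector<std::function<void()>> pending_;
};

// Hypothetical user object. It reports its own destruction through a flag
// owned by the caller.
class Device {
 public:
  explicit Device(bool* destroyed) : destroyed_(destroyed) {}
  ~Device() { *destroyed_ = true; }
  void OnReply() { ++replies_; }

 private:
  bool* destroyed_;
  int replies_ = 0;
};

// Returns true if the device stayed alive while the binding could still call
// it, and was destroyed once teardown released the last reference.
bool RefCountingDefersDestruction() {
  bool destroyed = false;
  ToyBinding binding;
  auto device = std::make_shared<Device>(&destroyed);

  // The callback co-owns the device instead of capturing a raw pointer.
  binding.AddPendingCall([device] { device->OnReply(); });

  device.reset();  // The user drops its own reference first.
  const bool alive_while_bound = !destroyed;

  binding.Teardown();  // The last reference goes away with the callback.
  return alive_while_bound && destroyed;
}
```

Had the callback captured a raw `Device*`, the `device.reset()` line would have left the binding holding a dangling pointer.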
| |
| ## Client-side threading |
| |
| There are two client types that support async operations: `fidl::WireClient` |
| and `fidl::WireSharedClient`. For a precise reference of their semantics, refer |
| to their documentation in the [client header][client-header]. |
| |
| ### WireClient |
| |
| `fidl::WireClient` supports [solution #1 (scheduling)](#solution_1_scheduling) |
| by checking that it is bound and destroyed on the same dispatcher thread that |
| reads and handles messages from the channel. You may make FIDL method calls on |
| it from multiple threads, but the client object itself cannot be moved to |
| another object which is then destroyed on another thread. This ensures that the |
| containing user object is not destroyed while a FIDL message or error event is |
| being dispatched. It is suitable for single-threaded and object-oriented usage |
| styles. |
| |
| `fidl::WireClient` can only be used with a single-threaded async dispatcher. |
| |
| `fidl::WireClient` reports errors via the `on_fidl_error` virtual method of the |
| event handler. User-initiated teardown (e.g. by destroying the client) is not |
| reported as an error to the event handler. |
| |
| `fidl::WireClient` does not own the event handler. Instead, the user object |
| which owns the client may implement the event handling interface, and pass a |
| borrowed pointer to the client object. |
| |
| A typical usage of `fidl::WireClient` may look like the following: |
| |
| ```cpp |
| class MyDevice : fidl::WireAsyncEventHandler<MyProtocol> { |
|  public: |
|   // |client_end| and |dispatcher| are supplied by the caller. The client is |
|   // bound here, so construct |MyDevice| on the dispatcher thread. |
|   MyDevice(fidl::ClientEnd<MyProtocol> client_end, async_dispatcher_t* dispatcher) { |
|     client_.Bind(std::move(client_end), dispatcher, /* event_handler */ this); |
|   } |
| |
|   void on_fidl_error(fidl::UnbindInfo error) override { |
|     // Handle errors... |
|   } |
| |
|   void DoThing() { |
|     // Capture |this| such that the |MyDevice| instance may be |
|     // accessed in the callback. This is safe because destroying |
|     // |client_| cancels all in-flight calls. |
|     client_.Foo(args, [this] (fidl::WireResponse<Foo>*) { ... }); |
| |
|     // Alternatively, bind a member function. |
|     client_.Foo(args, fit::bind_member(this, &MyDevice::FooReplyHandler)); |
|   } |
| |
|  private: |
|   // Reply handler used with |fit::bind_member| above. |
|   void FooReplyHandler(fidl::WireResponse<Foo>* response) { ... } |
| |
|   fidl::WireClient<MyProtocol> client_; |
| }; |
| ``` |
| |
| Notice that nothing special is needed when `MyDevice` is destroyed: the client |
| binding is torn down as part of the process, and the threading checks |
| performed by `WireClient` are sufficient to prevent this class of |
| use-after-free. |
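
The reason capturing `this` is safe can be modeled without any FIDL types. The following is only a sketch under stated assumptions: `ToyLoop` and `ToyClient` are hypothetical classes invented here, the loop is single-threaded, and a "reply" is simply a task posted to that loop. It shows a reply that is still in flight when the owner is destroyed being dropped rather than delivered.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical single-threaded event loop: tasks run only inside RunUntilIdle.
class ToyLoop {
 public:
  void Post(std::function<void()> task) { tasks_.push_back(std::move(task)); }

  void RunUntilIdle() {
    while (!tasks_.empty()) {
      std::function<void()> task = std::move(tasks_.front());
      tasks_.pop_front();
      task();
    }
  }

 private:
  std::deque<std::function<void()>> tasks_;
};

// Hypothetical client. The reply to each call is delivered later as a loop
// task, unless the client has been destroyed in the meantime.
class ToyClient {
 public:
  explicit ToyClient(ToyLoop* loop) : loop_(loop) {}
  ~ToyClient() { *bound_ = false; }  // Cancels all in-flight replies.

  void Call(std::function<void(int)> on_reply) {
    loop_->Post([bound = bound_, on_reply = std::move(on_reply)] {
      if (*bound) on_reply(42);  // 42 is an arbitrary reply payload.
    });
  }

 private:
  ToyLoop* loop_;
  std::shared_ptr<bool> bound_ = std::make_shared<bool>(true);
};

// The owner captures |this| in its callbacks, as in the example above.
class MyDevice {
 public:
  MyDevice(ToyLoop* loop, int* total) : total_(total), client_(loop) {}

  void DoThing() {
    client_.Call([this](int reply) { *total_ += reply; });
  }

 private:
  int* total_;
  ToyClient client_;
};

int SumOfDeliveredReplies() {
  ToyLoop loop;
  int total = 0;
  {
    MyDevice device(&loop, &total);
    device.DoThing();
    loop.RunUntilIdle();  // The first reply is delivered.
    device.DoThing();     // The second reply is still in flight...
  }                       // ...when the device and its client are destroyed.
  loop.RunUntilIdle();    // The stale reply is dropped, not delivered.
  return total;
}
```

This only works because creation, dispatch, and destruction all happen on the one thread that runs the loop, which is the property `WireClient` checks for.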
| |
| ### WireSharedClient |
| |
| `fidl::WireSharedClient` implements [solution #3 (two-phase |
| shutdown)](#solution_3_two_phase_shutdown). Unlike `WireClient` where destroying |
| a client immediately guarantees that there are no more **to-user** calls, |
| destroying a `WireSharedClient` merely initiates asynchronous bindings teardown. |
| The user may observe the completion of the teardown asynchronously. In turn, |
| this allows moving or cloning a `WireSharedClient` to a different thread |
| than the dispatcher thread, and destroying/calling teardown on a client while |
| there are parallel **to-user** calls (e.g. a response callback). Those two |
| actions will race (the response callback might be canceled if the client is |
| destroyed early enough), but `WireSharedClient` will never make any more to-user |
| calls once it notifies its teardown completion. |
| |
| There are two ways to observe teardown completion: |
| |
| * [Owned event handler](#owned_event_handler) |
| * [Custom teardown observer](#custom_teardown_observer) |
| |
| #### Owned event handler |
| |
| Transfer the ownership of an event handler to the client as a |
| `std::unique_ptr<fidl::WireAsyncEventHandler<Protocol>>` when binding the |
| client. After teardown is complete, the event handler will be destroyed. It is |
| safe to destroy the user objects referenced by any client callbacks from within |
| the event handler destructor. |
| |
| Here is an example showing this pattern: |
| |
| ```cpp |
| {% includecode gerrit_repo="fuchsia/fuchsia" gerrit_path="examples/fidl/llcpp/client_shared/main.cc" region_tag="owned_event_handler" %} |
| ``` |
| |
| #### Custom teardown observer |
| |
| Provide an instance of `fidl::AnyTeardownObserver` to the bindings. |
| The observer will be notified when teardown is complete. There are several |
| ways to create a teardown observer: |
| |
| - `fidl::ObserveTeardown` takes an arbitrary callable and wraps it in a |
| teardown observer: |
| |
| ```cpp |
| {% includecode gerrit_repo="fuchsia/fuchsia" gerrit_path="examples/fidl/llcpp/client_shared/main.cc" region_tag="custom_callback" %} |
| ``` |
| |
| - `fidl::ShareUntilTeardown` takes a `std::shared_ptr<T>`, and arranges the |
| binding to destroy its shared reference after teardown: |
| |
| ```cpp |
| {% includecode gerrit_repo="fuchsia/fuchsia" gerrit_path="examples/fidl/llcpp/client_shared/main.cc" region_tag="share_until_teardown" %} |
| ``` |
| |
| Users may create custom teardown observers that work with other pointer types |
| e.g. `fbl::RefPtr<T>`. |
| |
| `WireSharedClient` caters to systems where business logic states are managed by |
| a framework (drivers are one example, where the driver runtime is the managing |
| framework). In this case, the bindings runtime and the framework will co-own the |
| user objects: the bindings runtime will inform the framework it has surrendered |
| all user object references, at which point the framework can schedule the |
| destruction of the user objects, modulo other ongoing asynchronous teardown |
| processes happening to the same group of objects. An asynchronous teardown does |
| not require synchronizing across arbitrary to-user calls, and helps to prevent |
| deadlocks. |
| |
| The pattern of initiating teardown first, then destroying the user objects after |
| teardown completes is sometimes called *two-phase shutdown*. |
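
The two phases can be sketched with plain standard-library types. Everything below is hypothetical and invented for illustration (`ToyLoop`, `ToySharedClient`, `BusinessState`); it is not how `WireSharedClient` is implemented. The sketch assumes teardown completes as a task on a single-threaded loop, and the observer holds the last strong reference to the user object, in the spirit of sharing an object until teardown.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical single-threaded event loop: tasks run only inside RunUntilIdle.
class ToyLoop {
 public:
  void Post(std::function<void()> task) { tasks_.push_back(std::move(task)); }

  void RunUntilIdle() {
    while (!tasks_.empty()) {
      std::function<void()> task = std::move(tasks_.front());
      tasks_.pop_front();
      task();
    }
  }

 private:
  std::deque<std::function<void()>> tasks_;
};

// Hypothetical shared client that notifies an observer once teardown is done.
class ToySharedClient {
 public:
  ToySharedClient(ToyLoop* loop, std::function<void()> teardown_observer)
      : loop_(loop), teardown_observer_(std::move(teardown_observer)) {}

  // Phase one: request teardown and return immediately. The observer runs
  // later, on the loop.
  void AsyncTeardown() {
    if (!teardown_observer_) return;  // Teardown was already requested.
    loop_->Post(std::move(teardown_observer_));
    teardown_observer_ = nullptr;
  }

 private:
  ToyLoop* loop_;
  std::function<void()> teardown_observer_;
};

struct BusinessState {
  int value = 7;
};

struct ShutdownTrace {
  bool alive_after_phase_one;
  bool alive_after_phase_two;
};

ShutdownTrace RunTwoPhaseShutdown() {
  ToyLoop loop;
  auto state = std::make_shared<BusinessState>();
  std::weak_ptr<BusinessState> watcher = state;

  // The observer owns the last strong reference to the user object.
  ToySharedClient client(
      &loop, [keep_alive = std::move(state)]() mutable { keep_alive.reset(); });

  client.AsyncTeardown();  // Phase one: initiate teardown.
  const bool alive_after_phase_one = !watcher.expired();

  loop.RunUntilIdle();  // Phase two: teardown completes, the object is freed.
  const bool alive_after_phase_two = !watcher.expired();

  return {alive_after_phase_one, alive_after_phase_two};
}
```

Between the two phases the user object is still alive, so any to-user call that races with the teardown request finds valid memory.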
| |
| ### Simple decision tree |
| |
| When in doubt, here are some rules of thumb to follow when deciding which client |
| type to use: |
| |
| - If your app is single-threaded, use `WireClient`. |
| |
| - If your app is multi-threaded but consists of multiple single-threaded |
| dispatchers, and you can guarantee that each client is only bound and |
| destroyed on its respective dispatcher thread, you can still use |
| `WireClient`. |
| |
| - If your app is multi-threaded and the FIDL clients are not guaranteed to be |
| destroyed on their respective dispatcher threads: use `WireSharedClient` and |
| take on the two-phase shutdown complexity. |
| |
| ## Server-side threading |
| |
| `fidl::WireClient` and `fidl::WireSharedClient` both tear down the binding when |
| they are destroyed. Unlike clients, there is no RAII type on the server side |
| that tears down the binding. The rationale is that servers in simpler applications |
| are created in response to a connection attempt made by a client, and often stay |
| around to process client requests until the client closes its |
| endpoint. When the application is shutting down, the user may shut down the async |
| dispatcher, which then synchronously tears down all server bindings associated |
| with it. |
| |
| As applications grow more complex, however, there are scenarios that call for |
| proactively shutting down server implementation objects, which involves tearing |
| down the server bindings. Drivers, for example, need to stop relevant servers |
| when the device is removed. |
| |
| There are two ways a server could voluntarily tear down the binding on its end: |
| |
| - `fidl::ServerBindingRef::Close` or `fidl::ServerBindingRef::Unbind`. |
| - `SomeCompleter::Close` where `SomeCompleter` is a method completer provided to |
| a server method handler. |
| |
| For a precise reference of their semantics, refer to their documentation in the |
| [server header][server-header]. |
| |
| All methods above only initiate teardown, and hence may safely race with in-progress |
| operations or parallel **to-user** calls (e.g. method handlers). The trade-off |
| is that you need to take some care in managing the lifetime |
| of the server implementation object. There are two cases: |
| |
| * [Initiating teardown from the single dispatcher |
| thread](#initiating_teardown_from_the_single_dispatcher_thread) |
| * [Initiating teardown from an arbitrary |
| thread](#initiating_teardown_from_an_arbitrary_thread) |
| |
| ### Initiating teardown from the single dispatcher thread |
| |
| When the async dispatcher (`async_dispatcher_t*`) passed to `fidl::BindServer` |
| only has one thread backing it, and teardown is initiated from that thread (e.g. |
| from within a server method handler or a task running on this dispatcher), then |
| the binding will not make any calls on the server object after `Unbind`/`Close` |
| returns. It is safe to destroy the server object at this point. |
| |
| If an unbound handler is specified, the binding _will_ make one final |
| **to-user** call, to the unbound handler, soon after, usually at the next |
| iteration of the event loop. The unbound handler has the following signature: |
| |
| ```cpp |
| // |impl| is the pointer to the server implementation. |
| // |info| contains the reason for binding teardown. |
| // |server_end| is the server channel endpoint. |
| // |Protocol| is the type of the FIDL protocol. |
| void OnUnbound(ServerImpl* impl, fidl::UnbindInfo info, |
| fidl::ServerEnd<Protocol> server_end) { |
| // If teardown is manually initiated and not due to an error, |info.ok()| will be true. |
| if (info.ok()) |
| return; |
| // Handle errors... |
| } |
| ``` |
| |
| If the server object was destroyed earlier on, the callback must not access the |
| `impl` variable as it now points to invalid memory. |
| |
| ### Initiating teardown from an arbitrary thread |
| |
| If the application cannot guarantee that the teardown is always initiated from |
| the single dispatcher thread, then there could be ongoing **to-user** calls |
| during teardown. To prevent use-after-free, we may implement a two-phase |
| shutdown pattern similar to the one found on the client side. |
| |
| Suppose a server object is allocated on the heap for each incoming connection |
| request: |
| |
| ```cpp |
| {% includecode gerrit_repo="fuchsia/fuchsia" gerrit_path="examples/fidl/llcpp/server/main.cc" region_tag="create_server" %} |
| ``` |
| |
| We could destroy the server object at the end of the `unbound_handler` callback. |
| Here the code accomplishes this by moving the `std::unique_ptr` containing the |
| server into the callback. |
| |
| ```cpp |
| class EchoImpl { |
| public: |
| {% includecode gerrit_repo="fuchsia/fuchsia" gerrit_path="examples/fidl/llcpp/server/main.cc" region_tag="bind_server" %} |
| |
| // Later, when the server is shutting down... |
| void Shutdown() { |
| binding_->Unbind(); // This stops accepting new requests. |
| // The server is destroyed asynchronously in the unbound handler. |
| } |
| }; |
| ``` |
| |
| Note: if the server is always managed in a `unique_ptr` or a `shared_ptr`, you |
| may pass the smart pointer directly to `fidl::BindServer` which has the relevant |
| special casing for these pointer types. The pointer is destroyed after the |
| unbound handler returns. The example above manually arranges this to show that |
| other custom teardown logic may also be inserted. |
| |
| The two-phase shutdown pattern is necessary to accommodate the possibility of |
| parallel server method handler calls at the point of initiating teardown. The |
| bindings runtime will call the unbound handler after these **to-user** calls |
| return. In particular, if a server method handler takes a long time to return, |
| the unbinding procedure could be delayed by an equal amount of time. It is |
| recommended to offload long running handler work to a thread pool and make the |
| reply asynchronously via `completer.ToAsync()`, thus ensuring prompt return of |
| method handlers and timely unbinding. The reply will be discarded if the server |
| binding has been torn down in the meantime. |
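
The "reply is discarded" behavior can be modeled as follows. This is a sketch only: `ToyBinding` and `ToyAsyncCompleter` are hypothetical names invented here, not the LLCPP completer types, and the model assumes the completer holds a weak reference to the binding so that a late reply becomes a no-op.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical binding state: records the replies that were actually sent.
struct ToyBinding {
  std::vector<std::string> sent_replies;
};

// Hypothetical async completer. It only weakly references the binding, so it
// can outlive the binding without keeping it alive.
class ToyAsyncCompleter {
 public:
  explicit ToyAsyncCompleter(const std::shared_ptr<ToyBinding>& binding)
      : binding_(binding) {}

  // Returns true if the reply was sent, false if it was discarded because the
  // binding has already been torn down.
  bool Reply(std::string message) {
    std::shared_ptr<ToyBinding> binding = binding_.lock();
    if (!binding) return false;
    binding->sent_replies.push_back(std::move(message));
    return true;
  }

 private:
  std::weak_ptr<ToyBinding> binding_;
};

struct ReplyOutcome {
  bool before_teardown;
  bool after_teardown;
};

ReplyOutcome ReplyAroundTeardown() {
  auto binding = std::make_shared<ToyBinding>();

  // Two method handlers returned promptly and kept async completers for
  // work that finishes later.
  ToyAsyncCompleter early(binding);
  ToyAsyncCompleter late(binding);

  const bool before_teardown = early.Reply("fast result");
  binding.reset();  // The binding is torn down in the meantime.
  const bool after_teardown = late.Reply("slow result");

  return {before_teardown, after_teardown};
}
```

Because the handler itself has already returned, the slow work delays neither other requests nor the unbound handler.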
| |
| ## Interacting with the async dispatcher |
| |
| All asynchronous request/response handling, event handling, and error handling |
| are done through the `async_dispatcher_t*` provided when binding a client or |
| server. With the exception of shutting down the dispatcher, you can expect that |
| **to-user** calls will be executed on a dispatcher thread, and not nested within |
| other user code (no reentrancy issues). |
| |
| If you shut down the dispatcher while there are any active bindings, the teardown |
| may be completed on the thread executing shutdown. As such, you must not take |
| any locks that could be taken by the teardown observers provided to |
| `fidl::WireSharedClient` or the unbound handler provided to `fidl::BindServer` |
| while executing `async::Loop::Shutdown`/`async_loop_shutdown`. (You should |
| probably ensure that no locks are held around shutdown anyway, since it joins all |
| dispatcher threads, which may take locks in user code.) |
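
The locking rule can be illustrated with a small model. `ToyDispatcher` and `Registry` are hypothetical types made up for this sketch; the dispatcher here simply runs its teardown observers on the thread that calls `Shutdown()`, which is the situation described above.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical dispatcher: Shutdown() completes teardown by running every
// registered observer on the calling thread.
class ToyDispatcher {
 public:
  void AddTeardownObserver(std::function<void()> observer) {
    observers_.push_back(std::move(observer));
  }

  void Shutdown() {
    std::vector<std::function<void()>> observers = std::move(observers_);
    observers_.clear();
    for (auto& observer : observers) observer();
  }

 private:
  std::vector<std::function<void()>> observers_;
};

// Hypothetical user state guarded by a mutex.
struct Registry {
  std::mutex mutex;
  int live_bindings = 0;
};

// Returns the number of live bindings left after shutdown.
int ShutDownWithoutHoldingLock() {
  Registry registry;
  ToyDispatcher dispatcher;

  {
    std::lock_guard<std::mutex> lock(registry.mutex);
    registry.live_bindings = 1;
  }  // The lock is released here, before shutdown starts.

  // The teardown observer takes the same lock as the code above.
  dispatcher.AddTeardownObserver([&registry] {
    std::lock_guard<std::mutex> lock(registry.mutex);
    --registry.live_bindings;
  });

  // Calling Shutdown() while still holding |registry.mutex| would make the
  // observer try to re-lock a mutex already held by this thread.
  dispatcher.Shutdown();
  return registry.live_bindings;
}
```

Scoping the lock so that it ends before the shutdown call is the simplest way to follow the rule.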
| |
| [client-header]: https://cs.opensource.google/fuchsia/fuchsia/+/main:zircon/system/ulib/fidl/include/lib/fidl/llcpp/client.h |
| [server-header]: https://cs.opensource.google/fuchsia/fuchsia/+/main:zircon/system/ulib/fidl/include/lib/fidl/llcpp/server.h |
| [wire-client]: https://cs.opensource.google/fuchsia/fuchsia/+/main:zircon/system/ulib/fidl/include/lib/fidl/llcpp/client.h?q=llcpp%2Fclient.h%20WireClient&ss=fuchsia%2Ffuchsia |
| [wire-shared-client]: https://cs.opensource.google/fuchsia/fuchsia/+/main:zircon/system/ulib/fidl/include/lib/fidl/llcpp/client.h?q=llcpp%2Fclient.h%20WireSharedClient&sq=&ss=fuchsia%2Ffuchsia |