# VMO Registration Pattern
## Summary
When transferring bulk data between applications and peripheral hardware, it
becomes important to minimize the number of copies the data goes through. For
example, let us say an application would like to read a file from component
persistent storage. In order to do so, the application sends a read request
for the file to a filesystem, which in turn may need to send a request to a
block
device. Depending on the block partition topology, there may be several layers
of drivers the request passes through before ultimately hitting a driver which
can perform a read operation.
A naive approach to the above may result in sending FIDL messages over Zircon
channels across every layer between the application and the hardware, resulting
in many copies of the data. As this is inefficient, we don't do this. Following
a well-established pattern found throughout the industry, we split our messages
into two planes: a control plane and a data plane. Messages sent over the
control plane are small and cheap to send, whereas messages in the data plane
contain the bulk data which would be expensive to copy. Messages sent over the
control plane generally use FIDL protocols built on top of Zircon channels.
Messages in the data plane are sent via a shared memory primitive, Zircon VMOs.
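
As a minimal sketch of this split (the struct, function names, and use of a raw
channel are hypothetical; a real protocol would express the control message in
FIDL), the client places the payload in a shared VMO and sends only a small
descriptor over the control channel:

```cpp
#include <cstdint>

#include <lib/zx/channel.h>
#include <lib/zx/vmo.h>

// Hypothetical control-plane message: it describes where the payload lives in
// the shared data-plane VMO instead of carrying the bytes inline.
struct WriteRequest {
  uint64_t vmo_offset;     // Where the payload starts inside the data-plane VMO.
  uint64_t length;         // Payload size in bytes.
  uint64_t device_offset;  // Destination offset on the device.
};

zx_status_t SendWrite(const zx::channel& control, const zx::vmo& data_vmo,
                      const void* payload, uint64_t length, uint64_t device_offset) {
  // Data plane: place the bulk data in the shared VMO (one copy, made by the client).
  zx_status_t status = data_vmo.write(payload, /*offset=*/0, length);
  if (status != ZX_OK) {
    return status;
  }
  // Control plane: send a small, cheap message that only refers to that data.
  WriteRequest request;
  request.vmo_offset = 0;
  request.length = length;
  request.device_offset = device_offset;
  return control.write(0, &request, sizeof(request), nullptr, 0);
}
```
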
With this in mind, a naive implementation may choose to create a new VMO for
each transaction which gets transferred via the control plane until it reaches
the driver issuing DMA, achieving the desired goal of zero copies between the
application which placed the data in the VMO and the final driver. This,
however, may not be sufficiently performant, for the following two reasons (a
sketch of the per-request cost follows the list):
* In order to issue a DMA request, the memory must first be pinned, which
requires calling into the kernel and possibly setting up page mappings in an
IOMMU.
* If the final driver needs to copy the request into a special buffer (as not
all hardware supports DMA), it must either map the VMO into its process or
call into the kernel in order to copy the memory.
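
The following sketch shows the cost the naive scheme pays on every request,
assuming a page-aligned offset and size and access to a BTI handle; the
function is illustrative, not a real driver API:

```cpp
#include <vector>

#include <lib/zx/bti.h>
#include <lib/zx/pmt.h>
#include <lib/zx/vmo.h>
#include <zircon/syscalls.h>

// Per-transaction pinning as the naive scheme would do it: every request pays
// for a pin syscall (and possibly IOMMU programming) before DMA can start, and
// an unpin afterwards. Assumes `offset` and `size` are page-aligned.
zx_status_t DmaOnce(const zx::bti& bti, const zx::vmo& vmo, uint64_t offset, uint64_t size) {
  std::vector<zx_paddr_t> addrs(size / zx_system_get_page_size());
  zx::pmt pmt;
  zx_status_t status = bti.pin(ZX_BTI_PERM_READ | ZX_BTI_PERM_WRITE, vmo, offset, size,
                               addrs.data(), addrs.size(), &pmt);
  if (status != ZX_OK) {
    return status;
  }
  // ... program the hardware with `addrs` and wait for the DMA to complete ...
  return pmt.unpin();
}
```
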
Since both of these are costly, we need a better approach: using pre-registered
VMOs. This works by having the application send a one-time control message in
order to register a VMO with the final driver in the stack. The response to this
message returns an identifier which may be used to refer to the VMO in the
future. Control messages should simply refer to this identifier rather than
attaching a VMO handle. Upon registration, the final driver in the stack can
perform the costly pinning or mapping operations once, and cache the results.
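
The sketch below illustrates what the final driver might do at registration
time. The `VmoRegistry` class, its identifier scheme, and the assumption that
the whole page-aligned VMO is pinned up front are all hypothetical rather than
a prescribed implementation:

```cpp
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

#include <lib/zx/bti.h>
#include <lib/zx/pmt.h>
#include <lib/zx/vmo.h>
#include <zircon/syscalls.h>

// Hypothetical per-client registry kept by the final driver: the costly pin is
// paid once at registration time, and later requests refer to the VMO by the
// returned identifier.
class VmoRegistry {
 public:
  explicit VmoRegistry(zx::bti bti) : bti_(std::move(bti)) {}

  // Pins the whole VMO once, caches the result, and returns the identifier the
  // client should use in subsequent control messages.
  zx_status_t Register(zx::vmo vmo, uint64_t size, uint32_t* out_id) {
    Entry entry;
    entry.paddrs.resize(size / zx_system_get_page_size());
    zx_status_t status = bti_.pin(ZX_BTI_PERM_READ | ZX_BTI_PERM_WRITE, vmo, 0, size,
                                  entry.paddrs.data(), entry.paddrs.size(), &entry.pmt);
    if (status != ZX_OK) {
      return status;
    }
    entry.vmo = std::move(vmo);
    *out_id = next_id_++;
    entries_[*out_id] = std::move(entry);
    return ZX_OK;
  }

  // Unpins and forgets a previously registered VMO.
  zx_status_t Unregister(uint32_t id) {
    auto it = entries_.find(id);
    if (it == entries_.end()) {
      return ZX_ERR_NOT_FOUND;
    }
    zx_status_t status = it->second.pmt.unpin();
    entries_.erase(it);
    return status;
  }

 private:
  struct Entry {
    zx::vmo vmo;
    zx::pmt pmt;
    std::vector<zx_paddr_t> paddrs;  // Cached device-visible addresses.
  };
  zx::bti bti_;
  uint32_t next_id_ = 0;
  std::unordered_map<uint32_t, Entry> entries_;
};
```
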
## Notes on VMO Identifier
In order to ensure that we do not fall prey to confused deputy attacks, we must
uphold the same invariants with respect to the VMO identifier as the kernel does
with handles. In order to do this, the VMO identifier must be unique to the
client at each layer, and each layer must validate the identifier before using
it.
More specifically, using a koid as an identifier still requires that the server
checks that a VMO with that koid was registered by the client.
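
One way to uphold this (sketched below with hypothetical types) is to keep a
separate table per client connection and resolve every identifier through the
requesting client's table:

```cpp
#include <cstdint>
#include <unordered_map>

#include <zircon/errors.h>
#include <zircon/types.h>

// Illustrative only: identifiers are scoped per client. Each connection keeps
// its own table, so an identifier is meaningful only to the client that
// registered it, and every request is validated against that table.
struct PinnedVmo {
  // Cached pin/map results for one registered VMO would live here.
};

struct ClientState {
  std::unordered_map<uint32_t, PinnedVmo> registered;  // This client's VMOs only.
};

zx_status_t ResolveVmoId(const ClientState& client, uint32_t vmo_id, const PinnedVmo** out) {
  auto it = client.registered.find(vmo_id);
  if (it == client.registered.end()) {
    // This client never registered the identifier: reject the request rather
    // than acting as a confused deputy on someone else's VMO.
    return ZX_ERR_INVALID_ARGS;
  }
  *out = &it->second;
  return ZX_OK;
}
```
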
In order to lower the number of round trips, it is possible to allow the client
to name the VMO identifier as part of the registration API, allowing one-shot
VMO usage to be efficient. Alternatively, the protocol can state that the VMO's
koid will always be used as the identifier.
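
If the koid is chosen as the identifier, either side can derive it from its own
handle, as in the following sketch; the server-side validation described above
is still required:

```cpp
#include <lib/zx/vmo.h>
#include <zircon/syscalls/object.h>

// When the protocol fixes the identifier to be the VMO's koid, either side can
// derive it from its own handle. The server must still confirm that a VMO with
// that koid was registered by the requesting client.
zx_status_t GetVmoKoid(const zx::vmo& vmo, zx_koid_t* out_koid) {
  zx_info_handle_basic_t info;
  zx_status_t status =
      vmo.get_info(ZX_INFO_HANDLE_BASIC, &info, sizeof(info), nullptr, nullptr);
  if (status != ZX_OK) {
    return status;
  }
  *out_koid = info.koid;
  return ZX_OK;
}
```
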
## Zircon FIFOs
To further improve performance, some protocols may also opt to use FIFOs for
their control plane. FIFOs have reduced complexity, allowing for lower
overhead. One of their limitations is that they may not transfer handles. As a
result, the VMO registration pattern is a necessity in order to use FIFOs.
(Note that a channel must still be used to perform the registration.)
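
A hypothetical fixed-size FIFO element is sketched below; the field layout and
opcode are illustrative, the key point being that a request carries only the
previously registered identifier and offsets, never a handle:

```cpp
#include <cstddef>
#include <cstdint>

#include <lib/zx/fifo.h>

// Hypothetical fixed-size FIFO element for the control plane. FIFO elements
// cannot carry handles, so the request refers to a previously registered VMO
// by identifier; the registration itself happened over a channel.
struct IoRequest {
  uint32_t vmo_id;         // Identifier returned at registration time.
  uint32_t opcode;         // e.g. read or write; values are illustrative.
  uint64_t vmo_offset;     // Payload location within the registered VMO.
  uint64_t length;         // Payload size in bytes.
  uint64_t device_offset;  // Target offset on the device.
};

zx_status_t EnqueueRequest(const zx::fifo& fifo, const IoRequest& request) {
  size_t actual = 0;
  return fifo.write(sizeof(request), &request, /*count=*/1, &actual);
}
```
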
## Library
This pattern potentially adds a lot of complexity to the driver, which must
maintain the mappings between VMOs and their identifiers. A library has been
created to aid
the implementation, and lives under
[//src/lib/vmo_store](https://cs.opensource.google/fuchsia/fuchsia/+/main:src/lib/vmo_store/).
See
[//src/connectivity/network/drivers/network-device/device](https://cs.opensource.google/fuchsia/fuchsia/+/main:src/connectivity/network/drivers/network-device/device/)
for example usage.
## Downsides of the Pattern
For low-throughput situations, this pattern is unnecessarily complex and should
likely be avoided.
VMO registration causes a one-shot operation to become two round trips. If
one-shots are common, FIDL protocols should continue to allow one-shot VMOs to
be used in addition to pre-registered VMOs. This can also be
mitigated by allowing the client to provide the identifier for the VMO during
registration.
Pre-registered VMOs may lead to memory leaks if a client keeps registering
VMOs and forgets to unregister them. Additionally, if the server is not careful
with managing its clients, it may forget to clean up registered VMOs belonging
to a client which has disconnected from the server.
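
One way to mitigate the second issue, sketched below with hypothetical types,
is to tie registrations to the client connection and release them when the
connection closes:

```cpp
#include <cstdint>
#include <unordered_map>

#include <lib/zx/pmt.h>

// Illustrative cleanup path: registrations are tied to the client connection,
// so when the connection closes the server releases everything that client
// pinned, even if the client never sent an unregister message.
struct Registration {
  zx::pmt pmt;  // Returned by the pin at registration time; invalid if the VMO was only mapped.
};

void OnClientDisconnected(std::unordered_map<uint32_t, Registration>& client_vmos) {
  for (auto& [id, registration] : client_vmos) {
    if (registration.pmt.is_valid()) {
      registration.pmt.unpin();  // Makes the backing pages eligible for reclamation again.
    }
  }
  client_vmos.clear();
}
```
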
Pre-registering a VMO with a driver which pins it causes the pages backing the
VMO to no longer be pageable.
## Driver-Specific Considerations
Since some drivers reside in the same driver host process and we have a
mini-driver pattern whereby we hoist common logic into a core driver, it might
seem obvious to perform the VMO registration in the core driver rather than in
the device-specific driver. However, this is not a good idea, for the following
reasons:
* The core driver needs to be informed by the device-specific driver whether to
perform pinning or mapping operations.
* Pinning requires access to the [bus transaction initiator
(BTI)](/docs/reference/kernel_objects/bus_transaction_initiator.md) handle
provided by the platform-bus or pci drivers. Passing a BTI handle up the
driver stack is an anti-pattern.
* In the case where mapping is necessary, this means that raw buffers are passed over
FIDL. This is an anti-pattern as it may no longer be possible without a copy
in future iterations of in-process inter-driver communication.
* In either case, if the operation is asynchronous (which most are), then the
core driver becomes responsible for ensuring that it doesn't unpin/unmap the
VMO while it's still in use. This is particularly problematic in situations
such as shutdown and suspend, which aren't as well tested.
* In cases such as the block stack, the core driver is bound multiple times
recursively in the same driver host. The core driver would need to be aware of
whether it is bound directly to the driver which talks to hardware or to a
filter layer.