Field | Value |
---|---|
Status | Accepted |
Author(s) | steveaustin@google.com |
Submitted | 2020-10-22 |
Reviewed | 2020-11-04 |
Issue | fxbug.dev/62553 fxbug.dev/45709 |
Waiting for signals to be asserted on an object is usually level-triggered and a check is done at the start of zx_object_wait_async in case the signal is already active, in which case a packet is immediately sent to the port. This RFC concerns adding an option to zx_object_wait_async, ZX_WAIT_ASYNC_EDGE, which does not perform that initial check and thus will only produce a packet when the signal transitions from inactive to active after the call.
It may be that zx_object_wait_async is called with ZX_WAIT_ASYNC_EDGE while the signals on the object are already active. In this case, a packet will be queued on the port only after the signal on the object becomes inactive and is then asserted again. In fact, this is how ZX_WAIT_ASYNC_EDGE is commonly used.
The epoll polling mechanism in Linux can function in two modes: level-triggered and edge-triggered. Fuchsia's waiting features, particularly zx_object_wait_async and zx_port_wait, already make level-triggered polling possible. However, edge-triggered polling requires the ability to wait on a signal that is already active on an object but is expected (through I/O) to become inactive and subsequently active again, queuing a packet on the port on this subsequent signal transition. This is the intent of ZX_WAIT_ASYNC_EDGE.
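The difference between the two epoll modes can be observed directly on Linux. The following is a minimal sketch, not part of the RFC: the helper name count_ready_reports is hypothetical, and it only uses the standard pipe and epoll calls. It leaves one unread byte in a pipe and counts how many of two consecutive zero-timeout epoll_wait calls report the read end as ready.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: counts how many of two back-to-back, zero-timeout
 * epoll_wait calls report the read end of a pipe as ready while one unread
 * byte sits in it. Pass 0 for level-triggered or EPOLLET for edge-triggered.
 * Returns -1 if any setup step fails. */
int count_ready_reports(uint32_t extra_flags) {
    int fds[2];
    if (pipe(fds) != 0) return -1;

    int reports = -1;
    int ep = epoll_create1(0);
    if (ep >= 0) {
        struct epoll_event ev = { .events = EPOLLIN | extra_flags };
        ev.data.fd = fds[0];
        if (epoll_ctl(ep, EPOLL_CTL_ADD, fds[0], &ev) == 0 &&
            write(fds[1], "x", 1) == 1) {
            reports = 0;
            for (int i = 0; i < 2; i++) {
                struct epoll_event out;
                /* The byte is never read, so the fd stays readable. */
                if (epoll_wait(ep, &out, 1, 0) > 0) reports++;
            }
        }
        close(ep);
    }
    close(fds[0]);
    close(fds[1]);
    return reports;
}
```

In level-triggered mode both calls should report the descriptor, because it is still readable. With EPOLLET only the first call should, because no new transition has occurred since.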
Implementation of the ZX_WAIT_ASYNC_EDGE flag of zx_object_wait_async is fortunately quite simple.
At present, if any signal in the set is already active, the observer's OnMatch method is called directly without any further action. Otherwise, if none of the signals in the set is active, the set is added to the interest list of the DispatchObject via the supplied SignalObserver.
The proposal is that, if the ZX_WAIT_ASYNC_EDGE flag is specified, the initial check is omitted and the signal set is added to the interest list of the DispatchObject regardless of the initial signal state. In this mode of operation, one of the signals must transition from inactive to active for a packet to be queued on the supplied port (possibly requiring a signal to become inactive first).
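The behaviour described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: none of the toy_ names or TOY_WAIT_EDGE are Zircon APIs, the port is reduced to a packet counter, and the observer is reduced to a one-shot interest mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model; none of these names are Zircon APIs. */
typedef struct {
    uint32_t active;    /* signals currently asserted on the object */
    uint32_t interest;  /* signals a one-shot observer waits for (0 = none) */
    int      packets;   /* packets queued on the modelled port */
} toy_object_t;

#define TOY_WAIT_EDGE 1u  /* stands in for the edge option */

/* Models the async wait: level-triggered unless TOY_WAIT_EDGE is set. */
static void toy_wait_async(toy_object_t *obj, uint32_t signals, uint32_t options) {
    bool edge = (options & TOY_WAIT_EDGE) != 0;
    if (!edge && (obj->active & signals) != 0) {
        obj->packets++;       /* already active: deliver immediately */
        return;
    }
    obj->interest = signals;  /* edge mode always lands here */
}

/* Models a signal change: clears some bits, sets others, then notifies. */
static void toy_signal(toy_object_t *obj, uint32_t clear, uint32_t set) {
    uint32_t before = obj->active;
    obj->active = (before & ~clear) | set;
    uint32_t newly_active = obj->active & ~before;
    if ((newly_active & obj->interest) != 0) {
        obj->packets++;
        obj->interest = 0;    /* one-shot */
    }
}

/* Packets queued right after waiting on a signal that is already active. */
int toy_packets_after_wait_on_active(uint32_t options) {
    toy_object_t obj = { .active = 0x1 };
    toy_wait_async(&obj, 0x1, options);
    return obj.packets;
}

/* Packets queued after the signal then goes inactive and active again. */
int toy_packets_after_retrigger(uint32_t options) {
    toy_object_t obj = { .active = 0x1 };
    toy_wait_async(&obj, 0x1, options);
    toy_signal(&obj, 0x1, 0);  /* signal becomes inactive */
    toy_signal(&obj, 0, 0x1);  /* signal is asserted again */
    return obj.packets;
}
```

In this model a level-triggered wait on an active signal queues a packet at once, while the edge variant queues nothing until the signal has dropped and been asserted again.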
The main use of this change is to enable edge-triggered polling with the EPOLLET flag in epoll. Waiting in Zircon differs from polling in epoll in that file descriptors are added to an epoll file descriptor using epoll_ctl/EPOLL_CTL_ADD and are continually monitored until removed with epoll_ctl/EPOLL_CTL_DEL. Zircon waiting, especially with zx_object_wait_async, is always one-shot, and the file object must be “re-armed” by calling zx_object_wait_async again after a signal has become active.
Because epoll use must operate by repeatedly calling epoll_wait (without necessarily calling epoll_ctl), this re-arming call to zx_object_wait_async must occur somewhere in epoll_wait.
For the default level-triggered polling, once zx_port_wait returns in epoll_wait with a signalled file object, we cannot call zx_object_wait_async before returning, because the signal on that object is active and would generate a duplicate packet on the port. Therefore, a list of active level-triggered file descriptors is maintained, and zx_object_wait_async is called on the file descriptors in this list on entering epoll_wait, prior to waiting in zx_port_wait.
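The reason for deferring the level-triggered re-arm can be sketched with a toy model. Everything here is hypothetical: the toy_ names are not Zircon or fdio APIs, and the port is reduced to a counter of queued packets.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: one file object and one port. */
typedef struct {
    bool readable;  /* the "readable" signal on the file object */
    bool armed;     /* a one-shot wait is registered */
    int  queued;    /* packets waiting on the port */
} toy_state_t;

/* Level-triggered one-shot wait: fires immediately if already readable. */
static void toy_arm_level(toy_state_t *s) {
    if (s->readable) {
        s->queued++;
    } else {
        s->armed = true;
    }
}

/* Stands in for the port wait dequeuing one packet. */
static void toy_dequeue(toy_state_t *s) {
    assert(s->queued > 0);
    s->queued--;
}

/* Re-arm before returning from the emulated epoll_wait: the object is still
 * readable, so a second packet is queued for data already reported. */
int toy_stale_packets_rearm_before_return(void) {
    toy_state_t s = { .readable = true };
    toy_arm_level(&s);
    toy_dequeue(&s);      /* emulated epoll_wait reports the fd */
    toy_arm_level(&s);    /* re-arm too early */
    s.readable = false;   /* caller now drains the fd */
    return s.queued;      /* a stale duplicate remains */
}

/* Re-arm on the next entry to the emulated epoll_wait, after the caller has
 * had a chance to read. */
int toy_stale_packets_rearm_on_entry(void) {
    toy_state_t s = { .readable = true };
    toy_arm_level(&s);
    toy_dequeue(&s);
    s.readable = false;   /* caller drains the fd */
    toy_arm_level(&s);    /* re-arm on the next entry */
    return s.queued;      /* nothing stale */
}
```

In the model, re-arming before returning leaves one stale packet on the port, whereas re-arming on the next entry leaves none.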
For edge-triggered polling, after epoll_wait returns, non-blocking I/O should be performed until EWOULDBLOCK is returned. At that point the signal on the file object will be inactive, and epoll_wait should be called again. However, if the signal on the file object becomes active between the I/O operation returning EWOULDBLOCK and epoll_wait being called, that event will be lost unless zx_object_wait_async has already been called. It follows that, in edge-triggered mode, the call to zx_object_wait_async to re-arm the file object must be made before epoll_wait returns. This is where ZX_WAIT_ASYNC_EDGE is necessary. zx_object_wait_async can be called with this flag between zx_port_wait returning and epoll_wait returning: although the signal is active at this point, ZX_WAIT_ASYNC_EDGE skips the check for active signals, so no packet is immediately queued on the port. This means that while I/O is performed until EWOULDBLOCK, the file object is already being monitored by zx_object_wait_async and there is no gap in coverage.
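From the caller's side, the edge-triggered pattern of waiting and then draining until EWOULDBLOCK looks roughly as follows on Linux. This is an illustrative sketch using standard POSIX and epoll calls; the function names drain_until_would_block and edge_triggered_cycles are hypothetical, and a pipe stands in for the file object.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Reads from a non-blocking fd until it would block; returns the number of
 * bytes read, or -1 on any other error. */
static long drain_until_would_block(int fd) {
    char buf[64];
    long total = 0;
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0) { total += n; continue; }
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) return total;
        if (n < 0 && errno == EINTR) continue;
        return n == 0 ? total : -1;  /* end of file or a real error */
    }
}

/* Runs two wait/drain cycles on a pipe registered with EPOLLET and returns
 * the total bytes drained, or -1 if any step behaves unexpectedly. */
long edge_triggered_cycles(void) {
    int fds[2];
    long total = -1;
    if (pipe(fds) != 0) return -1;
    int ep = epoll_create1(0);
    struct epoll_event ev = { .events = EPOLLIN | EPOLLET }, out;
    ev.data.fd = fds[0];
    if (ep >= 0 &&
        fcntl(fds[0], F_SETFL, O_NONBLOCK) == 0 &&
        epoll_ctl(ep, EPOLL_CTL_ADD, fds[0], &ev) == 0 &&
        write(fds[1], "hello", 5) == 5 &&
        epoll_wait(ep, &out, 1, 0) == 1) {
        long first = drain_until_would_block(fds[0]);  /* drain first batch */
        int idle = epoll_wait(ep, &out, 1, 0);         /* no new edge yet */
        if (first == 5 && idle == 0 &&
            write(fds[1], "abc", 3) == 3 &&
            epoll_wait(ep, &out, 1, 0) == 1) {         /* new edge */
            long second = drain_until_would_block(fds[0]);
            if (second == 3) total = first + second;
        }
    }
    if (ep >= 0) close(ep);
    close(fds[0]);
    close(fds[1]);
    return total;
}
```

The second epoll_wait returning no events after the drain is the property the Fuchsia emulation has to preserve, which is why the re-arm with ZX_WAIT_ASYNC_EDGE has to happen before the drain starts.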
The addition of the ZX_WAIT_ASYNC_EDGE option to zx_object_wait_async has already been implemented in [https://fuchsia-review.googlesource.com/c/fuchsia/+/438521] and its use in epoll has been implemented in [https://fuchsia-review.googlesource.com/c/fuchsia/+/438656].
The performance impact will be negligible, as only an extra method parameter and a check for that parameter are added to the existing code.
N/A
N/A
Additional unit tests have been added.
Documentation has been added to zx_object_wait_async in the implementing CL.
This appears to be the simplest way of implementing this feature and is analogous to how edge-triggered polling is implemented in other operating systems.
Care should be taken not to miss I/O events when using this flag. A signal may become active after performing I/O and before making a call to zx_object_wait_async, in which case the transition from unsignalled to signalled may be missed. In practice, the ZX_WAIT_ASYNC_EDGE flag is used immediately after zx_port_wait has returned, indicating that the signal is active on the object. In this way, after non-blocking I/O is performed on the object until the signal is inactive (usually until ZX_ERR_SHOULD_WAIT is returned), zx_port_wait can be called to wait until additional I/O is ready.
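The ordering hazard can be made concrete with a toy model. This is a sketch only: the toy_ names are hypothetical, not Zircon APIs, and a single boolean stands in for the signal on the object.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one signal, one edge-triggered wait, one port. */
typedef struct {
    bool active;  /* signal state on the object */
    bool armed;   /* an edge-triggered one-shot wait is registered */
    int  queued;  /* packets on the port */
} toy_edge_t;

/* Edge-triggered arm: never checks the current signal state. */
static void toy_arm_edge(toy_edge_t *s) { s->armed = true; }

/* Signal change; queues a packet only on an inactive-to-active transition
 * seen while armed. */
static void toy_set_signal(toy_edge_t *s, bool active) {
    if (active && !s->active && s->armed) {
        s->queued++;
        s->armed = false;  /* one-shot */
    }
    s->active = active;
}

/* Arm right after the port wait returns, then perform I/O. */
int toy_packets_arm_before_io(void) {
    toy_edge_t s = { .active = true };
    toy_arm_edge(&s);           /* signal still active: no packet yet */
    toy_set_signal(&s, false);  /* I/O drains the object */
    toy_set_signal(&s, true);   /* new data arrives */
    return s.queued;            /* transition observed */
}

/* Perform I/O first and arm afterwards. */
int toy_packets_arm_after_io(void) {
    toy_edge_t s = { .active = true };
    toy_set_signal(&s, false);  /* I/O drains the object */
    toy_set_signal(&s, true);   /* new data arrives before arming */
    toy_arm_edge(&s);
    return s.queued;            /* transition missed */
}
```

Arming before the I/O catches the later transition in this model, while arming afterwards misses data that arrived in between.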
Because of the pattern in edge-triggered epoll_wait with EPOLLET of (1) epoll_wait, (2) non-blocking I/O until the fd is not ready, (3) epoll_wait again, an alternative to ZX_WAIT_ASYNC_EDGE would be to perform a check in every I/O operation to see if the file descriptor has been added to an epoll file descriptor via epoll_ctl with EPOLLET (and has ceased to be ready) and, if so, re-arm a wait. This would require considerable modification to zxio and fdio for a somewhat rare use case.
The purpose of this change is to emulate the EPOLLET flag in Linux: