futex - A primitive for creating userspace synchronization tools.
A futex is a Fast Userspace muTEX. It is a low level synchronization primitive that is a building block for higher level APIs such as
Futexes are designed to not enter the kernel or allocate kernel resources in the uncontested case.
The zircon futex implementation currently supports three operations distributed over 6 syscalls:
zx_status_t zx_futex_wait(const zx_futex_t* value_ptr, zx_futex_t current_value, zx_handle_t new_futex_owner, zx_time_t deadline); zx_status_t zx_futex_wake(const zx_futex_t* value_ptr, uint32_t wake_count); zx_status_t zx_futex_wake_single_owner(const zx_futex_t* value_ptr); zx_status_t zx_futex_requeue(const zx_futex_t* value_ptr, uint32_t wake_count, zx_futex_t current_value, const zx_futex_t* requeue_ptr, uint32_t requeue_count, zx_handle_t new_requeue_owner); zx_status_t zx_futex_requeue_single_owner(const zx_futex_t* value_ptr, zx_futex_t current_value, const zx_futex_t* requeue_ptr, uint32_t requeue_count, zx_handle_t new_requeue_owner); zx_status_t zx_futex_get_owner(const zx_futex_t* value_ptr, uint64_t* koid);
All of these share a
value_ptr parameter, which is the virtual address of an aligned userspace integer. This virtual address is the information used in kernel to track what futex given threads are waiting on. The kernel does not currently modify the value of
*value_ptr (but see below for future operations that might do so). It is up to userspace code to correctly atomically modify this value across threads in order to build mutexes and so on.
Note that with address tagging, userspace pointers won't always have a 1-to-1 mapping of futex instances in the kernel. Addresses which have been stripped of architecture-specific tagging information are used for futex IDs. For example, on ARM where Top-Byte-Ignore (TBI) is enabled, a futex pointer with the value
0x0A000000FF123450 has the same futex ID as a futex pointer with the value
0x0B000000FF123450, because while their tags (
0x0B) are different, their address bits are the same.
Futex objects do not have any rights associated with them.
There are only 2 primitive operations that userspace code can perform on a futex: waiting and waking (requeue is a combination of the two). Because futexes are strictly a process local concept, revoking access to either of these operations would make the futex functionally worthless.
Additionally, from the kernel's perspective, futexes are ephemeral objects whose state only exists while the futex has waiters. Without a more durable state present in the kernel, it is more or less impossible to have a persisted concept of rights for a futex.
Note that all of the zircon futex operations key off of the virtual address of an userspace pointer. This differs from the Linux implementation, which distinguishes private futex operations (which correspond to our in-process-only ones) from ones shared across address spaces.
As noted above, all of our futex operations leave the value of the futex unmodified from the kernel. Other potential operations, such as Linux's
FUTEX_WAKE_OP, requires atomic manipulation of the value from the kernel, which our current implementation does not require.
Some runtimes may need to implement synchronization primitives based on futexes which exhibit priority inheritance behavior. In order to support these users, zircon futexes have a concept of ‘ownership’ which can be used to implement such primitives. Use of this feature is optional.
At any point in time, a futex may be either unowned, or owned by a single thread. When a thread owns one or more futexes, its effective priority becomes the maximum of its base priority, and the priorities of all of the current waiters of all of the futexes currently owned by it. As soon a thread no longer owns a futex, the pressure of the priorities of the futex's waiters disappears from the relationship above. Once the thread no longer owns any futexes, its priority will relax back to its base priority.
Signaling of the owner of a futex is the responsibility of the userspace code, as is applying the ownership concept properly when constructing a specific type of synchronization object that needs priority inheritance behavior.
Zircon futexes have at most a single owner. Multiple ownership of futexes for the purpose of priority inheritance is not supported. The owner of a futex may never simultaneously be a waiter for the same futex.
Ownership of a futex is assigned via each ‘wait’ or ‘requeue’ operation. In the case of a requeue operation, the target futex is the requeue futex, not the wake_futex. Users pass a handle to a thread indicating who the current owner of the futex should be, or ZX_HANDLE_INVALID if there should be no owner.
Ownership of a futex may be transferred by the kernel on behalf of the user during a wake operation or a requeue operation. In the case of a requeue operation, the target of the transfer is the wake_futex, not the requeue_futex. Ownership transfer only takes place when using the
zx_futex_requeue_single_owner() variants of the wake/requeue operations. The
single_owner variants of these operations will release exactly one waiter, and assign ownership of the futex to the released thread.
Fuss, Futexes and Furwocks: Fast Userlevel Locking in Linux, Hubertus Franke and Rusty Russell
This is the original white paper describing the Linux futex. It documents the history and design of the original implementation, prior (failed) attempts at creating a fast userspace synchronization primitive, and performance measurements.
Futexes Are Tricky, Ulrich Drepper
This paper describes some gotchas and implementation details of futexes in Linux. It discusses the kernel implementation, and goes into more detail about correct and efficient userspace implementations of mutexes, condition variables, and so on.
Further commentary on “Futexes are tricky”, outlining a simple implementation that avoids the need for
Locking in WebKit, Filip Pizlo
An in-depth tour of the locking primitives in WebKit, complete with benchmarks and analysis. Contains a detailed explanation of the “parking lot” concept, which allows very compact representation of userspace mutexes.