blob: 225520b6eb59d3f1d5d24e2639ced2f51e90951a [file] [log] [blame] [view] [edit]
<!-- mdformat off(templates not supported) -->
{% set rfcid = "RFC-0143" %}
{% include "docs/contribute/governance/rfcs/_common/_rfc_header.md" %}
# {{ rfc.name }}: {{ rfc.title }}
<!-- SET the `rfcid` VAR ABOVE. DO NOT EDIT ANYTHING ELSE ABOVE THIS LINE. -->
<!-- mdformat on -->
<!-- This should begin with an H2 element (for example, ## Summary).-->
## Summary
This document proposes changes to the kernel ABI to support tagged userspace
pointers.
## Motivation
[Top-Byte-Ignore (TBI)][tbi] is a feature on all ARMv8.0 CPUs that causes the
top byte of virtual addresses to be ignored on loads and stores. Instead, bit
55 is extended over bits 56-63 before address translation. This feature allows
use of the (ignored) top byte as a tag or for other in-band
metadata. One of the immediate uses of TBI is enabling [Hardware-assisted
AddressSanitizer (HWASan)][hwasan] in userspace, where tags are stored in the
top byte for memory tracking.
This document describes how the kernel should handle tagged user pointers.
While TBI and HWASan are the most relevant use cases of tagged pointers, this
design is not meant to solely cover them. There are other platforms with their
own hardware features similar that support tagged pointers, and other userspace
programs that can use these tag bits for their own specific use cases. This
design should be generic enough to support other implementations of tagged
pointers without any specific focus on one user application.
## Terminology
These are some terms that will be used frequently in this proposal. There are
_addresses_ and there are _pointers_. These are similar concepts but treated
very differently. Semantically, some syscalls operate on addresses while others
operate on pointers.
### Address
An address is a 64-bit integer that represents a location within
the bounds of a user address space. An address is never tagged. Syscalls that
manipulate an address space operate on addresses. The value of an address is
always constrained within the range of an address space (indicated by
`ZX_INFO_VMAR`).
### Pointer
A pointer is a programming language-specific concept that generally
indicates a location of dereferenceable memory. Each language defines the
implementation of a pointer and its translation into hardware. For C/C++, in the
context of HWASan, a pointer is a 64-bit integer that consists of tag bits and
address bits. Pointers can either be tagged (indicating non-zero tag bits) or
untagged (indicating tag bits of zero). Syscalls that access user memory
generally operate on pointers.
### Tag
A tag refers to the upper bits of a pointer, generally used for
metadata. On ARM with TBI enabled, a tag is 8 bits wide consisting of bits
56-63.
### TBI (Top-*Bits*-Ignore)
Other prospective hardware features such as [ARM MTE][mte] or [Intel LAM][lam]
also support a way of "ignoring" some portion of the upper bits of a pointer.
Unless specified, the term "TBI" used in this doc represents any generic hardware
feature that supports ignoring tags rather than exclusively ARM TBI.
## Design {#design}
Kernel code should replicate the behavior of the hardware. Tags
should be handled such that kernel behavior makes sense to users.
These are some examples of how the system should behave when TBI is enabled:
1. `zx_channel_write` can accept tagged pointers, and the call will behave the
same as if the pointers were untagged.
2. When a process takes a page fault on a tagged user pointer, the page fault
will be resolved as if the fault occurred on the same untagged pointer, with
one exception. If the fault generates a Zircon exception, the exception
report's fault pointer will contain the original tagged pointer to the degree
that the hardware preserves it.
3. For the purpose of a futex wake/wait resolution, any tag on the supplied
pointers will be ignored. In other words, waking a pointer that only differs
by the tag will still wake any waiters waiting on that pointer, regardless of
any tag they may have specified.
4. When reading memory from a process (like with `zx_process_read_memory`), the
kernel will accept an **address** as the argument for the location of the block
of memory being read. In conjunction to [software debugging](#debugging),
debuggers will need to explicitly translate debuggee pointer values to
addresses to read via kernel APIs.
### Tagged Pointer ABI: Tag-Insensitive, but Tag-Preserving {#taggedptrabi}
The following will hold when TBI is enabled:
1. __The kernel will ignore tags on user pointers received from syscalls.__ For
example, a `zx_channel_read` call with a buffer pointer containing a tag
would behave exactly the same as if the buffer pointer were untagged.
2. __It is an error to pass a tagged pointer on syscalls that accept
addresses (ie. `zx_vaddr_t`).__ For example, a virtual address passed to
`zx_process_read_memory` cannot be tagged. Using a tagged pointer where an
address is required will be treated like any other invalid address.
3. __When the kernel accepts a tagged pointer, whether through syscall or
fault, it will try to preserve the tag to the degree that user code may
later observe it.__
For example, if a user program faults on a tagged user pointer, then
the resulting Zircon exception report will include the tag if the hardware
can preserve it. The tag will be stripped if the hardware does not guarantee
the tag can be preserved. If there is no mechanism by which userspace may
observe the tag, the kernel is free to strip it provided it does not alter
system behavior. If the hardware only guarantees partial preservation of a
tag, then the kernel may only strip bits not guaranteed to be preserved.
4. __The kernel itself will never generate tagged pointers.__ For example, when
mapping a VMO (via `zx_vmar_map`), the resulting value selected by the kernel
will be a pure address with no tag.
5. __When comparing userspace pointers, the kernel will ignore any tags that may
be present.__ For example if a thread is waiting (via `zx_futex_wait`) on a
pointer with tag A, and another thread is waking (via `zx_futex_wake`) on a
pointer with the same address bits but tag B, then the waiter will be woken.
### ARM64 TBI enabled for Everything
TBI will be controlled by a kernel boot-option. When enabled, TBI will be on
for all userspace processes.
### Debugging Software {#debugging}
Debuggers will need to be TBI-aware. ARM TBI does not allow setting a tag on
debug registers. Debuggers will need to explicitly sign-extend the most
significant VA bit before setting debug registers.
## Implementation
A boot-option will control whether user address spaces have ARM TBI enabled.
ARM TBI can be enabled by setting the `TBI0` and `TBI1` bits in the
translation control register (EL1).
In addition to enabling/disabling TBI, we'll need to make sure existing
syscalls correctly handle pointers/addresses. There are only a few places where
the kernel handles user pointers (e.g. `user_ptr`) so the changes required to
implement this proposal are relatively small.
We can indicate to userspace the type of TBI running through new system
[features][features]. We can introduce a new feature kind
`ZX_FEATURE_KIND_ADDRESS_TAGGING` and this kind can support new feature bits
indicating the address tagging, like `ZX_ARM64_FEATURE_ADDRESS_TAGGING_TBI` for
ARM TBI.
## Performance
Performance impact should be negligible and existing microbenchmarks will be
used to verify.
## Testing
We will need to test for:
1. Checking syscalls that use pointers with different tags, and those tags are
effectively ignored.
2. Waking on a tagged vs untagged pointer (tag-insensitive behavior).
3. Faulting on a tagged pointer preserves the tag in the exception
(tag-preserving behavior).
4. Any behavior to make kernel TBI known to userspace, such as the presence
of a system feature or a query that returns the number of top bits
ignored.
5. Verifying tagged pointers are rejected by syscalls that expect addresses.
These tests need to be skipped if TBI is not supported.
## Documentation
All documentation for the Tagged Pointer ABI is documented in this RFC. Once
this has been implemented, we may need to update some Zircon documentation to
specify which arguments for which syscalls cannot accept tags.
Syscalls that guarantee some degree of tag preservation will need to be
documented to specify which bits are preserved and which can be stripped.
## Drawbacks, alternatives, and unknowns
### TBI Toggle Granularity
We have two levels at which we can control the scope of TBI: globally and
per-process. A per-process approach would involve some mechanism that allows
toggling TBI at either process creation time or start time. This would require
either a new syscall, argument, or bitflag that would need more testing and
potentially introduce new bugs or security issues that will take time to
discover. Having a process toggle could be [expensive][contextswitch].
A global switch is less complex, and helps avoid many of the "unknown unknowns"
with having to implement a runtime switch. It will also likely be safer if all
applications for the system were either TBI-aware or non-TBI-aware rather than
having a mixture of both.
### Stripping Tags in Usermode
This would involve stripping all tags in the syscall layer before they made it
into the kernel. This way, no kernel changes would be needed, and the kernel
could remain agnostic to tags. One issue with this involves fault handling on
userspace pointers. If a fault is generated on a tagged pointer, then it will
be up to each userspace handler to strip the tag.
### Support for Other Addressing Modes
This proposal should be flexible enough to account for other hardware features
that involve "ignoring" top bits. We don't plan to support these in the near
future, but we should be in a state where turning one on would require minimal
changes.
### ARM Memory Tagging Extension (MTE)
[MTE][mte] is a feature that works on top of TBI for finding bad memory
accesses. [Memory tagging][memtag] works by associating each allocation and
pointer with a specific tag value. A pointer with a tag different from the
allocation it's trying to access indicates a bad memory access because of a
tag mismatch. With MTE, this tag is a 4-bit value stored in the top byte of the
pointer.
Under this ABI, should MTE be enabled, the tag would refer to the top 8-bits
of a pointer, but only bits 56-59 would be preserved of faults since the
hardware only guarantees preservation of those 4 bits.
### Intel Linear Address Masking (LAM)
[LAM][lam] is an upcoming feature for x86 where either the top 7 or 16 bits in a
pointer are masked out on loads/stores. This is controlled globally by altering
the CR3 register. Unlike TBI, LAM will not preserve any of the tag bits on a
page fault.
## Prior art and references
### Tagged virtual addresses in AArch64 Linux
Much of the design for this proposal was inspired from the [Tagged Address ABI
from Linux][linuxtbi],
namely most kernel behavior should remain unaffected when accepting tagged
pointers. One major differences is that Linux supports toggling the ABI
per-thread whereas this proposal aims to toggle the ABI globally at build/boot
time. Additionally, ARM TBI is enabled all the time on Linux whereas ARM TBI is
also controlled by the same build option.
[tbi]: https://developer.arm.com/documentation/den0024/a/ch12s05s01
[hwasan]: https://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html
[mte]: https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/enhancing-memory-safety
[lam]: https://www.phoronix.com/scan.php?page=news_item&px=Intel-LAM-Glibc
[linuxtbi]: https://www.kernel.org/doc/html/latest/arch/arm64/tagged-address-abi.html
[contextswitch]: https://lore.kernel.org/linux-arm-kernel/20201124184742.GC42276@C02TF0J2HF1T.local/
[memtag]: https://arxiv.org/pdf/1802.09517.pdf
[features]: /reference/syscalls/system_get_features.md