blob: a0b420746533ab997235339a5b15ae6218c24f02 [file] [log] [blame] [view] [edit]
# How to implement a user pager
## Introduction
Fuchsia filesystems are implemented in userspace so Fuchsia exposes a way to implement pager-backed
memory in userspace. As client programs access memory, the kernel will send requests to the
associated pager to populate the corresponding memory.
Non-filesystem programs can also implement a pager. With the addition of another thread, any
userspace program can implement paging functionality for a memory region. This can enable
significant flexibility for dynamically-populated data regions.
Most filesystems will use the [vfs](/src/storage/lib/vfs) library which implements these concepts
and exposes them to the filesystem in an easier-to-use interface. Using the "vfs" library is
recommended when possible to implement filesystems.
This document describes the low-level implementation details of a writing a pager from scratch.
## Requesting a pager-backed VMO
Paged [Virtual Memory Objects](/docs/concepts/kernel/concepts.md) (VMOs) can be created and
transferred in any way the implementation desires. Most mappings are created by filesystems which
map memory in response to `GetBackingMemory` requests on the
[fuchsia.io/File](/docs/reference/fidl/fuchsia.io) interface. (Most client programs will make this
IPC request indirectly through the Posix `mmap` interface.)
## Creating a pager-backed VMO
For general setup:
* Create a [pager](/docs/reference/kernel_objects/pager) kernel object with
[zx\_pager\_create](/docs/reference/syscalls/pager_create) (or `zx::pager::create` in C++). This
object will be used to create individual VMOs. An example of this setup is in the
[paged_vfs.cc](/src/storage/lib/vfs/cpp/pager-backed.cc) implementation.
* Create a thread or pool of threads to respond to paging requests. You can use the
[async loop](/sdk/lib/async/include/lib/async/cpp/paged_vmo.h), but if all your
thread does is respond to paging requests, it can be simpler to create a port and wait on it
manually. An example is in
[pager\_thread\_pool.cc](/src/storage/lib/vfs/cpp/pager_thread_pool.cc) implementation.
To create a pager-backed vmo, call
[zx\_pager\_create\_vmo](/docs/reference/syscalls/pager_create_vmo) and supply the size, the port
that you will use to wait on page requests, and a unique ID for your code to associate requests with
(these are not used by the kernel; they will come out as the `key` in the `zx_port_packet_t` that
you read from the port).
This pager-backed vmo should never be directly written or read to by the code backing the pager.
Doing so will cause the kernel to try to page in the data which will re-enter the pager. This
vmo will instead be populated on demand using a special API described below.
At this time, the pager implementation should also register a watcher for "no clones" of the
pager-backed VMO (see "Freeing the paged VMO" below).
### Vending a pager-backed VMO
A pager would create one pager-backed VMO for each file (or equivalent concept), but typically
there can be multiple consumers of the same file. To implement this, the pager would keep a single
pager-backed VMO for each file and send clones of this VMO to each consumer. Using clones is also
important to know when the clients are done with the mappings (see "Freeing the pager-backed VMO"
below).
```c++
zx::vmo clone;
paged_vmo.create_child(ZX_VMO_CHILD_SNAPSHOT_AT_LEAST_ON_WRITE, 0, size, &clone);
```
If the paging data is executable code, it must be marked with the executable permission. Such
permission is not generally available to user programs. Filesystems that need to vend executable
pages will be passed a "vmex" [resource](/docs/reference/kernel_objects/resource). The clone is
marked executable by calling
[zx\_replace\_as\_executable](/docs/reference/syscalls/vmo_replace_as_executable) to replace the
clone's handle with one that has the executable permission.
## Responding to paging requests
Page requests for a pager-backed VMO are delivered on the port associated with the
`zx_pager_create_vmo()` call that created it. They will come in with a packet type of
`ZX_PKT_TYPE_PAGE_REQUEST` and the `key` will be the unique ID supplied at creation time. The
pager would use the ID to lookup the information required for the object. An example is in the
[pager\_thread\_pool.cc](/src/storage/lib/vfs/cpp/pager_thread_pool.cc) implementation.
The pager responds to the request by either populating the requested range of the VMO or by marking
it as failed. The pager must always report the entire requested range as either populated or failed
or the thread that triggered the page request will hang forever waiting on the page fault to
complete.
### Reporting errors
Pager errors are reported by calling [zx\_pager\_op\_range](/docs/reference/syscalls/pager_op_range)
with `ZX_PAGER_OP_FAIL`. It is passed the pager-backed VMO handle, the offset and length originally
requested by the kernel, and an error value. Importantly, the error value must be one of several
known values (see the syscall documentation).
### Supplying pager-backed data
To supply data, the pager creates an "aux VMO" that holds the data being transmitted to the kernel
for writing into the pager-backed vmo. This is a normal non-pager-backed VMO that is only used for the
unique communication of data from a pager to the kernel. The pager then calls
[zx\_pager\_supply\_pages](/docs/reference/syscalls/pager_supply_pages) with the aux vmo and the
data range requested by the kernel in the page request.
Importantly, [zx\_pager\_supply\_pages](/docs/reference/syscalls/pager_supply_pages) enforces some
requirements of the aux vmo, including that it must not be mapped during the call. You can populate
it by mapping, writing to it, and then unmapping, or using `zx_vmo_write`. You can re-use the same
aux vmo for every page request, or create a new one each time. The unique requirements of the aux
vmo allow the pages of the aux vmo to be spliced into the pager-backed VMO without copying.
Attempting to write directly into the pager-backed VMO, or even indirectly with `zx_vmo_write()` will cause
pager requests and will reenter the pager. The use of the aux vmo and the special
`zx_pager_supply_pages()` function avoid this problem.
## Freeing the pager-backed VMO
When there are no mapped clones of the pager-backed vmo, it can be deleted. A pager implementation
watches for the "no clones" notification of the main pager-backed VMO to know when this happens. For
an example, see `PagedVnode::WatchForZeroVmoClones()` in [paged\_vnode.cc](/src/storage/lib/vfs/cpp/paged_vnode.cc).
One thing to keep in mind is that the kernel will queue this message for delivery. In the meantime,
it might be possible for the pager to create a new clone of the VMO. As a result, the message
handler should verify there are actually no clones before doing any cleanup. An example can be seen
in the `PagedVnode::OnNoPagedVmoClonesMessage()` in
[paged\_vnode.cc](/src/storage/lib/vfs/cpp/paged_vnode.cc).
### Freeing a pager-backed VMO when there are still clones
In some implementations, it might be possible for the file (or other backing data) to be destroyed
before the clients of the pager-backed data. This can result in page requests getting delivered to
the port for objects that are destroyed. Such implementations should be sure to handle this case
(possibly by validating the unique pager ID and silently returning if there is no corresponding
pager).
Although not always necessary, it is also possible to cleanly detach such that the kernel guarantees
that no more pager requests will be forthcoming. To avoid races with already-queued page requests,
this operation is a two-step process.
* The pager calls [zx\_pager\_detach\_vmo](/docs/reference/syscalls/pager_detach_vmo).
* The kernel will enqueue a port packet with the type `ZX_PAGER_VMO_COMPLETE` for that object.
After this message is received, no more messages will be received for that VMO.