| # How to implement a user pager |
| |
| ## Introduction |
| |
Fuchsia filesystems are implemented in userspace, so Fuchsia exposes a way to implement pager-backed
| memory in userspace. As client programs access memory, the kernel will send requests to the |
| associated pager to populate the corresponding memory. |
| |
| Non-filesystem programs can also implement a pager. With the addition of another thread, any |
| userspace program can implement paging functionality for a memory region. This can enable |
significant flexibility for dynamically populated data regions.
| |
Most filesystems will use the [vfs](/src/storage/lib/vfs) library, which implements these concepts
and exposes them to the filesystem through an easier-to-use interface. Using the vfs library to
implement filesystems is recommended when possible.
| |
This document describes the low-level implementation details of writing a pager from scratch.
| |
| ## Requesting a pager-backed VMO |
| |
| Paged [Virtual Memory Objects](/docs/concepts/kernel/concepts.md) (VMOs) can be created and |
| transferred in any way the implementation desires. Most mappings are created by filesystems which |
| map memory in response to `GetBackingMemory` requests on the |
| [fuchsia.io/File](/docs/reference/fidl/fuchsia.io) interface. (Most client programs will make this |
IPC request indirectly through the POSIX `mmap` interface.)
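
For illustration, the client side might look like the following sketch. The helper name is
hypothetical and error handling is minimal; on Fuchsia, fdio implements `mmap` by sending the
`GetBackingMemory` request and mapping the returned pager-backed VMO.

```c++
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

#include <cstddef>

// Hypothetical helper: maps the first `size` bytes of a file read-only.
void* MapFileReadOnly(const char* path, size_t size) {
  int fd = open(path, O_RDONLY);
  if (fd < 0) return nullptr;
  // fdio turns this mmap() into a fuchsia.io/File.GetBackingMemory request.
  // Touching the resulting mapping can trigger page requests to the
  // filesystem's pager.
  void* data = mmap(nullptr, size, PROT_READ, MAP_PRIVATE, fd, 0);
  close(fd);  // The mapping remains valid after the fd is closed.
  return data == MAP_FAILED ? nullptr : data;
}
```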
| |
| ## Creating a pager-backed VMO |
| |
| For general setup: |
| |
| * Create a [pager](/docs/reference/kernel_objects/pager) kernel object with |
| [zx\_pager\_create](/docs/reference/syscalls/pager_create) (or `zx::pager::create` in C++). This |
| object will be used to create individual VMOs. An example of this setup is in the |
| [paged_vfs.cc](/src/storage/lib/vfs/cpp/pager-backed.cc) implementation. |
| |
| * Create a thread or pool of threads to respond to paging requests. You can use the |
| [async loop](/sdk/lib/async/include/lib/async/cpp/paged_vmo.h), but if all your |
| thread does is respond to paging requests, it can be simpler to create a port and wait on it |
  manually. An example is in the
  [pager\_thread\_pool.cc](/src/storage/lib/vfs/cpp/pager_thread_pool.cc) implementation.
| |
To create a pager-backed VMO, call
[zx\_pager\_create\_vmo](/docs/reference/syscalls/pager_create_vmo) and supply the size, the port
that you will use to wait on page requests, and a unique ID for your code to associate requests with
(this ID is not used by the kernel; it will come out as the `key` in the `zx_port_packet_t` that you
read from the port).
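
Putting this together, a minimal setup might look like the following sketch (the function name and
the fixed key are illustrative; error handling is abbreviated):

```c++
#include <lib/zx/pager.h>
#include <lib/zx/port.h>
#include <lib/zx/vmo.h>

// Illustrative key; the kernel echoes it back in zx_port_packet_t::key.
constexpr uint64_t kFileKey = 1;

zx_status_t CreatePagedVmo(uint64_t size, zx::pager* pager, zx::port* port,
                           zx::vmo* vmo) {
  // One pager object can back many VMOs.
  zx_status_t status = zx::pager::create(0, pager);
  if (status != ZX_OK) return status;

  // Page requests for this pager's VMOs will be delivered to this port.
  status = zx::port::create(0, port);
  if (status != ZX_OK) return status;

  // The key is not interpreted by the kernel; it only identifies this VMO to
  // your own request-dispatch code.
  return pager->create_vmo(0, *port, kFileKey, size, vmo);
}
```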
| |
This pager-backed VMO should never be directly read from or written to by the code backing the
pager. Doing so will cause the kernel to try to page in the data, which will re-enter the pager. This
VMO is instead populated on demand using a special API described below.
| |
At this time, the pager implementation should also register a watcher for "no clones" of the
pager-backed VMO (see "Freeing the pager-backed VMO" below).
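
Registering that watcher could look like this sketch (`kNoClonesKey` is an illustrative key that
must not collide with the page-request keys):

```c++
#include <lib/zx/port.h>
#include <lib/zx/vmo.h>

// Illustrative key, distinct from the page-request key for this VMO.
constexpr uint64_t kNoClonesKey = 2;

zx_status_t WatchForNoClones(const zx::vmo& paged_vmo, const zx::port& port) {
  // Queues a ZX_PKT_TYPE_SIGNAL_ONE packet when the VMO has no children.
  // ZX_VMO_ZERO_CHILDREN is asserted whenever the VMO has no clones, so this
  // is typically registered once the first clone exists.
  return paged_vmo.wait_async(port, kNoClonesKey, ZX_VMO_ZERO_CHILDREN, 0);
}
```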
| |
| ### Vending a pager-backed VMO |
| |
A pager creates one pager-backed VMO for each file (or equivalent concept), but typically there
can be multiple consumers of the same file. To support this, the pager keeps a single pager-backed
VMO for each file and sends clones of this VMO to each consumer. Using clones also makes it
possible to know when the clients are done with the mappings (see "Freeing the pager-backed VMO"
below).
| |
```c++
zx::vmo clone;
// Each consumer gets its own copy-on-write clone of the pager-backed VMO.
zx_status_t status = paged_vmo.create_child(
    ZX_VMO_CHILD_SNAPSHOT_AT_LEAST_ON_WRITE, 0, size, &clone);
```
| |
If the paged data is executable code, it must be marked with the executable permission. Such
permission is not generally available to user programs. Filesystems that need to vend executable
pages will be passed a "vmex" [resource](/docs/reference/kernel_objects/resource). The clone is
marked executable by calling
[zx\_vmo\_replace\_as\_executable](/docs/reference/syscalls/vmo_replace_as_executable) to replace the
clone's handle with one that has the executable permission.
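
A sketch of that replacement, assuming `vmex_resource` holds the vmex resource handle:

```c++
#include <lib/zx/resource.h>
#include <lib/zx/vmo.h>

// Consumes `clone`'s handle and returns one that also has ZX_RIGHT_EXECUTE.
zx_status_t MakeExecutable(zx::vmo clone, const zx::resource& vmex_resource,
                           zx::vmo* out) {
  return clone.replace_as_executable(vmex_resource, out);
}
```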
| |
| ## Responding to paging requests |
| |
| Page requests for a pager-backed VMO are delivered on the port associated with the |
| `zx_pager_create_vmo()` call that created it. They will come in with a packet type of |
| `ZX_PKT_TYPE_PAGE_REQUEST` and the `key` will be the unique ID supplied at creation time. The |
pager would use the ID to look up the information required for the object. An example is in the
| [pager\_thread\_pool.cc](/src/storage/lib/vfs/cpp/pager_thread_pool.cc) implementation. |
| |
The pager responds to the request by either populating the requested range of the VMO or by marking
it as failed. The pager must always report the entire requested range as either populated or failed;
otherwise, the thread that triggered the page request will hang forever waiting on the page fault to
complete.
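
A minimal single-threaded request loop might look like the following sketch, where
`FulfillRequest` stands in for whatever lookup and supply-or-fail logic your pager needs:

```c++
#include <lib/zx/port.h>
#include <lib/zx/time.h>

#include <zircon/syscalls/port.h>

// Hypothetical dispatch: supplies or fails the range for the VMO with `key`.
void FulfillRequest(uint64_t key, uint64_t offset, uint64_t length);

void PagerLoop(const zx::port& port) {
  for (;;) {
    zx_port_packet_t packet;
    if (port.wait(zx::time::infinite(), &packet) != ZX_OK) break;
    if (packet.type != ZX_PKT_TYPE_PAGE_REQUEST) continue;
    if (packet.page_request.command == ZX_PAGER_VMO_READ) {
      // The whole range must eventually be supplied or failed, or the
      // faulting thread will hang.
      FulfillRequest(packet.key, packet.page_request.offset,
                     packet.page_request.length);
    }
  }
}
```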
| |
| ### Reporting errors |
| |
| Pager errors are reported by calling [zx\_pager\_op\_range](/docs/reference/syscalls/pager_op_range) |
| with `ZX_PAGER_OP_FAIL`. It is passed the pager-backed VMO handle, the offset and length originally |
| requested by the kernel, and an error value. Importantly, the error value must be one of several |
| known values (see the syscall documentation). |
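
For example, failing a request with an I/O error might look like this sketch:

```c++
#include <lib/zx/pager.h>
#include <lib/zx/vmo.h>

// Reports that the range from a page request cannot be populated.
zx_status_t FailRequest(const zx::pager& pager, const zx::vmo& paged_vmo,
                        uint64_t offset, uint64_t length) {
  // The error must be one of the values permitted by zx_pager_op_range().
  return pager.op_range(ZX_PAGER_OP_FAIL, paged_vmo, offset, length,
                        static_cast<uint64_t>(ZX_ERR_IO));
}
```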
| |
| ### Supplying pager-backed data |
| |
To supply data, the pager creates an "aux VMO" that holds the data being transmitted to the kernel
for writing into the pager-backed VMO. This is a normal non-pager-backed VMO used only to
communicate data from the pager to the kernel. The pager then calls
[zx\_pager\_supply\_pages](/docs/reference/syscalls/pager_supply_pages) with the aux VMO and the
data range requested by the kernel in the page request.
| |
Importantly, [zx\_pager\_supply\_pages](/docs/reference/syscalls/pager_supply_pages) enforces some
requirements on the aux VMO, including that it must not be mapped during the call. You can populate
it by mapping it, writing to it, and then unmapping it, or by using `zx_vmo_write()`. You can reuse
the same aux VMO for every page request, or create a new one each time. The unique requirements on
the aux VMO allow its pages to be spliced into the pager-backed VMO without copying.
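
Supplying a range might then look like the following sketch, which populates a freshly created aux
VMO with `zx_vmo_write()` (`ReadFileData` is a hypothetical stand-in for your data source):

```c++
#include <lib/zx/pager.h>
#include <lib/zx/vmo.h>

#include <cstdint>
#include <vector>

// Hypothetical data source: fills `buffer` with `length` bytes at `offset`.
void ReadFileData(uint64_t offset, uint64_t length, void* buffer);

zx_status_t SupplyRange(const zx::pager& pager, const zx::vmo& paged_vmo,
                        uint64_t offset, uint64_t length) {
  // The aux VMO is a plain VMO; it must not be mapped when supply_pages runs.
  zx::vmo aux;
  zx_status_t status = zx::vmo::create(length, 0, &aux);
  if (status != ZX_OK) return status;

  std::vector<uint8_t> buffer(length);
  ReadFileData(offset, length, buffer.data());
  status = aux.write(buffer.data(), 0, length);
  if (status != ZX_OK) return status;

  // Splices the aux VMO's pages into the pager-backed VMO without copying.
  return pager.supply_pages(paged_vmo, offset, length, aux, 0);
}
```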
| |
Attempting to write directly into the pager-backed VMO, or even indirectly with `zx_vmo_write()`,
will cause pager requests and will re-enter the pager. The use of the aux VMO and the special
`zx_pager_supply_pages()` function avoids this problem.
| |
| ## Freeing the pager-backed VMO |
| |
When there are no remaining clones of the pager-backed VMO, it can be deleted. A pager
implementation watches for the "no clones" notification on the main pager-backed VMO to know when
this happens. For
| an example, see `PagedVnode::WatchForZeroVmoClones()` in [paged\_vnode.cc](/src/storage/lib/vfs/cpp/paged_vnode.cc). |
| |
| One thing to keep in mind is that the kernel will queue this message for delivery. In the meantime, |
| it might be possible for the pager to create a new clone of the VMO. As a result, the message |
handler should verify there are actually no clones before doing any cleanup. An example can be seen
in `PagedVnode::OnNoPagedVmoClonesMessage()` in
| [paged\_vnode.cc](/src/storage/lib/vfs/cpp/paged_vnode.cc). |
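
One way to perform that re-check is to ask the kernel for the VMO's current child count, as in
this sketch (libraries like vfs instead track clone counts themselves):

```c++
#include <lib/zx/vmo.h>

#include <zircon/syscalls/object.h>

// The "no clones" packet can be stale if a new clone was created after it was
// queued, so re-check the current child count before cleaning up.
bool HasNoClones(const zx::vmo& paged_vmo) {
  zx_info_vmo_t info;
  zx_status_t status =
      paged_vmo.get_info(ZX_INFO_VMO, &info, sizeof(info), nullptr, nullptr);
  return status == ZX_OK && info.num_children == 0;
}
```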
| |
| ### Freeing a pager-backed VMO when there are still clones |
| |
| In some implementations, it might be possible for the file (or other backing data) to be destroyed |
| before the clients of the pager-backed data. This can result in page requests getting delivered to |
| the port for objects that are destroyed. Such implementations should be sure to handle this case |
| (possibly by validating the unique pager ID and silently returning if there is no corresponding |
| pager). |
| |
Although not always necessary, it is also possible to cleanly detach such that the kernel guarantees
that no more pager requests will be forthcoming. To avoid races with already-queued page requests,
this operation is a two-step process, sketched below.
| |
| * The pager calls [zx\_pager\_detach\_vmo](/docs/reference/syscalls/pager_detach_vmo). |
| * The kernel will enqueue a port packet with the type `ZX_PAGER_VMO_COMPLETE` for that object. |
| After this message is received, no more messages will be received for that VMO. |
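
A sketch of the detach step:

```c++
#include <lib/zx/pager.h>
#include <lib/zx/vmo.h>

// After this call, the kernel generates no new page requests for the VMO.
// Already-queued requests may still arrive, followed by a final packet whose
// page_request.command is ZX_PAGER_VMO_COMPLETE.
zx_status_t DetachVmo(const zx::pager& pager, const zx::vmo& paged_vmo) {
  return pager.detach_vmo(paged_vmo);
}
```

In the request loop sketched earlier, that completion packet is the signal that the per-VMO state
for its key can safely be destroyed.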