Zircon Kernel Concepts

Introduction

The kernel manages a number of different types of Objects. Those which are accessible directly via system calls are C++ classes which implement the Dispatcher interface. These are implemented in kernel/object. Many are self-contained higher-level Objects. Some wrap lower-level lk primitives.

System Calls

Userspace code interacts with kernel objects via system calls, and almost exclusively via Handles. In userspace, a Handle is represented as a 32-bit integer (type zx_handle_t). When a syscall is executed, the kernel checks that each Handle parameter refers to an actual handle that exists within the calling process's handle table. The kernel further checks that the Handle is of the correct type (passing a Thread Handle to a syscall requiring an Event Handle will result in an error), and that the Handle has the required Rights for the requested operation.
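The three checks described above (existence, type, Rights) can be sketched as a toy model. This is an illustration in Python, not kernel code: the HandleTable class, its lookup method, and the right bit values are hypothetical, and the status strings are only meant to echo the names of Zircon status codes.

```python
from dataclasses import dataclass

# Illustrative right bits; the values are assumptions made for this sketch.
RIGHT_READ = 1 << 0
RIGHT_WRITE = 1 << 1
RIGHT_SIGNAL = 1 << 2


@dataclass
class Handle:
    obj_type: str   # e.g. "event" or "thread"
    rights: int     # bitmask of RIGHT_* values


class HandleTable:
    """Toy per-process table mapping integer handle values to Handles."""

    def __init__(self):
        self._entries = {}
        self._next = 1

    def add(self, obj_type, rights):
        value = self._next
        self._next += 1
        self._entries[value] = Handle(obj_type, rights)
        return value

    def lookup(self, value, want_type, want_rights):
        """Return (status, handle) after the three checks a syscall performs."""
        handle = self._entries.get(value)
        if handle is None:
            return "ZX_ERR_BAD_HANDLE", None      # not in this process's table
        if handle.obj_type != want_type:
            return "ZX_ERR_WRONG_TYPE", None      # e.g. a thread passed as an event
        if (handle.rights & want_rights) != want_rights:
            return "ZX_ERR_ACCESS_DENIED", None   # a required right is missing
        return "ZX_OK", handle


table = HandleTable()
event = table.add("event", RIGHT_READ)
thread = table.add("thread", RIGHT_READ | RIGHT_WRITE)
```

A lookup only succeeds when all three checks pass, so the same handle value can be valid for one operation and rejected for another that needs a right it lacks.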

System calls fall into three broad categories, from an access standpoint:

  1. Calls which have no limitations and may be called by any thread. There are only a very few of these, for example zx_clock_get_monotonic() and zx_nanosleep().
  2. Calls which take a Handle as the first parameter, denoting the Object they act upon. These are the vast majority, for example zx_channel_write() and zx_port_queue().
  3. Calls which create new Objects but do not take a Handle, such as zx_event_create() and zx_channel_create(). Access to these (and limitations upon them) is controlled by the Job in which the calling Process is contained.

System calls are provided by libzircon.so, which is a “virtual” shared library that the Zircon kernel provides to userspace, better known as the virtual Dynamic Shared Object or vDSO. They are C ELF ABI functions of the form zx_noun_verb() or zx_noun_verb_direct-object().

The system calls are defined in a customized form of FIDL in //zircon/vdso. Those definitions are first processed by fidlc, and then by kazoo, which takes the IR from fidlc and outputs various formats that are used as glue in the vDSO, kernel, etc.

Handles and Rights

Objects may have multiple Handles (in one or more Processes) that refer to them.

For almost all Objects, when the last open Handle that refers to an Object is closed, the Object is either destroyed, or put into a final state that may not be undone.

Handles may be moved from one Process to another by writing them into a Channel (using zx_channel_write()), or by using zx_process_start() to pass a Handle as the argument of the first thread in a new Process.

The actions which may be taken on a Handle or the Object it refers to are governed by the Rights associated with that Handle. Two Handles that refer to the same Object may have different Rights.

The zx_handle_duplicate() system call may be used to obtain an additional Handle referring to the same Object as the Handle passed in, optionally with reduced Rights. The zx_handle_replace() system call likewise produces a new Handle with optionally reduced Rights, but invalidates the Handle passed in. The zx_handle_close() system call closes a Handle and, if that Handle is the last one for its Object, releases the Object. The zx_handle_close_many() system call similarly closes an array of Handles.
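The relationship between Handles, Rights, and Object lifetime can be sketched as a toy model. Everything here is hypothetical (the Process and KernelObject classes, the READ and WRITE bits, the use of PermissionError); it only illustrates that duplicates may drop Rights but not gain them, and that an Object goes away when its last Handle is closed.

```python
READ, WRITE = 1 << 0, 1 << 1   # illustrative right bits


class KernelObject:
    def __init__(self, name):
        self.name = name
        self.handle_count = 0
        self.destroyed = False


class Process:
    """Toy process holding a handle table: value -> (object, rights)."""

    def __init__(self):
        self.handles = {}
        self._next = 1

    def _install(self, obj, rights):
        value = self._next
        self._next += 1
        self.handles[value] = (obj, rights)
        obj.handle_count += 1
        return value

    def create(self, name, rights):
        return self._install(KernelObject(name), rights)

    def duplicate(self, value, rights):
        obj, have = self.handles[value]
        if rights & ~have:
            raise PermissionError("a duplicate cannot gain rights")
        return self._install(obj, rights)

    def replace(self, value, rights):
        new = self.duplicate(value, rights)
        self.close(value)            # the original handle value becomes invalid
        return new

    def close(self, value):
        obj, _ = self.handles.pop(value)
        obj.handle_count -= 1
        if obj.handle_count == 0:
            obj.destroyed = True     # last handle closed


proc = Process()
h1 = proc.create("vmo", READ | WRITE)
h2 = proc.duplicate(h1, READ)        # same object, fewer rights
obj = proc.handles[h1][0]
proc.close(h1)
alive_after_first_close = not obj.destroyed
proc.close(h2)
```

In this run the object survives the first close because h2 still refers to it, and is marked destroyed only after h2 is closed as well.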

Kernel Object IDs

Every object in the kernel has a “kernel object id” or “koid” for short. It is a 64-bit unsigned integer that can be used to identify the object and is unique for the lifetime of the running system. This means in particular that koids are never reused.

There are two special koid values:

ZX_KOID_INVALID: Has the value zero and is used as a “null” sentinel.

ZX_KOID_KERNEL: There is only one kernel, and it has its own koid.

Kernel-generated koids only use 63 bits (which is plenty). This leaves room for artificially allocated koids, which are distinguished by having the most significant bit set. The sequence in which kernel-generated koids are allocated is unspecified and subject to change.

Artificial koids exist to identify artificial objects, such as virtual threads in tracing, for consumption by tools. How artificial koids are allocated is left to each program; this document does not impose any rules or conventions.
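The split of the 64-bit koid space can be illustrated with a few lines of bit arithmetic. This is a sketch under stated assumptions: make_artificial() and next_kernel_koid() are hypothetical helpers, and the counter starting at 1000 is an arbitrary stand-in, since the real allocation order is unspecified.

```python
import itertools

ZX_KOID_INVALID = 0                 # the "null" sentinel described above
ARTIFICIAL_BIT = 1 << 63            # most significant bit of a 64-bit value
KOID_MASK = (1 << 64) - 1


def is_artificial(koid):
    """True if the most significant bit is set."""
    return bool(koid & ARTIFICIAL_BIT)


def make_artificial(local_id):
    """Build an artificial koid from a program-chosen 63-bit id (hypothetical)."""
    if not 0 <= local_id < ARTIFICIAL_BIT:
        raise ValueError("local id must fit in 63 bits")
    return ARTIFICIAL_BIT | local_id


# Stand-in for the kernel's allocator: a counter that never hands out the
# same value twice. The starting value is arbitrary.
_kernel_koids = itertools.count(1000)


def next_kernel_koid():
    return next(_kernel_koids)


k1 = next_kernel_koid()
k2 = next_kernel_koid()
virtual_thread = make_artificial(7)
```

Because kernel-generated values never have the top bit set, a tool can tell the two kinds apart with a single mask test and no lookup.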

Running Code: Jobs, Processes, and Threads

Threads represent threads of execution (CPU registers, stack, etc) within an address space which is owned by the Process in which they exist. Processes are owned by Jobs, which define various resource limitations. Jobs are owned by parent Jobs, all the way up to the Root Job which was created by the kernel at boot and passed to userboot, the first userspace Process to begin execution.

Without a Job Handle, it is not possible for a Thread within a Process to create another Process or another Job.
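The ownership chain can be pictured as a small tree. In this toy sketch (the Job and ToyProcess classes and their method names are hypothetical), creating a process is only possible through a Job, which mirrors the rule that a Job Handle is required.

```python
class Job:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent        # None only for the root job
        self.processes = []

    def create_child_job(self, name):
        return Job(name, parent=self)

    def create_process(self, name):
        proc = ToyProcess(name, job=self)
        self.processes.append(proc)
        return proc


class ToyProcess:
    def __init__(self, name, job):
        self.name = name
        self.job = job

    def job_chain(self):
        """Names of the owning jobs, from the immediate job up to the root."""
        names = []
        job = self.job
        while job is not None:
            names.append(job.name)
            job = job.parent
        return names


root = Job("root")
services = root.create_child_job("services")
logger = services.create_process("logger")
chain = logger.job_chain()          # ["services", "root"]
```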

Program loading is provided by userspace facilities and protocols above the kernel layer.

See: zx_process_create(), zx_process_start(), zx_thread_create(), and zx_thread_start().

Message Passing: Sockets and Channels

Both Sockets and Channels are IPC Objects which are bi-directional and two-ended. Creating a Socket or a Channel will return two Handles, one referring to each endpoint of the Object.

Sockets are stream-oriented and data may be written into or read out of them in units of one or more bytes. Short writes (if the Socket's buffers are full) and short reads (if more data is requested than is in the buffers) are possible.
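Short writes and short reads can be shown with a toy byte stream. This is a sketch, not the Zircon implementation: ToySocketEnd, socket_create(), and the 8-byte capacity are hypothetical, and a completely full buffer simply yields a zero-length write here, which is a simplification of how the real call reports that case.

```python
class ToySocketEnd:
    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = bytearray()   # bytes waiting to be read at this end
        self.peer = None

    def write(self, data):
        """Queue as much of data as fits in the peer's buffer; return the count."""
        room = self.peer.capacity - len(self.peer.buffer)
        accepted = data[:room]
        self.peer.buffer += accepted
        return len(accepted)

    def read(self, max_bytes):
        """Return up to max_bytes; fewer if the buffer holds less."""
        out = bytes(self.buffer[:max_bytes])
        del self.buffer[:max_bytes]
        return out


def socket_create(capacity=8):
    a, b = ToySocketEnd(capacity), ToySocketEnd(capacity)
    a.peer, b.peer = b, a
    return a, b


a, b = socket_create(capacity=8)
written = a.write(b"hello world")   # short write: only 8 of 11 bytes fit
first = b.read(5)                   # b"hello"
rest = b.read(100)                  # short read: only 3 bytes remain
```

A caller therefore has to check the returned count and retry with the unwritten remainder, as with any stream interface.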

Channels are datagram-oriented and have a maximum message size given by ZX_CHANNEL_MAX_MSG_BYTES, and may also have up to ZX_CHANNEL_MAX_MSG_HANDLES Handles attached to a message. They do not support short reads or writes -- either a message fits or it does not.

When Handles are written into a Channel, they are removed from the sending Process. When a message with Handles is read from a Channel, the Handles are added to the receiving Process. Between these two events, the Handles continue to exist (ensuring the Objects they refer to continue to exist), unless the end of the Channel which they have been written towards is closed -- at which point messages in flight to that endpoint are discarded and any Handles they contained are closed.
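The movement of Handles through a Channel can be sketched as follows. This is a toy model: the class and function names are hypothetical, handle tables are plain Python lists, and the two size limits are small stand-ins rather than the real values of ZX_CHANNEL_MAX_MSG_BYTES and ZX_CHANNEL_MAX_MSG_HANDLES. Error handling is deliberately simplified and does not reflect what zx_channel_write() does with handles on failure.

```python
from collections import deque

MAX_MSG_BYTES = 16      # stand-in for ZX_CHANNEL_MAX_MSG_BYTES
MAX_MSG_HANDLES = 2     # stand-in for ZX_CHANNEL_MAX_MSG_HANDLES


class ToyChannelEnd:
    def __init__(self):
        self.queue = deque()    # whole messages waiting to be read at this end
        self.peer = None
        self.closed = False

    def write(self, sender_handles, data, handles=()):
        """Send one whole message to the peer, moving handles out of the sender."""
        if len(data) > MAX_MSG_BYTES or len(handles) > MAX_MSG_HANDLES:
            raise ValueError("message does not fit; there are no short writes")
        if self.peer.closed:
            raise BrokenPipeError("peer closed")
        for h in handles:
            sender_handles.remove(h)        # handles leave the sending process
        self.peer.queue.append((bytes(data), list(handles)))

    def read(self, receiver_handles):
        data, handles = self.queue.popleft()
        receiver_handles.extend(handles)    # handles join the receiving process
        return data

    def close(self, closed_log):
        """Close this end: in-flight messages are dropped, their handles closed."""
        self.closed = True
        while self.queue:
            _, handles = self.queue.popleft()
            closed_log.extend(handles)


def channel_create():
    a, b = ToyChannelEnd(), ToyChannelEnd()
    a.peer, b.peer = b, a
    return a, b


sender, receiver = ["vmo"], []
a, b = channel_create()
a.write(sender, b"hi", handles=["vmo"])
owners_in_flight = len(sender) + len(receiver)   # 0: the handle is in the channel
msg = b.read(receiver)

c, d = channel_create()
owner, closed = ["event"], []
c.write(owner, b"x", handles=["event"])
d.close(closed)                                  # message never read
```

In the second scenario the handle ends up in the closed list rather than in any process, which corresponds to in-flight Handles being closed when their destination endpoint goes away.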

See: zx_channel_create(), zx_channel_read(), zx_channel_write(), zx_channel_call(), zx_socket_create(), zx_socket_read(), and zx_socket_write().

Objects and Signals

Objects may have up to 32 signals (represented by the zx_signals_t type and the ZX_SIGNAL defines), each of which represents a piece of information about the Object's current state. Channels and Sockets, for example, may be READABLE or WRITABLE. Processes or Threads may be TERMINATED. And so on.

Threads may wait for signals to become active on one or more Objects.

See signals for more information.

Waiting: Wait One, Wait Many, and Ports

A Thread may use zx_object_wait_one() to wait for a signal to be active on a single handle or zx_object_wait_many() to wait for signals on multiple handles. Both calls allow for a timeout after which they'll return even if no signals are pending.
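The behaviour of the two wait calls can be modelled with a condition variable. This is a sketch only: ToyObject, wait_one(), wait_many(), and the signal bit values are hypothetical, the timeout is a relative number of seconds, and a timeout is reported as None rather than as a status code.

```python
import threading
import time

SIGNAL_READABLE = 1 << 0    # illustrative bits, not the real constants
SIGNAL_WRITABLE = 1 << 1

_cond = threading.Condition()   # one shared condition keeps the sketch small


class ToyObject:
    def __init__(self):
        self.signals = 0

    def assert_signals(self, mask):
        with _cond:
            self.signals |= mask
            _cond.notify_all()


def wait_many(items, timeout):
    """items is a list of (object, wanted_mask) pairs. Block until any wanted
    signal is active, then return the observed mask for each item. Return None
    if the timeout expires first."""
    deadline = time.monotonic() + timeout
    with _cond:
        while not any(obj.signals & mask for obj, mask in items):
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                return None
            _cond.wait(remaining)
        return [obj.signals & mask for obj, mask in items]


def wait_one(obj, mask, timeout):
    result = wait_many([(obj, mask)], timeout)
    return None if result is None else result[0]


sock, chan = ToyObject(), ToyObject()
timed_out = wait_one(sock, SIGNAL_READABLE, timeout=0.01)

# Another thread makes the channel readable shortly after the wait begins.
threading.Timer(0.05, chan.assert_signals, args=[SIGNAL_READABLE]).start()
observed = wait_many(
    [(sock, SIGNAL_READABLE), (chan, SIGNAL_READABLE)], timeout=2.0
)
```

The first wait returns None because nothing ever signals the socket; the second returns as soon as either object has a wanted signal, reporting which one it was.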

Timeouts may deviate from the specified deadline according to timer slack. See timer slack for more information.

If a Thread is going to wait on a large set of handles, it is more efficient to use a Port. A Port is an Object to which other Objects may be bound; when signals are asserted on a bound Object, the Port receives a packet containing information about the pending Signals.
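The difference from waiting on each handle individually is that a Port turns signal changes into queued packets, so the waiter never has to scan the whole set. A toy sketch, with hypothetical names throughout (ToyPort, Waitable, bind, and the packet dictionary layout are not Zircon's) and a non-blocking wait for brevity:

```python
from collections import deque


class ToyPort:
    def __init__(self):
        self.packets = deque()

    def queue(self, packet):
        self.packets.append(packet)

    def wait(self):
        """Return the next packet, or None if none is pending (no blocking here)."""
        return self.packets.popleft() if self.packets else None


class Waitable:
    def __init__(self):
        self.signals = 0
        self.bindings = []      # (port, key, wanted_mask) registered on this object

    def bind(self, port, key, mask):
        self.bindings.append((port, key, mask))

    def assert_signals(self, mask):
        self.signals |= mask
        for port, key, wanted in self.bindings:
            if mask & wanted:
                port.queue({"key": key, "observed": self.signals & wanted})


port = ToyPort()
objects = [Waitable() for _ in range(1000)]
for index, obj in enumerate(objects):
    obj.bind(port, key=index, mask=1)

objects[42].assert_signals(1)
packet = port.wait()            # identifies object 42 without scanning the rest
nothing_else = port.wait()
```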

See: zx_port_create(), zx_port_queue(), zx_port_wait(), and zx_port_cancel().

Events, Event Pairs

An Event is the simplest Object, having no other state than its collection of active Signals.

An Event Pair is one of a pair of Events that may signal each other. A useful property of Event Pairs is that when one side of a pair goes away (all Handles to it have been closed), the PEER_CLOSED signal is asserted on the other side.
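The PEER_CLOSED behaviour can be sketched with two linked objects. The names and bit values below are hypothetical; the handle count is tracked by hand to show that the signal is raised only when the last Handle to one side is closed.

```python
PEER_CLOSED = 1 << 0    # illustrative bit values
USER_0 = 1 << 1


class EventPairEnd:
    def __init__(self):
        self.signals = 0
        self.handle_count = 1   # the handle returned at creation
        self.peer = None

    def signal_peer(self, mask):
        self.peer.signals |= mask

    def add_handle(self):
        self.handle_count += 1  # models duplicating the handle

    def close_handle(self):
        self.handle_count -= 1
        if self.handle_count == 0:
            self.peer.signals |= PEER_CLOSED    # this side has gone away


def eventpair_create():
    a, b = EventPairEnd(), EventPairEnd()
    a.peer, b.peer = b, a
    return a, b


a, b = eventpair_create()
a.signal_peer(USER_0)
b.add_handle()
b.close_handle()
closed_early = bool(a.signals & PEER_CLOSED)    # one handle to b still open
b.close_handle()
closed_late = bool(a.signals & PEER_CLOSED)
```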

See: zx_event_create() and zx_eventpair_create().

Shared Memory: Virtual Memory Objects (VMOs)

Virtual Memory Objects represent a set of physical pages of memory, or the potential for pages (which will be created/filled lazily, on-demand).

They may be mapped into the address space of a Process with zx_vmar_map() and unmapped with zx_vmar_unmap(). Permissions of mapped pages may be adjusted with zx_vmar_protect().

VMOs may also be read from and written to directly with zx_vmo_read() and zx_vmo_write(). Thus the cost of mapping them into an address space may be avoided for one-shot operations like “create a VMO, write a dataset into it, and hand it to another Process to use.”
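Lazy page creation and direct read/write access can be illustrated with a sparse dictionary of pages. This is a toy model under stated assumptions: ToyVmo and committed_bytes() are hypothetical names, the 4096-byte page size is assumed, and pages that were never written read back as zeros.

```python
PAGE_SIZE = 4096    # assumed page size for this sketch


class ToyVmo:
    def __init__(self, size):
        self.size = size
        self.pages = {}     # page index -> bytearray, created on first write

    def write(self, data, offset):
        if offset < 0 or offset + len(data) > self.size:
            raise ValueError("out of range")
        for i, byte in enumerate(data):
            index, within = divmod(offset + i, PAGE_SIZE)
            page = self.pages.setdefault(index, bytearray(PAGE_SIZE))
            page[within] = byte

    def read(self, length, offset):
        if offset < 0 or offset + length > self.size:
            raise ValueError("out of range")
        out = bytearray()
        for pos in range(offset, offset + length):
            index, within = divmod(pos, PAGE_SIZE)
            page = self.pages.get(index)
            out.append(page[within] if page is not None else 0)
        return bytes(out)

    def committed_bytes(self):
        return len(self.pages) * PAGE_SIZE


vmo = ToyVmo(16 * PAGE_SIZE)
before = vmo.committed_bytes()          # nothing is backed by memory yet
vmo.write(b"dataset", PAGE_SIZE - 3)    # straddles the first two pages
after = vmo.committed_bytes()
payload = vmo.read(7, PAGE_SIZE - 3)
untouched = vmo.read(4, 8 * PAGE_SIZE)
```

Only the two pages touched by the write are ever allocated, even though the object is sixteen pages long.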

Address Space Management

Virtual Memory Address Regions (VMARs) provide an abstraction for managing a process's address space. At process creation time, a handle to the root VMAR is given to the process creator. That handle refers to a VMAR that spans the entire address space. This space can be carved up via the zx_vmar_map() and zx_vmar_allocate() interfaces. zx_vmar_allocate() can be used to generate new VMARs (called subregions or children) which can be used to group together parts of the address space.
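Carving up an address space can be sketched as nested, non-overlapping ranges. This is a toy model with hypothetical names (ToyVmar and its methods), and it always takes an explicit offset from the caller, which is a simplification of how placement works in the real calls.

```python
class ToyVmar:
    def __init__(self, base, size):
        self.base = base
        self.size = size
        self.children = []  # (start, size, payload): a ToyVmar or a mapping label

    def _reserve(self, offset, size, payload):
        start = self.base + offset
        end = start + size
        if offset < 0 or end > self.base + self.size:
            raise ValueError("outside this region")
        for c_start, c_size, _ in self.children:
            if start < c_start + c_size and c_start < end:
                raise ValueError("overlaps an existing child")
        self.children.append((start, size, payload))
        return start

    def allocate(self, offset, size):
        """Create a child region (subregion) inside this one."""
        child = ToyVmar(self.base + offset, size)
        self._reserve(offset, size, child)
        return child

    def map(self, offset, size, vmo_name):
        """Place a mapping inside this region and return its address."""
        return self._reserve(offset, size, vmo_name)


root = ToyVmar(base=0x1000, size=0x100000)
heap = root.allocate(offset=0x10000, size=0x8000)
addr = heap.map(offset=0x1000, size=0x2000, vmo_name="data-vmo")

try:
    root.map(offset=0x12000, size=0x1000, vmo_name="other")   # inside heap's range
    overlap_rejected = False
except ValueError:
    overlap_rejected = True
```

Once a subregion exists, the parent can no longer place anything in that range, which is what makes subregions useful for grouping and isolating parts of the address space.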

See: zx_vmar_map(), zx_vmar_allocate(), zx_vmar_protect(), zx_vmar_unmap(), and zx_vmar_destroy().

Futexes

Futexes are kernel primitives used with userspace atomic operations to implement efficient synchronization primitives -- for example, Mutexes which only need to make a syscall in the contended case. Usually they are only of interest to implementers of standard libraries. Zircon's libc and libc++ provide C11, C++, and pthread APIs for mutexes, condition variables, etc., implemented in terms of Futexes.
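The idea of a mutex that enters the kernel only under contention can be sketched with a well-known three-state design (0 unlocked, 1 locked, 2 locked with possible waiters). This is an emulation, not Zircon's libc: a Python lock stands in for hardware atomics, futex_wait() and futex_wake() are hypothetical stand-ins for the real calls, and the wake releases every waiter for simplicity.

```python
import threading

_futex_lock = threading.Lock()                  # emulates atomicity in this sketch
_futex_cond = threading.Condition(_futex_lock)
syscalls = {"wait": 0, "wake": 0}               # counts simulated kernel entries


class Word:
    def __init__(self, value=0):
        self.value = value


def compare_exchange(word, expected, new):
    """Set word to new if it equals expected; return the previous value."""
    with _futex_lock:
        old = word.value
        if old == expected:
            word.value = new
        return old


def exchange(word, new):
    with _futex_lock:
        old, word.value = word.value, new
        return old


def futex_wait(word, expected):
    with _futex_cond:
        syscalls["wait"] += 1
        if word.value == expected:      # sleep only if the value is unchanged
            _futex_cond.wait()


def futex_wake(word):
    with _futex_cond:
        syscalls["wake"] += 1
        _futex_cond.notify_all()


class FutexMutex:
    def __init__(self):
        self.state = Word(0)            # 0 unlocked, 1 locked, 2 contended

    def lock(self):
        if compare_exchange(self.state, 0, 1) == 0:
            return                      # uncontended: no simulated syscall
        while exchange(self.state, 2) != 0:
            futex_wait(self.state, 2)

    def unlock(self):
        if exchange(self.state, 0) == 2:
            futex_wake(self.state)      # wake only if someone may be waiting


mutex = FutexMutex()
mutex.lock()
mutex.unlock()
uncontended = dict(syscalls)            # both counters are still zero

shared = {"count": 0}


def worker():
    for _ in range(500):
        mutex.lock()
        shared["count"] += 1
        mutex.unlock()


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The uncontended lock and unlock pair completes without touching the simulated kernel at all, which is the property that makes futex-based primitives cheap in the common case.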

See: zx_futex_wait(), zx_futex_wake(), and zx_futex_requeue().