blob: 76001fa5a50cc6f84746a12fa86e892071fdd03a [file] [log] [blame] [view]
# Zircon Kernel Concepts
## Introduction
The kernel manages a number of different types of Objects. Those that are
accessible directly via system calls are C++ classes that implement the
Dispatcher interface. These are implemented in
[kernel/object](/zircon/kernel/object). Many are self-contained higher-level Objects.
Some wrap lower-level [lk][glossary.lk] primitives.
## [System Calls](/reference/syscalls)
Userspace code interacts with kernel objects via system calls, and almost
exclusively via [Handles](/docs/concepts/kernel/handles.md). In userspace, a Handle is represented as
32bit integer (type zx_handle_t). When syscalls are executed, the kernel checks
that Handle parameters refer to an actual handle that exists within the calling
process's handle table. The kernel further checks that the Handle is of the
correct type (passing a Thread Handle to a syscall requiring an event handle
will result in an error), and that the Handle has the required Rights for the
requested operation.
System calls fall into three broad categories, from an access standpoint:
1. Calls that have no limitations, of which there are only a very few, for
example [`zx_clock_get_monotonic()`](/reference/syscalls/clock_get_monotonic.md)
and [`zx_nanosleep()`](/reference/syscalls/nanosleep.md) may be called by any thread.
2. Calls that take a Handle as the first parameter, denoting the Object they act upon,
which are the vast majority, for example [`zx_channel_write()`](/reference/syscalls/channel_write.md)
and [`zx_port_queue()`](/reference/syscalls/port_queue.md).
3. Calls that create new Objects but do not take a Handle, such as
[`zx_event_create()`](/reference/syscalls/event_create.md) and
[`zx_channel_create()`](/reference/syscalls/channel_create.md). Access to these (and limitations
upon them) is controlled by the Job in which the calling Process is contained.
System calls are provided by libzircon.so, which is a "virtual" shared
library that the Zircon kernel provides to userspace, better known as the
[*virtual Dynamic Shared Object* or vDSO](vdso.md).
They are C ELF ABI functions of the form `zx_noun_verb()` or
`zx_noun_verb_direct-object()`.
The system calls are defined in a customized form of FIDL in [//zircon/vdso](/zircon/vdso/).
Those definitions are first processed by `fidlc`, and then by `zither`, which takes the IR
representation from `fidlc` and outputs various formats that are used as glue in the VDSO, kernel,
etc.
## [Handles](/docs/concepts/kernel/handles.md) and [Rights](/docs/concepts/kernel/rights.md)
Objects may have multiple Handles (in one or more Processes) that refer to them.
For almost all Objects, when the last open Handle that refers to an Object is closed,
the Object is either destroyed, or put into a final state that may not be undone.
Handles may be moved from one Process to another by writing them into a Channel
(using [`zx_channel_write()`](/reference/syscalls/channel_write.md)), or by using
[`zx_process_start()`](/reference/syscalls/process_start.md) to pass a Handle as the argument
of the first thread in a new Process.
The actions that may be taken on a Handle or the Object it refers to are governed
by the Rights associated with that Handle. Two Handles that refer to the same Object
may have different Rights.
The [`zx_handle_duplicate()`](/reference/syscalls/handle_duplicate.md) and
[`zx_handle_replace()`](/reference/syscalls/handle_replace.md) system calls may be used to
obtain additional Handles referring to the same Object as the Handle passed in,
optionally with reduced Rights. The [`zx_handle_close()`](/reference/syscalls/handle_close.md)
system call closes a Handle, releasing the Object it refers to, if that Handle is
the last one for that Object. The [`zx_handle_close_many()`](/reference/syscalls/handle_close_many.md)
system call similarly closes an array of handles.
## Kernel Object IDs
Every object in the kernel has a "kernel object id" or "koid" for short.
It is a 64 bit unsigned integer that can be used to identify the object
and is unique for the lifetime of the running system.
This means in particular that koids are never reused.
There are two special koid values:
**ZX_KOID_INVALID** Has the value zero and is used as a "null" sentinel.
**ZX_KOID_KERNEL** There is only one kernel, and it has its own koid.
Kernel generated koids only use 63 bits (which is plenty).
This leaves space for artificially allocated koids by having the most
significant bit set. The sequence in which kernel generated koids are allocated
is unspecified and subject to change.
Artificial koids exist to support things like identifying artificial objects,
like virtual threads in tracing, for consumption by tools.
How artificial koids are allocated is left to each program,
this document does not impose any rules or conventions.
## Running Code: Jobs, Processes, and Threads.
Threads represent threads of execution (CPU registers, stack, etc) within an
address space that is owned by the Process in which they exist. Processes are
owned by Jobs, which define various resource limitations. Jobs are owned by
parent Jobs, all the way up to the Root Job, which was created by the kernel at
boot and passed to [`userboot`, the first userspace Process to begin execution](/docs/concepts/process/userboot.md).
Without a Job Handle, it is not possible for a Thread within a Process to create another
Process or another Job.
[Program loading](/docs/concepts/process/program_loading.md) is provided by userspace facilities and
protocols above the kernel layer.
See: [`zx_process_create()`](/reference/syscalls/process_create.md),
[`zx_process_start()`](/reference/syscalls/process_start.md),
[`zx_thread_create()`](/reference/syscalls/thread_create.md),
and [`zx_thread_start()`](/reference/syscalls/thread_start.md).
## Message Passing: Sockets and Channels
Both Sockets and Channels are IPC Objects that are bi-directional and two-ended.
Creating a Socket or a Channel will return two Handles, one referring to each endpoint
of the Object.
Sockets are stream-oriented and data may be written into or read out of them in units
of one or more bytes. Short writes (if the Socket's buffers are full) and short reads
(if more data is requested than in the buffers) are possible.
Channels are datagram-oriented and have a maximum message size given by **ZX_CHANNEL_MAX_MSG_BYTES**,
and may also have up to **ZX_CHANNEL_MAX_MSG_HANDLES** Handles attached to a message.
They do not support short reads or writes -- either a message fits or it does not.
When Handles are written into a Channel, they are removed from the sending Process.
When a message with Handles is read from a Channel, the Handles are added to the receiving
Process. Between these two events, the Handles continue to exist (ensuring the Objects
they refer to continue to exist), unless the end of the Channel that they have been written
towards is closed -- at which point messages in flight to that endpoint are discarded and
any Handles they contained are closed.
See: [`zx_channel_create()`](/reference/syscalls/channel_create.md),
[`zx_channel_read()`](/reference/syscalls/channel_read.md),
[`zx_channel_write()`](/reference/syscalls/channel_write.md),
[`zx_channel_call()`](/reference/syscalls/channel_call.md),
[`zx_socket_create()`](/reference/syscalls/socket_create.md),
[`zx_socket_read()`](/reference/syscalls/socket_read.md),
and [`zx_socket_write()`](/reference/syscalls/socket_write.md).
## Objects and Signals
Objects may have up to 32 signals (represented by the zx_signals_t type and the ZX_*_SIGNAL_*
defines), which represent a piece of information about their current state. Channels and Sockets,
for example, may be READABLE or WRITABLE. Processes or Threads may be TERMINATED. And so on.
Threads may wait for signals to become active on one or more Objects.
See [signals](/docs/concepts/kernel/signals.md) for more information.
## Waiting: Wait One, Wait Many, and Ports
A Thread may use [`zx_object_wait_one()`](/reference/syscalls/object_wait_one.md)
to wait for a signal to be active on a single handle or
[`zx_object_wait_many()`](/reference/syscalls/object_wait_many.md) to wait for
signals on multiple handles. Both calls allow for a timeout after
which they'll return even if no signals are pending.
Timeouts may deviate from the specified deadline according to timer
slack. See [timer slack](/docs/concepts/kernel/timer_slack.md) for more information.
If a Thread is going to wait on a large set of handles, it is more efficient to use
a Port, which is an Object that other Objects may be bound to such that when signals
are asserted on them, the Port receives a packet containing information about the
pending Signals.
See: [`zx_port_create()`](/reference/syscalls/port_create.md),
[`zx_port_queue()`](/reference/syscalls/port_queue.md),
[`zx_port_wait()`](/reference/syscalls/port_wait.md),
and [`zx_port_cancel()`](/reference/syscalls/port_cancel.md).
## Events, Event Pairs.
An Event is the simplest Object, having no other state than its collection of active Signals.
An Event Pair is one of a pair of Events that may signal each other. A useful property of
Event Pairs is that when one side of a pair goes away (all Handles to it have been
closed), the PEER_CLOSED signal is asserted on the other side.
See: [`zx_event_create()`](/reference/syscalls/event_create.md),
and [`zx_eventpair_create()`](/reference/syscalls/eventpair_create.md).
## Shared Memory: Virtual Memory Objects (VMOs)
Virtual Memory Objects represent a set of physical pages of memory, or the *potential*
for pages (which will be created/filled lazily, on-demand).
They may be mapped into the address space of a Process with
[`zx_vmar_map()`](/reference/syscalls/vmar_map.md) and unmapped with
[`zx_vmar_unmap()`](/reference/syscalls/vmar_unmap.md). Permissions of
mapped pages may be adjusted with [`zx_vmar_protect()`](/reference/syscalls/vmar_protect.md).
VMOs may also be read from and written to directly with
[`zx_vmo_read()`](/reference/syscalls/vmo_read.md) and [`zx_vmo_write()`](/reference/syscalls/vmo_write.md).
Thus the cost of mapping them into an address space may be avoided for one-shot operations
like "create a VMO, write a dataset into it, and hand it to another Process to use."
## Address Space Management
Virtual Memory Address Regions (VMARs) provide an abstraction for managing a
process's address space. At process creation time, a handle to the root VMAR
is given to the process creator. That handle refers to a VMAR that spans the
entire address space. This space can be carved up via the
[`zx_vmar_map()`](/reference/syscalls/vmar_map.md) and
[`zx_vmar_allocate()`](/reference/syscalls/vmar_allocate.md) interfaces.
[`zx_vmar_allocate()`](/reference/syscalls/vmar_allocate.md) can be used to generate new
VMARs (called subregions or children), which can be used to group together
parts of the address space.
See: [`zx_vmar_map()`](/reference/syscalls/vmar_map.md),
[`zx_vmar_allocate()`](/reference/syscalls/vmar_allocate.md),
[`zx_vmar_protect()`](/reference/syscalls/vmar_protect.md),
[`zx_vmar_unmap()`](/reference/syscalls/vmar_unmap.md),
and [`zx_vmar_destroy()`](/reference/syscalls/vmar_destroy.md),
## Futexes
Futexes are kernel primitives used with userspace atomic operations to implement
efficient synchronization primitives -- for example, Mutexes, which only need to make
a syscall in the contended case. Usually they are only of interest to implementers of
standard libraries. Zircon's libc and libc++ provide C11, C++, and pthread APIs for
mutexes, condition variables, etc, implemented in terms of Futexes.
See: [`zx_futex_wait()`](/reference/syscalls/futex_wait.md),
[`zx_futex_wake()`](/reference/syscalls/futex_wake.md),
and [`zx_futex_requeue()`](/reference/syscalls/futex_requeue.md).
[glossary.lk]: /docs/glossary/README.md#lk