
Starnix Execution

The execution module handles executing Linux tasks inside Starnix. It uses Zircon’s “restricted mode” to execute Linux tasks efficiently. This crate also provides a bridge between Fuchsia’s execution model and Linux tasks.

Container logic

Linux tasks run inside a “container”, which provides a shared environment similar to a virtual machine or Docker container. A container is associated with a single Starnix Kernel object, a root system image, and an init configuration.

The code in container.rs deals with these concerns.

Executor

The executor is responsible for transferring control in and out of Linux logic and coordinating state changes with Zircon. To achieve this, the executor sets up Zircon objects (processes, vmars, etc.) to contain and execute Linux logic.

The restricted executor takes advantage of Zircon's restricted execution mode feature (https://fuchsia.dev/fuchsia-src/reference/syscalls#restricted_mode_work_in_progress) to efficiently handle syscalls from Linux. Specifically:

  1. A Zircon process is created for each Linux thread group.

  2. A Zircon thread is created for each Linux thread.

  3. The process's address space is divided into two ranges: a restricted range covering the lower half of the userspace range and a shared range covering the upper half.

    Linux programs have access to the restricted range. The shared range is the same for every process in the same container and is used to manage Starnix state across the container.

  4. Threads in this process can execute in either “restricted mode” or “normal mode”. Threads begin execution in normal mode with the shared range accessible. Starnix enters restricted mode by issuing a zx_restricted_enter() syscall, which makes the restricted range accessible and the shared range inaccessible.

  5. Linux code runs in restricted mode until it issues a syscall instruction or generates an exception. On a syscall from restricted mode, Zircon places the thread back in normal mode and returns from the zx_restricted_enter() syscall. The restricted executor then decodes and dispatches the syscall. The same pattern applies to other exits from restricted mode, such as the exception path described in the next item.

  6. On an exception, Zircon places the thread back in normal mode and returns from the zx_restricted_enter() syscall. The restricted executor then executes Starnix code to handle the Zircon exception. Some exceptions are handled internally within Starnix by adjusting the memory mapping or other state. Other exceptions generate Linux signals, which are delivered according to the task's signal disposition.

  7. The executor exports data about the state of the Linux address space to Fuchsia-aware debugging tools such as crashsvc and zxdb.
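Steps 5 and 6 share one shape: control returns to normal mode and the executor decides what to do with the exit. The following is a minimal Rust sketch of that decision, not the Starnix implementation. The names RestrictedExit, ExceptionKind, Action, and handle_exit are hypothetical and chosen only for illustration; the real executor works from the state Zircon reports rather than from a pre-classified enum.

```rust
/// Hypothetical reasons a thread can leave restricted mode.
/// These are illustrative names, not the Zircon or Starnix definitions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RestrictedExit {
    /// Linux code executed a syscall instruction.
    Syscall { number: u64 },
    /// Linux code generated an exception.
    Exception(ExceptionKind),
}

/// Hypothetical classification of an exception after Starnix has looked at it.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ExceptionKind {
    /// Starnix can resolve it internally, for example by adjusting a mapping.
    HandledInternally,
    /// It must be reported to the Linux task as a signal.
    RaisesSignal { signal: u32 },
}

/// What the executor does next, running in normal mode.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    /// Decode and dispatch the Linux syscall with this number.
    DispatchSyscall(u64),
    /// Fix up Starnix state and resume the task.
    FixUpAndResume,
    /// Queue a Linux signal; delivery follows the task's signal disposition.
    DeliverSignal(u32),
}

/// Maps an exit from restricted mode to the executor's next action.
pub fn handle_exit(exit: RestrictedExit) -> Action {
    match exit {
        RestrictedExit::Syscall { number } => Action::DispatchSyscall(number),
        RestrictedExit::Exception(ExceptionKind::HandledInternally) => Action::FixUpAndResume,
        RestrictedExit::Exception(ExceptionKind::RaisesSignal { signal }) => {
            Action::DeliverSignal(signal)
        }
    }
}
```

For instance, under these assumptions a syscall exit is passed to the dispatcher with its number intact, while an exception classified as signal-raising with signal 11 (SIGSEGV on Linux) becomes a signal delivery.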

This diagram shows the process, address space, and thread relationships for a Linux thread group containing two threads running in the restricted executor:

                  Zircon process

                  Restricted vmar

                  0x...020000 0x4000...100000

                 +----------------------+
                 |Linux thread group    |
                 |                      |
                 | Thread 1 |  Thread 2 |
                 |          |           |
restricted_enter |          |           |
+----------------+---->     |  fault    |    ZX_EXCP_...
|                |          |     ------+-------------------+
|                | syscall  |           |                   |
|            +---+-----     |           |                   |
|            |   |          |           |                   |
|            |   |          |           |                   |
|            |   +----------+-----------+                   |
|            |                                              |
|            |   Shared (aka root) vmar                     |
|            |                                              |
|            |   0x4000...100000  0x8000...1000             |
|            |                                              |
|            |   +---------------------+                    |
|            |   | Restricted executor |                    |
|            |   |                     |                    |
|            |   | Thread 1 | Thread 2 |                    |
|            |   |          |          |                    |
|            |   |          |          |                    |
+------------+---+-----     |          |                    |
             |   |          |   <------+--------------------+
             +---+---->     |          |
                 |          |          |
                 |          |          |
                 |          |          |
                 +----------+----------+

The shared portion of the address space is shared between all Linux thread groups in the same container. This allows Starnix to access information about any thread group in the container when handling a system call or exception.
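The split between the restricted and shared ranges can be modeled as a simple address classification. The sketch below is only an illustration: AddressSpaceLayout, Region, and their methods are hypothetical names, and the bounds are parameters because the real userspace range is architecture-specific and defined by Zircon, not by this code.

```rust
/// Which half of the userspace range an address falls in.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Region {
    /// Lower half: visible to Linux programs.
    Restricted,
    /// Upper half: the same for every process in the container.
    Shared,
}

/// Hypothetical description of a process's userspace range, split in half.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct AddressSpaceLayout {
    base: u64,
    midpoint: u64,
    limit: u64,
}

impl AddressSpaceLayout {
    /// Splits the half-open range [base, limit) at its midpoint.
    /// Returns None if the range is empty or inverted.
    pub fn split_in_half(base: u64, limit: u64) -> Option<Self> {
        if base >= limit {
            return None;
        }
        let midpoint = base + (limit - base) / 2;
        Some(Self { base, midpoint, limit })
    }

    /// Classifies an address, or returns None if it is outside the range.
    pub fn region_of(&self, addr: u64) -> Option<Region> {
        if addr < self.base || addr >= self.limit {
            None
        } else if addr < self.midpoint {
            Some(Region::Restricted)
        } else {
            Some(Region::Shared)
        }
    }

    /// In restricted mode only the restricted range is accessible.
    pub fn accessible_in_restricted_mode(&self, addr: u64) -> bool {
        self.region_of(addr) == Some(Region::Restricted)
    }
}
```

With made-up bounds of 0x1000 and 0x9000, the midpoint is 0x5000, so 0x4fff is classified as restricted and 0x5000 as shared.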

Execution flow

Execution proceeds as follows:

  • Starnix sets up the initial restricted mode state
  • Starnix calls an assembly routine, restricted_enter_loop, providing a callback and context
    • restricted_enter_loop stores all callee-saved registers and the callback + context on the normal mode stack
    • restricted_enter_loop calls zx_restricted_enter to switch to restricted mode
      • If this fails, restricted_enter_loop unwinds and returns the error from Zircon
      • Zircon switches the thread's architectural state to the restricted mode copy and transitions memory protections to restricted mode
        • Linux logic runs in restricted mode until exit (exception, syscall, or kick)
      • Zircon switches the thread's architectural state to the normal mode copy and transitions memory protections to normal mode
    • Zircon jumps to restricted_return_loop with the stored context in a register
    • restricted_return_loop stores the address of the restricted mode register state in a register and emits CFI directives telling unwinders that the logical stack continues into restricted mode
    • restricted_return_loop invokes the callback, providing the restricted mode exit value
      • restricted_enter_callback in Starnix interprets the restricted mode exit and decides whether to update the restricted mode state and re-enter restricted mode or exit
    • restricted_return_loop reads the bool returned by the callback and jumps either back to the middle of restricted_enter_loop to re-enter restricted mode or to the epilogue of restricted_enter_loop to return
  • restricted_enter_loop returns once the task is ready to unwind
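The control flow above can be modeled in plain Rust. This is a sketch under stated assumptions, not the assembly routine itself: enter_loop_model and ExitReason are hypothetical names, the enter closure stands in for zx_restricted_enter plus the time spent running Linux code, and the callback closure stands in for restricted_enter_callback. Register saving, the stack layout, and the CFI directives are not modeled.

```rust
/// Hypothetical exit reasons, mirroring the three listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ExitReason {
    Syscall,
    Exception,
    Kick,
}

/// Models the enter/return loop.
///
/// `enter` returns Err(status) if entering restricted mode fails, in which
/// case the loop unwinds and reports that status to the caller. Otherwise it
/// returns the reason restricted mode was exited.
///
/// `callback` returns true to re-enter restricted mode and false to stop.
///
/// On success the function returns how many times restricted mode was entered.
pub fn enter_loop_model<E, C>(mut enter: E, mut callback: C) -> Result<u32, i32>
where
    E: FnMut() -> Result<ExitReason, i32>,
    C: FnMut(ExitReason) -> bool,
{
    let mut entries = 0u32;
    loop {
        // Corresponds to the zx_restricted_enter call; a failure returns early.
        let reason = enter()?;
        entries += 1;
        // Corresponds to restricted_return_loop invoking the callback and
        // branching on the returned bool.
        if !callback(reason) {
            return Ok(entries);
        }
    }
}
```

In a usage such as feeding the model a syscall exit, an exception exit, and a kick, with a callback that chooses to stop on the kick, the loop enters restricted mode three times and then returns. Stopping on a kick is only a choice made for that example; in the flow described above, the callback decides whether any given exit ends the loop.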