| # Exception handling |
| |
| ## Introduction |
| |
| Exception handling support in Zircon was inspired by similar support in Mach. |
| |
| Exceptions are mainly used for debugging. Outside of debugging |
| one generally uses ["signals"](signals.md). |
| Signals are the core Zircon mechanism for observing state changes on |
| kernel Objects (a Channel becoming readable, a Process terminating, |
| an Event becoming signaled, etc). |
| See [Signals](#signals) below. |
| |
| The reader is assumed to have a basic understanding of what exceptions like |
| segmentation faults, etc. are, as well as Posix signals. |
| This document does not explain what a segfault is, nor what "exception |
| handling" is at a high level (though it certainly can if there is a need). |
| |
| ## The basics |
| |
| Exceptions are handled by binding a Zircon Port to the Exception Port |
| of the desired object: thread, process, or job. This is done with the |
| [**task_bind_exception_port**() system call](syscalls/task_bind_exception_port.md). |
| |
| Example: |
| |
| ``` |
| zx_handle_t eport; |
| auto status = zx_port_create(0, &eport); |
| // ... check status ... |
| uint32_t options = 0; |
| // The key is anything that is useful to the code handling the exception. |
| uint64_t child_key = 0; |
| // Assume |child| is a process handle. |
| status = zx_task_bind_exception_port(child, eport, child_key, options); |
| // ... check status ... |
| ``` |
| |
| When an exception occurs a report is sent to the port, |
| after which the receiver must reply with either "exception handled" |
| or "exception not handled". |
| The thread stays paused until then, or until the port is unbound, |
| either explicitly or by the port being closed (say because the handler |
| process exited). |
| |
| Here is a simple exception handling loop. |
| Its main components are the call to the |
| [**port_wait**() system call](syscalls/port_wait.md) |
| to wait for an exception, or anything else that's interesting, to happen, |
| and the call to the |
| [**task_resume**() system call](syscalls/task_resume.md) |
| to indicate the handler is finished processing the exception. |
| |
| ``` |
| while (true) { |
|   zx_port_packet_t packet; |
|   // Block until an exception report or other packet arrives on the port. |
|   auto status = zx_port_wait(eport, ZX_TIME_INFINITE, &packet, 1); |
|   // ... check status ... |
|   if (packet.key != child_key) { |
|     // ... do something else, depending on what else the port is used for ... |
|     continue; |
|   } |
|   if (!ZX_PKT_IS_EXCEPTION(packet.type)) { |
|     // ... probably a signal, process it ... |
|     continue; |
|   } |
|   // Look up a handle for the thread that got the exception. |
|   zx_koid_t packet_tid = packet.exception.tid; |
|   zx_handle_t thread; |
|   status = zx_object_get_child(child, packet_tid, ZX_RIGHT_SAME_RIGHTS, |
|                                &thread); |
|   // ... check status ... |
|   bool handled = process_exception(child, thread, &packet); |
|   uint32_t resume_flags = ZX_RESUME_EXCEPTION; |
|   if (!handled) |
|     resume_flags |= ZX_RESUME_TRY_NEXT; |
|   status = zx_task_resume(thread, resume_flags); |
|   // ... check status ... |
|   status = zx_handle_close(thread); |
|   assert(status == ZX_OK); |
| } |
| ``` |
| |
| To unbind an exception port, pass **ZX_HANDLE_INVALID** for the |
| exception port: |
| |
| ``` |
| uint32_t options = 0; |
| status = zx_task_bind_exception_port(child, ZX_HANDLE_INVALID, |
| key, options); |
| // ... check status ... |
| ``` |
| |
| ## Exception processing details |
| |
| When a thread gets an exception it is paused while the kernel processes |
| the exception. The kernel looks for bound exception ports in a specific order |
| and if it finds one an "exception report" is sent to the bound port. |
| |
| Exception reports are messages sent through the port in the format |
| specified by the port message protocol. The packet contents are described by |
| the *zx_packet_exception_t* type defined in |
| [`<zircon/syscalls/port.h>`](../system/public/zircon/syscalls/port.h). |
| |
| The exception handler is intended to read the message, decide how it |
| wants to process the exception, and then resume the thread that got the |
| exception with the [**task_resume**() system call](syscalls/task_resume.md). |
| |
| Resuming the thread can be done in either of two ways: |
| |
| - Resume execution of the thread as if the exception has been resolved. |
| If the thread gets another exception then exception processing begins |
| anew. An example of when one would do this is when resuming after a |
| debugger breakpoint. |
| |
| ``` |
| auto status = zx_task_resume(thread, ZX_RESUME_EXCEPTION); |
| // ... check status ... |
| ``` |
| |
| - Resume exception processing, marking the exception as "unhandled" by the |
| current handler, and giving the next exception port in the search order a |
| chance to process the exception. An example of when one would do this is |
| when the exception is not one the handler intends to process. |
| |
| ``` |
| auto status = zx_task_resume(thread, |
| ZX_RESUME_EXCEPTION | ZX_RESUME_TRY_NEXT); |
| // ... check status ... |
| ``` |
| |
| If there are no remaining exception ports to try the kernel terminates |
| the process. |
| |
| Resuming the thread requires a handle of the thread, which the handler |
| may not yet have. The handle is obtained with the |
| [**object_get_child**() system call](syscalls/object_get_child.md). |
| The pid and tid needed to look up the thread are contained in the |
| exception report. See the simple exception handling loop above. |
| |
| ## Exception search order |
| |
| Exception ports are searched in the following order: |
| |
| - *Debugger* - The debugger exception port is associated with processes, and |
| is for things like gdb and lldb. To bind to the debugger exception port |
| pass **ZX_EXCEPTION_PORT_DEBUGGER** in *options* when binding an |
| exception port to the process. |
| There is only one debugger exception port per process. |
| |
| - *Thread* - This is for exception ports bound directly to the thread. |
| There is only one thread exception port per thread. |
| |
| - *Process* - This is for exception ports bound directly to the process. |
| There is only one process exception port per process. |
| |
| - *Job* - This is for exception ports bound to the process's job. Note that |
| jobs have a hierarchy. First the process's job is searched. If it has a bound |
| exception port then the exception is delivered to that port. If it does not |
| have a bound exception port, or if the handler resumes the thread with |
| **ZX_RESUME_TRY_NEXT**, then that job's parent job is searched, and so on |
| right up to the root job. |
| |
| - *System* - This is the last port searched and gives the system a chance to |
| process the exception before the kernel kills the process. |
| |
| If no exception port handles the exception then the kernel finishes |
| exception processing by killing the process. |
| |
| Notes: |
| |
| - The "system" exception port is going away, to be supplanted by |
| the root job. |
| |
| - The search order differs from that of Mach. In Zircon the |
| debugger exception port is tried first, before all other ports. |
| This is useful for at least a few reasons: |
| |
| - Allows "fix and continue" debugging. E.g., if a thread gets a segfault, |
| the debugger user can fix the segfault and resume the thread before the |
| thread even knows it got a segfault. |
| - Makes debugger breakpoints easier to reason about. |
| |
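The search order described in this section can be sketched as a small, self-contained C++ model. This is only an illustration, not kernel code: `Port`, `Process`, and `Dispatch` are hypothetical names invented here, and a handler returning `false` stands in for resuming the thread with **ZX_RESUME_TRY_NEXT**. Unbound ports are represented as empty optionals and are simply skipped.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Hypothetical model: a "port" is a name plus a handler that reports whether
// it handled the exception. None of these types exist in Zircon.
struct Port {
  std::string name;
  std::function<bool()> handles;  // true: handled; false: "try next"
};

struct Process {
  std::optional<Port> debugger;           // debugger exception port
  std::optional<Port> thread;             // port bound to the faulting thread
  std::optional<Port> process;            // port bound to the process
  std::vector<std::optional<Port>> jobs;  // process's job first, root job last
  std::optional<Port> system;             // last port searched
};

// Walks the ports in the documented order. Returns the name of the port that
// handled the exception, or "killed" if no bound port handled it.
std::string Dispatch(const Process& p) {
  std::vector<const std::optional<Port>*> order = {&p.debugger, &p.thread,
                                                   &p.process};
  for (const auto& job : p.jobs) order.push_back(&job);
  order.push_back(&p.system);
  for (const auto* slot : order) {
    // Skip unbound ports; stop at the first bound port that handles it.
    if (slot->has_value() && (*slot)->handles()) return (*slot)->name;
  }
  return "killed";
}
```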
| ## Types of exceptions |
| |
| At a high level there are two types of exceptions: architectural and |
| synthetic. |
| Architectural exceptions are things like a segmentation fault (e.g., dereferencing |
| the NULL pointer) or executing an undefined instruction. Synthetic exceptions |
| are things like thread start and exit notifications. |
| |
| Exception types are enumerated in the *zx_excp_type_t* enum defined |
| in [`<zircon/syscalls/exception.h>`](../system/public/zircon/syscalls/exception.h). |
| |
| Some exceptions are debugger specific, and are only sent to the |
| debugger exception port. These exceptions are: |
| |
| - **ZX_EXCP_THREAD_STARTING** |
| - **ZX_EXCP_THREAD_EXITING** |
| |
| ## Interaction with thread suspension |
| |
| Although the same system call, **zx_task_resume**(), is used to resume |
| threads from both exceptions and suspensions, the two states are treated |
| separately. In other words, a thread can be both in an exception and suspended. |
| This can happen if the thread is suspended while waiting for a response |
| from an exception handler. The thread stays paused until it is resumed |
| for both the exception and the suspension: two calls to |
| **zx_task_resume**() are required, one for the exception: |
| |
| ``` |
| auto status = zx_task_resume(thread, ZX_RESUME_EXCEPTION); |
| // ... check status ... |
| ``` |
| |
| and one for the suspension: |
| |
| ``` |
| auto status = zx_task_resume(thread, 0); |
| // ... check status ... |
| ``` |
| |
| The order does not matter. |
| |
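The independence of the two states can be illustrated with a toy C++ model. It is a sketch under stated assumptions rather than a description of the kernel: `ThreadModel` and `kResumeException` are hypothetical names, and the flag value is a stand-in for **ZX_RESUME_EXCEPTION**, whose real value is defined in the Zircon headers. The model shows that the thread runs only after both resumes, in either order.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the resume flag used in this document; the real
// ZX_RESUME_EXCEPTION value lives in the Zircon headers and may differ.
constexpr uint32_t kResumeException = 1u << 0;

// Toy model of one thread's pause state. Not a Zircon type.
struct ThreadModel {
  bool in_exception = false;
  bool suspended = false;

  // Mirrors the two uses of the resume call: with the exception flag it
  // clears the exception state, with no flags it clears the suspension.
  void Resume(uint32_t flags) {
    if (flags & kResumeException) {
      in_exception = false;
    } else {
      suspended = false;
    }
  }

  // The thread runs only when neither state is pending.
  bool Running() const { return !in_exception && !suspended; }
};
```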
| ## Signals |
| |
| Signals are the core Zircon mechanism for observing state changes on |
| kernel Objects (a Channel becoming readable, a Process terminating, |
| an Event becoming signaled, etc). See ["signals"](signals.md). |
| |
| Unlike exceptions, signals do not require a response from an exception handler. |
| On the other hand, signals are sent to whoever is waiting on the thread's |
| handle, instead of being sent to the exception port that could be |
| bound to the thread's process. |
| This is generally not a problem for exception handlers because they generally |
| keep track of thread handles anyway. For example, they need the thread handle |
| to resume the thread after an exception. |
| |
| It does, however, mean that an exception handler must wait on the |
| port *and* every thread handle that it wishes to monitor. |
| Fortunately, the handler can go back to waiting on just the port by using the |
| [**object_wait_async**() system call](syscalls/object_wait_async.md) |
| to have signals regarding each thread sent to the port. |
| In other words, there is still just one system call involved to wait |
| for something interesting to happen. |
| |
| ``` |
| uint64_t key = some_key_denoting_the_thread; |
| bool is_suspended = thread_is_suspended(thread); |
| zx_signals_t signals = ZX_THREAD_TERMINATED; |
| if (is_suspended) |
|   signals |= ZX_THREAD_RUNNING; |
| else |
|   signals |= ZX_THREAD_SUSPENDED; |
| uint32_t options = ZX_WAIT_ASYNC_ONCE; |
| auto status = zx_object_wait_async(thread, eport, key, signals, options); |
| // ... check status ... |
| ``` |
| |
| When the thread gets any of the specified signals a **ZX_PKT_TYPE_SIGNAL_ONE** |
| packet will be sent to the port. After processing the signal, the above |
| call to **zx_object_wait_async**() must be made again; that is the nature |
| of **ZX_WAIT_ASYNC_ONCE**. |
| |
| *Note:* There is both an exception and a signal for thread termination. |
| The **ZX_EXCP_THREAD_EXITING** exception is sent first. When the thread |
| is finally terminated the **ZX_THREAD_TERMINATED** signal is sent. |
| |
| The following signals are relevant to exception handlers: |
| |
| - **ZX_THREAD_TERMINATED** |
| - **ZX_THREAD_SUSPENDED** |
| - **ZX_THREAD_RUNNING** |
| |
| When a thread is started **ZX_THREAD_RUNNING** is asserted. |
| When it is suspended **ZX_THREAD_RUNNING** is deasserted, and |
| **ZX_THREAD_SUSPENDED** is asserted. When the thread is resumed |
| **ZX_THREAD_SUSPENDED** is deasserted and **ZX_THREAD_RUNNING** is |
| asserted. When a thread terminates both **ZX_THREAD_RUNNING** and |
| **ZX_THREAD_SUSPENDED** are deasserted and **ZX_THREAD_TERMINATED** |
| is asserted. However, signals are OR'd into the state maintained by |
| the port, so you may see any combination of requested signals |
| when **zx_port_wait**() returns. |
| |
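The signal transitions described above can be sketched as a small state function in C++. This is a toy model for illustration only: the bit values, the `Event` enum, `Apply`, and `PendingPacket` are all hypothetical names invented here, and the real **ZX_THREAD_*** values are defined in the Zircon headers. The `PendingPacket` type models, in simplified form, how requested signals accumulate by OR so that a handler may observe more than one of them at once.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bit values for illustration only; the real ZX_THREAD_* values
// are defined in the Zircon headers.
constexpr uint32_t kRunning = 1u << 0;
constexpr uint32_t kSuspended = 1u << 1;
constexpr uint32_t kTerminated = 1u << 2;

enum class Event { kStart, kSuspend, kResume, kTerminate };

// Returns the thread's asserted signals after |event| is applied to |state|.
uint32_t Apply(uint32_t state, Event event) {
  switch (event) {
    case Event::kStart:
    case Event::kResume:
      return (state & ~kSuspended) | kRunning;
    case Event::kSuspend:
      return (state & ~kRunning) | kSuspended;
    case Event::kTerminate:
      return (state & ~(kRunning | kSuspended)) | kTerminated;
  }
  return state;
}

// Simplified model of a pending packet: every requested signal seen so far
// is OR'd into |observed|.
struct PendingPacket {
  uint32_t requested = 0;
  uint32_t observed = 0;
  void Note(uint32_t state) { observed |= state & requested; }
};
```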
| ## Comparison with Posix (and Linux) |
| |
| This table shows equivalent terms, types, and function calls between |
| Zircon and Posix/Linux for exceptions and the kinds of things exception |
| handlers generally do. |
| |
| ``` |
| Zircon                      Posix/Linux |
| ------                      ----------- |
| Exception/Signal            Signal |
| ZX_EXCP_*                   SIG* |
| task_bind_exception_port()  ptrace(ATTACH,DETACH) |
| task_resume()               kill(SIGCONT),ptrace(CONT) |
| task_suspend()              kill(SIGSTOP),ptrace(KILL(SIGSTOP)) |
| N/A                         kill() |
| TBD                         signal()/sigaction() |
| port_wait()                 wait*() |
| zx_packet_exception_t       siginfo_t |
| zx_exception_context_t      siginfo_t |
| thread_read_state           ptrace(GETREGS,GETREGSET) |
| thread_write_state          ptrace(SETREGS,SETREGSET) |
| process_read_memory         ptrace(PEEKTEXT) |
| process_write_memory        ptrace(POKETEXT) |
| ``` |
| |
| Zircon does not have asynchronous signals like SIGINT, SIGQUIT, SIGTERM, |
| SIGUSR1, SIGUSR2, and so on. |
| |
| Another significant difference from Posix is that the exception handler |
| is always run on a separate thread. |
| |
| ## Example programs |
| |
| There are three good example programs in the Zircon tree to use to |
| further one's understanding of exceptions and signals in Zircon. |
| |
| - `system/core/crashlogger` |
| |
| This is the crashlogger "daemon". It currently prints backtraces |
| of crashing programs. |
| |
| - `system/utest/exception` |
| |
| The basic exception handling testcase. |
| |
| - `system/utest/debugger` |
| |
| Testcase for the rest of the system calls a debugger would use, beyond |
| those exercised by system/utest/exception. |
| There are tests for segfault recovery, reading/writing thread registers, |
| reading/writing process memory, as well as various other tests. |
| |
| ## Todo |
| |
| There are a few outstanding issues: |
| |
| - signal()/sigaction() replacement |
| |
| In Posix one is able to specify handlers for particular signals, |
| whereas in Zircon there is currently just the exception port, |
| and the handler is expected to understand all possible exceptions. |
| This is tracked as ZX-560. |
| |
| - more selectiveness in which exceptions to see |
| |
| In addition to ZX-560, it would be nice to be able to specify to the kernel |
| when binding the exception port that one is only interested in |
| seeing a particular subset of exceptions. |
| This is tracked as ZX-990. |
| |
| - ability to say exception ports unbind quietly when closed |
| |
| The default behavior when a port is unbound implicitly due to |
| the port being closed is to resume exception processing, i.e., |
| giving the next exception port in the search order a try. |
| In debugging sessions it is useful to change the default behavior |
| and have the port unbound "quietly", in other words leave things as |
| is, with the thread still waiting for an exception response. |
| This is because debuggers can crash, and obliterating an active debugging |
| session is counterproductive. |
| This is tracked as ZX-988. |
| |
| - rights for binding exception ports and getting debuggable thread handles |
| |
| In Zircon rights can, in general, only be taken away; they can't be added. |
| However, one doesn't want to make "debuggability" a default right: |
| debuggers are privileged processes. Thus we need a way to obtain handles |
| with sufficient rights for debugging. |
| This is tracked as ZX-509, ZX-911, and ZX-923. |
| |
| - strace? |
| |
| There is currently no way to trace syscalls like there is in Linux. |
| The typical way this would be implemented is with syscall start/end |
| synthetic exceptions. |
| It's a nice feature, but it's not necessary. Plus while useful, |
| strace operates at the syscall layer and thus is confusing when |
| trying to trace things like fork, which is no longer implemented |
| as a syscall. Since every syscall in Zircon is via the vdso, it |
| makes more sense to implement this by having breakpoints on all |
| the relevant vdso entry points. |
| This is tracked as ZX-567. |
| |
| - restrictions on **zx_task_resume**() |
| |
| This is tracked as ZX-562. The basic discussion is about only allowing |
| appropriate processes to resume a thread in an exception. |
| The same thing applies to thread suspension: it would be nice if, when a |
| process suspends a thread, another process can't come along and resume it |
| against the suspender's wishes. |
| |
| - no way to obtain currently bound port or to chain handlers |
| |
| Currently, there's no way to get the currently bound exception port. |
| One possible use-case is debugging (e.g., to see what's going on |
| in the system). |
| Another possible use-case is to allow chaining exception handlers, though for |
| the case of in-process chaining it's likely better to use a |
| signal()/sigaction() replacement (see ZX-560). |
| This is tracked as ZX-1216. |