Just In Time Debugging

Overview

Just In Time Debugging (JITD) is a way for Fuchsia to suspend processes that crash so that interested parties can debug/process them later. This permits interesting flows such as attaching zxdb to a program that crashed overnight, when the debugger was not attached/running.

This is done by storing process in exceptions in a special place called the “Process Limbo”. This place will keep those processes suspended until some other agent comes and releases them.

See Implementation for more details about how it works.

How to enable it

One of the great benefits of the Process Limbo is to be able to catch crashing processes in the wild, without the need to have already running debuggers. This is specially useful for situations where the debugger cannot be running, such as driver startup. For such cases, having an active Process Limbo can provide an invaluable source of debugging information.

There are two ways of enabling the Process Limbo:

Manual activation

The Process Limbo comes with a CLI tool that permits the user to query the current state of the limbo:

$ run run fuchsia-pkg://fuchsia.com/limbo-client#meta/limbo_client.cmx
Usage: limbo [--help] <option>

  The process limbo is a service that permits the system to suspend any processes that throws an
  exception (crash) for later processing/debugging. This CLI tool permits to query and modify the
  state of the limbo.

  Options:
    --help: Prints this message.
    enable: Enable the process limbo. It will now begin to capture crashing processes.
    disable: Disable the process limbo. Will free any pending processes waiting in it.
    list: Lists the processes currently waiting on limbo. The limbo must be active.
    release: Release a process from limbo. The limbo must be active. Usage: limbo release <pid>.

Enable on startup

Manual activation works only if you have a way to send commands to the system. But some development environments run software earlier that the user can interact with (or run a debugger). Drivers are a good example of this. For those cases, having the Process Limbo active from the start lets you catch driver crashes as they occur while the driver is spinning up, which is normally the hardest part to debug.

In order to do this, there is a configuration that has to be set into the build:

fx set <YOUR CONFIG> --with-base //src/developer/forensics:exceptions_enable_jitd_on_startup

Or add this label to the base_package_labels in your build args. You can still use the Process Limbo CLI tool to disable and manipulate the limbo afterwards. Then you will need to push an update to your device for this to take an effect.

NOTE: Driver initialization is finicky and freezing crashing process can leave the system in an undefined state and “hang” it, so your mileage may vary when using this feature, especially for very early drivers.

How to use it

zxdb

The main user of JITD is zxdb, which is able to attach to a process waiting in the limbo. When starting zxdb, it will display the processes that are waiting in it:

> fx debug
Checking for debug agent on [fe80::2e0:4cff:fe68:8d%3]:2345.
Debug agent not found. Starting one.
Connecting (use "disconnect" to cancel)...
Connected successfully.

👉 To get started, try "status" or "help".

Processes waiting on exception:
272401: crasher
Type "attach <pid>" to reconnect.

[zxdb] attach 272401
Process 1 [Running] koid=272401 crashed
Attached Process 1 [Running] koid=272401 crasher
[Warning] Received thread exception for an unknown thread.

[zxdb] thread
  # State                 Koid Name
â–¶ 1 Blocked (Exception) 272403 initial-thread

[zxdb] frame
▶ 0 blind_write(volatile unsigned int*) • crasher.c:22 (inline)
  1 main(int, char**) • crasher.c:201
  2 start_main(const start_params*) • __libc_start_main.c:93
  3 __libc_start_main(zx_handle_t, int (*)(int, char**, char**)) • __libc_start_main.c:165
  4 _start + 0x14

[zxdb] list
   17   int (*func)(volatile unsigned int*);
   18   const char* desc;
   19 } command_t;
   20
   21 int blind_write(volatile unsigned int* addr) {
 â–¶ 22   *addr = 0xBAD1DEA;
   23   return 0;
   24 }
   25
   26 int blind_read(volatile unsigned int* addr) { return (int)(*addr); }
   27
   28 int blind_execute(volatile unsigned int* addr) {
   29   void (*func)(void) = (void*)addr;
   30   func();
   31   return 0;

Within zxdb you can also do help process-limbo to get more information about how to use it.

Process Limbo FIDL Service

The Process Limbo presents itself as a FIDL service, which is what the Process Limbo CLI tool and zxdb use. The FIDL protocol is defined in zircon/system/fidl/fuchsia-exception/process_limbo.fidl.

A good example about how to use the API is the Process Limbo CLI tool itself: src/developer/forensics/exceptions/limbo_client/limbo_client.cc.

Implementation

Crash Service

When a process throws an exception, Zircon will generate an associated exception handle. It will then look if there are any listeners in any associated exception channels that might be interested in handling that exception. That is how debuggers such as zxdb get the exceptions from running processes. See the exceptions handling for more details.

But when there are no more exception handlers left, either because there weren't any or they all decided to pass on handling it, the root job has an exclusive handler called crashsvc. Once an exception has reached the Crash Service, it is understood that it has “crashed” and that no program was able to handle it. The Crash Service will then dump the crashing stack trace to the logs and pass the exception over to the Exception Broker.

Exception Broker

The Exception Broker is in charge of deciding what is to be done with a crashing exception, depending on the actual system configuration. It might decide to create a minidump file and dump a crash report, send the exception over to the Process Limbo or kill the process.

The Exception Broker is aware of the Process Limbo and whether it is active or not. When it receives an exception, it will check whether the Process Limbo is enabled. If so, it will pass the exception handle over to it. This is the same Process Limbo exposed by the FIDL service.