blob: 8758f2efc3ba9de7c8ecef2f5dac4fe55a158f98 [file] [log] [blame] [view]
# Debugging kernel crashes with JTRACE
## Overview
### What it is
JTRACE is a simple, lightweight, "just-in-case" tracing system meant to assist
developers in debugging kernel crashes and spontaneous reboots by recording a
series of trace event records generated by instrumentation added by a developer,
and automatically playing them back in the event of a kernel panic. The intent
is to give developers the ability to create trace records from almost anywhere
in kernel code which will give them additional information, beyond just a
register dump and stack trace, to assist in understanding the sequence of events
which led up to a crash. Useful properties of JTRACE include:
+ Lockless ring buffer data structure: Creating a trace event will never
block, and will never attempt to acquire any locks, making it available for
use during early init, inside of arbitrary locks, during hard IRQ time, and
almost anywhere else in the kernel code.
+ Low overhead: Creating trace events is a relatively cheap operation, making it
possible to instrument performance sensitive code with minimal timing
disruption.
+ Optional persistence across reboot: Platform permitting, JTRACE may be
configured to use some of the persistent RAM typically used for crashlog
storage to assist in debugging issues involving spontaneous reboots, typically
caused by triggering a [hardware watchdog timer](/docs/concepts/kernel/watchdog.md).
### What it is not
JTRACE is not a production tracing system. By default, it is completely
disabled, and it is never meant to be enabled in a production build. Trace
points are _not_ meant to be left in code by developers in changes which are
checked in. Typical usage of JTRACE involves a developer adding trace points to
their code, reproducing a crash, then modifying existing trace points or adding
new ones, and repeating until they have root caused their bug. Once the bug has
been found, understood, and fixed, the code which added the trace points is
discarded.
## Configuration {#configuration}
Configuration of JTRACE is controlled using build arguments set via `fx set`.
There are currently 4 options which can be specified. They are:
Name | Type | Notes
----------------------------|-----------|--------
`jtrace_enabled` | tri-state | Enables JTRACE when set to `true` or `"persistent"`. Disabled by default.
`jtrace_target_buffer_size` | uint32_t | Controls the amount of RAM reserved for the trace buffer.
`jtrace_last_entry_storage` | uint32_t | Controls the amount of RAM reserved for [per-cpu last event records](#per-cpu-last-event-records).
`jtrace_use_large_entries` | bool | Controls whether large or small [trace entries](#trace-entries) are used.
By default, JTRACE is disabled and not included in a kernel build. To enable
JTRACE, users may set the `jtrace_enabled` build argument to a value of either
`true`, or `"persistent"` to enable JTRACE and configure it for
[persistent tracing](#persistent-tracing).
If [persistent tracing](#persistent-tracing) is chosen,
`jtrace_target_buffer_size` *must* be a multiple of the kernel's configured
`persistent_ram_allocation_granularity` value (defaults to 128 bytes).
If not explicitly specified by the user, defaults values for each of the other
configuration parameters will be chosen based on whether or not [persistent
tracing](#persistent-tracing) has been selected.
Name | Non-persistent Default | Persistent Default
----------------------------|------------------------|--------------------
`jtrace_target_buffer_size` | 32768 | 4096
`jtrace_last_entry_storage` | 0 | 0
`jtrace_use_large_entries` | true | false
Once the build has been configured, simply rebuild and deploy the new kernel to
your target in order to start to use JTRACE.
### Examples
To enable non-persistent tracing with large records and 32KB worth of
storage, add the following to your `fx set` line.
> --args 'jtrace_enabled=true'
To expand this storage to 64KB and use small records instead, the arguments
would be:
> --args 'jtrace_enabled=true jtrace_target_buffer_size=65536 jtrace_use_large_entries=false'
To enable persistent tracing with large records (overriding the defaults)
and storage for 4 per-cpu last event records (also overriding the defaults), requesting 4KB
worth of storage, add the following to your `fx set` line.
> --args 'jtrace_enabled="persistent" jtrace_last_entry_storage=4 jtrace_use_large_entries=true'
Please note that `""`s are required around `persistent`.
## Adding trace points
### Requirements
In order to add a trace point to a file, users must include the jtrace header by
adding the following `#include` statement in each file where trace points are to
be added.
```c++
#include <lib/jtrace/jtrace.h>
```
If the include file is not found, it may be necessary to add the headers to the
`deps` section of your module's `BUILD.gn` file.
```gn
deps = [ "//zircon/kernel/lib/jtrace:headers" ] # If deps has not been declared yet.
deps += [ "//zircon/kernel/lib/jtrace:headers" ] # If deps has already been declared.
```
### The JTRACE macro. {#jtrace-macro}
Trace points may now be added by invoking the JTRACE macro. Every trace point
_must_ specify a string literal "tag", and _may_ specify as many general purpose
arguments as the configured [trace entry](#trace-entries) record supports. Tags
*must* be string literals in order to remain valid across reboots when
persistent tracing is configured.
General purpose arguments do not need to be specified, and will default to 0
if omitted. Additionally, no casting should be needed when specifying general
purpose arguments. If the argument field is large enough to store the value, it
will. Small trace entries have space for a single 32 bit argument, while large
trace entries have space for four 32 bit args, followed by two 64 bit args.
Here is an example of using tracing, when configured for large entries.
```c++
#include <lib/jtrace/jtrace.h>
struct MyStruct {
void* ptr1, ptr2;
uint32_t foo, bar;
}
void MyFunction(const MyStruct* args, uint32_t num_args) {
JTRACE("(num_args, args)", num_args, 0, 0, 0, args);
for (uint32_t i = 0; i < num_args; ++i) {
const auto& a = args[i];
JTRACE("Before (i, foo, bar, ptr1, ptr2)", i, a.foo, a.bar, 0, a.ptr1, a.ptr2);
zx_status_t status = DoAThingWithMyStruct(a);
JTRACE("After (status)", status);
}
JTRACE("Finished");
}
```
## Displaying the contents of a trace buffer
When JTRACE is enabled, the contents of the trace buffer will be dumped
automatically whenever a kernel panic occurs, along with the register dump and
backtrace.
Users may also dump the contents of the trace buffer manually using the kernel
shell. From a serial console, simply run `k jtrace` to dump the current trace
buffer, or run `k jtrace -r` to dump the contents of the recovered persistent
trace buffer, if available. Note that while the live trace buffer is being
dumped, tracing will be temporarily disabled and no new trace entries will be
added to the buffer. Invocations of the [JTRACE macro](#jtrace-macro) will be
treated as no-ops.
### Examples
Here is an example of a dump of a trace buffer configured to use small entries,
with 4kB of storage, and 4 [Per-CPU last event records](#per-cpu-last-event-records).
Syscall entry and exit was instrumented using two JTRACE macros. The tag was
used to hold the string literal name of the syscall being made, while the first
general purpose argument was used to record a bool indicating whether the trace
marks the start, or the end, of the syscall.
```none
JTRACE: Recovered 166/166 entries
[ 17.431102506][cpu 0] : 00000001 : ( 0.000 uSec) : (futex_wait)
[ 17.431107979][cpu 0] : 00000000 : ( 5.473 uSec) : (nanosleep)
[ 17.431109757][cpu 0] : 00000001 : ( 1.778 uSec) : (nanosleep)
[ 17.431118119][cpu 0] : 00000000 : ( 8.362 uSec) : (futex_wake)
[ 17.431131543][cpu 0] : 00000001 : ( 13.424 uSec) : (futex_wake)
[ 17.431134424][cpu 0] : 00000000 : ( 2.881 uSec) : (futex_wake)
<snip>
[ 17.437545001][cpu 3] : 00000001 : ( 1.443 uSec) : (object_wait_async)
[ 17.437547341][cpu 3] : 00000000 : ( 2.340 uSec) : (port_wait)
[ 17.437548263][cpu 3] : 00000001 : ( 0.922 uSec) : (port_wait)
[ 17.437666664][cpu 3] : 00000000 : ( 118.401 uSec) : (channel_read_etc)
[ 17.437668850][cpu 3] : 00000001 : ( 2.186 uSec) : (channel_read_etc)
[ 17.437678993][cpu 3] : 00000000 : ( 10.143 uSec) : (debug_send_command)
JTRACE: Last recorded per-CPU events.
[ 17.437001443][cpu 0] : 00000000 : ( 0.000 uSec) : (futex_wait)
[ 17.436016989][cpu 1] : 00000000 : ( 0.000 uSec) : (channel_call_etc_noretry)
[ 17.437066990][cpu 2] : 00000000 : ( 0.000 uSec) : (futex_wait)
[ 17.437678993][cpu 3] : 00000000 : ( 0.000 uSec) : (debug_send_command)
JTRACE: Last log timestamp [ 17.437678993]
```
Here is the same example, this time with large records configured.
```none
JTRACE: Recovered 47/47 entries
[ 24.270872973][cpu 2 tid 1301] : 00000001 00000000 00000000 00000000 0000000000000000 0000000000000000 : ( 0.000 uSec) : syscalls.cc:do_syscall_post:84 (channel_read_etc)
[ 24.270875158][cpu 2 tid 1301] : 00000000 00000000 00000000 00000000 0000000000000000 0000000000000000 : ( 2.185 uSec) : syscalls.cc:do_syscall_pre:59 (object_wait_async)
[ 24.270879083][cpu 2 tid 1301] : 00000001 00000000 00000000 00000000 0000000000000000 0000000000000000 : ( 3.925 uSec) : syscalls.cc:do_syscall_post:84 (object_wait_async)
[ 24.270886634][cpu 2 tid 1301] : 00000000 00000000 00000000 00000000 0000000000000000 0000000000000000 : ( 7.551 uSec) : syscalls.cc:do_syscall_pre:59 (port_wait)
[ 24.273031016][cpu 1 tid 1295] : 00000000 00000000 00000000 00000000 0000000000000000 0000000000000000 : ( 2144.382 uSec) : syscalls.cc:do_syscall_pre:59 (channel_write_etc)
[ 24.273057126][cpu 1 tid 1295] : 00000001 00000000 00000000 00000000 0000000000000000 0000000000000000 : ( 26.110 uSec) : syscalls.cc:do_syscall_post:84 (channel_write_etc)
<snip>
[ 24.273871139][cpu 1 tid 2735] : 00000001 00000000 00000000 00000000 0000000000000000 0000000000000000 : ( 5.089 uSec) : syscalls.cc:do_syscall_post:84 (object_wait_async)
[ 24.273875048][cpu 1 tid 2735] : 00000000 00000000 00000000 00000000 0000000000000000 0000000000000000 : ( 3.909 uSec) : syscalls.cc:do_syscall_pre:59 (port_wait)
[ 24.273876820][cpu 1 tid 2735] : 00000001 00000000 00000000 00000000 0000000000000000 0000000000000000 : ( 1.772 uSec) : syscalls.cc:do_syscall_post:84 (port_wait)
[ 24.274000887][cpu 1 tid 2735] : 00000000 00000000 00000000 00000000 0000000000000000 0000000000000000 : ( 124.067 uSec) : syscalls.cc:do_syscall_pre:59 (channel_read_etc)
[ 24.274003866][cpu 1 tid 2735] : 00000001 00000000 00000000 00000000 0000000000000000 0000000000000000 : ( 2.979 uSec) : syscalls.cc:do_syscall_post:84 (channel_read_etc)
[ 24.274019066][cpu 1 tid 2735] : 00000000 00000000 00000000 00000000 0000000000000000 0000000000000000 : ( 15.200 uSec) : syscalls.cc:do_syscall_pre:59 (debug_send_command)
JTRACE: Last recorded per-CPU events.
[ 24.269107951][cpu 0 tid 2475] : 00000000 00000000 00000000 00000000 0000000000000000 0000000000000000 : ( 0.000 uSec) : syscalls.cc:do_syscall_pre:59 (debug_read)
[ 24.274019066][cpu 1 tid 2735] : 00000000 00000000 00000000 00000000 0000000000000000 0000000000000000 : ( 0.000 uSec) : syscalls.cc:do_syscall_pre:59 (debug_send_command)
[ 24.273785974][cpu 2 tid 1902] : 00000000 00000000 00000000 00000000 0000000000000000 0000000000000000 : ( 0.000 uSec) : syscalls.cc:do_syscall_pre:59 (port_wait)
[ 24.154375056][cpu 3 tid 28770] : 00000000 00000000 00000000 00000000 0000000000000000 0000000000000000 : ( 0.000 uSec) : syscalls.cc:do_syscall_pre:59 (futex_wait)
JTRACE: Last log timestamp [ 24.274019066]
```
## Trace Entries {#trace-entries}
Trace entries may be configured at compile time to be either "large" or "small"
records (see [Configuration](#configuration)). All records in the trace buffer
are of the same configured size. It is not possible to mix small records with
large records. Large entries are supersets of small entries. In other words,
a large entry contains all of the fields that a small record contains, plus some
additional records.
Because of the (typically) limited amount of persistent DRAM available to
systems, small records are typically used when JTRACE is configured to be
persistent, and large records are used otherwise. That said, users may override
this behavior and force either small or large records in either situation.
### Small entries
Small trace entries record the following fields.
+ Timestamp: A timestamp which records when the trace entry was added to the buffer. Recorded in
units of [ticks](/docs/reference/syscalls/ticks_get.md), but translated to the
[clock monotonic](/docs/reference/syscalls/clock_get_monotonic.md) timeline when the buffer is
dumped.
+ CPU Id: The ID of the CPU which created the trace entry.
+ Tag: A pointer to a C-string literal supplied by the developer to assist in debugging.
+ `a`: A single 32-bit general purpose field supplied by the developer to assist in debugging.
Small trace entries currently take up 24 bytes of storage each.
### Large entries
Large entries extend small entries, adding the following fields.
+ File: The file name of the location of the trace point.
+ Line: The line number of in the file of the location of the trace point.
+ Function: The function name of the location of the trace point.
+ TID: The numeric thread ID of the thread which created the trace entry.
+ `[a-d]`: Four 32-bit general purpose fields (instead of a small record's one) supplied by the developer to assist in debugging.
+ `[e-f]`: Two 64-bit general purpose fields supplied by the developer to assist in debugging.
Small trace entries currently take up 72 bytes of storage each.
## Per-CPU Last Event Records {#per-cpu-last-event-records}
In addition to the main trace buffer, JTRACE can be configured to separately
store the last events recorded by each CPU. When enabled, these records will be
dumped separately, after the main trace buffer dump is complete.
This feature can be useful when attempting to debug hangs of individual CPUs,
especially if they lead to the triggering of a [HW WDT](/docs/concepts/kernel/watchdog.md).
If a CPU gets stuck for any significant amount of time, it is likely that trace
events from other CPUs will rapidly flush the events from the stuck CPU out of
the trace buffer, especially if the buffer is small because it is located in
persistent RAM. The separate last-event record for the stuck CPU allows
developers to know the last trace point executed by the stuck CPU before it
stopped responding.
To enable Per-CPU last event records, set the `jtrace_last_entry_storage`
[configuration](#configuration) parameter to be equal to the number of CPUs
present in your target hardware. Note that if this number is too low, last event
records for CPU IDs >= the configured number never be recorded. If the number is
too high, some of the entries will never have a record logged to them and some
of the storage reserved in the trace buffer will be wasted.
## Persistent Tracing {#persistent-tracing}
### Overview
Occasionally, a developer might find themselves in the unfortunate position of
needing to debug a complete system hang. When these occur, it can be very
difficult to gain any insights as all ability to observe the state of the
system, or what led up to it, is lost. This is a situation where persistent
tracing may be able to help.
When [configured](#configuration) for persistent tracing, the trace buffer is
placed in special "persistent RAM" provided to the kernel by the bootloader.
Typically, this is a region of physical RAM which is guaranteed to not be
touched by the bootloader(s) as the system boots, meaning that it is possible to
put information in the region which manages to survive a full HW reset of the
system.
### Usage
In order to enable persistent tracing, set the `jtrace_enabled` to the value
`"persistent`". After a hang occurs, reboot the system (either via a
[HW WDT](/docs/concepts/kernel/watchdog.md), or a manual reset line in the
target). Once the system has rebooted, if the recovered trace buffer was
successfully recovered, it may be dumped by executing `k jtrace -r` from a
serial shell. Note that persistent RAM is typically just a section of DRAM which
is put aside by the bootloader chain, and it may suffer corruption in the time
it takes for the early stage bootloaders to re-enable dynamic RAM refresh. The
trace system will make every effort to dump what it can from the recovered
buffer, falling back to a hexdump of the region if needed, however some care
should be taken when interpreting the results of this dump if there is any
possibility of corruption as a result of the reboot.
Not all targets currently support a direct memory-mappable persistent RAM
region, and when they do, this region is frequently very small. Currently, only
ARM64 targets support these regions, so persistent tracing is not currently
available for x64 systems. The `jtrace_target_buffer_size` value configured by
the user is simply a request for storage, there is no guarantee that there will
be enough storage to support the request. In the case that there is not, the
trace system will claim as much storage as it can, and make due with what it is
able to reserve. After booting, users can example the current configuration of
the JTRACE subsystem by running `k jtrace -i` from a serial console. For
example on a target with 8kB of persistent RAM, a request for a 32kB buffer
might result in the following.
```none
JTRACE configuration
--------------------
Requested Buffer Size : 32768
Allocated Buffer Size : 6144
Allocated Buffer Loc : 0xffff00023fff8000
Per-CPU last entry cnt : 0
Large entries : no
Persistent : yes
```
32kB was requested, but only 8kB was available, and 2kB of that was reserved for
the crashlog up front, leaving only 6kB for the trace buffer.
## Caveats
By its nature, there are some side effects and limitations of the JTRACE system
which users should keep in mind.
+ Trace records do not _have_ to exist in monotonically increasing timestamp
order. Because of the lockless nature of the ring buffer storage, a record
slot is first reserved, and then the timestamp is recorded. It is possible for
effects like cache misses and IRQs to cause two adjacent trace entries to
actually have out of order timestamps.
+ It is _technically_ possible (but extremely unlikely) for a trace record to be
corrupted because of two CPUs writing to the record concurrently. For this to
happen, a CPU would need to reserve a slot and begin writing to it, then
encounter a long delay (perhaps caused by preemption). During this delay, if
sufficient entries are recorded to wrap the trace buffer, a second CPU could
claim the same slot in the ring buffer as the first, resulting in a corrupted
record with information from both CPUs recorded in it.
+ Reservation of an entry slot in the ring buffer uses atomics and
acquire/release memory order semantics. Depending on the memory ordering
details of the target architecture, this _might_ have an affect on the
reproducibility of some bugs, as read and write operations from two different
CPUs which are also JTRACEing may suddenly may not be able to be re-ordered
across the point of the JTRACE.
+ When using persistent tracing, records written to the JTRACE buffer must
be immediately flushed from cache all of the way to physical memory in order
to properly survive a spontaneous reboot. A properly configured
`persistent_ram_allocation_granularity` value should prevent these cache
cleaning operations from overlapping with anything else, but it should be
noted that the write of the entry to the trace buffer will result in a small
amount of additional data being forced from cache into physical RAM more
frequently than normal.