| # Debugging kernel crashes with JTRACE |
| |
| ## Overview |
| |
| ### What it is |
| |
| JTRACE is a simple, lightweight, "just-in-case" tracing system meant to assist |
| developers in debugging kernel crashes and spontaneous reboots by recording a |
| series of trace event records generated by instrumentation added by a developer, |
| and automatically playing them back in the event of a kernel panic. The intent |
| is to give developers the ability to create trace records from almost anywhere |
| in kernel code which will give them additional information, beyond just a |
| register dump and stack trace, to assist in understanding the sequence of events |
| which led up to a crash. Useful properties of JTRACE include: |
| |
| + Lockless ring buffer data structure: Creating a trace event will never |
| block, and will never attempt to acquire any locks, making it available for |
| use during early init, inside of arbitrary locks, during hard IRQ time, and |
| almost anywhere else in the kernel code. |
| + Low overhead: Creating trace events is a relatively cheap operation, making it |
| possible to instrument performance sensitive code with minimal timing |
| disruption. |
| + Optional persistence across reboot: Platform permitting, JTRACE may be |
| configured to use some of the persistent RAM typically used for crashlog |
| storage to assist in debugging issues involving spontaneous reboots, typically |
| caused by triggering a [hardware watchdog timer](/docs/concepts/kernel/watchdog.md). |
| |
| ### What it is not |
| |
| JTRACE is not a production tracing system. By default, it is completely |
| disabled, and it is never meant to be enabled in a production build. Trace |
| points are _not_ meant to be left in code by developers in changes which are |
| checked in. Typical usage of JTRACE involves a developer adding trace points to |
| their code, reproducing a crash, then modifying existing trace points or adding |
| new ones, and repeating until they have root caused their bug. Once the bug has |
| been found, understood, and fixed, the code which added the trace points is |
| discarded. |
| |
| ## Configuration {#configuration} |
| |
| Configuration of JTRACE is controlled using build arguments set via `fx set`. |
| There are currently 4 options which can be specified. They are: |
| |
| Name | Type | Notes |
| ----------------------------|-----------|-------- |
| `jtrace_enabled` | tri-state | Enables JTRACE when set to `true` or `"persistent"`. Disabled by default. |
| `jtrace_target_buffer_size` | uint32_t | Controls the amount of RAM reserved for the trace buffer. |
| `jtrace_last_entry_storage` | uint32_t | Controls the amount of RAM reserved for [per-cpu last event records](#per-cpu-last-event-records). |
| `jtrace_use_large_entries` | bool | Controls whether large or small [trace entries](#trace-entries) are used. |
| |
| By default, JTRACE is disabled and not included in a kernel build. To enable |
| JTRACE, users may set the `jtrace_enabled` build argument to a value of either |
| `true`, or `"persistent"` to enable JTRACE and configure it for |
| [persistent tracing](#persistent-tracing). |
| |
| If [persistent tracing](#persistent-tracing) is chosen, |
| `jtrace_target_buffer_size` *must* be a multiple of the kernel's configured |
| `persistent_ram_allocation_granularity` value (defaults to 128 bytes). |
| |
| If not explicitly specified by the user, defaults values for each of the other |
| configuration parameters will be chosen based on whether or not [persistent |
| tracing](#persistent-tracing) has been selected. |
| |
| Name | Non-persistent Default | Persistent Default |
| ----------------------------|------------------------|-------------------- |
| `jtrace_target_buffer_size` | 32768 | 4096 |
| `jtrace_last_entry_storage` | 0 | 0 |
| `jtrace_use_large_entries` | true | false |
| |
| Once the build has been configured, simply rebuild and deploy the new kernel to |
| your target in order to start to use JTRACE. |
| |
| ### Examples |
| |
| To enable non-persistent tracing with large records and 32KB worth of |
| storage, add the following to your `fx set` line. |
| |
| > --args 'jtrace_enabled=true' |
| |
| To expand this storage to 64KB and use small records instead, the arguments |
| would be: |
| |
| > --args 'jtrace_enabled=true jtrace_target_buffer_size=65536 jtrace_use_large_entries=false' |
| |
| To enable persistent tracing with large records (overriding the defaults) |
| and storage for 4 per-cpu last event records (also overriding the defaults), requesting 4KB |
| worth of storage, add the following to your `fx set` line. |
| |
| > --args 'jtrace_enabled="persistent" jtrace_last_entry_storage=4 jtrace_use_large_entries=true' |
| |
| Please note that `""`s are required around `persistent`. |
| |
| ## Adding trace points |
| |
| ### Requirements |
| |
| In order to add a trace point to a file, users must include the jtrace header by |
| adding the following `#include` statement in each file where trace points are to |
| be added. |
| |
| ```c++ |
| #include <lib/jtrace/jtrace.h> |
| ``` |
| |
| If the include file is not found, it may be necessary to add the headers to the |
| `deps` section of your module's `BUILD.gn` file. |
| |
| ```gn |
| deps = [ "//zircon/kernel/lib/jtrace:headers" ] # If deps has not been declared yet. |
| deps += [ "//zircon/kernel/lib/jtrace:headers" ] # If deps has already been declared. |
| ``` |
| |
| ### The JTRACE macro. {#jtrace-macro} |
| |
| Trace points may now be added by invoking the JTRACE macro. Every trace point |
| _must_ specify a string literal "tag", and _may_ specify as many general purpose |
| arguments as the configured [trace entry](#trace-entries) record supports. Tags |
| *must* be string literals in order to remain valid across reboots when |
| persistent tracing is configured. |
| |
| General purpose arguments do not need to be specified, and will default to 0 |
| if omitted. Additionally, no casting should be needed when specifying general |
| purpose arguments. If the argument field is large enough to store the value, it |
| will. Small trace entries have space for a single 32 bit argument, while large |
| trace entries have space for four 32 bit args, followed by two 64 bit args. |
| |
| Here is an example of using tracing, when configured for large entries. |
| |
| ```c++ |
| #include <lib/jtrace/jtrace.h> |
| |
| struct MyStruct { |
| void* ptr1, ptr2; |
| uint32_t foo, bar; |
| } |
| |
| void MyFunction(const MyStruct* args, uint32_t num_args) { |
| JTRACE("(num_args, args)", num_args, 0, 0, 0, args); |
| |
| for (uint32_t i = 0; i < num_args; ++i) { |
| const auto& a = args[i]; |
| JTRACE("Before (i, foo, bar, ptr1, ptr2)", i, a.foo, a.bar, 0, a.ptr1, a.ptr2); |
| zx_status_t status = DoAThingWithMyStruct(a); |
| JTRACE("After (status)", status); |
| } |
| |
| JTRACE("Finished"); |
| } |
| ``` |
| |
| ## Displaying the contents of a trace buffer |
| |
| When JTRACE is enabled, the contents of the trace buffer will be dumped |
| automatically whenever a kernel panic occurs, along with the register dump and |
| backtrace. |
| |
| Users may also dump the contents of the trace buffer manually using the kernel |
| shell. From a serial console, simply run `k jtrace` to dump the current trace |
| buffer, or run `k jtrace -r` to dump the contents of the recovered persistent |
| trace buffer, if available. Note that while the live trace buffer is being |
| dumped, tracing will be temporarily disabled and no new trace entries will be |
| added to the buffer. Invocations of the [JTRACE macro](#jtrace-macro) will be |
| treated as no-ops. |
| |
| ### Examples |
| |
| Here is an example of a dump of a trace buffer configured to use small entries, |
| with 4kB of storage, and 4 [Per-CPU last event records](#per-cpu-last-event-records). |
| Syscall entry and exit was instrumented using two JTRACE macros. The tag was |
| used to hold the string literal name of the syscall being made, while the first |
| general purpose argument was used to record a bool indicating whether the trace |
| marks the start, or the end, of the syscall. |
| |
| ```none |
| JTRACE: Recovered 166/166 entries |
| [ 17.431102506][cpu 0] : 00000001 : ( 0.000 uSec) : (futex_wait) |
| [ 17.431107979][cpu 0] : 00000000 : ( 5.473 uSec) : (nanosleep) |
| [ 17.431109757][cpu 0] : 00000001 : ( 1.778 uSec) : (nanosleep) |
| [ 17.431118119][cpu 0] : 00000000 : ( 8.362 uSec) : (futex_wake) |
| [ 17.431131543][cpu 0] : 00000001 : ( 13.424 uSec) : (futex_wake) |
| [ 17.431134424][cpu 0] : 00000000 : ( 2.881 uSec) : (futex_wake) |
| <snip> |
| [ 17.437545001][cpu 3] : 00000001 : ( 1.443 uSec) : (object_wait_async) |
| [ 17.437547341][cpu 3] : 00000000 : ( 2.340 uSec) : (port_wait) |
| [ 17.437548263][cpu 3] : 00000001 : ( 0.922 uSec) : (port_wait) |
| [ 17.437666664][cpu 3] : 00000000 : ( 118.401 uSec) : (channel_read_etc) |
| [ 17.437668850][cpu 3] : 00000001 : ( 2.186 uSec) : (channel_read_etc) |
| [ 17.437678993][cpu 3] : 00000000 : ( 10.143 uSec) : (debug_send_command) |
| |
| JTRACE: Last recorded per-CPU events. |
| [ 17.437001443][cpu 0] : 00000000 : ( 0.000 uSec) : (futex_wait) |
| [ 17.436016989][cpu 1] : 00000000 : ( 0.000 uSec) : (channel_call_etc_noretry) |
| [ 17.437066990][cpu 2] : 00000000 : ( 0.000 uSec) : (futex_wait) |
| [ 17.437678993][cpu 3] : 00000000 : ( 0.000 uSec) : (debug_send_command) |
| |
| JTRACE: Last log timestamp [ 17.437678993] |
| ``` |
| |
| Here is the same example, this time with large records configured. |
| |
| ```none |
| JTRACE: Recovered 47/47 entries |
| [ 24.270872973][cpu 2 tid 1301] : 00000001 00000000 00000000 00000000 0000000000000000 0000000000000000 : ( 0.000 uSec) : syscalls.cc:do_syscall_post:84 (channel_read_etc) |
| [ 24.270875158][cpu 2 tid 1301] : 00000000 00000000 00000000 00000000 0000000000000000 0000000000000000 : ( 2.185 uSec) : syscalls.cc:do_syscall_pre:59 (object_wait_async) |
| [ 24.270879083][cpu 2 tid 1301] : 00000001 00000000 00000000 00000000 0000000000000000 0000000000000000 : ( 3.925 uSec) : syscalls.cc:do_syscall_post:84 (object_wait_async) |
| [ 24.270886634][cpu 2 tid 1301] : 00000000 00000000 00000000 00000000 0000000000000000 0000000000000000 : ( 7.551 uSec) : syscalls.cc:do_syscall_pre:59 (port_wait) |
| [ 24.273031016][cpu 1 tid 1295] : 00000000 00000000 00000000 00000000 0000000000000000 0000000000000000 : ( 2144.382 uSec) : syscalls.cc:do_syscall_pre:59 (channel_write_etc) |
| [ 24.273057126][cpu 1 tid 1295] : 00000001 00000000 00000000 00000000 0000000000000000 0000000000000000 : ( 26.110 uSec) : syscalls.cc:do_syscall_post:84 (channel_write_etc) |
| <snip> |
| [ 24.273871139][cpu 1 tid 2735] : 00000001 00000000 00000000 00000000 0000000000000000 0000000000000000 : ( 5.089 uSec) : syscalls.cc:do_syscall_post:84 (object_wait_async) |
| [ 24.273875048][cpu 1 tid 2735] : 00000000 00000000 00000000 00000000 0000000000000000 0000000000000000 : ( 3.909 uSec) : syscalls.cc:do_syscall_pre:59 (port_wait) |
| [ 24.273876820][cpu 1 tid 2735] : 00000001 00000000 00000000 00000000 0000000000000000 0000000000000000 : ( 1.772 uSec) : syscalls.cc:do_syscall_post:84 (port_wait) |
| [ 24.274000887][cpu 1 tid 2735] : 00000000 00000000 00000000 00000000 0000000000000000 0000000000000000 : ( 124.067 uSec) : syscalls.cc:do_syscall_pre:59 (channel_read_etc) |
| [ 24.274003866][cpu 1 tid 2735] : 00000001 00000000 00000000 00000000 0000000000000000 0000000000000000 : ( 2.979 uSec) : syscalls.cc:do_syscall_post:84 (channel_read_etc) |
| [ 24.274019066][cpu 1 tid 2735] : 00000000 00000000 00000000 00000000 0000000000000000 0000000000000000 : ( 15.200 uSec) : syscalls.cc:do_syscall_pre:59 (debug_send_command) |
| |
| JTRACE: Last recorded per-CPU events. |
| [ 24.269107951][cpu 0 tid 2475] : 00000000 00000000 00000000 00000000 0000000000000000 0000000000000000 : ( 0.000 uSec) : syscalls.cc:do_syscall_pre:59 (debug_read) |
| [ 24.274019066][cpu 1 tid 2735] : 00000000 00000000 00000000 00000000 0000000000000000 0000000000000000 : ( 0.000 uSec) : syscalls.cc:do_syscall_pre:59 (debug_send_command) |
| [ 24.273785974][cpu 2 tid 1902] : 00000000 00000000 00000000 00000000 0000000000000000 0000000000000000 : ( 0.000 uSec) : syscalls.cc:do_syscall_pre:59 (port_wait) |
| [ 24.154375056][cpu 3 tid 28770] : 00000000 00000000 00000000 00000000 0000000000000000 0000000000000000 : ( 0.000 uSec) : syscalls.cc:do_syscall_pre:59 (futex_wait) |
| |
| JTRACE: Last log timestamp [ 24.274019066] |
| ``` |
| |
| ## Trace Entries {#trace-entries} |
| |
| Trace entries may be configured at compile time to be either "large" or "small" |
| records (see [Configuration](#configuration)). All records in the trace buffer |
| are of the same configured size. It is not possible to mix small records with |
| large records. Large entries are supersets of small entries. In other words, |
| a large entry contains all of the fields that a small record contains, plus some |
| additional records. |
| |
| Because of the (typically) limited amount of persistent DRAM available to |
| systems, small records are typically used when JTRACE is configured to be |
| persistent, and large records are used otherwise. That said, users may override |
| this behavior and force either small or large records in either situation. |
| |
| ### Small entries |
| |
| Small trace entries record the following fields. |
| |
| + Timestamp: A timestamp which records when the trace entry was added to the buffer. Recorded in |
| units of [ticks](/docs/reference/syscalls/ticks_get.md), but translated to the |
| [clock monotonic](/docs/reference/syscalls/clock_get_monotonic.md) timeline when the buffer is |
| dumped. |
| + CPU Id: The ID of the CPU which created the trace entry. |
| + Tag: A pointer to a C-string literal supplied by the developer to assist in debugging. |
| + `a`: A single 32-bit general purpose field supplied by the developer to assist in debugging. |
| |
| Small trace entries currently take up 24 bytes of storage each. |
| |
| ### Large entries |
| |
| Large entries extend small entries, adding the following fields. |
| |
| + File: The file name of the location of the trace point. |
| + Line: The line number of in the file of the location of the trace point. |
| + Function: The function name of the location of the trace point. |
| + TID: The numeric thread ID of the thread which created the trace entry. |
| + `[a-d]`: Four 32-bit general purpose fields (instead of a small record's one) supplied by the developer to assist in debugging. |
| + `[e-f]`: Two 64-bit general purpose fields supplied by the developer to assist in debugging. |
| |
| Small trace entries currently take up 72 bytes of storage each. |
| |
| ## Per-CPU Last Event Records {#per-cpu-last-event-records} |
| |
| In addition to the main trace buffer, JTRACE can be configured to separately |
| store the last events recorded by each CPU. When enabled, these records will be |
| dumped separately, after the main trace buffer dump is complete. |
| |
| This feature can be useful when attempting to debug hangs of individual CPUs, |
| especially if they lead to the triggering of a [HW WDT](/docs/concepts/kernel/watchdog.md). |
| If a CPU gets stuck for any significant amount of time, it is likely that trace |
| events from other CPUs will rapidly flush the events from the stuck CPU out of |
| the trace buffer, especially if the buffer is small because it is located in |
| persistent RAM. The separate last-event record for the stuck CPU allows |
| developers to know the last trace point executed by the stuck CPU before it |
| stopped responding. |
| |
| To enable Per-CPU last event records, set the `jtrace_last_entry_storage` |
| [configuration](#configuration) parameter to be equal to the number of CPUs |
| present in your target hardware. Note that if this number is too low, last event |
| records for CPU IDs >= the configured number never be recorded. If the number is |
| too high, some of the entries will never have a record logged to them and some |
| of the storage reserved in the trace buffer will be wasted. |
| |
| ## Persistent Tracing {#persistent-tracing} |
| |
| ### Overview |
| |
| Occasionally, a developer might find themselves in the unfortunate position of |
| needing to debug a complete system hang. When these occur, it can be very |
| difficult to gain any insights as all ability to observe the state of the |
| system, or what led up to it, is lost. This is a situation where persistent |
| tracing may be able to help. |
| |
| When [configured](#configuration) for persistent tracing, the trace buffer is |
| placed in special "persistent RAM" provided to the kernel by the bootloader. |
| Typically, this is a region of physical RAM which is guaranteed to not be |
| touched by the bootloader(s) as the system boots, meaning that it is possible to |
| put information in the region which manages to survive a full HW reset of the |
| system. |
| |
| ### Usage |
| |
| In order to enable persistent tracing, set the `jtrace_enabled` to the value |
| `"persistent`". After a hang occurs, reboot the system (either via a |
| [HW WDT](/docs/concepts/kernel/watchdog.md), or a manual reset line in the |
| target). Once the system has rebooted, if the recovered trace buffer was |
| successfully recovered, it may be dumped by executing `k jtrace -r` from a |
| serial shell. Note that persistent RAM is typically just a section of DRAM which |
| is put aside by the bootloader chain, and it may suffer corruption in the time |
| it takes for the early stage bootloaders to re-enable dynamic RAM refresh. The |
| trace system will make every effort to dump what it can from the recovered |
| buffer, falling back to a hexdump of the region if needed, however some care |
| should be taken when interpreting the results of this dump if there is any |
| possibility of corruption as a result of the reboot. |
| |
| Not all targets currently support a direct memory-mappable persistent RAM |
| region, and when they do, this region is frequently very small. Currently, only |
| ARM64 targets support these regions, so persistent tracing is not currently |
| available for x64 systems. The `jtrace_target_buffer_size` value configured by |
| the user is simply a request for storage, there is no guarantee that there will |
| be enough storage to support the request. In the case that there is not, the |
| trace system will claim as much storage as it can, and make due with what it is |
| able to reserve. After booting, users can example the current configuration of |
| the JTRACE subsystem by running `k jtrace -i` from a serial console. For |
| example on a target with 8kB of persistent RAM, a request for a 32kB buffer |
| might result in the following. |
| |
| ```none |
| JTRACE configuration |
| -------------------- |
| Requested Buffer Size : 32768 |
| Allocated Buffer Size : 6144 |
| Allocated Buffer Loc : 0xffff00023fff8000 |
| Per-CPU last entry cnt : 0 |
| Large entries : no |
| Persistent : yes |
| ``` |
| |
| 32kB was requested, but only 8kB was available, and 2kB of that was reserved for |
| the crashlog up front, leaving only 6kB for the trace buffer. |
| |
| ## Caveats |
| |
| By its nature, there are some side effects and limitations of the JTRACE system |
| which users should keep in mind. |
| |
| + Trace records do not _have_ to exist in monotonically increasing timestamp |
| order. Because of the lockless nature of the ring buffer storage, a record |
| slot is first reserved, and then the timestamp is recorded. It is possible for |
| effects like cache misses and IRQs to cause two adjacent trace entries to |
| actually have out of order timestamps. |
| + It is _technically_ possible (but extremely unlikely) for a trace record to be |
| corrupted because of two CPUs writing to the record concurrently. For this to |
| happen, a CPU would need to reserve a slot and begin writing to it, then |
| encounter a long delay (perhaps caused by preemption). During this delay, if |
| sufficient entries are recorded to wrap the trace buffer, a second CPU could |
| claim the same slot in the ring buffer as the first, resulting in a corrupted |
| record with information from both CPUs recorded in it. |
| + Reservation of an entry slot in the ring buffer uses atomics and |
| acquire/release memory order semantics. Depending on the memory ordering |
| details of the target architecture, this _might_ have an affect on the |
| reproducibility of some bugs, as read and write operations from two different |
| CPUs which are also JTRACEing may suddenly may not be able to be re-ordered |
| across the point of the JTRACE. |
| + When using persistent tracing, records written to the JTRACE buffer must |
| be immediately flushed from cache all of the way to physical memory in order |
| to properly survive a spontaneous reboot. A properly configured |
| `persistent_ram_allocation_granularity` value should prevent these cache |
| cleaning operations from overlapping with anything else, but it should be |
| noted that the write of the entry to the trace buffer will result in a small |
| amount of additional data being forced from cache into physical RAM more |
| frequently than normal. |