| # Zircon ELF Core Dump Support |
| |
| This library implements the Zircon incarnation of traditional "core dump" |
| support using the ELF file format. The ELF core file format is simple and can |
| hold a flexible amount of information, though a dump is usually very |
| complete. In contrast to other dump formats such as "minidump", core files |
| tend to be large and complete, rather than compact and sufficient. The |
| format allows the dump-writer some leeway in choosing how much data to include. |
| |
| The library provides a flexible callback-based C++ API for controlling the |
| various phases of collecting data for a dump. The library produces dumps in a |
| streaming fashion, with disposition of the data left to callbacks. |
| |
| A simple writer using POSIX I/O is provided to plug into the callback API to |
| stream to a file descriptor. This works with either seekable or non-seekable |
| file descriptors, seeking forward over gaps of zero padding when possible. |
| |
| [TOC] |
| |
| ## Core file format |
| |
| The dump of a process is represented by an ELF file. The ELF header's class, |
| byte-order (always 64-bit and little-endian for Fuchsia), and `e_machine` |
| fields represent the machine, and `e_type` is `ET_CORE`. |
| |
| According to the standard format, `ET_CORE` files have program headers but no |
| section headers (not counting the `PN_XNUM` protocol for large numbers of |
| program headers, which uses a special section header). Each `PT_LOAD` segment |
| represents a memory mapping. One or more `PT_NOTE` segments give additional |
| information about the process and (optionally) its threads. |
| |
| ### Memory segments |
| |
| The representation of memory in core dumps is standard across systems. Zircon |
| core dumps do not deviate. |
| |
| The `PT_LOAD` segments represent all of the address space of the process. Each |
| uses a `p_align` value of page size, and both its `p_vaddr` and `p_offset` are |
| aligned to page size. As ELF requires, `PT_LOAD` segments are in ascending |
| order of address (`p_vaddr`) and do not overlap. Every gap not covered by the |
| `p_vaddr` and `p_memsz` of some `PT_LOAD` segment should be a hole in the |
| address space where nothing was mapped in the process. |
| |
| Each `PT_LOAD` segment has a `p_filesz` that may be anywhere from zero up to |
| its `p_memsz`. The `p_memsz` value says how much of the address space this |
| mapping took up in the process, and is always a multiple of page size. The |
| `p_filesz` value is what leading subset of that memory is included in the dump. |
| It's usually a multiple of page size too, but is not required to be. If it's |
| zero or less than the full `p_memsz` value, that means the dump-writer decided |
| to elide or truncate the contents of this memory. That could be because it |
| would just read as zero (though the dump-writer could instead leave a |
| sparse-file "hole" for zero pages when the filesystem supports that); or |
| because it was memory that the writer's policy said should never be dumped, |
| such as device memory or shared memory; or simply memory that the writer |
| decided was too uninteresting or too big to include, such as read-only program |
| code and data attributable to mapped files. In Zircon core dumps, the |
| information about mappings and VMOs in the process-wide notes (below) may shed |
| additional light on what some elided memory was or why it was not dumped. |
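| |
The relationship between `p_vaddr`, `p_memsz`, and `p_filesz` described above can be sketched in a few lines of C++. This is only an illustration: the `LoadSegment` struct, the `Contents` enum, and both function names are hypothetical stand-ins invented here, not part of the library or of any ELF header.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for the relevant Elf64_Phdr fields of one PT_LOAD.
struct LoadSegment {
  uint64_t vaddr;   // p_vaddr
  uint64_t memsz;   // p_memsz
  uint64_t filesz;  // p_filesz
};

enum class Contents { kFull, kTruncated, kElided };

// Classifies how much of the mapping's memory the dump-writer kept.
Contents Classify(const LoadSegment& seg) {
  if (seg.filesz == 0) return Contents::kElided;
  return seg.filesz < seg.memsz ? Contents::kTruncated : Contents::kFull;
}

// Returns how many bytes starting at `addr` are present in the dump, or
// std::nullopt if `addr` falls in a hole not covered by any segment.
// Assumes `segments` is sorted by vaddr and non-overlapping.
std::optional<uint64_t> DumpedBytesAt(const std::vector<LoadSegment>& segments,
                                      uint64_t addr) {
  for (const LoadSegment& seg : segments) {
    if (addr < seg.vaddr) break;  // Sorted: no later segment can cover addr.
    uint64_t offset = addr - seg.vaddr;
    if (offset < seg.memsz) {
      // Only the leading p_filesz bytes of the mapping are in the file.
      return offset < seg.filesz ? seg.filesz - offset : 0;
    }
  }
  return std::nullopt;
}
```

A result of zero means the address was mapped in the process but its contents were left out of the dump, which is a different situation from an address that lands in a hole.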
| |
| ### Note segments |
| |
| `ET_CORE` files also have `PT_NOTE` segments providing additional information |
| about the process. The details of the note formats vary widely by system, |
| though all use the ELF note container format. A segment with `p_offset` and |
| nonzero `p_filesz` but a zero `p_vaddr` and zero `p_memsz` is recognized as a |
| "non-allocated" segment, which holds offline data but does not correspond to |
| the process address space. This kind of segment is used in `ET_CORE` files. |
| |
| In Zircon core dumps, there is a single non-allocated `PT_NOTE` segment that |
| appears before all the `PT_LOAD` segments (both in its order in the program |
| header table and in the order of its `p_offset` locating data in the file). |
| This contains several notes using different name (string) and type (integer) |
| values to represent process and thread state. These map directly to state |
| reported by the Zircon kernel ABI. |
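| |
As a rough illustration of the ELF note container format, the sketch below walks the bytes of a `PT_NOTE` segment and splits them into (name, type, payload) records. It assumes the common 4-byte padding of the name and description fields and a little-endian host. `Note`, `ParseNotes`, and `AppendNote` are hypothetical names used only for this example; `AppendNote` exists just to build sample input.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

struct Note {
  std::string name;           // Note name without the terminating NUL.
  uint32_t type = 0;          // n_type
  std::vector<uint8_t> desc;  // "Description" (payload) bytes.
};

// Parses note records until the data runs out or a record is truncated.
std::vector<Note> ParseNotes(const uint8_t* data, size_t size) {
  auto align4 = [](uint32_t n) { return (uint64_t{n} + 3) & ~uint64_t{3}; };
  std::vector<Note> notes;
  size_t pos = 0;
  while (size - pos >= 12) {
    uint32_t namesz, descsz, type;  // The three-word note header.
    std::memcpy(&namesz, data + pos, 4);
    std::memcpy(&descsz, data + pos + 4, 4);
    std::memcpy(&type, data + pos + 8, 4);
    pos += 12;
    uint64_t name_span = align4(namesz), desc_span = align4(descsz);
    if (name_span + desc_span > size - pos) break;  // Truncated note.
    Note note;
    note.type = type;
    note.name.assign(reinterpret_cast<const char*>(data + pos), namesz);
    while (!note.name.empty() && note.name.back() == '\0') note.name.pop_back();
    pos += static_cast<size_t>(name_span);
    note.desc.assign(data + pos, data + pos + descsz);
    pos += static_cast<size_t>(desc_span);
    notes.push_back(std::move(note));
  }
  return notes;
}

// Appends one note record to `out` (which must already be 4-byte aligned).
void AppendNote(std::vector<uint8_t>& out, const std::string& name,
                uint32_t type, const std::vector<uint8_t>& desc) {
  auto put32 = [&out](uint32_t v) {
    uint8_t bytes[4];
    std::memcpy(bytes, &v, 4);
    out.insert(out.end(), bytes, bytes + 4);
  };
  put32(static_cast<uint32_t>(name.size() + 1));  // n_namesz counts the NUL.
  put32(static_cast<uint32_t>(desc.size()));
  put32(type);
  out.insert(out.end(), name.begin(), name.end());
  out.push_back(0);
  while (out.size() % 4 != 0) out.push_back(0);
  out.insert(out.end(), desc.begin(), desc.end());
  while (out.size() % 4 != 0) out.push_back(0);
}
```

A reader built along these lines would dispatch on `name` first and then interpret `type` according to that name.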
| |
| #### Process-wide notes |
| |
| The first series of notes describe process-wide state. |
| |
| ##### ZirconProcessInfo |
| |
| ELF notes using the name `ZirconProcessInfo` contain all the types that |
| `zx_object_get_info` yields on a Zircon process. The ELF note's 32-bit type is |
| exactly the `zx_object_info_topic_t` value in `zx_object_get_info`. The note's |
| "description" (payload) has the size and layout that corresponds to that topic. |
| All available types are usually included in the dump. |
| |
| ##### ZirconProcessProperty |
| |
| ELF notes using the name `ZirconProcessProperty` contain all the types that |
| `zx_object_get_property` yields on a Zircon process. The ELF note's 32-bit |
| type is exactly the `property` argument to `zx_object_get_property`. The |
| note's "description" (payload) has the size and layout that corresponds to that |
| property. All available properties are usually included in the dump. |
| |
| ##### Note ordering |
| |
| The first note is always for `ZX_INFO_HANDLE_BASIC`; this has the process KOID |
| (aka PID). (Note that the `rights` field indicates the rights the dump-writer |
| had to dump the process; this does not represent any handle present in the |
| process.) The second note is always for `ZX_PROP_NAME`. The set of remaining |
| notes and their order is unspecified and subject to change. Dumps generally |
| include all the information the kernel makes available, but a dump-writer might |
| be configured to omit some information or might be forced to omit some |
| information due to runtime errors from the system calls to collect data. |
| |
| #### Per-thread notes |
| |
| Additional sets of notes describe each thread in the process. These notes are |
| not always included in the dump, at the discretion of the dump-writer. The |
| process-wide data and memory can be collected while letting the threads |
| continue to run. In that case, the data and/or threads may be mutually |
| inconsistent since threads changed things while the dump was being taken. |
| Moreover, such a dump contains no per-thread data whatsoever. Ordinarily, the |
| dump-writer suspends the process and each thread before collecting any data. |
| Only when all data and memory have been dumped does it allow those threads to |
| run again. In this (usual) case, full information is dumped about each thread. |
| |
| There is no formal grouping or separation between the notes for one thread and |
| the next. All the notes for one thread appear, then all the notes for the next |
| thread. This is in the order those threads were reported by the kernel, |
| usually chronological order of their creation. |
| |
| The first note for each thread is always for `ZX_INFO_HANDLE_BASIC`; this has |
| the thread KOID and indicates that following notes apply to that thread. (Note |
| that the `rights` field indicates the rights the dump-writer had to dump the |
| thread; this does not represent any handle present in the process.) The second |
| note for each thread is always for `ZX_PROP_NAME`. The set of remaining notes |
| and their order is unspecified and subject to change; see above. |
| |
| The dump-writer normally tries to include every known per-thread note for each |
| thread. Some types are not available because they aren't used on the current |
| machine or because the thread was already dying when the dump started, but some |
| might be elided just because their contents are boring. If a known type is |
| omitted for a thread, it usually means there was no interesting data to report. |
| |
| ##### ZirconThreadInfo |
| |
| ELF notes using the name `ZirconThreadInfo` contain all the types that |
| `zx_object_get_info` yields on a Zircon thread. The ELF note's 32-bit type is |
| exactly the `zx_object_info_topic_t` value in `zx_object_get_info`. The note's |
| "description" (payload) has the size and layout that corresponds to that topic. |
| As mentioned above, the `ZX_INFO_HANDLE_BASIC` note comes first and provides |
| the KOID that can be used as a unique identifier for the thread across the |
| whole dump. |
| |
| ##### ZirconThreadProperty |
| |
| ELF notes using the name `ZirconThreadProperty` contain all the data that |
| `zx_object_get_property` yields on a Zircon thread. The ELF note's 32-bit type |
| is exactly the `property` argument to `zx_object_get_property`. The note's |
| "description" (payload) has the size and layout that corresponds to that |
| property. |
| |
| ##### ZirconThreadState |
| |
| ELF notes using the name `ZirconThreadState` contain all the data that |
| `zx_thread_read_state` yields on a Zircon thread. The ELF note's 32-bit type |
| is exactly the `zx_thread_state_topic_t` argument to `zx_thread_read_state`. |
| The note's "description" (payload) has the size and layout that corresponds to |
| that topic's `zx_thread_state_*_t` type. The types and layouts that will |
| appear vary by machine. |
| |
| #### Build ID Notes |
| |
| Zircon `ET_CORE` files usually contain additional `PT_NOTE` program headers. |
| These come after a `PT_LOAD` header and have a `p_vaddr` and `p_memsz` that |
| locates their data inside that load segment. These additional notes locate |
| places in the image where the dump-writer identified ELF build ID notes inside |
| ELF images mapped into the process memory. The notes don't convey any _new_ |
| data; each is just a small region of the process memory already in the dump. But |
| they save a reader of the dump the trouble of scanning the memory image for |
| ELF images and extracting build ID notes. Instead, each build ID note that |
| the dump-writer came across appears directly as a note in the `ET_CORE` file |
| when examined by normal ELF tools. |
| |
| ## Job archives |
| |
| As well as an individual process, a Zircon job can be dumped into a file |
| called a "job archive" that represents the job itself and may include dumps |
| for its processes and/or child jobs, at the discretion of the dump-writer. |
| |
| A job archive is a standard `ar` format archive (like `.a` files for linking). |
| The archive's member files can be listed and extracted using the standard `ar` |
| tool. It has the standard archive header and uses the standard long-name table |
| special member, but does not have a symbol table special member like the |
| archives used for static linking. |
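| |
To make the `ar` container concrete, here is a minimal sketch that lists the member headers of an in-memory archive. It relies only on the traditional 60-byte member header layout and the 2-byte alignment of members. It does not resolve names through the long-name table, so `raw_name` is simply the header's name field. `ArMember`, `ListMembers`, and `AppendMember` are hypothetical names for this example; `AppendMember` exists only to build sample input and writes short names straight into the header.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

constexpr std::string_view kArMagic = "!<arch>\n";
constexpr size_t kHeaderSize = 60;

struct ArMember {
  std::string raw_name;  // ar_name field with trailing spaces removed.
  uint64_t date = 0;     // ar_date, seconds since the Epoch.
  size_t offset = 0;     // File offset of the member's contents.
  uint64_t size = 0;     // ar_size, length of the contents.
};

// Parses a space-padded decimal header field.
uint64_t ParseDecimal(std::string_view field) {
  uint64_t value = 0;
  for (char c : field) {
    if (c < '0' || c > '9') break;
    value = value * 10 + static_cast<uint64_t>(c - '0');
  }
  return value;
}

std::vector<ArMember> ListMembers(std::string_view file) {
  std::vector<ArMember> members;
  if (file.substr(0, kArMagic.size()) != kArMagic) return members;
  size_t pos = kArMagic.size();
  while (file.size() - pos >= kHeaderSize) {
    std::string_view header = file.substr(pos, kHeaderSize);
    if (header.substr(58, 2) != "`\n") break;  // Bad ar_fmag.
    std::string_view name = header.substr(0, 16);
    name = name.substr(0, name.find_last_not_of(' ') + 1);
    ArMember member;
    member.raw_name = std::string(name);
    member.date = ParseDecimal(header.substr(16, 12));
    member.size = ParseDecimal(header.substr(48, 10));
    member.offset = pos + kHeaderSize;
    if (member.size > file.size() - member.offset) break;  // Truncated.
    members.push_back(member);
    uint64_t next = member.offset + member.size + (member.size & 1);
    if (next > file.size()) break;
    pos = static_cast<size_t>(next);
  }
  return members;
}

// Appends one member with a short name (at most 16 characters).
void AppendMember(std::string& file, std::string_view name, uint64_t date,
                  std::string_view contents) {
  auto field = [](std::string text, size_t width) {
    text.resize(width, ' ');
    return text;
  };
  file += field(std::string(name), 16);
  file += field(std::to_string(date), 12);
  file += field("0", 6);    // ar_uid
  file += field("0", 6);    // ar_gid
  file += field("400", 8);  // ar_mode
  file += field(std::to_string(contents.size()), 10);
  file += "`\n";
  file += contents;
  if (contents.size() % 2 != 0) file += '\n';  // Pad to an even offset.
}
```

The `date` field parsed here is the same per-member date that `ar tv` displays.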
| |
| The initial portion of the job archive contains member files that describe the |
| job itself. This subset alone is called the "stub archive". After the stub |
| archive, there may be additional member files containing dumps for processes |
| or child jobs. |
| |
| ### Stub archive |
| |
| The member files of the stub archive are analogous to the notes in an |
| `ET_CORE` file as described above. Rather than using ELF note format, the |
| name of each file encodes its type. |
| |
| #### ZirconJobInfo |
| |
| Member files with name `ZirconJobInfo.%u` contain all the types that |
| `zx_object_get_info` yields on a Zircon job. `%u` is the decimal |
| representation of the `zx_object_info_topic_t` value in `zx_object_get_info`. |
| The file has the size and layout that corresponds to that topic. All |
| available types are usually included in the dump. |
| |
| #### ZirconJobProperty |
| |
| Member files with name `ZirconJobProperty.%u` contain all the types that |
| `zx_object_get_property` yields on a Zircon job. `%u` is the decimal |
| representation of the `property` argument to `zx_object_get_property`. The |
| file has the size and layout that corresponds to that property. All available |
| properties are usually included in the dump. |
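| |
A reader might recognize stub archive members by name along the following lines. This is a sketch only: `StubMember` and `ParseStubMemberName` are hypothetical names, and the function expects the full member name after any long-name table lookup has already been done.

```cpp
#include <cassert>
#include <charconv>
#include <cstdint>
#include <optional>
#include <string_view>
#include <system_error>

struct StubMember {
  std::string_view kind;  // "ZirconJobInfo" or "ZirconJobProperty".
  uint32_t number;        // Info topic or property number from the `%u` part.
};

constexpr std::string_view kPrefixes[] = {"ZirconJobInfo.",
                                          "ZirconJobProperty."};

// Returns the decoded name, or std::nullopt if `name` is not a stub member
// name of the form `<kind>.<decimal>`.
std::optional<StubMember> ParseStubMemberName(std::string_view name) {
  for (std::string_view prefix : kPrefixes) {
    if (name.substr(0, prefix.size()) != prefix) continue;
    std::string_view digits = name.substr(prefix.size());
    const char* end = digits.data() + digits.size();
    uint32_t number = 0;
    auto [ptr, ec] = std::from_chars(digits.data(), end, number);
    // Reject empty, non-numeric, trailing-garbage, and out-of-range suffixes.
    if (ec != std::errc{} || ptr != end) return std::nullopt;
    return StubMember{prefix.substr(0, prefix.size() - 1), number};
  }
  return std::nullopt;
}
```

Members whose names do not match, such as process or child job dumps, are left to be identified by their contents.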
| |
| ### Process dump member files |
| |
| A job archive can include the whole dumps for the processes within the job. A |
| member file for a process dump is an `ET_CORE` file as described above. The |
| name of a process dump member file doesn't matter, but it is usually `core.%u` |
| where `%u` is the decimal representation of the process KOID (aka PID). The |
| definitive process KOID should be discovered from the notes inside the process |
| dump member file itself, not by parsing the member file name. |
| |
| ### Child job dump member files |
| |
| A job archive can include the whole dumps for child jobs within the job. A |
| member file for a child job dump is itself another job archive file. The name |
| of a child job dump member file doesn't matter, but it is usually `core.%u.a` |
| where `%u` is the decimal representation of the job KOID. The definitive job |
| KOID should be discovered from the information inside the job archive itself, |
| not by parsing the member file name. |
| |
| ### Member file ordering |
| |
| Robust readers can ignore the order of member files in a job archive and |
| recognize the stub archive members by name and others by their contents. This |
| works with job archives unpacked and repacked using `ar` or similar tools. |
| However, standard Zircon job archives are streamed out in a specific order. |
| |
| All member files that form the stub archive appear before any process or child |
| job dump member files. Usually all members in the stub archive use the long |
| name table and additional member files for process or child job dumps have |
| short names truncated to fit in the traditional member header. |
| |
| The first member in the stub archive is for `ZX_INFO_HANDLE_BASIC`; this has |
| the job KOID. (Note that the `rights` field indicates the rights the |
| dump-writer had to dump the job; this does not represent any handle present in |
| any dumped process.) The second member is always for `ZX_PROP_NAME`. The set |
| of remaining members and their order is unspecified and subject to change. |
| Dumps generally include all the information the kernel makes available. |
| |
| The order of process and/or child dump member files doesn't matter, but |
| usually all the processes are dumped and then all the child jobs, in the order |
| the kernel reported them as seen in the stub archive. The definitive type and |
| KOID of each process or child dump member file should be discovered from the |
| information inside the member file itself, not from member ordering or names. |
| |
| ### Flattened job archives |
| |
| When a job archive includes child job dumps, this can be done in two ways. |
| |
| In a hierarchical job archive, there are `ET_CORE` member files for each |
| process in the job and job archive member files for each child job. Each |
| child job's member file is itself another hierarchical job archive that might |
| contain a grandchild job archive, etc. |
| |
| In a flattened job archive, the archive member file for a child job is just |
| the stub archive that describes the job itself. That child job's processes |
| appear as member files in the flattened job archive, not inside the contained |
| job archive for the child job. Likewise, any grandchild jobs appear as member |
| files in the flattened job archive that are themselves just stub archives, |
| followed by the grandchild's processes and the great-grandchildren, etc. |
| |
| A hierarchical job archive preserves the job hierarchy in the structure of the |
| files. A flattened job archive loses that structure, requiring a reader to |
| reconstruct it from the process and child KOID lists in each stub archive. |
| |
| The very simple way the traditional `ar` archive format works means that the |
| hierarchical and flattened job archives for the same job tree are exactly the |
| same size and have all the same contents in all the same places. The only |
| difference is in the member header for a child job archive, which either says |
| the member extends to include the following additional dump members or that it |
| stops after just the stub archive so those members come after the child's stub |
| archive in the outer archive. |
| |
| Because the whole child job archive's size must be determined in advance, |
| hierarchical job archives require holding all the processes in the child's |
| whole subtree suspended while dumping the whole child's job archive _en masse_. |
| A flattened job archive can always be streamed out piecemeal while only one |
| process at a time is held suspended long enough to dump it. Thus streaming out |
| a flattened job archive will usually go more quickly than the equivalent |
| hierarchical job archive. However, the hierarchical job archive also ensures |
| that the state shown in the dump is synchronized across all the processes in |
| the hierarchy since they were all kept suspended while doing all the dumping. |
| |
| ## ZirconSystem.json |
| |
| The dump-writer may choose to include system-wide information in dumps. This |
| is information collected on the system at the time the dump is taken that may |
| be specific to the particular hardware or instance of the system but is not |
| specific to any single process or job on the system. It can be included in a |
| job archive, or in an `ET_CORE` file, or both. When system-wide information is |
| included in a job archive, then any child job or process dumps within the |
| archive might contain a copy of the same information, or they might omit it |
| since it is always the same across the whole job hierarchy. |
| |
| The system-wide information is encoded in UTF-8 JSON text. In a job archive, |
| it's found in a member file called `ZirconSystem.json`. In an `ET_CORE` file, |
| it's found in the ELF note with name `ZirconSystem.json` and `n_type` of zero. |
| |
| The JSON schema is subject to future extension, but it's a JSON object |
| (i.e. key/value dictionary) with a simple mapping to the `zx::system` kernel |
| interfaces, e.g. `"version_string"` maps to a JSON string value that |
| `zx_system_get_version_string()` returned, and `"num_cpus"` maps to a JSON |
| integer value that `zx_system_get_num_cpus()` returned. |
| |
| ## ZirconKernelInfo |
| |
| The dump-writer may choose to include privileged kernel information in dumps. |
| This is information collected on the system at the time the dump is taken that |
| contains details about system-wide privileged kernel state. Like the public |
| system-wide information in `ZirconSystem.json` notes, this data is not specific |
| to any single process or job on the system. Unlike that information, this is |
| data that actively changes while the system runs. It can be included in a job |
| archive, or in an `ET_CORE` file, or both. When kernel information is included |
| in a job archive, then any child job or process dumps within the archive might |
| contain a copy of the same information, or they might omit it since it is |
| always the same across the whole job hierarchy (unless collected at different |
| times). |
| |
| In job archives, member files with name `ZirconKernelInfo.%u` contain kernel |
| information that `zx_object_get_info` yields on Zircon "resource" objects. |
| `%u` is the decimal representation of the `zx_object_info_topic_t` value in |
| `zx_object_get_info`. The file has the size and layout that corresponds to |
| that topic. All available types are usually included in the dump. |
| |
| In `ET_CORE` files, ELF notes using the name `ZirconKernelInfo` contain the |
| same information. The ELF note's 32-bit type is exactly the |
| `zx_object_info_topic_t` value in `zx_object_get_info`. The note's |
| "description" (payload) has the size and layout that corresponds to that topic. |
| |
| ## Dump timestamps |
| |
| The dump-writer may choose to include a timestamp indicating when a dump was |
| taken. In job archives, each archive member file records a date; `ar tv` on an |
| archive file will display each member's date. Timestamps the dump-writer chose |
| to elide appear as zero (i.e. 1970-01-01T00:00 UTC). The dump-writer usually |
| samples the UTC clock just before collecting each job's job-wide information |
| and makes that the date of each member file in the "stub archive". It samples |
| the clock again just before collecting each process, and makes that the date of |
| the process dump member file. Hence, by convention the "dump date" indicates |
| the earliest time any data was collected, though some of the data collected may |
| reflect state changed by processes doing more work after collection began. |
| |
| The dump-writer may also choose to include the timestamp in an `ET_CORE` file. |
| This is represented by the ELF note with name `ZirconDumpDate` and `n_type` of |
| zero. It contains a 64-bit POSIX `time_t` "seconds since Epoch" value. This |
| note does not appear at all if the dump-writer chose to elide the timestamp. |
| |
| ## Dump remarks |
| |
| A dump can include arbitrary additional data provided to the dump-writer. |
| These are called "dump remarks". This is additional information whose format |
| and meaning is not specified by the overall dump format, nor necessarily known |
| to the dump-writer. Dump remarks can be included in a job archive, or in an |
| `ET_CORE` file, or both. When a job archive includes dump remarks, those may |
| be meant to apply to all dumps in the archive, or only to the specific job. |
| |
| In a job archive, any member file whose name begins with `ZirconDump.` holds |
| dump remarks. In an `ET_CORE` file, dump remarks are found in ELF notes with |
| names that begin with `ZirconDump.` and `n_type` of zero. The exact name must |
| have some suffix after `ZirconDump.`. The exact format and meaning of dump |
| remarks is set only by convention based on that full name. By convention, |
| remarks with names ending in `.json` are encoded as UTF-8 JSON text and remarks |
| with names ending in `.txt` are UTF-8 plain text. It's recommended that other |
| dump remarks use names that similarly end in something that looks like a |
| filename extension appropriate for their format (e.g. a custom non-text format |
| without a canonical filename extension might use `ZirconDump.something.bin`). |
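| |
The naming convention for dump remarks could be checked with something like the following sketch. `RemarkKind` and `ClassifyRemarkName` are hypothetical names for this example. The function looks only at the member or note name; checking that an ELF note's `n_type` is zero is left to the caller.

```cpp
#include <cassert>
#include <string_view>

enum class RemarkKind { kNotRemark, kJson, kText, kOther };

// Classifies a name by the `ZirconDump.` prefix and its extension-like ending.
RemarkKind ClassifyRemarkName(std::string_view name) {
  constexpr std::string_view kPrefix = "ZirconDump.";
  // The name must have some suffix after the prefix.
  if (name.size() <= kPrefix.size() ||
      name.substr(0, kPrefix.size()) != kPrefix) {
    return RemarkKind::kNotRemark;
  }
  auto ends_with = [name](std::string_view suffix) {
    return name.size() >= suffix.size() &&
           name.substr(name.size() - suffix.size()) == suffix;
  };
  if (ends_with(".json")) return RemarkKind::kJson;
  if (ends_with(".txt")) return RemarkKind::kText;
  return RemarkKind::kOther;  // Format known only by convention on the name.
}
```

Note that `ZirconDumpDate` and `ZirconSystem.json` are not remarks under this rule, since neither begins with `ZirconDump.` followed by a suffix.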
| |
| ## Reader API |
| |
| The `zxdump` C++ library provides an API for reading dumps as well as one for |
| creating them. As described above, dumps can contain all the kinds of |
| information the Zircon kernel reports about processes, threads, and jobs, |
| using the kernel API's own formats. So the library interface for reading |
| information from core dumps and job archives has striking parallels with the |
| Zircon system call interface. In fact, most of the interface is what a |
| read-only subset of the Zircon API might look like in a new style of C++ |
| language binding. However, this is an API available on all host platforms as |
| well as on Fuchsia. |
| |
| ### `zxdump::TaskHolder` |
| |
| The [`<lib/zxdump/task.h>`](include/lib/zxdump/task.h) header describes this |
| API in detail. The `zxdump::TaskHolder` object is the root container used to |
| represent dump data in memory. As the name implies, it holds a set of related |
| "task" objects, that is `zxdump::Job`, `zxdump::Process`, and `zxdump::Thread` |
| objects. The holder can be fed dump files, either `ET_CORE` files with single |
| process dumps or job archives that can contain multiple dumps. This is done |
| with the `Insert` method to "insert" a dump into the holder by file descriptor. |
| |
| ### Job, Process, & Thread Objects |
| |
| Each Zircon kernel object read from dumps is represented by a C++ object. All |
| these objects are owned by the `zxdump::TaskHolder` object and are always used |
| by reference. The API mirrors the Zircon system call API for the same kernel |
| object types. `zxdump::Job`, `zxdump::Process`, and `zxdump::Thread` classes |
| are derived from a common base class `zxdump::Task`. |
| |
| Each has an object type and a KOID (aka PID in the case of processes) exactly |
| reflecting the Zircon kernel objects in the snapshot of the running system |
| taken by the dump. The `get_info` and `get_property` methods return all the |
| object-type-specific information captured in the snapshot, using the Zircon |
| system call API's own data structures. `zxdump::Thread` objects also have |
| `read_state` methods. The preferred form of each of these uses strong typing |
| via a template parameter selecting the topic, property, or state kind to avoid |
| the hassles and unsafety of the raw buffer and size in the C system call API. |
| |
| As in the live system's API, the various "KOID list" topics from `get_info` can |
| be used with the `get_child` method to navigate the task hierarchy, from job to |
| child job, from job to process, and from process to thread. A more convenient |
| and efficient `find` method is also provided to look up a descendent task by |
| KOID from any job or process above it in the hierarchy. `zxdump::Job` and |
| `zxdump::Process` objects also have convenience methods that return |
| `std::map<zx_koid_t, zxdump::...>` for the children, processes, and threads |
| lists for doing full enumeration. |
| |
| ### Task Hierarchy & Reading Multiple Dumps |
| |
| A single ELF core dump file describes only one process (with all its threads). |
| A job archive can describe any number of jobs and processes. Any particular |
| dump file, whether a single-process dump or a job archive, might be only one |
| slice of the picture that needs to be reassembled for post-mortem analysis. |
| |
| The `zxdump::TaskHolder` API supports reading multiple dump files into a |
| single, unified view of the data. As each dump is inserted, new job and |
| process objects are collated by matching up task KOIDs with children and |
| process KOID lists. The tasks thus "self-assemble" into a task hierarchy |
| replicating a partial view of the live system's task hierarchy. |
| |
| #### Root Job & The Super-Root |
| |
| Navigating the task hierarchy of a `zxdump::TaskHolder` works just like in the |
| Zircon system call API: start with the root job, and enumerate children. In |
| the zxdump case, the `zxdump::TaskHolder::root_job()` method simply returns |
| the root job object. |
| |
| It's possible that a single job archive, or a collection of job archives |
| together, actually represent the root job of a system instance and all its |
| descendent tasks. More often, the reader API is only looking at a partial |
| view of some subset of the tree. This might be a strict subtree with a single |
| parent job that can be considered the "local root job". But it could also be |
| just a collection of jobs that don't all share ancestry that's visible in the |
| dump data. It may well be only a collection of individual process dumps and |
| no job information at all to indicate any kind of hierarchy above the threads |
| within each process. The reader API handles all these cases. |
| |
| When job archives provide a coherent view that assembles into a single tree |
| with one root job, then the `zxdump::Job` object returned by `root_job()` is |
| just this job, with all its job-specific information as well as its children |
| and process lists. |
| |
| In other cases, `root_job()` is actually a special placeholder `zxdump::Job` |
| object called the "super-root". This object doesn't correspond to any real |
| Zircon kernel object from the dumped system. It serves only to provide the |
| child job and process lists that a real root job would provide. The |
| placeholder object has KOID of zero and no other job information to report. |
| All it does have is a children list and a processes list, which appear like the |
| normal job `get_info` topics for those KOID lists even though no other topics |
| are available. Every "orphaned" job or process whose parent job wasn't |
| described in any dump file will appear to be a child job or process of the |
| super-root. (When the only task without a known parent is a job, then that job |
| becomes the "real" root job instead and there is no "super-root".) |
| |
| When displaying information from a dump, a nonzero KOID for the root job |
| identifies a real, rooted job tree that can be displayed whole. The zero KOID |
| of the super-root indicates that instead it's really just a collection of |
| unrelated "top-level" jobs and/or processes. |
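| |
The rule for choosing between a real root job and the super-root can be modeled as below. This is a simplified sketch of the behavior described above, not the library's implementation. `Forest`, `JobRecord`, `Roots`, and `FindRoots` are hypothetical types and names that stand in for whatever the reader keeps internally.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <vector>

using Koid = uint64_t;

struct JobRecord {
  std::vector<Koid> children;   // Child job KOID list.
  std::vector<Koid> processes;  // Process KOID list.
};

// Everything learned from the dumps inserted so far.
struct Forest {
  std::map<Koid, JobRecord> jobs;
  std::set<Koid> processes;
};

struct Roots {
  Koid root_koid = 0;  // Zero means the placeholder super-root.
  std::vector<Koid> top_level_jobs;
  std::vector<Koid> top_level_processes;
};

Roots FindRoots(const Forest& forest) {
  // A task has a known parent if some dumped job lists its KOID.
  std::set<Koid> claimed;
  for (const auto& [koid, job] : forest.jobs) {
    claimed.insert(job.children.begin(), job.children.end());
    claimed.insert(job.processes.begin(), job.processes.end());
  }
  Roots roots;
  for (const auto& [koid, job] : forest.jobs) {
    if (claimed.count(koid) == 0) roots.top_level_jobs.push_back(koid);
  }
  for (Koid koid : forest.processes) {
    if (claimed.count(koid) == 0) roots.top_level_processes.push_back(koid);
  }
  // A lone orphaned job is the real root; otherwise the orphans hang off
  // the super-root, which reports a KOID of zero.
  if (roots.top_level_jobs.size() == 1 && roots.top_level_processes.empty()) {
    roots.root_koid = roots.top_level_jobs.front();
  }
  return roots;
}
```

A display tool could then branch on `root_koid`: nonzero means one rooted tree to show whole, and zero means a flat list of top-level jobs and processes.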
| |
| ### Resource Objects & Kernel Information |
| |
| The `zxdump::Task` base class is itself derived from `zxdump::Object`. As in |
| the Zircon system call API, `get_info` et al are actually provided generically |
| for all kinds of kernel objects. Zircon's resource objects are represented by |
| the `zxdump::Resource` class, which is also derived from `zxdump::Object`. |
| |
| In dump files, the only resource object that ever exists is the root resource. |
| `zxdump::TaskHolder` provides a simple `root_resource()` method similar to |
| `root_job()`. If a dump includes the privileged kernel information, then it |
| will be available via `get_info` calls on the root resource. If that data is |
| omitted, then `root_resource()` will be a placeholder object that returns a |
| zero KOID. |
| |
| ### Memory-Mapped & Streaming Input |
| |
| The reader code uses file descriptors to read dump files. When possible, it |
| will use `mmap` to map an ELF or archive file into read-only memory and use |
| its contents without requiring copies in memory. But the reader will also |
| generally work with pipes as input, and will read in a streaming fashion with |
| some caveats. |
| |
| The reader first reads all the file headers and the "notes" and caches them in |
| memory. This contains all `get_info`, `get_property`, and `read_state` data |
| items. What remains in the dump file is the contents of process memory, |
| which is read from files only on demand as needed for `read_memory` calls. |
| This has some ramifications: |
| |
| * When the reader can't use memory-mapped files, it has to hold onto the file |
| descriptor so it can seek and read for later `read_memory` calls. |
| |
| * When the input file descriptor is not seekable (such as streaming input from |
| a pipe or socket), then `read_memory` calls only work when they match the |
| order of the data in the input dump file's layout. |
| |
| Recall from the ordering sections in the format description above that dump |
| file layout is quite flexible. The reader can cope with any valid layout. But |
| the streaming input support is optimized for the canonical layout with all the |
| headers and note data first, followed by memory data with file order correlated |
| to ascending address order. If the reader has to seek past memory data to get |
| to all the note data, this may be inefficient; and no `read_memory` calls will |
| succeed later, even in ascending address order. |
| |
| Many particular uses of dump-reading are not concerned with reading memory. So |
| the `zxdump::TaskHolder::Insert` method takes an optional flag argument to say |
| that `read_memory` isn't expected to be used later. In this case, the reader |
| will clean up and close the file descriptors immediately after inserting the |
| dump. Any later attempts to use `read_memory` on a `zxdump::Process` whose |
| data came from that dump will fail with `ZX_ERR_NOT_SUPPORTED`. |
| |
| When full access to process memory via `read_memory` is required from a dump |
| file coming from a streaming input source, it's probably best to just write |
| the whole dump stream into a file and then use the memory-mapped reading mode. |
| In other cases, it works very well to feed the reader a dump stream piped from |
| a network connection or decompression process, etc. |
| |
| ### Partial Dumps & Missing Data |
| |
| The dump format in theory represents every type of information about each job, |
| process, and thread it describes. However, the dump-writer has wide discretion |
| to omit some pieces of information for any reason. Particular `get_info`, |
| `get_property`, or `read_state` items might be elided because the task died |
| while being dumped; because the kernel or hardware didn't support a particular |
| kind of information; to save space in the dump; to redact sensitive data; or |
| simply by the whim of the user. The dump reader can also often successfully |
| read a dump that has been truncated, and will then present it just the same as |
| a dump where specific information was elided intentionally. In all these |
| cases, particular task API calls will fail with `ZX_ERR_NOT_SUPPORTED` when the |
| specific data requested is missing, even where their Zircon system call |
| counterparts might never get that error. |
| |
| The `zxdump::Process::read_memory` method distinguishes more cases: |
| |
| * `ZX_ERR_NO_MEMORY` is the same error the kernel returns for a memory range |
| that isn't entirely mapped in the process. |
| |
| * `ZX_ERR_NOT_FOUND` indicates that the dump described the memory region as |
| present in the process, but intentionally omitted the actual memory |
| contents. The core file has a `PT_LOAD` segment covering the region, and |
| may include `ZX_INFO_PROCESS_MAPS` data that gives more details, but the |
| contents were not included in the dump. |
| |
| * `ZX_ERR_OUT_OF_RANGE` indicates that the memory was included in the dump, |
| but can't be read because the dump was truncated. This is also the result |
| when trying to read memory from a non-seekable dump stream where the needed |
| portion of the file has already been passed. |
| |
| * `ZX_ERR_NOT_SUPPORTED` specifically means that the dump file containing this |
| process was inserted by a `zxdump::TaskHolder::Insert` call with `false` |
| passed for the optional `read_memory` flag argument. |
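
One way a caller might act on these distinctions is sketched here. The status
constants are defined locally so the sketch builds on any host (real code
would get them from `<zircon/errors.h>`), and `MemoryGap`,
`ClassifyReadMemoryStatus`, and `Describe` are hypothetical names that are not
part of the `zxdump` API.

```cpp
#include <cassert>
#include <cstdint>

// Local stand-ins so the sketch is self-contained. On Fuchsia, include
// <zircon/errors.h> instead of defining these.
using zx_status_t = int32_t;
constexpr zx_status_t ZX_OK = 0;
constexpr zx_status_t ZX_ERR_NOT_SUPPORTED = -2;
constexpr zx_status_t ZX_ERR_NO_MEMORY = -4;
constexpr zx_status_t ZX_ERR_OUT_OF_RANGE = -14;
constexpr zx_status_t ZX_ERR_NOT_FOUND = -25;

// Hypothetical classification of why memory contents are unavailable.
enum class MemoryGap {
  kNone,               // The read succeeded.
  kUnmapped,           // The range was not entirely mapped in the process.
  kElided,             // Mapped, but contents were left out of the dump.
  kTruncatedOrPassed,  // Dump truncated, or stream position already passed.
  kReadingDisabled,    // Dump was inserted with memory reading turned off.
  kOther,              // Any status this sketch does not recognize.
};

constexpr MemoryGap ClassifyReadMemoryStatus(zx_status_t status) {
  switch (status) {
    case ZX_OK:
      return MemoryGap::kNone;
    case ZX_ERR_NO_MEMORY:
      return MemoryGap::kUnmapped;
    case ZX_ERR_NOT_FOUND:
      return MemoryGap::kElided;
    case ZX_ERR_OUT_OF_RANGE:
      return MemoryGap::kTruncatedOrPassed;
    case ZX_ERR_NOT_SUPPORTED:
      return MemoryGap::kReadingDisabled;
    default:
      return MemoryGap::kOther;
  }
}

// Short human-readable text for logs or tool output.
constexpr const char* Describe(MemoryGap gap) {
  switch (gap) {
    case MemoryGap::kNone:
      return "memory read from dump";
    case MemoryGap::kUnmapped:
      return "range not entirely mapped in the process";
    case MemoryGap::kElided:
      return "region mapped, contents omitted from dump";
    case MemoryGap::kTruncatedOrPassed:
      return "dump truncated or stream already past this data";
    case MemoryGap::kReadingDisabled:
      return "dump inserted without memory reading";
    case MemoryGap::kOther:
      break;
  }
  return "unexpected error";
}
```

A tool could use such a classification to report an elided region differently
from a truly unmapped address, since only the latter reflects the state of the
process itself.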
| |
| ## Live Task API |
| |
| As described above, the `zxdump` C++ library's API for reading information out |
| of dumps looks very much like a subset of the Zircon system call API for |
| getting the same information on a live system from running Zircon jobs and |
| processes. So it's natural that when using this API on Fuchsia, you can use |
| the same API to handle information either from a dump or from a live system. |
| |
| When the [`<lib/zxdump/task.h>`](include/lib/zxdump/task.h) API is used on |
| Fuchsia systems, an additional signature for the `Insert` method is available |
| on `zxdump::TaskHolder` objects. Rather than taking a file descriptor to a |
| core file or job archive to read the dump of a process or job (aka a task), |
| this takes a Zircon handle to a process or job using the C++ |
| [lib/zx](/zircon/system/ulib/zx/) API's `zx::handle` family of types. This |
| "inserts" that live task into the holder in the same way: it "self-assembles" |
| with the other tasks already in the holder to form a job tree, presenting a |
| "super-root" if unrelated tasks go into the same holder. (It's not possible |
| to insert a `zx::thread` directly, only the `zx::process` containing it.) |
| |
| The biggest difference between inserting a dump and inserting a live task is |
| that the live task's information is not immediately collected. Instead, when |
| `get_info`, `get_property`, and `read_state` calls are made on a |
| `zxdump::Task` family object that actually represents a live task, the |
| information is collected on demand. Each topic, property, and state kind is |
| fetched only once and then cached, but none is fetched until it's requested. |
| (The one exception is the "basic" information, so the type and KOID are always |
| on hand.) This means that it's efficient to use this API purely as a nicer |
| API front-end for `get_info` et al., while also making it easy to write code |
| that makes use of the information in exactly the same way for either a live |
| case or a post mortem case. But the API is designed for the post mortem |
| style, which is to say, examining the state just once rather than fetching |
| fresh information as it changes over time. |
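
The fetch-once-then-cache behavior described above can be pictured with a
minimal sketch. It is an illustration under stated assumptions, not the
library's implementation: `LazyTopicCache`, `Fetcher`, and `Bytes` are
hypothetical names, and the fetch callback stands in for whatever system call
would collect the data from a live task.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <utility>
#include <vector>

using Bytes = std::vector<uint8_t>;

// Hypothetical illustration only; not part of the zxdump API.
// Nothing is fetched until requested, and each topic is fetched only once.
class LazyTopicCache {
 public:
  using Fetcher = std::function<Bytes(uint32_t topic)>;

  explicit LazyTopicCache(Fetcher fetch) : fetch_(std::move(fetch)) {}

  // Returns the data for `topic`, calling the fetcher on first use only.
  // Later calls return the same snapshot even if the live state changed.
  const Bytes& Get(uint32_t topic) {
    auto it = cache_.find(topic);
    if (it == cache_.end()) {
      it = cache_.emplace(topic, fetch_(topic)).first;
    }
    return it->second;
  }

  // Number of distinct topics fetched so far.
  size_t cached_topics() const { return cache_.size(); }

 private:
  Fetcher fetch_;
  std::map<uint32_t, Bytes> cache_;
};
```

Because the first answer is kept, repeated queries see one consistent
snapshot, which matches the post mortem style of examining state just once.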
| |
| Once a live task has been inserted, all the same API conveniences are |
| available, including the `find` methods as well as direct `get_child` lookups. |
| Once a live job has been inserted, its child jobs and processes are implicitly |
| inserted on demand as they are found by KOID via `get_child` or `find` from |
| the job tree already inserted. As with dumps, when disconnected processes or |
| job trees are inserted, there will be a super-root presented as the fake root |
| job in the `zxdump::TaskHolder` object. But if the actual root job handle of |
| the running system is inserted as a live task, then `root_job().find(KOID)` |
| efficiently finds any job or process on the system by KOID. |
| |
| It's even possible to commingle live tasks and dump data in a single |
| `zxdump::TaskHolder` object. Just like with inserting multiple dumps, |
| whatever KOIDs are made visible in the holder by inserting a process or job |
| tree all self-assemble by KOID and become accessible under the root job tree. |
| So it can work to insert a post mortem dump of jobs and processes that are |
| now dead but came from the currently running system and exist in the same KOID |
| space, alongside the current root job. The result is a combined picture of |
| the whole system's job tree that includes current jobs and processes in their |
| active state intermingled with their deceased relatives each in their last |
| known state. (Inserting a live task with the same KOID as a task already read |
| from a dump may have confusing results. The information from the dump will be |
| used as the cached information, but any information not present in the dump |
| that's requested later might be filled in from the live task.) |
| |
| Code that can work equally well with dumps or with live tasks can be built for |
| Fuchsia or for other host operating systems. To reduce the need for |
| conditional compilation, the `zxdump::LiveTask` type is provided as an alias |
| for `zx::handle` on Fuchsia that is also available as a placeholder API on |
| other systems. When not on Fuchsia, the only `zxdump::LiveTask` objects that |
| exist are default-constructed "invalid handle" objects. All the same APIs are |
| available, but they'll always fail because the handle passed will always be |
| invalid. |
| |
| For convenience, the `zxdump::GetRootJob()` function is provided to fetch the |
| live root job handle via the [fuchsia.kernel.RootJob][fuchsia.kernel.RootJob] |
| FIDL protocol. This returns failure if the current process's component |
| sandbox doesn't have access to that privileged protocol. (This function is |
| also available on non-Fuchsia systems in a version that always fails, so no |
| conditional compilation is required.) A tool or service can insert one or |
| more dump files, or it can insert the live root job; and then look up tasks by |
| KOID and interrogate them with identical code either way. |
| |
| Similarly, the `zxdump::GetRootResource()` function is provided to fetch the |
| live system's root resource handle via the |
| [fuchsia.boot.RootResource][fuchsia.boot.RootResource] FIDL protocol. This |
| handle can be passed to `zxdump::TaskHolder::Insert` to make live kernel data |
| available via `root_resource()` as from a dump. (Resource objects other than |
| the root resource cannot be inserted.) |
| |
| [fuchsia.kernel.RootJob]: https://fuchsia.dev/reference/fidl/fuchsia.kernel#RootJob |
| [fuchsia.boot.RootResource]: https://fuchsia.dev/reference/fidl/fuchsia.boot#RootResource |