docs/symbolizer_markup.md - zircon - Git at Google

 # Symbolizer markup format #

 This document defines a text format for log messages that can be
 processed by a _symbolizing filter_.  The basic idea is that logging
 code emits text that contains raw address values and so forth, without
 the logging code doing any real work to convert those values to
 human-readable form.  Instead, logging text uses the markup format
 defined here to identify pieces of information that should be converted
 to human-readable form after the fact.  As with other markup formats,
 the expectation is that most of the text will be displayed as is, while
 the markup elements will be replaced with expanded text, or converted
 into active UI elements, that present more details in symbolic form.

 This means there is no need for symbol tables, DWARF debugging sections,
 or similar information to be directly accessible at runtime.  There is
 also no need at runtime for any logic intended to compute human-readable
 presentation of information, such as C++ symbol demangling.  Instead,
 logging must include markup elements that give the contextual
 information necessary to make sense of the raw data, such as memory
 layout details.

 This format identifies markup elements with a syntax that is both simple
 and distinctive.  It's simple enough to be matched and parsed with
 straightforward code.  It's distinctive enough that character sequences
 that look like the start or end of a markup element should rarely if
 ever appear incidentally in logging text.  It's specifically intended
 not to require sanitizing plain text, such as the HTML/XML requirement
 to replace `<` with `&lt;` and the like.

 ## Scope and assumptions ##

 This specification defines a format standard for Zircon and Fuchsia.
 But there is nothing specific to Zircon or Fuchsia about the markup
 format.  A symbolizing filter implementation will be independent both of
 the _target_ operating system and machine architecture where the logs
 are generated and of the _host_ operating system and machine
 architecture where the filter runs.

 This format assumes that the symbolizing filter processes intact whole
 lines.  If long lines might be split during some stage of a logging
 pipeline, they must be reassembled to restore the original line breaks
 before feeding lines into the symbolizing filter.  Most markup elements
 must appear entirely on a single line (often with other text before
 and/or after the markup element).  There are some markup elements that
 are specified to span lines, with line breaks in the middle of the
 element.  Even in those cases, the filter is not expected to handle line
 breaks in arbitrary places inside a markup element, but only inside
 certain fields.

 This format assumes that the symbolizing filter processes a coherent
 stream of log lines from a single process address space context.  If a
 logging stream interleaves log lines from more than one process, these
 must be collated into separate per-process log streams and each stream
 processed by a separate instance of the symbolizing filter.  Because the
 kernel and user processes use disjoint address regions in most operating
 systems (including Zircon), a single user process address space plus
 the kernel address space can be treated as a single address space for
 symbolization purposes if desired.

 ## Dependence on Build IDs ##

 The symbolizer markup scheme relies on contextual information about
 runtime memory address layout to make it possible to convert markup
 elements into useful symbolic form.  This relies on having an
 unmistakable identification of which binary was loaded at each address.

 An ELF Build ID is the payload of an ELF note with name `"GNU"` and type
 `NT_GNU_BUILD_ID`, a unique byte sequence that identifies a particular
 binary (executable, shared library, loadable module, or driver module).
 The linker generates this automatically based on a hash that includes
 the complete symbol table and debugging information, even if this is
 later stripped from the binary.

 This specification uses the ELF Build ID as the sole means of
 identifying binaries.  Each binary relevant to the log must have been
 linked with a unique Build ID.  The symbolizing filter must have some
 means of mapping a Build ID back to the original ELF binary (either the
 whole unstripped binary, or a stripped binary paired with a separate
 debug file).

 ## Colorization ##

 The markup format supports a restricted subset of ANSI X3.64 SGR (Select
 Graphic Rendition) control sequences.  These are unlike other markup
 elements:
  * They specify presentation details (**bold** or colors) rather than
    semantic information.  The assocation of semantic meaning with color
    (e.g. red for errors) is chosen by the code doing the logging, rather
    than by the UI presentation of the symbolizing filter.  This is a
    concession to existing code (e.g. LLVM sanitizer runtimes) that use
    specific colors and would require substantial changes to generate
    semantic markup instead.
  * A single control sequence changes "the state", rather than being an
    hierarchical structure that surrounds affected text.

 The filter processes ANSI SGR control sequences only within a single
 line.  If a control sequence to enter a **bold** or color state is
 encountered, it's expected that the control sequence to reset to default
 state will be encountered before the end of that line.  If a "dangling"
 state is left at the end of a line, the filter may reset to default
 state for the next line.

 An SGR control sequence is not interpreted inside any other markup element.
 However, other markup elements may appear between SGR control sequences and
 the color/**bold** state is expected to apply to the symbolic output that
 replaces the markup element in the filter's output.

 The accepted SGR control sequences all have the form `"\033[%um"`
 (expressed here using C string syntax), where `%u` is one of these:

 | Code | Effect | Notes |
 |:----:|:------:|-------|
 | `0`  | Reset to default formatting. | |
 | `1`  | Use **bold text**  | Combines with color states, doesn't reset them.|
 | `30` | Black foreground   | |
 | `31` | Red foreground     | |
 | `32` | Green foreground   | |
 | `33` | Yellow foreground  | |
 | `34` | Blue foreground    | |
 | `35` | Magenta foreground | |
 | `36` | Cyan foreground    | |
 | `37` | White foreground   | |

 ## Common markup element syntax ##

 All the markup elements share a common syntactic structure to facilitate
 simple matching and parsing code.  Each element has the form:

 ```
 {{{tag:fields}}}
 ```

 `tag` identifies one of the element types described below, and is always
 a short alphabetic string that must be in lower case.  The rest of the
 element consists of one or more fields.  Fields are separated by `:` and
 cannot contain any `:` or `}` characters.  How many fields must be or
 may be present and what they contain is specified for each element type.

 No markup elements or ANSI SGR control sequences are interpreted inside the
 contents of a field.

 In the descriptions of each element type, `printf`-style placeholders
 indicate field contents:

 * `%s`

   A string of printable characters, not including `:` or `}`.

 * `%p`

   An address value represented by `0x` followed by an even number of
   hexadecimal digits (using either lower-case or upper-case for
   `A`..`F`).  If the digits are all `0` then the `0x` prefix may be
   omitted.  No more than 16 hexadecimal digits are expected to appear in
   a single value (64 bits).

 * `%u`

   A nonnegative decimal integer.

 * `%x`

   A sequence of an even number of hexadecimal digits (using either
   lower-case or upper-case for `A`..`F`), with no `0x` prefix.
   This represents an arbitrary sequence of bytes, such as an ELF Build ID.

 ## Presentation elements ##

 These are elements that convey a specific program entity to be displayed
 in human-readable symbolic form.

 * `{{{symbol:%s}}}`

   Here `%s` is the linkage name for a symbol or type.  It may require
   demangling according to language ABI rules.  Even for unmangled names,
   it's recommended that this markup element be used to identify a symbol
   name so that it can be presented distinctively.

   Examples:
   ```
   {{{symbol:_ZN7Mangled4NameEv}}}
   {{{symbol:foobar}}}
   ```

 * `{{{pc:%p}}}`

   Here `%p` is the memory address of a code location.
   It might be presented as a function name and source location.

   Examples:
   ```
   {{{pc:0x12345678}}}
   {{{pc:0xffffffff9abcdef0}}}
   ```

 * `{{{data:%p}}}`

   Here `%p` is the memory address of a data location.
   It might be presented as the name of a global variable at that location.

   Examples:
   ```
   {{{data:0x12345678}}}
   {{{data:0xffffffff9abcdef0}}}
   ```

 * `{{{bt:%u:%p}}}`

   This represents one frame in a backtrace.  It usually appears on a
   line by itself (surrounded only by whitespace), in a sequence of such
   lines with ascending frame numbers.  So the human-readable output
   might be formatted assuming that, such that it looks good for a
   sequence of `bt` elements each alone on its line with uniform
   indentation of each line.  But it can appear anywhere, so the filter
   should not remove any non-whitespace text surrounding the element.

   Here `%u` is the frame number, which starts at zero for the location
   of the fault being identified, increments to one for the caller of
   frame zero's call frame, to two for the caller of frame one, etc.
   `%p` is the memory address of a code location.

   In frames after frame zero, this code location identifies a call site.
   Some emitters may subtract one byte or one instruction length from the
   actual return address for the call site, with the intent that the
   address logged can be translated directly to a source location for the
   call site and not for the apparent return site thereafter (which can
   be confusing).  It's recommended that emitters _not_ do this, so that
   each frame's code location is the exact return address given to its
   callee and e.g. could be highlighted in instruction-level disassembly.
   The symbolizing filter can do the adjustment to the address it
   translates into a source location.  Assuming that a call instruction
   is longer than one byte on all supported machines, applying the
   "subtract one byte" adjustment a second time still results in an
   address somewhere in the call instruction, so a little sloppiness here
   does no harm.

   Examples:
   ```
   {{{bt:0:0x12345678}}}
   {{{bt:1:0xffffffff9abcdef0}}}
   ```

 * `{{{hexdict:...}}}`

   This element can span multiple lines.  Here `...` is a sequence of
   key-value pairs where a single `:` separates each key from its value,
   and arbitrary whitespace separates the pairs.  The value (right-hand
   side) of each pair either is one or more `0` digits, or is `0x`
   followed by hexadecimal digits.  Each value might be a memory address
   or might be some other integer (including an integer that looks like a
   likely memory address but actually has an unrelated purpose).  When
   the contextual information about the memory layout suggests that a
   given value could be a code location or a global variable data
   address, it might be presented as a source location or variable name
   or with active UI that makes such interpretation optionally visible.

   The intended use is for things like register dumps, where the emitter
   doesn't know which values might have a symbolic interpretation but a
   presentation that makes plausible symbolic interpretations available
   might be very useful to someone reading the log.  At the same time,
   a flat text presentation should usually avoid interfering too much
   with the original contents and formatting of the dump.  For example,
   it might use footnotes with source locations for values that appear
   to be code locations.  An active UI presentation might show the dump
   text as is, but highlight values with symbolic information available
   and pop up a presentation of symbolic details when a value is selected.

   Example:
   ```
   {{{hexdict:
     CS:                   0 RIP:     0x6ee17076fb80 EFL:            0x10246 CR2:                  0
     RAX:      0xc53d0acbcf0 RBX:     0x1e659ea7e0d0 RCX:                  0 RDX:     0x6ee1708300cc
     RSI:                  0 RDI:     0x6ee170830040 RBP:     0x3b13734898e0 RSP:     0x3b13734898d8
      R8:     0x3b1373489860  R9:         0x2776ff4f R10:     0x2749d3e9a940 R11:              0x246
     R12:     0x1e659ea7e0f0 R13: 0xd7231230fd6ff2e7 R14:     0x1e659ea7e108 R15:      0xc53d0acbcf0
   }}}
   ```

 ## Trigger elements ##

 These elements cause an external action and will be presented to the
 user in a human readable form. Generally they trigger an external
 action to occur that results in a linkable page. The link or some
 other informative information about the external action can then be
 presented to the user.

 * `{{{dumpfile:%s:%s}}}`

   Here the first `%s` is an identifier for a type of dump and the
   second `%s` is an identifier for a particular dump that's just been
   published.  The types of dumps, the exact meaning of "published",
   and the nature of the identifier are outside the scope of the markup
   format per se.  In general it might correspond to writing a file by
   that name or something similar.

   This element may trigger additional post-processing work beyond
   symbolizing the markup. It indicates that a dump file of some sort
   has been published.  Some logic attached to the symbolizing filter may
   understand certain types of dump file and trigger additional
   post-processing of the dump file upon encountering this element (e.g.
   generating visualizations, symbolization).  The expectation is that the
   information collected from contextual elements (described below) in the
   logging stream may be necessary to decode the content of the dump.  So
   if the symbolizing filter triggers other processing, it may need to
   feed some distilled form of the contextual information to those
   processes.

   On Zircon and Fuchsia in particular, "publish" means to call the
   `__sanitizer_publish_data` function from `<zircon/sanitizer.h>`
   with the "type" identifier as the "sink name" string.  The "dump
   identifier" is the name attached to the Zircon VMO whose handle
   was passed in the call to `__sanitizer_publish_data`.
   **TODO(mcgrathr): Link to docs about `__sanitizer_publish_data` and
   getting data dumps off the device.**

   An example of a type identifier is `sancov`, for dumps from LLVM
   [SanitizerCoverage](https://clang.llvm.org/docs/SanitizerCoverage.html).

   Example:
   ```
   {{{dumpfile:sancov:sancov.8675}}}
   ```

 ## Contextual elements ##

 These are elements that supply information necessary to convert
 presentation elements to symbolic form.  Unlike presentation elements,
 they are not directly related to the surrounding text.  Contextual
 elements should appear alone on lines with no other non-whitespace
 text, so that the symbolizing filter might elide the whole line from
 its output without hiding any other log text.

 The contextual elements themselves do not necessarily need to be
 presented in human-readable output.  However, the information they
 impart may be essential to understanding the logging text even after
 symbolization.  So it's recommended that this information be preserved
 in some form when the original raw log with markup may no longer be
 readily accessible for whatever reason.

 Contextual elements should appear in the logging stream before they are
 needed.  That is, if some piece of context may affect how the
 symbolizing filter would interpret or present a later presentation
 element, the necessary contextual elements should have appeared
 somewhere earlier in the logging stream.  It should always be possible
 for the symbolizing filter to be implemented as a single pass over the
 raw logging stream, accumulating context and massaging text as it goes.

 * `{{{reset}}}`

   This should be output before any other contextual element. The need
   for this contextual element is to support implementations that handle
   logs coming from multiple processes. Such implementations might not
   know when a new process starts or ends. Because some identifying
   information (like process IDs) might be the same between old and new
   processes, a way is needed to distinguish two processes with such
   identical identifying information. This element informs such
   implementations to reset the state of a filter so that information
   from a previous process's contextual elements is not assumed for new
   process that just happens have the same identifying information.

 * `{{{module:%i:%s:%s:...}}}`

   This element represents a so called "module". A "module" is a single
   linked binary, such as a loaded ELF file. Usually each module occupies
   a contiguous range of memory (always does on Zircon).

   Here `%i` is the Module ID which is used by other contextual elements
   to refer to this module. The first `%s` is a human-readable identifier
   for the module, such as an ELF `DT_SONAME` string or a file name; but
   it might be empty. It's only for casual information. The Module ID
   will be exclusivelly used to refer to this module in other contextual
   elements. The second `%s` is the module type and it determines what
   the remaining fields are. The following module types are supported:

   * `elf:%x`

     Here `%x` encodes an ELF Build ID. The Build ID should refer to a
     single linked binary. The Build ID string is the sole way to identify
     the binary from which this module was loaded.

   Example:
   ```
   {{{module:1:libc.so:elf:83238ab56ba10497}}}
   ```

 * `{{{mmap:%p:%x:...}}}`

   This contextual element is used to give information about a particular
   region in memory. `%p` is the starting address and `%x` gives the size
   in hex of the region of memory. The `...` part can take different forms
   to give different information about the specified region of memory. The
   allowed forms are the following:

   * `load:%i:%s:%p`

     This subelement informs the filter that a segment was loaded from a
     module. The module is identified by its module id `%i`. The `%s` is
     one or more of the letters 'r', 'w', and 'x' (in that order and in
     either upper or lower case) to indicate this segment of memory is
     readable, writable, and/or executable. The symbolizing filter can use
     this information to guess whether an address is a likely code address
     or a likely data address in the given module. The remaining `%p` gives
     the module relative address. For ELF files the module relative address
     will be the `p_vaddr` of the associated program header. For example if
     your module's executable segment has `p_vaddr=0x1000`, `p_memsz=0x1234`,
     and was loaded at 0x7acba69d5000 then you need to subtract 0x7acba69d4000
     from any address between 0x7acba69d5000 and 0x7acba69d6234 to get the
     module relative address. The starting address will usually have been
     rounded down to the active page size, and the size rounded up.

   Example:
   ```
   {{{mmap:0x7acba69d5000:0x5a000:load:1:rx:0x1000}}}
   ```
	# Symbolizer markup format #

	This document defines a text format for log messages that can be
	processed by a _symbolizing filter_. The basic idea is that logging
	code emits text that contains raw address values and so forth, without
	the logging code doing any real work to convert those values to
	human-readable form. Instead, logging text uses the markup format
	defined here to identify pieces of information that should be converted
	to human-readable form after the fact. As with other markup formats,
	the expectation is that most of the text will be displayed as is, while
	the markup elements will be replaced with expanded text, or converted
	into active UI elements, that present more details in symbolic form.

	This means there is no need for symbol tables, DWARF debugging sections,
	or similar information to be directly accessible at runtime. There is
	also no need at runtime for any logic intended to compute human-readable
	presentation of information, such as C++ symbol demangling. Instead,
	logging must include markup elements that give the contextual
	information necessary to make sense of the raw data, such as memory
	layout details.

	This format identifies markup elements with a syntax that is both simple
	and distinctive. It's simple enough to be matched and parsed with
	straightforward code. It's distinctive enough that character sequences
	that look like the start or end of a markup element should rarely if
	ever appear incidentally in logging text. It's specifically intended
	not to require sanitizing plain text, such as the HTML/XML requirement
	to replace `<` with `<` and the like.

	## Scope and assumptions ##

	This specification defines a format standard for Zircon and Fuchsia.
	But there is nothing specific to Zircon or Fuchsia about the markup
	format. A symbolizing filter implementation will be independent both of
	the _target_ operating system and machine architecture where the logs
	are generated and of the _host_ operating system and machine
	architecture where the filter runs.

	This format assumes that the symbolizing filter processes intact whole
	lines. If long lines might be split during some stage of a logging
	pipeline, they must be reassembled to restore the original line breaks
	before feeding lines into the symbolizing filter. Most markup elements
	must appear entirely on a single line (often with other text before
	and/or after the markup element). There are some markup elements that
	are specified to span lines, with line breaks in the middle of the
	element. Even in those cases, the filter is not expected to handle line
	breaks in arbitrary places inside a markup element, but only inside
	certain fields.

	This format assumes that the symbolizing filter processes a coherent
	stream of log lines from a single process address space context. If a
	logging stream interleaves log lines from more than one process, these
	must be collated into separate per-process log streams and each stream
	processed by a separate instance of the symbolizing filter. Because the
	kernel and user processes use disjoint address regions in most operating
	systems (including Zircon), a single user process address space plus
	the kernel address space can be treated as a single address space for
	symbolization purposes if desired.

	## Dependence on Build IDs ##

	The symbolizer markup scheme relies on contextual information about
	runtime memory address layout to make it possible to convert markup
	elements into useful symbolic form. This relies on having an
	unmistakable identification of which binary was loaded at each address.

	An ELF Build ID is the payload of an ELF note with name `"GNU"` and type
	`NT_GNU_BUILD_ID`, a unique byte sequence that identifies a particular
	binary (executable, shared library, loadable module, or driver module).
	The linker generates this automatically based on a hash that includes
	the complete symbol table and debugging information, even if this is
	later stripped from the binary.

	This specification uses the ELF Build ID as the sole means of
	identifying binaries. Each binary relevant to the log must have been
	linked with a unique Build ID. The symbolizing filter must have some
	means of mapping a Build ID back to the original ELF binary (either the
	whole unstripped binary, or a stripped binary paired with a separate
	debug file).

	## Colorization ##

	The markup format supports a restricted subset of ANSI X3.64 SGR (Select
	Graphic Rendition) control sequences. These are unlike other markup
	elements:
	* They specify presentation details (bold or colors) rather than
	semantic information. The assocation of semantic meaning with color
	(e.g. red for errors) is chosen by the code doing the logging, rather
	than by the UI presentation of the symbolizing filter. This is a
	concession to existing code (e.g. LLVM sanitizer runtimes) that use
	specific colors and would require substantial changes to generate
	semantic markup instead.
	* A single control sequence changes "the state", rather than being an
	hierarchical structure that surrounds affected text.

	The filter processes ANSI SGR control sequences only within a single
	line. If a control sequence to enter a bold or color state is
	encountered, it's expected that the control sequence to reset to default
	state will be encountered before the end of that line. If a "dangling"
	state is left at the end of a line, the filter may reset to default
	state for the next line.

	An SGR control sequence is not interpreted inside any other markup element.
	However, other markup elements may appear between SGR control sequences and
	the color/bold state is expected to apply to the symbolic output that
	replaces the markup element in the filter's output.

	The accepted SGR control sequences all have the form `"\033[%um"`
	(expressed here using C string syntax), where `%u` is one of these:

	\| Code \| Effect \| Notes \|
	\|:----:\|:------:\|-------\|
	\| `0` \| Reset to default formatting. \| \|
	\| `1` \| Use bold text \| Combines with color states, doesn't reset them.\|
	\| `30` \| Black foreground \| \|
	\| `31` \| Red foreground \| \|
	\| `32` \| Green foreground \| \|
	\| `33` \| Yellow foreground \| \|
	\| `34` \| Blue foreground \| \|
	\| `35` \| Magenta foreground \| \|
	\| `36` \| Cyan foreground \| \|
	\| `37` \| White foreground \| \|

	## Common markup element syntax ##

	All the markup elements share a common syntactic structure to facilitate
	simple matching and parsing code. Each element has the form:

	```
	{{{tag:fields}}}
	```

	`tag` identifies one of the element types described below, and is always
	a short alphabetic string that must be in lower case. The rest of the
	element consists of one or more fields. Fields are separated by `:` and
	cannot contain any `:` or `}` characters. How many fields must be or
	may be present and what they contain is specified for each element type.

	No markup elements or ANSI SGR control sequences are interpreted inside the
	contents of a field.

	In the descriptions of each element type, `printf`-style placeholders
	indicate field contents:

	* `%s`

	A string of printable characters, not including `:` or `}`.

	* `%p`

	An address value represented by `0x` followed by an even number of
	hexadecimal digits (using either lower-case or upper-case for
	`A`..`F`). If the digits are all `0` then the `0x` prefix may be
	omitted. No more than 16 hexadecimal digits are expected to appear in
	a single value (64 bits).

	* `%u`

	A nonnegative decimal integer.

	* `%x`

	A sequence of an even number of hexadecimal digits (using either
	lower-case or upper-case for `A`..`F`), with no `0x` prefix.
	This represents an arbitrary sequence of bytes, such as an ELF Build ID.

	## Presentation elements ##

	These are elements that convey a specific program entity to be displayed
	in human-readable symbolic form.

	* `{{{symbol:%s}}}`

	Here `%s` is the linkage name for a symbol or type. It may require
	demangling according to language ABI rules. Even for unmangled names,
	it's recommended that this markup element be used to identify a symbol
	name so that it can be presented distinctively.

	Examples:
	```
	{{{symbol:_ZN7Mangled4NameEv}}}
	{{{symbol:foobar}}}
	```

	* `{{{pc:%p}}}`

	Here `%p` is the memory address of a code location.
	It might be presented as a function name and source location.

	Examples:
	```
	{{{pc:0x12345678}}}
	{{{pc:0xffffffff9abcdef0}}}
	```

	* `{{{data:%p}}}`

	Here `%p` is the memory address of a data location.
	It might be presented as the name of a global variable at that location.

	Examples:
	```
	{{{data:0x12345678}}}
	{{{data:0xffffffff9abcdef0}}}
	```

	* `{{{bt:%u:%p}}}`

	This represents one frame in a backtrace. It usually appears on a
	line by itself (surrounded only by whitespace), in a sequence of such
	lines with ascending frame numbers. So the human-readable output
	might be formatted assuming that, such that it looks good for a
	sequence of `bt` elements each alone on its line with uniform
	indentation of each line. But it can appear anywhere, so the filter
	should not remove any non-whitespace text surrounding the element.

	Here `%u` is the frame number, which starts at zero for the location
	of the fault being identified, increments to one for the caller of
	frame zero's call frame, to two for the caller of frame one, etc.
	`%p` is the memory address of a code location.

	In frames after frame zero, this code location identifies a call site.
	Some emitters may subtract one byte or one instruction length from the
	actual return address for the call site, with the intent that the
	address logged can be translated directly to a source location for the
	call site and not for the apparent return site thereafter (which can
	be confusing). It's recommended that emitters _not_ do this, so that
	each frame's code location is the exact return address given to its
	callee and e.g. could be highlighted in instruction-level disassembly.
	The symbolizing filter can do the adjustment to the address it
	translates into a source location. Assuming that a call instruction
	is longer than one byte on all supported machines, applying the
	"subtract one byte" adjustment a second time still results in an
	address somewhere in the call instruction, so a little sloppiness here
	does no harm.

	Examples:
	```
	{{{bt:0:0x12345678}}}
	{{{bt:1:0xffffffff9abcdef0}}}
	```

	* `{{{hexdict:...}}}`

	This element can span multiple lines. Here `...` is a sequence of
	key-value pairs where a single `:` separates each key from its value,
	and arbitrary whitespace separates the pairs. The value (right-hand
	side) of each pair either is one or more `0` digits, or is `0x`
	followed by hexadecimal digits. Each value might be a memory address
	or might be some other integer (including an integer that looks like a
	likely memory address but actually has an unrelated purpose). When
	the contextual information about the memory layout suggests that a
	given value could be a code location or a global variable data
	address, it might be presented as a source location or variable name
	or with active UI that makes such interpretation optionally visible.

	The intended use is for things like register dumps, where the emitter
	doesn't know which values might have a symbolic interpretation but a
	presentation that makes plausible symbolic interpretations available
	might be very useful to someone reading the log. At the same time,
	a flat text presentation should usually avoid interfering too much
	with the original contents and formatting of the dump. For example,
	it might use footnotes with source locations for values that appear
	to be code locations. An active UI presentation might show the dump
	text as is, but highlight values with symbolic information available
	and pop up a presentation of symbolic details when a value is selected.

	Example:
	```
	{{{hexdict:
	CS: 0 RIP: 0x6ee17076fb80 EFL: 0x10246 CR2: 0
	RAX: 0xc53d0acbcf0 RBX: 0x1e659ea7e0d0 RCX: 0 RDX: 0x6ee1708300cc
	RSI: 0 RDI: 0x6ee170830040 RBP: 0x3b13734898e0 RSP: 0x3b13734898d8
	R8: 0x3b1373489860 R9: 0x2776ff4f R10: 0x2749d3e9a940 R11: 0x246
	R12: 0x1e659ea7e0f0 R13: 0xd7231230fd6ff2e7 R14: 0x1e659ea7e108 R15: 0xc53d0acbcf0
	}}}
	```

	## Trigger elements ##

	These elements cause an external action and will be presented to the
	user in a human readable form. Generally they trigger an external
	action to occur that results in a linkable page. The link or some
	other informative information about the external action can then be
	presented to the user.

	* `{{{dumpfile:%s:%s}}}`

	Here the first `%s` is an identifier for a type of dump and the
	second `%s` is an identifier for a particular dump that's just been
	published. The types of dumps, the exact meaning of "published",
	and the nature of the identifier are outside the scope of the markup
	format per se. In general it might correspond to writing a file by
	that name or something similar.

	This element may trigger additional post-processing work beyond
	symbolizing the markup. It indicates that a dump file of some sort
	has been published. Some logic attached to the symbolizing filter may
	understand certain types of dump file and trigger additional
	post-processing of the dump file upon encountering this element (e.g.
	generating visualizations, symbolization). The expectation is that the
	information collected from contextual elements (described below) in the
	logging stream may be necessary to decode the content of the dump. So
	if the symbolizing filter triggers other processing, it may need to
	feed some distilled form of the contextual information to those
	processes.

	On Zircon and Fuchsia in particular, "publish" means to call the
	`__sanitizer_publish_data` function from `<zircon/sanitizer.h>`
	with the "type" identifier as the "sink name" string. The "dump
	identifier" is the name attached to the Zircon VMO whose handle
	was passed in the call to `__sanitizer_publish_data`.
	**TODO(mcgrathr): Link to docs about `__sanitizer_publish_data` and
	getting data dumps off the device.**

	An example of a type identifier is `sancov`, for dumps from LLVM
	[SanitizerCoverage](https://clang.llvm.org/docs/SanitizerCoverage.html).

	Example:
	```
	{{{dumpfile:sancov:sancov.8675}}}
	```

	## Contextual elements ##

	These are elements that supply information necessary to convert
	presentation elements to symbolic form. Unlike presentation elements,
	they are not directly related to the surrounding text. Contextual
	elements should appear alone on lines with no other non-whitespace
	text, so that the symbolizing filter might elide the whole line from
	its output without hiding any other log text.

	The contextual elements themselves do not necessarily need to be
	presented in human-readable output. However, the information they
	impart may be essential to understanding the logging text even after
	symbolization. So it's recommended that this information be preserved
	in some form when the original raw log with markup may no longer be
	readily accessible for whatever reason.

	Contextual elements should appear in the logging stream before they are
	needed. That is, if some piece of context may affect how the
	symbolizing filter would interpret or present a later presentation
	element, the necessary contextual elements should have appeared
	somewhere earlier in the logging stream. It should always be possible
	for the symbolizing filter to be implemented as a single pass over the
	raw logging stream, accumulating context and massaging text as it goes.

	* `{{{reset}}}`

	This should be output before any other contextual element. The need
	for this contextual element is to support implementations that handle
	logs coming from multiple processes. Such implementations might not
	know when a new process starts or ends. Because some identifying
	information (like process IDs) might be the same between old and new
	processes, a way is needed to distinguish two processes with such
	identical identifying information. This element informs such
	implementations to reset the state of a filter so that information
	from a previous process's contextual elements is not assumed for new
	process that just happens have the same identifying information.

	* `{{{module:%i:%s:%s:...}}}`

	This element represents a so called "module". A "module" is a single
	linked binary, such as a loaded ELF file. Usually each module occupies
	a contiguous range of memory (always does on Zircon).

	Here `%i` is the Module ID which is used by other contextual elements
	to refer to this module. The first `%s` is a human-readable identifier
	for the module, such as an ELF `DT_SONAME` string or a file name; but
	it might be empty. It's only for casual information. The Module ID
	will be exclusivelly used to refer to this module in other contextual
	elements. The second `%s` is the module type and it determines what
	the remaining fields are. The following module types are supported:

	* `elf:%x`

	Here `%x` encodes an ELF Build ID. The Build ID should refer to a
	single linked binary. The Build ID string is the sole way to identify
	the binary from which this module was loaded.

	Example:
	```
	{{{module:1:libc.so:elf:83238ab56ba10497}}}
	```

	* `{{{mmap:%p:%x:...}}}`

	This contextual element is used to give information about a particular
	region in memory. `%p` is the starting address and `%x` gives the size
	in hex of the region of memory. The `...` part can take different forms
	to give different information about the specified region of memory. The
	allowed forms are the following:

	* `load:%i:%s:%p`

	This subelement informs the filter that a segment was loaded from a
	module. The module is identified by its module id `%i`. The `%s` is
	one or more of the letters 'r', 'w', and 'x' (in that order and in
	either upper or lower case) to indicate this segment of memory is
	readable, writable, and/or executable. The symbolizing filter can use
	this information to guess whether an address is a likely code address
	or a likely data address in the given module. The remaining `%p` gives
	the module relative address. For ELF files the module relative address
	will be the `p_vaddr` of the associated program header. For example if
	your module's executable segment has `p_vaddr=0x1000`, `p_memsz=0x1234`,
	and was loaded at 0x7acba69d5000 then you need to subtract 0x7acba69d4000
	from any address between 0x7acba69d5000 and 0x7acba69d6234 to get the
	module relative address. The starting address will usually have been
	rounded down to the active page size, and the size rounded up.

	Example:
	```
	{{{mmap:0x7acba69d5000:0x5a000:load:1:rx:0x1000}}}
	```