Contributors: cphoenix@
This codelab explains the Triage utility.
The source files and examples to which this document refers are available at:
Triage allows you to scan bug dump files (bugreport.zip, fuchsia_feedback_data.zip) for predefined conditions.
The Triage system makes it easy to configure new conditions, increasing the usefulness of Triage for everyone.
The current version of Triage is a host-side command-line tool, invoked with fx triage.
Before you begin, build Fuchsia:
fx build
This codelab includes an inspect.json file with values that make the exercises work predictably.
Note: inspect.json files are packaged in the bugreport.zip file produced by fx bugreport. Use unzip to unpack this file.
Note: If you run fx triage without specifying a --inspect option, it runs a fresh fx bugreport and analyzes its inspect.json file.
To run the default rules against a fresh bug report:
fx triage
This command downloads a fresh bugreport.zip file using the fx bugreport command and runs the default rules from //src/diagnostics/config/triage/*.triage.
To analyze a specific inspect.json file:
fx triage --inspect my/foo/inspect.json
Note: You can only specify one --inspect argument.
To use the rules in a specific config file, or in all *.triage files in a specific directory:
fx triage --config my/directory --config my/file.triage
Note: You can use multiple --config arguments.
Note: If any --config argument is used, the default rules are not loaded automatically.
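The --config rules above can be sketched in Python. This is a hypothetical model, not Triage's source: the function name resolve_config_files and the DEFAULT_CONFIG_DIR path are assumptions. A directory argument contributes every *.triage file in it, any other argument is used as a single file, and the defaults apply only when no --config argument is given. Each file's basename (without .triage) is treated as its namespace.

```python
import glob
import os

# Assumption: a checkout-relative stand-in for the default rule location
# //src/diagnostics/config/triage/*.triage.
DEFAULT_CONFIG_DIR = "src/diagnostics/config/triage"


def resolve_config_files(config_args):
    """Maps each config namespace (file basename) to a .triage file path.

    Hypothetical model of the --config behavior described in this codelab.
    """
    # Default rules are used only when no --config argument is given.
    sources = config_args or [DEFAULT_CONFIG_DIR]
    files = {}
    for source in sources:
        if os.path.isdir(source):
            # A directory contributes all of its *.triage files.
            paths = sorted(glob.glob(os.path.join(source, "*.triage")))
        else:
            # Anything else is treated as a single config file.
            paths = [source]
        for path in paths:
            # "product.triage" -> namespace "product"
            namespace = os.path.splitext(os.path.basename(path))[0]
            files[namespace] = path
    return files
```

For example, resolve_config_files(["my/file.triage"]) returns {"file": "my/file.triage"}.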
To run Triage with the codelab's rules and sample data:
fx triage --config . --inspect inspect.json
Running this command in the codelab directory with the unmodified codelab files prints a line indicating that Triage is working properly:
Warning: 'always_triggered' in 'rules' detected 'Triage is running': 'always_true' was true
The inspect.json file in the codelab directory indicates a couple of problems with the system. In this codelab, you configure Triage to detect those problems.
This step configures Triage to extract values from the data in the inspect.json file.
The rules.triage file contains a key-value section called “metrics”. Each key is a metric name, which other config entries use to refer to the metric. In this step, each value is a selector structure.
Note: Names (of metrics, actions, tests, and basenames of config files) must start with a letter or underscore, followed by any number of letters, digits, or underscores.
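The naming rule maps directly onto a regular expression. A small Python sketch; it assumes "letter" means an ASCII letter, which the codelab does not state, and is_valid_name is a hypothetical helper.

```python
import re

# Assumption: "letter" means ASCII A-Z or a-z.
NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_name(name):
    """True if name is a letter or underscore followed by letters, digits,
    or underscores."""
    return NAME.fullmatch(name) is not None


for candidate in ["disk_used", "disk98", "_tmp", "98disk", "disk-used"]:
    print(candidate, is_valid_name(candidate))
```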
The selector structure has the key Selector. Its value is a colon-separated string that tells Triage where in the Inspect data to find the number you need.
"disk_used": {"Selector": "global_dat/storage:root.stats:used_bytes"}
Note: This line includes a typo which we'll fix later in the codelab.
Inspect data published by a component is organized as a tree of nodes with values (properties) at the leaves. The inspect.json file is an array of these trees, each with a path that identifies the source component.
The portion of the selector string before the first colon should match (be a substring of) exactly one of the path strings in the inspect.json file.
The portion between the two colons is a dot-separated (.) list of node names.
The portion after the second colon is the property name.
The above selector string indicates a component whose path includes the string global_dat/storage. It selects the used_bytes property from the stats subnode of the root node of that component's Inspect tree.
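To make the three parts of a selector concrete, here is a Python sketch of how such a string could be resolved against parsed inspect.json data. It is an illustration, not Triage's implementation: parse_selector and select are hypothetical, the sketch handles only the simple form described here, and it assumes each tree has the path and contents keys used by the test snippets later in this codelab. The sample values mirror those test snippets, not the codelab's real inspect.json. The typo matters because global_dat/storage is not a substring of global_data/storage.

```python
import json


def parse_selector(selector):
    """Splits "component:node.path:property" into its three parts."""
    component, node_path, prop = selector.split(":")
    return component, node_path.split("."), prop


def select(inspect_trees, selector):
    """Returns the property a selector names in parsed inspect.json data."""
    component, nodes, prop = parse_selector(selector)
    # The component part must be a substring of exactly one tree's path.
    matches = [t for t in inspect_trees if component in t["path"]]
    if len(matches) != 1:
        raise LookupError(f"{component!r} matched {len(matches)} paths")
    node = matches[0]["contents"]
    for name in nodes:  # walk root -> stats -> ...
        node = node[name]
    return node[prop]


INSPECT_JSON = """[
  {"path": "global_data/storage",
   "contents": {"root": {"stats": {"total_bytes": 100, "used_bytes": 99}}}}
]"""
trees = json.loads(INSPECT_JSON)
print(select(trees, "global_data/storage:root.stats:used_bytes"))  # 99
```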
Copy the above “disk_used” selector metric, and add it to the “metrics” section of the rules.triage file.
Write and add another selector named “disk_total” to select the “total_bytes” property at the same node in the Inspect data.
Note: JSON is picky about commas. Make sure every selector except the last one is followed by a comma.
In addition to selecting values from the “inspect.json” file, you need to apply some logic, and usually some arithmetic, to decide whether those values indicate a condition worth flagging.
Copy and add the following metric to calculate how full the disk is:
"disk_percentage": {"Eval": "disk_used / disk_total"}
Copy and add the following metric to calculate whether the disk is more than 98% full:
"disk98": {"Eval": "disk_percentage > 0.98"}
In the “actions” section of the config file, add an action that prints a warning when the disk is more than 98% full. Use the following line:
"disk_full": {"trigger": "disk98", "print": "Disk is over 98% full"}
Note: print is the only available action.
The following command runs Triage against the local config file:
fx triage --config . --inspect inspect.json
You will see several lines of errors. What happened?
There is a typo in the selectors. If you read past all the backslashes (the next version of Triage will be friendlier), you'll see that Triage could not find values needed to evaluate a rule. The correct component path is “global_data”, not “global_dat”. Fix it in your selectors and try again.
fx triage --config . --inspect inspect.json
Now what happened? Nothing, right? So, how do you know whether there was no problem in the inspect.json file, or a bug in your rule?
You can (and should!) add tests for your actions. For each test, write a snippet of inspect.json content and specify whether it should or should not trigger your rule.
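The yes/no semantics of a test can be sketched in Python. run_test, disk_rule, and snippet are hypothetical helpers, not part of Triage: in Triage itself the fired actions come from evaluating your config, and its failure messages read differently. The test entries use the same shape as the ones added in this step.

```python
def run_test(name, test, fired_actions):
    """Returns failure messages for one entry of a "tests" section.

    fired_actions maps an inspect snippet to the set of actions it triggers.
    """
    fired = fired_actions(test["inspect"])
    failures = []
    for action in test["yes"]:
        if action not in fired:
            failures.append(f"Test {name} failed: {action} did not trigger")
    for action in test["no"]:
        if action in fired:
            failures.append(f"Test {name} failed: {action} triggered")
    return failures


def disk_rule(inspect):
    """Hypothetical stand-in for evaluating the disk_full action."""
    for tree in inspect:
        if "global_data/storage" in tree["path"]:
            stats = tree["contents"]["root"]["stats"]
            # Python's / always returns a float here.
            if stats["used_bytes"] / stats["total_bytes"] > 0.98:
                return {"disk_full"}
    return set()


def snippet(used):
    """Builds a one-tree inspect snippet with total_bytes fixed at 100."""
    return [{"path": "global_data/storage",
             "contents": {"root": {"stats": {"total_bytes": 100,
                                             "used_bytes": used}}}}]


TESTS = {
    "is_full": {"yes": ["disk_full"], "no": [], "inspect": snippet(99)},
    "not_full": {"yes": [], "no": ["disk_full"], "inspect": snippet(97)},
}

for name, test in TESTS.items():
    print(name, run_test(name, test, disk_rule) or "passed")
```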
To test the rule you've added, add the following to the “tests” section of the rules.triage file:
"is_full": {"yes": ["disk_full"], "no": [], "inspect": [ {"path": "global_data/storage", "contents": {"root": {"stats": {"total_bytes": 100, "used_bytes": 99}}}} ] }
You can also test conditions in which actions should not trigger:
"not_full": {"yes": [], "no": ["disk_full"], "inspect": [ {"path": "global_data/storage", "contents": {"root": {"stats": {"total_bytes": 100, "used_bytes": 97}}}} ] }
To run the test, just run Triage. It automatically self-tests each time it's run.
fx triage --config . --inspect inspect.json
Whoops! You should see an error:
Test is_full failed: trigger disk98 of action disk_full returned Bool(false), expected true
Triage's arithmetic engine preserves the type of the operands, so 99/100 evaluates to the integer 0. You can convert to floating point by adding 0.0. Modify your disk_percentage metric:
"disk_percentage": {"Eval": "(disk_used + 0.0) / disk_total"}
Run Triage again. The error should disappear, replaced by a warning that your inspect.json file does in fact indicate a full disk.
Warning: 'disk_full' in 'rules' detected 'Disk is over 98% full': 'disk98' was true
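The integer-division pitfall and the resulting warning can be reproduced with a small Python model of the disk metrics. Python's own / always returns a float, so the hypothetical helper triage_divide imitates the type-preserving behavior described above; truncating integer results toward zero is an assumption that only matters for negative values, which this codelab does not use. disk_metrics and run_actions are likewise hypothetical, and the output line imitates the warning format shown in this codelab.

```python
def triage_divide(a, b):
    """Type-preserving division: int / int stays an int (hypothetical model)."""
    if isinstance(a, int) and isinstance(b, int):
        # Integer quotient, truncated toward zero (assumption for negatives).
        q = abs(a) // abs(b)
        return q if (a < 0) == (b < 0) else -q
    # A float operand makes the result a float.
    return a / b


def disk_metrics(disk_used, disk_total, as_float):
    """Evaluates the codelab's disk_percentage and disk98 metrics."""
    used = disk_used + 0.0 if as_float else disk_used
    percentage = triage_divide(used, disk_total)
    return {"disk_percentage": percentage, "disk98": percentage > 0.98}


def run_actions(namespace, actions, metrics):
    """Returns a warning line for each action whose trigger is true."""
    lines = []
    for name, action in actions.items():
        trigger = action["trigger"]
        if metrics.get(trigger) is True:
            lines.append(f"Warning: '{name}' in '{namespace}' detected "
                         f"'{action['print']}': '{trigger}' was true")
    return lines


ACTIONS = {"disk_full": {"trigger": "disk98", "print": "Disk is over 98% full"}}

before = disk_metrics(99, 100, as_float=False)  # "disk_used / disk_total"
after = disk_metrics(99, 100, as_float=True)    # "(disk_used + 0.0) / disk_total"
print(before["disk_percentage"], run_actions("rules", ACTIONS, before))  # 0 []
for line in run_actions("rules", ACTIONS, after):
    print(line)
```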
You can add any number of Triage configuration files, and even use metrics defined in one file in another file. This has many applications; for example, product-specific values can be kept in a separate file from the rules that use them.
Add a file “product.triage” containing the following:
{ "metrics": { "max_widgets": {"Eval": "4"} }, "actions": {}, "tests": {} }
Add the following metrics to the rules.triage file:
"actual_widgets": {"Selector": "widget_maker.cmx:root:total_widgets"}
This extracts the number of widgets active on the device.
"too_many_widgets": {"Eval": "actual_widgets > product::max_widgets"}
This compares the actual number of widgets with the maximum for the product.
Note: To use metrics from another file, combine the file name, two colons, and the metric name.
Finally, add an action:
"widget_overflow": {"trigger": "too_many_widgets", "print": "Too many widgets!"}
Unfortunately, this device tried to use 6 widgets, so this warning should trigger when “fx triage” is run.
Note: The trigger of an action can also use file::name syntax to refer to a metric in another file.
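Resolving these cross-file references can be sketched in Python. The helpers references and resolve are hypothetical, not Triage's implementation, and the sketch assumes a config file's namespace is its basename without the .triage extension (rules, product). An unprefixed name is looked up in the file that contains the expression.

```python
import re

# A metric name, optionally prefixed by "file::". Names start with a letter
# or underscore (ASCII assumed), then letters, digits, or underscores.
REFERENCE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(?:::[A-Za-z_][A-Za-z0-9_]*)?")


def references(expression):
    """Returns the metric names an Eval expression refers to."""
    return REFERENCE.findall(expression)


def resolve(reference, current_file, metrics_by_file):
    """Looks up "name" in current_file, or "file::name" in that file."""
    file_name, sep, name = reference.rpartition("::")
    return metrics_by_file[file_name if sep else current_file][name]


METRICS = {
    "rules": {
        "actual_widgets": {"Selector": "widget_maker.cmx:root:total_widgets"},
        "too_many_widgets": {"Eval": "actual_widgets > product::max_widgets"},
    },
    "product": {"max_widgets": {"Eval": "4"}},
}

expression = METRICS["rules"]["too_many_widgets"]["Eval"]
for ref in references(expression):
    print(ref, "->", resolve(ref, "rules", METRICS))
```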
In a production environment, several “product.triage” files could be maintained in different directories, and Triage could be directed to use any of them with the “--config” command line argument. (Future versions of Triage may be able to select the correct product file automatically.)
See Triage (fx triage) for the latest features and options - Triage will keep improving!