Trace Processor Parse Cache

Authors: @lalitm

Status: Draft

Problem

Loading large traces (multi-GB) into Trace Processor is slow. Parsing protobuf trace data, sorting events, and populating tables can take minutes for large traces. This cost is paid on every load, even when the trace hasn't changed.

This is painful in several workflows:

  1. Interactive analysis: Reopening the same trace in the UI after a page refresh or browser restart.
  2. AI agents: Repeated open/query/close cycles on the same trace, where each cycle re-parses from scratch.
  3. Iterative scripting: Python notebooks or scripts that create a new TraceProcessor instance on the same trace during development.

The parsed table representation is deterministic for a given trace + TP version + flags. Re-parsing is pure waste when the inputs haven't changed.

Decision

Add a parse cache that serializes parsed tables into an opaque binary format and transparently loads from cache on subsequent opens. The caching logic lives entirely in the shell layer — the TP core remains IO-free.

Design

Cache format

The parse cache format is an internal implementation detail of Trace Processor. It is not stable, not versioned for external consumers, and carries no forwards or backwards compatibility guarantees. The format may change between any two TP releases without notice.

Currently, the cache is a TAR archive of Arrow IPC files (one per intrinsic table), reusing the existing ExportToArrow() / Arrow TAR import infrastructure. This is a convenient implementation choice, not a public contract.

A separate, stable Arrow export feature (with explicit versioning and compatibility guarantees) will be provided independently for users who need a durable, interoperable representation. The parse cache is not that — it is purely an acceleration mechanism tied to a specific TP build.

Cache key

The cache is identified by:

SHA256(tp_version + sorted_relevant_global_flags + trace_identity)

Where trace_identity is:

Client        Identity
CLI shell     file path + size + mtime (from stat())
Browser/UI    File.name + File.size + File.lastModified
Python        file path + size + mtime (from os.stat())

This is fast to compute (no file content hashing) and sufficient for practical invalidation. The chance of a false cache hit (different trace content with identical name + size + mtime) is negligible.
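A minimal sketch of the key computation under these assumptions (the function name and string layout are illustrative, not the actual implementation):

```python
import hashlib
import os


def parse_cache_key(tp_version: str, flags: list[str], trace_path: str) -> str:
    """Illustrative cache key: hash the TP version, the sorted relevant
    flags, and a stat()-based trace identity -- never the trace contents."""
    st = os.stat(trace_path)
    identity = f"{trace_path}:{st.st_size}:{st.st_mtime_ns}"
    payload = "\n".join([tp_version, *sorted(flags), identity])
    return hashlib.sha256(payload.encode()).hexdigest()
```

Because only stat() metadata is hashed, the key is computable in microseconds even for multi-GB traces; touching the file or changing a relevant flag yields a different key.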

“Relevant global flags” are those that affect parsing behavior: --full-sort, --no-ftrace-raw, --crop-track-events, etc.

The inclusion of tp_version in the cache key means that upgrading TP automatically invalidates all existing caches. This is intentional — schema changes between versions make old caches unsafe to load.

Cache location

~/.cache/perfetto/parse-cache/<hash> by default, overridable via --parse-cache-dir. The cache directory is created on first write.

CLI: --parse-cache global flag

trace_processor_shell --parse-cache query -c "SELECT ..." trace.pb

Behavior:

  1. Compute cache key from the trace file's stat metadata + TP version + flags.
  2. If a valid cache exists, read it and feed it to Parse() instead of the original trace. Skip to step 5.
  3. Otherwise, read the original trace and feed it to Parse() as normal.
  4. After parsing completes, start writing the parse cache in a background thread via ExportToArrow().
  5. Execute the user's command (query, repl, serve, etc.).
  6. On exit, wait for the background cache write to complete before terminating.

The background write means the first load has no added latency for queries. The wait-on-exit means the cache is guaranteed to exist after the first invocation completes.
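The steps above can be sketched as follows (all names are illustrative and the real shell is C++; `tp` stands in for the Trace Processor instance and `cache_path` is the precomputed <cache-dir>/<hash> location from step 1):

```python
import threading
from pathlib import Path


def run_with_parse_cache(trace_path: Path, cache_path: Path, tp, command):
    """Illustrative shell-side flow for --parse-cache; step numbers
    match the behavior list above."""
    writer = None
    if cache_path.exists():                            # step 2: cache hit
        tp.parse(cache_path.read_bytes())
    else:                                              # step 3: normal parse
        tp.parse(trace_path.read_bytes())
        # Step 4: serialize the cache off the main thread so the first
        # load adds no latency before queries can run.
        writer = threading.Thread(
            target=lambda: cache_path.write_bytes(tp.export_to_arrow_tar()))
        writer.start()
    command(tp)                                        # step 5: user command
    if writer is not None:                             # step 6: wait on exit
        writer.join()
```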

CLI: parse-cache subcommand

For explicit cache management by human users:

# Create a parse cache for a trace
trace_processor_shell parse-cache create trace.pb

# Show cache status for a trace (exists, size, staleness)
trace_processor_shell parse-cache info trace.pb

# Delete cache for a specific trace
trace_processor_shell parse-cache clear trace.pb

# Delete all parse caches
trace_processor_shell parse-cache clear --all

RPC: /parse with trace identity

To support caching for RPC clients (UI, Python), the first /parse call gains an optional trace_identity field:

message ParseRequest {
  optional bytes data = 1;
  // Optional. Sent only on the first /parse call. If the shell has a
  // valid parse cache for this identity, it loads from cache and ignores
  // subsequent /parse data.
  optional string trace_identity = 2;
}

message ParseResult {
  optional string error = 1;
  // If true, the shell loaded from cache. The client may skip sending
  // further /parse chunks. Old clients that ignore this field and
  // continue sending data are fine — the shell discards the bytes.
  optional bool skip_further_parse = 2;
}

Flow:

  1. Client sends first /parse chunk with trace_identity set.
  2. Shell checks cache:
    • Cache hit: Load cached data, respond with skip_further_parse = true. Discard data from this and all subsequent /parse calls.
    • Cache miss: Parse the data normally, respond with skip_further_parse = false. Continue accepting /parse chunks.
  3. After notify_eof, on a cache miss, the shell writes the cache in a background thread.
  4. Old clients that don't send trace_identity: no caching, existing behavior unchanged.
  5. New clients that ignore skip_further_parse and keep sending: shell discards bytes, everything still works.

This is fully backwards compatible in both directions.
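From the client side, the flow might look like the sketch below. The `rpc` stub and its method names are assumptions for illustration; the `trace_identity` and `skip_further_parse` fields are from the proto above.

```python
def parse_with_cache(rpc, chunks, trace_identity: str) -> None:
    """Illustrative client-side use of the extended /parse protocol:
    send the identity on the first chunk only, and stop streaming early
    if the shell reports a cache hit."""
    first = True
    for chunk in chunks:
        result = rpc.parse(
            data=chunk,
            trace_identity=trace_identity if first else None)
        first = False
        if result.skip_further_parse:  # cache hit: shell already has the data
            break
    rpc.notify_eof()
```

A client that ignores `skip_further_parse` and streams every chunk anyway still works, since the shell discards the bytes on a cache hit.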

Python API

Caching is configured via TraceProcessorConfig:

from perfetto.trace_processor import TraceProcessor, TraceProcessorConfig

config = TraceProcessorConfig(parse_cache=True)
tp = TraceProcessor(trace="large.pb", config=config)

Implementation: parse_cache=True causes the Python wrapper to pass --parse-cache to the shell subprocess. The Python client also sends trace_identity (from os.stat()) on the first /parse call so the shell can check its cache. If the response has skip_further_parse = true, Python skips sending remaining chunks.

No explicit cache management API in Python. Users who want manual control can use the CLI parse-cache subcommand.

What is NOT cached

  • Derived state: Views, PerfettoSQL module outputs, and custom SQL are recomputed on top of the cached tables. This is expected and fast.
  • Session-specific tables: __intrinsic_trace_file, __intrinsic_trace_import_logs, and similar metadata tables are excluded from the cache.
  • Empty tables: Tables with zero rows are not written to the cache.
  • Stdin/pipe traces: When there is no file identity (data piped from stdin or generated programmatically), caching is silently skipped.

Disk space

The cache size is roughly comparable to the parsed table data, which can be a significant fraction of the original trace size. Mitigations:

  • Caching is always opt-in (--parse-cache flag or parse-cache create). Users make a conscious choice to spend disk space.
  • parse-cache info shows cache sizes.
  • parse-cache clear --all provides easy cleanup.
  • After writing, the shell prints the cache size as a courtesy: Parse cache written: 8.2 GB at ~/.cache/perfetto/parse-cache/a1b2c3

Concurrency

  • Cache files are written atomically: write to a temporary file, then rename. This prevents partial reads from concurrent TP instances.
  • Multiple TP instances loading the same trace simultaneously may each write a cache. The last rename wins, which is fine since the content is identical.
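The atomic write described above can be sketched as (names are illustrative):

```python
import os
import tempfile


def write_cache_atomically(cache_path: str, data: bytes) -> None:
    """Write to a temp file in the same directory, then rename into
    place. Concurrent readers see either the old file or the complete
    new one, never a partial write."""
    cache_dir = os.path.dirname(cache_path)
    os.makedirs(cache_dir, exist_ok=True)
    # The temp file must live on the same filesystem as the destination
    # for the rename to be atomic, hence dir=cache_dir.
    fd, tmp_path = tempfile.mkstemp(dir=cache_dir)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, cache_path)  # atomic; last rename wins
    except BaseException:
        os.unlink(tmp_path)
        raise
```

When two instances race, each writes its own temp file and the second `os.replace` simply overwrites the first, which is safe because both caches have identical content.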

Architecture summary

+------------------+     +-------------------+     +---------------+
| Clients          |     | Shell layer       |     | TP core       |
|                  |     | (IO lives here)   |     | (IO-free)     |
|  CLI user        |---->|                   |     |               |
|  Browser/UI      |---->| Cache check/write |---->| Parse()       |
|  Python API      |---->| File I/O          |     | ExportToArrow |
|                  |     | Background thread |     |               |
+------------------+     +-------------------+     +---------------+

TP core provides the primitives (ExportToArrow, import via Parse). The shell layer orchestrates caching. RPC clients provide trace identity metadata. No caching logic in TP core or the RPC layer itself.

Alternatives considered

1. Cache in TP core

Add cache awareness to TraceProcessor directly (e.g., in NotifyEndOfFile() or Config).

Pro:

  • Single implementation covers all callers.

Con:

  • Violates TP's IO-free design. Implicit file writes in a library are surprising and hard to control.
  • TP doesn't know about file paths, mtime, or disk layout.

2. SQLite as cache format

Use the existing ExportTraceToDatabase() as the cache format instead.

Pro:

  • Already implemented.
  • Self-describing, queryable directly.

Con:

  • Slower to load (SQL overhead, row-oriented).
  • Larger on disk for numeric-heavy data.
  • Not designed for fast bulk table restoration.

3. Automatic caching by default

Enable caching automatically without user opt-in.

Pro:

  • Zero-friction fast reloads.

Con:

  • Disk bloat can be tremendous and surprising. A 10 GB trace produces a comparably sized cache silently.
  • Dangerous on servers, shared filesystems, CI environments.
  • Users should make a conscious choice to spend disk space.

4. Content hashing for cache key

Use SHA256 of the full trace content as the cache key instead of path + size + mtime.

Pro:

  • Correct even if a file is modified without changing mtime.

Con:

  • Hashing a multi-GB file is slow and defeats the purpose of caching.
  • For RPC clients (browser), reading the full file just to compute a hash is wasteful.
  • The mtime + size heuristic is sufficient in practice.

Open questions

  • Should there be a maximum cache directory size with automatic eviction (LRU)? Or is manual parse-cache clear sufficient?
  • Should parse-cache create support creating caches for multiple traces at once (e.g., parse-cache create *.pb)?
  • For the UI: should the cache directory be configurable via the httpd endpoint, or is ~/.cache/perfetto/parse-cache/ always correct?