commit: 514553cd402fd8c1d6e6107391840dbcb2467925
author: Aaron Webster <awebster@gmail.com>
Thu May 21 12:07:16 2026 -0700
committer: Aaron Webster <awebster@gmail.com>
Thu May 21 12:07:16 2026 -0700
tree: 3d72465731c0825cb7440e66554460e0c1c28872
parent: 8ede6c1740c5b3b17bd00d5996e90d5e8facccd1

Add Wireshark Lua dissector backend

Adds a parallel back end at compiler/back_end/lua/ that turns an Emboss
.emb into a runnable Wireshark Lua dissector. Mirrors the C++ backend's
shape: a py_binary driver, a starlark rule (lua_emboss_library) exposed
from the root build_defs.bzl, a (wireshark)-qualified attribute set, and
golden tests parallel to cpp_golden_test.

Generator highlights:

* One Proto per .emb, one local function per struct/bits, one
  value-strings table per enum.
* Nested structs dissected via forward-declared dispatch.
* Bit-addressable (`bits`) blocks emitted as masked ProtoFields against
  a single container read.
* `--` doc comments become the ProtoField description; `#` hash
  comments are ignored.
* Endianness honored via `subtree:add` vs `subtree:add_le`.
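The masked-ProtoField scheme for `bits` blocks can be sketched in Python as follows. This is not code from dissector_generator.py: the helper names `bits_field_mask` and `masked_protofield_lua` are hypothetical. The sketch assumes that bit offset 0 is the least-significant bit of the container and that containers are at most 4 bytes, so the mask fits in a plain Lua number. The emitted string follows Wireshark's `ProtoField.uintN(abbr, name, base, valuestring, mask)` argument order.

```python
# Hypothetical sketch, not the generator's actual code.
# Assumption: bit offset 0 is the least-significant bit of the container.

# Wireshark Lua ProtoField constructors for 1- to 4-byte unsigned containers.
_UINT_CTOR_BY_BYTES = {1: "uint8", 2: "uint16", 3: "uint24", 4: "uint32"}


def bits_field_mask(bit_offset: int, bit_width: int, container_bytes: int) -> int:
    """Return the mask selecting `bit_width` bits starting at `bit_offset`."""
    container_bits = container_bytes * 8
    if bit_width <= 0 or bit_offset < 0 or bit_offset + bit_width > container_bits:
        raise ValueError("field does not fit inside its bits container")
    return ((1 << bit_width) - 1) << bit_offset


def masked_protofield_lua(abbrev: str, label: str, bit_offset: int,
                          bit_width: int, container_bytes: int) -> str:
    """Emit one masked ProtoField declaration as a line of Lua source."""
    if container_bytes not in _UINT_CTOR_BY_BYTES:
        raise ValueError("sketch only handles 1- to 4-byte containers")
    mask = bits_field_mask(bit_offset, bit_width, container_bytes)
    ctor = _UINT_CTOR_BY_BYTES[container_bytes]
    hex_digits = container_bytes * 2  # zero-pad the mask to the container width
    return (f'ProtoField.{ctor}("{abbrev}", "{label}", base.HEX, nil, '
            f'0x{mask:0{hex_digits}X})')
```

Because every field in a block shares one container range, the dissector can pass the same `tvb(offset, n)` range to each `subtree:add` call and let Wireshark apply the masks.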

Module-level attributes:

* `[(wireshark) protocol: "name"]`     name of the generated Proto
* `[(wireshark) root: "Struct"]`       which struct the top-level dissector
                                       dispatches to.
* `[(wireshark) register_on: "..."]`   Wireshark-display-filter-style
                                       string of `<table> == <pattern>`
                                       terms separated by `or` / `||`.
                                       Each term becomes a
                                       DissectorTable.get(...):add(...)
                                       call so Wireshark routes packets
                                       from Ethernet/IP/UDP/TCP layers
                                       into the generated dissector.
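Here is a minimal sketch of how a `register_on` string could be split into terms and turned into `DissectorTable.get(...):add(...)` calls. Several details are assumptions for illustration, and the real parser in dissector_generator.py may differ:

* the function names and regular expressions;
* the rule that decimal or hex patterns become Lua numbers while anything else becomes a quoted Lua string;
* `proto_var`, which stands in for whatever Lua variable holds the generated Proto.

```python
import re

# A term is "<table> == <pattern>", e.g. "udp.port == 5000".
_TERM = re.compile(r"^\s*([A-Za-z0-9_.]+)\s*==\s*(.+?)\s*$")
# Terms are joined by the word "or" or by "||" (display-filter style).
_SEPARATOR = re.compile(r"\s+or\s+|\s*\|\|\s*")
# Assumption: decimal or 0x-prefixed hex patterns are emitted as Lua numbers.
_NUMBER = re.compile(r"0[xX][0-9A-Fa-f]+|[0-9]+")


def parse_register_on(spec: str) -> list[tuple[str, str]]:
    """Split a register_on string into (table, pattern) pairs."""
    terms = []
    for raw in _SEPARATOR.split(spec):
        match = _TERM.match(raw)
        if match is None:
            raise ValueError(f"malformed register_on term: {raw!r}")
        terms.append((match.group(1), match.group(2)))
    return terms


def registration_lua(spec: str, proto_var: str = "proto") -> str:
    """Emit one DissectorTable registration line per register_on term."""
    lines = []
    for table, pattern in parse_register_on(spec):
        if _NUMBER.fullmatch(pattern):
            value = pattern  # Lua accepts both decimal and 0x hex literals
        else:
            value = '"' + pattern.strip('"') + '"'  # quote as a Lua string
        lines.append(f'DissectorTable.get("{table}"):add({value}, {proto_var})')
    return "\n".join(lines)
```

For example, `registration_lua("udp.port == 5000 || ethertype == 0x88B5")` yields one `DissectorTable.get("udp.port"):add(5000, proto)` line and one `DissectorTable.get("ethertype"):add(0x88B5, proto)` line.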

Struct- and field-level:

* `[(wireshark) filter: "name"]`       overrides the auto-generated
                                       Wireshark filter-name segment.
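The sketch below shows one way a per-field display-filter name might be composed, with a `[(wireshark) filter: "name"]` override replacing one auto-generated segment. The `<protocol>.<struct>.<field>` layout and the CamelCase-to-snake_case sanitization rule are assumptions for illustration, not the generator's documented behavior. The function names are hypothetical.

```python
import re
from typing import Optional


def sanitize_segment(name: str) -> str:
    """Lower-case a name and keep only [a-z0-9_] (an assumed rule)."""
    # Insert "_" at lower/digit-to-upper boundaries: "SensorReading" -> "Sensor_Reading".
    snake = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).lower()
    # Replace anything else (hyphens, spaces, punctuation) with "_".
    return re.sub(r"[^a-z0-9_]", "_", snake)


def filter_name(protocol: str, struct_name: str, field_name: str,
                struct_filter: Optional[str] = None,
                field_filter: Optional[str] = None) -> str:
    """Join protocol, struct, and field segments; an override replaces its default."""
    segments = [
        sanitize_segment(protocol),
        struct_filter if struct_filter is not None else sanitize_segment(struct_name),
        field_filter if field_filter is not None else sanitize_segment(field_name),
    ]
    return ".".join(segments)
```

With these rules, `filter_name("demo", "SensorReading", "temperature", field_filter="temp")` gives `demo.sensor_reading.temp`.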

Plumbing:

* New `emboss_lua_library` macro + `lua_emboss_library` rule + aspect
  in the root build_defs.bzl, modelled on cc_emboss_library.
* `embossc --generate lua` (in addition to the existing `cc`).
* scripts/regenerate_goldens.py also refreshes the Lua goldens.

Tests:

* compiler/back_end/lua/dissector_generator_test.py — 27 unit tests
  covering identifier sanitization, integer-width mapping, register_on
  parsing, enum value-strings emission, filter composition, doc-text
  extraction, attribute validation, root-struct selection, and nested
  struct dispatch.
* lua_golden_test targets in compiler/back_end/lua/BUILD covering
  enum, nested_structure, uint_sizes, int_sizes, and the new
  wireshark.emb fixture.
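The integer-width mapping listed among the unit-test topics could look something like the following. Wireshark's Lua API does provide `ProtoField.uint8/16/24/32/64` and the matching `int` constructors. The hypothetical `protofield_ctor` helper widens 5- to 7-byte fields to the 64-bit constructor, which is an assumption of this sketch.

```python
def protofield_ctor(byte_width: int, signed: bool) -> str:
    """Pick a Wireshark Lua ProtoField constructor name for an integer field."""
    if not 1 <= byte_width <= 8:
        raise ValueError(f"unsupported integer width: {byte_width} bytes")
    if byte_width <= 4:
        # 1, 2, 3, 4 bytes map directly onto the 8-, 16-, 24-, 32-bit constructors.
        bits = 8 * byte_width
    else:
        # Assumption: 5- to 8-byte fields all use the 64-bit constructor.
        bits = 64
    prefix = "int" if signed else "uint"
    return f"ProtoField.{prefix}{bits}"
```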

build_defs.bzl
compiler/back_end/lua/BUILD (added)
compiler/back_end/lua/__init__.py (added)
compiler/back_end/lua/attributes.py (added)
compiler/back_end/lua/build_defs.bzl (added)
compiler/back_end/lua/dissector_generator.py (added)
compiler/back_end/lua/dissector_generator_test.py (added)
compiler/back_end/lua/emboss_codegen_lua.py (added)
compiler/back_end/lua/one_golden_test.py (added)
compiler/back_end/lua/run_one_golden_test.py (added)
embossc
scripts/regenerate_goldens.py
testdata/BUILD
testdata/golden_lua/BUILD (added)
testdata/golden_lua/enum.emb.lua (added)
testdata/golden_lua/int_sizes.emb.lua (added)
testdata/golden_lua/nested_structure.emb.lua (added)
testdata/golden_lua/uint_sizes.emb.lua (added)
testdata/golden_lua/wireshark.emb.lua (added)
testdata/wireshark.emb (added)

20 files changed

README.md

Emboss

Emboss is a tool for generating code that reads and writes binary data structures. It is designed to help write code that communicates with hardware devices such as GPS receivers, LIDAR scanners, or actuators.

What does Emboss do?

Emboss takes specifications of binary data structures, and produces code that will efficiently and safely read and write those structures.

Currently, Emboss only generates C++ code, but the compiler is structured so that writing new back ends is relatively easy -- contact emboss-dev@google.com if you think Emboss would be useful, but your project uses a different language.

When should I use Emboss?

If you're sitting down with a manual that looks something like this or this, Emboss is meant for you.

When should I not use Emboss?

Emboss is not designed to handle text-based protocols; if you can use minicom or telnet to connect to your device, and manually enter commands and see responses, Emboss probably won't help you.

Emboss is intended for cases where you do not control the data format. If you are defining your own format, you may be better off using Protocol Buffers or Cap'n Proto or BSON or some similar system.

Why not just use packed structs?

In C++, packed structs are the most common method of dealing with these kinds of structures; however, they have a number of drawbacks compared to Emboss views:

* Access to packed structs is not checked. Emboss (by default) ensures that you do not read or write out of bounds.
* It is easy to accidentally trigger C++ undefined behavior using packed structs, for example by not respecting the struct's alignment restrictions or by running afoul of strict aliasing rules. Emboss is designed to work with misaligned data, and is careful to use strict-aliasing-safe constructs.
* Packed structs do not handle variable-size arrays, nor arrays of sub-byte-size fields, such as boolean flags.
* Packed structs do not handle endianness; your code must be very careful to correctly convert stored endianness to native.
* Packed structs do not handle variable-sized fields, such as embedded substructs with variable length.
* Although unions can sometimes help, packed structs do not handle overlapping fields well.
* Although unions can sometimes help, packed structs do not handle optional fields well.
* Certain aspects of bitfields in C++, such as their exact placement within the larger containing block, are implementation-defined. Emboss always reads and writes bitfields in a portable way.
* Packed structs do not have support for conversion to human-readable text format.
* It is difficult to read the definition of a packed struct in order to generate documentation, alternate representations, or support in languages other than C and C++.
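To make the bounds, endianness, alignment, and bitfield points above concrete, here is a plain Python sketch (not Emboss output) of the work a careful field read does by hand. It checks bounds, picks a byte order explicitly, reads from an arbitrary offset, and extracts bitfields by shift-and-mask rather than relying on compiler layout. The helper names `read_u16` and `read_bits` are hypothetical.

```python
import struct


def read_u16(buf: bytes, offset: int, big_endian: bool) -> int:
    """Read an unsigned 16-bit integer at any offset, with an explicit byte order."""
    # Bounds check before touching the buffer.
    if offset < 0 or offset + 2 > len(buf):
        raise IndexError(f"2-byte read at offset {offset} exceeds {len(buf)}-byte buffer")
    # ">" is big-endian, "<" is little-endian; neither depends on the host CPU.
    fmt = ">H" if big_endian else "<H"
    # unpack_from copies bytes out, so odd (misaligned) offsets are fine.
    return struct.unpack_from(fmt, buf, offset)[0]


def read_bits(value: int, lsb: int, width: int) -> int:
    """Extract `width` bits starting at bit `lsb` (0 = least significant)."""
    return (value >> lsb) & ((1 << width) - 1)
```

These are the chores that the list above says packed structs leave to your code.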

What does Emboss not do?

Emboss does not help you transmit data over a wire -- you must use something else to actually transmit bytes back and forth. This is partly because there are too many possible ways of communicating with devices, but also because it allows you to manipulate structures independently of where they came from or where they are going.

Emboss does not help you interpret your data, or implement any kind of higher-level logic. It is strictly meant to help you turn bit patterns into something suitable for your programming language to handle.

What state is Emboss in?

Emboss is currently under development. While it should be entirely ready for many data formats, it may still be missing features. If you find something that Emboss can't handle, please contact emboss-dev@google.com to see if and when support can be added.

Emboss is not an officially supported Google product: while the Emboss authors will try to answer feature requests, bug reports, and questions, there is no SLA (service level agreement).

Getting Started

Head over to the User Guide to get started.