{% set rfcid = "RFC-0033" %}
{% include "docs/contribute/governance/rfcs/_common/_rfc_header.md" %}
Note: Formerly known as FTP-033.
This FTP amends and clarifies the behavior of FIDL decoders when tables, extensible unions, enums, and bits (collectively, extensible messages[^1]) contain fields whose type is unknown.
Specifically, we propose:
* A `strict` keyword that can prefix an extensible message declaration. This guarantees that messages will be received with no unknown fields, by rejecting them during validation.

This RFC was amended by:
Extensible messages are a valuable mechanism to enable a data interchange format to evolve without breaking wire format (binary) compatibility. However, changing the schema poses design decisions for FIDL decoders, since questions arise about how to validate, parse, and expose those fields to an end user.
While each language has different mechanisms and norms for data structure access, specifying behavior for decoders and their APIs increases security by enforcing validation behavior, and improves overall ergonomics by increasing consistency across types and languages.
We also wish to enable bindings for constrained environments, where parsing unknown fields may not be necessary for correct operation and would add an undue performance burden. This is also relevant for messages that have achieved maturity and are not expected to evolve further.
An unknown field is one whose ordinal (table), tag (extensible union), value (enum), or specific bit (bits) is unknown to the reader. From here, we will use “tag” to refer to the unknown ordinal/tag/value/specific bit, for brevity.
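To make the definition concrete for bits, here is a minimal sketch of how a reader could separate known from unknown bits. The member names `kRead` and `kWrite`, the mask, and both helper functions are hypothetical and are not taken from any FIDL binding.

```cpp
#include <cstdint>

// Hypothetical bits type as seen by a reader that knows only two members.
constexpr uint16_t kRead = 0x1;
constexpr uint16_t kWrite = 0x2;
constexpr uint16_t kKnownMask = kRead | kWrite;

// Returns only the bits for which this reader has no definition.
constexpr uint16_t UnknownBits(uint16_t value) {
  return static_cast<uint16_t>(value & ~kKnownMask);
}

// True when the value carries at least one unknown "tag" in the sense above.
constexpr bool HasUnknownBits(uint16_t value) {
  return UnknownBits(value) != 0;
}
```

The same idea applies to the other kinds of extensible message: the reader compares an ordinal, union tag, or enum value against the set it was compiled with.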
* If the target language can check tags exhaustively at compile time (e.g., `switch()` in C/C++, `match` in Rust), bindings SHOULD offer a special unknown tag so that the catch-all case (`default` in C/C++, `_` in Rust) can be omitted.

```cpp
// Bindings SHOULD NOT offer this API:
switch(union.Which()) {
  case Tag1: ...
  case Tag2: ...
  case Tag3: ...
  default: ...
  // no unknown tag in bindings forces handling using default case
}

// Bindings SHOULD offer this API:
switch(union.Which()) {
  case Tag1: ...
  case Tag2: ...
  case Tag3: ...
  case Tag_Unknown: ...
  // no default case: new tags cause a non-exhaustiveness warning
}
```
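A compilable variant of the recommended shape might look like the sketch below. The `Tag` enum and `Describe` function are hypothetical names chosen for illustration; real bindings may expose the unknown tag differently.

```cpp
#include <string>

// Hypothetical tag set for a flexible union, as a binding might generate it.
// kUnknown stands in for any tag this reader was not compiled with.
enum class Tag { kTag1, kTag2, kTag3, kUnknown };

// There is deliberately no default case. If a later schema revision adds an
// enumerator, compilers such as GCC and Clang can warn (-Wswitch) that the
// switch no longer handles every value.
std::string Describe(Tag tag) {
  switch (tag) {
    case Tag::kTag1:
      return "tag1";
    case Tag::kTag2:
      return "tag2";
    case Tag::kTag3:
      return "tag3";
    case Tag::kUnknown:
      return "unknown";
  }
  // Only reached if tag holds a value outside the enumerators.
  return "invalid";
}
```

Because the unknown case is an ordinary enumerator, user code handles it explicitly and still benefits from the compiler's exhaustiveness check.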
* We introduce a `strict` keyword that can prefix extensible message declarations, e.g., `strict table T { ... }` or `strict enum T { ... }`.
* Changing a message between strict and flexible can use a `[Transitional]` attribute to soft transition.

Example syntax:
```fidl
// One simply doesn't walk into Mordor and add a new file mode, so this is
// reasonable to be strict.
strict bits UnixFilePermission : uint16 {
    ...
};

// It's too dangerous for clients to ignore data in this table if we
// extend it later, but we wish to keep the wire format compatible if we
// do change it, so it's not a struct.
strict table SecurityPolicy {
    ...
};
```
[…] `fidlc`.

During the design phase, we also considered allowing the `strict` keyword to be placed in use sites of declarations, in addition to the proposed declaration site placement.
Example syntax could be:
```fidl
protocol Important {
    SomeMethod(...) -> (strict other.library.Message response);
}
```
Here, `other.library.Message` may not have been defined `strict`, but we want to use it while still requiring strict validation.

This adds some design complexity for binding authors, since `other.library.Message` may be needed both in strict mode and flexible mode.
On the encoding/validation/decoding side, exposing both strict and flexible mode for the same message depending on context is not dissimilar to how strings or vectors are handled. They have the same layout, but can have different bounds depending on where they are used. It is also similar to how extensible unions can be used in nullable or non-nullable contexts. Generally, bindings have chosen a type schema, with some way to indicate bounds, nullability, or, as is being explored here, strictness mode.

The second issue with exposing both strict and flexible mode for the same message is the assembly of messages and the querying of messages in user code.
Consider, for instance, an enum with three members, `A`, `B`, and `C`. In order to expose the flexible mode, we need a special enum member “unknown”. As a result, it is now possible to assemble an enum value that does not pass strict validation, so that in a strict context where this enum is needed, encoding will fail. Here again, the parallel with strings and vectors is important: without a highly specialized API, bindings allow creating strings and vectors that are too long, which then fail to be encoded.
The strategy to follow when faced with supporting both strict and flexible mode is to generate all the extra pieces for flexible mode, and ensure that where needed, strict validation is applied during encoding, decoding, and validation.
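As a rough illustration of this strategy, the sketch below generates the flexible-mode "unknown" member once and applies the strict check at encoding time. The names `Letter`, `Strictness`, and `ValidateForEncoding`, as well as the choice of `0` for the unknown member, are assumptions made for this example only.

```cpp
#include <cstdint>

// Hypothetical generated enum. kUnknown exists only to support flexible mode.
enum class Letter : uint32_t { kUnknown = 0, kA = 1, kB = 2, kC = 3 };

// The mode required by the place where the enum is used.
enum class Strictness { kStrict, kFlexible };

// Returns false when the value may not be sent in the given context.
// This mirrors how an over-long string can be built but fails to encode.
bool ValidateForEncoding(Letter value, Strictness context) {
  if (context == Strictness::kStrict && value == Letter::kUnknown) {
    return false;
  }
  return true;
}
```

User code can therefore construct `Letter::kUnknown` freely, and the failure surfaces only when the value is encoded for a strict use site.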
This FTP improves ergonomics in a few ways:
This FTP increases security.
See the Implementation Strategy section (we plan to use the FIDL compatibility test). Additionally, each language binding should have its own tests to assert correct behavior.
This FTP largely clarifies behavior, and has an associated implementation cost to ensure that language bindings conform to its recommendations.
Strictness ought to be viewed in a similar light as size bounds on vectors or strings; it is a constraint that is independent from a message's layout, and can be changed without ABI breakage.
We want FIDL authors to make an explicit choice to restrict (constrain) their messages.
Further, we do not want a mixed mode, where some messages (e.g., enums) are strict by default, and others (e.g., tables) are not.
It's an important-enough idea to deserve its own keyword. There's enough precedent for similar features in other languages that it translates well to FIDL.
During the design phase, several different alternatives were proposed. The likeliest contender was `final`: it denotes “final word on the subject,” and has precedent in C++, Java, and C# (among others).
However, because we may want to use the keyword “final” on protocols to indicate that one cannot use it in composition (i.e., the traditional use of “final”), we opted for another keyword to indicate strict validation.
This leaves the door open to introduce syntax such as:
```fidl
final strict protocol Important {
    MyMethod(SomeTable arg);
};
```
This would indicate that protocol `Important` cannot be composed AND that all validation must be strict.
Other explored keywords were: `sealed`, `rigid`, `fixed`, `closed`, `known`, and `standardized`.
We could define all extensible messages to always be strict. Currently, enums and bits are only strict, so this alternative would extend that to tables and extensible unions.
Under such a scenario, changes to extensible structures (e.g., adding a new field) would require readers to be updated prior to writers being updated. This severely limits the use of these extensible data structures, and is too constraining for higher level use cases.
Furthermore, if that were the design choice, we would not need to use envelopes for tables and extensible unions (i.e., no need for number of bytes nor the number of handles). Indeed, under a strict only interpretation, unknown fields would be rejected, and otherwise the schema would determine the number of bytes and handles to be consumed in a fashion similar to the rest of the messages FIDL processes.
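To show why the byte and handle counts matter for flexible decoding, here is a simplified sketch of skipping an unknown field. The `EnvelopeHeader` and `Cursor` structs and the `SkipUnknown` function are hypothetical; they model only the two counts named above and are not the actual FIDL wire layout.

```cpp
#include <cstddef>
#include <cstdint>

// Simplified, hypothetical envelope header: only the two counts named above.
struct EnvelopeHeader {
  uint32_t num_bytes;
  uint32_t num_handles;
};

// Position of the decoder within the message's bytes and handles.
struct Cursor {
  size_t byte_offset;
  size_t handle_index;
};

// Skips the payload of a field the reader does not understand. Returns false,
// leaving the cursor untouched, if the counts would overrun the message.
bool SkipUnknown(const EnvelopeHeader& header, size_t total_bytes,
                 size_t total_handles, Cursor* cursor) {
  if (cursor->byte_offset > total_bytes ||
      header.num_bytes > total_bytes - cursor->byte_offset) {
    return false;
  }
  if (cursor->handle_index > total_handles ||
      header.num_handles > total_handles - cursor->handle_index) {
    return false;
  }
  cursor->byte_offset += header.num_bytes;
  cursor->handle_index += header.num_handles;
  return true;
}
```

A strict-only decoder would never take this path: it would reject the unknown field outright, which is why the counts would be unnecessary under that alternative.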
We could define all extensible messages to always be flexible.
This would be very surprising for enums (and bits), and counter to expectations. This leads us to two bad sub-alternatives:
Continuing the exploration to other extensible messages (tables and extensible unions), there is room and a need for strictness.
Consider, for instance, a secure logging protocol `LogEntry` defined as a table. Implementations of this protocol would likely want to guarantee that clients do not send fields the server does not understand, for fear that these clients may have expectations about how these new fields control the handling of the log entry. As an example, a newer version may add a field “`pii ranges`” providing ranges of the log entry that contain PII and must be logged specifically (e.g., replaced by a unique ID, with the original data vaulted under that unique ID). To protect old servers from accepting such a payload, and likely mishandling those log entries, authors would choose the strict mode for their `LogEntry`, thus protecting themselves from potential misuse down the line.
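The difference between the two modes for this scenario can be sketched as follows. The ordinal numbers, `IsKnownOrdinal`, `ValidationResult`, and `ValidateLogEntry` are all hypothetical: the sketch assumes an old server that knows ordinals 1 and 2, and a newer client that adds ordinal 3 for the PII ranges.

```cpp
#include <cstdint>
#include <vector>

enum class Strictness { kStrict, kFlexible };

// Hypothetical: the old server was compiled against ordinals 1 and 2 only.
bool IsKnownOrdinal(uint64_t ordinal) { return ordinal == 1 || ordinal == 2; }

struct ValidationResult {
  bool accepted;
  std::vector<uint64_t> unknown_ordinals;  // recorded in both modes
};

// Strict mode rejects the message if any ordinal is unknown; flexible mode
// accepts it and merely reports which ordinals were not understood.
ValidationResult ValidateLogEntry(const std::vector<uint64_t>& present_ordinals,
                                  Strictness strictness) {
  ValidationResult result{true, {}};
  for (uint64_t ordinal : present_ordinals) {
    if (IsKnownOrdinal(ordinal)) continue;
    result.unknown_ordinals.push_back(ordinal);
    if (strictness == Strictness::kStrict) result.accepted = false;
  }
  return result;
}
```

With a strict `LogEntry`, the old server refuses the newer client's entry instead of silently logging PII it does not know how to protect.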
Some of this rationale was guided by go/proto3-unknown-fields, which describes why proto3 dropped support for preserving unknown fields, then later reversed the decision.
[^1]: Enums & bits are included in extensible messages, since new members can be added or removed after the message is defined.