blob: b48a8e5e8a4253e75bfda18bfe295016f8b563cc [file] [log] [blame] [view]
# [FIDL Tuning Proposal](README.md) 033
Handling of Unknown Fields & Strictness
=======================================
Field | Value
----------|--------------------------
Status | Accepted
Authors | apang@google.com, pascallouis@google.com
Submitted | 2019-02-07
Reviewed | 2019-03-07
[TOC]
## Summary
This FTP amends and clarifies the behavior of FIDL decoders when
encountering tables, extensible unions, enums, and bits —
_extensible messages_[[^1]](#Footnote1) — contain fields where the
type is unknown.
Specifically, we propose:
* Defining a **strict and flexible behavior** for extensible messages,
specifying how decoders encountering unknown fields (including handles)
ought to behave;
* A **`strict` keyword that can prefix an extensible message
declaration**.
This guarantees that messages will be received with no unknown fields,
by rejecting them during validation.
* Default **extensible messages to be flexible,** i.e.,
where unknown values are allowed and exposed via bindings;
* Define and **recommend APIs that bindings provide for clients to
inspect messages with unknown fields**.
## Motivation
Extensible messages are a valuable mechanism to enable a data interchange
format to evolve without breaking wire format (binary) compatibility.
However, changing the schema poses design decisions for FIDL decoders,
since questions arise in how to validate, parse and expose those fields to
an end-user.
While each language has different mechanisms and norms for data structure
access, specifying behavior for decoders and their APIs increases security
by enforcing validation behavior, and improves overall ergonomics by
increasing consistency across types and languages.
We also wish to enable bindings for constrained environments, where
parsing unknown fields may not be necessary for correct operation and add
undue performance burdens.
This is also relevant for messages which have achieved maturity, and are
not expected to evolve further.
## Design
An _unknown field_ is one whose ordinal (table), tag (extensible union),
value (enum), or specific bit (bits) is unknown to the reader.
From here, we will use "tag" to refer to the unknown
ordinal/tag/value/specific bit, for brevity.
* Messages containing unknown tags MUST be validated and parsed
successfully.
* However, see below for an exception for
[strict](#strict-handling-of-messages) messages.
* Decoders MUST process unknown handles in messages.
* The default processing behavior MUST be to close all handles.
* Bindings MAY offer a mechanism for clients to process unknown
handles specially.
* Bindings MUST provide a mechanism to detect whether an unknown tag was
received as part of a message.
* Bindings SHOULD provide a mechanism to detect whether a field with a
given tag exists as part of a received message.
* Bindings MAY provide a mechanism to read the tag, raw data, and
(untyped) handles in an unknown field.
* If the target language offers a mechanism to exhaustively check tags
at compile time (e.g., a `switch()` in C/C++, `match` in Rust):
* That language binding SHOULD offer a special "unknown" tag that
can be included as part of the exhaustive check, so that a catch-all case
(e.g., `default` in C/C++, `_` in Rust) can be omitted.
* The intention of this recommendation is to prevent the catch-all
case from being required for proper compilation, because if it is, tags
added in the future will not elicit compiler warnings.
* This FTP does not define a mechanism for how this should be
fulfilled, since implementation strategies can differ between languages.
* Example:
```c
// Bindings SHOULD NOT offer this API:
switch(union.Which()) {
case Tag1: ...
case Tag2: ...
case Tag3: ...
default: ...
// no unknown tag in bindings forces handling using default case
}
// Bindings SHOULD offer this API:
switch(union.Which()) {
case Tag1: ...
case Tag2: ...
case Tag3: ...
case Tag_Unknown: ...
// no default case: new tags cause a non-exhaustiveness warning
}
```
### Strict Handling of Messages
* We introduce a `strict` keyword that can prefix extensible message
declarations, e.g., `strict table T { ... }` or `strict enum T { ... }`.
* Strict messages containing unknown fields MUST be considered invalid.
* Bindings MUST NOT offer a special "unknown" tag for exhaustive tag
checking of strict messages, if they support such a mechanism for flexible
messages.
* Transitions from strict messages to flexible messages, and vice versa,
MUST be supported as a non-breaking source-level (API) change, possibly
using the [`[Transitional]` attribute][transitional] to soft transition.
* Such a transition MUST NOT change the wire format (ABI).
* Strict messages are _not_ transitive.
If a message is marked as strict, only that message is strict.
Sub-messages contained within that message are not strict.
Example syntax:
```fidl
// One simply doesn't walk into Mordor and add a new file mode, so this is
// reasonable to be strict.
strict bits UnixFilePermission : uint16 {
...
};
// It's too dangerous for clients to ignore data in this table if we
// extend it later, but we wish to keep the wire format compatible if we
// do change it, so it's not a struct.
strict table SecurityPolicy {
...
};
```
## Implementation Strategy
1. Update the FIDL compatibility test to validate that existing language
bindings conform to this specification.
1. Add test cases for messages with (1) only known fields, (2) only
unknown fields, and (3) at least one known and one unknown field.
2. Ensure the FIDL compatibility test has test cases for empty messages of
all appropriate types.
3. Add support for strict messages in `fidlc`.
4. Update language bindings to support strict messages.
5. Add test cases for strict messages to the FIDL compatibility test.
#### Looking Ahead: Use Site Modifiers
During the design phase, we also considered allowing the strict keyword to
be placed in use sites of declarations, in addition to the proposed
declaration site placement.
Example syntax could be:
```fidl
protocol Important {
SomeMethod(...) -> (strict other.library.Message response);
}
```
Here, the `other.library.Message` may not have been defined `strict`, but
we want to use it all the while requiring strict validation.
This adds some design complexity for binding authors, since
`other.library.Message` may be needed both in strict mode and flexible
mode.
On the encoding/validation/decoding, exposing both strict and flexible
mode for the same message depending on context is not dissimilar to how
strings or vectors are handled.
They have the same layout, but can have different bounds depending on
where they are used.
It is also similar to how extensible unions can be used in nullable or
non-nullable contexts.
Generally, bindings have chosen a type schema, with some way to indicate
bounds, nullability, or as is being explored here, strictness mode.
The second issue with exposing both strict and flexible mode for the same
message, is that of dealing with assembly of messages, and querying of
messages in user code.
Consider for instance an enum with three members, `A`, `B`, and `C`.
In order to expose the flexible mode, we need a special enum member
"unknown".
As a result, it is now possible to assemble an enum which does not pass
strict validation, such that in the other context where this enum is
needed, in the strict context, things will fail during encoding.
Here again, the parallel with strings and vectors is important: without a
highly specialized API, bindings allow creating strings and vectors which
are too long, and then fail to be encoded.
The strategy to follow when faced with supporting both strict and flexible
mode is to generate all the extra pieces for flexible mode, and ensure
that where needed, strict validation is applied during encoding, decoding,
and validation.
## Ergonomics
This FTP improves ergonomics in a few ways:
* We better set users' expectations for FIDL behavior across languages.
* Strict messages enable users to avoid writing unnecessary code to
handle unknown fields.
## Documentation and Examples
* The grammar & language specifications need to be updated for strict
fields.
* The FIDL style guide should be updated to give guidance on when to
declare a message as strict.
## Backwards Compatibility
* This change does not affect ABI compatibility.
* If changes are needed to decoders or bindings to conform to this FTP,
those changes may cause source-level (API) breakage, which should be
addressed on a case-by-case basis.
## Performance
* Forcing decoders and bindings to conform to this FTP may impose a
(likely insignificant) performance penalty, by forcing them to process all
unknown fields and closing all handles.
* Bindings may need an additional level of indirection (and thus use
additional memory/binary size) to offer the "unknown" tag for exhaustive
tag checks.
## Security
This FTP increases security.
* We specify validation behavior for messages with unknown content.
* Strict messages enable a decoder to validate and discard unknown
content before clients inspect them, decreasing the possibility of bugs.
## Testing
See the [Implementation Strategy](#implementation-strategy) section (we
plan to use the FIDL compatibility test).
Additionally, each language binding should have its own tests to assert
correct behavior.
## Drawbacks, Alternatives, and Unknowns
This FTP largely clarifies behavior, and has an associated implementation
cost to ensure that language bindings conform to its recommendations.
#### Alternative: default to strict, or mixed mode
Strictness ought to be viewed in a similar light as size bounds on vectors
or strings; it is a constraint which is independent from a message's layout,
and can be changed without ABI breakage.
We want FIDL authors to make an explicit choice to restrict (constrain)
their messages.
Further, we do not want a mixed mode, where some messages (e.g., enums)
are strict by default, and others (e.g., tables) are not.
#### Alternative: [Strict] attribute instead of a new keyword
It's an important-enough idea to deserve its own keyword.
There's enough precedent for similar features in other languages that it
translates well to FIDL.
#### Alternative: other keywords
During the design phase, several different alternatives were proposed.
The likeliest contender was `final`: it denotes "final word on the
subject," has precedence in C++, Java, C# (among others).
However, because we may want to use the keyword "final" on protocols to
indicate that one cannot use it in composition (i.e., the traditional use
of "final"), we opted for another keyword to indicate strict validation.
This leaves the door open to introduce syntax such as:
```fidl
final strict protocol Important {
MyMethod(SomeTable arg);
};
```
Which would indicate that protocol `Important` cannot be composed AND that
all validation must be strict.
Other explored keywords were: `sealed`, `rigid`, `fixed`, `closed`,
`known`, and `standardized`.
#### Alternative: only strict
We could define all extensible messages to always be strict.
Currently, enums and bits are only strict, so this alternative would
extend that to tables and extensible unions.
Under such a scenario, changes to extensible structures (e.g., adding a
new field) would require readers to be updated _prior_ to writers being
updated.
This severely limits the use of these extensible data structures, and is
too constraining for higher level use cases.
Furthermore, if that were the design choice, we would not need to use
envelopes for tables and extensible unions (i.e., no need for number of
bytes nor the number of handles).
Indeed, under a strict only interpretation, unknown fields would be
rejected, and otherwise the schema would determine the number of bytes and
handles to be consumed in a fashion similar to the rest of the messages
FIDL processes.
#### Alternative: only flexible
We could define all extensible messages to always be flexible.
This would be very surprising for enums (and bits), and counter to
expectations.
This leads us to two bad sub-alternatives:
* Have an exception for enums (and bits) to make them strict — as
noted above, this is confusing and makes the language rules harder to
understand.
* Keep these messages flexible — which would be counter to
expectations, open the door to bugs (e.g., reading an invalid value), and
certainly cause lots of plain-vanilla validation code to be written by
hand vs being provided by bindings.
Continuing the exploration to other extensible messages (tables and
extensible unions), there is room and a need for strictness.
Consider, for instance, a secure logging protocol `LogEntry` defined
as a table.
Implementations of this protocol would likely want to guarantee that
clients do not send fields the server does not understand, for fear that
these clients may have expectations about how these new fields may control
the handling of the log entry.
As an example, a newer version may add a field "`pii ranges`" providing
ranges of the log entry which contain PII and must be logged specifically
(e.g., replaced by a unique ID, with the original data vaulted under that
unique ID).
To protect old servers from accepting such payload, and likely mishandling
those log entries, authors would choose the strict mode for their
`LogEntry`, thus protecting themselves from potential misuse down the line.
## Prior Art and References
Some of this rationale was guided by
[go/proto3-unknown-fields](http://go/proto3-unknown-fields), which
describes why proto3 dropped support for preserving unknown fields, then
later reversed the decision.
* FTP-037: Transactional Message Header v3 (not yet posted)
-------------------------------
##### Footnote1
Enums & bits are included in extensible messages, since new members can be
added or removed after the message is defined.
<!-- xrefs -->
[transitional]: ../attributes.md#transitional