{% set rfcid = "RFC-0026" %}
{% include "docs/contribute/governance/rfcs/_common/_rfc_header.md" %}
Note: Formerly known as FTP-026.
Given the amount of feedback and comments on this RFC, we've decided to withdraw (i.e. self-reject) the proposal. That said, it still has some great ideas: we'll be taking those ideas and publishing them as separate RFCs with smaller scope, to enable clearer discussion and separate independent features into their own RFCs.
RFC-0032 was spun out of this RFC.
This RFC has two goals:
A side effect of both (1) and (2) is that optionality (nullability) can be efficiently implemented for all types, not just structs, handles, vectors, strings, tables and (extensible) unions[1].
Envelopes are the foundation for extensible, evolvable data structures (tables and extensible unions). Making envelopes more efficient enables those extensible structures to be used in more contexts where performance and wire size matter.
FIDL also has several pervasive types that are used for dynamically-sized data: vectors & strings. These types are required to be out-of-line since the size of the FIDL primary object is expected to be statically known. If envelopes can be used to represent all out-of-line data, we can simplify both the protocol and implementation, reducing implementation cost and room for error.
Additionally, FIDL would benefit from a holistic, consistent approach to optionality. This leads to better ergonomics, optionality for more types than the current mechanisms allow for, and a simplified user mental model. Envelopes fulfill these goals by enabling optionality for all types in a uniform manner.
Envelopes can refer to data that is either:
An out-of-line envelope is:
As a C struct:

    typedef struct {
        uint64_t size:48;  // Low bit will be 0
        uint16_t handle_count;
    } fidl_out_of_line_envelope_t;

The out-of-line envelope has the following changes vs the existing envelope format:
* The size is recursive: for example, the size for a `vector<string>` includes the size of the outer vector's inner string sub-objects.
* `size % 8 == 0`, which means that the three least significant bits of `size` are zero.
* `handle_count` is 16 bits, instead of 32 bits.
* `handle_count` includes the handle count for all recursive sub-objects.
* Presence is indicated by a non-zero value in either the `size` or `handle_count` field.
* Absence is indicated by the `size` & `handle_count` fields both being zero.

Decoders MAY overwrite the envelope with a pointer to the envelope data, assuming they know the static type (schema) of the envelope's contents. See the Decoder Callback section for recommendations on how to process an envelope if the content's type is unknown.
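The out-of-line layout above can be sketched as plain bit manipulation on a 64-bit word. This is only an illustration under stated assumptions: the helper names are hypothetical and not part of any FIDL binding, and the word is assumed to be read as a little-endian `uint64_t` with the size in the low 48 bits and the handle count in the high 16 bits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper names; only the bit layout comes from the RFC text.
constexpr uint64_t kEnvelopeSizeMask = (uint64_t{1} << 48) - 1;

// Packs size (low 48 bits) and handle count (high 16 bits) into one word.
inline uint64_t PackOutOfLineEnvelope(uint64_t size, uint16_t handle_count) {
  assert(size <= kEnvelopeSizeMask);  // size must fit in 48 bits
  assert(size % 8 == 0);              // keeps the low three bits zero
  return size | (static_cast<uint64_t>(handle_count) << 48);
}

// Extracts the 48-bit recursive size.
constexpr uint64_t EnvelopeSize(uint64_t word) {
  return word & kEnvelopeSizeMask;
}

// Extracts the 16-bit recursive handle count.
constexpr uint16_t EnvelopeHandleCount(uint64_t word) {
  return static_cast<uint16_t>(word >> 48);
}

// The zero envelope (size == 0 and handle_count == 0) means "absent".
constexpr bool IsZeroEnvelope(uint64_t word) { return word == 0; }
```

Because both fields being zero is the only absent encoding, a single comparison against zero is enough to test for presence.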
An out-of-line envelope explicitly has the size occupying the least significant bits, and the handle count occupying the most significant bits. As discussed in the Envelope section,
We call the lowest bit of the envelope the tag bit.
Since the tag bit is one for inline data, an inline envelope can never be mistaken for an actual pointer on architectures that require 64-bit alignment: such pointers are a multiple of 8, so their lowest three bits are zero. This lets a decoder distinguish inline envelopes from actual pointers, which matters because decoders typically overwrite out-of-line envelopes (but not inline envelopes) with a pointer to the envelope's content.
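A minimal sketch of how a receiver might use the tag bit follows. The enum and function names are hypothetical. It assumes the envelope has been loaded as a little-endian 64-bit word, with the 32-bit inline payload in the upper half as in the inline struct shown in this section.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical classification of a 64-bit envelope word.
enum class EnvelopeKind { kZero, kInline, kOutOfLineOrPointer };

// Classifies an envelope by its lowest (tag) bit.
inline EnvelopeKind ClassifyEnvelope(uint64_t word) {
  if (word == 0) return EnvelopeKind::kZero;  // absent
  return (word & 1) ? EnvelopeKind::kInline
                    : EnvelopeKind::kOutOfLineOrPointer;
}

// Extracts the 32-bit inline payload stored in the upper half of the word.
inline uint32_t InlinePayload(uint64_t word) {
  assert(ClassifyEnvelope(word) == EnvelopeKind::kInline);
  return static_cast<uint32_t>(word >> 32);
}
```

The same check works before and after decoding, since an 8-byte-aligned pointer and an out-of-line envelope both have a zero tag bit.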
Inline envelopes are encoded as:
As a C struct:

    typedef struct {
        uint8_t tag:1;  // == 1
        uint32_t reserved:31;
        union {
            _Bool bool;
            uint32_t uint32;
            int32_t int32;
            uint16_t uint16;
            int16_t int16;
            uint8_t uint8;
            int8_t int8;
            float float32;
            zx_handle_t handle;  // Only when decoded (see Handles for more details)
        };
    } fidl_inline_envelope_t;

`int8`, `uint8`, `int16`, `uint16`, `int32`, `uint32`, `float32`, `bool`, or a handle.

An encoder MUST:

`bool`, `(u)int8`, `(u)int16`, `(u)int32`, `float32` or handle. (Informally: if the type is fixed-size and <= 32 bits.)

There are three contexts for handle declaration:
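1. A non-optional handle in a non-extensible container: `struct S { handle h; };`
2. An optional handle in a non-extensible container: `struct S { handle? h; };`
3. A handle in an extensible container: `table T { handle h; }`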
For (1), a non-optional handle in a non-extensible container, we propose keeping the existing wire format, which is a `uint32`. There is no need for a non-optional handle in a non-extensible container to be an envelope, since envelopes are designed to carry optional or dynamically-sized data.
For (3), a handle in an extensible container: since envelopes are the foundation for extensible containers, an envelope must be used to encode the handle. To encode a handle, an encoder MUST encode it as an out-of-line envelope, with `size` set to 0, and `handle_count` set to 1.
This encoding instructs a decoder to look up the handle value in the out-of-line handle table. If a decoder wishes to decode in-place, the decoder SHOULD:
See the Examples section for an example encoded/decoded handle.
We choose this dual encoded/decoded form since it is compatible with both the out-of-line and inline envelope encodings. While this does result in specialized code for handles in envelopes, we believe that having more uniform, i.e. fewer, data encodings is a better trade-off than simpler code that requires more encodings.
For (2), an optional handle in a non-extensible container: we also propose using the same envelope representation as context (3) for the wire format, i.e. the dual out-of-line-encoded/inline-decoded form. Unfortunately, this representation of an optional handle is less compact than the existing optional handle wire format, which is a `uint32`. However, we still advocate using the envelope-based representation, since keeping the `uint32` wire format for optional handles would result in three encodings and three separate code paths for handles: non-optional, optional, and handle-in-envelope. Using the envelope representation for optionals eliminates one encoding and one code path, which increases uniformity and decreases specialized code.

The encoding for (2), optional handles in a non-extensible container, is explicitly listed in the Design Decisions section below, since the more compact `uint32` representation for an optional handle could be worth considering.
Non-nullable strings and vectors are currently stored on the wire as 16 bytes:

* a `uint64` for the number of elements (vector) or number of bytes (string),
* a `uint64` for the presence/absence/pointer.

We propose using an envelope to represent both strings and vectors, either nullable or non-nullable:
For vectors, note that the vector element count is not the same as the envelope's size:
* If the vector's elements have sub-objects (e.g. `vector<Table>`, `vector<vector<string>>`), the envelope's size includes the size of all recursive sub-objects.

Nullable strings/vectors, and strings/vectors inside extensible containers, are represented the same way as non-nullable strings and vectors: the zero envelope is used to indicate an absent string/vector.
Conversely, if a string/vector is non-nullable, a validator MUST error if it encounters a zero envelope.
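To make the difference between element count and envelope size concrete, here is a small sketch for the simplest case: a vector of fixed-size elements with no sub-objects. The function names are hypothetical. The layout assumed is the one used in the Examples section: an 8-byte out-of-line element count, followed by the element data padded to a multiple of 8 bytes.

```cpp
#include <cassert>
#include <cstdint>

// Rounds n up to the next multiple of 8.
constexpr uint64_t AlignTo8(uint64_t n) { return (n + 7) & ~uint64_t{7}; }

// Envelope size for a vector of fixed-size elements with no sub-objects:
// an 8-byte element count followed by the padded element data.
constexpr uint64_t FlatVectorEnvelopeSize(uint64_t element_count,
                                          uint64_t element_size) {
  return 8 + AlignTo8(element_count * element_size);
}

// A non-nullable string/vector must not be encoded as the zero envelope.
constexpr bool IsValidNonNullable(uint64_t envelope_word) {
  return envelope_word != 0;
}
```

Under this layout an empty but present vector still has an envelope size of 8 (the element count alone), so it stays distinguishable from the zero envelope that marks an absent vector.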
This may be a source-breaking change for code that uses the C bindings, which expect the memory layout of `fidl_vector_t` and `fidl_string_t` to exactly match the wire format. We can, however, implement a transitional plan before a wire format change (e.g. change the C API to use functions or macros) that enables this to be a soft transition.
Note that it's still possible to represent this new string/vector layout as a C struct via flexible array members (e.g. `struct { uint64 element_count; element_type data[]; };`).
Currently, structs, strings, vectors, handles, unions, tables and extensible unions can be optional (nullable).
Using envelopes everywhere enables all types to be optional:
Note that for small-sized types, inline data can store optional types as compactly as non-optional types, depending on the container's alignment requirements.
The encoded form of an envelope can be represented by a union of either an inline or out-of-line envelope. Similarly, a decoded envelope can either be inline, a pointer to the envelope data, or a callback-determined value (see the Decoder Callback section for details).

    typedef union {
        fidl_inline_envelope_t inline_envelope;   // Low bit is 1 (`inline` itself is a reserved keyword)
        fidl_out_of_line_envelope_t out_of_line;  // Low bit is 0
    } fidl_encoded_envelope_t;

    typedef union {
        fidl_inline_envelope_t inline_envelope;  // Low bit is 1
        void* data;                              // Low bit is 0
        uintptr_t callback_data;                 // Value determined by callback (see Decoder Callback)
    } fidl_decoded_envelope_t;

    static_assert(sizeof(fidl_encoded_envelope_t) == sizeof(void*), "encoded envelope must be pointer-sized");
    static_assert(sizeof(fidl_decoded_envelope_t) == sizeof(void*), "decoded envelope must be pointer-sized");

Receivers (validators & decoders) may not know the type of an envelope when it is used in an evolvable data structure, such as a table or extensible union. If a receiver doesn't know the type of an envelope:
Note that embedding the size in the out-of-line envelope enables rapid linear seeking through a FIDL message if many unknown types need to be skipped.
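The linear seeking described above could look roughly like the following. This is a sketch under assumptions: the cursor struct and function name are hypothetical, and the receiver is assumed to track one position in the out-of-line bytes and one in the handle table.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical cursor a receiver might keep while walking a message.
struct SkipCursor {
  size_t out_of_line_offset;  // next unread out-of-line byte
  size_t handle_index;        // next unread entry in the handle table
};

// Advances the cursor past an envelope whose type is unknown.
inline SkipCursor SkipUnknownEnvelope(SkipCursor cursor, uint64_t envelope_word) {
  if (envelope_word & 1) return cursor;  // inline: no out-of-line data or handles
  const uint64_t size = envelope_word & ((uint64_t{1} << 48) - 1);
  assert(size % 8 == 0);  // out-of-line sizes are multiples of 8
  cursor.out_of_line_offset += static_cast<size_t>(size);
  cursor.handle_index += static_cast<size_t>(envelope_word >> 48);
  return cursor;
}
```

Since the size and handle count are recursive, one addition per field skips the whole sub-tree without inspecting it. What a real receiver does with the skipped handles is outside this sketch.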
As mentioned in the Unknown Data section, an unknown envelope may be overwritten by a decoder: if this happens, the decoder will lose the size and handle count information. As an alternative, a decoder MAY have a callback attached to it that can process the envelope and override the default behavior. The callback API can look similar to the following function prototype:

    typedef uintptr_t (*unknown_envelope_callback_t)(
        const void* message,  // pointer to the envelope's containing message
        size_t offset,        // offset in the message where the unknown envelope is
        size_t size,          // the envelope's size
        size_t handle_count,  // the envelope's handle count
        const char* bytes,    // pointer to the envelope's data
        void* context         // a context pointer set via set_unknown_envelope_callback()
    );

    void set_unknown_envelope_callback(
        unknown_envelope_callback_t callback,  // a callback
        void* context                          // client-specific data storage
    );

The callback returns a `uintptr_t`, which the decoder can use to overwrite the unknown envelope. This enables the decoder to copy the size and handle count from the unknown envelope, and overwrite the envelope with a pointer to the decoder's own custom data structure.
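As an illustration of how a client might use such a callback, the sketch below records each unknown envelope's size and handle count in client-owned storage and returns a pointer to the record. Only the callback signature comes from the prototype above. The record and store types, the fixed capacity of 8, and the convention of returning 0 when the store is full are all assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Callback type, matching the prototype in this section.
typedef uintptr_t (*unknown_envelope_callback_t)(
    const void* message, size_t offset, size_t size, size_t handle_count,
    const char* bytes, void* context);

// Hypothetical record preserving what the decoder would otherwise overwrite.
struct UnknownEnvelopeRecord {
  size_t offset;
  size_t size;
  size_t handle_count;
  const char* bytes;
};

// Hypothetical fixed-capacity store, passed to the callback via `context`.
struct UnknownEnvelopeStore {
  UnknownEnvelopeRecord records[8];
  size_t used = 0;
};

// Saves the envelope's metadata and returns a pointer to the saved record.
inline uintptr_t RecordUnknownEnvelope(const void* message, size_t offset,
                                       size_t size, size_t handle_count,
                                       const char* bytes, void* context) {
  (void)message;  // unused in this sketch
  auto* store = static_cast<UnknownEnvelopeStore*>(context);
  if (store->used == 8) return 0;  // store full (assumed convention)
  UnknownEnvelopeRecord* record = &store->records[store->used++];
  *record = {offset, size, handle_count, bytes};
  return reinterpret_cast<uintptr_t>(record);
}
```

A client would register it with something like `set_unknown_envelope_callback(RecordUnknownEnvelope, &store)`. The returned pointer is aligned, so its low bit is zero and it cannot be confused with an inline envelope.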
This RFC requires that out-of-line envelopes have the correct (recursive) size for present out-of-line data. This requirement can impose an additional burden on an encoder: if the envelope's type is expected to be known by the receiver, the size field is unnecessary, since the decoder can compute the size[3]. Thus, the encoder is arguably performing additional work for no apparent benefit. This argument also applies to the handle count.
However, we still recommend that the size and handle count MUST be present, for several reasons:
If this decision is revisited, an unknown size could be encoded by using a sentinel value for the size (e.g. `UINT48_MAX`) or reserving one of the three LSBs in the size field to indicate that the size is unknown, in which case the decoder must traverse the out-of-line payload and calculate the size itself. This change would not affect the wire format, since the structure of the fields remains the same. It can also be landed as a soft transition since decoders can implement the logic first, before encoders are updated.

Overall, the RFC authors believe that requiring an encoding for an unknown size is possibly premature optimization, and advocate starting with a simple, more consistent, uniform design. If we feel that this decision should be revisited in the future (e.g. a zero-copy vectored I/O encoder becomes available so encoders don't have to patch up envelopes to write the correct size), there is a clear path to implementing it as a soft transition.
An optional `uint` stored inline:
uint32? u = 0xdeadbeef; // an optional uint: stored inline.
C++ representation:

    vector<uint8_t> object{
        0x01, 0x00, 0x00, 0x00,  // inline tag
        0xEF, 0xBE, 0xAD, 0xDE,  // inline data
    };

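The bytes above can be reproduced with a few lines of code. The function name is hypothetical. The sketch assumes little-endian byte order, with the tag bit in the lowest bit of the first byte and the value in the last four bytes.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Serializes an inline envelope holding a uint32 as 8 little-endian bytes.
inline std::array<uint8_t, 8> EncodeInlineUint32(uint32_t value) {
  // Low 32 bits: tag bit (1) plus 31 reserved zero bits. High 32 bits: value.
  const uint64_t word = (static_cast<uint64_t>(value) << 32) | 1;
  std::array<uint8_t, 8> bytes{};
  for (int i = 0; i < 8; ++i) {
    bytes[i] = static_cast<uint8_t>(word >> (8 * i));
  }
  return bytes;
}
```

Calling `EncodeInlineUint32(0xdeadbeef)` yields the same eight bytes as the `object` vector in this example.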
An optional `vector<uint16>` stored out-of-line:
vector<uint16>? v = { 10, 11, 12, 13, 14 }; // an optional vector<uint16>; stored out-of-line.
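The out-of-line size is 24:

* 8 bytes for the element count,
* 10 bytes for the vector data (5 elements * `sizeof(uint16_t)`),
* 6 bytes of padding.

C++ representation: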

    vector<uint8_t> object{
        0x18, 0x00, 0x00, 0x00, 0x00, 0x00,  // envelope size (24)
        0x00, 0x00,                          // handle count
    };

    vector<uint8_t> sub_objects{
        // element count
        0x05, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
        // vector data
        0x0A, 0x00, 0x0B, 0x00, 0x0C, 0x00, 0x0D, 0x00, 0x0E, 0x00,
        // padding
        0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    };

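For readers who want to check the byte layout, the following sketch builds the same two byte sequences programmatically. The helper and struct names are hypothetical. It assumes little-endian encoding and the layout from this example (element count, then data, then zero padding to a multiple of 8).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Appends the low `num_bytes` bytes of `v` in little-endian order.
inline void AppendLE(std::vector<uint8_t>& out, uint64_t v, int num_bytes) {
  for (int i = 0; i < num_bytes; ++i) {
    out.push_back(static_cast<uint8_t>(v >> (8 * i)));
  }
}

// Hypothetical container for the two byte sequences in this example.
struct EncodedVector {
  std::vector<uint8_t> object;       // the 8-byte envelope
  std::vector<uint8_t> sub_objects;  // element count, data, padding
};

inline EncodedVector EncodeUint16Vector(const std::vector<uint16_t>& elements) {
  EncodedVector encoded;
  AppendLE(encoded.sub_objects, elements.size(), 8);  // element count
  for (uint16_t e : elements) AppendLE(encoded.sub_objects, e, 2);
  while (encoded.sub_objects.size() % 8 != 0) {
    encoded.sub_objects.push_back(0);  // padding
  }
  AppendLE(encoded.object, encoded.sub_objects.size(), 6);  // 48-bit size
  AppendLE(encoded.object, 0, 2);                           // handle count
  return encoded;
}
```

For the five elements `{10, 11, 12, 13, 14}` this produces 24 out-of-line bytes and an envelope whose first byte is `0x18`.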
A `table` with three fields:
table T { 1: int8 i; 2: reserved; 3: int64 j; } = { .i: 241, .j: 71279031231 };
C++ representation:

    // a table is a vector<envelope>, which is represented with an
    // out-of-line envelope
    vector<uint8_t> object{
        0x28, 0x00, 0x00, 0x00, 0x00, 0x00,  // envelope size (40)
        0x00, 0x00,                          // handle count
    };

    vector<uint8_t> sub_objects{
        // vector element count (max table ordinal)
        0x03, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
        // vector[0], 1: int8, stored inline
        0x01, 0x00, 0x00, 0x00,  // inline tag
        0xF1, 0x00, 0x00, 0x00,  // 241
        // vector[1], 2: reserved
        0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // zero envelope
        // vector[2], 3: int64, stored out-of-line
        0x08, 0x00, 0x00, 0x00, 0x00, 0x00,  // envelope size
        0x00, 0x00,                          // handle count
        // vector[2] content
        0xBF, 0xB3, 0x8F, 0x98, 0x10, 0x00, 0x00, 0x00,  // 71279031231
    };

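The table bytes can be walked mechanically, which also shows where the envelope size of 40 comes from. This is only a sketch with hypothetical names. It assumes the `sub_objects` layout from this example: an 8-byte element count, then one 8-byte envelope per ordinal, then the out-of-line content.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reads a little-endian 64-bit word at `offset`.
inline uint64_t ReadLE64(const std::vector<uint8_t>& bytes, size_t offset) {
  assert(offset + 8 <= bytes.size());
  uint64_t word = 0;
  for (int i = 0; i < 8; ++i) {
    word |= static_cast<uint64_t>(bytes[offset + i]) << (8 * i);
  }
  return word;
}

// Hypothetical summary of a table's field envelopes.
struct TableSummary {
  uint64_t inline_fields = 0;
  uint64_t absent_fields = 0;
  uint64_t out_of_line_fields = 0;
  uint64_t out_of_line_bytes = 0;
};

// Classifies each field envelope of a table by its tag bit and size.
inline TableSummary SummarizeTable(const std::vector<uint8_t>& sub_objects) {
  TableSummary summary;
  const uint64_t count = ReadLE64(sub_objects, 0);
  for (uint64_t i = 0; i < count; ++i) {
    const uint64_t word = ReadLE64(sub_objects, 8 + 8 * static_cast<size_t>(i));
    if (word == 0) {
      ++summary.absent_fields;  // zero envelope
    } else if (word & 1) {
      ++summary.inline_fields;  // tag bit set
    } else {
      ++summary.out_of_line_fields;
      summary.out_of_line_bytes += word & ((uint64_t{1} << 48) - 1);
    }
  }
  return summary;
}
```

For the bytes in this example the summary is one inline field, one absent field and one out-of-line field of 8 bytes, so the total is 8 (count) + 3 * 8 (envelopes) + 8 (content) = 40.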
A handle:
handle h; // decoded to 0xCAFEF00D
C++ representation:

    vector<uint8_t> encoded_form{
        0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // envelope size
        0x01, 0x00,                          // handle count
    };

    vector<uint8_t> decoded_form{
        0x01, 0x00, 0x00, 0x00,  // inline tag
        0x0D, 0xF0, 0xFE, 0xCA,  // inline data
    };

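The step from the encoded form to the decoded form might be sketched as follows. The function name, the `handle_table` parameter and the `next_handle` index are hypothetical. Handles are modelled as plain `uint32_t` values, and the handle table is assumed to hold the handles that arrived alongside the message, in order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Rewrites an encoded handle envelope (size == 0, handle_count == 1) into
// the decoded inline form carrying the actual handle value.
inline uint64_t DecodeHandleEnvelope(uint64_t encoded_word,
                                     const std::vector<uint32_t>& handle_table,
                                     size_t* next_handle) {
  assert(encoded_word == (uint64_t{1} << 48));  // size 0, handle count 1
  assert(*next_handle < handle_table.size());
  const uint32_t handle = handle_table[(*next_handle)++];
  // Inline envelope: tag bit set, handle value in the upper 32 bits.
  return (static_cast<uint64_t>(handle) << 32) | 1;
}
```

With a handle table containing `0xCAFEF00D`, the encoded word `0x0001000000000000` becomes `0xCAFEF00D00000001`, which is the `decoded_form` above read as a little-endian 64-bit value.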
This RFC is a breaking wire format change. Both FIDL peers need to understand the new wire format, and communicate that understanding to each other, for both parties to use the new format.
A soft transition is possible. Two approaches are:
1. Use the `uint32` reserved/flags field in the transactional message header. We can reserve 1 bit for the initiating peer to indicate that it understands the new wire format, and soft transition in stages.
2. Use a `[WireFormat=EnvelopeV2]` attribute (or similar) that indicates that the message/interface should use the new wire format.
    * While an interface-level `[WireFormat]` attribute seems to align better with a wire format change, it should be easier to implement a WireFormat change on a struct, since the struct could be used in different interfaces, and bindings would need extra logic to determine the context in which the struct is used.
    * An interface's `[WireFormat]` attribute should affect the wire format of the interface's method arguments only, without recursively affecting the arguments' structs.
    * Once all structs and interfaces have the `[WireFormat]` attribute, we can drop the old wire format, assume all structs & interfaces use the new wire format, and ignore the attribute.

Both these soft transition approaches involve a lot of development time, testing time, and room for error. Implementing the code to do either approach correctly, executing on the plan, and following up successfully to remove old code is a large effort.
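As a rough illustration of the header-flag approach, the sketch below shows how a responder might choose a reply format from the initiator's flags. Everything here is an assumption made for illustration: the RFC does not assign a bit position, and the constant, enum and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bit in the header's uint32 flags field; not assigned by the RFC.
constexpr uint32_t kFlagUnderstandsEnvelopeV2 = uint32_t{1} << 0;

enum class WireFormat { kOld, kEnvelopeV2 };

// Chooses the reply format: the new format is used only when the initiator
// advertised support for it and the responder supports it too.
constexpr WireFormat ReplyFormat(uint32_t initiator_flags,
                                 bool responder_supports_v2) {
  return (responder_supports_v2 &&
          (initiator_flags & kFlagUnderstandsEnvelopeV2) != 0)
             ? WireFormat::kEnvelopeV2
             : WireFormat::kOld;
}
```

Falling back to the old format whenever either side lacks support is what would let peers be upgraded independently.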
It is likely that we will have code to handle both the old & new wire format at the same time; otherwise, it would not be possible to progressively land CLs as we implement support for the new wire format. Given that the code to handle both wire formats will exist, we recommend prototyping whether a soft transition is feasible using either approach. If not, c'est la vie; hard transition it is.
For either a soft or hard transition, any instances in Fuchsia where FIDL messages are hand-rolled would need to also be upgraded to the new wire format.
We should also use this wire format change to fold in other changes that need to happen (e.g. a proposed ordinal size change).
Note that this is an easier transition than FIDL1 to FIDL2, which changed language bindings significantly. We do not propose calling this FIDL3 since there are no user-visible changes[4].
The proposed wire format change is API (source) compatible, with one exception: C bindings will be a breaking API change if we move the vector/string element count to be out-of-line. We can mitigate this by planning ahead and abstracting the current C bindings with macros or functions, before the new wire format lands.
The wire format change is ABI-incompatible, but it is possible to achieve ABI compatibility with existing code via the strategies outlined in the Implementation Strategy section.
This RFC significantly shrinks the size required for envelopes, which seems like it would be an overall significant net benefit. However, the overall performance implications are less clear. In favor of better performance:
However:
While this RFC makes recommendations, we are actively seeking input and consensus on the following decisions:
The authors took a lot of inspiration from existing uses of tagged pointers, which have a long history in dynamic and functional languages. In particular, the Objective-C 64-bit runtime makes heavy use of them for better performance (even going so far as using specialized 5/6-bit encodings for inline strings).
Since current 64-bit platforms tend to use 48 bits (or fewer) to encode a pointer, we considered stealing more bits from the decoded pointer with bit-shifting to attempt to encode an out-of-line object's size in the pointer as well. However, some architectures are already expanding their address space past 48 bits (ARM64, x86-64 5-level paging), so stealing more pointer bits may not be very future-proof.
[1] Envelopes enable optionality for all types; however, exposing this optionality to end-users can (and perhaps should) be done separately.
[2] As of 1/28/19, there appear to be 37 uses of optional handles in the Fuchsia code base. This is a conservative number, as it does not count optional protocol handles, nor protocol request handles.
[3] This only applies to envelopes in non-extensible containers, i.e. structs and static unions. Extensible containers must encode the recursive size since decoders may not know the type, and need to know how much data to ignore.
[4] Except allowing optionality on more types, if we wish to do that simultaneously.