docs/contribute/governance/rfcs/0131_fidl_wire_format_principles.md - fuchsia - Git at Google

 <!-- mdformat off(templates not supported) -->
 {% set rfcid = "RFC-0131" %}
 {% include "docs/contribute/governance/rfcs/_common/_rfc_header.md" %}
 # {{ rfc.name }}: {{ rfc.title }}
 <!-- SET the `rfcid` VAR ABOVE. DO NOT EDIT ANYTHING ELSE ABOVE THIS LINE. -->

 <!-- mdformat on -->

 ## Summary

 We describe the current (as of Sept 2021) design principles underpinning the
 [FIDL wire format][wire-format].

 ## Motivation

 The FIDL wire format specifies how messages are to be encoded (and decoded), as
 well as the format for transport level metadata such as the the transactional
 message header. Implicit in the wire format specification are theoretical
 boundaries which optimal implementation of it may attain. Just like data
 structures imply certain big-O bounds on operations, so does the wire format.

 In Fuchsia, interprocess communication (at least control plane) is ubiquitously
 over FIDL or intended to be. As a result, the wire format has a significant
 impact on the overall target performance of the operating system. Similarly, the
 wire format has an important role as part of the many layered defense of privacy
 and security.

 In March 2017, the design for "FIDL 2.0" was being completed. FIDL 2.0 is a more
 static version of FIDL, compared with later developments. See also [RFC-0027:
 You only pay for what you use][RFC-0027] for additional historical context.

 The goals for the wire format specification were as follows[^1]:

 [^1]: Authored by Jeff Brown <jeffbrown@google.com>.

 * Efficiently transfer messages between processes.
 * General purpose, for use with device drivers, high-level services, and
   applications.
 * Optimized for Zircon IPC only; portability is not a goal. (This goal was since
   relaxed.)[^2]
 * Optimized for direct memory access; inter-machine transport is not a goal.
 * Optimized for 64-bit only; no accommodation for 32-bit environments.
 * Uses uncompressed native datatypes _with host-endianness_, first-fit packing of
   elements, and correct alignment to support in-place access of message
   contents.
 * Compatible with C structure in-memory layout (with suitable field ordering and
   packing annotations).
 * Structures are fixed size and inlined; variable-sized data is stored
   out-of-line.
 * Structures are not self-described; FIDL files describe their contents.
 * No versioning of structures, but interfaces can be extended with new methods
   for protocol evolution. (This goal was since relaxed.)[^2]
 * No offset calculations required, very little arithmetic which may overflow.
 * Support fast single-pass encoding and validation (as a combined operation).
 * Support fast single-pass decoding and validation (as a combined operation).

 [^2]: Some goals have since been relaxed (portability, no versioning of
     structures), or tightened ([endianness][RFC-0030]).

 While the the ongoing evolution of the wire format has followed very specific
 design principles, some outlined above, these were not necessarily written down
 along with rationale. This RFC is an attempt at clearly writing these design
 principles down.

 ## Design

 We describe the various design principles underpinning the FIDL wire format.

 ### Low level first {#low-level-first}

 > When faced with making a design tradeoff to support low level programming at
 > the expense of high level programming (or the reverse), we typically opt for
 > enabling low level programming.

 FIDL must satisfy the requirements of low level protocols in Fuchsia, sometimes
 used during the boot process when a `malloc` is not yet available for instance.
 The alternative, should FIDL not satisfy these requirements, is to manually
 design protocols. However, in high level programming, if FIDL is not able to
 satisfy the requirements, there are a lot of other options to choose from
 (Protobuf, Cap'n Proto, JSON, Yaml, and the like).

 ### Single pass, and no heap allocation {#single-pass-no-heap}

 > It must be possible to encode and decode in a single pass, without allocation
 > beyond stack space (i.e. no dynamic heap allocation).

 This principle somewhat follows being over specialized towards low level use
 cases, and ensuring that any software on the system can fully participate in the
 FIDL ecosystem.

 Because FIDL provides "decode + validate", the single pass requirement should be
 compared to similar systems offering both deserialization and validation, which
 is most often done in two passes (with validation occurring on the decoded
 form).

 A corollary of the no allocation requirement is that encoding and decoding is
 done in-place, i.e. with in-place modifications.

 ### As efficient as hand-rolled data structures

 > It must be possible to write an implementation of the wire format which is as
 > efficient as hand-rolled data structures.

 This is a specialization of the "you pay for what you use" principle, whereby
 the convenience and ergonomics that FIDL aims to provide must not be offered at
 the expense of performance. In practice, many implementations choose to be less
 efficient to provide additional ergonomics, but the wire format does not dictate
 this choice.

 ### Canonical representation

 > There must be a single unambiguous representation of a FIDL value, i.e. there
 > is one and only one encoded representation of a FIDL value, and one and only
 > one decoded representation of a FIDL value.

 By forcing a single representation, the wire format is naturally more strict,
 which means that implementations have to expect less variance in inputs and
 follow a more straight-line path. This helps ensure correctness, through
 reduction of surprises coming from data divergences. A canonical form makes it
 possible to check for equality of two values without the need to understand the
 schema, i.e. a `memcmp` suffices for value types (things are a little [more
 complicated for resource
 types](/docs/reference/fidl/language/bindings-spec.md#equality-comparison)).

 See also the [drawbacks of a canonical form](#drawback-canonical-representation).

 ### Specify every byte

 > When encoding or decoding, it must be possible to traverse every single byte
 > of a message in a [single pass and without any heap
 > allocation](#single-pass-no-heap).

 To ensure that no data leaks from one process to another unbeknownst to the
 sender, we both ensure that all bytes can be efficiently traversed, and that all
 bytes have a specified value (e.g. padding must be 0). As an example, this can
 help to ensure that no personally identifiable information (PII) is
 inadvertently shared across process boundaries, or help avoid leaking
 uninitialized memory that could contain pointer values, which could be used to
 defeat address space layout randomization (ASLR). Another example is considering
 "trailing junk" invalid since all data and handles must be accounted for.

 ### Validation everywhere

 > As part of our [defense in depth](/docs/concepts/principles/secure.md), we
 > want the FIDL wire format to enforce strict validation (e.g. bound checks,
 > strings are well-formed UTF-8 code unit sequences, handles are of the correct
 > type and rights) everywhere it is used.

 Strict validation is considered worthwhile
 in ensuring the security of the platform, and helps API authors state
 assumptions and invariants of a design onto the API schema. It is also our
 experience that absent of validation in lower layers, applications tend to
 validate invariants themselves, leading to code that is less clear, tends to
 be less efficient, and more prone to bugs.

 Since strict validation can be the source of high performance costs, and that
 FIDL is geared towards being used in [low-level](#low-level-first) layers, a
 corollary is that such validation must be done efficiently, and designed to fit
 in a [single pass](#single-pass-no-heap).

 ### No reflective functionality out of the box {#avoid-reflection}

 > Without explicit opt-in, a peer must not be allowed to perform reflection on a
 > protocol, be it exposed methods, or exposed types.

 For instance, if a peer calls the wrong FIDL method, the connection is closed,
 preventing any information to be extracted about the peer. It might seem
 convenient to build such functionality, but that may compromise privacy and be
 difficult to undo (users would start building load-bearing functionality off
 of this feature).

 Similarly, structures lacking a self-descriptive format are in line with this
 principle, and meant to avoid disclosing more than necessary in an ecosystem
 where interacting peers ought to distrust each other. (There are also
 significant performance gains with avoiding a self-descriptive format, which
 aligns with the low level first approach.)

 As we have changed the FIDL wire format to allow evolution, e.g.
 [tables][RFC-0047], we have had to navigate carefully the balance between
 forbidding reflection, and adding just enough to allow handling without a
 schema.

 ## Implementation

 Keep calm, and follow the principles. As seen in [RFC-0017].

 ## Performance

 Most guiding principles of the FIDL wire format are aimed at performance, and
 over specialize towards low level use cases. Performance is a central concern.

 ## Ergonomics

 No change to ergonomics.

 ## Backwards Compatibility

 Some of the principles stated here are in conflict with the primary goal of FIDL
 which is providing a foundation for stable ABI, e.g. implementing backwards
 compatible protocols is challenging in the absence of reflexive features. Among
 other things, the design of the FIDL wire format strikes a balance between
 performance (often a result of rigidity) and evolvability concerns (often a
 result of flexibility). Balancing these is where the fun lies.

 ## Security considerations

 The role of FIDL in the multi layered approach to security on Fuchsia is
 explained in this RFC.

 ## Privacy considerations

 The role of FIDL in the multi layered approach to privacy on Fuchsia is
 explained in this RFC.

 ## Testing

 No change to testing.

 ## Documentation

 Amend as needed:

 * [FIDL Overview](/docs/concepts/fidl/overview.md)
 * [FIDL design principles](/docs/contribute/contributing-to-fidl/design-principles.md)
 * [FIDL wire format][wire-format]

 ## Drawbacks, alternatives, and unknowns

 As described in the text.

 ### Drawbacks of a canonical form {#drawback-canonical-representation}

 Requiring a canonicalized form can constrain the problem of finding a good
 representation for data, to the point of discarding otherwise interesting or
 pursuable forms.

 When working on [sparser tables][RFC-0116], canonicalization was one of the
 toughest constraints to satisfy, and directly conflicted with the need for the
 format to be performant. For instance, we could have explored writing members in
 the order provided by the users, without needing a second pass which reorders
 those members to satisfy canonicalization requirements.

 ## Prior art and references

 As described in the text.

 <!-- link labels -->
 [RFC-0017]: 0017_folding_ftp_into_rfc.md
 [RFC-0027]: 0027_you_only_pay_what_you_use.md
 [RFC-0030]: 0030_fidl_is_little_endian.md
 [RFC-0047]: 0047_tables.md
 [RFC-0116]: 0116_fidl_sparser_tables.md
 [wire-format]: /docs/reference/fidl/language/wire-format/README.md
	<!-- mdformat off(templates not supported) -->
	{% set rfcid = "RFC-0131" %}
	{% include "docs/contribute/governance/rfcs/_common/_rfc_header.md" %}
	# {{ rfc.name }}: {{ rfc.title }}
	<!-- SET the `rfcid` VAR ABOVE. DO NOT EDIT ANYTHING ELSE ABOVE THIS LINE. -->

	<!-- mdformat on -->

	## Summary

	We describe the current (as of Sept 2021) design principles underpinning the
	[FIDL wire format][wire-format].

	## Motivation

	The FIDL wire format specifies how messages are to be encoded (and decoded), as
	well as the format for transport level metadata such as the the transactional
	message header. Implicit in the wire format specification are theoretical
	boundaries which optimal implementation of it may attain. Just like data
	structures imply certain big-O bounds on operations, so does the wire format.

	In Fuchsia, interprocess communication (at least control plane) is ubiquitously
	over FIDL or intended to be. As a result, the wire format has a significant
	impact on the overall target performance of the operating system. Similarly, the
	wire format has an important role as part of the many layered defense of privacy
	and security.

	In March 2017, the design for "FIDL 2.0" was being completed. FIDL 2.0 is a more
	static version of FIDL, compared with later developments. See also [RFC-0027:
	You only pay for what you use][RFC-0027] for additional historical context.

	The goals for the wire format specification were as follows[^1]:

	[^1]: Authored by Jeff Brown <jeffbrown@google.com>.

	* Efficiently transfer messages between processes.
	* General purpose, for use with device drivers, high-level services, and
	applications.
	* Optimized for Zircon IPC only; portability is not a goal. (This goal was since
	relaxed.)[^2]
	* Optimized for direct memory access; inter-machine transport is not a goal.
	* Optimized for 64-bit only; no accommodation for 32-bit environments.
	* Uses uncompressed native datatypes _with host-endianness_, first-fit packing of
	elements, and correct alignment to support in-place access of message
	contents.
	* Compatible with C structure in-memory layout (with suitable field ordering and
	packing annotations).
	* Structures are fixed size and inlined; variable-sized data is stored
	out-of-line.
	* Structures are not self-described; FIDL files describe their contents.
	* No versioning of structures, but interfaces can be extended with new methods
	for protocol evolution. (This goal was since relaxed.)[^2]
	* No offset calculations required, very little arithmetic which may overflow.
	* Support fast single-pass encoding and validation (as a combined operation).
	* Support fast single-pass decoding and validation (as a combined operation).

	[^2]: Some goals have since been relaxed (portability, no versioning of
	structures), or tightened ([endianness][RFC-0030]).

	While the the ongoing evolution of the wire format has followed very specific
	design principles, some outlined above, these were not necessarily written down
	along with rationale. This RFC is an attempt at clearly writing these design
	principles down.

	## Design

	We describe the various design principles underpinning the FIDL wire format.

	### Low level first {#low-level-first}

	> When faced with making a design tradeoff to support low level programming at
	> the expense of high level programming (or the reverse), we typically opt for
	> enabling low level programming.

	FIDL must satisfy the requirements of low level protocols in Fuchsia, sometimes
	used during the boot process when a `malloc` is not yet available for instance.
	The alternative, should FIDL not satisfy these requirements, is to manually
	design protocols. However, in high level programming, if FIDL is not able to
	satisfy the requirements, there are a lot of other options to choose from
	(Protobuf, Cap'n Proto, JSON, Yaml, and the like).

	### Single pass, and no heap allocation {#single-pass-no-heap}

	> It must be possible to encode and decode in a single pass, without allocation
	> beyond stack space (i.e. no dynamic heap allocation).

	This principle somewhat follows being over specialized towards low level use
	cases, and ensuring that any software on the system can fully participate in the
	FIDL ecosystem.

	Because FIDL provides "decode + validate", the single pass requirement should be
	compared to similar systems offering both deserialization and validation, which
	is most often done in two passes (with validation occurring on the decoded
	form).

	A corollary of the no allocation requirement is that encoding and decoding is
	done in-place, i.e. with in-place modifications.

	### As efficient as hand-rolled data structures

	> It must be possible to write an implementation of the wire format which is as
	> efficient as hand-rolled data structures.

	This is a specialization of the "you pay for what you use" principle, whereby
	the convenience and ergonomics that FIDL aims to provide must not be offered at
	the expense of performance. In practice, many implementations choose to be less
	efficient to provide additional ergonomics, but the wire format does not dictate
	this choice.

	### Canonical representation

	> There must be a single unambiguous representation of a FIDL value, i.e. there
	> is one and only one encoded representation of a FIDL value, and one and only
	> one decoded representation of a FIDL value.

	By forcing a single representation, the wire format is naturally more strict,
	which means that implementations have to expect less variance in inputs and
	follow a more straight-line path. This helps ensure correctness, through
	reduction of surprises coming from data divergences. A canonical form makes it
	possible to check for equality of two values without the need to understand the
	schema, i.e. a `memcmp` suffices for value types (things are a little [more
	complicated for resource
	types](/docs/reference/fidl/language/bindings-spec.md#equality-comparison)).

	See also the [drawbacks of a canonical form](#drawback-canonical-representation).

	### Specify every byte

	> When encoding or decoding, it must be possible to traverse every single byte
	> of a message in a [single pass and without any heap
	> allocation](#single-pass-no-heap).

	To ensure that no data leaks from one process to another unbeknownst to the
	sender, we both ensure that all bytes can be efficiently traversed, and that all
	bytes have a specified value (e.g. padding must be 0). As an example, this can
	help to ensure that no personally identifiable information (PII) is
	inadvertently shared across process boundaries, or help avoid leaking
	uninitialized memory that could contain pointer values, which could be used to
	defeat address space layout randomization (ASLR). Another example is considering
	"trailing junk" invalid since all data and handles must be accounted for.

	### Validation everywhere

	> As part of our [defense in depth](/docs/concepts/principles/secure.md), we
	> want the FIDL wire format to enforce strict validation (e.g. bound checks,
	> strings are well-formed UTF-8 code unit sequences, handles are of the correct
	> type and rights) everywhere it is used.

	Strict validation is considered worthwhile
	in ensuring the security of the platform, and helps API authors state
	assumptions and invariants of a design onto the API schema. It is also our
	experience that absent of validation in lower layers, applications tend to
	validate invariants themselves, leading to code that is less clear, tends to
	be less efficient, and more prone to bugs.

	Since strict validation can be the source of high performance costs, and that
	FIDL is geared towards being used in [low-level](#low-level-first) layers, a
	corollary is that such validation must be done efficiently, and designed to fit
	in a [single pass](#single-pass-no-heap).

	### No reflective functionality out of the box {#avoid-reflection}

	> Without explicit opt-in, a peer must not be allowed to perform reflection on a
	> protocol, be it exposed methods, or exposed types.

	For instance, if a peer calls the wrong FIDL method, the connection is closed,
	preventing any information to be extracted about the peer. It might seem
	convenient to build such functionality, but that may compromise privacy and be
	difficult to undo (users would start building load-bearing functionality off
	of this feature).

	Similarly, structures lacking a self-descriptive format are in line with this
	principle, and meant to avoid disclosing more than necessary in an ecosystem
	where interacting peers ought to distrust each other. (There are also
	significant performance gains with avoiding a self-descriptive format, which
	aligns with the low level first approach.)

	As we have changed the FIDL wire format to allow evolution, e.g.
	[tables][RFC-0047], we have had to navigate carefully the balance between
	forbidding reflection, and adding just enough to allow handling without a
	schema.

	## Implementation

	Keep calm, and follow the principles. As seen in [RFC-0017].

	## Performance

	Most guiding principles of the FIDL wire format are aimed at performance, and
	over specialize towards low level use cases. Performance is a central concern.

	## Ergonomics

	No change to ergonomics.

	## Backwards Compatibility

	Some of the principles stated here are in conflict with the primary goal of FIDL
	which is providing a foundation for stable ABI, e.g. implementing backwards
	compatible protocols is challenging in the absence of reflexive features. Among
	other things, the design of the FIDL wire format strikes a balance between
	performance (often a result of rigidity) and evolvability concerns (often a
	result of flexibility). Balancing these is where the fun lies.

	## Security considerations

	The role of FIDL in the multi layered approach to security on Fuchsia is
	explained in this RFC.

	## Privacy considerations

	The role of FIDL in the multi layered approach to privacy on Fuchsia is
	explained in this RFC.

	## Testing

	No change to testing.

	## Documentation

	Amend as needed:

	* [FIDL Overview](/docs/concepts/fidl/overview.md)
	* [FIDL design principles](/docs/contribute/contributing-to-fidl/design-principles.md)
	* [FIDL wire format][wire-format]

	## Drawbacks, alternatives, and unknowns

	As described in the text.

	### Drawbacks of a canonical form {#drawback-canonical-representation}

	Requiring a canonicalized form can constrain the problem of finding a good
	representation for data, to the point of discarding otherwise interesting or
	pursuable forms.

	When working on [sparser tables][RFC-0116], canonicalization was one of the
	toughest constraints to satisfy, and directly conflicted with the need for the
	format to be performant. For instance, we could have explored writing members in
	the order provided by the users, without needing a second pass which reorders
	those members to satisfy canonicalization requirements.

	## Prior art and references

	As described in the text.

	<!-- link labels -->
	[RFC-0017]: 0017_folding_ftp_into_rfc.md
	[RFC-0027]: 0027_you_only_pay_what_you_use.md
	[RFC-0030]: 0030_fidl_is_little_endian.md
	[RFC-0047]: 0047_tables.md
	[RFC-0116]: 0116_fidl_sparser_tables.md
	[wire-format]: /docs/reference/fidl/language/wire-format/README.md