| {% set rfcid = "RFC-0034" %} |
| {% include "docs/contribute/governance/rfcs/_common/_rfc_header.md" %} |
| # {{ rfc.name }}: {{ rfc.title }} |
| <!-- SET the `rfcid` VAR ABOVE. DO NOT EDIT ANYTHING ELSE ABOVE THIS LINE. --> |
| |
| Note: Formerly known as [FTP](../deprecated-ftp-process.md)-034. |
| |
| ## Rejection rationale |
| |
| ## Summary |
| |
| We propose: |
| |
| 1. Strings in FIDL will be null terminated (all the while retaining the size of |
| the string as is the case today); |
| |
| 1. Change the C bindings to use `uint8_t*` for string data. |
| |
| |
| ## Motivation |
| |
| The current FIDL string encoding makes it easy to accidentally write insecure |
| code. The C and low-level C++ bindings are used in some of the most privileged |
| code in the Fuchsia system so extra weight should be given to risks exposed at |
| those layers. |
| |
| We have accidentally come across [such |
| bugs](https://fuchsia-review.googlesource.com/c/fuchsia/+/251775), but a |
| systematic audit of code in the Fuchsia tree is difficult and won't prevent them |
| reoccurring in our tree or in third-party code. |
| |
| ## Design |
| |
| This proposes to change the encoding of FIDL strings to include a single |
| terminator byte with value zero. This does not make FIDL strings compatible with |
| C strings but is a FIDL string is used as a C string it will be interpreted as |
| shorter than intended rather than longer than intended, which is safer. |
| |
| ### Wire Format |
| |
| The new |
| [definition](/docs/reference/fidl/language/wire-format/README.md#Strings) of |
| strings for wire encoding would be (changes highlighted): |
| |
| |
| * Variable-length sequence of UTF-8 encoded characters representing text. |
| * Nullable; null strings and empty strings are distinct. |
| * Can specify a maximum size, e.g. `string:40` for a maximum 40 byte string |
| (not including null-terminator). |
| * String content has a null-terminator. |
| * Encoders and decoders MUST validate that there is a null byte after the last |
| byte of the string, as indicated by the length. |
| * Stored as a 16 byte record consisting of: |
| * `size`: 64-bit unsigned number of code units (bytes) excluding |
| null-terminator |
| * `data`: 64-bit presence indication or pointer to out-of-line string data |
| * When encoded for transfer, `data` indicates presence of content: |
| * `0`: string is null |
| * `UINTPTR_MAX`: string is non-null, data is the next out-of-line object |
| * When decoded for consumption, `data` is a pointer to content: |
| * `0`: string is null |
| * `<valid pointer>`: string is non-null, `data` is at indicated memory |
| address |
| |
| Strings are denoted as follows: |
| |
| * `string`: non-nullable string (validation error occurs if null data is |
| encountered) |
| * `string?`: nullable string |
| * `string:N`, `string:N?`: string with maximum length of **N** code units |
| |
| This will constitute a breaking wire format change, specifically a string whose |
| length is divisible by 8 will be 8 bytes longer (it'll align to 8 bytes, and be |
| null terminated, causing another 7 bytes to be added as padding to align to 8 |
| again). |
| |
| Encoders will need to be updated to add null termination to encoded strings and |
| to validate that there are no null characters within string content. Decoders |
| will need to be updated to check that there are no null characters within string |
| content but that there is a null terminator where the length indicates. |
| |
| ### C Bindings |
| |
| Currently the C bindings represent strings as a `char*` and a `size_t`. If that |
| `char*` is passed to a function that expects a C string it may be interpreted |
| incorrectly. The bindings will be changed to use `uint8_t*` instead so that |
| passing a string data pointer to `strchr()` or `printf("%s")` will fail to |
| compile. |
| |
| ## Implementation Strategy |
| |
| Note: this section lists paths to source code that have changed since this |
| document was written. The references were correct at the time of writing. |
| |
| This is a breaking wire-format change. Its deployment will need to be carefully |
| coordinated across all uses of FIDL. |
| |
| Behind some build-time flag(s) the following code will need to be updated: |
| |
| * `//zircon/system/ulib/fidl/walker.h` (to validate strings correctly) |
| * `//zircon/system/host/fidl/lib/flat_ast.cpp` (update `StringType::Shape`) |
| * `//zircon/system/host/fidl/lib/c_generator.cpp` (update |
| `EmitLinerarizeMessage`, `ProduceInterfaceClientImplementation`, etc) |
| * `//sdk/lib/fidl/cpp/string.cc` |
| * `//garnet/public/lib/fidl/rust/fidl/src/encoding.rs` |
| * `//third_party/go/src/syscall/zx/fidl/encoding.go` (update `marshalString`, |
| `unmarshalString`) |
| * `//third_party/go/src/syscall/zx/fidl/encoding_new.go` (update `mString`) |
| * `//sdk/dart/fidl/lib/src/types.dart` (update `StringType`) |
| * llcpp bindings |
| * out-of-tree bindings for other languages |
| |
| These should be tested with the `fidl_compatibility_test`, running tests and |
| confirming expected system stability. |
| |
| Actually landing the change will need to be coordinated with the release team |
| and external teams like Chromium. |
| |
| ## Ergonomics |
| |
| This makes the C and low-level C++ bindings much easier to use correctly and has |
| no impact on other bindings. |
| |
| ## Documentation and Examples |
| |
| The wire format documentation should be updated (as outlined above). |
| |
| ## Backwards Compatibility |
| |
| This change is API compatible but ABI incompatible. |
| |
| ## Performance |
| |
| This will have no impact on build performance but will have the following minor |
| performance impacts: |
| |
| * Each string will take on average one extra byte to transmit |
| |
| ## Security |
| |
| This change is specifically intended to fix potential security holes for |
| memory-unsafe languages using FIDL. |
| |
| ## Testing |
| |
| This will be tested using the compatibility test suite. That suite should be |
| extended to ensure that edge cases like 7, 8 and 9 byte long strings are handled |
| correctly. |
| |
| ## Drawbacks, Alternatives, and Unknowns |
| |
| This will increase the size of FIDL messages by on average one byte per string. |
| For strings whose length is not divisible by 8 there will be no change. For |
| strings whose length is divisible by 8 the length will be increased by 8 bytes. |
| |
| ### Alternatives |
| |
| #### Do Nothing |
| |
| We could keep everything as it is and rely on code review and documentation to |
| ensure people don't write wrong code. The issue will also be somewhat mitigated |
| by the adoption of the new low-level-C++ bindings. The potential dangers of the |
| class of bugs caused by this subtle issue too serious to just leave things as |
| they are. |
| |
| #### Null terminate but ban embedded null bytes / use [modified UTF-8](https://en.wikipedia.org/wiki/UTF-8#Modified_UTF-8) |
| |
| (This was the original proposal) |
| |
| `'\0'` is a valid UTF-8 character and exists in UTF-8 strings that FIDL users |
| want to exchange. Banning null bytes would make FIDL inappropriate for many uses |
| and using modified UTF-8 would impose additional overhead to convert normal |
| UTF-8 with possibly null bytes to modified UTF-8. |
| |
| |
| #### Just use `uint8_t` instead of `char` |
| |
| If FIDL string data pointers were `uint8_t` instead of `char` then they couldn't |
| be passed to standard string functions or `printf` without casting. This would |
| help reveal this class of bug but wouldn't prevent them. |
| |
| |
| #### Fill padding with valid ASCII in debug builds |
| |
| Seven out of eight strings will be followed by zero padding bytes. In debug |
| builds we could change those zeros to valid ASCII characters so that when |
| printed or parsed these bugs show up. It'll be very easy for these to fall |
| through the cracks. |
| |
| |
| #### Always move strings out of line |
| |
| The C/C++ bindings could allocate space for strings and null terminators and |
| copy and terminate all strings that are received. This would impose an |
| unacceptable performance cost on the C and low-level C++ bindings. |
| |
| |
| #### Move strings out of line for ASan builds |
| |
| The address sanitizer will check that code doesn't overrun heap allocations. For |
| ASan builds we could allocate space for strings on the heap, copy the bytes |
| there and return that pointer to the caller. This is non-trivial to integrate |
| into the C bindings, could mean bug-hiding significant differences in behavior |
| between ASan and release builds and wouldn't help developers working outside the |
| Fuchsia build system. |
| |
| |
| #### Stop using C and C++ |
| |
| If Google can't write safe C++ in 2019, C++ is not safe for use. Unfortunately C |
| and C++ are the languages of choice for many device driver authors and many |
| vendors may choose to port their existing C/C++ drivers to our platform. |
| |
| |
| #### Don't expose raw characters to C |
| |
| We could expose an opaque structure to C and require any string access in C to |
| copy strings out of the decoded message buffer. |
| |
| |
| ## Prior Art and References |
| |
| [DBus](https://dbus.freedesktop.org/doc/dbus-specification.html#idm636), [Capt'n |
| Proto](https://capnproto.org/encoding.html#blobs) and CORBA[^1] send length and |
| null terminate, protobuf does not null terminate. |
| |
| <!-- footnotes --> |
| |
| [^1]: [CORBA Specification 3.3, |
| part 2](https://www.omg.org/spec/CORBA/3.3/Interoperability/PDF), |
| section 9.3.2.7. |