Interface Ordinal Hashing
“60% of the time, it's the answer to an interview question”
Field | Value |
---|---|
Status | Accepted |
Authors | {apang,ianloic}@google.com |
Submitted | 2018-10-26 |
Reviewed | 2018-11-29 |
We propose removing the programmer's ability to manually specify the ordinal for interface methods [1]. Instead, the compiler generates the ordinal based on a hash of the fully-qualified method name, i.e. the library name, interface name & method name. Method renames will be ABI-compatible via a new Selector
attribute (see below).
We specifically restrict this FTP to propose ordinal hashing for interfaces only; not enums, tables nor extensible unions. We believe the use-cases for those structures are different enough that they need further investigation and a different FTP.
Currently, a FIDL author would write:
library foo; interface Science { 1: Hypothesize(); 2: Investigate(); 3: Explode(); 4: Reproduce(); };
This FTP would enable the ordinal indexes to be dropped:
interface Science { Hypothesize(); // look, no ordinals! Investigate(); Explode(); Reproduce(); };
Under-the-hood, the compiler effectively generates ordinals that look like this:
interface Science { // ordinal = SHA-256 of the fully-qualified method name, // i.e. "foo.Science/MethodName", truncated to 32 bits 0xf0b6ede8: Hypothesize(); 0x1c50e6df: Investigate(); 0xff408f25: Explode(); 0x0c2a400e: Reproduce(); };
OrdinalRange
attribute so that interface inheritance could be more predictable; it was rejected.FragileBase
[2] is the current stop-gap solution, but doesn‘t solve the core problem of ensuring that ordinals don’t clash.The hashed ordinal is derived by a SHA-256 hash of:
library name (encoded as UTF-8; no trailing \0) ".", ASCII 0x2e interface name (encoded as UTF-8; no trailing \0) "/", ASCII 0x2f method name (encoded as UTF-8; no trailing \0)
For example, the following FIDL declaration:
library foo; interface Science { Hypothesize(); Investigate(); Explode(); Reproduce(); };
will have the following byte patterns used to calculate the ordinal hash:
foo.Science/Hypothesize foo.Science/Investigate foo.Science/Explode foo.Science/Reproduce
The .
and /
separators are used since fidlc
already outputs fully-qualified method names in this format (c.f. fidlc
's NameName() method).
Once the SHA-256 hash is computed:
echo -n foo.Science.Hypothesize | shasum -a 256 | head -c8
)In pseudo-code:
full_hash = sha256(library_name + "." + interface_name + "/" + method_name) ordinal = full_hash[0] | full_hash[1] << 8 | full_hash[2] << 16 | full_hash[3] << 24; ordinal &= 0x7fffffff;
We define a Selector
attribute that will be used by the compiler to compute the hashed ordinal instead of using the method name. If a method name does not have the Selector
attribute, the method name will be used as the Selector
. (The interface and library names are still used in the hash computation.)
Selector
can be used to rename a method without breaking ABI compatibility, which was one advantage of manually-specified ordinals. For example, if we wish to rename the Investigate
method to Experiment
in the Science
interface, we can write:
interface Science { [Selector="Investigate"] Experiment(); };
We allow the Selector
attribute on methods only. Renaming libraries is considered rare, and preserving ABI-compatibility in this situation is not a high priority. Similarly for renaming an interface. Additionally, the interaction of a renamed interface with a Discoverable
attribute would be confusing: which name is the discoverable one?
If a hashed ordinal results in a clash or conflict with another hashed ordinal in the same interface, the compiler will emit an error, and rely on a human to specify a Selector
attribute to resolve the conflict [3].
For example, if the method name Hypothesize
conflicts with the method name Investigate
, we could add Selector
to Hypothesize
to avoid the conflict:
interface Science { [Selector="Hypothesize_"] Hypothesize(); Investigate(); // should no longer conflict with Hypothesize_ };
We will update the FIDL readability rubric to recommend appending “_” to the method name for the Selector
to resolve clashes. fidlc
will also suggest this fix.
Note that ordinals are only required to be unique per-interface, similarly to manually-specified ordinals. If we wish ordinals to be unique across all interfaces, that should be proposed in another FTP.
Back-of-the-envelope calculations show that, with 31 bits and 100 methods on an interface, the chance of collision is .0003%, so we expect hash collisions to be extremely rare.
There were other suggestions for Selector
:
WireName
(abarth
)OriginalName
(ctiller
)Salt
(abarth
; slightly different since it suggested adding a compiler-specified salt instead of an alternate name)OrdinalName
We chose Selector
since we believe it more closely reflects the intent of the attribute than either WireName or OriginalName.
We chose to have the programmer specify the ordinal name, rather than the ordinal index, for several reasons:
Zero is an invalid ordinal. If a method name hashes to zero, the compiler will treat it as a hash conflict and require the user to specify a Selector
that does not hash to zero.
We considered having fidlc
automatically re-hash the name by deterministically transforming it, but felt that:
and that therefore this approach didn't warrant complicating both the ergonomics and the compiler implementation.
This FTP also covers events, which are considered a subset of methods by the FIDL language docs [4].
We believe that only fidlc
needs to be modified to support ordinal hashing; code generation back-ends do not need to be modified. This is because fidlc
computes the ordinals and emits them in the JSON IR to back-ends.
Bindings do not need to change.
We intend to implement this in distinct phases:
Selector
attribute.The above is a soft transition; changing fidlc
to use the hashed ordinals (step 4b) should not break the rollers, since rollers build from a single version of the entire tree.
In jeremymanson@google.com's implementation of this FTP, he chose to prefer a manually-specified ordinal over a hashed ordinal, which diverges from step 4b above. This keeps all existing interfaces that use manually-specified ordinals ABI-compatible, and only uses hashed ordinals when ordinals aren't specified.
Advantages:
Disadvantages:
Selector
, that serves two purposes: renaming and conflict resolution.Note that the authors originated this FTP largely for ergonomics reasons.
We expect to make changes to the FIDL attributes, grammar, language and wire format docs. The API readability rubric doc should also be updated, as noted in the Selector
section.
fidlc
change is binary (either hashed xor manual ordinals will be used), andfidlc
is used to build the entire tree, soWe expect a negligible slowdown to fidlc
, as it now has to hash all method names in an interface to compute them.
We expect an insignificant runtime performance impact. Compilers may have generated jump tables for manually-specified ordinals that were previously small and contiguous, which will become binary searches through a sparse ordinal space when hashed ordinals are used. The same mechanism may also impact binary size in an insignificant fashion. (Table-driven dispatch will likely ameliorate both the size and speed concerns.)
We do not expect runtime security issues, since ordinal hashing has no runtime changes except changing the ordinal values sent over-the-wire.
The use of a cryptographic hash (SHA-256) may lead some to believe the hash needs to be cryptographically strong; we do not believe there are security issues since:
Truncation of the SHA-256 hash may also concern some, but again, we do not believe there are security issues since the FIDL compiler statically checks for hash collisions [5].
ianloic@google.com has analyzed existing FIDL interfaces and determined that there are zero hash collisions.
We'll carefully consider how to test the case of an actual hash collision, since artificially generating hash collisions with a good hash is difficult (by design).
Otherwise, the typical battery of unit tests, CQ tests, compatibility tests and manual testing should suffice to ensure that ordinal hashing is robust.
This FTP intentionally only addresses ordinal hashing for interfaces. It does not propose changes to the manually-enumerated ordinals for enums, tables nor extensible unions.
Perfect hashing was suggested by jeffbrown@google.com, and was considered. The FTP authors are not very familiar with perfect hashing schemes, but believe that the addition of extra methods over time would change the hashes of existing methods and therefore break ABI compatibility, making perfect hashing unsuitable. Dynamic perfect hashing may be possible, but raises the same question of changing hashes, and is also less well-known and more complicated than standard hashing, which doesn't warrant further investigation.
Another approach to removing manual ordinals is to send the full method name across the wire, which is done in many (most?) other RPC systems (see [References] below). This has runtime performance implications that arguably conflict with FIDL's intended use-cases.
We considered being able to specify the hash used so it can be changed later, if SHA-256 ended up having problems that another hash would solve. This design is common in security applications, where a widely-used cryptographic hash will have vulnerabilities discovered later. However, specifying the hash would likely require changes to the wire format, and require all language bindings to implement code to select hash algorithms, significantly complicating both compiler and bindings code. We did not think that trade-off was worthwhile. We recognize that git
also took this attitude toward SHA-1, and is now somewhat back-tracking on the decision, but think we think our use case is different enough to justify hard-coding the hash algorithm.
ordinal
to selector
, which is an existing concept that serves the same purpose in other languages and component systems.0xFFFFxxxx
.FragileBase
.Selector
attributes on them.Interestingly, we do not know of any other method dispatch or RPC system that uses a hash of the method name to identify which method to call.
Most RPC systems call methods by name (e.g. gRPC/Protobuf service, Thrift, D-Bus). For in-process method calls, Objective-C uses a guaranteed-unique char* pointer value, called a selector, to identify the method that should be called on a class. The Objective-C runtime can map selectors to stringified method names and vice versa. For out-of-process method calls, Objective-C distributed objects uses the method name for invocation. COM directly uses the C++ vtable for in-process calls, and therefore depends on ABI and compiler support to support method dispatch. apang@google.com suggested ordinal hashing for tables in ctiller@google.com's Phickle proposal. ianloic@google.com and apang@google.com met on Thu 2018/10/18 to whiteboard this.
Mojo/FIDL1 also didn‘t require the programmer to specify ordinals; instead, they were sequentially generated (similarly to FlatBuffers’s implicit tag numbering for table fields).
Previously, you could create a FIDL interface that inherited from whichever other FIDL interface you liked. However, the interface and the superinterface share the same ordinal space, which means if you added a method to an interface you might break a subinterface in some other, far away library.
There are several proposals kicking around FIDL-land for resolving the inheritance / ordinal collision problem, but until we figured out how we want to solve this problem, we've switched the default for interfaces to forbid inheritance. An interface can still opt in to allowing subinterfaces using the [FragileBase]
attribute.
If you run into this issue, the compiler should print out an error message with a brief explanation. I (abarth@google.com) have added the [FragileBase]
attribute everywhere we use FIDL interface inheritance in the Platform Source Tree (hopefully!).
Please let me know if you have any questions or run into any trouble. --abarth@google.com
We do not believe that there'll be sufficient ordinal clashes to warrant any extra implementation and cognitive complexity added by automatic conflict resolution. We can revisit this decision without breaking backward-compatibility if data shows that ordinal clashing becomes problematic.
If only results are declared, the method is referred to as an event. It then defines an unsolicited message from the server.
jln@google.com writes, “Yes it‘s ok to truncate SHA-2 and no, it doesn’t matter where you truncate.”