{% set rfcid = “RFC-0029” %} {% include “docs/contribute/governance/rfcs/_common/_rfc_header.md” %}
Note: Formerly known as FTP-029.
We propose:
0xFFFFFFFFFFFFFFFF
(from 0xFFFFFFFF
);[FragileBase]
attribute.This RFC was superseded by:
We've looked for some time at the issue of breakage at a distance, which the composition model introduces.
In short, when a Child
protocol composes a Parent
protocol, it is possible for the Parent
to introduce a method after the fact that clashes with a method defined in the Child
. This change is likely made in the Parent
without awareness of the breakage caused “at a distance” in the Child
protocol.
With the ordinal hashing scheme we have today, there is a roughly 2 in 10,000 chances to incur a collision for protocols on the order of 1,000 methods. This is quite likely given the large number of protocols (and expected protocols) to be built and expressed in FIDL. We've resorted to a temporary annotation [FragileBase]
to indicate this problem, and make developers aware of this issue.
By increasing the number of bits dedicated to a method ordinal, we can reduce the probability of collision to a level satisfactory to consider this issue of breakage at a distance nonexistent (in practical terms).
The generalized birthday problem well describes the probability of collisions. We have a D = 2^k
possibilities (“days”) and are looking for the chances that amongst N methods (“people”), two of these methods happen to collide (“same birthday”).
If we consider a threshold of 1 in a million chances as being the largest collision probability we are willing to tolerate, we end up with a maximum number of methods of:
Given the above, 47 bits or above are reasonable choices. We choose 63 bits for a few additional reasons:
We add a high bit for reserved ordinals hence the resulting size for ordinals being 64 bits.
We considered limiting to 52 bits (53 bits total with control messages) because it is the largest positive integer representable in IEEE754, which makes it advantageous to represent ordinals in languages that only support doubles (e.g., JavaScript). However, those languages would still need to manipulate ordinals to place them on the wire, and therefore would need a further mechanism to access the individual bytes composing doubles.
The hash calculation introduced in RFC-0020: Interface Ordinal Hashing is slightly altered. It should both produce a 64 bit number, and the string over which the hash is calculated is <library>/<top level declaration>.<member>
(per RFC-0043: Documentation Comment Format).
The hashed ordinal is derived by a SHA-256 hash of:
0x2f
0x2e
For example, the following FIDL declaration:
library foo; protocol Science { Hypothesize(); Investigate(); Explode(); Reproduce(); };
Will have the following byte patterns used to calculate the ordinal hash:
foo/Science.Hypothesize foo/Science.Investigate foo/Science.Explode foo/Science.Reproduce
Once the SHA-256 hash is computed, the upper 63 bits of the SHA-256 hash are extracted. We extract only 63 bits, since the 64th bit is reserved for system usage.
In pseudo-code:
full_hash = sha256(library_name + "/" + protocol_name + "." + method_name) ordinal = full_hash[0] | full_hash[1] << 8 | full_hash[2] << 16 | full_hash[3] << 24 | full_hash[4] << 32 | full_hash[5] << 40 | full_hash[6] << 48 | full_hash[7] << 56; ordinal &= 0x7fff ffff ffff ffff; // i.e., 63 ones
The transactional message header consists of four 4 bytes fields today: transaction id, reserved, flags, and the current ordinal. The reserved field is used by the epitaph's error code. We therefore propose to increase the ordinal fields from 4 bytes to 8 bytes, by using the flags field (unused today).
The new definition for the header is:
zx_txid_t txid
, transaction ID (32 bits)txid
s with the high bit set are reserved for use by zx_channel_call()txid
s with the high bit unset are reserved for use by userspacetxid
allocationuint32 reserved0
, reserved for future use.uint64 ordinal
0xfffffffffffff
are reserved.No change, and given the choice of max 52 bits for developer defined ordinals, this will be parsable by standard JSON parsers into a 64 bit floating point number without loss of precision.
We have ordinals in tables, and extensible unions. We are not proposing changing those: in both these cases, there is no breakage at a distance scenario today (e.g., no extensible union composes other extensible unions).
Similar strategy to what has been followed for explicit-to-hashed ordinals.
No change.
Need to modify wire format.
Not backwards compatible.
No impact expected. Method and event dispatch must now be done on a 64 bit integer (vs a 32 bit integer), and this is expected to make no difference.
No impact on security. See Alternative: Identifying Protocol and Method for a security motivated use case whose performance could be improved with another scheme.
Unit testing for ordinal calculation. Follows similar pattern to RFC-0020: Interface Ordinal Hashing.
We envision sandboxing services that would shield another service from unauthorized use, i.e., a sort of firewall for FIDL messages. In building such a sandboxing service, it would be useful to efficiently reference a set of messages (“the allowed messages”). One could imagine defining this set of messages using protocols.
In this scenario, having two identifiers, one for the protocol, and one for the method, would be useful (as opposed to the proposed scheme above, which only provides one identifier).
Let's consider this alternative, where we have:
Hence, the total size of the ordinal would be:
Since we need to reserve 1 bit for system ordinals.
For instance, with the example library:
library alternative; protocol Parent { Get(); // alternative/Parent.Get } protocol Child { compose Parent; Get(); // alternative/Child.Get }
Both “Get” methods would have the same M bits (it's the hash of “Get”). However, the P bits would differ; one would be the hash of alternative/Parent
, whereas the other would be the hash of alternative/Child
.
From a feasibility standpoint, using a similar numerical approach to the above, we have:
As a result, we do not consider this alternative to be feasible.
Additionally, considering the sandboxing use case further, matching against one protocol identifier is insufficient due to protocol composition (methods could come from multiple source protocols). Hence, while an optimized path may benefit from a single identifier, the general case requires a lookup through some data structure to make this efficient.
size | num_methods | r | p(collision) -----+-------------+---+------------- 31 | 1000 | | 0.0002325707643 39 | 1000 | x | 0.0000009085847943 47 | 1000 | x | 0.000000003549160959 52 | 1000 | x | 0.0000000001109112802 63 | 1000 | x | 0.00000000000005415589852 31 | 10000 | | 0.02301183054 39 | 10000 | | 0.00009093624028 47 | 10000 | x | 0.0000003552357776 52 | 10000 | x | 0.00000001110111996 63 | 10000 | x | 0.000000000005420468761 31 | 50000 | | 0.4412566126 39 | 50000 | | 0.002271108402 47 | 50000 | | 0.00000888156712 52 | 50000 | x | 0.0000002775501665 63 | 50000 | x | 0.000000000135522561 31 | 100000 | | 0.9025370676 39 | 100000 | | 0.009053622963 47 | 100000 | | 0.00003552615045 52 | 100000 | | 0.000001110211306 63 | 100000 | x | 0.0000000005420956651 31 | 1000000 | | 1.0 39 | 1000000 | | 0.5972719635 47 | 1000000 | | 0.003546406718 52 | 1000000 | | 0.0001110160287 63 | 1000000 | x | 0.00000005421005294 size: max. num_methods 31: 66 39: 1049 47: 16777 52: 94906 63: 4294968
Using http://mpmath.org to calculate the various probabilities.
from mpmath import mp mp.dps = 50 mp.pretty = True # Given n random integers drawn from a discrete uniform distribution with # range [1,d], what is the probability p(n; d) that at least two numbers are # the same? def p_collision(n, d): # 1 - ((d - 1) / d) ^ (n * (n - 1) / 2) base = mp.fdiv(mp.fsub(d, 1), d) expo = mp.fdiv(mp.fmul(n, mp.fsub(n, 1)), 2) return mp.fsub(1, mp.power(base, expo)) print("size | num_methods | r | p(collision)") for num_methods in [1000, 10000, 50000, 100000, 1000000]: print("-----+-------------+---+-------------") for size in [31, 39, 47, 52, 63]: p = p_collision(num_methods, mp.power(2, size)) # 1 in 1,000,000 result = " " if p < mp.fdiv(1, 1000000): result = "x" print("%4d | %11d | %s | %s" % (size, num_methods, result, mp.nstr(p, 10, min_fixed = -mp.inf))) def find_max_num_methods(size): low = 1 target = 1 upper = 10000000 while True: p = p_collision(target, mp.power(2, size)) if p < mp.fdiv(1, 1000000): low = target else: upper = target target = ((upper - low) / 2) + low if upper - low < 2: return low print("size: max. num_methods") for size in [31, 39, 47, 52, 63]: print("%d: %s" % (size, find_max_num_methods(size)))