| # [FTP](../README.md)-029: Increasing Method Ordinals |
| |
| Field | Value |
| ----------|-------------------------- |
| Status | Accepted |
| Authors | pascallouis@google.com |
| Submitted | 2019-02-14 |
| Reviewed | 2019-02-28 |
| |
| ## Summary |
| |
| We propose: |
| |
| 1. To **increase the size of method ordinals to 64 bits** (up from 32 bits), |
| while maintaining the range with the highest order bit set for Fuchsia |
| reserved usage; |
| 2. Update the **epitaph ordinal to `0xFFFFFFFFFFFFFFFF`** (from `0xFFFFFFFF`); |
| 3. Packing the increased ordinal size by **using the flags field of the |
| transactional header**; |
| 4. And **discontinue the use of the `[FragileBase]` attribute**. |
| |
| ## Motivation |
| |
| #### Combatting Breakage at a Distance |
| |
| We've looked for some time at the issue of breakage at a distance which the |
| composition model introduces. |
| |
| In short, when a `Child` protocol composes a `Parent` protocol, it is |
| possible for the `Parent` to introduce a method after the fact which clashes |
| with a method defined in the `Child`. |
| This change is likely made in the `Parent` without awareness of the breakage |
| caused "at a distance" in the `Child` protocol. |
| |
| With the ordinal hashing scheme we have today, there is a roughly 2 in 10,000 |
| chances to incur a collision for protocols on the order of 1,000 methods. |
| This is quite likely given the large number of protocols (and expected |
| protocols) to be built and expressed in FIDL. |
| We've resorted to a temporary annotation `[FragileBase]` to indicate this |
| problem, and make developers aware of this issue. |
| |
| By increasing the number of bits dedicated to a method ordinal, we can reduce |
| the probability of collision to a level satisfactory to consider this issue |
| of breakage at a distance nonexistent (in practical terms). |
| |
| #### Sizing of Method Ordinals |
| |
| The [generalized birthday problem][birthday] well describes the probability |
| of collisions. |
| We have a `D = 2^k` possibilities ("days") and are looking for the chances |
| that amongst **N** methods ("people"), two of these methods happen to collide |
| ("same birthday"). |
| |
| If we consider a threshold of 1 in a million chances as being the largest |
| collision probability we are willing to tolerate, we end up with a maximum number |
| of methods of: |
| |
| * 31 bits: ~60 |
| * 39 bits: ~1k |
| * 47 bits: ~16k |
| * 52 bits: ~95k |
| * 63 bits: ~4.3M |
| |
| Given the above, 47 bits or above are reasonable choices. |
| We choose 63 bits for a few additional reasons: |
| |
| * In practice, it's safe to have ordinals as numbers in JSON with standard |
| parsers (e.g., [in Go](https://play.golang.org/p/Rf523ZjyAK8), or in |
| Python). |
| (In a v2 of the JSON IR, we plan to wrap ordinals as strings.) |
| * With room left for control messages, there is room for other flags |
| should these need to be allocated later. |
| Today, the only control message which exists is the epitaph one. |
| |
| We add a high bit for reserved ordinals hence the resulting size for ordinals |
| being **64 bits**. |
| |
| We considered limiting to 52 bits (53 bits total with control messages) |
| because it is the largest positive integer representable in IEEE754, which |
| makes it advantageous to represent ordinals in languages which only support |
| doubles (e.g., JavaScript). |
| However, those languages would still need to manipulate ordinals to place |
| them on the wire, and therefore would need a further mechanism to access the |
| individual bytes composing doubles. |
| |
| ## Design |
| |
| #### Hash Calculation |
| |
| The hash calculation introduced in [FTP-020: Interface Ordinal |
| Hashing](ftp-020.md) is slightly altered. |
| It should both produce a 64 bit number, and the string over which the hash is |
| calculated is `<library>/<top level declaration>.<member>` (per |
| [FTP-043: Documentation Comment Format](ftp-043.md)). |
| |
| The hashed ordinal is derived by a SHA-256 hash of: |
| |
| * *library name* — encoded as UTF-8; no trailing \0 |
| * "/" — the forward slash character, ASCII `0x2f` |
| * *protocol name* — encoded as UTF-8; no trailing \0 |
| * "." — the period character, ASCII `0x2e` |
| * **method name** — encoded as UTF-8; no trailing \0 |
| |
| For example, the following FIDL declaration: |
| |
| ```fidl |
| library foo; |
| |
| protocol Science { |
| Hypothesize(); |
| Investigate(); |
| Explode(); |
| Reproduce(); |
| }; |
| ``` |
| |
| Will have the following byte patterns used to calculate the ordinal hash: |
| |
| ``` |
| foo/Science.Hypothesize |
| foo/Science.Investigate |
| foo/Science.Explode |
| foo/Science.Reproduce |
| ``` |
| |
| Once the SHA-256 hash is computed, the upper 63 bits of the SHA-256 hash are |
| extracted. |
| We extract only 63 bits, since the 64th bit is reserved for system usage. |
| |
| In pseudo-code: |
| |
| ``` |
| full_hash = sha256(library_name + "/" + |
| protocol_name + "." + |
| method_name) |
| ordinal = full_hash[0] | |
| full_hash[1] << 8 | |
| full_hash[2] << 16 | |
| full_hash[3] << 24 | |
| full_hash[4] << 32 | |
| full_hash[5] << 40 | |
| full_hash[6] << 48 | |
| full_hash[7] << 56; |
| ordinal &= 0x7fff ffff ffff ffff; // i.e., 63 ones |
| ``` |
| |
| #### Packing Hash in Header |
| |
| The transactional message header consists of four 4 bytes fields today: |
| transaction id, reserved, flags, and the current ordinal. |
| The reserved field is used by the epitaph's error code. |
| We therefore propose to increase the ordinal fields from 4 bytes to 8 bytes, |
| by using the flags field (unused today). |
| |
| The new definition for the header is: |
| |
| * `zx_txid_t txid`, transaction ID (32 bits) |
| * `txid`s with the high bit set are reserved for use by [**zx_channel_call()**][channel_call] |
| * `txid`s with the high bit unset are reserved for use by userspace |
| * See [**zx_channel_call()**][channel_call] for more details on `txid` allocation |
| * `uint32 reserved0`, reserved for future use. |
| * `uint64 ordinal` |
| * The zero ordinal is invalid. |
| * Ordinals above `0xfffffffffffff` are reserved. |
| |
| #### JSON IR |
| |
| No change, and given the choice of max 52 bits for developer defined ordinals, |
| this will be parsable by standard JSON parsers into a 64 bit floating point |
| number without loss of precision. |
| |
| #### Other Ordinals |
| |
| We have ordinals in tables, and extensible unions. |
| We are not proposing changing those: in both these cases, there is no breakage |
| at a distance scenario today (e.g., no extensible union composes other |
| extensible unions). |
| |
| ## Implementation strategy |
| |
| Similar strategy to what has been followed for explicit-to-hashed ordinals. |
| |
| ## Ergonomics |
| |
| No change. |
| |
| ## Documentation and examples |
| |
| Need to modify wire format. |
| |
| ## Backwards compatibility |
| |
| Not backwards compatible. |
| |
| ## Performance |
| |
| No impact expected. |
| Method and event dispatch must now be done on a 64 bit integer (vs a 32 bit |
| integer), and this is expected to make no difference. |
| |
| ## Security |
| |
| No impact on security. |
| See [Alternative: Identifying Protocol and |
| Method](#alternate_identifiying-protocol-and-method) for a security |
| motivated use case whose performance could be improved with another scheme. |
| |
| ## Testing |
| |
| Unit testing for ordinal calculation. |
| Follows similar pattern to [FTP-020: Interface Ordinal Hashing](ftp-020.md). |
| |
| ## Alternative: Identifying Protocol and Method |
| |
| We envision sandboxing services which would shield another service from |
| unauthorized use, i.e., a sort of firewall for FIDL messages. |
| In building such a sandboxing service, it would be useful to efficiently |
| reference a set of messages ("the allowed messages"). |
| One could imagine defining this set of messages using protocols. |
| |
| In this scenario, having two identifiers, one for the protocol, and one for |
| the method, would be useful (as opposed to the proposed scheme above which |
| only provides one identifier). |
| |
| Let's consider this alternative, where we have: |
| |
| * **P** bits for hash of protocol name |
| * **M** bits for hash of method name |
| |
| Hence, the total size of the ordinal would be: |
| |
| * **P** + **M** + 1 |
| |
| Since we need to reserve 1 bit for system ordinals. |
| |
| For instance, with the example library: |
| |
| ```fidl |
| library alternative; |
| |
| protocol Parent { |
| Get(); // alternative/Parent.Get |
| } |
| |
| protocol Child { |
| compose Parent; |
| Get(); // alternative/Child.Get |
| } |
| ``` |
| |
| Both "Get" methods would have the same **M** bits (it's the hash of "Get"). |
| However, the **P** bits would differ; one would be the hash of |
| `alternative/Parent`, whereas the other would be the hash of |
| `alternative/Child`. |
| |
| From a feasibility standpoint, using a similar numerical approach to the |
| above, we have: |
| |
| * How many protocols do we expect? On the order of 100k is reasonable. |
| * → would need **P** = 52 bits |
| * How many methods do we expect? On the order of 1k is reasonable. |
| * → would need **M** = 39 bits |
| * So this scheme would require **92 bits** |
| |
| As a result, we do not consider this alternative to be feasible. |
| |
| Additionally, considering the sandboxing use case further, matching against |
| one protocol identifier is insufficient due to protocol composition (methods |
| could come from multiple source protocols). |
| Hence, while an optimized path may benefit from a single identifier, the |
| general case requires a lookup through some data structure to make this |
| efficient. |
| |
| ## Addendum: Collision Probabilities |
| |
| #### Results |
| |
| ``` |
| size | num_methods | r | p(collision) |
| -----+-------------+---+------------- |
| 31 | 1000 | | 0.0002325707643 |
| 39 | 1000 | x | 0.0000009085847943 |
| 47 | 1000 | x | 0.000000003549160959 |
| 52 | 1000 | x | 0.0000000001109112802 |
| 63 | 1000 | x | 0.00000000000005415589852 |
| 31 | 10000 | | 0.02301183054 |
| 39 | 10000 | | 0.00009093624028 |
| 47 | 10000 | x | 0.0000003552357776 |
| 52 | 10000 | x | 0.00000001110111996 |
| 63 | 10000 | x | 0.000000000005420468761 |
| 31 | 50000 | | 0.4412566126 |
| 39 | 50000 | | 0.002271108402 |
| 47 | 50000 | | 0.00000888156712 |
| 52 | 50000 | x | 0.0000002775501665 |
| 63 | 50000 | x | 0.000000000135522561 |
| 31 | 100000 | | 0.9025370676 |
| 39 | 100000 | | 0.009053622963 |
| 47 | 100000 | | 0.00003552615045 |
| 52 | 100000 | | 0.000001110211306 |
| 63 | 100000 | x | 0.0000000005420956651 |
| 31 | 1000000 | | 1.0 |
| 39 | 1000000 | | 0.5972719635 |
| 47 | 1000000 | | 0.003546406718 |
| 52 | 1000000 | | 0.0001110160287 |
| 63 | 1000000 | x | 0.00000005421005294 |
| |
| size: max. num_methods |
| 31: 66 |
| 39: 1049 |
| 47: 16777 |
| 52: 94906 |
| 63: 4294968 |
| ``` |
| |
| #### Code |
| |
| Using [http://mpmath.org](http://mpmath.org) to calculate the various probabilities. |
| |
| ``` |
| from mpmath import mp |
| mp.dps = 50 |
| mp.pretty = True |
| |
| # Given n random integers drawn from a discrete uniform distribution with |
| # range [1,d], what is the probability p(n; d) that at least two numbers are |
| # the same? |
| def p_collision(n, d): |
| # 1 - ((d - 1) / d) ^ (n * (n - 1) / 2) |
| base = mp.fdiv(mp.fsub(d, 1), d) |
| expo = mp.fdiv(mp.fmul(n, mp.fsub(n, 1)), 2) |
| return mp.fsub(1, mp.power(base, expo)) |
| |
| print("size | num_methods | r | p(collision)") |
| for num_methods in [1000, 10000, 50000, 100000, 1000000]: |
| print("-----+-------------+---+-------------") |
| for size in [31, 39, 47, 52, 63]: |
| p = p_collision(num_methods, mp.power(2, size)) |
| # 1 in 1,000,000 |
| result = " " |
| if p < mp.fdiv(1, 1000000): |
| result = "x" |
| print("%4d | %11d | %s | %s" % (size, num_methods, result, mp.nstr(p, 10, min_fixed = -mp.inf))) |
| |
| def find_max_num_methods(size): |
| low = 1 |
| target = 1 |
| upper = 10000000 |
| while True: |
| p = p_collision(target, mp.power(2, size)) |
| if p < mp.fdiv(1, 1000000): |
| low = target |
| else: |
| upper = target |
| target = ((upper - low) / 2) + low |
| if upper - low < 2: |
| return low |
| |
| print("size: max. num_methods") |
| for size in [31, 39, 47, 52, 63]: |
| print("%d: %s" % (size, find_max_num_methods(size))) |
| |
| ``` |
| |
| <!-- xrefs --> |
| [channel_call]: /docs/reference/syscalls/channel_call.md |
| [birthday]: https://en.wikipedia.org/wiki/Birthday_problem#Generalizations1 |