docs/contribute/governance/fidl/ftp/ftp-029.md - fuchsia - Git at Google

 # [FTP](../README.md)-029: Increasing Method Ordinals

 Field     | Value
 ----------|--------------------------
 Status    | Accepted
 Authors   | pascallouis@google.com
 Submitted | 2019-02-14
 Reviewed  | 2019-02-28

 ## Summary

 We propose:

 1. To **increase the size of method ordinals to 64 bits** (up from 32 bits),
    while maintaining the range with the highest order bit set for Fuchsia
    reserved usage;
 2. Update the **epitaph ordinal to `0xFFFFFFFFFFFFFFFF`** (from `0xFFFFFFFF`);
 3. Packing the increased ordinal size by **using the flags field of the
    transactional header**;
 4. And **discontinue the use of the `[FragileBase]` attribute**.

 ## Motivation

 #### Combatting Breakage at a Distance

 We've looked for some time at the issue of breakage at a distance which the
 composition model introduces.

 In short, when a `Child` protocol composes a `Parent` protocol, it is
 possible for the `Parent` to introduce a method after the fact which clashes
 with a method defined in the `Child`.
 This change is likely made in the `Parent` without awareness of the breakage
 caused "at a distance" in the `Child` protocol.

 With the ordinal hashing scheme we have today, there is a roughly 2 in 10,000
 chances to incur a collision for protocols on the order of 1,000 methods.
 This is quite likely given the large number of protocols (and expected
 protocols) to be built and expressed in FIDL.
 We've resorted to a temporary annotation `[FragileBase]` to indicate this
 problem, and make developers aware of this issue.

 By increasing the number of bits dedicated to a method ordinal, we can reduce
 the probability of collision to a level satisfactory to consider this issue
 of breakage at a distance nonexistent (in practical terms).

 #### Sizing of Method Ordinals

 The [generalized birthday problem][birthday] well describes the probability
 of collisions.
 We have a `D = 2^k` possibilities ("days") and are looking for the chances
 that amongst **N** methods ("people"), two of these methods happen to collide
 ("same birthday").

 If we consider a threshold of 1 in a million chances as being the largest
 collision probability we are willing to tolerate, we end up with a maximum number
 of methods of:

 *   31 bits: ~60
 *   39 bits: ~1k
 *   47 bits: ~16k
 *   52 bits: ~95k
 *   63 bits: ~4.3M

 Given the above, 47 bits or above are reasonable choices.
 We choose 63 bits for a few additional reasons:

 *   In practice, it's safe to have ordinals as numbers in JSON with standard
     parsers (e.g., [in Go](https://play.golang.org/p/Rf523ZjyAK8), or in
     Python).
     (In a v2 of the JSON IR, we plan to wrap ordinals as strings.)
 *   With room left for control messages, there is room for other flags
     should these need to be allocated later.
     Today, the only control message which exists is the epitaph one.

 We add a high bit for reserved ordinals hence the resulting size for ordinals
 being **64 bits**.

 We considered limiting to 52 bits (53 bits total with control messages)
 because it is the largest positive integer representable in IEEE754, which
 makes it advantageous to represent ordinals in languages which only support
 doubles (e.g., JavaScript).
 However, those languages would still need to manipulate ordinals to place
 them on the wire, and therefore would need a further mechanism to access the
 individual bytes composing doubles.

 ## Design

 #### Hash Calculation

 The hash calculation introduced in [FTP-020: Interface Ordinal
 Hashing](ftp-020.md) is slightly altered.
 It should both produce a 64 bit number, and the string over which the hash is
 calculated is `<library>/<top level declaration>.<member>` (per
 [FTP-043: Documentation Comment Format](ftp-043.md)).

 The hashed ordinal is derived by a SHA-256 hash of:

 * *library name* &mdash; encoded as UTF-8; no trailing \0
 * "/" &mdash; the forward slash character, ASCII `0x2f`
 * *protocol name* &mdash; encoded as UTF-8; no trailing \0
 * "." &mdash; the period character, ASCII `0x2e`
 * **method name** &mdash; encoded as UTF-8; no trailing \0

 For example, the following FIDL declaration:

 ```fidl
 library foo;

 protocol Science {
     Hypothesize();
     Investigate();
     Explode();
     Reproduce();
 };
 ```

 Will have the following byte patterns used to calculate the ordinal hash:

 ```
 foo/Science.Hypothesize
 foo/Science.Investigate
 foo/Science.Explode
 foo/Science.Reproduce
 ```

 Once the SHA-256 hash is computed, the upper 63 bits of the SHA-256 hash are
 extracted.
 We extract only 63 bits, since the 64th bit is reserved for system usage.

 In pseudo-code:

 ```
 full_hash = sha256(library_name + "/" +
                    protocol_name + "." +
                    method_name)
 ordinal = full_hash[0]       |
           full_hash[1] << 8  |
           full_hash[2] << 16 |
           full_hash[3] << 24 |
           full_hash[4] << 32 |
           full_hash[5] << 40 |
           full_hash[6] << 48 |
           full_hash[7] << 56;
 ordinal &= 0x7fff ffff ffff ffff; // i.e., 63 ones
 ```

 #### Packing Hash in Header

 The transactional message header consists of four 4 bytes fields today:
 transaction id, reserved, flags, and the current ordinal.
 The reserved field is used by the epitaph's error code.
 We therefore propose to increase the ordinal fields from 4 bytes to 8 bytes,
 by using the flags field (unused today).

 The new definition for the header is:

 *   `zx_txid_t txid`, transaction ID (32 bits)
     *   `txid`s with the high bit set are reserved for use by [**zx_channel_call()**][channel_call]
     *   `txid`s with the high bit unset are reserved for use by userspace
     *   See [**zx_channel_call()**][channel_call] for more details on `txid` allocation
 *   `uint32 reserved0`, reserved for future use.
 *   `uint64 ordinal`
     *   The zero ordinal is invalid.
     *   Ordinals above `0xfffffffffffff` are reserved.

 #### JSON IR

 No change, and given the choice of max 52 bits for developer defined ordinals,
 this will be parsable by standard JSON parsers into a 64 bit floating point
 number without loss of precision.

 #### Other Ordinals

 We have ordinals in tables, and extensible unions.
 We are not proposing changing those: in both these cases, there is no breakage
 at a distance scenario today (e.g., no extensible union composes other
 extensible unions).

 ## Implementation strategy

 Similar strategy to what has been followed for explicit-to-hashed ordinals.

 ## Ergonomics

 No change.

 ## Documentation and examples

 Need to modify wire format.

 ## Backwards compatibility

 Not backwards compatible.

 ## Performance

 No impact expected.
 Method and event dispatch must now be done on a 64 bit integer (vs a 32 bit
 integer), and this is expected to make no difference.

 ## Security

 No impact on security.
 See [Alternative: Identifying Protocol and
 Method](#alternate_identifiying-protocol-and-method) for a security
 motivated use case whose performance could be improved with another scheme.

 ## Testing

 Unit testing for ordinal calculation.
 Follows similar pattern to [FTP-020: Interface Ordinal Hashing](ftp-020.md).

 ## Alternative: Identifying Protocol and Method

 We envision sandboxing services which would shield another service from
 unauthorized use, i.e., a sort of firewall for FIDL messages.
 In building such a sandboxing service, it would be useful to efficiently
 reference a set of messages ("the allowed messages").
 One could imagine defining this set of messages using protocols.

 In this scenario, having two identifiers, one for the protocol, and one for
 the method, would be useful (as opposed to the proposed scheme above which
 only provides one identifier).

 Let's consider this alternative, where we have:

 *   **P** bits for hash of protocol name
 *   **M** bits for hash of method name

 Hence, the total size of the ordinal would be:

 *   **P** + **M** + 1

 Since we need to reserve 1 bit for system ordinals.

 For instance, with the example library:

 ```fidl
 library alternative;

 protocol Parent {
     Get(); // alternative/Parent.Get
 }

 protocol Child {
     compose Parent;
     Get(); // alternative/Child.Get
 }
 ```

 Both "Get" methods would have the same **M** bits (it's the hash of "Get").
 However, the **P** bits would differ; one would be the hash of
 `alternative/Parent`, whereas the other would be the hash of
 `alternative/Child`.

 From a feasibility standpoint, using a similar numerical approach to the
 above, we have:

 *   How many protocols do we expect? On the order of 100k is reasonable.
     *   → would need **P** = 52 bits
 *   How many methods do we expect? On the order of 1k is reasonable.
     *   → would need **M** = 39 bits
 *   So this scheme would require **92 bits**

 As a result, we do not consider this alternative to be feasible.

 Additionally, considering the sandboxing use case further, matching against
 one protocol identifier is insufficient due to protocol composition (methods
 could come from multiple source protocols).
 Hence, while an optimized path may benefit from a single identifier, the
 general case requires a lookup through some data structure to make this
 efficient.

 ## Addendum: Collision Probabilities

 #### Results

 ```
 size | num_methods | r | p(collision)
 -----+-------------+---+-------------
   31 |        1000 |   | 0.0002325707643
   39 |        1000 | x | 0.0000009085847943
   47 |        1000 | x | 0.000000003549160959
   52 |        1000 | x | 0.0000000001109112802
   63 |        1000 | x | 0.00000000000005415589852
   31 |       10000 |   | 0.02301183054
   39 |       10000 |   | 0.00009093624028
   47 |       10000 | x | 0.0000003552357776
   52 |       10000 | x | 0.00000001110111996
   63 |       10000 | x | 0.000000000005420468761
   31 |       50000 |   | 0.4412566126
   39 |       50000 |   | 0.002271108402
   47 |       50000 |   | 0.00000888156712
   52 |       50000 | x | 0.0000002775501665
   63 |       50000 | x | 0.000000000135522561
   31 |      100000 |   | 0.9025370676
   39 |      100000 |   | 0.009053622963
   47 |      100000 |   | 0.00003552615045
   52 |      100000 |   | 0.000001110211306
   63 |      100000 | x | 0.0000000005420956651
   31 |     1000000 |   | 1.0
   39 |     1000000 |   | 0.5972719635
   47 |     1000000 |   | 0.003546406718
   52 |     1000000 |   | 0.0001110160287
   63 |     1000000 | x | 0.00000005421005294

 size: max. num_methods
 31: 66
 39: 1049
 47: 16777
 52: 94906
 63: 4294968
 ```

 #### Code

 Using [http://mpmath.org](http://mpmath.org) to calculate the various probabilities.

 ```
 from mpmath import mp
 mp.dps = 50
 mp.pretty = True

 # Given n random integers drawn from a discrete uniform distribution with
 # range [1,d], what is the probability p(n; d) that at least two numbers are
 # the same?
 def p_collision(n, d):
     # 1 - ((d - 1) / d) ^ (n * (n - 1) / 2)
     base = mp.fdiv(mp.fsub(d, 1), d)
     expo = mp.fdiv(mp.fmul(n, mp.fsub(n, 1)), 2)
     return mp.fsub(1, mp.power(base, expo))

 print("size | num_methods | r | p(collision)")
 for num_methods in [1000, 10000, 50000, 100000, 1000000]:
     print("-----+-------------+---+-------------")
     for size in [31, 39, 47, 52, 63]:
         p = p_collision(num_methods, mp.power(2, size))
         # 1 in 1,000,000
         result = " "
         if p < mp.fdiv(1, 1000000):
             result = "x"
         print("%4d | %11d | %s | %s" % (size, num_methods, result, mp.nstr(p, 10, min_fixed = -mp.inf)))

 def find_max_num_methods(size):
     low = 1
     target = 1
     upper = 10000000
     while True:
         p = p_collision(target, mp.power(2, size))
         if p < mp.fdiv(1, 1000000):
             low = target
         else:
             upper = target
         target = ((upper - low) / 2) + low
         if upper - low < 2:
             return low

 print("size: max. num_methods")
 for size in [31, 39, 47, 52, 63]:
     print("%d: %s" % (size, find_max_num_methods(size)))

 ```

 <!-- xrefs -->
 [channel_call]: /docs/reference/syscalls/channel_call.md
 [birthday]: https://en.wikipedia.org/wiki/Birthday_problem#Generalizations1
	# [FTP](../README.md)-029: Increasing Method Ordinals

	Field \| Value
	----------\|--------------------------
	Status \| Accepted
	Authors \| pascallouis@google.com
	Submitted \| 2019-02-14
	Reviewed \| 2019-02-28

	## Summary

	We propose:

	1. To increase the size of method ordinals to 64 bits (up from 32 bits),
	while maintaining the range with the highest order bit set for Fuchsia
	reserved usage;
	2. Update the epitaph ordinal to `0xFFFFFFFFFFFFFFFF` (from `0xFFFFFFFF`);
	3. Packing the increased ordinal size by **using the flags field of the
	transactional header**;
	4. And discontinue the use of the `[FragileBase]` attribute.

	## Motivation

	#### Combatting Breakage at a Distance

	We've looked for some time at the issue of breakage at a distance which the
	composition model introduces.

	In short, when a `Child` protocol composes a `Parent` protocol, it is
	possible for the `Parent` to introduce a method after the fact which clashes
	with a method defined in the `Child`.
	This change is likely made in the `Parent` without awareness of the breakage
	caused "at a distance" in the `Child` protocol.

	With the ordinal hashing scheme we have today, there is a roughly 2 in 10,000
	chances to incur a collision for protocols on the order of 1,000 methods.
	This is quite likely given the large number of protocols (and expected
	protocols) to be built and expressed in FIDL.
	We've resorted to a temporary annotation `[FragileBase]` to indicate this
	problem, and make developers aware of this issue.

	By increasing the number of bits dedicated to a method ordinal, we can reduce
	the probability of collision to a level satisfactory to consider this issue
	of breakage at a distance nonexistent (in practical terms).

	#### Sizing of Method Ordinals

	The [generalized birthday problem][birthday] well describes the probability
	of collisions.
	We have a `D = 2^k` possibilities ("days") and are looking for the chances
	that amongst N methods ("people"), two of these methods happen to collide
	("same birthday").

	If we consider a threshold of 1 in a million chances as being the largest
	collision probability we are willing to tolerate, we end up with a maximum number
	of methods of:

	* 31 bits: ~60
	* 39 bits: ~1k
	* 47 bits: ~16k
	* 52 bits: ~95k
	* 63 bits: ~4.3M

	Given the above, 47 bits or above are reasonable choices.
	We choose 63 bits for a few additional reasons:

	* In practice, it's safe to have ordinals as numbers in JSON with standard
	parsers (e.g., [in Go](https://play.golang.org/p/Rf523ZjyAK8), or in
	Python).
	(In a v2 of the JSON IR, we plan to wrap ordinals as strings.)
	* With room left for control messages, there is room for other flags
	should these need to be allocated later.
	Today, the only control message which exists is the epitaph one.

	We add a high bit for reserved ordinals hence the resulting size for ordinals
	being 64 bits.

	We considered limiting to 52 bits (53 bits total with control messages)
	because it is the largest positive integer representable in IEEE754, which
	makes it advantageous to represent ordinals in languages which only support
	doubles (e.g., JavaScript).
	However, those languages would still need to manipulate ordinals to place
	them on the wire, and therefore would need a further mechanism to access the
	individual bytes composing doubles.

	## Design

	#### Hash Calculation

	The hash calculation introduced in [FTP-020: Interface Ordinal
	Hashing](ftp-020.md) is slightly altered.
	It should both produce a 64 bit number, and the string over which the hash is
	calculated is `<library>/<top level declaration>.<member>` (per
	[FTP-043: Documentation Comment Format](ftp-043.md)).

	The hashed ordinal is derived by a SHA-256 hash of:

	* library name — encoded as UTF-8; no trailing \0
	* "/" — the forward slash character, ASCII `0x2f`
	* protocol name — encoded as UTF-8; no trailing \0
	* "." — the period character, ASCII `0x2e`
	* method name — encoded as UTF-8; no trailing \0

	For example, the following FIDL declaration:

	```fidl
	library foo;

	protocol Science {
	Hypothesize();
	Investigate();
	Explode();
	Reproduce();
	};
	```

	Will have the following byte patterns used to calculate the ordinal hash:

	```
	foo/Science.Hypothesize
	foo/Science.Investigate
	foo/Science.Explode
	foo/Science.Reproduce
	```

	Once the SHA-256 hash is computed, the upper 63 bits of the SHA-256 hash are
	extracted.
	We extract only 63 bits, since the 64th bit is reserved for system usage.

	In pseudo-code:

	```
	full_hash = sha256(library_name + "/" +
	protocol_name + "." +
	method_name)
	ordinal = full_hash[0] \|
	full_hash[1] << 8 \|
	full_hash[2] << 16 \|
	full_hash[3] << 24 \|
	full_hash[4] << 32 \|
	full_hash[5] << 40 \|
	full_hash[6] << 48 \|
	full_hash[7] << 56;
	ordinal &= 0x7fff ffff ffff ffff; // i.e., 63 ones
	```

	#### Packing Hash in Header

	The transactional message header consists of four 4 bytes fields today:
	transaction id, reserved, flags, and the current ordinal.
	The reserved field is used by the epitaph's error code.
	We therefore propose to increase the ordinal fields from 4 bytes to 8 bytes,
	by using the flags field (unused today).

	The new definition for the header is:

	* `zx_txid_t txid`, transaction ID (32 bits)
	* `txid`s with the high bit set are reserved for use by [zx_channel_call()][channel_call]
	* `txid`s with the high bit unset are reserved for use by userspace
	* See [zx_channel_call()][channel_call] for more details on `txid` allocation
	* `uint32 reserved0`, reserved for future use.
	* `uint64 ordinal`
	* The zero ordinal is invalid.
	* Ordinals above `0xfffffffffffff` are reserved.

	#### JSON IR

	No change, and given the choice of max 52 bits for developer defined ordinals,
	this will be parsable by standard JSON parsers into a 64 bit floating point
	number without loss of precision.

	#### Other Ordinals

	We have ordinals in tables, and extensible unions.
	We are not proposing changing those: in both these cases, there is no breakage
	at a distance scenario today (e.g., no extensible union composes other
	extensible unions).

	## Implementation strategy

	Similar strategy to what has been followed for explicit-to-hashed ordinals.

	## Ergonomics

	No change.

	## Documentation and examples

	Need to modify wire format.

	## Backwards compatibility

	Not backwards compatible.

	## Performance

	No impact expected.
	Method and event dispatch must now be done on a 64 bit integer (vs a 32 bit
	integer), and this is expected to make no difference.

	## Security

	No impact on security.
	See [Alternative: Identifying Protocol and
	Method](#alternate_identifiying-protocol-and-method) for a security
	motivated use case whose performance could be improved with another scheme.

	## Testing

	Unit testing for ordinal calculation.
	Follows similar pattern to [FTP-020: Interface Ordinal Hashing](ftp-020.md).

	## Alternative: Identifying Protocol and Method

	We envision sandboxing services which would shield another service from
	unauthorized use, i.e., a sort of firewall for FIDL messages.
	In building such a sandboxing service, it would be useful to efficiently
	reference a set of messages ("the allowed messages").
	One could imagine defining this set of messages using protocols.

	In this scenario, having two identifiers, one for the protocol, and one for
	the method, would be useful (as opposed to the proposed scheme above which
	only provides one identifier).

	Let's consider this alternative, where we have:

	* P bits for hash of protocol name
	* M bits for hash of method name

	Hence, the total size of the ordinal would be:

	* P + M + 1

	Since we need to reserve 1 bit for system ordinals.

	For instance, with the example library:

	```fidl
	library alternative;

	protocol Parent {
	Get(); // alternative/Parent.Get
	}

	protocol Child {
	compose Parent;
	Get(); // alternative/Child.Get
	}
	```

	Both "Get" methods would have the same M bits (it's the hash of "Get").
	However, the P bits would differ; one would be the hash of
	`alternative/Parent`, whereas the other would be the hash of
	`alternative/Child`.

	From a feasibility standpoint, using a similar numerical approach to the
	above, we have:

	* How many protocols do we expect? On the order of 100k is reasonable.
	* → would need P = 52 bits
	* How many methods do we expect? On the order of 1k is reasonable.
	* → would need M = 39 bits
	* So this scheme would require 92 bits

	As a result, we do not consider this alternative to be feasible.

	Additionally, considering the sandboxing use case further, matching against
	one protocol identifier is insufficient due to protocol composition (methods
	could come from multiple source protocols).
	Hence, while an optimized path may benefit from a single identifier, the
	general case requires a lookup through some data structure to make this
	efficient.

	## Addendum: Collision Probabilities

	#### Results

	```
	size \| num_methods \| r \| p(collision)
	-----+-------------+---+-------------
	31 \| 1000 \| \| 0.0002325707643
	39 \| 1000 \| x \| 0.0000009085847943
	47 \| 1000 \| x \| 0.000000003549160959
	52 \| 1000 \| x \| 0.0000000001109112802
	63 \| 1000 \| x \| 0.00000000000005415589852
	31 \| 10000 \| \| 0.02301183054
	39 \| 10000 \| \| 0.00009093624028
	47 \| 10000 \| x \| 0.0000003552357776
	52 \| 10000 \| x \| 0.00000001110111996
	63 \| 10000 \| x \| 0.000000000005420468761
	31 \| 50000 \| \| 0.4412566126
	39 \| 50000 \| \| 0.002271108402
	47 \| 50000 \| \| 0.00000888156712
	52 \| 50000 \| x \| 0.0000002775501665
	63 \| 50000 \| x \| 0.000000000135522561
	31 \| 100000 \| \| 0.9025370676
	39 \| 100000 \| \| 0.009053622963
	47 \| 100000 \| \| 0.00003552615045
	52 \| 100000 \| \| 0.000001110211306
	63 \| 100000 \| x \| 0.0000000005420956651
	31 \| 1000000 \| \| 1.0
	39 \| 1000000 \| \| 0.5972719635
	47 \| 1000000 \| \| 0.003546406718
	52 \| 1000000 \| \| 0.0001110160287
	63 \| 1000000 \| x \| 0.00000005421005294

	size: max. num_methods
	31: 66
	39: 1049
	47: 16777
	52: 94906
	63: 4294968
	```

	#### Code

	Using [http://mpmath.org](http://mpmath.org) to calculate the various probabilities.

	```
	from mpmath import mp
	mp.dps = 50
	mp.pretty = True

	# Given n random integers drawn from a discrete uniform distribution with
	# range [1,d], what is the probability p(n; d) that at least two numbers are
	# the same?
	def p_collision(n, d):
	# 1 - ((d - 1) / d) ^ (n * (n - 1) / 2)
	base = mp.fdiv(mp.fsub(d, 1), d)
	expo = mp.fdiv(mp.fmul(n, mp.fsub(n, 1)), 2)
	return mp.fsub(1, mp.power(base, expo))

	print("size \| num_methods \| r \| p(collision)")
	for num_methods in [1000, 10000, 50000, 100000, 1000000]:
	print("-----+-------------+---+-------------")
	for size in [31, 39, 47, 52, 63]:
	p = p_collision(num_methods, mp.power(2, size))
	# 1 in 1,000,000
	result = " "
	if p < mp.fdiv(1, 1000000):
	result = "x"
	print("%4d \| %11d \| %s \| %s" % (size, num_methods, result, mp.nstr(p, 10, min_fixed = -mp.inf)))

	def find_max_num_methods(size):
	low = 1
	target = 1
	upper = 10000000
	while True:
	p = p_collision(target, mp.power(2, size))
	if p < mp.fdiv(1, 1000000):
	low = target
	else:
	upper = target
	target = ((upper - low) / 2) + low
	if upper - low < 2:
	return low

	print("size: max. num_methods")
	for size in [31, 39, 47, 52, 63]:
	print("%d: %s" % (size, find_max_num_methods(size)))

	```

	<!-- xrefs -->
	[channel_call]: /docs/reference/syscalls/channel_call.md
	[birthday]: https://en.wikipedia.org/wiki/Birthday_problem#Generalizations1