Why try to design new input APIs?

There is an equivalent to the TextField interface on essentially all other major operating systems. However, TextField has some features that aren't found in these equivalent interfaces. Why does TextField add revisions and transactions to the text input protocol?

The answer is simple: pre-existing protocols have race conditions. For instance, let's say the input method wants to perform the operation “delete the word behind the caret” with a text input protocol. It first calls a function like text_field.get_text_behind_cursor(), calculates how long the word behind the cursor is, and then calls a function like text_field.delete_text_behind_cursor(word_len). However, the text field lives in a different process, and during the fraction of a second it took to calculate the word's length, the user tapped on the screen to change the position of the caret. The delete request is still sent, but the word now behind the caret is longer than the one that was measured, so only part of it gets deleted. Disaster!
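The race can be reproduced in a few lines. The following is a minimal single-process sketch, not the real protocol: NaiveTextField and word_length_behind are hypothetical names invented for illustration, and the caret move that would normally arrive from another process is simulated by assigning to the field directly between the read and the delete.

```python
class NaiveTextField:
    """Hypothetical stand-in for a text field owned by another process."""

    def __init__(self, text, caret):
        self.text = text
        self.caret = caret

    def get_text_behind_cursor(self):
        return self.text[:self.caret]

    def delete_text_behind_cursor(self, n):
        # Deletes n characters before the caret, wherever the caret is *now*.
        start = max(0, self.caret - n)
        self.text = self.text[:start] + self.text[self.caret:]
        self.caret = start


def word_length_behind(text):
    """Length of the last space-delimited word in `text`."""
    stripped = text.rstrip()
    return len(stripped) - (stripped.rfind(" ") + 1)


field = NaiveTextField("wonderful day", caret=len("wonderful day"))

# 1. The input method reads the text and measures the word "day".
word_len = word_length_behind(field.get_text_behind_cursor())

# 2. Before the delete arrives, the user taps just after "wonderful".
field.caret = len("wonderful")

# 3. The stale request is applied at the new caret position,
#    cutting "wonderful" down to "wonder".
field.delete_text_behind_cursor(word_len)
print(repr(field.text))
```

Nothing in the delete request records which state it was computed against, so the text field has no way to notice that the request is stale.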

A naive text input protocol also has rendering race conditions. For example, if an input method wants to delete the previous word and then insert a new word, it has to call separate functions, delete_text_behind_cursor(word_len) and insert_text_at_cursor("my text"). If the rendering thread decides to draw the text field in between these two operations, the user briefly sees the text with the old word deleted and the new word not yet inserted, which shows up as a flicker. This is not as bad as messing up an edit, but it is still frustrating.

In the process of trying to solve these problems, we experimented with a number of different solutions, including ones as radical as using CRDTs to resolve conflicts. In the end, we arrived at a simpler solution. All edit requests, such as delete_text_behind_cursor in our protocol above, must be submitted as part of an atomic transaction; they are all applied together, and renders cannot happen halfway through a transaction. This prevents flickering. The protocol also prevents editing race conditions by using revision numbers. The text field sends a revision number, which increments any time there's an edit, along with each state update. The input method states its last seen revision number when it sends a transaction. If the transaction's revision doesn't match the text field's current revision number, then the transaction is rejected, allowing the input method to re-attempt the edit with the new state.
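The combination of atomic transactions and revision numbers can be sketched as follows. This is an illustrative model only, not the actual FIDL interface: the names TextField, Transaction, Delete, Insert, commit, and local_edit are assumptions made for this example, and the sketch tracks only the text (no caret or selection) to stay short.

```python
from dataclasses import dataclass


@dataclass
class Delete:
    start: int
    end: int


@dataclass
class Insert:
    position: int
    text: str


@dataclass
class Transaction:
    revision: int  # the last revision the input method saw
    edits: list    # applied together, or not at all


class TextField:
    """Hypothetical text field that checks revisions before applying edits."""

    def __init__(self, text=""):
        self.text = text
        self.revision = 0

    def state(self):
        return self.revision, self.text

    def local_edit(self, new_text):
        # An application-side change; any edit bumps the revision.
        self.text = new_text
        self.revision += 1

    def commit(self, txn):
        if txn.revision != self.revision:
            return False  # stale: the input method must re-read and retry
        text = self.text
        for edit in txn.edits:
            if isinstance(edit, Delete):
                text = text[:edit.start] + text[edit.end:]
            else:
                text = text[:edit.position] + edit.text + text[edit.position:]
        # Publish the result in one step, so a render never observes
        # the state between the delete and the insert.
        self.text = text
        self.revision += 1
        return True


def replace_last_word(field, new_word):
    """Builds a transaction from the current state and tries to commit it."""
    revision, text = field.state()
    start = text.rfind(" ") + 1
    edits = [Delete(start, len(text)), Insert(start, new_word)]
    return Transaction(revision, edits)


field = TextField("wonderful day")
stale = replace_last_word(field, "night")    # built against revision 0
field.local_edit("wonderful days")           # concurrent edit: revision 1
first_try = field.commit(stale)              # rejected, nothing is applied
second_try = field.commit(replace_last_word(field, "night"))
print(first_try, second_try, field.text)
```

A rejected transaction leaves the text untouched, so the worst outcome of a race is a retry rather than a corrupted edit.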

This approach certainly has some downsides. For instance, if the text field is receiving edits very frequently (faster than the FIDL latency plus input method processing time, which is on the order of tens of microseconds), then there's no guarantee an input method's edit will ever get in. In practice, however, edits are much less frequent than that. The alternative was to allow the input method to hold a lock, which is dangerous, latency-prone, and unworkable with multiple input methods. It would also be a difficult proposition for many text fields that don't have a way to queue application-side edits while a lock is held.

In the future, we could reduce the likelihood of revision mismatches by only rejecting a transaction if we know the input method has requested (since the previous revision) information that has changed. However, we determined this approach to be premature optimization for now. If needed, this can be implemented without changing the ABI or breaking existing input methods.