Banjo is an interface definition language (IDL) for expressing the interfaces between drivers. It is a derivative of FIDL, with a syntax forked in 2018. Although the syntax is similar, Banjo, unlike FIDL, was designed for synchronous in-process communication, and the resulting codegen amounts to a very barebones struct of function pointers associated with a context pointer.
A non-exhaustive list of problems with Banjo includes:
We aim to solve these problems by evolving Banjo into something better. The three key features of the new transport will be:
We are expecting to find a solution with the following characteristics:
We reserve the right to change our minds depending on the benchmark results of early prototypes. If we cannot outperform mechanisms provided by the kernel, we will need to try alternative designs. We also need to prove out our assumptions that the mechanisms provided by the kernel are insufficient for our needs.
We will try to track progress towards a new Banjo with the following milestones:
We will likely need to work with the FIDL team to abstract the LLCPP bindings away from Zircon channels and ports, so that we can repurpose the bindings mostly as-is on a new transport with minimal user-visible differences. We don't anticipate needing any changes to the frontend IDL, but changes to the FIDL IR may be necessary.
Additionally, migrating 300+ drivers will take a lot of effort and time, and will require various teams throughout the organization to be involved to ensure nothing breaks.
A major change like this has long-term implications for the performance characteristics of our system, because it induces additional overhead. Luckily, we have built some evolutionary support directly into our framework's architecture, which lets us move to another technology if the solution we build is unable to meet future needs. We can do this by implementing new component runners and having drivers target the new runner, which may have a different driver runtime. Switching every driver over to the new driver runner will likely be impractical, however, so we would end up needing to maintain both in parallel, which has costs of its own. As such, we really want to get this approach mostly right to avoid needing to take this course.
Switching drivers to a new threading model is also a large cost to pay, and may introduce new bugs along the way. Many drivers lack tests. For drivers that do have tests, unit tests may lose their validity after the switch and may have to be rewritten as part of the transition. We have written a great deal of our driver tests as integration tests, which should remain valid after migration without any changes. We will continue to invest in more integration tests and e2e tests prior to migration to prevent the introduction of new bugs.
Estimating the migration timeline is another large risk. It is hard to accurately estimate the cost without having built a replacement and trialed the migration on at least one driver. We will need to stay cognizant of the cost as we implement our design, and automate as much of the migration as possible.