Overnet provides a mesh network with a service discovery mechanism over which FIDL communications can be forwarded. This document serves as a rough architectural map and a glossary of terms that have special meaning in Overnet.
An Overnet mesh is a collection of nodes belonging to a single entity. Nodes are addressable units within a mesh, and are identified by an address known as a NodeId. In the current implementation of Overnet, NodeIds are randomly assigned at node instantiation time, but this may change as the authentication story becomes more complete.
Within the Overnet core library, nodes are created by instantiating a Router. Upon instantiation a node cannot communicate with other nodes in the mesh; to do so, a Link to some other node must be added.
Each node exports a list of services that may be connected to from other nodes.
Each node tracks a list of peer nodes that it can route to.
An Overnet Link provides connectivity between two nodes. No particular transport technology is mandated for a Link; indeed, it's expected that a single mesh will include multiple link technologies between nodes. To keep Link implementations simple, all a link needs to do is provide a mechanism to transport datagrams of up to 1400 bytes unreliably. Unreliably here means that some packets may be dropped, repeated, reordered, or corrupted, but the majority should be transmitted successfully.
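To make that contract concrete, the sketch below shows roughly what a link has to provide. This is not the actual Overnet API - the Link trait, the LoopbackLink type, and the MAX_LINK_FRAME_SIZE constant are invented for illustration; only the 1400 byte limit and the unreliability assumption come from the description above.

```rust
use std::collections::VecDeque;

/// Illustrative only: the real link abstraction lives in the Overnet core
/// library and differs in detail. The 1400-byte limit comes from the text.
const MAX_LINK_FRAME_SIZE: usize = 1400;

/// A link only needs to move datagrams unreliably: frames may be dropped,
/// repeated, reordered, or corrupted, but most should get through.
trait Link {
    fn send(&mut self, frame: &[u8]);
    fn recv(&mut self) -> Option<Vec<u8>>;
}

/// A trivial in-memory loopback link, useful only to show the shape of the
/// contract: everything sent is queued for receipt, nothing is dropped.
struct LoopbackLink {
    queue: VecDeque<Vec<u8>>,
}

impl Link for LoopbackLink {
    fn send(&mut self, frame: &[u8]) {
        assert!(frame.len() <= MAX_LINK_FRAME_SIZE);
        self.queue.push_back(frame.to_vec());
    }

    fn recv(&mut self) -> Option<Vec<u8>> {
        self.queue.pop_front()
    }
}

fn main() {
    let mut link = LoopbackLink { queue: VecDeque::new() };
    link.send(b"hello");
    assert_eq!(link.recv().as_deref(), Some(&b"hello"[..]));
}
```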
Links are typically implemented externally to the Overnet core library. This provides some convenient implementation firewalls, and allows Overnet embedders to compose different suites of protocols more easily, meaning not every technology needs to be ported simultaneously! Currently the following link transports are implemented:
Link frames come in two categories - messages and control. Additionally link frames can contain some lightweight additional payload.
Link frames carry a label that is not defined in FIDL: it cannot be, as FIDL negotiation cannot occur before the header is sent. Details of this serialization are contained in frame_label.rs. Essentially, we have a one byte bit field that records which fields are present, and the fields themselves are encoded as needed alongside it. One interesting point for this label is that it's encoded in reverse at the end of the frame. This has the advantage of simplifying the encoding pipeline: no bytes need to be injected at the beginning of the frame, which would necessitate either complex logic or a copy.
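As a rough illustration of the reversed-trailer idea (the real format is defined in frame_label.rs and carries more fields; the flag bits and field layout below are invented), the label can simply be appended after an already-serialized payload and parsed by walking backwards from the last byte:

```rust
/// Simplified sketch of the reversed-trailer idea; the real encoding lives
/// in frame_label.rs. Hypothetical presence bits for two optional fields.
const HAS_SOURCE: u8 = 1 << 0;
const HAS_DESTINATION: u8 = 1 << 1;

/// Append the label after the payload: optional fields first, presence byte
/// last. Nothing needs to be inserted before the payload, so the payload can
/// be serialized straight into the buffer before the label is known.
fn append_label(buf: &mut Vec<u8>, source: Option<u64>, destination: Option<u64>) {
    let mut flags = 0u8;
    if let Some(src) = source {
        buf.extend_from_slice(&src.to_le_bytes());
        flags |= HAS_SOURCE;
    }
    if let Some(dst) = destination {
        buf.extend_from_slice(&dst.to_le_bytes());
        flags |= HAS_DESTINATION;
    }
    buf.push(flags);
}

/// Walk backwards from the end of the frame: read the presence byte, then
/// peel off each optional field, returning the payload length and the fields.
fn split_label(frame: &[u8]) -> Option<(usize, Option<u64>, Option<u64>)> {
    let mut end = frame.len().checked_sub(1)?;
    let flags = frame[end];
    let take_u64 = |end: &mut usize| -> Option<u64> {
        *end = end.checked_sub(8)?;
        Some(u64::from_le_bytes(frame[*end..*end + 8].try_into().ok()?))
    };
    let destination = if flags & HAS_DESTINATION != 0 { Some(take_u64(&mut end)?) } else { None };
    let source = if flags & HAS_SOURCE != 0 { Some(take_u64(&mut end)?) } else { None };
    Some((end, source, destination))
}

fn main() {
    let mut frame = b"payload".to_vec();
    append_label(&mut frame, Some(42), None);
    let (payload_len, source, destination) = split_label(&frame).unwrap();
    assert_eq!(&frame[..payload_len], b"payload");
    assert_eq!((source, destination), (Some(42), None));
}
```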
Link control frames are not routable - they can only be sent to the link's direct peer node.
Control frame payloads are defined by LinkControlFrame. Control frames are sent using a simple protocol that requires an ack for each datagram sent before the next one can be sent - as such the control protocol is inappropriate for bulk data transfer.
Currently control frames are used to send an initial introduction frame, allowing link parameters to be set, and to exchange routing information.
Until the first control frame is acked, control frames transport the source node id. This is how the peer node id for a link is discovered. After the ack, control frames omit sending the source node id to save some space.
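As an illustration of the ack-per-frame behaviour (a sketch only - the type and method names are invented, and the real logic lives in the link code), a sender keeps at most one control frame in flight:

```rust
use std::collections::VecDeque;

/// Hypothetical stop-and-wait sender for link control frames. The wire
/// payloads are defined by LinkControlFrame; plain byte vectors stand in
/// for them here.
struct ControlSender {
    /// Frames waiting to be sent; only the front may be in flight.
    pending: VecDeque<Vec<u8>>,
    /// True while the front frame has been sent but not yet acked.
    awaiting_ack: bool,
}

impl ControlSender {
    fn new() -> Self {
        Self { pending: VecDeque::new(), awaiting_ack: false }
    }

    fn queue(&mut self, frame: Vec<u8>) {
        self.pending.push_back(frame);
    }

    /// Return the next frame to put on the wire, if we're allowed to send.
    fn poll_send(&mut self) -> Option<&[u8]> {
        if self.awaiting_ack || self.pending.is_empty() {
            return None; // one frame in flight at a time
        }
        self.awaiting_ack = true;
        self.pending.front().map(|f| f.as_slice())
    }

    /// Called when the peer acknowledges the in-flight frame.
    fn on_ack(&mut self) {
        if self.awaiting_ack {
            self.pending.pop_front();
            self.awaiting_ack = false;
        }
    }
}

fn main() {
    let mut sender = ControlSender::new();
    sender.queue(b"introduction".to_vec());
    sender.queue(b"route update".to_vec());
    assert!(sender.poll_send().is_some()); // first frame goes out
    assert!(sender.poll_send().is_none()); // blocked until acked
    sender.on_ack();
    assert!(sender.poll_send().is_some()); // now the second may go
}
```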
Link message frames include source and destination node ids. If the source node id is the sender of a message, or the destination node id is the peer of a link, that node id can be elided in the protocol. This provides some substantial bandwidth savings in common cases.
Link frames can be annotated with some additional side band information.
The PING/PONG mechanism allows link latency to be measured, which provides a good metric for route selection.
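A minimal sketch of that measurement, assuming pings carry an id that the pong echoes back (the real bookkeeping in the link code is more careful than this; all names here are invented):

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Illustrative sketch: tag each PING with an id, remember when it was sent,
/// and compute the round trip time when the matching PONG arrives.
struct PingTracker {
    next_id: u64,
    outstanding: HashMap<u64, Instant>,
}

impl PingTracker {
    fn new() -> Self {
        Self { next_id: 0, outstanding: HashMap::new() }
    }

    /// Record that a PING is being sent now, returning its id.
    fn send_ping(&mut self) -> u64 {
        let id = self.next_id;
        self.next_id += 1;
        self.outstanding.insert(id, Instant::now());
        id
    }

    /// A PONG echoing `id` arrived; the elapsed time is one RTT sample,
    /// which feeds route selection as a per-link metric.
    fn recv_pong(&mut self, id: u64) -> Option<Duration> {
        self.outstanding.remove(&id).map(|sent| sent.elapsed())
    }
}

fn main() {
    let mut tracker = PingTracker::new();
    let id = tracker.send_ping();
    // ... PING goes out on the link, PONG comes back ...
    let rtt = tracker.recv_pong(id).unwrap();
    println!("measured link RTT: {:?}", rtt);
}
```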
Each node in the mesh constructs a Peer object to every other node. This Peer object consists of a client QUIC connection. Consequently, each node in the mesh also carries at least one server QUIC connection to every other node. More server QUIC connections may exist briefly due to stale connections. Note that this arrangement means that each node typically has two Peer objects and associated QUIC connections for every other node on the mesh - one client and one server.
The QUIC connection is arranged such that:
Peer QUIC streams are segmented into datagrams. These datagrams are typed into several categories - Hello, Data, Control, and Signal. The categories mean something different for the control channel and other channels.
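The sketch below illustrates the segmentation idea only: each datagram carries a type tag and its length so the QUIC byte stream can be cut back into typed datagrams. The tag-plus-length layout here is an assumption made for the example, not Overnet's actual framing, and the real datagram types are defined in FIDL.

```rust
/// The four datagram categories named above. The one tag byte plus 4-byte
/// little-endian length prefix shown here is an illustrative assumption.
#[derive(Clone, Copy, Debug, PartialEq)]
enum FrameType {
    Hello = 0,
    Data = 1,
    Control = 2,
    Signal = 3,
}

fn encode_frame(out: &mut Vec<u8>, frame_type: FrameType, payload: &[u8]) {
    out.push(frame_type as u8);
    out.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    out.extend_from_slice(payload);
}

/// Try to cut one frame off the front of `buf`, returning the type, the
/// payload, and how many bytes were consumed.
fn decode_frame(buf: &[u8]) -> Option<(FrameType, &[u8], usize)> {
    if buf.len() < 5 {
        return None; // need the tag byte and the length prefix
    }
    let frame_type = match buf[0] {
        0 => FrameType::Hello,
        1 => FrameType::Data,
        2 => FrameType::Control,
        3 => FrameType::Signal,
        _ => return None,
    };
    let len = u32::from_le_bytes(buf[1..5].try_into().unwrap()) as usize;
    let end = 5usize.checked_add(len)?;
    if buf.len() < end {
        return None; // wait for more bytes from the QUIC stream
    }
    Some((frame_type, &buf[5..end], end))
}

fn main() {
    let mut wire = Vec::new();
    encode_frame(&mut wire, FrameType::Data, b"hello overnet");
    let (frame_type, payload, consumed) = decode_frame(&wire).unwrap();
    assert_eq!(frame_type, FrameType::Data);
    assert_eq!(payload, b"hello overnet");
    assert_eq!(consumed, wire.len());
}
```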
The control channel gets used for non-proxying miscellany:

- update_node_description
- connect_to_service

When the control channel comes up, the first thing each side sends is a header containing FIDL flags and the current FIDL magic number. This exchange sets parameters for all FIDL messages used by Overnet for the remainder of the connection, over all streams. This header is not encapsulated in the normal datagram framing.
After the FIDL header, the framing protocol begins. All messages on the control channel are of type Data; other types are disallowed.
The next message is a ConfigRequest (from the client), followed by a ConfigResponse from the server. These are currently empty tables and are intended to provide a compatibility handshake for future Overnet protocol expansion.
The remainder of the messages exchanged are of type PeerMessage from the client, and PeerReply from the server.
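The ordering of that handshake can be summarized as a small state machine. This is purely illustrative - the state names below are invented, and only the sequencing comes from the description above:

```rust
/// Illustrative state machine for one side of the control channel handshake.
#[derive(Debug, PartialEq)]
enum ControlChannelState {
    /// Send the FIDL header (flags + magic number), outside datagram framing.
    SendFidlHeader,
    /// Exchange ConfigRequest (client) and ConfigResponse (server).
    ExchangeConfig,
    /// Steady state: exchange PeerMessage / PeerReply as Data datagrams.
    Established,
}

fn advance(state: ControlChannelState) -> ControlChannelState {
    match state {
        ControlChannelState::SendFidlHeader => ControlChannelState::ExchangeConfig,
        ControlChannelState::ExchangeConfig => ControlChannelState::Established,
        ControlChannelState::Established => ControlChannelState::Established,
    }
}

fn main() {
    let mut state = ControlChannelState::SendFidlHeader;
    state = advance(state); // FIDL header sent
    state = advance(state); // ConfigRequest/ConfigResponse exchanged
    assert_eq!(state, ControlChannelState::Established);
}
```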
Proxied objects are carried over peer QUIC datagrams. Logic is implemented to adapt Zircon channels, sockets, and event pairs into peer QUIC streams.
To create a stream, first a QUIC stream ID is created. Then, over a pre-existing stream, a message is sent that binds some object to that stream. For Zircon handles, the message should contain a ZirconHandle object from the zircon_proxy.fidl protocol. Currently the two ways of doing this are via a ZirconChannelMessage on a channel based stream, or via a ConnectToService request on the control channel.
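A sketch of the "allocate a stream id, then bind an object to it with a message on an existing stream" pattern: all of the types below, the stream id allocation policy, and the assumption that the control channel is stream 0 are invented for illustration; the real messages are the FIDL types named above.

```rust
/// Hypothetical bind messages; the real ones are ZirconHandle /
/// ZirconChannelMessage / ConnectToService from the Overnet FIDL files.
#[derive(Debug)]
enum BindMessage {
    /// Bind a proxied channel endpoint to the given (new) stream id.
    Channel { stream_id: u64 },
    /// Ask the control channel to connect a named service on a new stream.
    ConnectToService { service_name: String, stream_id: u64 },
}

struct PeerConnection {
    /// Hypothetical allocation policy: client-initiated bidirectional QUIC
    /// stream ids go up by 4. The real policy is the QUIC library's.
    next_stream_id: u64,
}

impl PeerConnection {
    fn allocate_stream_id(&mut self) -> u64 {
        let id = self.next_stream_id;
        self.next_stream_id += 4;
        id
    }

    /// "Send" a bind message over a pre-existing stream (here just printed).
    fn send_on_stream(&mut self, existing_stream_id: u64, message: BindMessage) {
        println!("stream {existing_stream_id}: {message:?}");
    }
}

fn main() {
    let mut peer = PeerConnection { next_stream_id: 4 };

    // Create a new stream and bind a service connection to it via the
    // control channel (assumed here to be stream 0).
    let new_stream = peer.allocate_stream_id();
    peer.send_on_stream(0, BindMessage::ConnectToService {
        service_name: "fuchsia.examples.Echo".to_string(),
        stream_id: new_stream,
    });

    // Or: bind a transferred channel endpoint via a message on an existing
    // channel-based stream (here, a hypothetical stream 8).
    let another_stream = peer.allocate_stream_id();
    peer.send_on_stream(8, BindMessage::Channel { stream_id: another_stream });
}
```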
The kind of handle bound determines the usage of the Data datagram frame - a channel will transfer ZirconChannelMessage instances encoded via FIDL, a socket will exchange bytes. Datagram oriented sockets will match Overnet Data frames 1:1 with datagrams from the socket. Stream oriented sockets will ignore the framing and concatenate the bytes from the stream together. Note that for channels the ZirconChannelMessage is encoded with FIDL options negotiated at peer connection time, whilst the payload it carries in the bytes field will contain a FIDL message encoded with potentially different options by the application layer.
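The difference between the two socket flavours can be shown with a small sketch (the types are invented; only the 1:1 versus concatenation behaviour comes from the text):

```rust
/// Illustrative sketch of how incoming Data frames map back onto a proxied
/// socket: one frame becomes one datagram for datagram sockets, while stream
/// sockets just append the bytes.
enum ProxiedSocket {
    Datagram { datagrams: Vec<Vec<u8>> },
    Stream { bytes: Vec<u8> },
}

impl ProxiedSocket {
    /// Deliver the payload of one incoming Data frame to the local socket.
    fn deliver_data_frame(&mut self, payload: &[u8]) {
        match self {
            // Datagram sockets: frame boundaries are preserved 1:1.
            ProxiedSocket::Datagram { datagrams } => datagrams.push(payload.to_vec()),
            // Stream sockets: framing is ignored, bytes are concatenated.
            ProxiedSocket::Stream { bytes } => bytes.extend_from_slice(payload),
        }
    }
}

fn main() {
    let mut dgram = ProxiedSocket::Datagram { datagrams: Vec::new() };
    let mut stream = ProxiedSocket::Stream { bytes: Vec::new() };
    for frame in [&b"ab"[..], &b"cd"[..]] {
        dgram.deliver_data_frame(frame);
        stream.deliver_data_frame(frame);
    }
    if let ProxiedSocket::Datagram { datagrams } = &dgram {
        assert_eq!(datagrams.len(), 2); // two frames, two datagrams
    }
    if let ProxiedSocket::Stream { bytes } = &stream {
        assert_eq!(&bytes[..], b"abcd"); // framing collapsed into a byte stream
    }
}
```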
All proxyable Zircon objects, and consequently all currently supported Overnet stream types, support sending signals. These signals are transported via the Signal datagram frame type, using SignalUpdate messages.
Proxy stream endpoints can be transferred between nodes. The primary proxy stream is always bidirectional. The first frame sent is a marker Hello frame, from the initiator of the stream to the first endpoint.
Control frames transmit StreamControl messages, and are used to facilitate transfers.
A transfer proceeds roughly as follows:

1. When a node A wants to transfer control of a stream endpoint to another node C, it begins by sending a begin_transfer message. This message captures the destination node, and a transfer_key used to label the transfer operation.
2. A drain stream is opened from A to C, and any messages received from the peer node B are forwarded on this stream (since there is a time period where B does not know that a transfer is in progress).
3. At B, upon receipt of a begin_transfer message, the transfer is acknowledged with an ack_transfer reply. The only further message allowed to be received is a required shutdown message with status OK. If the transfer is to the receiving node, then the two handles (the two ends of the proxied stream) are "rejoined" and the transfer is completed by Overnet ceasing to proxy anything. If the transfer is to another node, then an open_transfer request is sent on the peer control channel B->C, and normal proxying resumes. The transfer_key is sent along with this request so that the transfer can be located later.
4. A, upon receipt of the ack_transfer, will finally construct a StreamRef indicating that it was a transfer_initiator, and so sending the drain stream id, the new destination node, and the transfer_key. With this information the receiving peer can tie together the transfer and reconstruct an ordered set of messages.

Note that at step 3 in the algorithm above, B may have decided to also transfer control to C or a different node D. In that case, each peer will receive a begin_transfer message where they expected to receive an ack_transfer. We need to behave differently on each side to successfully complete this 4-way transfer, so we arbitrarily label the QUIC Client the transfer_initiator and the QUIC Server the transfer_awaiter. (There's nothing important about Client/Server here; it could equally have been higher/lower numbered node id or any other deterministic decision that can be agreed upon by each peer.)

A and B now immediately agree that the transfer_awaiter assigns the overall transfer_key, and the transfer_initiator's transfer_key is dropped. At this point the StreamRef can be formed and both sides can continue the transfer.
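The tie-break can be expressed compactly. In the sketch below only the rule itself - the transfer_awaiter's key wins and the transfer_initiator's key is dropped - comes from the description above; the TransferKey alias and function shape are invented:

```rust
/// Invented stand-in for the 16-byte transfer key used to label transfers.
type TransferKey = [u8; 16];

#[derive(Clone, Copy, PartialEq, Debug)]
enum QuicRole {
    Client, // acts as transfer_initiator in the tie-break
    Server, // acts as transfer_awaiter in the tie-break
}

/// When both ends of a stream start a transfer simultaneously, pick the key
/// the transfer will be tracked under: the transfer_awaiter's key always
/// wins, and the transfer_initiator's key is dropped.
fn resolve_transfer_key(
    our_role: QuicRole,
    our_key: TransferKey,
    their_key: TransferKey,
) -> TransferKey {
    match our_role {
        QuicRole::Server => our_key,   // we are the awaiter: keep ours
        QuicRole::Client => their_key, // we are the initiator: drop ours
    }
}

fn main() {
    let client_key = [1u8; 16];
    let server_key = [2u8; 16];
    // Both sides arrive at the same answer without further negotiation.
    let chosen_on_client = resolve_transfer_key(QuicRole::Client, client_key, server_key);
    let chosen_on_server = resolve_transfer_key(QuicRole::Server, server_key, client_key);
    assert_eq!(chosen_on_client, chosen_on_server);
}
```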
It's expected that there are few nodes on a given Overnet mesh, which makes it practical to employ a total knowledge routing algorithm.
Over each link, a node A communicates to its peer node B the set of nodes that B can reach by sending a packet to A, along with some metrics for each route. Note that this is the list of nodes that A can reach, absent the list of nodes that A would choose to reach via B. Communication is via the link control protocol. The mesh wide routing tables are eventually consistent, and route loops are guaranteed to be ephemeral.
Each Node maintains a list of other nodes in the system, along with a preferred route to reach that node. When this routing table changes, updates are sent to other links.
When a link receives a packet destined for a node other than the receiving node, the routing table is consulted to determine the next step in the chain.
Routes are computed to produce forwarding tables in routes.rs. Communication of routing tables (both incoming and outgoing) lives in the Link code. Finally, local link RTTs are assembled and fed into the route planner in link_status_updater.rs.
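As a sketch of what total knowledge routing amounts to (the types, field names, and the use of raw RTT as the sole metric are illustrative assumptions; the real planner in routes.rs considers more), each node simply keeps the best link per destination:

```rust
use std::collections::HashMap;

/// Illustrative sketch: every node knows a metric (here just RTT in
/// milliseconds) for reaching each destination via each of its links, and the
/// forwarding table keeps the best link per destination.
type NodeId = u64;
type LinkId = u64;

/// (destination, via-link) -> advertised round trip time in milliseconds.
type RouteMetrics = HashMap<(NodeId, LinkId), u64>;

/// Build a forwarding table: for each destination, pick the link with the
/// lowest metric.
fn build_forwarding_table(metrics: &RouteMetrics) -> HashMap<NodeId, LinkId> {
    let mut best: HashMap<NodeId, (LinkId, u64)> = HashMap::new();
    for (&(destination, link), &rtt_ms) in metrics {
        match best.get(&destination) {
            Some(&(_, best_rtt)) if best_rtt <= rtt_ms => {}
            _ => {
                best.insert(destination, (link, rtt_ms));
            }
        }
    }
    best.into_iter().map(|(dest, (link, _))| (dest, link)).collect()
}

fn main() {
    let mut metrics = RouteMetrics::new();
    metrics.insert((7, 1), 30); // node 7 reachable via link 1 in ~30ms
    metrics.insert((7, 2), 12); // ... and via link 2 in ~12ms
    metrics.insert((9, 1), 5);
    let table = build_forwarding_table(&metrics);
    assert_eq!(table[&7], 2); // prefer the lower-latency link
    assert_eq!(table[&9], 1);
}
```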
Overnet on Fuchsia is provided by the overnetstack component. Applications can use Overnet by accessing the capabilities overnetstack exports. It's expected that there be one overnetstack instance per device, or per user identity on a device, and that this instance serve many other Fuchsia components.
Since non-Fuchsia operating systems do not provide Zircon channels or sockets, an emulation is provided for them. That emulation is in-process only, and to provide inter-process exchange of these objects, we leverage Overnet. As such, each binary on the host embeds Overnet with a single link protocol to connect to ascendd. Ascendd is a binary that provides a unix domain socket server for other binaries to connect to. It also provides link protocols to reach Overnet instances off device. Ascendd is itself embeddable - ffx does this to co-host ascendd and the ffx daemon, for instance.
QUIC is used in two places in the Overnet stack.
It is primarily used, and always has been, for communications between peers. In the original (now deleted, C++) Overnet prototype, a custom packet based protocol was engineered. At the time QUIC was bound tightly to HTTP semantics, which Overnet preferred not to share. During prototyping a necessary feature set emerged: Overnet needed to transport large datagrams reliably and in order, so that Zircon objects could be proxied. As QUIC and HTTP/3 evolved, the QUIC standard shed many of the HTTP-isms that were inappropriate, and since the reliable byte streams it provided could easily be turned into datagram streams by adding framing, it had exactly the feature set that was needed for the peer protocol. As a bonus, there was a well worked out TLS implementation in place, and third party libraries implementing the protocol, so Overnet did not need to reinvent those either.
Later, when it was time to implement a secured UDP protocol, it was noticed that QUIC had added datagram support. Since we already had working abstractions around QUIC, it was natural to leverage code we were already linking against to implement the link protocol too.