This document serves as an overview of the internal workings of the FIDL compiler fidlc
. fidlc
is a command line tool that takes in a number of FIDL files, and outputs FIDL JSON IR.
The main inputs to the compiler are the arguments to the --files
flags, which describe a list of files grouped by library. The compiler parses each file individually to obtain a parse tree for each file:
SourceFile
, which owns the buffer backing the fileLexer
, which takes the SourceFile
as an argument to its constructor. This class exposes a Lex()
method, which returns the next Token
in the file; it can be called repeatedly to get the sequence of Token
s in the file.Lexer
to initialize a Parser
and then calls the Parse()
method which constructs a parse tree. The code refers to parse trees as a raw AST. This function returns a raw::File
which is the class representing the root node of the raw AST.flat::Library
.raw::File
parse tree corresponding to that library, converting raw::
nodes to their flat::
counterparts. For example, a raw::StructDeclaration
becomes a flat::Struct
, and a raw::Constant
becomes a flat::Constant
. The converted flat::
AST nodes are then stored under a single flat::Library
. Initially these flat AST nodes contain the same information as the raw AST nodes, but they also contain fields such as a value
field for values and a typeshape
field for types which are later set during the compilation step.Note: One of the key distinction between raw and flat ASTs is that each raw tree represents a single file, whereas each flat tree represents a single FIDL library which may contain multiple files.
Finally, we end up with a flat AST that is processed and ready for backend generation either to C bindings or to a JSON IR.
The Lexer
works primarily by keeping track of two char *
pointer into the file data. The current_
pointer marks the current location that the class is at. Thetoken_start_
pointer marks the start of the lexeme that is currently being worked on. Each time the Lex()
method is called, the current_
pointer is advanced until a complete lexeme has been traversed. Then, a Token
is constructed using the data in between the two pointers.
The lexer also keeps track of a third previous_end_
pointer so it can get the data between lexemes (generally whitespace) when it constructs the Token
. This example shows of how the pointers change during a call to Lex()
on the short FIDL snippet const bool flag = true;
:
Initial state after lexing the const
keyword:
const bool flag = true; ^current_ ^token_start_ ^previous_end_
Whitespaces are skipped until the next lexeme:
const bool flag = true; ^current_ ^token_start_ ^previous_end_
The current_
pointer is updated until the end of the lexeme:
const bool flag = true; ^current_ ^token_start_ ^previous_end_
At this point, the next Token
that gets returned is ready to be constructed. A Token
is created with its previous_end_
argument set to the data between previous_end_
and token_start_
. The location_
is set to the data between token_start_
and current_
. The kind is set to Identifier
. Before returning, the pointers are reset and end up in a state similar to the initial state. This process can then be repeated for the next token:
const bool flag = true; ^current_ ^token_start_ ^previous_end_
Internally the two pointers are manipulated with these main methods:
Skip()
. Skips over any unnecessary characters (e.g. whitespace) by moving both the current_
and the token_start_
pointers forward.Consume()
. Returns the current character and advances current_
.Reset()
. Returns the data between token_start_
and current_
. Then, sets token_start_
to the value of current_
.The Parser
's goal is to convert a Token
stream into a parse tree (the raw::
AST) with the Parse()
method. The Token
stream is generated by calling Lex
repeatedly on a FIDL file. This parser is implemented with recursive descent. Each node of the raw AST has a corresponding ParseFoo()
method that consumes Token
s from the Lexer
and returns a unique_ptr
to an instance of that node. In case there is a failure, a nullptr
is returned.
Note: A node of the raw AST is a start and end token pair with pointers to any children. The start and end tokens of each child are contained within the start and end tokens of its parent. The root node, which is a raw::File
, has a start token equal to the first token of the file and an end token equal to the last token of the file.
The Parser
keeps track of the:
SourceElements
, which are stored in active_ast_scopes_
.Token
s that are being processed. last_token_
and previous_token_
, respectively.Parser::Lex
method. The current token (last_token_
) is always the next token that is about to be consumed, which effectively gives the parser a one token lookahead (i.e. LL(1)).The Parser
determines what kind of node the current Token
belongs to based on the Token::Kind
of last_token_
, using the Peek()
method. Then, the Parser
updates its state and constructs the nodes through the use of the ASTScope
class as well as the ConsumeToken
and MaybeConsumeToken
helper methods.
This example shows a simple non-recursive case line by line. The parser method looks like this:
std::unique_ptr<raw::StringLiteral> Parser::ParseStringLiteral() { ASTScope scope(this); ConsumeToken(OfKind(Token::Kind::kStringLiteral)); if (!Ok()) return Fail(); return std::make_unique<raw::StringLiteral>(scope.GetSourceElement()); }
It consumes a single token and returns a leaf node, raw::StringLiteral
:
class StringLiteral : public SourceElement { public: explicit StringLiteral(SourceElement const& element) : SourceElement(element) {} virtual ~StringLiteral() {} }
The method starts by creating a new ASTScope
which initializes the SourceElement
that will later be used to create the returned node by pushing it onto active_ast_scopes_
. The start of the SourceElement
gets set to the first token that is consumed and the end gets set in the call to GetSourceElement()
before returning.
The new SourceElement
for the StringLiteral
gets initialized inside the ASTScope
constructor:
parser_->active_ast_scopes_.push_back(raw::SourceElement(Token(), Token()));
Then, ConsumeToken
is called to process the next token. This method takes a predicate constructed with OfKind()
and calls that predicate on last_token_
's kind or subkind. OfKind()
returns a function that checks that its input matches the given kind or subkind. If the predicate fails (in this case, if the current token is not a string literal), the error gets stored in the class and the method returns. The Ok()
call catches the error and stops the compiler with a parsing error. If the predicate succeeds, the start token of any SourceElement
s in the stack that are uninitialized are set to the current token:
for (auto& scope : active_ast_scopes_) { if (scope.start_.kind() == Token::Kind::kNotAToken) { scope.start_ = token; } }
In this example, the start token is set to the top element of the stack since it was initialized at the start of the method. The Parser
then advances to the next token by setting previous_token_ = last_token_
and last_token_ = Lex()
.
Finally, the resulting StringLiteral
node is returned using scope.GetSourceElement()
. This sets the end token of the SourceElement
at the top of the stack to the previous_token_
, and then returns the SourceElement
. The final node has the same start and end token because StringLiteral
s are only ever a single token long, but other methods may consume many tokens before calling GetSourceElement()
. When the method returns, the destructor of ASTScope
pops the top element off of active_ast_scopes_
.
At this point, the compiler has successfully constructed a raw::File
for each input file and proceeds to compile these raw ASTs into flat ASTs in three steps:
raw::File
per file into one flat::Library
per library.flat::
node itself.Note: Within the code, the term “compilation” may refer to these three steps as a whole (for example, all three steps are performed within the Library::Compile
method), or it could refer specifically to the “resolution” step (for example, the logic for performing resolution is contained in the Library::CompileDecl
method ). In general this document tries to use “resolution” when referring to the third step to disambiguate between the two possible meanings.
After each file is parsed into a raw::File
and an empty AST (flat::Library
) has been initialized for each library, the AST needs to be updated with all of data from the raw::File
s that correspond to it. This is done recursively with the ConsumeFoo()
methods. Each ConsumeFoo()
method generally takes the corresponding raw AST node as input, updates the Library
class, and then return a bool
to indicate success or failure. These methods are responsible for:
Transport
attribute is only specified for protocols.textures.Foo
will error if the textures
library was not imported)Library
's foo_declarations_
attribute. Initially the values of the flat AST nodes are unset but they are calculated later during compilation.declarations_
vector. Const declarations and enum/bit fields (which declare a value) are also added to the constants_
vector, whereas all other declarations (which declare a type) get their corresponding type template added to the library's typespace.Once all of the declarations for a given Library
have been added to the declarations_
vector, the compiler can proceed to resolve each individual declaration. However, it must do this in the correct order (so that any dependencies of a declaration are resolved before it); this is accomplished by first sorting the declarations into a separate declarations_order_
vector, and then iterating through it to compile each declaration. The sorting is done in the SortDeclarations()
method, and makes use of DeclDependencies()
to determine the dependencies for a given declaration.
Given the sorted declarations, the compilation happens through the CompileFoo
methods, generally corresponding to the AST nodes (e.g. CompileStruct
, CompileConst
), with CompileDecl
as the entrypoint. The main purpose of CompileDecl
is for:
TypeDecl
s to determine their TypeShape
Const
s to resolve their value, andTypeConstructor
s to set their Type
.Once this step is complete, the flat::Library
contains all the necessary information for any code generation. The FIDL compiler can directly generate C bindings, or can generate a JSON IR that can be consumed by a separate backend
Before marking compilation as successful, the FIDL compiler also does a few additional checks: for example, it will check that any constraints on attributes are satisfied, and that all declared library dependencies were actually used.
The FIDL compiler emits a JSON intermediate representation (IR). The JSON IR is consumed by a separate program, named the back-end, that generates the language bindings from the JSON IR.
The officially supported FIDL language back-ends are:
For historical reasons, C bindings are directly generated from the FIDL compiler - the C bindings do not support all features and types that are used with the C bindings need to be annotated with the Layout = "Simple"
attribute.
fidlc
is also responsible for generating “coding tables”, which are instances of fidl_type_t
used to represent the FIDL messages at runtime and are used by all of the bindings for the C family of languages (C, LLCPP, HLCPP). To do this, the flat AST is converted to an intermediate representation called the “coded” AST with the CodedTypesGenerator
, which iterates through the flat::Decl
s in a given flat::Library
and converts each raw::Decl
node to the corresponding coded::Type
node (e.g. flat::Struct
becomes a coded::StructType
).
The coding tables are then generated by the TablesGenerator
, which will generate C code for each coded::Type
that constructs the equivalent fidl_type_t
type. For example, for a coded::StructType
called MyStruct
, the TablesGenerator
would write out C code that constructs an equivalent fidl::FidlCodedStruct
, something like:
const fidl_type_t MyStruct = fidl_type_t(::fidl::FidlCodedStruct( my_struct_fields, 1u, 32u, "mylibrary/MyStruct"));
Coding tables for FIDL libraries in //sdk/fidl
are generated into the out directory under fidling/gen/sdk/fidl/LIBRARY/LIBRARY.fidl.tables.c
. For example out/default/fidling/gen/sdk/fuchsia.sys/fuchsia.sys.fidl.tables.c
. The README gives additional context. The fidl_type_t
definitions (such as for FidlCodedStruct
) can be found in internal.h.
The Decl
is the base of all flat AST nodes, just like SourceElement
is the base of all parser tree nodes, and corresponds to all possible declarations that a user can make in a FIDL file. There are two types of Decl
s:
Const
s, which declare a value, and have a value
attribute that gets resolved during compilation, andTypeDecl
s, which declare a message type or interface and have a typeshape
attribute that gets set during compilation.TypeDecl
s representing an aggregate type (e.g. structs, tables, and unions) also have a static Shape()
method which contains the logic for determining the Typeshape
of that given type.
A FieldShape
describes the offset and padding information for members of an aggregate type like a struct or union. Generally these fields will require both a Typeshape
as well as a FieldShape
A Name
represents a scope variable name, and consists of the library the name belongs to (or nullptr
for global names), and the variable name itself as a string (for anonymous names) or a SourceLocation
. Name
s can also optionally include a member_name_
field when referring to the field of a declared variable (for example MyEnum.MyField
).
A SourceElement
represents a block of code inside a fidl file and is parameterized by a start_
and end_
Token
. All parser tree (“raw” AST) nodes inherit from this class.
Wrapper around a file which is responsible for owning the data in that file. Also see Virtual SourceFile
Wrapper around a StringView
and the SourceFile
it comes from. It provides methods to get the surrounding line of the StringView
as well as its location in the form of a "[filename]:[line]:[col]"
string
Wrapper around a set of SourceFile
s that all relate to a single Library
.
A token is essentially a lexeme (in the form of a SourceLocation
stored as the location_
attribute), enhanced with two other pieces information that are useful to the parser during the later stages of compilation:
previous_end_
. A SourceLocation
which starts at the end of the previous token and ends at the start of this token. It contains data that is uninteresting to the parser such as whitespace.Kind::LeftParen
, Kind::Dot
, Kind::Comma
, etc...const
, struct
) are considered identifiers, but also have a subkind defined to identify which keyword it is (e.g. Subkind::Const
, Subkind::Struct
). All other tokens have a subkind of None
.kNotAToken
.A struct representing an instance of a type. For example, the vector<int32>:10?
type corresponds to an instance of the VectorType
TypeDecl
with max_size_ = 10
and maybe_arg_type
set to the Type
corresponding to int32
. Built-in types all have a static Shape()
method which will return the Typeshape
given the parameters for an instance of that type. User defined types (e.g. structs or unions) will all have a type of IdentifierType
- the corresponding TypeDecl
, like Struct
provides the static Shape()
method instead.
See Decl
Information about how objects of a type should be laid out in memory, including their size, alignment, depth, etc.
The typespace is a map from Type
names to a TypeTemplate
for that Type
. During compilation, the typespace is initialized to include all of the built-in types (e.g. "vector"
maps to VectorTypeTemplate
), whereas user-defined types get added during the compilation process. This also ensures that a single type template instance exists per type and avoids name collisions/shadowing of types (e.g. fxbug.dev/7646).
Instances of TypeTemplates provide a Create()
method to create a new instance of a specific Type
- therefore there is a TypeTemplate subclass for each built-in FIDL type (e.g. ArrayTypeTemplate
, PrimitiveTypeTemplate
, etc.) as well as a single class for all user defined types (TypeDeclTypeTemplate
), and one for type aliases (TypeAliasTypeTemplate
). Create()
takes as parameters the possible type parameters: argument type, nullability, and size.
For example, to create an object representing the type vector<int32>:10?
the compiler would call the Create()
method of the VectorTypeTemplate
with an argument type of int32
, max size of 10
, and a nullability of types::Nullability::kNullable
. This call returns an instance of a VectorType
with those parameters. Note that not all 3 of these parameters apply to all of the types (e.g. PrimitiveType
s, like int32
have none of the 3). The Create()
method of the type template for each type automatically checks that only the relevant parameters are passed.
The concrete type for user defined types is the IdentifierType
, which gets generated by the TypeDeclTypeTemplate
.
A subclass of SourceFile
that has a fake “filename” and is initialized with no backing data. It exposes an AddLine()
method to add data to the file, and is used as a backing SourceFile
for content that does not appear directly in any of the input SourceFile
s, like for generated anonymous Name
s.