# FIDL Compiler Frontend

This document serves as an overview of the internal workings of the FIDL compiler `fidlc`.
`fidlc` is a command line tool that takes in a number of FIDL files and outputs
[FIDL JSON IR][schema].

## Overview

The main inputs to the compiler are the arguments to the `--files` flags, which describe a list of
files grouped by library. The compiler parses each file individually to obtain a parse tree for
each file:

1. Each file is loaded into a [`SourceFile`](#sourcefile), which owns the buffer backing the file.
2. The compiler initializes a [`Lexer`](#lexing), which takes the `SourceFile` as an argument to
   its constructor. This class exposes a `Lex()` method, which returns the next [`Token`](#token)
   in the file; it can be called repeatedly to get the sequence of `Token`s in the file.
3. The compiler uses the `Lexer` to initialize a [`Parser`](#parsing) and then calls the `Parse()`
   method, which constructs a parse tree. The code refers to parse trees as raw ASTs. This function
   returns a `raw::File`, which is the class representing the root node of the raw AST.
4. Once the compiler has parsed each file into a parse tree, it then groups the parse trees into a
   single AST (referred to in the code as a flat AST) for each library. The root of this flat AST
   is a `flat::Library`.
    * For each library, the compiler traverses every `raw::File` parse tree corresponding to that
      library, converting `raw::` nodes to their `flat::` counterparts. For example, a
      `raw::StructDeclaration` becomes a `flat::Struct`, and a `raw::Constant` becomes a
      `flat::Constant`. The converted `flat::` AST nodes are then stored under a single
      `flat::Library`. Initially these flat AST nodes contain the same information as the raw AST
      nodes, but they also contain fields, such as a `value` field for values and a `typeshape`
      field for types, that are set later during the compilation step.
5. Once the AST has been fully initialized, the compiler evaluates constants and determines the
   memory alignment and size information for the declared types.

Note: One of the key distinctions between raw and flat ASTs is that each raw tree represents a
single file, whereas each flat tree represents a single FIDL library that may contain multiple
files.

Finally, we end up with a flat AST that is processed and ready for backend generation, either to C
bindings or to a JSON IR.
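
In code, the per-library flow described above looks roughly like the following sketch. It is
illustrative only: `files_for_this_library`, `ReadWholeFile`, and `error_reporter` are placeholder
names, and the real constructor signatures and driver logic in `fidlc` differ in their details
(error reporting, library dependencies, attribute handling, and so on).

```c++
// Rough, non-compilable sketch of the front-end flow for one library.
fidl::flat::Library library(/* ... real constructor takes more context ... */);
for (const auto& filename : files_for_this_library) {
  std::string contents = ReadWholeFile(filename);            // hypothetical helper
  fidl::SourceFile source_file(filename, std::move(contents));
  fidl::Lexer lexer(source_file, error_reporter);             // Lex() yields Tokens
  fidl::Parser parser(&lexer, error_reporter);
  std::unique_ptr<fidl::raw::File> raw_ast = parser.Parse();  // one raw AST per file
  library.ConsumeFile(std::move(raw_ast));                    // merge into the flat AST
}
library.Compile();  // topological sort + resolution over the whole library
```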

## Lexing {#lexing}

The `Lexer` works primarily by keeping track of two `char*` pointers into the file data. The
`current_` pointer marks the current location that the class is at. The `token_start_` pointer
marks the start of the lexeme that is currently being worked on. Each time the `Lex()` method is
called, the `current_` pointer is advanced until a complete lexeme has been traversed. Then, a
`Token` is constructed using the data in between the two pointers.

The lexer also keeps track of a third `previous_end_` pointer so it can get the data between
lexemes (generally whitespace) when it constructs the `Token`. This example shows how the
pointers change during a call to `Lex()` on the short FIDL snippet `const bool flag = true;`:

Initial state after lexing the `const` keyword:

```
const bool flag = true;
     ^current_
     ^token_start_
     ^previous_end_
```

Whitespace is skipped until the start of the next lexeme:

```
const bool flag = true;
      ^current_
      ^token_start_
     ^previous_end_
```

The `current_` pointer is advanced until the end of the lexeme:

```
const bool flag = true;
          ^current_
      ^token_start_
     ^previous_end_
```

At this point, the next `Token` to be returned is ready to be constructed. A `Token` is created
with its `previous_end_` argument set to the data between `previous_end_` and `token_start_`, and
its `location_` set to the data between `token_start_` and `current_`. The kind is set to
`Identifier`. Before returning, the pointers are reset and end up in a state similar to the initial
state. This process can then be repeated for the next token:

```
const bool flag = true;
          ^current_
          ^token_start_
          ^previous_end_
```

Internally, the `current_` and `token_start_` pointers are manipulated with these main methods:

* `Skip()`. Skips over any unnecessary characters (e.g. whitespace) by
  moving both the `current_` and the `token_start_` pointers forward.
* `Consume()`. Returns the current character and advances `current_`.
* `Reset()`. Returns the data between `token_start_` and `current_`, then sets `token_start_`
  to the value of `current_`.
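
As a rough, self-contained illustration of how these pieces fit together, here is a toy
whitespace-delimited `Lex()` built out of `Skip()`, `Consume()`, and `Reset()`. This is a
simplified model, not the real `Lexer`, which also classifies each lexeme into a kind/subkind and
reports errors:

```c++
#include <cctype>
#include <cstddef>
#include <iostream>
#include <string_view>

// Toy model of the three-pointer scheme described above.
class ToyLexer {
 public:
  explicit ToyLexer(std::string_view data)
      : current_(data.data()),
        token_start_(data.data()),
        previous_end_(data.data()),
        end_(data.data() + data.size()) {}

  // Returns the next whitespace-delimited lexeme, or an empty view at end of input.
  std::string_view Lex() {
    previous_end_ = current_;  // the "gap" before the next lexeme starts here
    Skip();                    // advance current_ and token_start_ past whitespace
    while (current_ < end_ && !std::isspace(static_cast<unsigned char>(*current_))) {
      Consume();               // extend the lexeme one character at a time
    }
    return Reset();            // the data in [token_start_, current_)
  }

 private:
  void Skip() {
    while (current_ < end_ && std::isspace(static_cast<unsigned char>(*current_))) {
      ++current_;
    }
    token_start_ = current_;
  }

  char Consume() { return *current_++; }

  std::string_view Reset() {
    std::string_view data(token_start_, static_cast<size_t>(current_ - token_start_));
    token_start_ = current_;
    return data;
  }

  const char* current_;
  const char* token_start_;
  const char* previous_end_;  // start of the whitespace preceding the current lexeme
  const char* end_;
};

int main() {
  ToyLexer lexer("const bool flag = true;");
  for (std::string_view lexeme = lexer.Lex(); !lexeme.empty(); lexeme = lexer.Lex()) {
    std::cout << lexeme << "\n";  // prints: const, bool, flag, =, true;
  }
}
```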

## Parsing {#parsing}

The `Parser`'s goal is to convert a `Token` stream into a parse tree (the `raw::` AST) with the
`Parse()` method. The `Token` stream is generated by calling `Lex()` repeatedly on a FIDL file. The
parser is implemented with [recursive
descent](https://en.wikipedia.org/wiki/Recursive_descent_parser). Each node of the raw AST has a
corresponding `ParseFoo()` method that consumes `Token`s from the `Lexer` and returns a
`unique_ptr` to an instance of that node. In case of failure, a `nullptr` is returned.

Note: A node of the raw AST is a [start and end token pair](#sourceelement) with pointers to any
children. The start and end tokens of each child are contained within the start and end tokens
of its parent. The root node, which is a `raw::File`, has a start token equal to the
first token of the file and an end token equal to the last token of the file.

The `Parser` keeps track of:

* The current nodes that are being built, using a stack of [`SourceElement`s](#sourceelement)
  stored in `active_ast_scopes_`.
* The current and previous `Token`s being processed, in `last_token_` and `previous_token_`
  respectively.
* Its own state machine inside the `Parser::Lex` method. The current token (`last_token_`) is
  always the next token that is about to be consumed, which effectively gives the parser a one
  token lookahead (i.e. LL(1)).

The `Parser` determines what kind of node the current `Token` belongs to based on the `Token::Kind`
of `last_token_`, using the `Peek()` method. Then, the `Parser` updates its state and constructs the
nodes through the use of the `ASTScope` class as well as the `ConsumeToken` and `MaybeConsumeToken`
helper methods.

This example shows a simple non-recursive case line by line. The parser method looks like this:

```c++
std::unique_ptr<raw::StringLiteral> Parser::ParseStringLiteral() {
  ASTScope scope(this);
  ConsumeToken(OfKind(Token::Kind::kStringLiteral));
  if (!Ok())
    return Fail();

  return std::make_unique<raw::StringLiteral>(scope.GetSourceElement());
}
```

It consumes a single token and returns a leaf node, `raw::StringLiteral`:

```c++
class StringLiteral : public SourceElement {
 public:
  explicit StringLiteral(SourceElement const& element) : SourceElement(element) {}

  virtual ~StringLiteral() {}
};
```

The method starts by creating a new `ASTScope`, which initializes the `SourceElement` that will
later be used to create the returned node by pushing it onto `active_ast_scopes_`. The start of
the `SourceElement` gets set to the first token that is consumed, and the end gets set in the call
to `GetSourceElement()` before returning.

The new `SourceElement` for the `StringLiteral` gets initialized inside the `ASTScope` constructor:

```c++
parser_->active_ast_scopes_.push_back(raw::SourceElement(Token(), Token()));
```

Then, `ConsumeToken` is called to process the next token. This method takes a predicate constructed
with `OfKind()` and calls that predicate on `last_token_`'s kind or subkind. `OfKind()` returns a
function that checks that its input matches the given kind or subkind. If the predicate fails (in
this case, if the current token is not a string literal), the error gets stored in the class and
the method returns. The `Ok()` call catches the error and stops the compiler with a parsing error.
If the predicate succeeds, the start token of any `SourceElement` in the stack that is still
uninitialized is set to the current token:

```c++
for (auto& scope : active_ast_scopes_) {
  if (scope.start_.kind() == Token::Kind::kNotAToken) {
    scope.start_ = token;
  }
}
```

In this example, the start token of the top element of the stack gets set, since that element was
pushed at the start of the method. The `Parser` then advances to the next token by setting
`previous_token_ = last_token_` and `last_token_ = Lex()`.

Finally, the resulting `StringLiteral` node is returned using `scope.GetSourceElement()`. This sets
the end token of the `SourceElement` at the top of the stack to `previous_token_`, and then
returns the `SourceElement`. The final node has the same start and end token because a
`StringLiteral` is only ever a single token long, but other methods may consume many tokens
before calling `GetSourceElement()`. When the method returns, the destructor of `ASTScope` pops the
top element off of `active_ast_scopes_`.
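
Methods that must choose between several productions use the one-token lookahead by switching on
`Peek()`. The following is a hypothetical sketch written in the style of the `ParseFoo()` methods
described above; the exact node types, token kinds, and helper names in `fidlc` may differ:

```c++
std::unique_ptr<raw::Literal> Parser::ParseLiteral() {
  // Dispatch on the kind of last_token_ without consuming it (illustrative only).
  switch (Peek().kind()) {
    case Token::Kind::kStringLiteral:
      return ParseStringLiteral();   // e.g. "hello"
    case Token::Kind::kNumericLiteral:
      return ParseNumericLiteral();  // e.g. 42
    case Token::Kind::kIdentifier:
      // true/false are lexed as identifiers with a keyword subkind.
      return ParseBoolLiteral();
    default:
      return Fail();                 // unexpected token kind
  }
}
```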

## Compilation

At this point, the compiler has successfully constructed a `raw::File` for each input file and
proceeds to compile these raw ASTs into flat ASTs in three steps:

1. **Consumption**: the raw ASTs (whose representation tries to match the FIDL grammar) are
   grouped by FIDL library and de-sugared into flat ASTs (whose representation tries to match the
   FIDL language semantics). This step converts one `raw::File` per file into one `flat::Library`
   per library.
2. **Topological Sorting**: the declarations are sorted to determine the order in which to resolve
   the flat AST nodes (see the next step).
3. **Resolution**: name resolution, type resolution, and type shape and size calculations are
   performed. The resolution process is done node by node, and the resulting information is
   stored in the `flat::` node itself.

Note: Within the code, the term "compilation" may refer to these three steps as a whole (for
example, all three steps are performed within the `Library::Compile` method), or it may refer
specifically to the "resolution" step (for example, the logic for performing resolution is
contained in the `Library::CompileDecl` method). In general, this document uses "resolution" when
referring to the third step, to disambiguate between the two possible meanings.

### Consumption

After each file is parsed into a `raw::File` and an empty AST (`flat::Library`) has been
initialized for each library, the AST needs to be updated with all of the data from the
`raw::File`s that correspond to it. This is done recursively with the `ConsumeFoo()` methods. Each
`ConsumeFoo()` method generally takes the corresponding raw AST node as input, updates the
`Library` class, and then returns a `bool` to indicate success or failure. These methods are
responsible for:

* Validating the placement of attributes; for example, checking that the `Transport` attribute is
  only specified for protocols.
* Checking for any undefined library dependencies (e.g. using `textures.Foo` will error if the
  `textures` library was not imported).
* Converting the raw AST nodes into their flat AST node equivalents, and storing them in the
  `Library`'s `foo_declarations_` attribute. Initially the values of the flat AST nodes are unset;
  they are calculated later during compilation.
* Registering each [declaration](#decl) by adding it to the `declarations_` vector. Const
  declarations and enum/bits members (which declare a value) are also added to the `constants_`
  vector, whereas all other declarations (which declare a type) get their corresponding [type
  template](#typetemplate) added to the library's [typespace](#typespace).

### Topological Sorting

Once all of the declarations for a given `Library` have been added to the `declarations_` vector,
the compiler can proceed to resolve each individual declaration. However, it must do this in the
correct order (so that any dependencies of a declaration are resolved before it); this is
accomplished by first sorting the declarations into a separate `declarations_order_` vector, and
then iterating through it to compile each declaration. The sorting is done in the
`SortDeclarations()` method, and makes use of `DeclDependencies()` to determine the dependencies
for a given declaration.
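
The idea behind the sort is ordinary dependency-ordered (topological) sorting. The sketch below is
illustrative only: it works on a simple name-to-dependencies map rather than on `flat::Decl`s, and
it omits the cycle detection that the real compiler needs in order to report errors for circular
dependencies.

```c++
#include <functional>
#include <map>
#include <set>
#include <string>
#include <vector>

// Emit each declaration only after everything it depends on has been emitted.
std::vector<std::string> SortDeclarations(
    const std::map<std::string, std::set<std::string>>& dependencies) {
  std::vector<std::string> order;
  std::set<std::string> visited;
  std::function<void(const std::string&)> visit = [&](const std::string& name) {
    if (!visited.insert(name).second)
      return;  // already handled
    auto it = dependencies.find(name);
    if (it != dependencies.end()) {
      for (const auto& dep : it->second)
        visit(dep);  // dependencies first
    }
    order.push_back(name);
  };
  for (const auto& entry : dependencies)
    visit(entry.first);
  return order;
}
```

For example, if a struct `Outer` has a member of type `Inner`, then `Inner` is emitted before
`Outer`, so its typeshape is already available when `Outer` is resolved.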

### Resolution

Given the sorted declarations, the compilation happens through the `CompileFoo` methods, which
generally correspond to the AST nodes (e.g. `CompileStruct`, `CompileConst`), with `CompileDecl` as
the entrypoint. The main purposes of `CompileDecl` are for:

* [`TypeDecl`s](#decl) to determine their [`TypeShape`](#typeshape),
* [`Const`s](#decl) to resolve their value, and
* `TypeConstructor`s to set their [`Type`](#type).

Once this step is complete, the `flat::Library` contains all the necessary information for any code
generation. The FIDL compiler can directly generate C bindings, or it can generate a JSON IR that
can be consumed by a separate backend.
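
As a concrete illustration of the kind of calculation that happens during resolution, the sketch
below computes the size, alignment, and member offsets of a struct by laying members out in
declaration order at their natural alignment. The names (`TypeShapeModel`, `FieldShapeModel`,
`ShapeStruct`) are assumptions for the example, not the actual `Shape()` implementation; the real
[`TypeShape`](#typeshape) also tracks depth, maximum handle count, and maximum out-of-line size.

```c++
#include <cstdint>
#include <vector>

struct TypeShapeModel {
  uint32_t size;
  uint32_t alignment;  // assumed to be a power of two
};

struct FieldShapeModel {
  uint32_t offset;
};

// Lay out members in declaration order at their natural alignment, then pad
// the total size up to the struct's own alignment (the largest member alignment).
// Member offsets are recorded as the members' field shapes.
TypeShapeModel ShapeStruct(const std::vector<TypeShapeModel>& members,
                           std::vector<FieldShapeModel>* field_shapes) {
  uint32_t offset = 0;
  uint32_t alignment = 1;
  for (const auto& member : members) {
    offset = (offset + member.alignment - 1) & ~(member.alignment - 1);
    field_shapes->push_back({offset});
    offset += member.size;
    if (member.alignment > alignment)
      alignment = member.alignment;
  }
  uint32_t size = (offset + alignment - 1) & ~(alignment - 1);
  return {size, alignment};
}
```

Under this simplified model, a struct with an `int8` member followed by an `int32` member would get
offsets 0 and 4, a size of 8, and an alignment of 4.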

### Additional Checks

Before marking compilation as successful, the FIDL compiler also performs a few additional checks:
for example, it checks that any constraints on attributes are satisfied, and that all declared
library dependencies were actually used.

## Backend generation

The FIDL compiler emits a [JSON intermediate representation (IR)][schema]. The JSON IR is
consumed by a separate program, named the **back-end**, that generates the language bindings from
the JSON IR.

The officially supported FIDL language back-ends are:

* C++, Rust, and Go: [fidlgen][fidlgen]
* Dart: [fidlgen_dart][fidlgen-dart]

## C Bindings

For historical reasons, C bindings are generated directly by the FIDL compiler. The C bindings do
not support all features, and types that are used with the C bindings need to be annotated with
the [`Layout = "Simple"`][layout-attr] attribute.

## C Family Runtime

`fidlc` is also responsible for generating "coding tables", which are instances of `fidl_type_t`
used to represent FIDL messages at runtime; they are used by all of the bindings for the C family
of languages (C, LLCPP, HLCPP). To do this, the flat AST is converted to an intermediate
representation called the "coded" AST by the `CodedTypesGenerator`, which iterates through the
`flat::Decl`s in a given `flat::Library` and converts each `flat::Decl` node to the corresponding
`coded::Type` node (e.g. a `flat::Struct` becomes a `coded::StructType`).

The coding tables are then generated by the `TablesGenerator`, which generates C code for each
`coded::Type` that constructs the equivalent `fidl_type_t` type. For example, for a
`coded::StructType` called `MyStruct`, the `TablesGenerator` would write out C code that constructs
an equivalent `fidl::FidlCodedStruct`, something like:

```C
const fidl_type_t MyStruct = fidl_type_t(::fidl::FidlCodedStruct(
    my_struct_fields, 1u, 32u, "mylibrary/MyStruct"));
```

Coding tables for FIDL libraries in `//sdk/fidl` are generated into the out directory under
`fidling/gen/sdk/fidl/LIBRARY/LIBRARY.fidl.tables.c`, for example
`out/default/fidling/gen/sdk/fidl/fuchsia.sys/fuchsia.sys.fidl.tables.c`. The [README][coding-readme]
gives additional context. The `fidl_type_t` definitions (such as for `FidlCodedStruct`) can be found
in [internal.h][internal].

## Glossary

### Decl {#decl}

`Decl` is the base of all flat AST nodes, just like `SourceElement` is the base of all parse tree
nodes, and corresponds to all possible declarations that a user can make in a FIDL file. There
are two types of `Decl`s:

* `Const`s, which declare a value and have a `value` attribute that gets resolved during
  compilation, and
* `TypeDecl`s, which declare a message type or interface and have a `typeshape` attribute that
  gets set during compilation.

`TypeDecl`s representing an aggregate type (e.g. structs, tables, and unions) also
have a static `Shape()` method, which contains the logic for determining the `TypeShape`
of that given type.

### FieldShape {#fieldshape}

A `FieldShape` describes the offset and padding information for members of an aggregate type like a
struct or union. Generally these members require both a `TypeShape` and a `FieldShape`.

### Name {#name}

A `Name` represents a scoped variable name, and consists of the library the name belongs to (or
`nullptr` for global names) and the variable name itself, either as a string (for anonymous names)
or as a `SourceLocation`. `Name`s can also optionally include a `member_name_` field when referring
to the member of a declared type (for example `MyEnum.MyField`).

### SourceElement {#sourceelement}

A `SourceElement` represents a block of code inside a FIDL file and is parameterized by a `start_`
and `end_` `Token`. All parse tree ("raw" AST) nodes inherit from this class.

### SourceFile {#sourcefile}

Wrapper around a file, which is responsible for owning the data in that file. Also see [Virtual
SourceFile](#virtualsourcefile).

### SourceLocation {#sourcelocation}

Wrapper around a `StringView` and the `SourceFile` it comes from. It provides methods to get the
surrounding line of the `StringView` as well as its location in the form of a
`"[filename]:[line]:[col]"` string.

### SourceManager {#sourcemanager}

Wrapper around a set of `SourceFile`s that all relate to a single `Library`.

### Token {#token}

A token is essentially a lexeme (in the form of a [`SourceLocation`](#sourcelocation) stored as the
`location_` attribute), enhanced with two other pieces of information that are useful to the parser
during the later stages of compilation:

* `previous_end_`. A `SourceLocation` that starts at the end of the previous token and ends at the
  start of this token. It contains data that is uninteresting to the parser, such as whitespace.
* A kind and subkind that, together, classify the lexeme. The possible kinds are:
    * Special characters, such as `Kind::LeftParen`, `Kind::Dot`, `Kind::Comma`, etc.
    * String and numeric constants.
    * Identifiers. Tokens that are keywords (e.g. `const`, `struct`) are considered identifiers,
      but also have a subkind defined to identify which keyword it is (e.g. `Subkind::Const`,
      `Subkind::Struct`). All other tokens have a subkind of `None`.
    * Uninitialized tokens have a kind of `kNotAToken`.

### Type {#type}

A struct representing an instance of a type. For example, the `vector<int32>:10?` type corresponds
to an instance of `VectorType` with `max_size_ = 10` and `maybe_arg_type` set to the
`Type` corresponding to `int32`. Built-in types all have a static `Shape()` method that
returns the `TypeShape` given the parameters for an instance of that type. User-defined types (e.g.
structs or unions) all have a type of `IdentifierType`; for these, the corresponding
[`TypeDecl`](#typedecl), such as `Struct`, provides the static `Shape()` method instead.

### TypeDecl {#typedecl}

See [`Decl`](#decl).

### TypeShape {#typeshape}

Information about how objects of a type should be laid out in memory, including their size,
alignment, depth, etc.

### Typespace {#typespace}

The typespace is a map from [`Type`](#type) names to a `TypeTemplate` for that `Type`. During
compilation, the typespace is initialized to include all of the built-in types (e.g. `"vector"`
maps to `VectorTypeTemplate`), whereas user-defined types get added during the compilation process.
This also ensures that a single type template instance exists per type, and avoids name
collisions/shadowing of types (e.g. fxbug.dev/7646).

### TypeTemplate {#typetemplate}

Instances of `TypeTemplate` provide a `Create()` method to create a new instance of a specific
`Type`; therefore there is a `TypeTemplate` subclass for each built-in FIDL type (e.g.
`ArrayTypeTemplate`, `PrimitiveTypeTemplate`, etc.) as well as a single class for all user-defined
types (`TypeDeclTypeTemplate`), and one for type aliases (`TypeAliasTypeTemplate`). `Create()`
takes as parameters the possible type parameters: argument type, nullability, and size.

For example, to create an object representing the type `vector<int32>:10?`, the compiler would call
the `Create()` method of the `VectorTypeTemplate` with an argument type of `int32`, a max size of
`10`, and a nullability of `types::Nullability::kNullable`. This call returns an instance of a
`VectorType` with those parameters. Note that not all three of these parameters apply to all of the
types (e.g. `PrimitiveType`s like `int32` have none of the three). The `Create()` method of the
type template for each type automatically checks that only the relevant parameters are passed.

The concrete type for user-defined types is the `IdentifierType`, which gets generated by the
`TypeDeclTypeTemplate`.
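
As a rough self-contained model of the typespace-and-template relationship, the sketch below uses
toy stand-ins (`ToyType`, `ToyTypeTemplate`, `ToyVectorTemplate`) rather than the real `fidlc`
classes; the real `Create()` also reports errors through the library and rejects parameters that do
not apply to the template.

```c++
#include <cstdint>
#include <map>
#include <memory>
#include <optional>
#include <string>

// Toy stand-in for a resolved Type: just a printable description.
struct ToyType {
  std::string description;
};

// Each template knows how to build a Type from the optional parameters
// (argument type, size, nullability) that make sense for it.
class ToyTypeTemplate {
 public:
  virtual ~ToyTypeTemplate() = default;
  virtual ToyType Create(std::optional<std::string> arg_type,
                         std::optional<uint32_t> size, bool nullable) const = 0;
};

class ToyVectorTemplate : public ToyTypeTemplate {
 public:
  ToyType Create(std::optional<std::string> arg_type, std::optional<uint32_t> size,
                 bool nullable) const override {
    ToyType type{"vector<" + arg_type.value_or("?") + ">"};
    if (size) type.description += ":" + std::to_string(*size);
    if (nullable) type.description += "?";
    return type;
  }
};

int main() {
  // The "typespace": type name -> template for that type.
  std::map<std::string, std::unique_ptr<ToyTypeTemplate>> typespace;
  typespace["vector"] = std::make_unique<ToyVectorTemplate>();

  // Creating vector<int32>:10? goes through the "vector" template.
  ToyType t = typespace["vector"]->Create("int32", 10, /*nullable=*/true);
  // t.description == "vector<int32>:10?"
}
```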

### Virtual SourceFile {#virtualsourcefile}

A subclass of `SourceFile` that has a fake "filename" and is initialized with no backing data. It
exposes an `AddLine()` method to add data to the file, and is used as a backing `SourceFile` for
content that does not appear directly in any of the input `SourceFile`s, such as generated
anonymous [`Name`](#name)s.

<!-- xrefs -->
[internal]: /zircon/system/ulib/fidl/include/lib/fidl/internal.h
[layout-attr]: /docs/reference/fidl/language/attributes.md#layout_layout
[fidlgen]: /garnet/go/src/fidl/compiler/backend
[fidlgen-dart]: /tools/fidl/fidlgen_dart
[schema]: /docs/reference/fidl/language/json-ir.md
[coding-readme]: /src/lib/fidl/c/walker_tests/README.md