| # Fidl Compiler Frontend |
| |
| This document serves as an overview of the internal workings of the FIDL compiler `fidlc`. |
| `fidlc` is a command line tool that takes in a number of FIDL files, and outputs |
| [FIDL JSON IR][schema]. |
| |
| ## Overview |
| |
| The main inputs to the compiler are the arguments to the `--files` flags, which describe a list of |
| files grouped by library. The compiler parses each file individually to obtain a parse tree for |
| each file: |
| |
| 1. Each file is loaded into a [`SourceFile`](#sourcefile), which owns the buffer backing the file |
| 2. The compiler initializes a [`Lexer`](#lexing), which takes the `SourceFile` as an argument to |
| its constructor. This class exposes a `Lex()` method, which returns the next [`Token`](#token) |
| in the file; it can be called repeatedly to get the sequence of `Token`s in the file. |
| 3. The compiler uses the `Lexer` to initialize a [`Parser`](#parsing) and then calls the `Parse()` |
| method, which constructs a parse tree. The code refers to parse trees as a raw AST. This function |
| returns a `raw::File`, which is the class representing the root node of the raw AST. |
| 4. Once the compiler has parsed each file into a parse tree, it then groups the parse trees into a |
| single AST (referred to in the code as a flat AST) for each library. The root of this flat AST |
| is a `flat::Library`. |
| * For each library, the compiler traverses every `raw::File` parse tree corresponding to that |
| library, converting `raw::` nodes to their `flat::` counterparts. For example, a |
| `raw::StructDeclaration` becomes a `flat::Struct`, and a `raw::Constant` becomes a |
| `flat::Constant`. The converted `flat::` AST nodes are then stored under a single |
| `flat::Library`. Initially these flat AST nodes contain the same information as the raw AST |
| nodes, but they also contain fields such as a `value` field for values and a `typeshape` field |
| for types that are later set during the compilation step. |
| 5. Once the AST has been fully initialized, the compiler evaluates constants and determines the |
| memory alignment and size information for the declared types. |
| |
| Note: One of the key distinction between raw and flat ASTs is that each raw tree represents a |
| single file, whereas each flat tree represents a single FIDL library that may contain multiple |
| files. |
| |
| Finally, we end up with a flat AST that is processed and ready for backend generation either to C |
| bindings or to a JSON IR. |
| |
| ## Lexing {#lexing} |
| |
| The `Lexer` works primarily by keeping track of two `char *` pointer into the file data. The |
| `current_` pointer marks the current location that the class is at. The`token_start_` pointer marks |
| the start of the lexeme that is currently being worked on. Each time the `Lex()` method is called, |
| the `current_` pointer is advanced until a complete lexeme has been traversed. Then, a `Token` is |
| constructed using the data in between the two pointers. |
| |
| The lexer also keeps track of a third `previous_end_` pointer so it can get the data between |
| lexemes (generally whitespace) when it constructs the `Token`. This example shows of how the |
| pointers change during a call to `Lex()` on the short FIDL snippet `const bool flag = true;`: |
| |
| Initial state after lexing the `const` keyword: |
| |
| ``` |
| const bool flag = true; |
| ^current_ |
| ^token_start_ |
| ^previous_end_ |
| ``` |
| |
| Whitespaces are skipped until the next lexeme: |
| |
| ``` |
| const bool flag = true; |
| ^current_ |
| ^token_start_ |
| ^previous_end_ |
| ``` |
| |
| The `current_` pointer is updated until the end of the lexeme: |
| |
| ``` |
| const bool flag = true; |
| ^current_ |
| ^token_start_ |
| ^previous_end_ |
| ``` |
| |
| At this point, the next `Token` that gets returned is ready to be constructed. A `Token` is created |
| with its `previous_end_` argument set to the data between `previous_end_` and `token_start_`. The |
| `location_` is set to the data between `token_start_` and `current_`. The kind is set to |
| `Identifier`. Before returning, the pointers are reset and end up in a state similar to the initial |
| state. This process can then be repeated for the next token: |
| |
| ``` |
| const bool flag = true; |
| ^current_ |
| ^token_start_ |
| ^previous_end_ |
| ``` |
| |
| Internally the two pointers are manipulated with these main methods: |
| |
| * `Skip()`. Skips over any unnecessary characters (e.g. whitespace) by |
| moving both the `current_` and the `token_start_` pointers forward. |
| * `Consume()`. Returns the current character and advances `current_`. |
| * `Reset()`. Returns the data between `token_start_` and `current_`. Then, sets `token_start_` |
| to the value of `current_`. |
| |
| ## Parsing {#parsing} |
| |
| The `Parser`'s goal is to convert a `Token` stream into a parse tree (the `raw::` AST) with the |
| `Parse()` method. The `Token` stream is generated by calling `Lex` repeatedly on a FIDL file. This |
| parser is implemented with [recursive |
| descent](https://en.wikipedia.org/wiki/Recursive_descent_parser). Each node of the raw AST has a |
| corresponding `ParseFoo()` method that consumes `Token`s from the `Lexer` and returns a |
| `unique_ptr` to an instance of that node. In case there is a failure, a `nullptr` is returned. |
| |
| Note: A node of the raw AST is a [start and end token pair](#sourceelement) with pointers to any |
| children. The start and end tokens of each child are contained within the start and end tokens |
| of its parent. The root node, which is a `raw::File`, has a start token equal to the |
| first token of the file and an end token equal to the last token of the file. |
| |
| The `Parser` keeps track of the: |
| |
| * Current nodes that are being built using a stack of [`SourceElements`](#sourceelement), which are |
| stored in `active_ast_scopes_`. |
| * Current and previous `Token`s that are being processed. `last_token_` and `previous_token_`, |
| respectively. |
| * Its own state machine inside of the `Parser::Lex` method. The current token (`last_token_`) is |
| always the next token that is about to be consumed, which effectively gives the parser a one |
| token lookahead (i.e. LL(1)). |
| |
| The `Parser` determines what kind of node the current `Token` belongs to based on the `Token::Kind` |
| of `last_token_`, using the `Peek()` method. Then, the `Parser` updates its state and constructs the |
| nodes through the use of the `ASTScope` class as well as the `ConsumeToken` and `MaybeConsumeToken` |
| helper methods. |
| |
| This example shows a simple non-recursive case line by line. The parser method looks like this: |
| |
| ```c++ |
| std::unique_ptr<raw::StringLiteral> Parser::ParseStringLiteral() { |
| ASTScope scope(this); |
| ConsumeToken(OfKind(Token::Kind::kStringLiteral)); |
| if (!Ok()) |
| return Fail(); |
| |
| return std::make_unique<raw::StringLiteral>(scope.GetSourceElement()); |
| } |
| ``` |
| |
| It consumes a single token and returns a leaf node, `raw::StringLiteral`: |
| |
| ```c++ |
| class StringLiteral : public SourceElement { |
| public: |
| explicit StringLiteral(SourceElement const& element) : SourceElement(element) {} |
| |
| virtual ~StringLiteral() {} |
| } |
| ``` |
| |
| The method starts by creating a new `ASTScope`, which initializes the `SourceElement` that will |
| later be used to create the returned node by pushing it onto `active_ast_scopes_`. The start of |
| the `SourceElement` gets set to the first token that is consumed and the end gets set in the call |
| to `GetSourceElement()` before returning. |
| |
| The new `SourceElement` for the `StringLiteral` gets initialized inside the `ASTScope` constructor: |
| |
| ```c++ |
| parser_->active_ast_scopes_.push_back(raw::SourceElement(Token(), Token())); |
| ``` |
| |
| Then, `ConsumeToken` is called to process the next token. This method takes a predicate constructed |
| with `OfKind()` and calls that predicate on `last_token_`'s kind or subkind. `OfKind()` returns a |
| function that checks that its input matches the given kind or subkind. If the predicate fails (in |
| this case, if the current token is not a string literal), the error gets stored in the class and |
| the method returns. The `Ok()` call catches the error and stops the compiler with a parsing error. |
| If the predicate succeeds, the start token of any `SourceElement`s in the stack that are |
| uninitialized are set to the current token: |
| |
| ```c++ |
| for (auto& scope : active_ast_scopes_) { |
| if (scope.start_.kind() == Token::Kind::kNotAToken) { |
| scope.start_ = token; |
| } |
| } |
| ``` |
| |
| In this example, the start token is set to the top element of the stack since it was initialized at |
| the start of the method. The `Parser` then advances to the next token by setting `previous_token_ = |
| last_token_` and `last_token_ = Lex()`. |
| |
| Finally, the resulting `StringLiteral` node is returned using `scope.GetSourceElement()`. This sets |
| the end token of the `SourceElement` at the top of the stack to the `previous_token_`, and then |
| returns the `SourceElement`. The final node has the same start and end token because |
| `StringLiteral`s are only ever a single token long, but other methods may consume many tokens |
| before calling `GetSourceElement()`. When the method returns, the destructor of `ASTScope` pops the |
| top element off of `active_ast_scopes_`. |
| |
| ## Compilation |
| |
| At this point, the compiler has successfully constructed a `raw::File` for each input file and |
| proceeds to compile these raw ASTs into flat ASTs in three steps: |
| |
| 1. **Consumption**: where the raw ASTs (whose representation tries to match the FIDL grammar) are |
| grouped by FIDL library and de-sugared into flat ASTs (whose representation tries to match the |
| FIDL language semantics). This step converts one `raw::File` per file into one `flat::Library` |
| per library. |
| 2. **Topological Sorting**: to determine the order in which to resolve the flat AST nodes (see next |
| step). |
| 3. **Resolution**: which performs name resolution, type resolution, and type shape and size |
| calculations. The resolution process is done node by node, and the resulting information is |
| stored into the `flat::` node itself. |
| |
| Note: Within the code, the term "compilation" may refer to these three steps as a whole (for |
| example, all three steps are performed within the `Library::Compile` method), or it could refer |
| specifically to the "resolution" step (for example, the logic for performing resolution is |
| contained in the `Library::CompileDecl` method ). In general this document tries to use |
| "resolution" when referring to the third step to disambiguate between the two possible meanings. |
| |
| ### Consumption |
| |
| After each file is parsed into a `raw::File` and an empty AST (`flat::Library`) has been |
| initialized for each library, the AST needs to be updated with all of data from the `raw::File`s |
| that correspond to it. This is done recursively with the `ConsumeFoo()` methods. Each |
| `ConsumeFoo()` method generally takes the corresponding raw AST node as input, updates the |
| `Library` class, and then return a `bool` to indicate success or failure. These methods are |
| responsible for: |
| |
| * Validating the placement of attributes; for example checking that the `Transport` attribute is |
| only specified for protocols. |
| * Checking for any undefined library dependencies (e.g. using `textures.Foo` will error if the |
| `textures` library was not imported) |
| * Converting the raw AST nodes into their flat AST node equivalents, and storing them in the |
| `Library`'s `foo_declarations_` attribute. Initially the values of the flat AST nodes are unset |
| but they are calculated later during compilation. |
| * Registering each [declaration](#decl) by adding them to the `declarations_` vector. Const |
| declarations and enum/bit fields (which declare a value) are also added to the `constants_` |
| vector, whereas all other declarations (which declare a type) get their corresponding [type |
| template](#typetemplate) added to the library's [typespace](#typespace). |
| |
| ### Topological Sorting |
| |
| Once all of the declarations for a given `Library` have been added to the `declarations_` vector, |
| the compiler can proceed to resolve each individual declaration. However, it must do this in the |
| correct order (so that any dependencies of a declaration are resolved before it); this is |
| accomplished by first sorting the declarations into a separate `declarations_order_` vector, and |
| then iterating through it to compile each declaration. The sorting is done in the |
| `SortDeclarations()` method, and makes use of `DeclDependencies()` to determine the dependencies |
| for a given declaration. |
| |
| ### Resolution |
| |
| Given the sorted declarations, the compilation happens through the `CompileFoo` methods, generally |
| corresponding to the AST nodes (e.g. `CompileStruct`, `CompileConst`), with `CompileDecl` as the |
| entrypoint. The main purpose of `CompileDecl` is for: |
| |
| * [`TypeDecl`s](#decl) to determine their [`TypeShape`](#typeshape) |
| * [`Const`s](#decl) to resolve their value, and |
| * `TypeConstructor`s to set their [`Type`](#type). |
| |
| Once this step is complete, the `flat::Library` contains all the necessary information for any code |
| generation. The FIDL compiler can directly generate C bindings, or can generate a JSON IR that can |
| be consumed by a separate backend |
| |
| ### Additional Checks |
| |
| Before marking compilation as successful, the FIDL compiler also does a few additional checks: for |
| example, it will check that any constraints on attributes are satisfied, and that all declared |
| library dependencies were actually used. |
| |
| ## Backend generation |
| |
| The FIDL compiler emits a [JSON intermediate representation (IR)][schema]. The JSON IR is |
| consumed by a separate program, named the **back-end**, that generates the language bindings from |
| the JSON IR. |
| |
| The officially supported FIDL language back-ends are: |
| |
| * C++: |
| * High level: [fidlgen_hlcpp] |
| * Low level: [fidlgen_llcpp] |
| * Unified: [fidlgen_llcpp] (shared tooling for all new C++ backends) |
| * Dart: [fidlgen_dart] |
| * Go: [fidlgen_go] |
| * Rust: [fidlgen_rust] |
| |
| ## C Bindings |
| |
| For historical reasons, C bindings are directly generated from the FIDL compiler - the C bindings |
| do not support all features and types that are used with the C bindings need to be annotated with |
| the [`Layout = "Simple"`][layout-attr] attribute. |
| |
| ## C Family Runtime |
| |
| `fidlc` is also responsible for generating "coding tables", which are instances of `fidl_type_t` used |
| to represent the FIDL messages at runtime and are used by all of the bindings for the C family of |
| languages (C, LLCPP, HLCPP). To do this, the flat AST is converted to an intermediate |
| representation called the "coded" AST with the `CodedTypesGenerator`, which iterates through the |
| `flat::Decl`s in a given `flat::Library` and converts each `raw::Decl` node to the corresponding |
| `coded::Type` node (e.g. `flat::Struct` becomes a `coded::StructType`). |
| |
| The coding tables are then generated by the `TablesGenerator`, which will generate C code for each |
| `coded::Type` that constructs the equivalent `fidl_type_t` type. For example, for a |
| `coded::StructType` called `MyStruct`, the `TablesGenerator` would write out C code that constructs |
| an equivalent `fidl::FidlCodedStruct`, something like: |
| |
| ```C |
| const fidl_type_t MyStruct = fidl_type_t(::fidl::FidlCodedStruct( |
| my_struct_fields, 1u, 32u, "mylibrary/MyStruct")); |
| ``` |
| |
| Coding tables for FIDL libraries in `//sdk/fidl` are generated into the out directory under |
| `fidling/gen/sdk/fidl/LIBRARY/LIBRARY.fidl.tables.c`. For example |
| `out/default/fidling/gen/sdk/fuchsia.sys/fuchsia.sys.fidl.tables.c`. The [README][coding-readme] |
| gives additional context. The `fidl_type_t` definitions (such as for `FidlCodedStruct`) can be found |
| in [internal.h][internal]. |
| |
| ## Glossary |
| |
| ### Decl {#decl} |
| |
| The `Decl` is the base of all flat AST nodes, just like `SourceElement` is the base of all parser |
| tree nodes, and corresponds to all possible declarations that a user can make in a FIDL file. There |
| are two types of `Decl`s: |
| |
| * `Const`s, which declare a value, and have a `value` attribute that gets resolved during |
| compilation, and |
| * `TypeDecl`s, which declare a message type or interface and have a `typeshape` attribute that |
| gets set during compilation. |
| |
| `TypeDecl`s representing an aggregate type (e.g. structs, tables, and unions) also |
| have a static `Shape()` method, which contains the logic for determining the `Typeshape` |
| of that given type. |
| |
| ### FieldShape {#fieldshape} |
| |
| A `FieldShape` describes the offset and padding information for members of an aggregate type like a |
| struct or union. Generally these fields will require both a `Typeshape` as well as a `FieldShape` |
| |
| ### Name {#name} |
| |
| A `Name` represents a scope variable name, and consists of the library the name belongs to (or |
| `nullptr` for global names), and the variable name itself as a string (for anonymous names) or a |
| `SourceLocation`. `Name`s can also optionally include a `member_name_` field when referring to the |
| field of a declared variable (for example `MyEnum.MyField`). |
| |
| ### SourceElement {#sourceelement} |
| |
| A `SourceElement` represents a block of code inside a fidl file and is parameterized by a `start_` |
| and `end_` `Token`. All parser tree ("raw" AST) nodes inherit from this class. |
| |
| ### SourceFile {#sourcefile} |
| |
| Wrapper around a file, which is responsible for owning the data in that file. Also see [Virtual |
| SourceFile](#virtualsourcefile) |
| |
| ### SourceLocation {#sourcelocation} |
| |
| Wrapper around a `StringView` and the `SourceFile` it comes from. It provides methods to get the |
| surrounding line of the `StringView` as well as its location in the form of a |
| `"[filename]:[line]:[col]"` string |
| |
| ### SourceManager {#sourcemanager} |
| |
| Wrapper around a set of `SourceFile`s that all relate to a single `Library`. |
| |
| ### Token {#token} |
| |
| A token is essentially a lexeme (in the form of a [`SourceLocation`](#sourcelocation) stored as the |
| `location_` attribute), enhanced with two other pieces information that are useful to the parser |
| during the later stages of compilation: |
| |
| * `previous_end_`. A `SourceLocation`, which starts at the end of the previous token and ends at the |
| start of this token. It contains data that is uninteresting to the parser such as whitespace. |
| * A kind and subkind that, together, classify the lexeme. The possible kinds are: |
| * The special characters such as `Kind::LeftParen`, `Kind::Dot`, `Kind::Comma`, etc... |
| * String and numeric constants |
| * Identifiers. Tokens that are keywords (e.g. `const`, `struct`) are considered identifiers, but |
| also have a subkind defined to identify which keyword it is (e.g. `Subkind::Const`, |
| `Subkind::Struct`). All other tokens have a subkind of `None`. |
| * Uninitialized tokens have a kind of `kNotAToken`. |
| |
| ### Type {#type} |
| |
| A struct representing an instance of a type. For example, the `vector<int32>:10?` type corresponds |
| to an instance of the `VectorType` `TypeDecl` with `max_size_ = 10` and `maybe_arg_type` set to the |
| `Type` corresponding to `int32`. Built-in types all have a static `Shape()` method that |
| returns the `Typeshape` given the parameters for an instance of that type. User defined types (e.g. |
| structs or unions) will all have a type of `IdentifierType` - the corresponding |
| [`TypeDecl`](#typedecl), like `Struct` provides the static `Shape()` method instead. |
| |
| ### TypeDecl {#typedecl} |
| |
| See [`Decl`](#decl) |
| |
| ### TypeShape {#typeshape} |
| |
| Information about how objects of a type should be laid out in memory, including their size, |
| alignment, depth, etc. |
| |
| ### Typespace {#typespace} |
| |
| The typespace is a map from [`Type`](#type) names to a `TypeTemplate` for that `Type`. During |
| compilation, the typespace is initialized to include all of the built-in types (e.g. `"vector"` |
| maps to `VectorTypeTemplate`), whereas user-defined types get added during the compilation process. |
| This also ensures that a single type template instance exists per type and avoids name |
| collisions/shadowing of types (e.g. fxbug.dev/7646). |
| |
| ### TypeTemplate {#typetemplate} |
| |
| Instances of TypeTemplates provide a `Create()` method to create a new instance of a specific |
| `Type` - therefore there is a TypeTemplate subclass for each built-in FIDL type (e.g. |
| `ArrayTypeTemplate`, `PrimitiveTypeTemplate`, etc.) as well as a single class for all user defined |
| types (`TypeDeclTypeTemplate`), and one for type aliases (`TypeAliasTypeTemplate`). `Create()` |
| takes as parameters the possible type parameters: argument type, nullability, and size. |
| |
| For example, to create an object representing the type `vector<int32>:10?` the compiler would call |
| the `Create()` method of the `VectorTypeTemplate` with an argument type of `int32`, max size of |
| `10`, and a nullability of `types::Nullability::kNullable`. This call returns an instance of a |
| `VectorType` with those parameters. Note that not all 3 of these parameters apply to all of the |
| types (e.g. `PrimitiveType`s, like `int32` have none of the 3). The `Create()` method of the type |
| template for each type automatically checks that only the relevant parameters are passed. |
| |
| The concrete type for user defined types is the `IdentifierType`, which gets generated by the |
| `TypeDeclTypeTemplate`. |
| |
| ### Virtual SourceFile {#virtualsourcefile} |
| |
| A subclass of `SourceFile` that has a fake "filename" and is initialized with no backing data. It |
| exposes an `AddLine()` method to add data to the file, and is used as a backing `SourceFile` for |
| content that does not appear directly in any of the input `SourceFile`s, like for generated |
| anonymous [`Name`](#name)s. |
| |
| <!-- xrefs --> |
| [internal]: /zircon/system/ulib/fidl/include/lib/fidl/internal.h |
| [layout-attr]: /docs/reference/fidl/language/attributes.md#layout_layout |
| [fidlgen_dart]: /tools/fidl/fidlgen_dart |
| [fidlgen_go]: /tools/fidl/fidlgen_go |
| [fidlgen_hlcpp]: /tools/fidl/fidlgen_hlcpp |
| [fidlgen_llcpp]: /tools/fidl/fidlgen_llcpp |
| [fidlgen_rust]: /tools/fidl/fidlgen_rust |
| [schema]: /docs/reference/fidl/language/json-ir.md |
| [coding-readme]: /src/lib/fidl/c/walker_tests/README.md |