The FIDL formatter strives to be highly opinionated, specifically aiming to make the set of valid formattings for any given semantic input as small as possible. The model here is gofmt
, which is well known for being non-configurable, allowing it to enforce a canonical style.
It is worth noting that a great source of complexity in formatters in other languages has been the addition of line wrapping: formatters that omit it, such as the aforementioned gofmt
, are relatively simple to implement, whereas rustfmt
is much more complex, due to the need to produce a “correct” line wrapping in all cases. The FIDL formatter aims for a middle ground: the line wrapping algorithm is simple, and therefore doesn't truncate lines below the desired column size in all cases, but does so for the most common ones, keeping things readable without adding too much complexity.
Conceptually, the formatting algorithm operates on the raw AST, “pretty printing” its contents per a fixed set of rules. This algorithm can be thought of as having three phases:
Note: For examples in this document certain relevant lines whose size or subspans need to be highlighted may be surrounded by specially styled comments that enumerate the capacity or subspanning of the line in question, such as |---10---|
for a line with 10 characters remaining. Such highlights come in three forms: 1. highlights using -
(ex: |---10---|
) indicate that the marked line does not overflow the remaining capacity and does not need wrapping, 2. highlights using +
(ex: |+++10+++|
) indicate that the marked line overflows the remaining capacity, and needs to be wrapped, and 3. highlights using =
(ex: |===10===|
) indicate that the marked line overflows the remaining capacity, but cannot possibly be wrapped any further.
Note: Examples in this section use a column width of 50 for ease of demonstration; the column width on the actual formatter is 100, to match that of C++.
All non-comment FIDL statement types have well-defined line breaking conventions. For example, a type declaration whose layout has at least one member may have the main declaration on the first line, followed by one line for each member of the layout, followed by a line closing the layout definition and optionally listing constraints:
// |-----------------------50-----------------------| type U = union { 1: foo string; }:optional;
The simplicity of this model breaks down when we have elements that force a line in this default layout to exceed the maximum column width:
// |+++++++++++++++++++++++50+++++++++++++++++++++++| type ThisUnionHasAnExtremelyLongNameForNoReason = union { 1: foo string; }:optional;
In such cases, the offending lines may be wrapped to avoid going over the limit. All wrapped content for a line split in such a manner are continued on the next line(s), doubly indented:
// |-----------------------50-----------------------| type ThisUnionHasAnExtremelyLongNameForNoReason = union { 1: foo string; }:optional;
The algorithm for determining where to split overlong lines operates by defining “subspans” for each type of FIDL statement. Any line that exceeds the maximum column width is split into its subspans, with the first subspan remaining on the original line, and each subsequent subspan getting a doubly indented line of its own. Subspans may recursively contain other subspans, allowing this process to be repeated until every subspan is placed such that it does not exceed the maximum column width, or otherwise cannot be split (ie, has no further subspans).
Consider the following example, with all of the nested subspans shown below the line in question:
// |+++++++++++++++++++++++50+++++++++++++++++++++++| type Foo = vector<ThisTypeNameIsAbsurdlyTooLong>:<16,optional>; // |---A--| |--------------------------B-------------------------| // |------------------C------------------||-----D-------| // |-E-||---F----|
The line is too long, so it is split into its top-level subspans:
// |-----------------------50-----------------------| type Foo // |+++++++++++++++++++42+++++++++++++++++++| = vector<ThisTypeNameIsAbsurdlyTooLong>:<16,optional>; // |---------------C---------------------||-----D-------| // |-E-||---F----|
The second subspan is still too long, so it is split into its constituent pieces as well. Note that additional indentation only occurs once per statement - all further subspan line wraps are indented to the same depth as the first one:
// |-----------------------50-----------------------| type Foo // |-------------------42-------------------| = vector<ThisTypeNameIsAbsurdlyTooLong> // |-------------------42-------------------| :<16,optional>;
All of the lines are now shorter than the maximum column length. If the name of the vector
s inner type had been even longer, we would have been unable to split it further:
// |-----------------------50-----------------------| type Foo // |===================42===================| = vector<ThisTypeNameIsLiterallyLongerThanTheMaximumColumnWidth> // |-------------------42-------------------| :<16,optional>;
The same rules apply for nested anonymous layouts, or any other lines that start from an indented position. The only difference for these cases is the maximum column width for the wrapping operation:
// |-----------------------50-----------------------| type S = struct { // |---------------------46---------------------| anon struct { // |+++++++++++++++++++42+++++++++++++++++++| ohDearThisFieldNameIsTooLong AndTheTypeNameIsTooLongAsWell:optional; // |-------------A------------| |------------B--------------||----C---| }; };
Becomes:
// |-----------------------50-----------------------| type S = struct { // |---------------------46---------------------| a struct { // |-------------------42-------------------| ohDearThisFieldNameIsTooLong // |---------------34---------------| AndTheTypeNameIsTooLongAsWell // |---------------34---------------| :optional; }; };
A consequence of these rules is that very deeply nested layouts will often be “maximally” wrapped, with even relatively small subspans pushing past the maximum column width boundary:
// |-----------------------50-----------------------| type S = struct { // |---------------------46---------------------| a struct { // |-------------------42-------------------| b struct { // |-----------------38-----------------| c struct { // |---------------34---------------| d struct { // |-------------30-------------| e struct { // |-----------26-----------| somePrettyReasonableFieldName // |-------18-------| client_end // |-------18-------| :<SomeProtocolName,optional>; }; }; }; }; }; };
Deeply nested layouts are generally considered difficult to read, so wrapping them “prettily” is a non-goal at best for the formatter. In fact, “ugly” wrapping could even be considered a desirable outcome, since it subtly discourages such deep layout declarations.
Finally, one interesting result of this algorithm is that each entry in a list must be its own subspan, so each entry gets its own line when its parent subspan is wrapped. For example:
// |+++++++++++++++++++++++50+++++++++++++++++++++++| @my_attr_invocation(arg_a=123,arg_b=456,arg_c="foobar",arg_c="bazquux") // |-------A----------||----------------------B--------------------------| // |---C----||---D----||------E------||------F-------| // |-----------------------50-----------------------| protocol P {};
Becomes:
// |-----------------------50-----------------------| @my_attr_invocation( // |-------------------42-------------------| arg_a=123, // |-------------------42-------------------| arg_b=456, // |-------------------42-------------------| arg_c="foobar", // |-------------------42-------------------| arg_c="bazquux") // |-----------------------50-----------------------| protocol P {};
While a clang
-style BinPackArguments-enabled wrapping which tries to fit as many elements into each line as possible may produce more compact, “prettier” output, it also generates a great deal more complexity in the implementation, and is thus intentionally avoided in favor of a “one-subelement-per-line” algorithm.
The comment formatting algorithm is less opinionated than that for text wrapping, simply because doing proper comment wrapping would likely require implementing a markdown formatter, which is not desirable at this time. The rules for comment formatting only deal with ensuring that comment blocks are properly indented, and no attempt is made to wrap or otherwise re-format the contents of the comment lines themselves. There are two kinds of comment blocks worth considering: single line comments that suffix a line containing some non-whitespace FIDL syntax (aka “inline blocks”), and groups of one or more contiguous comment lines that contain no non-comment syntax between them (aka “standalone blocks”, a category which includes doc comments). For example:
/// I am a single-line standalone block. type S = struct { // I am an inline block. // I am multi-line // standalone block. };
The formatter will not touch inline comment blocks - they are always placed at the ends of their respective lines, one space after the last token. For inline blocks that occur between tokens that would otherwise be on the same line per the statement rules listed below, a line break is placed immediately after the comment, and the remainder of the line is doubly-indented. When formatting standalone blocks, the formatter adjusts the indentation of each line to match the indentation of the first non-comment line immediately preceding or following the block, taking whichever indentation is greater. As an example, the following unformatted FIDL:
// Standalone block //#1 /// Standalone /// block ///#2 type S = struct // Inline (mid-span) block #3 { a struct { // Inline block #4 /// Standalone block #5 (when formatted, next line has greater indentation) b string; // Inline block #6 // Standalone // block #7 (when formatted, previous line has greater indentation) };}; // Inline block #8 // Standalone block // #9
Becomes:
// Standalone block // #1 /// Standalone /// block /// #2 type S = struct // Inline (mid-span) block #3 { a struct { // Inline block #4 /// Standalone block #5 (when formatted, next line has greater indentation) b string; // Inline block #6 // Standalone // block #7 (when formatted, previous line has greater indentation) }; }; // Inline block #8 // Standalone block // #9
Note: Examples in this section use a column width of 30 for ease of demonstration; the column width on the actual formatter is 100, to match that of C++.
The formatting rules for each type of FIDL statement specify the default line breaks and the wrapping subspan trees for the text wrapping algorithm.
library
statements are always on their own line, and are not subspanned for the purposes of text wrapping. They always take the following form:
// |=============30=============| library some.very.extremely.absurdly.overlong.library.name;
using
statements occupy their own line by default, and have a sigle subspan if the as
keyword is used. They always take the following form:
// |+++++++++++++30+++++++++++++| using some.very.extremely.absurdly.overlong.library.name as foo;
// |=============30=============| using some.very.extremely.absurdly.overlong.library.name as foo;
Attribute list declarations always occupy their own line by default. There are two top-level subspans: the attribute name (ie, everything between @
and (
, inclusive), and the argument list (everything else, including the closing )
). The argument list can be further subspanned, with each argument taking its own line. The last argument's subspan includes the closing )
symbol.
// |+++++++++++++30+++++++++++++| @my_attr_invocation(arg_a="this is a long string",arg_b=12345) // |------------------||----------------------------------------| // |----------------------------||----------|
// |=============30=============| @my_attr_invocation( arg_a="this is a long string", arg_b=12345)
const
statements occupy a single line by default. They are subspanned into two portions: a declaration containing everything prior to but excluding the =
sign, and a value portion for everything after it. The first subspan is further subspanned into its name and type components.
// |+++++++++++++30+++++++++++++| const MyOverlongStringName string:<123,optional> = "some val for my string"; // |----------------------------------------------| |-------------------------| // |-------------------------| |------------------|
// |=============30=============| const MyOverlongStringName string:<123,optional> = "some val for my string";
alias
statements occupy a single line by default. They are subspanned into two portions: a name containing everything prior to but excluding the =
sign, and a layout portion for everything after it.
// |+++++++++++++30+++++++++++++| alias MyOverlongStringType = string:<123,optional>; // |------------------------| |----------------------|
// |=============30=============| alias MyOverlongStringType = string:<123,optional>;
type
statements should only keep the first line of their constituent layout declaration on the declaration's line. Type declarations have two top-level subspans: the keyword/name and the layout declaration. The layout declaration may be further subspanned per the rules in the layout declarations section below.
// |+++++++++++++30+++++++++++++| type MyVeryLongStructName = struct {};
// |=============30=============| type MyVeryLongStructName = struct {};
Layout declaring statements take two forms: those that contain no members, such as struct{}
, occupy a single line by default. Membered layouts occupy at least three lines: one to open the declaration, some number of lines for its contained members and attributes (all of which are laid out according to their own rules, depending on statement type), and one more to close it. Additionally, ordinaled layout members will be right aligned, such that all ordinals up to 5 digits in length end at the same character depth on their respective lines. Neither the first nor last line may be subspanned, meaning that layout declarations always take the form:
// |-------------30-------------| strict resource union { // [LAYOUT_CONTENTS] };
Layout member statements should appear on one line by default, unless they contain an anonymous layout, in which case only the first line of that layout should be on the member's line. Members have three top-level subspans: ordinal and name, layout reference, and value. The layout reference may be further subspanned per the rules in the layout declarations section above.
type E = enum { // |+++++++++++26+++++++++++| MY_LONG_ENUM_ELEMENT_NAME = 1; // |-----------------------| |--| }; type S = struct { // |+++++++++++26+++++++++++| foo vector<vector<string:optional>>:<16,optional>; // |-| |-----------------------------||-------------| bar string:optional = "some val for my string"; // |-| |-------------| |-------------------------| my_anonymous_layout struct { // |-----------------| |------| baz string; // |-| |-----| }; }; type T = table { // |+++++++++++26+++++++++++| 1: foo vector<vector<string:optional>>:<16,optional>; // |----| |-----------------------------||-------------| 002: my_anonymous_layout struct { // |----------------------| |------| baz string; // |-| |-----| }; };
type E = enum { // |===========26===========| MY_LONG_ENUM_ELEMENT_NAME = 1; }; type S = struct { // |===========26===========| foo vector<vector<string:optional>> :<16,optional>; bar string:optional = "some val for my string"; my_anonymous_layout struct { baz string; }; }; type T = table { // |===========26===========| 1: foo vector<vector<string:optional>> :<16,optional>; 002: my_anonymous_layout struct { baz string; }; };
Barring the special case of an empty protocol
declaration, which always occupies a single contiguous line, protocol
declarations must occupy at least three lines: one to open the declaration, some number of lines for its contained methods and attributes (all of which are laid out according to their own rules, depending on statement type), and one more to close it. Neither the first nor last line may be subspanned, meaning that protocol declarations always take the form:
// |-------------30-------------| protocol MyProtocol { // [PROTOCOL_CONTENTS] };
compose
statement inside of protocol
declarations always occupy a single line:
protocol MyProtocol { // |-----------26-----------| compose SomeProtocolBeingComposed; // [PROTOCOL_CONTENTS] };
protocol
method declarations occupy one line by default, though any layout declarations used for request or response arguments may cause line splits. Overlong method declaration lines may be split into three subspans: one containing the parenthesized request, one containing the ->
followed by the parenthesized response, and the last for the error description.
protocol MyProtocol { // |+++++++++++26+++++++++++| Foo(FooReq) -> (FooResp) error FooError; // |---------| |----------| |-------------| };
protocol MyProtocol { // |-----------26-----------| Foo(FooReq) -> (FooResp) error FooError; };
Barring the special case of an empty service
declaration, which always occupies a single contiguous line, service
declarations must occupy at least three lines: one to open the declaration, some number of lines for its contained protocols and attributes (all of which are laid out according to their own rules, depending on statement type), and one more to close it. Neither the first nor last line may be subspanned, meaning that service declarations always take the form:
// |-------------30-------------| service MyService { // [SERVICE_CONTENTS] };
The formatter is implemented in three phases:
SpanSequence
s.SpanSequence
tree, using an algorithm that visits each node, and decides whether or not it can/should be wrapped, based on the SpanSequence
type and the amount of column width remaining on the current line being printed.The SpanSequenceTreeVisitor
visits every raw AST node, making decisions about how it should be divided into SpanSequence
s, whether leading/trailing whitespace should be preserved for that node or its constituent tokens, and so on. Once this is achieved, the contents of each SpanSequence
can be printed according to small set of rules (described below), resulting in a fully formatted text output.
There are six types of SpanSequence
s:
Corresponds 1:1 with FIDL tokens. In the line library foo.bar;
, there are 5 such tokens: library
, foo
, .
, bar
, and ;
.
Stores the text representing an inline comment, so that it may later be appended to the end of its parent SpanSequence
, regardless of any overflow considerations.
A line that may never be split for the purposes of line wrapping. The line library foo.bar;
is a single AtomicSpanSequence
with five child TokenSpanSequence
s, as line breaks are disallowed in library
declarations.
Stores the text representing a standalone comment. This SpanSequence
is a bit of a special case: even though it is exclusively found as a child of AtomicSpanSequence
, it is required to be placed on its own line, with bounding newlines preserved. So, even though a library
declaration is a single AtomicSpanSequence
, in the following case, it will still result in multi-line output.
library foo // My oddly placed comment .bar;
This text an AtomicSpanSequence
of five TokenSpanSequence
s with a single StandaloneCommentSpanSequence
interspersed. This results in the above formatting being preserved, in spite of AtomicSpanSequence
s “no line splitting” rule.
A line that may be split into its constituent children, thus creating a line break to enable text wrapping. For example, the line const Foo string = "abc"
contains DivisibleSpanSequence
with two children: another DivisibleSpanSequence
encapsulating const Foo string
, and an AtomicSpanSequence
encapsulating = "abc"
.
A set of lines that MUST be split by default. For example, the following layout contains three MultilineSpanSequence
s:
resource struct { a zx.handle; };
Every .fidl
file's SpanSequence
tree is has a MutlilineSpanSequence
representing the entire file as its root node.
Consider the following FIDL:
// |+++++++++++++30+++++++++++++| library foo.bar; type AnOverlongNamedTable = table { // inline 1: an_overlong_named_member struct { // block // text anon_struct_field uint64; }; };
It may be split into a SpanSequence
tree shaped like so:
Because each of the three DivisibleSpanSequence
s in this tree are in fact overflowing (that is, if printed onto their current lines, they would go over the 30 column maximum), they are wrapped with double indentation, resulting in the following output:
// |=============30=============| library foo.bar; type AnOverlongNamedTable = table { 1: an_overlong_named_member struct { // block // text anon_struct_field uint64; }; };
gofmt
, serves as the model of an opinionated formatter, though it is unfortunately not formally specified. This formatter’s implementation is relatively simple, due in large part to the decision to not enforce line wrapping.