| # Schema |
| |
| The syntax of the schema language (aka IDL, |
| [Interface Definition Language](https://en.wikipedia.org/wiki/Interface_description_language)) |
| should look quite familiar to users of any of the C family of languages, and |
| also to users of other IDLs. Let's look at an example first: |
| |
| ```c title="monster.fbs" linenums="1" |
| // example IDL file |
| |
| namespace MyGame; |
| |
| attribute "priority"; |
| |
| enum Color : byte { Red = 1, Green, Blue } |
| |
| union Any { Monster, Weapon, Pickup } |
| |
| struct Vec3 { |
| x:float; |
| y:float; |
| z:float; |
| } |
| |
| table Monster { |
| pos:Vec3; |
| mana:short = 150; |
| hp:short = 100; |
| name:string; |
| friendly:bool = false (deprecated, priority: 1); |
| inventory:[ubyte]; |
| color:Color = Blue; |
| test:Any; |
| } |
| |
| table Weapon {} |
| table Pickup {} |
| |
| root_type Monster; |
| ``` |
| |
| ## Tables |
| |
| Tables are the main way of defining objects in FlatBuffers. |
| |
| ```c title="monster.fbs - Example Table" linenums="17" |
| table Monster { |
| pos:Vec3; |
| mana:short = 150; |
| hp:short = 100; |
| name:string; |
| friendly:bool = false (deprecated, priority: 1); |
| inventory:[ubyte]; |
| color:Color = Blue; |
| test:Any; |
| } |
| ``` |
| |
| They consist of a name (here `Monster`) and a list of [fields](#fields). This |
| field list can be appended to (and deprecated from) while still maintaining |
| compatibility. |
| |
| ### Fields |
| |
| Table fields have a name identifier, a [type](#types), an optional default |
| value, and optional [attributes](#attributes), and end with a `;`. See the |
| [grammar](grammar.md) for full details. |
| |
| ```ebnf |
| field_decl = ident `:` type [ `=` scalar ] metadata `;` |
| ``` |
| |
| Fields do not have to appear in the wire representation, and you can choose to |
| omit fields when constructing an object. You have the flexibility to add fields |
| without fear of bloating your data. This design is also FlatBuffers' mechanism |
| for forward and backwards compatibility. |
| |
| There are three, mutually exclusive, reactions to the non-presence of a table's |
| field in the binary data. |
| |
| #### 1. Default |
| |
| Default value fields will return the default value as defined in the schema. If |
| the default value is not specified in the schema, it will be `0` for scalar |
| types, or `null` for other types. |
| |
| ```c++ |
| mana:short = 150; |
| hp:short; |
| inventory:[ubyte]; |
| ``` |
| |
| Here `mana` would default to the value `150`, `hp` to value `0`, and `inventory` |
| to `null`, if those fields are not set. |
| |
| Only scalar values can have explicit defaults; non-scalar fields (strings, |
| vectors, tables) are `null` when not present. |
| |
| This is the normal mode that fields will take. |
| |
| ??? danger "Don't change Default values" |
| |
| You generally do not want to change default values after they're initially |
| defined. Fields that have the default value are not actually stored in the |
| serialized data (see also Gotchas below). Values explicitly written by code |
| generated from the old version of the schema, if they happen to be the default, |
| will be read as a different value by code generated with the new schema. This is |
| slightly less bad when converting an optional scalar into a default valued |
| scalar since non-presence would not be overloaded with a previous default value. |
| There are situations, however, where this may be desirable, especially if you |
| can ensure a simultaneous rebuild of all code. |
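
The hazard can be illustrated with a toy model. This is only a sketch and not the FlatBuffers wire format or API: `write_table` and `read_field` are hypothetical helpers that mimic a writer that drops values equal to its schema default and a reader that substitutes its own default for missing fields.

```python
# Toy model of default-value elision; not the real FlatBuffers format or API.

def write_table(values, defaults):
    """Keep only the fields whose value differs from the writer's default."""
    return {name: v for name, v in values.items() if v != defaults.get(name)}


def read_field(stored, name, defaults):
    """Return the stored value, or the reader's default if the field is absent."""
    return stored.get(name, defaults[name])


old_defaults = {"hp": 100}
new_defaults = {"hp": 50}  # the default was changed in a later schema version

# The old writer explicitly sets hp to 100. That equals its default, so nothing is stored.
stored = write_table({"hp": 100}, old_defaults)

hp_seen_by_old_reader = read_field(stored, "hp", old_defaults)
hp_seen_by_new_reader = read_field(stored, "hp", new_defaults)
```

In this model the old reader sees the value the writer intended, while the new reader silently sees the new default instead.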
| |
| #### 2. Optional |
| |
| Optional value fields will return some form of `null` in the language generated. |
| |
| === "C++" |
| |
| ```c++ |
| std::optional<T> field; |
| ``` |
| |
| For optional scalars, just set the field default value to `null`. If the |
| producer of the buffer does not explicitly set that field, it will be marked |
| `null`. |
| |
| ```c++ |
| hp:short = null; |
| ``` |
| |
| !!! note |
| |
| Not every language supports optional scalars yet. |
| |
| #### 3. Required |
| |
| Required fields will cause an error if they are not set. The FlatBuffers |
| verifier would consider the whole buffer invalid. |
| |
| This is enabled by the [`required` attribute](#required) on the field. |
| |
| ``` |
| name:string (required); |
| ``` |
| |
| You cannot have `required` set with an explicit default value; doing so results |
| in a compiler error. |
| |
| ## Structs |
| |
| Similar to a table, a `struct` consists of fields, but all of them are required |
| (so no defaults either), and fields may not be added or deprecated. |
| |
| ```c title="monster.fbs - Example Struct" linenums="11" |
| struct Vec3 { |
| x:float; |
| y:float; |
| z:float; |
| } |
| ``` |
| |
| Structs may only contain scalars or other structs. Use this for simple objects |
| where you are very sure no changes will ever be made (as is quite clear in the |
| example `Vec3`). Structs use less memory than tables and are even faster to |
| access (they are always stored in-line in their parent object, and use no |
| virtual table). |
| |
| ### Arrays |
| |
| Arrays are a convenience short-hand for a fixed-length collection of elements. |
| Arrays allow the following syntax, while maintaining binary equivalency. |
| |
| <div class="grid cards" markdown> |
| |
| - **Normal Syntax** |
| |
| === |
| |
| ```c++ |
| struct Vec3 { |
| x:float; |
| y:float; |
| z:float; |
| } |
| ``` |
| |
| - **Array Syntax** |
| |
| === |
| |
| ```c++ |
| struct Vec3 { |
| v:[float:3]; |
| } |
| ``` |
| |
| </div> |
| |
| Arrays are currently only supported in a `struct`. |
| |
| ## Types |
| |
| The following are the built-in types that can be used in FlatBuffers. |
| |
| ### Scalars |
| |
| The standard assortment of fixed-size scalars is available. There are no |
| variable-size integers (e.g., `varints`). |
| |
| | Size | Signed | Unsigned | Floating Point | |
| | ------ | ----------------- | ------------------- | -------------------- | |
| 8-bit | `byte` (`int8`), `bool` | `ubyte` (`uint8`) | | |
| 16-bit | `short` (`int16`) | `ushort` (`uint16`) | | |
| | 32-bit | `int` (`int32`) | `uint` (`uint32`) | `float` (`float32`) | |
| | 64-bit | `long` (`int64`) | `ulong` (`uint64`) | `double` (`float64`) | |
| |
| The type names in parentheses are alias names such that for example `uint8` can |
| be used in place of `ubyte`, and `int32` can be used in place of `int` without |
| affecting code generation. |
| |
| ### Non-scalars |
| |
| #### Vectors |
| |
| A vector of any other type is denoted with `[type]`. |
| |
| ```c++ |
| inventory:[ubyte]; |
| ``` |
| |
| !!! note "Nesting vectors" |
| |
| Nesting vectors is not supported. Instead, you can wrap the inner vector in |
| a table. |
| |
| ``` |
| table nest { |
| a:[ubyte]; |
| } |
| |
| table monster { |
| a:[nest]; |
| } |
| ``` |
| |
| #### Strings |
| |
| Strings (indicated by `string`) are zero-terminated strings, prefixed by their |
| length. Strings may only hold UTF-8 or 7-bit ASCII. For other text encodings or |
| general binary data use vectors (`[byte]` or `[ubyte]`) instead. |
| |
| ```c++ |
| name:string; |
| ``` |
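
The layout described above (length prefix, bytes, zero terminator) can be sketched as follows. This is a minimal illustration, assuming a 32-bit little-endian length prefix that does not count the terminator, and ignoring any alignment padding a real buffer may add. `encode_string` and `decode_string` are hypothetical helpers, not part of any FlatBuffers library.

```python
import struct


def encode_string(text):
    """Length prefix (uint32, little-endian), UTF-8 bytes, then a zero terminator.

    Alignment padding, which a real buffer may add, is ignored in this sketch.
    """
    data = text.encode("utf-8")
    return struct.pack("<I", len(data)) + data + b"\x00"


def decode_string(buf):
    """Read the length prefix and return the decoded text, ignoring the terminator."""
    (length,) = struct.unpack_from("<I", buf, 0)
    return buf[4:4 + length].decode("utf-8")


encoded = encode_string("Orc")
```

Because the length is stored up front, a reader does not need to scan for the terminator; the trailing zero byte mainly helps code that expects C-style strings.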
| |
| ## Enums |
| |
| Define a sequence of named constants, each with a given value, or increasing by |
| one from the previous one. The default first value is `0`. As you can see in the |
| enum declaration, you specify the underlying integral type of the enum with `:` |
| (in this case `byte`), which then determines the type of any fields declared |
| with this enum type. |
| |
| Only integer types are allowed, i.e. `byte`, `ubyte`, `short`, `ushort`, `int`, |
| `uint`, `long` and `ulong`. |
| |
| Typically, enum values should only ever be added, never removed (there is no |
| deprecation for enums). This requires code to handle forwards compatibility |
| itself, by handling unknown enum values. |
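
The numbering rule can be sketched in a few lines. This is an illustration only: `assign_enum_values` and `enum_name` are hypothetical helpers, not what `flatc` actually runs. The second helper shows one way calling code might cope with a value written by a newer schema.

```python
def assign_enum_values(members):
    """members: list of (name, explicit_value_or_None). Returns {name: value}."""
    values = {}
    next_value = 0  # the first value defaults to 0
    for name, explicit in members:
        value = next_value if explicit is None else explicit
        values[name] = value
        next_value = value + 1  # the next member continues from the previous one
    return values


def enum_name(values, number):
    """Map a number back to its name, or None for values this schema does not know."""
    for name, value in values.items():
        if value == number:
            return name
    return None


# enum Color : byte { Red = 1, Green, Blue }
color = assign_enum_values([("Red", 1), ("Green", None), ("Blue", None)])
```

Here `color` maps `Green` to 2 and `Blue` to 3, and `enum_name(color, 7)` returns `None` rather than failing, which is the kind of fallback forward-compatible code needs.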
| |
| ## Unions |
| |
| Unions share a lot of properties with enums, but instead of new names for |
| constants, you use names of tables. You can then declare a union field, which |
| can hold a reference to any of those types, and additionally a field with the |
| suffix `_type` is generated that holds the corresponding enum value, allowing |
| you to know which type to cast to at runtime. |
| |
| It's possible to give an alias name to a type in a union. This way a type can |
| even be used to mean different things depending on the name used: |
| |
| ```txt |
| table PointPosition { x:uint; y:uint; } |
| table MarkerPosition {} |
| union Position { |
| Start:MarkerPosition, |
| Point:PointPosition, |
| Finish:MarkerPosition |
| } |
| ``` |
| |
| Unions contain a special `NONE` marker to denote that no value is stored, so |
| that name cannot be used as an alias. |
| |
| Unions are a good way to be able to send multiple message types as a FlatBuffer. |
| Note that because a union field is really two fields, it must always be part of |
| a table; it cannot be the root of a FlatBuffer by itself. |
| |
| If you have a need to distinguish between different FlatBuffers in a more |
| open-ended way, for example for use as files, see the file identification |
| feature below. |
| |
| There is experimental support, in C++ only, for a vector of unions (and types). |
| In the example IDL file above, use `[Any]` to add a vector of `Any` to the |
| `Monster` table. There is also experimental support for other types besides |
| tables in unions, in particular structs and strings. There's no direct support |
| for scalars in unions, but they can be wrapped in a struct at no space cost. |
| |
| ## Namespaces |
| |
| These will generate the corresponding namespace in C++ for all helper code, and |
| packages in Java. You can use `.` to specify nested namespaces / packages. |
| |
| ## Includes |
| |
| You can include other schema files in your current one, e.g.: |
| |
| ```txt |
| include "mydefinitions.fbs"; |
| ``` |
| |
| This makes it easier to refer to types defined elsewhere. `include` |
| automatically ensures each file is parsed just once, even when referred to more |
| than once. |
| |
| When using the `flatc` compiler to generate code for schema definitions, only |
| definitions in the current file will be generated, not those from the included |
| files (those you still generate separately). |
| |
| ## Root type |
| |
| This declares what you consider to be the root table of the serialized data. |
| This is particularly important for parsing JSON data, which doesn't include |
| object type information. |
| |
| ## File identification and extension |
| |
| Typically, a FlatBuffer binary buffer is not self-describing, i.e. it needs you |
| to know its schema to parse it correctly. But if you want to use a FlatBuffer as |
| a file format, it would be convenient to be able to have a "magic number" in |
| there, like most file formats have, to be able to do a sanity check to see if |
| you're reading the kind of file you're expecting. |
| |
| Now, you can always prefix a FlatBuffer with your own file header, but |
| FlatBuffers has a built-in way to add an identifier to a FlatBuffer that takes |
| up minimal space, and keeps the buffer compatible with buffers that don't have |
| such an identifier. |
| |
| You can specify in a schema, similar to `root_type`, that you intend for this |
| type of FlatBuffer to be used as a file format: |
| |
| ```txt |
| file_identifier "MYFI"; |
| ``` |
| |
| Identifiers must always be exactly 4 characters long. These 4 characters will |
| end up as bytes at offsets 4-7 (inclusive) in the buffer. |
| |
| For any schema that has such an identifier, `flatc` will automatically add the |
| identifier to any binaries it generates (with `-b`), and generated calls like |
| `FinishMonsterBuffer` also add the identifier. If you have specified an |
| identifier and wish to generate a buffer without one, you can always still do so |
| by calling `FlatBufferBuilder::Finish` explicitly. |
| |
| After loading a buffer, you can use a call like `MonsterBufferHasIdentifier` to |
| check if the identifier is present. |
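
Since the identifier sits at fixed offsets, the check itself is simple to sketch without any library. The helper below is hypothetical (it is not the generated `MonsterBufferHasIdentifier`), and the buffer is a stand-in: a 4-byte root offset, the identifier, and placeholder payload bytes.

```python
import struct


def has_identifier(buf, identifier):
    """True if bytes 4-7 of the buffer equal the 4-character identifier."""
    if len(identifier) != 4:
        raise ValueError("identifiers must be exactly 4 characters long")
    return len(buf) >= 8 and buf[4:8] == identifier.encode("ascii")


# Stand-in buffer: a 4-byte root offset, the identifier, then placeholder payload.
fake_buffer = struct.pack("<I", 12) + b"MYFI" + bytes(8)
```

A check like this is only a sanity test on the first eight bytes; it says nothing about whether the rest of the buffer is valid.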
| |
| Note that this is best for open-ended uses such as files. If you simply wanted |
| to send one of a set of possible messages over a network for example, you'd be |
| better off with a union. |
| |
| Additionally, by default `flatc` will output binary files as `.bin`. This |
| declaration in the schema will change that to whatever you want: |
| |
| ```txt |
| file_extension "ext"; |
| ``` |
| |
| ## RPC interface declarations |
| |
| You can declare RPC calls in a schema, that define a set of functions that take |
| a FlatBuffer as an argument (the request) and return a FlatBuffer as the |
| response (both of which must be table types): |
| |
| ```txt |
| rpc_service MonsterStorage { |
| Store(Monster):StoreResponse; |
| Retrieve(MonsterId):Monster; |
| } |
| ``` |
| |
| What code this produces and how it is used depends on the language and RPC |
| system used. There is preliminary support for gRPC through the `--grpc` code |
| generator; see `grpc/tests` for an example. |
| |
| ## Comments & documentation |
| |
| Comments may be written as in most C-based languages. Additionally, a triple |
| comment (`///`) on a line by itself signals that a comment is documentation for |
| whatever is declared on the line after it |
| (table/struct/field/enum/union/element), and the comment is output in the |
| corresponding C++ code. Multiple such lines per item are allowed. |
| |
| ## Attributes |
| |
| Attributes may be attached to a declaration, behind a field/enum value, or after |
| the name of a table/struct/enum/union. These may either have a value or not. |
| Some attributes like `deprecated` are understood by the compiler; user defined |
| ones need to be declared with the attribute declaration (like `priority` in the |
| example above), and are available to query if you parse the schema at runtime. |
| This is useful if you write your own code generators/editors etc., and you wish |
| to add additional information specific to your tool (such as a help text). |
| |
| Currently understood attributes: |
| |
| - `id: n` (on a table field): manually set the field identifier to `n`. If you |
| use this attribute, you must use it on ALL fields of this table, and the |
| numbers must be a contiguous range from 0 onwards. Additionally, since a union |
| type effectively adds two fields, its id must be that of the second field (the |
| first field is the type field and not explicitly declared in the schema). For |
| example, if the last field before the union field had id 6, the union field |
| should have id 8, and the union's type field will implicitly be 7. IDs allow |
| the fields to be placed in any order in the schema. When a new field is added |
| to the schema it must use the next available ID. |
| - `deprecated` (on a field): do not generate accessors for this field anymore, |
| code should stop using this data. Old data may still contain this field, but |
| it won't be accessible anymore by newer code. Note that if you deprecate a |
| field that was previously required, old code may fail to validate new data (when |
| using the optional verifier). |
| |
| ### `required` |
| |
| - `required` (on a non-scalar table field): this field must always be set. By |
| default, fields do not need to be present in the binary. This is desirable, as |
| it helps with forwards/backwards compatibility, and flexibility of data |
| structures. By specifying this attribute, you make non-presence an error |
| for both reader and writer. The reading code may access the field directly, |
| without checking for null. If the constructing code does not initialize this |
| field, it will get an assert, and also the verifier will fail on buffers |
| that have missing required fields. Both adding and removing this attribute may |
| be forwards/backwards incompatible as readers will be unable to read old or new |
| data, respectively, unless the data happens to always have the field set. |
| - `force_align: size` (on a struct): force the alignment of this struct to be |
| something higher than what it is naturally aligned to. Causes these structs to |
| be aligned to that amount inside a buffer, IF that buffer is allocated with |
| that alignment (which is not necessarily the case for buffers accessed |
| directly inside a `FlatBufferBuilder`). Note: currently not guaranteed to have |
| an effect when used with `--object-api`, since that may allocate objects at |
| alignments less than what you specify with `force_align`. |
| - `force_align: size` (on a vector): force the alignment of this vector to be |
| something different than what the element size would normally dictate. Note: |
| this currently only works for generated C++ code. |
| - `bit_flags` (on an unsigned enum): the values of this field indicate bits, |
| meaning that any unsigned value N specified in the schema will end up |
| representing 1<<N, or if you don't specify values at all, you'll get the |
| sequence 1, 2, 4, 8, ... |
| - `nested_flatbuffer: "table_name"` (on a field): this indicates that the field |
| (which must be a vector of ubyte) contains flatbuffer data, for which the root |
| type is given by `table_name`. The generated code will then produce a |
| convenient accessor for the nested FlatBuffer. |
| - `flexbuffer` (on a field): this indicates that the field (which must be a |
| vector of ubyte) contains flexbuffer data. The generated code will then |
| produce a convenient accessor for the FlexBuffer root. |
| - `key` (on a field): this field is meant to be used as a key when sorting a |
| vector of the type of table it sits in. Can be used for in-place binary |
| search. |
| - `hash` (on a field): this is an (un)signed 32/64-bit integer field, whose |
| value during JSON parsing is allowed to be a string, which will then be stored |
| as its hash. The value of the attribute is the hashing algorithm to use, one of |
| `fnv1_32`, `fnv1_64`, `fnv1a_32`, `fnv1a_64`. |
| - `original_order` (on a table): since elements in a table do not need to be |
| stored in any particular order, they are often optimized for space by sorting |
| them to size. This attribute stops that from happening. There should generally |
| not be any reason to use this flag. |
| - `native_*`: several attributes have been added to support the C++ object |
| based API. All such attributes are prefixed with the term `native_`. |
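
Two of the attributes above involve small computations that are easy to sketch. The code below is illustrative only: `bit_flag_values` is a hypothetical helper that applies the `1<<N` rule from the `bit_flags` entry, and `fnv1a_32` is the textbook 32-bit FNV-1a hash over UTF-8 bytes, on the assumption that this is what the `fnv1a_32` option of `hash` refers to. The enum member names are made up.

```python
def bit_flag_values(members):
    """members: list of (name, explicit_bit_or_None). Returns {name: 1 << bit}."""
    values = {}
    next_bit = 0
    for name, explicit in members:
        bit = next_bit if explicit is None else explicit
        values[name] = 1 << bit  # a schema value N ends up representing 1 << N
        next_bit = bit + 1
    return values


def fnv1a_32(text):
    """Textbook 32-bit FNV-1a over the UTF-8 bytes of text."""
    h = 0x811C9DC5  # FNV offset basis
    for byte in text.encode("utf-8"):
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF  # FNV prime, truncated to 32 bits
    return h


# Hypothetical flags enum: the first two members are unnumbered, the third is bit 4.
flags = bit_flag_values([("Flying", None), ("Swimming", None), ("Armored", 4)])
```

With no explicit values the helper produces 1, 2, 4, 8, and so on, matching the sequence described for `bit_flags`.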
| |
| ## JSON Parsing |
| |
| The same parser that parses the schema declarations above is also able to parse |
| JSON objects that conform to this schema. So, unlike other JSON parsers, this |
| parser is strongly typed, and parses directly into a FlatBuffer (see the |
| compiler documentation on how to do this from the command line, or the C++ |
| documentation on how to do this at runtime). |
| |
| Besides needing a schema, there are a few other changes to how it parses JSON: |
| |
| - It accepts field names with and without quotes, like many JSON parsers already |
| do. It outputs them without quotes as well, though it can be made to output them |
| with quotes using the `strict_json` flag. |
| - If a field has an enum type, the parser will recognize symbolic enum values |
| (with or without quotes) instead of numbers, e.g. `field: EnumVal`. If a field |
| is of integral type, you can still use symbolic names, but values need to be |
| prefixed with their type and need to be quoted, e.g. `field: "Enum.EnumVal"`. |
| For enums representing flags, you may place multiple inside a string separated |
| by spaces to OR them, e.g. `field: "EnumVal1 EnumVal2"` or |
| `field: "Enum.EnumVal1 Enum.EnumVal2"`. |
| - Similarly, for unions, these need to be specified with two fields much like you |
| do when serializing from code. E.g. for a field `foo`, you must add a field |
| `foo_type: FooOne` right before the `foo` field, where `FooOne` would be the |
| table out of the union you want to use. |
| - A field that has the value `null` (e.g. `field: null`) is intended to have the |
| default value for that field (thus has the same effect as if that field wasn't |
| specified at all). |
| - It has some built in conversion functions, so you can write for example |
| `rad(180)` wherever you'd normally write `3.14159`. Currently supports the |
| following functions: `rad`, `deg`, `cos`, `sin`, `tan`, `acos`, `asin`, |
| `atan`. |
| |
| When parsing JSON, it recognizes the following escape codes in strings: |
| |
| - `\n` - linefeed. |
| - `\t` - tab. |
| - `\r` - carriage return. |
| - `\b` - backspace. |
| - `\f` - form feed. |
| - `\"` - double quote. |
| - `\\` - backslash. |
| - `\/` - forward slash. |
| - `\uXXXX` - 16-bit unicode code point, converted to the equivalent UTF-8 |
| representation. |
| - `\xXX` - 8-bit binary hexadecimal number XX. This is the only one that is not |
| in the JSON spec (see http://json.org/), but is needed to be able to encode |
| arbitrary binary in strings to text and back without losing information (e.g. |
| the byte 0xFF can't be represented in standard JSON). |
| |
| It also generates these escape codes back again when generating JSON from a |
| binary representation. |
| |
| When parsing numbers, the parser is more flexible than JSON. The format of |
| numeric literals is closer to that of C/C++. According to the |
| [grammar](grammar.md), it accepts the following numerical literals: |
| |
| - An integer literal can have any number of leading zero `0` digits. Unlike |
| C/C++, the parser ignores a leading zero, not interpreting it as the start of |
| an octal number. The numbers `[081, -00094]` are equal to `[81, -94]` |
| decimal integers. |
| - The parser accepts unsigned and signed hexadecimal integer numbers. For |
| example: `[0x123, +0x45, -0x67]` are equal to `[291, 69, -103]` decimals. |
| - The format of floating-point numbers is fully compatible with C/C++ format. If a |
| modern C++ compiler is used the parser accepts hexadecimal and special |
| floating-point literals as well: |
| `[-1.0, 2., .3e0, 3.e4, 0x21.34p-5, -inf, nan]`. |
| |
| The following conventions for floating-point numbers are used: |
| |
| - The exponent suffix of hexadecimal floating-point number is mandatory. |
| - A parsed `NaN` is converted to an unsigned IEEE-754 `quiet-NaN` value. |
| |
| Extended floating-point support was tested with: |
| |
| - x64 Windows: `MSVC2015` and higher. |
| - x64 Linux: `LLVM 6.0`, `GCC 4.9` and higher. |
| |
| - For compatibility with a JSON lint tool, all numeric literals of scalar fields |
| can be wrapped in quoted strings: |
| `"1", "2.0", "0x48A", "0x0C.0Ep-1", "-inf", "true"`. |
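
The literal rules above can be approximated in a short sketch. This is not the FlatBuffers parser: `parse_int_literal` and `parse_float_literal` are hypothetical helpers built on Python's own number parsing, which happens to accept the same example literals. Unlike the rule stated above, Python's `float.fromhex` does not insist on an exponent suffix.

```python
import re


def parse_int_literal(text):
    """Parse a signed decimal or hexadecimal integer; leading zeros are not octal."""
    match = re.fullmatch(r"([+-]?)(0[xX][0-9a-fA-F]+|[0-9]+)", text.strip())
    if match is None:
        raise ValueError(f"not an integer literal: {text!r}")
    sign, digits = match.groups()
    base = 16 if digits[:2].lower() == "0x" else 10
    value = int(digits, base)
    return -value if sign == "-" else value


def parse_float_literal(text):
    """Parse a decimal or hexadecimal floating-point literal, plus inf and nan."""
    body = text.strip()
    if "0x" in body.lower():
        return float.fromhex(body)  # handles forms such as 0x21.34p-5
    return float(body)  # handles 2., .3e0, 3.e4, -inf, nan
```

For example, `parse_int_literal("081")` yields 81 rather than an octal error, and `parse_int_literal("-0x67")` yields -103, in line with the examples in the list.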
| |
| ## Guidelines |
| |
| ### Efficiency |
| |
| FlatBuffers is all about efficiency, but to realize that efficiency you require |
| an efficient schema. There are usually multiple choices on how to represent data |
| that have vastly different size characteristics. |
| |
| It is very common nowadays to represent any kind of data as dictionaries (as in |
| e.g. JSON), because of its flexibility and extensibility. While it is possible |
| to emulate this in FlatBuffers (as a vector of tables with key and value(s)), |
| this is a bad match for a strongly typed system like FlatBuffers, leading to |
| relatively large binaries. FlatBuffer tables are more flexible than |
| classes/structs in most systems, since having a large number of fields only a few |
| of which are actually used is still efficient. You should thus try to organize |
| your data as much as possible such that you can use tables where you might be |
| tempted to use a dictionary. |
| |
| Similarly, strings as values should only be used when they are truly open-ended. |
| If you can, always use an enum instead. |
| |
| FlatBuffers doesn't have inheritance, so the way to represent a set of related |
| data structures is a union. Unions do have a cost however, so an alternative to |
| a union is to have a single table that has all the fields of all the data |
| structures you are trying to represent, if they are relatively similar / share |
| many fields. Again, this is efficient because non-present fields are cheap. |
| |
| FlatBuffers supports the full range of integer sizes, so try to pick the |
| smallest size needed, rather than defaulting to int/long. |
| |
| Remember that you can share data (refer to the same string/table within a |
| buffer), so factoring out repeating data into its own data structure may be |
| worth it. |
| |
| ### Style guide |
| |
| Identifiers in a schema are meant to translate to many different programming |
| languages, so using the style of your "main" language is generally a bad idea. |
| |
| For this reason, below is a suggested style guide to adhere to, to keep schemas |
| consistent for interoperation regardless of the target language. |
| |
| Where possible, the code generators for specific languages will generate |
| identifiers that adhere to the language style, based on the schema identifiers. |
| |
| - Table, struct, enum and rpc names (types): UpperCamelCase. |
| - Table and struct field names: snake_case. This is translated to lowerCamelCase |
| automatically for some languages, e.g. Java. |
| - Enum values: UpperCamelCase. |
| - Namespaces: UpperCamelCase. |
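
The snake_case to lowerCamelCase translation mentioned in the list can be sketched as below. `snake_to_lower_camel` is a hypothetical helper; the actual code generators may treat edge cases (digits, repeated underscores, reserved words) differently.

```python
def snake_to_lower_camel(name):
    """Convert a snake_case identifier to lowerCamelCase."""
    first, *rest = name.split("_")
    # Keep the first word as is and capitalize the first letter of each later word.
    return first + "".join(part.capitalize() for part in rest)
```

Keeping schema identifiers in snake_case means a mechanical conversion like this one is enough to produce idiomatic names in languages that prefer camel case.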
| |
| Formatting (this is less important, but still worth adhering to): |
| |
| - Opening brace: on the same line as the start of the declaration. |
| - Spacing: Indent by 2 spaces. None around `:` for types, on both sides for `=`. |
| |
| For an example, see the schema at the top of this file. |
| |
| ## Gotchas |
| |
| |
| |
| ### Testing whether a field is present in a table |
| |
| Most serialization formats (e.g. JSON or Protocol Buffers) make it very explicit |
| in the format whether a field is present in an object or not, allowing you to |
| use this as "extra" information. |
| |
| FlatBuffers will not write fields that are equal to their default value, |
| sometimes resulting in significant space savings. However, this also means we |
| cannot disambiguate the meaning of non-presence as "written default value" or |
| "not written at all". This only applies to scalar fields since only they support |
| default values. Unless otherwise specified, their default is 0. |
| |
| If you care about the presence of scalars, most languages support "optional |
| scalars." You can set `null` as the default value in the schema. `null` is a |
| value that's outside of all types, so we will always write if `add_field` is |
| called. The generated field accessor should use the local language's canonical |
| optional type. |
| |
| Some `FlatBufferBuilder` implementations have an option called `force_defaults` |
| that circumvents this "not writing defaults" behavior. You can then use |
| `IsFieldPresent` to query presence. |
| |
| Another option that works in all languages is to wrap a scalar field in a |
| struct. This way it will return null if it is not present. This will be slightly |
| less ergonomic, but structs don't take up any more space than the scalar they |
| represent. |
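
The presence rules in this section can be compared side by side in a toy model. This is a sketch only, not the FlatBuffers API: `write_scalar` and `is_field_present` are hypothetical stand-ins for a builder and a presence query, and a default of `None` plays the role of a `null` schema default.

```python
# Toy model of field presence; not the real FlatBuffers builder or accessors.

def write_scalar(table, name, value, default, force_defaults=False):
    """Store the value unless it equals the default (None marks an optional scalar)."""
    if force_defaults or default is None or value != default:
        table[name] = value


def is_field_present(table, name):
    """True if the field was actually stored."""
    return name in table


plain = {}
write_scalar(plain, "hp", 0, default=0)  # equals the default: not stored

forced = {}
write_scalar(forced, "hp", 0, default=0, force_defaults=True)  # stored anyway

optional = {}
write_scalar(optional, "hp", 0, default=None)  # null default: always stored
```

Only the first case loses the distinction between "written as 0" and "never written"; the other two keep it at the cost of storing the value.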
| |