
 # Schema

 The syntax of the schema language (aka IDL,
 [Interface Definition Language](https://en.wikipedia.org/wiki/Interface_description_language))
 should look quite familiar to users of any of the C family of languages, and
 also to users of other IDLs. Let's look at an example first:

 ```c title="monster.fbs" linenums="1"
 // example IDL file

 namespace MyGame;

 attribute "priority";

 enum Color : byte { Red = 1, Green, Blue }

 union Any { Monster, Weapon, Pickup }

 struct Vec3 {
   x:float;
   y:float;
   z:float;
 }

 table Monster {
   pos:Vec3;
   mana:short = 150;
   hp:short = 100;
   name:string;
   friendly:bool = false (deprecated, priority: 1);
   inventory:[ubyte];
   color:Color = Blue;
   test:Any;
 }

 table Weapon {}
 table Pickup {}

 root_type Monster;
 ```

 ## Tables

 Tables are the main way of defining objects in FlatBuffers.

 ```c title="monster.fbs - Example Table" linenums="17"
 table Monster {
   pos:Vec3;
   mana:short = 150;
   hp:short = 100;
   name:string;
   friendly:bool = false (deprecated, priority: 1);
   inventory:[ubyte];
   color:Color = Blue;
   test:Any;
 }
 ```

 They consist of a name (here `Monster`) and a list of [fields](#fields). This
 field list can be appended to (and deprecated from) while still maintaining
 compatibility.

 ### Fields

 Table fields have a name identifier, a [type](#types), an optional default value,
 optional [attributes](#attributes), and end with a `;`. See the
 [grammar](grammar.md) for full details.

 ```ebnf
 field_decl = ident `:` type [ `=` scalar ] metadata `;`
 ```

 Fields do not have to appear in the wire representation, and you can choose to
 omit fields when constructing an object. You have the flexibility to add fields
 without fear of bloating your data. This design is also FlatBuffers' mechanism
 for forward and backward compatibility.

 There are three mutually exclusive ways a table can react when one of its fields
 is not present in the binary data.

 #### 1. Default

 Default value fields return the default value as defined in the schema. If
 the default value is not specified in the schema, it will be `0` for scalar
 types, or `null` for other types.

 ```c++
 mana:short = 150;
 hp:short;
 inventory:[ubyte];
 ```

 Here `mana` would default to the value `150`, `hp` to value `0`, and `inventory`
 to `null`, if those fields are not set.

 Only scalar values can have explicit defaults; non-scalar fields (strings,
 vectors, tables) are `null` when not present.

 This is the normal mode that fields will take.
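
The following is a minimal C++ model of this behavior, not the generated FlatBuffers API. A hypothetical `FieldSlot` stands for one scalar field in the buffer. The writer skips values equal to the schema default, and the reader substitutes its own schema default when the field is absent. The names and the `std::optional` representation are assumptions made for illustration.

```c++
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical model of one scalar table field on the wire:
// std::nullopt means "not present in the buffer".
using FieldSlot = std::optional<int16_t>;

// Writer side (hypothetical): values equal to the schema default are not stored.
FieldSlot WriteField(int16_t value, int16_t schema_default) {
  if (value == schema_default) return std::nullopt;
  return value;
}

// Reader side (hypothetical): a missing field reads back as the reader's own
// schema default.
int16_t ReadField(const FieldSlot& slot, int16_t schema_default) {
  return slot.value_or(schema_default);
}
```

Take the `mana:short = 150;` field above. Writing `150` stores nothing, and a reader still sees `150`. If a later schema changed the default to `200`, the same buffer would read as `200`, which is the hazard described next.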

 ??? danger "Don't change Default values"

     You generally do not want to change default values after they're initially
     defined. Fields that have the default value are not actually stored in the
     serialized data (see also Gotchas below). If code generated from the old
     version of the schema explicitly writes a value that happens to equal the old
     default, code generated from the new schema will read it as the new default.
     This is less of a problem when converting an optional scalar into a
     default-valued scalar, since non-presence would not be overloaded with a
     previous default value. There are situations, however, where this may be
     desirable, especially if you can ensure a simultaneous rebuild of all code.

 #### 2. Optional

 Optional fields return some form of `null` in the generated code when they are not set.

 === "C++"

     ```c++
     std::optional<T> field;
     ```

 For optional scalars, just set the field's default value to `null`. If the
 producer of the buffer does not explicitly set that field, it reads as
 `null`.

 ```c++
   hp:short = null;
 ```
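
The small C++ model below contrasts the two modes for `hp:short;` and `hp:short = null;`. It is hypothetical and is not generated code; `std::nullopt` stands for "absent from the buffer". With a default of `0`, explicitly writing `0` and never setting the field look the same to a reader. With a `null` default, the reader can tell them apart.

```c++
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical wire model: std::nullopt means "field absent from the buffer".
using Slot = std::optional<int16_t>;

// `hp:short;` (default 0): writing 0 is skipped, so it cannot be told apart
// from never setting the field.
Slot WriteDefaulted(std::optional<int16_t> value) {
  if (!value || *value == 0) return std::nullopt;
  return value;
}
int16_t ReadDefaulted(const Slot& slot) { return slot.value_or(0); }

// `hp:short = null;`: null lies outside the range of `short`, so every
// explicitly set value, including 0, is written and read back as-is.
Slot WriteOptional(std::optional<int16_t> value) { return value; }
std::optional<int16_t> ReadOptional(const Slot& slot) { return slot; }
```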

 !!! note

     Not every language supports optional scalars yet.

 #### 3. Required

 Required fields cause an error if they are not set: the FlatBuffers verifier
 considers the whole buffer invalid.

 This is enabled by the [`required` attribute](#required) on the field.

 ```
   name:string (required);
 ```

 You cannot combine `required` with an explicit default value; doing so results
 in a compiler error.

 ## Structs

 Like a table, a struct consists of fields, but all of them are required (so
 there are no defaults either), and fields may not be added or deprecated.

 ```c title="monster.fbs - Example Struct" linenums="11"
 struct Vec3 {
   x:float;
   y:float;
   z:float;
 }
 ```

 Structs may only contain scalars or other structs. Use this for simple objects
 where you are very sure no changes will ever be made (as is quite clear in the
 example `Vec3`). Structs use less memory than tables and are even faster to
 access (they are always stored in-line in their parent object, and use no
 virtual table).
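
The in-line layout can be checked with a hand-written C++ mirror of `Vec3`. This is an illustration rather than `flatc` output: generated code wraps the same three floats in accessor methods. The assertions assume a mainstream compiler with 4-byte IEEE floats and no padding between them.

```c++
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hand-written mirror of the schema struct `Vec3` (not generated code).
struct Vec3 {
  float x;
  float y;
  float z;
};

// Three floats stored back to back: no header, no vtable, no padding.
static_assert(sizeof(Vec3) == 3 * sizeof(float), "fields stored in-line");
static_assert(offsetof(Vec3, y) == sizeof(float), "y follows x directly");
static_assert(offsetof(Vec3, z) == 2 * sizeof(float), "z follows y directly");
static_assert(std::is_trivially_copyable<Vec3>::value, "copyable as raw bytes");
```

Because readers locate each field at a fixed byte offset, adding or removing a field would shift those offsets for older readers. That is why structs cannot evolve the way tables do.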

 ### Arrays

 Arrays are a convenient shorthand for a fixed-length collection of elements.
 The two declarations below are binary-equivalent:

 <div class="grid cards" markdown>

 - **Normal Syntax**

   ---

   ```c++
   struct Vec3 {
     x:float;
     y:float;
     z:float;
   }
   ```

 - **Array Syntax**

   ---

   ```c++
   struct Vec3 {
     v:[float:3];
   }
   ```

 </div>

 Arrays are currently only supported in a `struct`.
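
The binary equivalence can be illustrated with two hand-written C++ stand-ins for the declarations above. These are not generated code, and the names `Vec3Fields`, `Vec3Array` and `ToArrayForm` are hypothetical. Copying the bytes of one form into the other lines `x`, `y` and `z` up with `v[0]`, `v[1]` and `v[2]`.

```c++
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-ins for the two schema declarations.
struct Vec3Fields { float x; float y; float z; };  // normal syntax
struct Vec3Array { float v[3]; };                  // array syntax: v:[float:3]

static_assert(sizeof(Vec3Fields) == sizeof(Vec3Array), "same size");
static_assert(offsetof(Vec3Fields, z) == 2 * sizeof(float), "z is the third float");

// Copies the raw bytes of the field form into the array form.
Vec3Array ToArrayForm(const Vec3Fields& in) {
  Vec3Array out;
  std::memcpy(&out, &in, sizeof out);
  return out;
}
```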

 ## Types

 The following are the built-in types that can be used in FlatBuffers.

 ### Scalars

 The standard assortment of fixed-size scalars is available. There are no
 variable-sized integers (e.g., `varints`).

 | Size   | Signed            | Unsigned            | Floating Point       |
 | ------ | ----------------- | ------------------- | -------------------- |
 | 8-bit  | `byte`, `bool`    | `ubyte` (`uint8`)   |                      |
 | 16-bit | `short` (`int16`) | `ushort` (`uint16`) |                      |
 | 32-bit | `int` (`int32`)   | `uint` (`uint32`)   | `float` (`float32`)  |
 | 64-bit | `long` (`int64`)  | `ulong` (`uint64`)  | `double` (`float64`) |

 The type names in parentheses are aliases: for example, `uint8` can be used in
 place of `ubyte`, and `int32` can be used in place of `int`, without affecting
 code generation.
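
Each schema scalar corresponds to a fixed-width C++ type of the same size and signedness. Generated C++ code typically uses the `<cstdint>` types directly. The `fbs_*` aliases below are hypothetical and only make the table above checkable; `bool` is left out because C++ does not fix its size.

```c++
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical aliases pairing each schema scalar with a C++ type.
using fbs_byte   = int8_t;    // byte
using fbs_ubyte  = uint8_t;   // ubyte (uint8)
using fbs_short  = int16_t;   // short (int16)
using fbs_ushort = uint16_t;  // ushort (uint16)
using fbs_int    = int32_t;   // int (int32)
using fbs_uint   = uint32_t;  // uint (uint32)
using fbs_long   = int64_t;   // long (int64)
using fbs_ulong  = uint64_t;  // ulong (uint64)
using fbs_float  = float;     // float (float32)
using fbs_double = double;    // double (float64)

static_assert(sizeof(fbs_short) == 2 && std::is_signed<fbs_short>::value, "16-bit signed");
static_assert(sizeof(fbs_ulong) == 8 && std::is_unsigned<fbs_ulong>::value, "64-bit unsigned");
static_assert(sizeof(fbs_float) == 4 && sizeof(fbs_double) == 8, "IEEE-754 sizes");
```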

 ### Non-scalars

 #### Vectors

 A vector of any other type, denoted with `[type]`.

 ```c++
 inventory:[ubyte];
 ```

 !!! note "Nesting vectors"

     Nesting vectors is not supported; instead, you can wrap the inner vector in
     a table.

     ```
     table Nest {
       a:[ubyte];
     }

     table Monster {
       a:[Nest];
     }
     ```

 #### Strings

 Strings (indicated by `string`) are zero-terminated strings, prefixed by their
 length. Strings may only hold UTF-8 or 7-bit ASCII. For other text encodings or
 general binary data use vectors (`[byte]` or `[ubyte]`) instead.

 ```c++
 name:string;
 ```
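
The sketch below gives a rough picture of what "zero-terminated and prefixed by their length" means on the wire by building the bytes of a string payload. It assumes the little-endian 32-bit length prefix FlatBuffers uses. It leaves out alignment padding and the offset stored in the referring table field. `EncodeStringPayload` is a hypothetical helper, not a library function.

```c++
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: a 32-bit little-endian byte count (excluding the
// terminator), then the UTF-8 bytes, then a single 0 byte.
std::vector<uint8_t> EncodeStringPayload(const std::string& utf8) {
  std::vector<uint8_t> out;
  const uint32_t len = static_cast<uint32_t>(utf8.size());
  for (int i = 0; i < 4; ++i) {
    out.push_back(static_cast<uint8_t>(len >> (8 * i)));  // little-endian
  }
  out.insert(out.end(), utf8.begin(), utf8.end());
  out.push_back(0);  // zero terminator
  return out;
}
```

For example, `EncodeStringPayload("Orc")` produces `03 00 00 00 'O' 'r' 'c' 00`. The length prefix gives readers the size without scanning, and the terminator still lets the bytes be used as a C string.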

 ## Enums

 Enums define a sequence of named constants, each with a given value, or increasing by
 one from the previous one. The default first value is `0`. As you can see in the
 enum declaration, you specify the underlying integral type of the enum with `:`
 (in this case `byte`), which then determines the type of any fields declared
 with this enum type.

 Only integer types are allowed, i.e. `byte`, `ubyte`, `short`, `ushort`, `int`,
 `uint`, `long` and `ulong`.

 Typically, enum values should only ever be added, never removed (there is no
 deprecation for enums). This requires code to handle forwards compatibility
 itself, by handling unknown enum values.
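
This advice can be sketched in C++ with a hand-written mirror of the example `Color` enum. Generated code differs in naming and helpers, and `ColorName` is hypothetical. The switch has an explicit fallback, so a reader built from an older schema degrades gracefully when it meets a value added later.

```c++
#include <cassert>
#include <cstdint>
#include <string>

// Hand-written mirror of `enum Color : byte { Red = 1, Green, Blue }`:
// Green and Blue take the next values, 2 and 3.
enum class Color : int8_t { Red = 1, Green = 2, Blue = 3 };

// Hypothetical reader helper: unrecognized values map to a fallback instead
// of being assumed valid.
std::string ColorName(int8_t raw) {
  switch (static_cast<Color>(raw)) {
    case Color::Red:   return "Red";
    case Color::Green: return "Green";
    case Color::Blue:  return "Blue";
  }
  return "Unknown(" + std::to_string(raw) + ")";
}
```

The `static_cast<Color>(raw)` is well-defined for any `int8_t`, because the enum has a fixed underlying type.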

 ## Unions

 Unions share a lot of properties with enums, but instead of new names for
 constants, you use names of tables. You can then declare a union field, which
 can hold a reference to any of those types, and additionally a field with the
 suffix `_type` is generated that holds the corresponding enum value, allowing
 you to know which type to cast to at runtime.
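
The two-field nature of a union can be modeled as below. This is a hand-written sketch, not `flatc` output. It assumes the usual layout, in which the `_type` field is an 8-bit enum that uses `0` for `NONE` and numbers the members from `1` in declaration order. The member structs and `Describe` are hypothetical placeholders.

```c++
#include <cassert>
#include <cstdint>
#include <string>

// Mirror of `union Any { Monster, Weapon, Pickup }`'s type enum.
enum class AnyType : uint8_t { NONE = 0, Monster = 1, Weapon = 2, Pickup = 3 };

// Hypothetical placeholders for the member tables.
struct Monster { std::string name; };
struct Weapon { std::string name; };
struct Pickup {};

// A union field is really two fields: the type tag and an untyped reference.
struct AnyField {
  AnyType type = AnyType::NONE;  // corresponds to `test_type`
  const void* value = nullptr;   // corresponds to `test`
};

// Readers check the tag before casting the reference to a concrete type.
std::string Describe(const AnyField& f) {
  switch (f.type) {
    case AnyType::Monster:
      return "Monster " + static_cast<const Monster*>(f.value)->name;
    case AnyType::Weapon:
      return "Weapon " + static_cast<const Weapon*>(f.value)->name;
    case AnyType::Pickup:
      return "Pickup";
    case AnyType::NONE:
      break;
  }
  return "none";
}
```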

 It's possible to give an alias name to a union member type. This way a type can
 even be used to mean different things depending on the name used:

 ```txt
 table PointPosition { x:uint; y:uint; }
 table MarkerPosition {}
 union Position {
   Start:MarkerPosition,
   Point:PointPosition,
   Finish:MarkerPosition
 }
 ```

 Unions contain a special `NONE` marker to denote that no value is stored, so that
 name cannot be used as an alias.

 Unions are a good way to be able to send multiple message types as a FlatBuffer.
 Note that because a union field is really two fields, it must always be part of
 a table; it cannot be the root of a FlatBuffer by itself.

 If you have a need to distinguish between different FlatBuffers in a more
 open-ended way, for example for use as files, see the file identification
 feature below.

 There is experimental support, in C++ only, for a vector of unions (and of their
 types). In the example IDL file above, use `[Any]` to add a vector of `Any` to the
 `Monster` table. There is also experimental support for types other than tables in
 unions, in particular structs and strings. There's no direct support for scalars
 in unions, but they can be wrapped in a struct at no space cost.

 ## Namespaces

 Namespaces generate the corresponding namespace in C++ for all helper code, and
 packages in Java. You can use `.` to specify nested namespaces / packages.

 ## Includes

 You can include other schema files in your current one, e.g.:

 ```txt
 include "mydefinitions.fbs";
 ```

 This makes it easier to refer to types defined elsewhere. `include`
 automatically ensures each file is parsed just once, even when referred to more
 than once.

 When using the `flatc` compiler to generate code for schema definitions, only
 definitions in the current file will be generated, not those from the included
 files (you still generate those separately).

 ## Root type

 This declares what you consider to be the root table of the serialized data.
 This is particularly important for parsing JSON data, which doesn't include
 object type information.

 ## File identification and extension

 Typically, a FlatBuffer binary buffer is not self-describing, i.e. it needs you
 to know its schema to parse it correctly. But if you want to use a FlatBuffer as
 a file format, it would be convenient to be able to have a "magic number" in
 there, like most file formats have, to be able to do a sanity check to see if
 you're reading the kind of file you're expecting.

 Now, you can always prefix a FlatBuffer with your own file header, but
 FlatBuffers has a built-in way to add an identifier to a FlatBuffer that takes
 up minimal space, and keeps the buffer compatible with buffers that don't have
 such an identifier.

 You can specify in a schema, similar to `root_type`, that you intend for this
 type of FlatBuffer to be used as a file format:

 ```txt
 file_identifier "MYFI";
 ```

 Identifiers must always be exactly 4 characters long. These 4 characters will
 end up as bytes at offsets 4-7 (inclusive) in the buffer.

 For any schema that has such an identifier, `flatc` will automatically add the
 identifier to any binaries it generates (with `-b`), and generated calls like
 `FinishMonsterBuffer` also add the identifier. If you have specified an
 identifier and wish to generate a buffer without one, you can always still do so
 by calling `FlatBufferBuilder::Finish` explicitly.

 After loading a buffer, you can use a call like `MonsterBufferHasIdentifier` to
 check if the identifier is present.
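
To make the offsets concrete, here is a stand-alone C++ sketch of such a check. `HasFileIdentifier` is hypothetical and is not the generated function. It assumes a plain buffer that starts with its 4-byte root offset. Size-prefixed buffers place an extra 4-byte length in front, and this sketch does not handle them.

```c++
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical check: bytes 0-3 hold the root offset, bytes 4-7 the
// identifier. Size-prefixed buffers are not handled here.
bool HasFileIdentifier(const unsigned char* buf, std::size_t size,
                       const char* id) {
  if (size < 8) return false;  // too small for root offset + identifier
  return std::memcmp(buf + 4, id, 4) == 0;
}
```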

 Note that this is best for open-ended uses such as files. If you simply wanted
 to send one of a set of possible messages over a network for example, you'd be
 better off with a union.

 Additionally, by default `flatc` will output binary files as `.bin`. This
 declaration in the schema will change that to whatever you want:

 ```txt
 file_extension "ext";
 ```

 ## RPC interface declarations

 You can declare RPC calls in a schema to define a set of functions that take
 a FlatBuffer as an argument (the request) and return a FlatBuffer as the
 response (both of which must be table types):

 ```txt
 rpc_service MonsterStorage {
     Store(Monster):StoreResponse;
     Retrieve(MonsterId):Monster;
 }
 ```

 What code this produces and how it is used depends on the language and RPC system
 used. There is preliminary support for gRPC through the `--grpc` code generator;
 see `grpc/tests` for an example.

 ## Comments & documentation

 Comments may be written as in most C-based languages. Additionally, a triple comment
 (`///`) on a line by itself signals that a comment is documentation for whatever
 is declared on the line after it (table/struct/field/enum/union/element), and
 the comment is output in the corresponding C++ code. Multiple such lines per
 item are allowed.

 ## Attributes

 Attributes may be attached to a declaration, after a field/enum value, or after
 the name of a table/struct/enum/union. These may either have a value or not.
 Some attributes like `deprecated` are understood by the compiler; user defined
 ones need to be declared with the attribute declaration (like `priority` in the
 example above), and are available to query if you parse the schema at runtime.
 This is useful if you write your own code generators/editors etc., and you wish
 to add additional information specific to your tool (such as a help text).

 Currently understood attributes:

 - `id: n` (on a table field): manually set the field identifier to `n`. If you
   use this attribute, you must use it on ALL fields of this table, and the
   numbers must be a contiguous range from 0 onwards. Additionally, since a union
   type effectively adds two fields, its id must be that of the second field (the
   first field is the type field and not explicitly declared in the schema). For
   example, if the last field before the union field had id 6, the union field
   should have id 8, and the union's type field will implicitly be 7. IDs allow
   the fields to be placed in any order in the schema. When a new field is added
   to the schema it must use the next available ID.
 - `deprecated` (on a field): do not generate accessors for this field anymore,
   code should stop using this data. Old data may still contain this field, but
   it won't be accessible anymore by newer code. Note that if you deprecate a
   field that was previously required, old code may fail to validate new data (when
   using the optional verifier).
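
The numbering rule for unions is easy to get wrong by hand. The hypothetical C++ helper below reproduces it: ids count up from 0 in declaration order, and each union takes two consecutive ids, with the hidden `_type` field first. It illustrates the rule and is not part of `flatc`.

```c++
#include <cassert>
#include <string>
#include <vector>

// Hypothetical description of one declared table field.
struct FieldDecl {
  std::string name;
  bool is_union = false;
};

// Returns the id each declared field gets (or must be given with `id: n`).
// A union consumes two ids: its implicit `<name>_type` field first, then
// the union value itself.
std::vector<int> AssignIds(const std::vector<FieldDecl>& fields) {
  std::vector<int> ids;
  int next = 0;
  for (const FieldDecl& field : fields) {
    if (field.is_union) ++next;  // this id goes to the hidden `_type` field
    ids.push_back(next++);
  }
  return ids;
}
```

The `Monster` table at the top of this page has no `id` attributes, so declaration order applies. `color` gets id 6 and the union `test` gets id 8, with `test_type` implicitly at 7, which matches the rule above. Deprecated fields such as `friendly` still keep their ids.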

 ### `required`

 - `required` (on a non-scalar table field): this field must always be set. By
   default, fields do not need to be present in the binary. This is desirable, as
   it helps with forwards/backwards compatibility, and flexibility of data
   structures. By specifying this attribute, you make non-presence an error
   for both reader and writer. The reading code may access the field directly,
   without checking for null. If the constructing code does not initialize this
   field, it will trigger an assert, and the verifier will also fail on buffers
   that have missing required fields. Both adding and removing this attribute may
   be forwards/backwards incompatible, as readers will be unable to read old or new
   data, respectively, unless the data happens to always have the field set.
 - `force_align: size` (on a struct): force the alignment of this struct to be
   something higher than what it is naturally aligned to. Causes these structs to
   be aligned to that amount inside a buffer, IF that buffer is allocated with
   that alignment (which is not necessarily the case for buffers accessed
   directly inside a `FlatBufferBuilder`). Note: currently not guaranteed to have
   an effect when used with `--object-api`, since that may allocate objects at
   alignments less than what you specify with `force_align`.
 - `force_align: size` (on a vector): force the alignment of this vector to be
   something different than what the element size would normally dictate. Note:
   this currently only works for generated C++ code.
 - `bit_flags` (on an unsigned enum): the values of this field indicate bits,
   meaning that any unsigned value N specified in the schema will end up
   representing 1<<N, or if you don't specify values at all, you'll get the
   sequence 1, 2, 4, 8, ...
 - `nested_flatbuffer: "table_name"` (on a field): this indicates that the field
   (which must be a vector of ubyte) contains flatbuffer data, for which the root
   type is given by `table_name`. The generated code will then produce a
   convenient accessor for the nested FlatBuffer.
 - `flexbuffer` (on a field): this indicates that the field (which must be a
   vector of ubyte) contains flexbuffer data. The generated code will then
   produce a convenient accessor for the FlexBuffer root.
 - `key` (on a field): this field is meant to be used as a key when sorting a
   vector of the type of table it sits in. Can be used for in-place binary
   search.
 - `hash` (on a field): this is an (un)signed 32/64 bit integer field, whose
   value during JSON parsing is allowed to be a string, which will then be stored
   as its hash. The value of the attribute is the hashing algorithm to use, one of
   `fnv1_32`, `fnv1_64`, `fnv1a_32`, `fnv1a_64`.
 - `original_order` (on a table): since elements in a table do not need to be
   stored in any particular order, they are often optimized for space by sorting
   them by size. This attribute stops that from happening. There should generally
   not be any reason to use this flag.
 - `native_*`: several attributes have been added to support the C++ object-based
   API. All such attributes are prefixed with `native_`.
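
Two of these attributes boil down to small formulas, sketched below in C++. `FlagValue` restates the `bit_flags` rule. `Fnv1a32` is the standard 32-bit FNV-1a hash, on the assumption that the `fnv1a_32` option hashes the string's raw bytes with the standard algorithm. Both function names are hypothetical.

```c++
#include <cassert>
#include <cstdint>
#include <string>

// `bit_flags` (hypothetical helper): a schema value N stands for the bit
// 1 << N (N < 32 here), so an enum with no explicit values gets 1, 2, 4, 8, ...
constexpr uint32_t FlagValue(unsigned n) { return 1u << n; }

// `hash: "fnv1a_32"` (hypothetical helper): standard 32-bit FNV-1a with
// offset basis 2166136261 and prime 16777619.
uint32_t Fnv1a32(const std::string& s) {
  uint32_t hash = 2166136261u;
  for (unsigned char c : s) {
    hash ^= c;          // FNV-1a: xor the byte in first...
    hash *= 16777619u;  // ...then multiply, wrapping modulo 2^32
  }
  return hash;
}
```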

 ## JSON Parsing

 The same parser that parses the schema declarations above is also able to parse
 JSON objects that conform to this schema. So, unlike other JSON parsers, this
 parser is strongly typed, and parses directly into a FlatBuffer (see the
 compiler documentation on how to do this from the command line, or the C++
 documentation on how to do this at runtime).

 Besides needing a schema, there are a few other changes to how it parses JSON:

 - It accepts field names with and without quotes, like many JSON parsers already
   do. It outputs them without quotes as well, though can be made to output them
   using the `strict_json` flag.
 - If a field has an enum type, the parser will recognize symbolic enum values
   (with or without quotes) instead of numbers, e.g. `field: EnumVal`. If a field
   is of integral type, you can still use symbolic names, but values need to be
   prefixed with their type and need to be quoted, e.g. `field: "Enum.EnumVal"`.
   For enums representing flags, you may place multiple inside a string separated
   by spaces to OR them, e.g. `field: "EnumVal1 EnumVal2"` or
   `field: "Enum.EnumVal1 Enum.EnumVal2"`.
 - Similarly, for unions, these need to be specified with two fields much like you
   do when serializing from code. E.g. for a field `foo`, you must add a field
   `foo_type: FooOne` right before the `foo` field, where `FooOne` would be the
   table out of the union you want to use.
 - A field that has the value `null` (e.g. `field: null`) is intended to have the
   default value for that field (thus has the same effect as if that field wasn't
   specified at all).
 - It has some built in conversion functions, so you can write for example
   `rad(180)` wherever you'd normally write `3.14159`. Currently supports the
   following functions: `rad`, `deg`, `cos`, `sin`, `tan`, `acos`, `asin`,
   `atan`.

 When parsing JSON, it recognizes the following escape codes in strings:

 - `\n` - linefeed.
 - `\t` - tab.
 - `\r` - carriage return.
 - `\b` - backspace.
 - `\f` - form feed.
 - `\"` - double quote.
 - `\\` - backslash.
 - `\/` - forward slash.
 - `\uXXXX` - 16-bit Unicode code point, converted to the equivalent UTF-8
   representation.
 - `\xXX` - 8-bit binary hexadecimal number XX. This is the only one that is not
   in the JSON spec (see http://json.org/), but is needed to be able to encode
   arbitrary binary in strings to text and back without losing information (e.g.
   the byte 0xFF can't be represented in standard JSON).

 It also generates these escape codes back again when generating JSON from a
 binary representation.
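
The escape rules above can be sketched as a small C++ decoder for the body of a JSON string. This is simplified code written for illustration, not `flatc`'s parser, and `DecodeEscapes`, `HexValue` and `AppendUtf8` are hypothetical names. It treats `\xXX` as one raw byte and `\uXXXX` as a single code point encoded to UTF-8. It does not combine UTF-16 surrogate pairs.

```c++
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Appends code point `cp` (at most 0xFFFF) to `out` as UTF-8.
void AppendUtf8(std::string& out, uint32_t cp) {
  if (cp < 0x80) {
    out += static_cast<char>(cp);
  } else if (cp < 0x800) {
    out += static_cast<char>(0xC0 | (cp >> 6));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else {
    out += static_cast<char>(0xE0 | (cp >> 12));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  }
}

// Returns the value of one hex digit, or throws if `h` is not a hex digit.
uint32_t HexValue(char h) {
  if (h >= '0' && h <= '9') return h - '0';
  if (h >= 'a' && h <= 'f') return h - 'a' + 10;
  if (h >= 'A' && h <= 'F') return h - 'A' + 10;
  throw std::invalid_argument("bad hex digit");
}

// Decodes the listed escape codes in the text between a string's quotes.
std::string DecodeEscapes(const std::string& in) {
  std::string out;
  for (std::size_t i = 0; i < in.size(); ++i) {
    if (in[i] != '\\') {
      out += in[i];
      continue;
    }
    if (++i == in.size()) throw std::invalid_argument("dangling backslash");
    switch (in[i]) {
      case 'n': out += '\n'; break;
      case 't': out += '\t'; break;
      case 'r': out += '\r'; break;
      case 'b': out += '\b'; break;
      case 'f': out += '\f'; break;
      case '"': out += '"'; break;
      case '\\': out += '\\'; break;
      case '/': out += '/'; break;
      case 'x':    // \xXX: one raw byte, possibly not valid UTF-8
      case 'u': {  // \uXXXX: one code point, re-encoded as UTF-8
        const std::size_t digits = (in[i] == 'x') ? 2 : 4;
        if (i + digits >= in.size()) throw std::invalid_argument("short escape");
        uint32_t value = 0;
        for (std::size_t k = 1; k <= digits; ++k) value = value * 16 + HexValue(in[i + k]);
        if (digits == 2) {
          out += static_cast<char>(value);
        } else {
          AppendUtf8(out, value);
        }
        i += digits;  // the loop's ++i then moves past the last hex digit
        break;
      }
      default:
        throw std::invalid_argument("unknown escape");
    }
  }
  return out;
}
```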

 When parsing numbers, the parser is more flexible than JSON. The format of numeric
 literals is closer to C/C++. According to the
 [grammar](grammar.md), it accepts the following numerical literals:

 - An integer literal can have any number of leading zero `0` digits. Unlike
   C/C++, the parser ignores a leading zero, not interpreting it as the beginning
   of an octal number. The numbers `[081, -00094]` are equal to `[81, -94]`
   decimal integers.
 - The parser accepts unsigned and signed hexadecimal integer numbers. For
   example: `[0x123, +0x45, -0x67]` are equal to `[291, 69, -103]` decimals.
 - The format of floating-point numbers is fully compatible with C/C++ format. If a
   modern C++ compiler is used the parser accepts hexadecimal and special
   floating-point literals as well:
   `[-1.0, 2., .3e0, 3.e4, 0x21.34p-5, -inf, nan]`.

   The following conventions for floating-point numbers are used:

   - The exponent suffix of hexadecimal floating-point number is mandatory.
   - A parsed `NaN` is converted to an unsigned IEEE-754 quiet NaN value.

   Extended floating-point support was tested with:

   - x64 Windows: `MSVC2015` and higher.
   - x64 Linux: `LLVM 6.0`, `GCC 4.9` and higher.

 - For compatibility with JSON lint tools, all numeric literals of scalar fields
   can be wrapped in quoted strings:
   `"1", "2.0", "0x48A", "0x0C.0Ep-1", "-inf", "true"`.
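
The integer rules can be approximated with the C standard library, as in this hypothetical `ParseIntLiteral` sketch. It detects a `0x` prefix after an optional sign and otherwise forces base 10, so leading zeros are not taken as octal. It ignores overflow and floating-point literals.

```c++
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical helper: parses a whole integer literal; returns false if any
// trailing characters remain. Overflow is not checked.
bool ParseIntLiteral(const std::string& text, long long* out) {
  const std::size_t pos = (!text.empty() && (text[0] == '+' || text[0] == '-')) ? 1 : 0;
  const bool hex = text.size() > pos + 1 && text[pos] == '0' &&
                   (text[pos + 1] == 'x' || text[pos + 1] == 'X');
  // Base 16 lets strtoll consume the sign and the 0x prefix itself; base 10
  // (rather than 0) keeps "081" from being treated as octal.
  char* end = nullptr;
  *out = std::strtoll(text.c_str(), &end, hex ? 16 : 10);
  return !text.empty() && end == text.c_str() + text.size();
}
```

Passing base `0` to `std::strtoll` instead would reproduce the C behavior the parser deliberately avoids: `"081"` would be read as `0`, stopping before the `8`.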

 ## Guidelines

 ### Efficiency

 FlatBuffers is all about efficiency, but to realize that efficiency you require
 an efficient schema. There are usually multiple choices on how to represent data
 that have vastly different size characteristics.

 It is very common nowadays to represent any kind of data as dictionaries (as in
 e.g. JSON), because of its flexibility and extensibility. While it is possible
 to emulate this in FlatBuffers (as a vector of tables with key and value(s)),
 this is a bad match for a strongly typed system like FlatBuffers, leading to
 relatively large binaries. FlatBuffer tables are more flexible than
 classes/structs in most systems, since having a large number of fields only few
 of which are actually used is still efficient. You should thus try to organize
 your data as much as possible such that you can use tables where you might be
 tempted to use a dictionary.

 Similarly, strings as values should only be used when they are truly open-ended.
 If you can, always use an enum instead.

 FlatBuffers doesn't have inheritance, so the way to represent a set of related
 data structures is a union. Unions do have a cost however, so an alternative to
 a union is to have a single table that has all the fields of all the data
 structures you are trying to represent, if they are relatively similar / share
 many fields. Again, this is efficient because non-present fields are cheap.

 FlatBuffers supports the full range of integer sizes, so try to pick the
 smallest size needed, rather than defaulting to int/long.

 Remember that you can share data (refer to the same string/table within a
 buffer), so factoring out repeating data into its own data structure may be
 worth it.

 ### Style guide

 Identifiers in a schema are meant to translate to many different programming
 languages, so using the style of your "main" language is generally a bad idea.

 For this reason, below is a suggested style guide to adhere to, to keep schemas
 consistent for interoperation regardless of the target language.

 Where possible, the code generators for specific languages will generate
 identifiers that adhere to the language style, based on the schema identifiers.

 - Table, struct, enum and rpc names (types): UpperCamelCase.
 - Table and struct field names: snake_case. This is translated to lowerCamelCase
   automatically for some languages, e.g. Java.
 - Enum values: UpperCamelCase.
 - namespaces: UpperCamelCase.

 Formatting (this is less important, but still worth adhering to):

 - Opening brace: on the same line as the start of the declaration.
 - Spacing: Indent by 2 spaces. None around `:` for types, on both sides for `=`.

 For an example, see the schema at the top of this file.

 ## Gotchas


 ### Testing whether a field is present in a table

 Most serialization formats (e.g. JSON or Protocol Buffers) make it very explicit
 in the format whether a field is present in an object or not, allowing you to
 use this as "extra" information.

 FlatBuffers will not write fields that are equal to their default value,
 sometimes resulting in significant space savings. However, this also means we
 cannot disambiguate the meaning of non-presence as "written default value" or
 "not written at all". This only applies to scalar fields since only they support
 default values. Unless otherwise specified, their default is 0.

 If you care about the presence of scalars, most languages support "optional
 scalars." You can set `null` as the default value in the schema. `null` is a
 value that's outside of all types, so the field is always written if `add_field` is
 called. The generated field accessor should use the local language's canonical
 optional type.

 Some `FlatBufferBuilder` implementations have an option called `force_defaults`
 that circumvents this "not writing defaults" behavior; you can then use
 `IsFieldPresent` to query presence. Another option that works in all languages
 is to wrap a scalar field in a struct. This way it will return null if it is not
 present. This will be slightly less ergonomic, but structs don't take up any more
 space than the scalar they represent.