| :orphan: |
| |
| ================================= |
| Swift Binary Serialization Format |
| ================================= |
| |
| The fundamental unit of distribution for Swift code is a *module.* A module |
| contains declarations as an interface for clients to write code against. It may |
| also contain implementation information for any of these declarations that can |
| be used to optimize client code. Conceptually, the file containing the |
| interface for a module serves much the same purpose as the collection of C |
| header files for a particular library. |
| |
| Swift's binary serialization format is currently used for several purposes: |
| |
| - The public interface for a module ("swiftmodule files"). |
| |
| - A representation of captured compiler state after semantic analysis and SIL |
| generation, but before LLVM IR generation ("SIB", for "Swift Intermediate |
| Binary"). |
| |
| - Debug information about types, for proper high-level introspection without |
| running code. |
| |
| - Debug information about non-public APIs, for interactive debugging. |
| |
| The first two uses require a module to serve as a container of both AST nodes |
| and SIL entities. As a unit of distribution, it should also be |
| forward-compatible: module files installed on a developer's system in 201X |
| should be usable without updates for years to come, even as the Swift compiler |
| continues to be improved and enhanced. However, they are currently too closely |
| tied to the compiler internals to be useful for this purpose, and it is likely |
| we'll invent a new format instead. |
| |
| |
| Why LLVM bitcode? |
| ================= |
| |
| The `LLVM bitstream <http://llvm.org/docs/BitCodeFormat.html>`_ format was |
| invented as a container format for LLVM IR. It is a binary format supporting |
| two basic structures: *blocks,* which define regions of the file, and |
| *records,* which contain data fields that can be up to 64 bits. It has a few |
| nice properties that make it a useful container format for Swift modules as |
| well: |
| |
| - It is easy to skip over an entire block, because the block's length is |
| recorded at its start. |
| |
| - It is possible to jump to specific offsets *within* a block without having to |
| reparse from the start of the block. |
| |
| - A format change doesn't immediately invalidate existing bitstream files, |
| because the stream includes layout information for each record. |
| |
| - It's a binary format, so it's at least *somewhat* compact. [I haven't done a |
| size comparison against other formats.] |
| |
| If we were to switch to another container format, we would likely want it to |
| have most of these properties as well. But we're already linking against |
| LLVM...might as well use it! |
| |
| |
| Versioning |
| ========== |
| |
| .. warning:: |
| |
| This section is relevant to any forward-compatible format used for a |
| library's public interface. However, as mentioned above this may not be |
| the current binary serialization format. |
| |
| Today's Swift uses a "major" version number of 0 and an always-incrementing |
| "minor" version number. Every change is treated as compatibility-breaking; |
| the minor version must match exactly for the compiler to load the module. |
| |
| Persistent serialized Swift files use the following versioning scheme: |
| |
| - Serialized modules are given a major and minor version number. |
| |
| - When making a backwards-compatible change, the major and the minor version |
| number both MUST NOT be incremented. |
| |
| - When making a change such that new modules cannot be safely loaded by older |
| compilers, the minor version number MUST be incremented. |
| |
| - When making a change such that *old* modules cannot be safely loaded by |
| *newer* compilers, the major version number MUST be incremented. The minor |
| version number MUST then be reset to zero. |
| |
| - Ideally, the major version number is never incremented. |
| |
| A serialized file's version number is checked against the client's supported |
| version before it is loaded. If it is too old or too new, the file cannot be |
| loaded. |
| |
| Note that the version number describes the contents of the file. Thus, if a |
| compiler supports features introduced in file version 1.9, but a particular |
| module only uses features introduced before and in version 1.7, the compiler |
| MAY serialize that module with the version number 1.7. However, doing so |
| requires extra work on the compiler's part to detect which features are in use; |
| a simpler implementation would just use the latest version number supported: |
| 1.9. |
| |
| *This versioning scheme was inspired by* `Semantic Versioning |
| <http://semver.org>`_. *However, it is not compatible with Semantic Versioning |
| because it promises* forward-compatibility *rather than* backward-compatibility. |
| |
| |
| A High-Level Tour of the Current Module Format |
| ============================================== |
| |
| Every serialized module is represented as a single block called the "module |
| block". The module block is made up of several other block kinds, largely for |
| organizational purposes. |
| |
| - The **block info block** is a standard LLVM bitcode block that contains |
| metadata about the bitcode stream. It is the only block that appears outside |
| the module block; we always put it at the very start of the file. Though it |
| can contain actual semantic information, our use of it is only for debugging |
| purposes. |
| |
| - The **control block** is always the first block in the module block. It can |
| be processed without loading the rest of the module, and indeed is intended |
| to allow clients to decide whether not the module is compatible with the |
| current AST context. The major and minor version numbers of the format are |
| stored here. |
| |
| - The **input block** contains information about how to import the module once |
| the client has decided to load it. This includes the list of other modules |
| that this module depends on. |
| |
| - The **SIL block** contains SIL-level implementations that can be imported |
| into a client's SILModule context. In most cases this is just a performance |
| concern, but sometimes it affects language semantics as well, as in the case |
| of ``@_transparent``. The SIL block precedes the AST block because it affects |
| which AST nodes get serialized. |
| |
| - The **SIL index black** contains tables for accessing various SIL entities by |
| their names, along with a mapping of unique IDs for these to the appropriate |
| bit offsets into the SIL block. |
| |
| - The **AST block** contains the serialized forms of Decl, DeclContext, and |
| Type AST nodes. Decl nodes may be cross-references to other modules, while |
| types are always serialized with enough info to regenerate them at load time. |
| Nodes are accessed by a file-unique "DeclIDs" (also covering DeclContexts) |
| and "TypeIDs"; the two sets of IDs use separate numbering schemes. |
| |
| .. note:: |
| |
| The AST block is currently referred to as the "decls block" in the source. |
| |
| - The **identifier block** contains a single blob of strings. This is intended |
| for Identifiers---strings uniqued by the ASTContext---but can in theory |
| support any string data. The strings are accessed by a file-unique |
| "IdentifierID". |
| |
| - The **index block** contains mappings from the AST node and identifier IDs to |
| their offsets in the AST block or identifier block (as appropriate). It also |
| contains various top-level AST information about the module, such as its |
| top-level declarations. |
| |
| |
| SIL |
| === |
| |
| [to be written] |
| |
| |
| Cross-reference resilience |
| ========================== |
| |
| [to be written] |