| :orphan: |
| |
| .. @raise litre.TestsAreMissing |
| .. _ABI: |
| |
| .. highlight:: none |
| |
| The Swift ABI |
| ============= |
| |
| .. contents:: |
| |
| Hard Constraints on Resilience |
| ------------------------------ |
| |
| The root of a class hierarchy must remain stable, at pain of |
| invalidating the metaclass hierarchy. Note that a Swift class without an |
| explicit base class is implicitly rooted in the SwiftObject |
| Objective-C class. |
| |
| Type Layout |
| ----------- |
| |
| Fragile Struct and Tuple Layout |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| Structs and tuples currently share the same layout algorithm, noted as the |
| "Universal" layout algorithm in the compiler implementation. The algorithm |
| is as follows: |
| |
| - Start with a **size** of **0** and an **alignment** of **1**. |
| - Iterate through the fields, in element order for tuples, or in ``var`` |
| declaration order for structs. For each field: |
| |
| * Update **size** by rounding up to the **alignment of the field**, that is, |
| increasing it to the least value greater or equal to **size** and evenly |
| divisible by the **alignment of the field**. |
| * Assign the **offset of the field** to the current value of **size**. |
| * Update **size** by adding the **size of the field**. |
| * Update **alignment** to the max of **alignment** and the |
| **alignment of the field**. |
| |
| - The final **size** and **alignment** are the size and alignment of the |
| aggregate. The **stride** of the type is the final **size** rounded up to |
| **alignment**. |
| |
| Note that this differs from C or LLVM's normal layout rules in that *size* |
| and *stride* are distinct; whereas C layout requires that an embedded struct's |
| size be padded out to its alignment and that nothing be laid out there, |
| Swift layout allows an outer struct to lay out fields in the inner struct's |
| tail padding, alignment permitting. Unlike C, zero-sized structs and tuples |
| are also allowed, and take up no storage in enclosing aggregates. The Swift |
| compiler emits LLVM packed struct types with manual padding to get the |
| necessary control over the binary layout. Some examples: |
| |
| :: |
| |
| // LLVM <{ i64, i8 }> |
| struct S { |
| var x: Int |
| var y: UInt8 |
| } |
| |
| // LLVM <{ i8, [7 x i8], <{ i64, i8 }>, i8 }> |
| struct S2 { |
| var x: UInt8 |
| var s: S |
| var y: UInt8 |
| } |
| |
| // LLVM <{}> |
| struct Empty {} |
| |
| // LLVM <{ i64, i64 }> |
| struct ContainsEmpty { |
| var x: Int |
| var y: Empty |
| var z: Int |
| } |
| |
| Class Layout |
| ~~~~~~~~~~~~ |
| |
| Swift relies on the following assumptions about the Objective-C runtime, |
| which are therefore now part of the Objective-C ABI: |
| |
| - 32-bit platforms never have tagged pointers. ObjC pointer types are |
| either nil or an object pointer. |
| |
| - On x86-64, a tagged pointer either sets the lowest bit of the pointer |
| or the highest bit of the pointer. Therefore, both of these bits are |
| zero if and only if the value is not a tagged pointer. |
| |
| - On ARM64, a tagged pointer always sets the highest bit of the pointer. |
| |
| - 32-bit platforms never perform any isa masking. ``object_getClass`` |
| is always equivalent to ``*(Class*)object``. |
| |
| - 64-bit platforms perform isa masking only if the runtime exports a |
| symbol ``uintptr_t objc_debug_isa_class_mask;``. If this symbol |
| is exported, ``object_getClass`` on a non-tagged pointer is always |
| equivalent to ``(Class)(objc_debug_isa_class_mask & *(uintptr_t*)object)``. |
| |
| - The superclass field of a class object is always stored immediately |
| after the isa field. Its value is either nil or a pointer to the |
| class object for the superclass; it never has other bits set. |
| |
| The following assumptions are part of the Swift ABI: |
| |
| - Swift class pointers are never tagged pointers. |
| |
| TODO |
| |
| Fragile Enum Layout |
| ~~~~~~~~~~~~~~~~~~~ |
| |
| In laying out enum types, the ABI attempts to avoid requiring additional |
| storage to store the tag for the enum case. The ABI chooses one of five |
| strategies based on the layout of the enum: |
| |
| Empty Enums |
| ``````````` |
| |
| In the degenerate case of an enum with no cases, the enum is an empty type. |
| |
| :: |
| |
| enum Empty {} // => empty type |
| |
| Single-Case Enums |
| ````````````````` |
| |
| In the degenerate case of an enum with a single case, there is no |
| discriminator needed, and the enum type has the exact same layout as its |
| case's data type, or is empty if the case has no data type. |
| |
| :: |
| |
| enum EmptyCase { case X } // => empty type |
| enum DataCase { case Y(Int, Double) } // => LLVM <{ i64, double }> |
| |
| C-Like Enums |
| ```````````` |
| |
| If none of the cases has a data type (a "C-like" enum), then the enum |
| is laid out as an integer tag with the minimal number of bits to contain |
| all of the cases. The machine-level layout of the type then follows LLVM's |
| data layout rules for integer types on the target platform. The cases are |
| assigned tag values in declaration order. |
| |
| :: |
| |
| enum EnumLike2 { // => LLVM i1 |
| case A // => i1 0 |
| case B // => i1 1 |
| } |
| |
| enum EnumLike8 { // => LLVM i3 |
| case A // => i3 0 |
| case B // => i3 1 |
| case C // => i3 2 |
| case D // etc. |
| case E |
| case F |
| case G |
| case H |
| } |
| |
| Discriminator values after the one used for the last case become *extra |
| inhabitants* of the enum type (see `Single-Payload Enums`_). |
| |
| Single-Payload Enums |
| ```````````````````` |
| |
| If an enum has a single case with a data type and one or more no-data cases |
| (a "single-payload" enum), then the case with data type is represented using |
| the data type's binary representation, with added zero bits for tag if |
| necessary. If the data type's binary representation |
| has **extra inhabitants**, that is, bit patterns with the size and alignment of |
| the type but which do not form valid values of that type, they are used to |
| represent the no-data cases, with extra inhabitants in order of ascending |
| numeric value matching no-data cases in declaration order. If the type |
| has *spare bits* (see `Multi-Payload Enums`_), they are used to form extra |
| inhabitants. The enum value is then represented as an integer with the storage |
| size in bits of the data type. Extra inhabitants of the payload type not used |
| by the enum type become extra inhabitants of the enum type itself. |
| |
| :: |
| |
| enum CharOrSectionMarker { => LLVM i32 |
| case Paragraph => i32 0x0020_0000 |
| case Char(UnicodeScalar) => i32 (zext i21 %Char to i32) |
| case Chapter => i32 0x0020_0001 |
| } |
| |
| CharOrSectionMarker.Char('\x00') => i32 0x0000_0000 |
| CharOrSectionMarker.Char('\u10FFFF') => i32 0x0010_FFFF |
| |
| enum CharOrSectionMarkerOrFootnoteMarker { => LLVM i32 |
| case CharOrSectionMarker(CharOrSectionMarker) => i32 %CharOrSectionMarker |
| case Asterisk => i32 0x0020_0002 |
| case Dagger => i32 0x0020_0003 |
| case DoubleDagger => i32 0x0020_0004 |
| } |
| |
| If the data type has no extra inhabitants, or there are not enough extra |
| inhabitants to represent all of the no-data cases, then a tag bit is added |
| to the enum's representation. The tag bit is set for the no-data cases, which |
| are then assigned values in the data area of the enum in declaration order. |
| |
| :: |
| |
| enum IntOrInfinity { => LLVM <{ i64, i1 }> |
| case NegInfinity => <{ i64, i1 }> { 0, 1 } |
| case Int(Int) => <{ i64, i1 }> { %Int, 0 } |
| case PosInfinity => <{ i64, i1 }> { 1, 1 } |
| } |
| |
| IntOrInfinity.Int( 0) => <{ i64, i1 }> { 0, 0 } |
| IntOrInfinity.Int(20721) => <{ i64, i1 }> { 20721, 0 } |
| |
| Multi-Payload Enums |
| ``````````````````` |
| |
| If an enum has more than one case with data type, then a tag is necessary to |
| discriminate the data types. The ABI will first try to find common |
| **spare bits**, that is, bits in the data types' binary representations which are |
| either fixed-zero or ignored by valid values of all of the data types. The tag |
| will be scattered into these spare bits as much as possible. Currently only |
| spare bits of primitive integer types, such as the high bits of an ``i21`` |
| type, are considered. The enum data is represented as an integer with the |
| storage size in bits of the largest data type. |
| |
| :: |
| |
| enum TerminalChar { => LLVM i32 |
| case Plain(UnicodeScalar) => i32 (zext i21 %Plain to i32) |
| case Bold(UnicodeScalar) => i32 (or (zext i21 %Bold to i32), 0x0020_0000) |
| case Underline(UnicodeScalar) => i32 (or (zext i21 %Underline to i32), 0x0040_0000) |
| case Blink(UnicodeScalar) => i32 (or (zext i21 %Blink to i32), 0x0060_0000) |
| case Empty => i32 0x0080_0000 |
| case Cursor => i32 0x0080_0001 |
| } |
| |
| If there are not enough spare bits to contain the tag, then additional bits are |
| added to the representation to contain the tag. Tag values are |
| assigned to data cases in declaration order. If there are no-data cases, they |
| are collected under a common tag, and assigned values in the data area of the |
| enum in declaration order. |
| |
| :: |
| |
| class Bignum {} |
| |
| enum IntDoubleOrBignum { => LLVM <{ i64, i2 }> |
| case Int(Int) => <{ i64, i2 }> { %Int, 0 } |
| case Double(Double) => <{ i64, i2 }> { (bitcast %Double to i64), 1 } |
| case Bignum(Bignum) => <{ i64, i2 }> { (ptrtoint %Bignum to i64), 2 } |
| } |
| |
| Existential Container Layout |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| Values of protocol type, protocol composition type, or ``Any`` type are laid |
| out using **existential containers** (so-called because these types are |
| "existential types" in type theory). |
| |
| Opaque Existential Containers |
| ````````````````````````````` |
| |
| If there is no class constraint on a protocol or protocol composition type, |
| the existential container has to accommodate a value of arbitrary size and |
| alignment. It does this using a **fixed-size buffer**, which is three pointers |
| in size and pointer-aligned. This either directly contains the value, if its |
| size and alignment are both less than or equal to the fixed-size buffer's, or |
| contains a pointer to a side allocation owned by the existential container. |
| The type of the contained value is identified by its `type metadata` record, |
| and witness tables for all of the required protocol conformances are included. |
| The layout is as if declared in the following C struct:: |
| |
| struct OpaqueExistentialContainer { |
| void *fixedSizeBuffer[3]; |
| Metadata *type; |
| WitnessTable *witnessTables[NUM_WITNESS_TABLES]; |
| }; |
| |
| Class Existential Containers |
| ```````````````````````````` |
| |
| If one or more of the protocols in a protocol or protocol composition type |
| have a class constraint, then only class values can be stored in the existential |
| container, and a more efficient representation is used. Class instances are |
| always a single pointer in size, so a fixed-size buffer and potential side |
| allocation is not needed, and class instances always have a reference to their |
| own type metadata, so the separate metadata record is not needed. The |
| layout is thus as if declared in the following C struct:: |
| |
| struct ClassExistentialContainer { |
| HeapObject *value; |
| WitnessTable *witnessTables[NUM_WITNESS_TABLES]; |
| }; |
| |
| Note that if no witness tables are needed, such as for the "any class" type |
| ``protocol<class>`` or an Objective-C protocol type, then the only element of |
| the layout is the heap object pointer. This is ABI-compatible with ``id`` |
| and ``id <Protocol>`` types in Objective-C. |
| |
| Type Metadata |
| ------------- |
| |
| The Swift runtime keeps a **metadata record** for every type used in a program, |
| including every instantiation of generic types. These metadata records can |
| be used by (TODO: reflection and) debugger tools to discover information about |
| types. For non-generic nominal types, these metadata records are generated |
| statically by the compiler. For instances of generic types, and for intrinsic |
| types such as tuples, functions, protocol compositions, etc., metadata records |
| are lazily created by the runtime as required. Every type has a unique metadata |
| record; two **metadata pointer** values are equal iff the types are equivalent. |
| |
| In the layout descriptions below, offsets are given relative to the |
| metadata pointer as an index into an array of pointers. On a 32-bit platform, |
| **offset 1** means an offset of 4 bytes, and on 64-bit platforms, it means |
| an offset of 8 bytes. |
| |
| Common Metadata Layout |
| ~~~~~~~~~~~~~~~~~~~~~~ |
| |
| All metadata records share a common header, with the following fields: |
| |
| - The **value witness table** pointer references a vtable of functions |
| that implement the value semantics of the type, providing fundamental |
| operations such as allocating, copying, and destroying values of the type. |
| The value witness table also records the size, alignment, stride, and other |
| fundamental properties of the type. The value witness table pointer is at |
| **offset -1** from the metadata pointer, that is, the pointer-sized word |
| **immediately before** the pointer's referenced address. |
| |
| - The **kind** field is a pointer-sized integer that describes the kind of type |
| the metadata describes. This field is at **offset 0** from the metadata |
| pointer. |
| |
| The current kind values are as follows: |
| |
| * `Struct metadata`_ has a kind of **1**. |
| * `Enum metadata`_ has a kind of **2**. |
| * **Opaque metadata** has a kind of **8**. This is used for compiler |
| ``Builtin`` primitives that have no additional runtime information. |
| * `Tuple metadata`_ has a kind of **9**. |
| * `Function metadata`_ has a kind of **10**. |
| * `Protocol metadata`_ has a kind of **12**. This is used for |
| protocol types, for protocol compositions, and for the ``Any`` type. |
| * `Metatype metadata`_ has a kind of **13**. |
| * `Class metadata`_, instead of a kind, has an *isa pointer* in its kind slot, |
| pointing to the class's metaclass record. This isa pointer is guaranteed |
| to have an integer value larger than **4096** and so can be discriminated |
| from non-class kind values. |
| |
| Struct Metadata |
| ~~~~~~~~~~~~~~~ |
| |
| In addition to the `common metadata layout`_ fields, struct metadata records |
| contain the following fields: |
| |
| - The `nominal type descriptor`_ is referenced at **offset 1**. |
| |
| - A reference to the **parent** metadata record is stored at **offset 2**. For |
| structs that are members of an enclosing nominal type, this is a reference |
| to the enclosing type's metadata. For top-level structs, this is null. |
| |
| TODO: The parent pointer is currently always null. |
| |
| - A vector of **field offsets** begins at **offset 3**. For each field of the |
| struct, in ``var`` declaration order, the field's offset in bytes from the |
| beginning of the struct is stored as a pointer-sized integer. |
| |
| - If the struct is generic, then the |
| `generic parameter vector`_ begins at **offset 3+n**, where **n** is the |
| number of fields in the struct. |
| |
| Enum Metadata |
| ~~~~~~~~~~~~~ |
| |
| In addition to the `common metadata layout`_ fields, enum metadata records |
| contain the following fields: |
| |
| - The `nominal type descriptor`_ is referenced at **offset 1**. |
| |
| - A reference to the **parent** metadata record is stored at **offset 2**. For |
| enums that are members of an enclosing nominal type, this is a reference to |
| the enclosing type's metadata. For top-level enums, this is null. |
| |
| TODO: The parent pointer is currently always null. |
| |
| - If the enum is generic, then the |
| `generic parameter vector`_ begins at **offset 3**. |
| |
| Tuple Metadata |
| ~~~~~~~~~~~~~~ |
| |
| In addition to the `common metadata layout`_ fields, tuple metadata records |
| contain the following fields: |
| |
| - The **number of elements** in the tuple is a pointer-sized integer at |
| **offset 1**. |
| - The **labels string** is a pointer to a list of consecutive null-terminated |
| label names for the tuple at **offset 2**. Each label name is given as a |
| null-terminated, UTF-8-encoded string in sequence. If the tuple has no |
| labels, this is a null pointer. |
| |
| TODO: The labels string pointer is currently always null, and labels are |
| not factored into tuple metadata uniquing. |
| |
| - The **element vector** begins at **offset 3** and consists of a vector of |
| type-offset pairs. The metadata for the *n*\ th element's type is a pointer |
| at **offset 3+2*n**. The offset in bytes from the beginning of the tuple to |
| the beginning of the *n*\ th element is at **offset 3+2*n+1**. |
| |
| Function Metadata |
| ~~~~~~~~~~~~~~~~~ |
| |
| In addition to the `common metadata layout`_ fields, function metadata records |
| contain the following fields: |
| |
| - The number of arguments to the function is stored at **offset 1**. |
| - A reference to the **result type** metadata record is stored at |
| **offset 2**. If the function has multiple returns, this references a |
| `tuple metadata`_ record. |
| - The **argument vector** begins at **offset 3** and consists of pointers to |
| metadata records of the function's arguments. |
| |
| If the function takes any **inout** arguments, a pointer to each argument's |
| metadata record will be appended separately, the lowest bit being set if it is |
| **inout**. Because of pointer alignment, the lowest bit will always be free to |
| hold this tag. |
| |
| If the function takes no **inout** arguments, there will be only one pointer in |
| the vector for the following cases: |
| |
| * 0 arguments: a `tuple metadata`_ record for the empty tuple |
| * 1 argument: the first and only argument's metadata record |
| * >1 argument: a `tuple metadata`_ record containing the arguments |
| |
| Protocol Metadata |
| ~~~~~~~~~~~~~~~~~ |
| |
| In addition to the `common metadata layout`_ fields, protocol metadata records |
| contain the following fields: |
| |
| - A **layout flags** word is stored at **offset 1**. The bits of this word |
| describe the `existential container layout`_ used to represent |
| values of the type. The word is laid out as follows: |
| |
| * The **number of witness tables** is stored in the least significant 31 bits. |
| Values of the protocol type contain this number of witness table pointers |
| in their layout. |
| * The **class constraint** is stored at bit 31. This bit is set if the type |
| is **not** class-constrained, meaning that struct, enum, or class values |
| can be stored in the type. If not set, then only class values can be stored |
| in the type, and the type uses a more efficient layout. |
| |
| Note that the field is pointer-sized, even though only the lowest 32 bits are |
| currently inhabited on all platforms. These values can be derived from the |
| `protocol descriptor`_ records, but are pre-calculated for convenience. |
| |
| - The **number of protocols** that make up the protocol composition is stored at |
| **offset 2**. For the "any" types ``Any`` or ``Any : class``, this |
| is zero. For a single-protocol type ``P``, this is one. For a protocol |
| composition type ``P & Q & ...``, this is the number of protocols. |
| |
| - The **protocol descriptor vector** begins at **offset 3**. This is an inline |
| array of pointers to the `protocol descriptor`_ for every protocol in the |
| composition, or the single protocol descriptor for a protocol type. For |
| an "any" type, there is no protocol descriptor vector. |
| |
| Metatype Metadata |
| ~~~~~~~~~~~~~~~~~ |
| |
| In addition to the `common metadata layout`_ fields, metatype metadata records |
| contain the following fields: |
| |
| - A reference to the metadata record for the **instance type** that the metatype |
| represents is stored at **offset 1**. |
| |
| Class Metadata |
| ~~~~~~~~~~~~~~ |
| |
| Class metadata is designed to interoperate with Objective-C; all class metadata |
| records are also valid Objective-C ``Class`` objects. Class metadata pointers |
| are used as the values of class metatypes, so a derived class's metadata |
| record also serves as a valid class metatype value for all of its ancestor |
| classes. |
| |
| - The **destructor pointer** is stored at **offset -2** from the metadata |
| pointer, behind the value witness table. This function is invoked by Swift's |
| deallocator when the class instance is destroyed. |
| - The **isa pointer** pointing to the class's Objective-C-compatible metaclass |
| record is stored at **offset 0**, in place of an integer kind discriminator. |
| - The **super pointer** pointing to the metadata record for the superclass is |
| stored at **offset 1**. If the class is a root class, it is null. |
| - Two words are reserved for use by the Objective-C runtime at **offset 2** |
| and **offset 3**. |
| - The **rodata pointer** is stored at **offset 4**; it points to an Objective-C |
| compatible rodata record for the class. This pointer value includes a tag. |
| The **low bit is always set to 1** for Swift classes and always set to 0 for |
| Objective-C classes. |
| - The **class flags** are a 32-bit field at **offset 5**. |
| - The **instance address point** is a 32-bit field following the class flags. |
| A pointer to an instance of this class points this number of bytes after the |
| beginning of the instance. |
| - The **instance size** is a 32-bit field following the instance address point. |
| This is the number of bytes of storage present in every object of this type. |
| - The **instance alignment mask** is a 16-bit field following the instance size. |
| This is a set of low bits which must not be set in a pointer to an instance |
| of this class. |
| - The **runtime-reserved field** is a 16-bit field following the instance |
| alignment mask. The compiler initializes this to zero. |
| - The **class object size** is a 32-bit field following the runtime-reserved |
| field. This is the total number of bytes of storage in the class metadata |
| object. |
| - The **class object address point** is a 32-bit field following the class |
| object size. This is the number of bytes of storage in the class metadata |
| object. |
| - The `nominal type descriptor`_ for the most-derived class type is referenced |
| at an offset immediately following the class object address point. This is |
| **offset 8** on a 64-bit platform or **offset 11** on a 32-bit platform. |
| - For each Swift class in the class's inheritance hierarchy, in order starting |
| from the root class and working down to the most derived class, the following |
| fields are present: |
| |
| * First, a reference to the **parent** metadata record is stored. |
| For classes that are members of an enclosing nominal type, this is a |
| reference to the enclosing type's metadata. For top-level classes, this is |
| null. |
| |
| TODO: The parent pointer is currently always null. |
| |
| * If the class is generic, its `generic parameter vector`_ is stored inline. |
| * The **vtable** is stored inline and contains a function pointer to the |
| implementation of every method of the class in declaration order. |
| * If the layout of a class instance is dependent on its generic parameters, |
| then a **field offset vector** is stored inline, containing offsets in |
| bytes from an instance pointer to each field of the class in declaration |
| order. (For classes with fixed layout, the field offsets are accessible |
| statically from global variables, similar to Objective-C ivar offsets.) |
| |
| Note that none of these fields are present for Objective-C base classes in |
| the inheritance hierarchy. |
| |
| Generic Parameter Vector |
| ~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| Metadata records for instances of generic types contain information about their |
| generic parameters. For each parameter of the type, a reference to the metadata |
| record for the type argument is stored. After all of the type argument |
| metadata references, for each type parameter, if there are protocol |
| requirements on that type parameter, a reference to the witness table for each |
| protocol it is required to conform to is stored in declaration order. |
| |
| For example, given a generic type with the parameters ``<T, U, V>``, its |
| generic parameter record will consist of references to the metadata records |
| for ``T``, ``U``, and ``V`` in succession, as if laid out in a C struct:: |
| |
| struct GenericParameterVector { |
| TypeMetadata *T, *U, *V; |
| }; |
| |
| If we add protocol requirements to the parameters, for example, |
| ``<T: Runcible, U: Fungible & Ansible, V>``, then the type's generic |
| parameter vector contains witness tables for those protocols, as if laid out:: |
| |
| struct GenericParameterVector { |
| TypeMetadata *T, *U, *V; |
| RuncibleWitnessTable *T_Runcible; |
| FungibleWitnessTable *U_Fungible; |
| AnsibleWitnessTable *U_Ansible; |
| }; |
| |
| Nominal Type Descriptor |
| ~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| The metadata records for class, struct, and enum types contain a pointer to a |
| **nominal type descriptor**, which contains basic information about the nominal |
| type such as its name, members, and metadata layout. For a generic type, one |
| nominal type descriptor is shared for all instantiations of the type. The |
| layout is as follows: |
| |
| - The **kind** of type is stored at **offset 0**, which is as follows: |
| |
| * **0** for a class, |
| * **1** for a struct, or |
| * **2** for an enum. |
| |
| - The mangled **name** is referenced as a null-terminated C string at |
| **offset 1**. This name includes no bound generic parameters. |
| - The following four fields depend on the kind of nominal type. |
| |
| * For a struct or class: |
| |
| + The **number of fields** is stored at **offset 2**. This is the length |
| of the field offset vector in the metadata record, if any. |
| + The **offset to the field offset vector** is stored at **offset 3**. |
| This is the offset in pointer-sized words of the field offset vector for |
| the type in the metadata record. If no field offset vector is stored |
| in the metadata record, this is zero. |
| + The **field names** are referenced as a doubly-null-terminated list of |
| C strings at **offset 4**. The order of names corresponds to the order |
| of fields in the field offset vector. |
| + The **field type accessor** is a function pointer at **offset 5**. If |
| non-null, the function takes a pointer to an instance of type metadata |
| for the nominal type, and returns a pointer to an array of type metadata |
| references for the types of the fields of that instance. The order matches |
| that of the field offset vector and field name list. |
| |
| * For an enum: |
| |
| + The **number of payload cases** and **payload size offset** are stored |
| at **offset 2**. The least significant 24 bits are the number of payload |
| cases, and the most significant 8 bits are the offset of the payload |
| size in the type metadata, if present. |
| + The **number of no-payload cases** is stored at **offset 3**. |
| + The **case names** are referenced as a doubly-null-terminated list of |
| C strings at **offset 4**. The names are ordered such that payload cases |
| come first, followed by no-payload cases. Within each half of the list, |
| the order of names corresponds to the order of cases in the enum |
| declaration. |
| + The **case type accessor** is a function pointer at **offset 5**. If |
| non-null, the function takes a pointer to an instance of type metadata |
| for the enum, and returns a pointer to an array of type metadata |
| references for the types of the cases of that instance. The order matches |
| that of the case name list. This function is similar to the field type |
| accessor for a struct, except also the least significant bit of each |
| element in the result is set if the enum case is an **indirect case**. |
| |
| - If the nominal type is generic, a pointer to the **metadata pattern** that |
| is used to form instances of the type is stored at **offset 6**. The pointer |
| is null if the type is not generic. |
| |
| - The **generic parameter descriptor** begins at **offset 7**. This describes |
| the layout of the generic parameter vector in the metadata record: |
| |
| * The **offset of the generic parameter vector** is stored at **offset 7**. |
| This is the offset in pointer-sized words of the generic parameter vector |
| inside the metadata record. If the type is not generic, this is zero. |
| * The **number of type parameters** is stored at **offset 8**. This count |
| includes associated types of type parameters with protocol constraints. |
| * The **number of type parameters** is stored at **offset 9**. This count |
| includes only the primary formal type parameters. |
| * For each type parameter **n**, the following fields are stored: |
| |
| + The **number of witnesses** for the type parameter is stored at |
| **offset 10+n**. This is the number of witness table pointers that are |
| stored for the type parameter in the generic parameter vector. |
| |
| Note that there is no nominal type descriptor for protocols or protocol types. |
| See the `protocol descriptor`_ description below. |
| |
| Protocol Descriptor |
| ~~~~~~~~~~~~~~~~~~~ |
| |
| `Protocol metadata` contains references to zero, one, or more **protocol |
| descriptors** that describe the protocols values of the type are required to |
| conform to. The protocol descriptor is laid out to be compatible with |
| Objective-C ``Protocol`` objects. The layout is as follows: |
| |
| - An **isa** placeholder is stored at **offset 0**. This field is populated by |
| the Objective-C runtime. |
| - The mangled **name** is referenced as a null-terminated C string at |
| **offset 1**. |
| - If the protocol inherits one or more other protocols, a pointer to the |
| **inherited protocols list** is stored at **offset 2**. The list starts with |
| the number of inherited protocols as a pointer-sized integer, and is followed |
| by that many protocol descriptor pointers. If the protocol inherits no other |
| protocols, this pointer is null. |
| - For an ObjC-compatible protocol, its **required instance methods** are stored |
| at **offset 3** as an ObjC-compatible method list. This is null for native |
| Swift protocols. |
| - For an ObjC-compatible protocol, its **required class methods** are stored |
| at **offset 4** as an ObjC-compatible method list. This is null for native |
| Swift protocols. |
| - For an ObjC-compatible protocol, its **optional instance methods** are stored |
| at **offset 5** as an ObjC-compatible method list. This is null for native |
| Swift protocols. |
| - For an ObjC-compatible protocol, its **optional class methods** are stored |
| at **offset 6** as an ObjC-compatible method list. This is null for native |
| Swift protocols. |
| - For an ObjC-compatible protocol, its **instance properties** are stored |
| at **offset 7** as an ObjC-compatible property list. This is null for native |
| Swift protocols. |
| - The **size** of the protocol descriptor record is stored as a 32-bit integer |
| at **offset 8**. This is currently 72 on 64-bit platforms and 40 on 32-bit |
| platforms. |
| - **Flags** are stored as a 32-bit integer after the size. The following bits |
| are currently used (counting from least significant bit zero): |
| |
| * **Bit 0** is the **Swift bit**. It is set for all protocols defined in |
| Swift and unset for protocols defined in Objective-C. |
| * **Bit 1** is the **class constraint bit**. It is set if the protocol is |
| **not** class-constrained, meaning that any struct, enum, or class type |
| may conform to the protocol. It is unset if only classes can conform to |
| the protocol. (The inverted meaning is for compatibility with Objective-C |
| protocol records, in which the bit is never set. Objective-C protocols can |
| only be conformed to by classes.) |
| * **Bit 2** is the **witness table bit**. It is set if dispatch to the |
| protocol's methods is done through a witness table, which is either passed |
| as an extra parameter to generic functions or included in the `existential |
| container layout`_ of protocol types. It is unset if dispatch is done |
| through ``objc_msgSend`` and requires no additional information to accompany |
| a value of conforming type. |
| * **Bit 31** is set by the Objective-C runtime when it has done its |
| initialization of the protocol record. It is unused by the Swift runtime. |
| |
| Heap Objects |
| ------------ |
| |
| Heap Metadata |
| ~~~~~~~~~~~~~ |
| |
| Heap Object Header |
| ~~~~~~~~~~~~~~~~~~ |
| |
| Mangling |
| -------- |
| :: |
| |
| mangled-name ::= '_S' global |
| |
| All Swift-mangled names begin with this prefix. |
| |
| The basic mangling scheme is a list of 'operators' where the operators are |
| structured in a post-fix order. For example the mangling may start with an |
| identifier but only later in the mangling a type-like operator defines how this |
| identifier has to be interpreted:: |
| |
| 4Test3FooC // The trailing 'C' says that 'Foo' is a class in module 'Test' |
| |
| |
| |
| Operators are either identifiers or a sequence of one or more characters, |
| like ``C`` for class. |
| All operators share the same name-space. Important operators are a single |
| character, which means that no other operator may start with the same |
| character. |
| |
| Some less important operators are longer and may also contain one or more |
| natural numbers. But it's always important that the demangler can identify the |
| end (the last character) of an operator. For example, it's not possible to |
| determince the last character if there are two operators ``M`` and ``Ma``: |
| ``a`` could belong to ``M`` or it could be the first charater of the next |
| operator. |
| |
| The intention of the post-fix order is to optimize for common pre-fixes. |
| Regardless, if it's the mangling for a metatype or a function in a module, the |
| mangled name will start with the module name (after the ``_S``). |
| |
| In the following, productions which are only _part_ of an operator, are |
| named with uppercase letters. |
| |
| Globals |
| ~~~~~~~ |
| |
| :: |
| |
| global ::= type 'N' // type metadata (address point) |
| // -- type starts with [BCOSTV] |
| global ::= type 'Mf' // 'full' type metadata (start of object) |
| global ::= type 'MP' // type metadata pattern |
| global ::= type 'Ma' // type metadata access function |
| global ::= type 'ML' // type metadata lazy cache variable |
| global ::= nomianl-type 'Mm' // class metaclass |
| global ::= nominal-type 'Mn' // nominal type descriptor |
| global ::= protocol 'Mp' // protocol descriptor |
| global ::= type 'MF' // metadata for remote mirrors: field descriptor |
| global ::= type 'MB' // metadata for remote mirrors: builtin type descriptor |
| global ::= protocol-conformance 'MA' // metadata for remote mirrors: associated type descriptor |
| global ::= nominal-type 'MC' // metadata for remote mirrors: superclass descriptor |
| |
| // TODO check this:: |
| global ::= mangled-name 'TA' // partial application forwarder |
| global ::= mangled-name 'Ta' // ObjC partial application forwarder |
| |
| global ::= type 'w' VALUE-WITNESS-KIND // value witness |
| global ::= protocol-conformance 'Wa' // protocol witness table accessor |
| global ::= protocol-conformance 'WG' // generic protocol witness table |
| global ::= protocol-conformance 'WI' // generic protocol witness table instantiation function |
| global ::= type protocol-conformance 'WL' // lazy protocol witness table cache variable |
| global ::= entity 'Wo' // witness table offset |
| global ::= protocol-conformance 'WP' // protocol witness table |
| |
| global ::= protocol-conformance identifier 'Wt' // associated type metadata accessor |
| global ::= protocol-conformance identifier nominal-type 'WT' // associated type witness table accessor |
| global ::= type protocol-conformance 'Wl' // lazy protocol witness table accessor |
| global ::= type 'WV' // value witness table |
| global ::= entity 'Wv' DIRECTNESS // field offset |
| |
| DIRECTNESS ::= 'd' // direct |
| DIRECTNESS ::= 'i' // indirect |
| |
| A direct symbol resolves directly to the address of an object. An |
| indirect symbol resolves to the address of a pointer to the object. |
| They are distinct manglings to make a certain class of bugs |
| immediately obvious. |
| |
| The terminology is slightly overloaded when discussing offsets. A |
| direct offset resolves to a variable holding the true offset. An |
| indirect offset resolves to a variable holding an offset to be applied |
| to type metadata to get the address of the true offset. (Offset |
| variables are required when the object being accessed lies within a |
| resilient structure. When the layout of the object may depend on |
| generic arguments, these offsets must be kept in metadata. Indirect |
| field offsets are therefore required when accessing fields in generic |
| types where the metadata itself has unknown layout.) |
| |
| :: |
| |
| global ::= global 'TO' // ObjC-as-swift thunk |
| global ::= global 'To' // swift-as-ObjC thunk |
| global ::= global 'TD' // dynamic dispatch thunk |
| global ::= global 'Td' // direct method reference thunk |
| global ::= global 'TV' // vtable override thunk |
| global ::= type 'D' // type mangling for the debugger. TODO: check if we really need this |
| global ::= protocol-conformance entity 'TW' // protocol witness thunk |
| global ::= context identifier identifier 'TB' // property behavior initializer thunk (not used currently) |
| global ::= context identifier identifier 'Tb' // property behavior setter thunk (not used currently) |
| global ::= global 'T_' specialization // reset substitutions before demangling specialization |
| global ::= entity // some identifiable thing |
| global ::= type type generic-signature? 'T' REABSTRACT-THUNK-TYPE // reabstraction thunk helper function |
| |
| REABSTRACT-THUNK-TYPE ::= 'R' // reabstraction thunk helper function |
| REABSTRACT-THUNK-TYPE ::= 'r' // reabstraction thunk |
| |
| The types in a reabstraction thunk helper function are always non-polymorphic |
| ``<impl-function-type>`` types. |
| |
| :: |
| |
| VALUE-WITNESS-KIND ::= 'al' // allocateBuffer |
| VALUE-WITNESS-KIND ::= 'ca' // assignWithCopy |
| VALUE-WITNESS-KIND ::= 'ta' // assignWithTake |
| VALUE-WITNESS-KIND ::= 'de' // deallocateBuffer |
| VALUE-WITNESS-KIND ::= 'xx' // destroy |
| VALUE-WITNESS-KIND ::= 'XX' // destroyBuffer |
| VALUE-WITNESS-KIND ::= 'Xx' // destroyArray |
| VALUE-WITNESS-KIND ::= 'CP' // initializeBufferWithCopyOfBuffer |
| VALUE-WITNESS-KIND ::= 'Cp' // initializeBufferWithCopy |
| VALUE-WITNESS-KIND ::= 'cp' // initializeWithCopy |
| VALUE-WITNESS-KIND ::= 'TK' // initializeBufferWithTakeOfBuffer |
| VALUE-WITNESS-KIND ::= 'Tk' // initializeBufferWithTake |
| VALUE-WITNESS-KIND ::= 'tk' // initializeWithTake |
| VALUE-WITNESS-KIND ::= 'pr' // projectBuffer |
| VALUE-WITNESS-KIND ::= 'xs' // storeExtraInhabitant |
| VALUE-WITNESS-KIND ::= 'xg' // getExtraInhabitantIndex |
| VALUE-WITNESS-KIND ::= 'Cc' // initializeArrayWithCopy |
| VALUE-WITNESS-KIND ::= 'Tt' // initializeArrayWithTakeFrontToBack |
| VALUE-WITNESS-KIND ::= 'tT' // initializeArrayWithTakeBackToFront |
| VALUE-WITNESS-KIND ::= 'ug' // getEnumTag |
| VALUE-WITNESS-KIND ::= 'up' // destructiveProjectEnumData |
| VALUE-WITNESS-KIND ::= 'ui' // destructiveInjectEnumTag |
| |
| ``<VALUE-WITNESS-KIND>`` differentiates the kinds of value |
| witness functions for a type. |
| |
| Entities |
| ~~~~~~~~ |
| |
| :: |
| |
| entity ::= nominal-type // named type declaration |
| entity ::= context entity-spec static? curry-thunk? |
| |
| static ::= 'Z' |
| curry-thunk ::= 'Tc' |
| |
| // The leading type is the function type |
| entity-spec ::= type 'fC' // allocating constructor |
| entity-spec ::= type 'fc' // non-allocating constructor |
| entity-spec ::= type 'fU' INDEX // explicit anonymous closure expression |
| entity-spec ::= type 'fu' INDEX // implicit anonymous closure |
| entity-spec ::= 'fA' INDEX // default argument N+1 generator |
| entity-spec ::= 'fi' // non-local variable initializer |
| entity-spec ::= 'fD' // deallocating destructor; untyped |
| entity-spec ::= 'fd' // non-deallocating destructor; untyped |
| entity-spec ::= 'fE' // ivar destroyer; untyped |
| entity-spec ::= 'fe' // ivar initializer; untyped |
| |
| entity-spec ::= decl-name function-signature generic-signature? 'F' // function |
| entity-spec ::= decl-name type 'i' // subscript ('i'ndex) itself (not the individual accessors) |
| entity-spec ::= decl-name type 'v' // variable |
| entity-spec ::= decl-name type 'f' ACCESSOR |
| entity-spec ::= decl-name type 'fp' // generic type parameter (not used?) |
| entity-spec ::= decl-name type 'fo' // enum element (currently not used) |
| |
| ACCESSOR ::= 'm' // materializeForSet |
| ACCESSOR ::= 's' // setter |
| ACCESSOR ::= 'g' // getter |
| ACCESSOR ::= 'G' // global getter |
| ACCESSOR ::= 'w' // willSet |
| ACCESSOR ::= 'W' // didSet |
| ACCESSOR ::= 'a' ADDRESSOR-KIND // mutable addressor |
| ACCESSOR ::= 'l' ADDRESSOR-KIND // non-mutable addressor |
| |
| ADDRESSOR-KIND ::= 'u' // unsafe addressor (no owner) |
| ADDRESSOR-KIND ::= 'O' // owning addressor (non-native owner) |
| ADDRESSOR-KIND ::= 'o' // owning addressor (native owner) |
| ADDRESSOR-KIND ::= 'p' // pinning addressor (native owner) |
| |
| decl-name ::= identifier |
| decl-name ::= identifier 'L' INDEX // locally-discriminated declaration |
| decl-name ::= identifier identifier 'LL' // file-discriminated declaration |
| |
| The first identifier in a file-discriminated ``<decl-name>>`` is a string that |
| represents the file the original declaration came from. |
| It should be considered unique within the enclosing module. |
| The second identifier is the name of the entity. |
| Not all declarations marked ``private`` declarations will use this mangling; |
| if the entity's context is enough to uniquely identify the entity, the simple |
| ``identifier`` form is preferred. |
| |
| Declaration Contexts |
| ~~~~~~~~~~~~~~~~~~~~ |
| |
| These manglings identify the enclosing context in which an entity was declared, |
| such as its enclosing module, function, or nominal type. |
| |
| :: |
| |
| context ::= module |
| context ::= entity |
| context ::= entity module generic-signature? 'E' |
| |
| An ``extension`` mangling is used whenever an entity's declaration context is |
| an extension *and* the entity being extended is in a different module. In this |
| case the extension's module is mangled first, followed by the entity being |
| extended. If the extension and the extended entity are in the same module, the |
| plain ``entity`` mangling is preferred. If the extension is constrained, the |
| constraints on the extension are mangled in its generic signature. |
| |
| When mangling the context of a local entity within a constructor or |
| destructor, the non-allocating or non-deallocating variant is used. |
| |
| :: |
| |
| module ::= identifier // module name |
| module ::= known-module // abbreviation |
| |
| known-module ::= 's' // Swift |
| known-module ::= 'SC' // C |
| known-module ::= 'So' // Objective-C |
| |
| The Objective-C module is used as the context for mangling Objective-C |
| classes as ``<type>``\ s. |
| |
| |
| Types |
| ~~~~~ |
| |
| :: |
| |
| nominal-type ::= substitution |
| nominal-type ::= context decl-name 'C' // nominal class type |
| nominal-type ::= context decl-name 'O' // nominal enum type |
| nominal-type ::= context decl-name 'V' // nominal struct type |
| nominal-type ::= protocol 'P' // nominal protocol type |
| |
| nominal-type ::= known-nominal-type |
| |
| known-nominal-type ::= 'Sa' // Swift.Array |
| known-nominal-type ::= 'Sb' // Swift.Bool |
| known-nominal-type ::= 'Sc' // Swift.UnicodeScalar |
| known-nominal-type ::= 'Sd' // Swift.Float64 |
| known-nominal-type ::= 'Sf' // Swift.Float32 |
| known-nominal-type ::= 'Si' // Swift.Int |
| known-nominal-type ::= 'SV' // Swift.UnsafeRawPointer |
| known-nominal-type ::= 'Sv' // Swift.UnsafeMutableRawPointer |
| known-nominal-type ::= 'SP' // Swift.UnsafePointer |
| known-nominal-type ::= 'Sp' // Swift.UnsafeMutablePointer |
| known-nominal-type ::= 'SQ' // Swift.ImplicitlyUnwrappedOptional |
| known-nominal-type ::= 'Sq' // Swift.Optional |
| known-nominal-type ::= 'SR' // Swift.UnsafeBufferPointer |
| known-nominal-type ::= 'Sr' // Swift.UnsafeMutableBufferPointer |
| known-nominal-type ::= 'SS' // Swift.String |
| known-nominal-type ::= 'Su' // Swift.UInt |
| |
| protocol ::= context decl-name |
| |
| type ::= 'Bb' // Builtin.BridgeObject |
| type ::= 'BB' // Builtin.UnsafeValueBuffer |
| type ::= 'Bf' NATURAL '_' // Builtin.Float<n> |
| type ::= 'Bi' NATURAL '_' // Builtin.Int<n> |
| type ::= 'BO' // Builtin.UnknownObject |
| type ::= 'Bo' // Builtin.NativeObject |
| type ::= 'Bp' // Builtin.RawPointer |
| type ::= type 'Bv' NATURAL '_' // Builtin.Vec<n>x<type> |
| type ::= 'Bw' // Builtin.Word |
| type ::= context decl-name 'a' // Type alias (DWARF only) |
| type ::= function-signature 'c' // function type |
| type ::= function-signature 'X' FUNCTION-KIND // special function type |
| type ::= type-list+ type 'G' // bound generic type (one type-list per nesting level of type) |
| type ::= type 'Xo' // @unowned type |
| type ::= type 'Xu' // @unowned(unsafe) type |
| type ::= type 'Xw' // @weak type |
| type ::= impl-function-type 'XF' // function implementation type (currently unused) |
| type ::= type 'Xb' // SIL @box type (deprecated) |
| type ::= type-list 'Xx' // SIL box type |
| type ::= type-list type-list generic-signature 'XX' |
| // Generic SIL box type |
| type ::= type 'XD' // dynamic self type |
| type ::= type 'm' // metatype without representation |
| type ::= type 'XM' METATYPE-REPR // metatype with representation |
| type ::= type 'Xp' // existential metatype without representation |
| type ::= type 'Xm' METATYPE-REPR // existential metatype with representation |
| type ::= 'Xe' // error or unresolved type |
| |
| |
| FUNCTION-KIND ::= 'f' // @thin function type |
| FUNCTION-KIND ::= 'U' // uncurried function type (currently not used) |
| FUNCTION-KIND ::= 'K' // @auto_closure function type |
| FUNCTION-KIND ::= 'B' // objc block function type |
| FUNCTION-KIND ::= 'C' // C function pointer type |
| |
| function-signature ::= params-type params-type throws? // results and parameters |
| |
| params-type := type // tuple in case of multiple parameters |
| params-type := empty-list // shortcut for no parameters |
| |
| throws ::= 'K' // 'throws' annotation on function types |
| |
| type-list ::= list-type '_' list-type* 'd'? // list of types with optional variadic specifier |
| type-list ::= empty-list |
| |
| list-type ::= type identifier? 'z'? // type with optional label and inout convention |
| |
| METATYPE-REPR ::= 't' // Thin metatype representation |
| METATYPE-REPR ::= 'T' // Thick metatype representation |
| METATYPE-REPR ::= 'o' // ObjC metatype representation |
| |
| type ::= archetype |
| type ::= associated-type |
| type ::= nominal-type |
| type ::= protocol-list 'p' // existential type |
| type ::= type-list 't' // tuple |
| type ::= type generic-signature 'u' // generic type |
| type ::= 'x' // generic param, depth=0, idx=0 |
| type ::= 'q' GENERIC-PARAM-INDEX // dependent generic parameter |
| type ::= type assoc-type-name 'qa' // associated type of non-generic param |
| type ::= assoc-type-name 'Qy' GENERIC-PARAM-INDEX // associated type |
| type ::= assoc-type-name 'Qz' // shortcut for 'Qyz' |
| type ::= assoc-type-list 'QY' GENERIC-PARAM-INDEX // associated type at depth |
| type ::= assoc-type-list 'QZ' // shortcut for 'QYz' |
| |
| protocol-list ::= protocol '_' protocol* |
| protocol-list ::= empty-list |
| |
| assoc-type-list ::= assoc-type-name '_' assoc-type-name* |
| |
| archetype ::= 'Q' INDEX // archetype with depth=0, idx=N |
| archetype ::= 'Qd' INDEX INDEX // archetype with depth=M+1, idx=N |
| archetype ::= context 'Qq' INDEX // archetype+context (DWARF only) |
| archetype ::= associated-type |
| |
| associated-type ::= substitution |
| associated-type ::= protocol 'QP' // self type of protocol |
| associated-type ::= archetype identifier 'Qa' // associated type |
| |
| assoc-type-name ::= identifier // associated type name without protocol |
| assoc-type-name ::= identifier protocol 'P' // |
| |
| empty-list ::= 'y' |
| |
| Associated types use an abbreviated mangling when the base generic parameter |
| or associated type is constrained by a single protocol requirement. The |
| associated type in this case can be referenced unambiguously by name alone. |
| If the base has multiple conformance constraints, then the protocol name is |
| mangled in to disambiguate. |
| |
| :: |
| |
| impl-function-type ::= type* 'I' FUNC-ATTRIBUTES '_' |
| impl-function-type ::= type* generic-signature 'I' PSEUDO-GENERIC? FUNC-ATTRIBUTES '_' |
| |
| FUNC-ATTRIBUTES ::= CALLEE-CONVENTION? FUNC-REPRESENTATION PARAM-CONVENTION* RESULT-CONVENTION* ('z' RESULT-CONVENTION) |
| |
| PSEUDO-GENERIC ::= 'P' |
| |
| CALLEE-CONVENTION ::= 'y' // @callee_unowned |
| CALLEE-CONVENTION ::= 'g' // @callee_guaranteed |
| CALLEE-CONVENTION ::= 'x' // @callee_owned |
| CALLEE-CONVENTION ::= 't' // thin |
| |
| FUNC-REPRESENTATION ::= 'B' // C block invocation function |
| FUNC-REPRESENTATION ::= 'C' // C global function |
| FUNC-REPRESENTATION ::= 'M' // Swift method |
| FUNC-REPRESENTATION ::= 'J' // ObjC method |
| FUNC-REPRESENTATION ::= 'K' // closure |
| FUNC-REPRESENTATION ::= 'W' // protocol witness |
| |
| PARAM-CONVENTION ::= 'i' // indirect in |
| PARAM-CONVENTION ::= 'l' // indirect inout |
| PARAM-CONVENTION ::= 'b' // indirect inout aliasable |
| PARAM-CONVENTION ::= 'n' // indirect in guaranteed |
| PARAM-CONVENTION ::= 'x' // direct owned |
| PARAM-CONVENTION ::= 'y' // direct unowned |
| PARAM-CONVENTION ::= 'g' // direct guaranteed |
| PARAM-CONVENTION ::= 'e' // direct deallocating |
| |
| RESULT-CONVENTION ::= 'r' // indirect |
| RESULT-CONVENTION ::= 'o' // owned |
| RESULT-CONVENTION ::= 'd' // unowned |
| RESULT-CONVENTION ::= 'u' // unowned inner pointer |
| RESULT-CONVENTION ::= 'a' // auto-released |
| |
| For the most part, manglings follow the structure of formal language |
| types. However, in some cases it is more useful to encode the exact |
| implementation details of a function type. |
| |
| The ``type*`` list contains parameter and return types (including the error |
| result), in that order. |
| The number of parameters and results must match with the number of |
| ``<PARAM-CONVENTION>`` and ``<RESULT-CONVENTION>`` characters after the |
| ``<FUNC-REPRESENTATION>``. |
| The ``<generic-signature>`` is used if the function is polymorphic. |
| |
| Generics |
| ~~~~~~~~ |
| |
| :: |
| |
| protocol-conformance ::= type protocol module generic-signature? |
| |
| ``<protocol-conformance>`` refers to a type's conformance to a protocol. The |
| named module is the one containing the extension or type declaration that |
| declared the conformance. |
| |
| :: |
| |
| protocol-conformance ::= context identifier protocol identifier generic-signature? // Property behavior conformance |
| |
| Property behaviors are implemented using private protocol conformances. |
| |
| :: |
| |
| generic-signature ::= requirement* 'l' // one generic parameter |
| generic-signature ::= requirement* 'r' GENERIC-PARAM-COUNT* 'l' |
| |
| GENERIC-PARAM-COUNT ::= 'z' // zero parameters |
| GENERIC-PARAM-COUNT ::= INDEX // N+1 parameters |
| |
| requirement ::= protocol 'R' GENERIC-PARAM-INDEX // protocol requirement |
| requirement ::= protocol assoc-type-name 'Rp' GENERIC-PARAM-INDEX // protocol requirement on associated type |
| requirement ::= protocol assoc-type-list 'RP' GENERIC-PARAM-INDEX // protocol requirement on associated type at depth |
| requirement ::= protocol substitution 'RQ' // protocol requirement with substitution |
| requirement ::= type 'Rb' GENERIC-PARAM-INDEX // base class requirement |
| requirement ::= type assoc-type-name 'Rc' GENERIC-PARAM-INDEX // base class requirement on associated type |
| requirement ::= type assoc-type-list 'RC' GENERIC-PARAM-INDEX // base class requirement on associated type at depth |
| requirement ::= type substitution 'RB' // base class requirement with substitution |
| requirement ::= type 'Rs' GENERIC-PARAM-INDEX // same-type requirement |
| requirement ::= type assoc-type-name 'Rt' GENERIC-PARAM-INDEX // same-type requirement on associated type |
| requirement ::= type assoc-type-list 'RT' GENERIC-PARAM-INDEX // same-type requirement on associated type at depth |
| requirement ::= type substitution 'RS' // same-type requirement with substitution |
| |
| GENERIC-PARAM-INDEX ::= 'z' // depth = 0, idx = 0 |
| GENERIC-PARAM-INDEX ::= INDEX // depth = 0, idx = N+1 |
| GENERIC-PARAM-INDEX ::= 'd' INDEX INDEX // depth = M+1, idx = N |
| |
| |
| A generic signature begins with an optional list of requirements. |
| The ``<GENERIC-PARAM-COUNT>`` describes the number of generic parameters at |
| each depth of the signature. As a special case, no ``<GENERIC-PARAM-COUNT>`` |
| values indicates a single generic parameter at the outermost depth:: |
| |
| x_xCru // <T_0_0> T_0_0 -> T_0_0 |
| d_0__xCr_0_u // <T_0_0><T_1_0, T_1_1> T_0_0 -> T_1_1 |
| |
| A generic signature must only preceed an operator character which is different |
| from any character in a ``<GENERIC-PARAM-COUNT>``. |
| |
| Identifiers |
| ~~~~~~~~~~~ |
| |
| :: |
| |
| identifier ::= substitution |
| identifier ::= NATURAL IDENTIFIER-STRING // identifier without word substitutions |
| identifier ::= '0' IDENTIFIER-PART // identifier with word substitutions |
| |
| IDENTIFIER-PART ::= NATURAL IDENTIFIER-STRING |
| IDENTIFIER-PART ::= [a-z] // word substitution (except the last one) |
| IDENTIFIER-PART ::= [A-Z] // last word substitution in identifier |
| |
| IDENTIFIER-STRING ::= IDENTIFIER-START-CHAR IDENTIFIER-CHAR* |
| IDENTIFIER-START-CHAR ::= [_a-zA-Z] |
| IDENTIFIER-CHAR ::= [_$a-zA-Z0-9] |
| |
| ``<identifier>`` is run-length encoded: the natural indicates how many |
| characters follow. Operator characters are mapped to letter characters as |
| given. In neither case can an identifier start with a digit, so |
| there's no ambiguity with the run-length. |
| |
| If the run-length start with a ``0`` the identifier string contains |
| word substitutions. A word is a sub-string of an identifier which contains |
| letters and digits ``[A-Za-z0-9]``. Words are separated by underscores |
| ``_``. In addition a new word begins with an uppercase letter ``[A-Z]`` |
| if the previous character is not an uppercase letter:: |
| |
| Abc1DefG2HI // contains four words 'Abc1', 'Def' and 'G2' and 'HI' |
| _abc1_def_G2hi // contains three words 'abc1', 'def' and G2hi |
| |
| The words of all identifiers, which are encoded in the current mangling are |
| enumerated and assigned to a letter: a = first word, b = second word, etc. |
| |
| An identifier containing word substitutions is a sequence of run-length encoded |
| sub-strings and references to previously mangled words. |
| All but the last word-references are lowercase letters and the last one is an |
| uppercase letter. If there is no literal sub-string after the last |
| word-reference, the last word-reference is followed by a ``0``. |
| |
| Let's assume the current mangling already encoded the identifier ``AbcDefGHI``:: |
| |
| 02Myac1_B // expands to: MyAbcGHI_Def |
| |
| A maximum of 26 words in a mangling can be used for substitutions. |
| |
| :: |
| |
| identifier ::= '00' natural '_'? IDENTIFIER-CHAR+ // '_' is inserted if the identifer starts with a digit or '_'. |
| |
| Identifiers that contain non-ASCII characters are encoded using the Punycode |
| algorithm specified in RFC 3492, with the modifications that ``_`` is used |
| as the encoding delimiter, and uppercase letters A through J are used in place |
| of digits 0 through 9 in the encoding character set. The mangling then |
| consists of an ``00`` followed by the run length of the encoded string and the |
| encoded string itself. For example, the identifier ``vergüenza`` is mangled |
| to ``0012vergenza_JFa``. (The encoding in standard Punycode would be |
| ``vergenza-95a``) |
| |
| :: |
| |
| identifier ::= identifier 'o' OPERATOR-FIXITY |
| |
| OPERATOR-FIXITY ::= 'p' // prefix operator |
| OPERATOR-FIXITY ::= 'P' // postfix operator |
| OPERATOR-FIXITY ::= 'i' // infix operator |
| |
| OPERATOR-CHAR ::= 'a' // & 'and' |
| OPERATOR-CHAR ::= 'c' // @ 'commercial at' |
| OPERATOR-CHAR ::= 'd' // / 'divide' |
| OPERATOR-CHAR ::= 'e' // = 'equals' |
| OPERATOR-CHAR ::= 'g' // > 'greater' |
| OPERATOR-CHAR ::= 'l' // < 'less' |
| OPERATOR-CHAR ::= 'm' // * 'multiply' |
| OPERATOR-CHAR ::= 'n' // ! 'not' |
| OPERATOR-CHAR ::= 'o' // | 'or' |
| OPERATOR-CHAR ::= 'p' // + 'plus' |
| OPERATOR-CHAR ::= 'q' // ? 'question' |
| OPERATOR-CHAR ::= 'r' // % 'remainder' |
| OPERATOR-CHAR ::= 's' // - 'subtract' |
| OPERATOR-CHAR ::= 't' // ~ 'tilde' |
| OPERATOR-CHAR ::= 'x' // ^ 'xor' |
| OPERATOR-CHAR ::= 'z' // . 'zperiod' |
| |
| If an identifier is followed by an ``o`` its text is interpreted as an |
| operator. Each lowercase character maps to an operator character |
| (``OPERATOR-CHAR``). |
| |
| Operators that contain non-ASCII characters are mangled by first mapping the |
| ASCII operator characters to letters as for pure ASCII operator names, then |
| Punycode-encoding the substituted string. |
| For example, the infix operator ``«+»`` is mangled to |
| ``007p_qcaDcoi`` (``p_qcaDc`` being the encoding of the substituted |
| string ``«p»``). |
| |
| Substitutions |
| ~~~~~~~~~~~~~ |
| |
| :: |
| |
| substitution ::= 'A' INDEX // substiution of N+26 |
| substitution ::= 'A' [a-z]* [A-Z] // One or more consecutive substitutions of N < 26 |
| |
| |
| ``<substitution>`` is a back-reference to a previously mangled entity. The mangling |
| algorithm maintains a mapping of entities to substitution indices as it runs. |
| When an entity that can be represented by a substitution (a module, nominal |
| type, or protocol) is mangled, a substitution is first looked for in the |
| substitution map, and if it is present, the entity is mangled using the |
| associated substitution index. Otherwise, the entity is mangled normally, and |
| it is then added to the substitution map and associated with the next |
| available substitution index. |
| |
| For example, in mangling a function type |
| ``(zim.zang.zung, zim.zang.zung, zim.zippity) -> zim.zang.zoo`` (with module |
| ``zim`` and class ``zim.zang``), |
| the recurring contexts ``zim``, ``zim.zang``, and ``zim.zang.zung`` |
| will be mangled using substitutions after being mangled |
| for the first time. The first argument type will mangle in long form, |
| ``3zim4zang4zung``, and in doing so, ``zim`` will acquire substitution ``AA``, |
| ``zim.zang`` will acquire substitution ``AB``, and ``zim.zang.zung`` will |
| acquire ``AC``. The second argument is the same as the first and will mangle |
| using its substitution, ``AC``. The |
| third argument type will mangle using the substitution for ``zim``, |
| ``AA7zippity``. (It also acquires substitution ``AD`` which would be used |
| if it mangled again.) The result type will mangle using the substitution for |
| ``zim.zang``, ``AB3zoo`` (and acquire substitution ``AE``). |
| |
| There are some pre-defined substitutions, see ``<known-nominal-type>``. |
| |
| If the mangling contains two or more consecutive substitutions, it can be |
| abbreviated with the ``A`` substitution. Similar to word-substitutions the |
| index is encoded as letters, whereas the last letter is uppercase:: |
| |
| AaeB // equivalent to A_A4_A0_ |
| |
| |
| Numbers and Indexes |
| ~~~~~~~~~~~~~~~~~~~ |
| |
| :: |
| |
| INDEX ::= '_' // 0 |
| INDEX ::= NATURAL '_' // N+1 |
| NATURAL ::= [1-9] [0-9]* |
| NATURAL_ZERO ::= [0-9]+ |
| |
| ``<INDEX>`` is a production for encoding numbers in contexts that can't |
| end in a digit; it's optimized for encoding smaller numbers. |
| |
| Function Specializations |
| ~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| :: |
| |
| specialization ::= type '_' type* 'Tg' SPEC-INFO // Generic re-abstracted specialization |
| specialization ::= type '_' type* 'TG' SPEC-INFO // Generic not re-abstracted specialization |
| |
| The types are the replacement types of the substitution list. |
| |
| :: |
| |
| specialization ::= function-signature generic-signature 'TG' SPEC-INFO // Partial generic specialization |
| |
| A type in the ``<function-signature>`` of a partial generic specialization |
| can be ``Sn`` which means that the parameter type is the same as in the |
| original function's signature. |
| |
| :: |
| |
| specialization ::= spec-arg* 'Tf' SPEC-INFO ARG-SPEC-KIND* '_' ARG-SPEC-KIND // Function signature specialization kind |
| |
| The ``<ARG-SPEC-KIND>`` describes how arguments are specialized. |
| Some kinds need arguments, which preceed ``Tf``. |
| |
| :: |
| |
| spec-arg ::= identifier |
| spec-arg ::= type |
| |
| SPEC-INFO ::= FRAGILE? PASSID |
| |
| PASSID ::= '0' // AllocBoxToStack, |
| PASSID ::= '1' // ClosureSpecializer, |
| PASSID ::= '2' // CapturePromotion, |
| PASSID ::= '3' // CapturePropagation, |
| PASSID ::= '4' // FunctionSignatureOpts, |
| PASSID ::= '5' // GenericSpecializer, |
| |
| FRAGILE ::= 'q' |
| |
| ARG-SPEC-KIND ::= 'n' // Unmodified argument |
| ARG-SPEC-KIND ::= 'c' // Consumes n 'type' arguments which are closed over types in argument order |
| // and one 'identifier' argument which is the closure symbol name |
| ARG-SPEC-KIND ::= 'p' CONST-PROP // Constant propagated argument |
| ARG-SPEC-KIND ::= 'd' 'G'? 'X'? // Dead argument, with optional owned=>guaranteed or exploded-specifier |
| ARG-SPEC-KIND ::= 'g' 'X'? // Owned => Guaranteed,, with optional exploded-specifier |
| ARG-SPEC-KIND ::= 'x' // Exploded |
| ARG-SPEC-KIND ::= 'i' // Box to value |
| ARG-SPEC-KIND ::= 's' // Box to stack |
| |
| CONST-PROP ::= 'f' // Consumes one identifier argument which is a function symbol name |
| CONST-PROP ::= 'g' // Consumes one identifier argument which is a global symbol name |
| CONST-PROP ::= 'i' NATURAL_ZERO // 64-bit-integer |
| CONST-PROP ::= 'd' NATURAL_ZERO // float-as-64-bit-integer |
| CONST-PROP ::= 's' ENCODING // string literal. Consumes one identifier argument. |
| |
| ENCODING ::= 'b' // utf8 |
| ENCODING ::= 'w' // utf16 |
| ENCODING ::= 'c' // utf16 |
| |
| If the first character of the string literal is a digit ``[0-9]`` or an |
| underscore ``_``, the identifier for the string literal is prefixed with an |
| additional underscore ``_``. |