:orphan:
.. _ABI:
.. highlight:: none
Type Layout
-----------
Hard Constraints on Resilience
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The root of a class hierarchy must remain stable, on pain of
invalidating the metaclass hierarchy. Note that a Swift class without an
explicit base class is implicitly rooted in the SwiftObject
Objective-C class.
Fragile Struct and Tuple Layout
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Structs and tuples currently share the same layout algorithm, referred to as
the "Universal" layout algorithm in the compiler implementation. The algorithm
is as follows:
- Start with a **size** of **0** and an **alignment** of **1**.
- Iterate through the fields, in element order for tuples, or in ``var``
declaration order for structs. For each field:
* Update **size** by rounding up to the **alignment of the field**, that is,
increasing it to the least value greater than or equal to **size** and evenly
divisible by the **alignment of the field**.
* Assign the **offset of the field** to the current value of **size**.
* Update **size** by adding the **size of the field**.
* Update **alignment** to the max of **alignment** and the
**alignment of the field**.
- The final **size** and **alignment** are the size and alignment of the
aggregate. The **stride** of the type is the final **size** rounded up to
**alignment**.
Note that this differs from C or LLVM's normal layout rules in that *size*
and *stride* are distinct; whereas C layout requires that an embedded struct's
size be padded out to its alignment and that nothing be laid out there,
Swift layout allows an outer struct to lay out fields in the inner struct's
tail padding, alignment permitting. Unlike C, zero-sized structs and tuples
are also allowed, and take up no storage in enclosing aggregates. The Swift
compiler emits LLVM packed struct types with manual padding to get the
necessary control over the binary layout. Some examples:
::

  // LLVM <{ i64, i8 }>
  struct S {
    var x: Int
    var y: UInt8
  }

  // LLVM <{ i8, [7 x i8], <{ i64, i8 }>, i8 }>
  struct S2 {
    var x: UInt8
    var s: S
    var y: UInt8
  }

  // LLVM <{}>
  struct Empty {}

  // LLVM <{ i64, i64 }>
  struct ContainsEmpty {
    var x: Int
    var y: Empty
    var z: Int
  }
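
The layout steps listed above can be sketched as a small Swift function. This is only an illustrative model, not the compiler's implementation: the names ``FieldLayout``, ``AggregateLayout``, ``roundUp`` and ``layOut`` are made up for this sketch, and the field sizes assume a 64-bit target where ``Int`` is 8 bytes with 8-byte alignment.

```swift
/// Size and alignment of a single field, in bytes (hypothetical helper type).
struct FieldLayout {
    var size: Int
    var alignment: Int
}

/// Result of laying out an aggregate (hypothetical helper type).
struct AggregateLayout {
    var offsets: [Int]
    var size: Int
    var alignment: Int
    var stride: Int
}

/// Rounds `value` up to the least multiple of `alignment` that is >= `value`.
func roundUp(_ value: Int, toMultipleOf alignment: Int) -> Int {
    return (value + alignment - 1) / alignment * alignment
}

/// Applies the steps described in the text to a list of fields.
func layOut(_ fields: [FieldLayout]) -> AggregateLayout {
    var size = 0
    var alignment = 1
    var offsets: [Int] = []
    for field in fields {
        size = roundUp(size, toMultipleOf: field.alignment)  // pad to field alignment
        offsets.append(size)                                 // offset of this field
        size += field.size                                   // account for the field
        alignment = max(alignment, field.alignment)
    }
    let stride = roundUp(size, toMultipleOf: alignment)
    return AggregateLayout(offsets: offsets, size: size,
                           alignment: alignment, stride: stride)
}

// Assumed 64-bit field layouts.
let int64 = FieldLayout(size: 8, alignment: 8)
let uint8 = FieldLayout(size: 1, alignment: 1)

// struct S { var x: Int; var y: UInt8 }
let s = layOut([int64, uint8])
// struct S2 { var x: UInt8; var s: S; var y: UInt8 }
let s2 = layOut([uint8, FieldLayout(size: s.size, alignment: s.alignment), uint8])
// struct ContainsEmpty { var x: Int; var y: Empty; var z: Int }
let empty = layOut([])
let containsEmpty = layOut([int64,
                            FieldLayout(size: empty.size, alignment: empty.alignment),
                            int64])
```

Under these assumptions ``S`` has size 9 and stride 16, and the trailing ``y`` of ``S2`` lands at offset 17, inside what C would treat as the tail padding of ``S``. The empty field of ``ContainsEmpty`` shares offset 8 with ``z`` and contributes no storage.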
Class Layout
~~~~~~~~~~~~
Swift relies on the following assumptions about the Objective-C runtime,
which are therefore now part of the Objective-C ABI:
- 32-bit platforms never have tagged pointers. ObjC pointer types are
either nil or an object pointer.
- On x86-64, a tagged pointer either sets the lowest bit of the pointer
or the highest bit of the pointer. Therefore, both of these bits are
zero if and only if the value is not a tagged pointer.
- On ARM64, a tagged pointer always sets the highest bit of the pointer.
- 32-bit platforms never perform any isa masking. ``object_getClass``
is always equivalent to ``*(Class*)object``.
- 64-bit platforms perform isa masking only if the runtime exports a
symbol ``uintptr_t objc_debug_isa_class_mask;``. If this symbol
is exported, ``object_getClass`` on a non-tagged pointer is always
equivalent to ``(Class)(objc_debug_isa_class_mask & *(uintptr_t*)object)``.
- The superclass field of a class object is always stored immediately
after the isa field. Its value is either nil or a pointer to the
class object for the superclass; it never has other bits set.
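
As a rough illustration of the 64-bit rules in the list above, the following Swift sketch models pointers as plain ``UInt64`` values. The ``Architecture`` enum and both function names are hypothetical, and the mask passed in stands in for ``objc_debug_isa_class_mask``; this sketch does not look up the real symbol.

```swift
/// 64-bit architectures covered by the tagged-pointer rules (hypothetical type).
enum Architecture {
    case x86_64
    case arm64
}

/// Returns true if the raw pointer value `bits` is a tagged pointer
/// according to the assumptions listed in the text.
func isTaggedPointer(_ bits: UInt64, on architecture: Architecture) -> Bool {
    let highBit: UInt64 = 1 << 63
    let lowBit: UInt64 = 1
    switch architecture {
    case .x86_64:
        // Tagged if either the lowest or the highest bit is set.
        return (bits & (highBit | lowBit)) != 0
    case .arm64:
        // Tagged only if the highest bit is set.
        return (bits & highBit) != 0
    }
}

/// Models `object_getClass` for a non-tagged object on a 64-bit platform.
/// `isaWord` is the first word of the object; `isaClassMask` is nil when
/// the runtime exports no mask symbol, in which case no masking is applied.
func classPointerBits(isaWord: UInt64, isaClassMask: UInt64?) -> UInt64 {
    guard let mask = isaClassMask else { return isaWord }
    return isaWord & mask
}
```

The mask value used by a real runtime is not specified here; any concrete value passed to ``classPointerBits`` is the caller's assumption.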
The following assumptions are part of the Swift ABI:
- Swift class pointers are never tagged pointers.
TODO
Fragile Enum Layout
~~~~~~~~~~~~~~~~~~~
In laying out enum types, the ABI attempts to avoid requiring additional
storage to store the tag for the enum case. The ABI chooses one of five
strategies based on the layout of the enum:
Empty Enums
```````````
In the degenerate case of an enum with no cases, the enum is an empty type.
::

  enum Empty {} // => empty type
Single-Case Enums
`````````````````
In the degenerate case of an enum with a single case, there is no
discriminator needed, and the enum type has the exact same layout as its
case's data type, or is empty if the case has no data type.
::

  enum EmptyCase { case X }             // => empty type
  enum DataCase { case Y(Int, Double) } // => LLVM <{ i64, double }>
C-Like Enums
````````````
If none of the cases has a data type (a "C-like" enum), then the enum
is laid out as an integer tag with the minimal number of bits to contain
all of the cases. The machine-level layout of the type then follows LLVM's
data layout rules for integer types on the target platform. The cases are
assigned tag values in declaration order.
::

  enum EnumLike2 { // => LLVM i1
    case A         // => i1 0
    case B         // => i1 1
  }

  enum EnumLike8 { // => LLVM i3
    case A         // => i3 0
    case B         // => i3 1
    case C         // => i3 2
    case D         // etc.
    case E
    case F
    case G
    case H
  }
Discriminator values after the one used for the last case become *extra
inhabitants* of the enum type (see `Single-Payload Enums`_).
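
The tag width for a C-like enum follows directly from the case count. The sketch below computes it, along with the number of tag values left over after the last case. Both function names are hypothetical, and the second function only counts unused values within the minimal-width tag as described in this section.

```swift
/// Minimal number of bits needed to give each of `caseCount` cases
/// a distinct tag value, assigned 0, 1, 2, ... in declaration order.
func tagBitCount(forCaseCount caseCount: Int) -> Int {
    precondition(caseCount >= 1, "an enum with no cases is an empty type")
    var bits = 0
    while (1 << bits) < caseCount {
        bits += 1
    }
    return bits
}

/// Number of tag values after the last case that remain unused
/// within the minimal-width tag.
func unusedTagValueCount(forCaseCount caseCount: Int) -> Int {
    return (1 << tagBitCount(forCaseCount: caseCount)) - caseCount
}
```

For example, a three-case enum needs a 2-bit tag and leaves one value (3) unused, whereas ``EnumLike8`` uses every value of its 3-bit tag.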
Single-Payload Enums
````````````````````
If an enum has a single case with a data type and one or more no-data cases
(a "single-payload" enum), then the case with a data type is represented using
the data type's binary representation, with zero bits added for the tag if
necessary. If the data type's binary representation
has **extra inhabitants**, that is, bit patterns with the size and alignment of
the type but which do not form valid values of that type, they are used to
represent the no-data cases, with extra inhabitants in order of ascending
numeric value matching no-data cases in declaration order. If the type
has *spare bits* (see `Multi-Payload Enums`_), they are used to form extra
inhabitants. The enum value is then represented as an integer with the storage
size in bits of the data type. Extra inhabitants of the payload type not used
by the enum type become extra inhabitants of the enum type itself.
::

  enum CharOrSectionMarker { => LLVM i32
    case Paragraph            => i32 0x0020_0000
    case Char(UnicodeScalar)  => i32 (zext i21 %Char to i32)
    case Chapter              => i32 0x0020_0001
  }

  CharOrSectionMarker.Char('\x00')     => i32 0x0000_0000
  CharOrSectionMarker.Char('\u10FFFF') => i32 0x0010_FFFF

  enum CharOrSectionMarkerOrFootnoteMarker { => LLVM i32
    case CharOrSectionMarker(CharOrSectionMarker) => i32 %CharOrSectionMarker
    case Asterisk                                 => i32 0x0020_0002
    case Dagger                                   => i32 0x0020_0003
    case DoubleDagger                             => i32 0x0020_0004
  }
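
The mapping in the ``CharOrSectionMarker`` example can be modelled in a few lines of Swift. This is a hand-written model rather than what the compiler does internally: ``MarkerModel``, ``firstExtraInhabitant`` and ``encodeMarker`` are hypothetical names, and the payload is represented as a raw ``UInt32`` scalar value.

```swift
/// Hand-written stand-in for `CharOrSectionMarker`, cases in declaration order.
enum MarkerModel {
    case paragraph
    case char(UInt32)  // Unicode scalar value, occupying at most 21 bits
    case chapter
}

/// The payload uses 21 bits, so the smallest bit pattern that is not a
/// valid payload is 1 << 21, matching the 0x0020_0000 shown in the example.
let firstExtraInhabitant: UInt32 = 1 << 21

/// Produces the 32-bit representation shown in the example.
func encodeMarker(_ marker: MarkerModel) -> UInt32 {
    switch marker {
    case .char(let scalar):
        // The payload case keeps the payload's own representation.
        precondition(scalar < firstExtraInhabitant, "payload must fit in 21 bits")
        return scalar
    case .paragraph:
        // First no-data case takes the lowest extra inhabitant.
        return firstExtraInhabitant
    case .chapter:
        // Second no-data case takes the next one.
        return firstExtraInhabitant + 1
    }
}
```

The outer enum in the example continues the same sequence: its no-data cases pick up the next unused extra inhabitants, starting at ``0x0020_0002``.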
If the data type has no extra inhabitants, or there are not enough extra
inhabitants to represent all of the no-data cases, then a tag bit is added
to the enum's representation. The tag bit is set for the no-data cases, which
are then assigned values in the data area of the enum in declaration order.
::

  enum IntOrInfinity { => LLVM <{ i64, i1 }>
    case NegInfinity   => <{ i64, i1 }> {    0, 1 }
    case Int(Int)      => <{ i64, i1 }> { %Int, 0 }
    case PosInfinity   => <{ i64, i1 }> {    1, 1 }
  }

  IntOrInfinity.Int(    0) => <{ i64, i1 }> {     0, 0 }
  IntOrInfinity.Int(20721) => <{ i64, i1 }> { 20721, 0 }
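
When a tag bit has to be added, the representation becomes a pair of data word and tag. The following Swift model of ``IntOrInfinity`` illustrates this. ``IntOrInfinityModel`` and ``encodeIntOrInfinity`` are hypothetical names, and the tag is held in a ``UInt8`` purely for convenience; the layout described above uses a single bit.

```swift
/// Hand-written stand-in for `IntOrInfinity`, cases in declaration order.
enum IntOrInfinityModel {
    case negInfinity
    case int(Int64)
    case posInfinity
}

/// Returns the (data, tag) pair shown in the example.
func encodeIntOrInfinity(_ value: IntOrInfinityModel) -> (data: Int64, tag: UInt8) {
    switch value {
    case .int(let payload):
        // Payload case: data area holds the payload, tag bit is clear.
        return (payload, 0)
    case .negInfinity:
        // First no-data case: tag bit set, data area holds index 0.
        return (0, 1)
    case .posInfinity:
        // Second no-data case: tag bit set, data area holds index 1.
        return (1, 1)
    }
}
```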
Multi-Payload Enums
```````````````````
If an enum has more than one case with data type, then a tag is necessary to
discriminate the data types. The ABI will first try to find common
**spare bits**, that is, bits in the data types' binary representations which are
either fixed-zero or ignored by valid values of all of the data types. The tag
will be scattered into these spare bits as much as possible. Currently only
spare bits of primitive integer types, such as the high bits of an ``i21``
type, are considered. The enum data is represented as an integer with the
storage size in bits of the largest data type.
::

  enum TerminalChar { => LLVM i32
    case Plain(UnicodeScalar)     => i32     (zext i21 %Plain     to i32)
    case Bold(UnicodeScalar)      => i32 (or (zext i21 %Bold      to i32), 0x0020_0000)
    case Underline(UnicodeScalar) => i32 (or (zext i21 %Underline to i32), 0x0040_0000)
    case Blink(UnicodeScalar)     => i32 (or (zext i21 %Blink     to i32), 0x0060_0000)
    case Empty                    => i32 0x0080_0000
    case Cursor                   => i32 0x0080_0001
  }
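
In the ``TerminalChar`` example the tag occupies the spare bits directly above the 21-bit payload. A minimal Swift model of that packing follows. The names are hypothetical, the payloads are raw ``UInt32`` scalar values, and placing the tag at bit 21 is read off the constants in the example, not taken from a general rule.

```swift
/// Hand-written stand-in for `TerminalChar`, cases in declaration order.
enum TerminalCharModel {
    case plain(UInt32)
    case bold(UInt32)
    case underline(UInt32)
    case blink(UInt32)
    case empty
    case cursor
}

/// Position of the lowest spare bit above the 21-bit payload.
let spareBitShift: UInt32 = 21

/// Combines a tag stored in the spare bits with a value in the data area.
func pack(tag: UInt32, data: UInt32) -> UInt32 {
    precondition(data < (1 << spareBitShift), "data must fit below the spare bits")
    return (tag << spareBitShift) | data
}

/// Produces the 32-bit representation shown in the example.
func encodeTerminalChar(_ value: TerminalCharModel) -> UInt32 {
    switch value {
    // Data cases get tags 0...3 in declaration order.
    case .plain(let scalar):     return pack(tag: 0, data: scalar)
    case .bold(let scalar):      return pack(tag: 1, data: scalar)
    case .underline(let scalar): return pack(tag: 2, data: scalar)
    case .blink(let scalar):     return pack(tag: 3, data: scalar)
    // No-data cases share tag 4 and are numbered in the data area.
    case .empty:                 return pack(tag: 4, data: 0)
    case .cursor:                return pack(tag: 4, data: 1)
    }
}
```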
If there are not enough spare bits to contain the tag, then additional bits are
added to the representation to contain the tag. Tag values are
assigned to data cases in declaration order. If there are no-data cases, they
are collected under a common tag, and assigned values in the data area of the
enum in declaration order.
::

  class Bignum {}

  enum IntDoubleOrBignum { => LLVM <{ i64, i2 }>
    case Int(Int)          => <{ i64, i2 }> {           %Int,            0 }
    case Double(Double)    => <{ i64, i2 }> { (bitcast  %Double to i64), 1 }
    case Bignum(Bignum)    => <{ i64, i2 }> { (ptrtoint %Bignum to i64), 2 }
  }
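
The ``IntDoubleOrBignum`` representation can likewise be modelled as a pair of 64-bit payload and tag. In this sketch the names are hypothetical, the class reference is replaced by a raw address so the code stays self-contained, and the 2-bit tag is stored in a ``UInt8``.

```swift
/// Hand-written stand-in for `IntDoubleOrBignum`; the object reference is
/// modelled as a raw 64-bit address.
enum IntDoubleOrBignumModel {
    case int(Int64)
    case double(Double)
    case bignum(address: UInt64)
}

/// Returns the (payload, tag) pair shown in the example, with tags
/// assigned to the data cases in declaration order.
func encodeIntDoubleOrBignum(_ value: IntDoubleOrBignumModel) -> (payload: UInt64, tag: UInt8) {
    switch value {
    case .int(let x):
        return (UInt64(bitPattern: x), 0)  // integer bits reused as-is
    case .double(let d):
        return (d.bitPattern, 1)           // same bits, like an LLVM bitcast
    case .bignum(let address):
        return (address, 2)                // pointer value, like ptrtoint
    }
}
```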
Existential Container Layout
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Values of protocol type, protocol composition type, or ``Any`` type are laid
out using **existential containers** (so-called because these types are
"existential types" in type theory).
Opaque Existential Containers
`````````````````````````````
If there is no class constraint on a protocol or protocol composition type,
the existential container has to accommodate a value of arbitrary size and
alignment. It does this using a **fixed-size buffer**, which is three pointers
in size and pointer-aligned. This either directly contains the value, if its
size and alignment are both less than or equal to the fixed-size buffer's, or
contains a pointer to a side allocation owned by the existential container.
The type of the contained value is identified by its `type metadata` record,
and witness tables for all of the required protocol conformances are included.
The layout is as if declared in the following C struct::

  struct OpaqueExistentialContainer {
    void *fixedSizeBuffer[3];
    Metadata *type;
    WitnessTable *witnessTables[NUM_WITNESS_TABLES];
  };
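
The inline-storage rule described above is a simple size and alignment comparison. The Swift sketch below expresses it with hypothetical function names, and it assumes that "pointer-aligned" means an alignment equal to the pointer size. It models only the rule stated in this section and is not the runtime's actual check.

```swift
/// Returns true if a value with the given size and alignment (in bytes)
/// fits directly in a buffer of three pointers with pointer alignment.
func fitsInlineBuffer(size: Int, alignment: Int, pointerSize: Int) -> Bool {
    let bufferSize = 3 * pointerSize
    let bufferAlignment = pointerSize  // assumption: pointer-aligned == pointer size
    return size <= bufferSize && alignment <= bufferAlignment
}

/// Convenience wrapper that reads the layout of `T` on the current platform.
func fitsInlineBuffer<T>(valueOfType type: T.Type) -> Bool {
    return fitsInlineBuffer(size: MemoryLayout<T>.size,
                            alignment: MemoryLayout<T>.alignment,
                            pointerSize: MemoryLayout<UnsafeRawPointer>.size)
}
```

Under this rule a single ``Int`` is stored inline, while a tuple of four ``Int`` values exceeds three pointers and would need the side allocation.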
Class Existential Containers
````````````````````````````
If one or more of the protocols in a protocol or protocol composition type
have a class constraint, then only class values can be stored in the existential
container, and a more efficient representation is used. Class instances are
always a single pointer in size, so a fixed-size buffer and potential side
allocation are not needed. Class instances also always have a reference to
their own type metadata, so the separate metadata record is not needed. The
layout is thus as if declared in the following C struct::

  struct ClassExistentialContainer {
    HeapObject *value;
    WitnessTable *witnessTables[NUM_WITNESS_TABLES];
  };
Note that if no witness tables are needed, such as for the "any class" type
``protocol<class>`` or an Objective-C protocol type, then the only element of
the layout is the heap object pointer. This is ABI-compatible with ``id``
and ``id <Protocol>`` types in Objective-C.
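
Reading the two C structs as arrays of pointer-sized words gives the container sizes directly. The following Swift sketch does that arithmetic. The function names are hypothetical, and it assumes every field is exactly one pointer wide with no padding.

```swift
/// Size in bytes of an opaque existential container: a three-pointer
/// buffer, one type metadata pointer, and one pointer per witness table.
func opaqueExistentialContainerSize(witnessTableCount: Int, pointerSize: Int) -> Int {
    return (3 + 1 + witnessTableCount) * pointerSize
}

/// Size in bytes of a class existential container: one heap object
/// pointer and one pointer per witness table.
func classExistentialContainerSize(witnessTableCount: Int, pointerSize: Int) -> Int {
    return (1 + witnessTableCount) * pointerSize
}
```

With zero witness tables the class existential collapses to a single pointer, which is what makes it compatible with ``id`` as noted above.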