 :orphan:

 .. _ABI:

 .. highlight:: none

 Type Layout
 -----------

 Hard Constraints on Resilience
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 The root of a class hierarchy must remain stable, on pain of
 invalidating the metaclass hierarchy.  Note that a Swift class without an
 explicit base class is implicitly rooted in the ``SwiftObject``
 Objective-C class.


 Fragile Struct and Tuple Layout
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 Structs and tuples currently share the same layout algorithm, noted as the
 "Universal" layout algorithm in the compiler implementation. The algorithm
 is as follows:

 - Start with a **size** of **0** and an **alignment** of **1**.
 - Iterate through the fields, in element order for tuples, or in ``var``
   declaration order for structs. For each field:

   * Update **size** by rounding up to the **alignment of the field**, that is,
     increasing it to the least value greater than or equal to **size** and evenly
     divisible by the **alignment of the field**.
   * Assign the **offset of the field** to the current value of **size**.
   * Update **size** by adding the **size of the field**.
   * Update **alignment** to the max of **alignment** and the
     **alignment of the field**.

 - The final **size** and **alignment** are the size and alignment of the
   aggregate. The **stride** of the type is the final **size** rounded up to
   **alignment**.

 Note that this differs from C or LLVM's normal layout rules in that *size*
 and *stride* are distinct; whereas C layout requires that an embedded struct's
 size be padded out to its alignment and that nothing be laid out there,
 Swift layout allows an outer struct to lay out fields in the inner struct's
 tail padding, alignment permitting. Unlike C, zero-sized structs and tuples
 are also allowed, and take up no storage in enclosing aggregates. The Swift
 compiler emits LLVM packed struct types with manual padding to get the
 necessary control over the binary layout. Some examples:

 ::

   // LLVM <{ i64, i8 }>
   struct S {
     var x: Int
     var y: UInt8
   }

   // LLVM <{ i8, [7 x i8], <{ i64, i8 }>, i8 }>
   struct S2 {
     var x: UInt8
     var s: S
     var y: UInt8
   }

   // LLVM <{}>
   struct Empty {}

   // LLVM <{ i64, i64 }>
   struct ContainsEmpty {
     var x: Int
     var y: Empty
     var z: Int
   }
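The steps above can be sketched as a small Swift model. This is only an
illustration of the algorithm as described in this section, not the compiler's
implementation; ``FieldLayout``, ``AggregateLayout``, ``roundUp`` and ``layOut``
are hypothetical names, and the field sizes assume a 64-bit target.

```swift
// Hypothetical model of a field: only its size and alignment matter here.
struct FieldLayout {
    let size: Int
    let alignment: Int
}

// Least value >= `value` that is evenly divisible by `alignment`.
func roundUp(_ value: Int, to alignment: Int) -> Int {
    return (value + alignment - 1) / alignment * alignment
}

// Result of laying out an aggregate with the steps described above.
struct AggregateLayout {
    var offsets: [Int] = []
    var size = 0
    var alignment = 1

    // Stride is the final size rounded up to the aggregate's alignment.
    var stride: Int { return roundUp(size, to: alignment) }
}

func layOut(_ fields: [FieldLayout]) -> AggregateLayout {
    var result = AggregateLayout()
    for field in fields {
        result.size = roundUp(result.size, to: field.alignment)
        result.offsets.append(result.size)
        result.size += field.size
        result.alignment = max(result.alignment, field.alignment)
    }
    return result
}

// Assumption: 64-bit target, so Int is 8 bytes with 8-byte alignment.
let int = FieldLayout(size: 8, alignment: 8)
let uint8 = FieldLayout(size: 1, alignment: 1)
let empty = FieldLayout(size: 0, alignment: 1)

let s = layOut([int, uint8])
let sField = FieldLayout(size: s.size, alignment: s.alignment)
let s2 = layOut([uint8, sField, uint8])
let containsEmpty = layOut([int, empty, int])

print(s.offsets, s.size, s.alignment, s.stride)     // [0, 8] 9 8 16
print(s2.offsets, s2.size, s2.alignment, s2.stride) // [0, 8, 17] 18 8 24
print(containsEmpty.offsets, containsEmpty.size)    // [0, 8, 8] 16
```

In this model the final field of ``S2`` lands at offset 17, inside what C would
treat as the tail padding of ``S``, and the empty field of ``ContainsEmpty``
shares offset 8 with the field that follows it.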

 Class Layout
 ~~~~~~~~~~~~

 Swift relies on the following assumptions about the Objective-C runtime,
 which are therefore now part of the Objective-C ABI:

 - 32-bit platforms never have tagged pointers.  ObjC pointer types are
   either nil or an object pointer.

 - On x86-64, a tagged pointer either sets the lowest bit of the pointer
   or the highest bit of the pointer.  Therefore, both of these bits are
   zero if and only if the value is not a tagged pointer.

 - On ARM64, a tagged pointer always sets the highest bit of the pointer.

 - 32-bit platforms never perform any isa masking.  ``object_getClass``
   is always equivalent to ``*(Class*)object``.

 - 64-bit platforms perform isa masking only if the runtime exports a
   symbol ``uintptr_t objc_debug_isa_class_mask;``.  If this symbol
   is exported, ``object_getClass`` on a non-tagged pointer is always
   equivalent to ``(Class)(objc_debug_isa_class_mask & *(uintptr_t*)object)``.

 - The superclass field of a class object is always stored immediately
   after the isa field.  Its value is either nil or a pointer to the
   class object for the superclass; it never has other bits set.
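The pointer-bit assumptions above can be written as predicates on a raw 64-bit
value. The following is a sketch under those stated assumptions only; the
function names are hypothetical, the mask used in the usage lines is a made-up
value for illustration, and an absent ``objc_debug_isa_class_mask`` symbol is
modeled as ``nil``.

```swift
// x86-64: a tagged pointer sets the lowest bit or the highest bit.
func isTaggedPointerOnX86_64(_ bits: UInt64) -> Bool {
    let tagBits: UInt64 = 0x8000_0000_0000_0001
    return (bits & tagBits) != 0
}

// ARM64: a tagged pointer always sets the highest bit.
func isTaggedPointerOnARM64(_ bits: UInt64) -> Bool {
    return (bits >> 63) == 1
}

// Models the isa rule: with no exported mask the isa field is used as-is,
// otherwise it is ANDed with the mask.
func classPointerBits(isaField: UInt64, isaClassMask: UInt64?) -> UInt64 {
    guard let mask = isaClassMask else { return isaField }
    return isaField & mask
}

// Hypothetical values, chosen only to exercise the functions.
let plainPointer: UInt64 = 0x0000_7FFF_0000_1000
let hypotheticalMask: UInt64 = 0x0000_7FFF_FFFF_FFF8
let packedIsa: UInt64 = 0x0100_0001_0000_1001

print(isTaggedPointerOnX86_64(plainPointer))       // false
print(isTaggedPointerOnX86_64(plainPointer | 1))   // true
print(isTaggedPointerOnARM64(plainPointer | (1 << 63))) // true
print(String(classPointerBits(isaField: packedIsa, isaClassMask: hypotheticalMask), radix: 16))
```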

 The following assumptions are part of the Swift ABI:

 - Swift class pointers are never tagged pointers.

 TODO

 Fragile Enum Layout
 ~~~~~~~~~~~~~~~~~~~

 In laying out enum types, the ABI attempts to avoid requiring additional
 storage to store the tag for the enum case. The ABI chooses one of five
 strategies based on the layout of the enum:

 Empty Enums
 ```````````

 In the degenerate case of an enum with no cases, the enum is an empty type.

 ::

   enum Empty {} // => empty type

 Single-Case Enums
 `````````````````

 In the degenerate case of an enum with a single case, there is no
 discriminator needed, and the enum type has the exact same layout as its
 case's data type, or is empty if the case has no data type.

 ::

   enum EmptyCase { case X }             // => empty type
   enum DataCase { case Y(Int, Double) } // => LLVM <{ i64, double }>
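The two degenerate strategies can be observed with the standard library's
``MemoryLayout``. This is a quick check rather than a specification; the case
names are lowercased to follow current Swift naming conventions.

```swift
enum Empty {}
enum EmptyCase { case x }
enum DataCase { case y(Int, Double) }

// Enums with zero cases or one no-data case occupy no storage.
print(MemoryLayout<Empty>.size)      // 0
print(MemoryLayout<EmptyCase>.size)  // 0

// A single-case enum with a payload is laid out like the payload itself.
print(MemoryLayout<DataCase>.size == MemoryLayout<(Int, Double)>.size)  // true
```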

 C-Like Enums
 ````````````

 If none of the cases has a data type (a "C-like" enum), then the enum
 is laid out as an integer tag with the minimal number of bits to contain
 all of the cases. The machine-level layout of the type then follows LLVM's
 data layout rules for integer types on the target platform. The cases are
 assigned tag values in declaration order.

 ::

   enum EnumLike2 { // => LLVM i1
     case A         // => i1 0
     case B         // => i1 1
   }

   enum EnumLike8 { // => LLVM i3
     case A         // => i3 0
     case B         // => i3 1
     case C         // => i3 2
     case D         // etc.
     case E
     case F
     case G
     case H
   }

 Discriminator values after the one used for the last case become *extra
 inhabitants* of the enum type (see `Single-Payload Enums`_).
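As a rough sketch of the rule above, the minimal tag width and the number of
unused tag values can be computed from the case count. The helper names are
hypothetical, and the count only covers values within the minimal-width tag;
the padded storage may offer more.

```swift
// Smallest bit width whose value range covers `caseCount` distinct tags.
func tagBitCount(forCaseCount caseCount: Int) -> Int {
    precondition(caseCount >= 1, "an enum with no cases has no tag")
    var bits = 0
    while (1 << bits) < caseCount {
        bits += 1
    }
    return bits
}

// Tag values left over after the last case, within that minimal width.
func unusedTagValues(forCaseCount caseCount: Int) -> Int {
    return (1 << tagBitCount(forCaseCount: caseCount)) - caseCount
}

print(tagBitCount(forCaseCount: 2))      // 1, matching EnumLike2 (i1)
print(tagBitCount(forCaseCount: 8))      // 3, matching EnumLike8 (i3)
print(unusedTagValues(forCaseCount: 5))  // 3: tag values 5, 6 and 7
```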

 Single-Payload Enums
 ````````````````````

 If an enum has a single case with a data type and one or more no-data cases
 (a "single-payload" enum), then the case with data type is represented using
 the data type's binary representation, with added zero bits for tag if
 necessary. If the data type's binary representation
 has **extra inhabitants**, that is, bit patterns with the size and alignment of
 the type but which do not form valid values of that type, they are used to
 represent the no-data cases, with extra inhabitants in order of ascending
 numeric value matching no-data cases in declaration order. If the type
 has *spare bits* (see `Multi-Payload Enums`_), they are used to form extra
 inhabitants. The enum value is then represented as an integer with the storage
 size in bits of the data type. Extra inhabitants of the payload type not used
 by the enum type become extra inhabitants of the enum type itself.

 ::

   enum CharOrSectionMarker {  // => LLVM i32
     case Paragraph            // => i32 0x0020_0000
     case Char(UnicodeScalar)  // => i32 (zext i21 %Char to i32)
     case Chapter              // => i32 0x0020_0001
   }

   CharOrSectionMarker.Char("\u{0}")      // => i32 0x0000_0000
   CharOrSectionMarker.Char("\u{10FFFF}") // => i32 0x0010_FFFF

   enum CharOrSectionMarkerOrFootnoteMarker {      // => LLVM i32
     case CharOrSectionMarker(CharOrSectionMarker) // => i32 %CharOrSectionMarker
     case Asterisk                                 // => i32 0x0020_0002
     case Dagger                                   // => i32 0x0020_0003
     case DoubleDagger                             // => i32 0x0020_0004
   }
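The encoding in the example above can be modeled directly on ``UInt32`` values.
This sketch assumes, as the example does, a 21-bit payload whose first extra
inhabitant is ``0x0020_0000``; the function names are hypothetical and the
model does not reflect how the compiler stores ``UnicodeScalar``.

```swift
// Assumption taken from the example: a 21-bit payload stored in 32 bits.
let payloadBitWidth: UInt32 = 21
let firstExtraInhabitant: UInt32 = 1 << payloadBitWidth   // 0x0020_0000

// The payload case uses the payload's own representation, zero-extended.
func encodePayload(_ scalar: UInt32) -> UInt32 {
    precondition(scalar < firstExtraInhabitant, "payload must fit in 21 bits")
    return scalar
}

// No-data cases take extra inhabitants in ascending order.
func encodeNoDataCase(extraInhabitantIndex index: UInt32) -> UInt32 {
    return firstExtraInhabitant + index
}

// Inner enum: Paragraph and Chapter use extra inhabitants 0 and 1.
let paragraph = encodeNoDataCase(extraInhabitantIndex: 0)
let chapter = encodeNoDataCase(extraInhabitantIndex: 1)

// Outer enum: its no-data cases continue with the inhabitants left unused.
let asterisk = encodeNoDataCase(extraInhabitantIndex: 2)

print(String(paragraph, radix: 16))  // 200000
print(String(chapter, radix: 16))    // 200001
print(String(asterisk, radix: 16))   // 200002
print(String(encodePayload(0x10FFFF), radix: 16))  // 10ffff
```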

 If the data type has no extra inhabitants, or there are not enough extra
 inhabitants to represent all of the no-data cases, then a tag bit is added
 to the enum's representation. The tag bit is set for the no-data cases, which
 are then assigned values in the data area of the enum in declaration order.

 ::

   enum IntOrInfinity { // => LLVM <{ i64, i1 }>
     case NegInfinity   // => <{ i64, i1 }> {    0, 1 }
     case Int(Int)      // => <{ i64, i1 }> { %Int, 0 }
     case PosInfinity   // => <{ i64, i1 }> {    1, 1 }
   }

   IntOrInfinity.Int(    0) // => <{ i64, i1 }> {     0, 0 }
   IntOrInfinity.Int(20721) // => <{ i64, i1 }> { 20721, 0 }
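The tag-bit fallback can be modeled as a pair of a data word and a one-bit tag.
This is a sketch of the rule as stated above, with hypothetical names; it
mirrors the ``<{ i64, i1 }>`` values in the example rather than any runtime
data structure.

```swift
// Model of the <{ i64, i1 }> representation from the example.
struct TaggedInt: Equatable {
    var data: Int64
    var tagBit: Bool
}

// The payload case stores the value and leaves the tag bit clear.
func encodePayload(_ value: Int64) -> TaggedInt {
    return TaggedInt(data: value, tagBit: false)
}

// No-data cases set the tag bit and are numbered in declaration order
// within the data area.
func encodeNoDataCase(noDataCaseIndex index: Int64) -> TaggedInt {
    return TaggedInt(data: index, tagBit: true)
}

let negInfinity = encodeNoDataCase(noDataCaseIndex: 0)
let posInfinity = encodeNoDataCase(noDataCaseIndex: 1)
let zero = encodePayload(0)

// The payload value 0 and NegInfinity share a data word but differ in the tag.
print(negInfinity.data == zero.data, negInfinity == zero)  // true false
print(posInfinity)
```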

 Multi-Payload Enums
 ```````````````````

 If an enum has more than one case with data type, then a tag is necessary to
 discriminate the data types. The ABI will first try to find common
 **spare bits**, that is, bits in the data types' binary representations which are
 either fixed-zero or ignored by valid values of all of the data types. The tag
 will be scattered into these spare bits as much as possible. Currently only
 spare bits of primitive integer types, such as the high bits of an ``i21``
 type, are considered. The enum data is represented as an integer with the
 storage size in bits of the largest data type.

 ::

   enum TerminalChar {             // => LLVM i32
     case Plain(UnicodeScalar)     // => i32     (zext i21 %Plain     to i32)
     case Bold(UnicodeScalar)      // => i32 (or (zext i21 %Bold      to i32), 0x0020_0000)
     case Underline(UnicodeScalar) // => i32 (or (zext i21 %Underline to i32), 0x0040_0000)
     case Blink(UnicodeScalar)     // => i32 (or (zext i21 %Blink     to i32), 0x0060_0000)
     case Empty                    // => i32 0x0080_0000
     case Cursor                   // => i32 0x0080_0001
   }
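The spare-bit packing in the example above amounts to shifting a tag into the
bits above the 21-bit payload. The following model is a sketch under that
assumption; the helper names are hypothetical, and real spare bits need not be
contiguous.

```swift
// Assumption from the example: the payload occupies the low 21 bits of an i32,
// so the tag can live in the bits above it.
let payloadMask: UInt32 = 0x001F_FFFF
let tagShift: UInt32 = 21

// Data cases receive tags in declaration order: Plain = 0, Bold = 1, ...
func encodePayloadCase(tag: UInt32, scalar: UInt32) -> UInt32 {
    precondition(scalar <= payloadMask, "payload must fit in 21 bits")
    return scalar | (tag << tagShift)
}

// No-data cases share the tag after the last data case and are numbered
// in the data area.
func encodeNoDataCase(index: UInt32, payloadCaseCount: UInt32) -> UInt32 {
    return (payloadCaseCount << tagShift) | index
}

// Splits a representation back into its tag and data bits.
func decode(_ bits: UInt32) -> (tag: UInt32, data: UInt32) {
    return (bits >> tagShift, bits & payloadMask)
}

let boldA = encodePayloadCase(tag: 1, scalar: 0x41)
let cursor = encodeNoDataCase(index: 1, payloadCaseCount: 4)

print(String(boldA, radix: 16))   // 200041
print(String(cursor, radix: 16))  // 800001
print(decode(boldA))              // (tag: 1, data: 65)
```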

 If there are not enough spare bits to contain the tag, then additional bits are
 added to the representation to contain the tag. Tag values are
 assigned to data cases in declaration order. If there are no-data cases, they
 are collected under a common tag, and assigned values in the data area of the
 enum in declaration order.

 ::

   class Bignum {}

   enum IntDoubleOrBignum { // => LLVM <{ i64, i2 }>
     case Int(Int)          // => <{ i64, i2 }> {           %Int,            0 }
     case Double(Double)    // => <{ i64, i2 }> { (bitcast  %Double to i64), 1 }
     case Bignum(Bignum)    // => <{ i64, i2 }> { (ptrtoint %Bignum to i64), 2 }
   }
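The extra tag storage can be observed with ``MemoryLayout``. This is a quick
check rather than a guarantee: ``Int`` and ``Double`` use every bit of their
storage, so the payloads have no common spare bits and the tag ends up in an
additional byte after the payload. Case names are lowercased to follow current
Swift conventions.

```swift
final class Bignum {}

enum IntDoubleOrBignum {
    case int(Int)
    case double(Double)
    case bignum(Bignum)
}

// The payload area is as large as the largest payload (Double, 8 bytes),
// and the tag occupies one extra byte.
print(MemoryLayout<IntDoubleOrBignum>.size == MemoryLayout<Double>.size + 1)  // true
print(MemoryLayout<IntDoubleOrBignum>.alignment == MemoryLayout<Double>.alignment)  // true
```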

 Existential Container Layout
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 Values of protocol type, protocol composition type, or ``Any`` type are laid
 out using **existential containers** (so-called because these types are
 "existential types" in type theory).

 Opaque Existential Containers
 `````````````````````````````

 If there is no class constraint on a protocol or protocol composition type,
 the existential container has to accommodate a value of arbitrary size and
 alignment. It does this using a **fixed-size buffer**, which is three pointers
 in size and pointer-aligned. This either directly contains the value, if its
 size and alignment are both less than or equal to the fixed-size buffer's, or
 contains a pointer to a side allocation owned by the existential container.
 The type of the contained value is identified by its `type metadata` record,
 and witness tables for all of the required protocol conformances are included.
 The layout is as if declared in the following C struct::

   struct OpaqueExistentialContainer {
     void *fixedSizeBuffer[3];
     Metadata *type;
     WitnessTable *witnessTables[NUM_WITNESS_TABLES];
   };
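A small Swift check can make the struct above concrete. It is a sketch, not a
statement of the runtime's behavior: ``fitsInlineBuffer`` is a hypothetical
helper that encodes only the size and alignment rule from this section, and
``Describable`` is a made-up protocol with one witness table.

```swift
let pointerSize = MemoryLayout<UnsafeRawPointer>.size
let pointerAlignment = MemoryLayout<UnsafeRawPointer>.alignment

// Inline storage rule as described above: both size and alignment must fit
// the three-pointer, pointer-aligned buffer.
func fitsInlineBuffer(size: Int, alignment: Int) -> Bool {
    return size <= 3 * pointerSize && alignment <= pointerAlignment
}

protocol Describable {}

// Any: three-pointer buffer plus the metadata pointer, no witness tables.
print(MemoryLayout<Any>.size == 4 * pointerSize)          // true
// One protocol: the same four words plus one witness table pointer.
print(MemoryLayout<Describable>.size == 5 * pointerSize)  // true

typealias ThreeWords = (Int, Int, Int)
typealias FourWords = (Int, Int, Int, Int)
print(fitsInlineBuffer(size: MemoryLayout<ThreeWords>.size,
                       alignment: MemoryLayout<ThreeWords>.alignment))  // true
print(fitsInlineBuffer(size: MemoryLayout<FourWords>.size,
                       alignment: MemoryLayout<FourWords>.alignment))   // false
```

Newer toolchains also accept the spelling ``any Describable`` for the
existential type; the bare protocol name is used here for brevity.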

 Class Existential Containers
 ````````````````````````````

 If one or more of the protocols in a protocol or protocol composition type
 have a class constraint, then only class values can be stored in the existential
 container, and a more efficient representation is used. Class instances are
 always a single pointer in size, so a fixed-size buffer and potential side
 allocation are not needed, and class instances always have a reference to their
 own type metadata, so the separate metadata record is not needed. The
 layout is thus as if declared in the following C struct::

   struct ClassExistentialContainer {
     HeapObject *value;
     WitnessTable *witnessTables[NUM_WITNESS_TABLES];
   };

 Note that if no witness tables are needed, such as for the "any class" type
 ``protocol<class>`` or an Objective-C protocol type, then the only element of
 the layout is the heap object pointer. This is ABI-compatible with ``id``
 and ``id <Protocol>`` types in Objective-C.
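The class-constrained layout can be checked the same way. This is a sketch with
a hypothetical protocol name; ``AnyObject`` is used as the current spelling of
the "any class" type.

```swift
let pointerSize = MemoryLayout<UnsafeRawPointer>.size

// A class-constrained protocol with one witness table.
protocol ClassBound: AnyObject {}

// One heap object pointer plus one witness table pointer.
print(MemoryLayout<ClassBound>.size == 2 * pointerSize)  // true

// No witness tables: only the heap object pointer remains.
print(MemoryLayout<AnyObject>.size == pointerSize)       // true
```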