.. highlight:: none
Swift Intermediate Language (SIL)
=================================
.. contents::
Abstract
--------
SIL is an SSA-form IR with high-level semantic information designed to implement
the Swift programming language. SIL accommodates the following use cases:
- A set of guaranteed high-level optimizations that provide a predictable
baseline for runtime and diagnostic behavior.
- Diagnostic dataflow analysis passes that enforce Swift language requirements,
such as definitive initialization of variables and constructors, code
reachability, and switch coverage.
- High-level optimization passes, including retain/release optimization,
dynamic method devirtualization, closure inlining, promoting heap allocations
to stack allocations, promoting stack allocations to SSA registers, scalar
replacement of aggregates (splitting aggregate allocations into multiple
smaller allocations), and generic function instantiation.
- A stable distribution format that can be used to distribute "fragile"
inlineable or generic code with Swift library modules, to be optimized into
client binaries.
In contrast to LLVM IR, SIL is a generally target-independent representation
that can be used for code distribution, but it can also express
target-specific concepts as well as LLVM can.
For more information on developing the implementation of SIL and SIL passes, see
`SILProgrammersManual.md <SILProgrammersManual.md>`_.
SIL in the Swift Compiler
-------------------------
At a high level, the Swift compiler follows a strict pipeline architecture:
- The *Parse* module constructs an AST from Swift source code.
- The *Sema* module type-checks the AST and annotates it with type information.
- The *SILGen* module generates *raw SIL* from an AST.
- A series of *Guaranteed Optimization Passes* and *Diagnostic Passes* are run
over the raw SIL both to perform optimizations and to emit
language-specific diagnostics. These are always run, even at -Onone, and
produce *canonical SIL*.
- General SIL *Optimization Passes* optionally run over the canonical SIL to
improve performance of the resulting executable. These are enabled and
controlled by the optimization level and are not run at -Onone.
- *IRGen* lowers canonical SIL to LLVM IR.
- The LLVM backend (optionally) applies LLVM optimizations, runs the LLVM code
generator and emits binary code.
The stages pertaining to SIL processing in particular are as follows:
SILGen
~~~~~~
SILGen produces *raw SIL* by walking a type-checked Swift AST.
The form of SIL emitted by SILGen has the following properties:
- Variables are represented by loading and storing mutable memory locations
instead of being in strict SSA form. This is similar to the initial
``alloca``-heavy LLVM IR emitted by frontends such as Clang. However, Swift
represents variables as reference-counted "boxes" in the most general case,
which can be retained, released, and captured into closures.
- Dataflow requirements, such as definitive assignment, function returns,
switch coverage (TBD), etc. have not yet been enforced.
- ``transparent`` function optimization has not yet been honored.
These properties are addressed by subsequent guaranteed optimization and
diagnostic passes which are always run against the raw SIL.
Guaranteed Optimization and Diagnostic Passes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
After SILGen, a deterministic sequence of optimization passes is run over the
raw SIL. We do not want the diagnostics produced by the compiler to change as
the compiler evolves, so these passes are intended to be simple and
predictable.
- **Mandatory inlining** inlines calls to "transparent" functions.
- **Memory promotion** is implemented as two optimization phases, the first
of which performs capture analysis to promote ``alloc_box`` instructions to
``alloc_stack``, and the second of which promotes non-address-exposed ``alloc_stack``
instructions to SSA registers.
- **Constant propagation** folds constant expressions and propagates the constant values.
If an arithmetic overflow occurs during the constant expression computation, a diagnostic
is issued.
- **Return analysis** verifies that each function returns a value on every
code path and doesn't "fall off the end" of its definition, which is an error.
It also issues an error when a ``noreturn`` function returns.
- **Critical edge splitting** splits all critical edges from terminators that
don't support arbitrary basic block arguments (all terminators other than
``cond_br``).
If all diagnostic passes succeed, the final result is the
*canonical SIL* for the program.
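As a rough source-level illustration of what these passes enforce, the
following Swift sketch compiles cleanly because every path initializes
``label`` and returns a value. The commented-out declarations show code that
would be expected to be rejected by definite initialization, return analysis,
and constant propagation respectively. All function names here are
hypothetical and chosen only for the example.

```swift
// Hypothetical example; the function names are illustrative only.

// Accepted: `label` is assigned exactly once on every path before it is read,
// and every path through the function returns a value.
func classify(_ n: Int) -> String {
    let label: String
    if n < 0 {
        label = "negative"
    } else {
        label = "non-negative"
    }
    return label
}

// Rejected by definite initialization if uncommented: `x` is read on a path
// where it was never assigned.
// func broken(_ flag: Bool) -> Int {
//     let x: Int
//     if flag { x = 1 }
//     return x
// }

// Rejected by return analysis if uncommented: control can fall off the end.
// func missingReturn(_ flag: Bool) -> Int {
//     if flag { return 1 }
// }

// Rejected during constant folding if uncommented: 127 + 1 overflows Int8.
// let overflow: Int8 = 127 + 1

print(classify(-3))  // prints "negative"
print(classify(7))   // prints "non-negative"
```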
TODO:
- Generic specialization
- Basic ARC optimization for acceptable performance at -Onone.
General Optimization Passes
~~~~~~~~~~~~~~~~~~~~~~~~~~~
SIL captures language-specific type information, making it possible to
perform high-level optimizations that are difficult to perform on LLVM
IR.
- **Generic Specialization** analyzes specialized calls to generic
functions and generates new specialized versions of the
functions. It then rewrites all specialized uses of the generic
function into direct calls to the appropriate specialized function.
- **Witness and VTable Devirtualization** for a given type looks up
the associated method from a class's vtable or a type witness table
and replaces the indirect virtual call with a call to the mapped
function.
- **Performance Inlining**
- **Reference Counting Optimizations**
- **Memory Promotion/Optimizations**
- **High-level domain specific optimizations** The Swift compiler implements
high-level optimizations on basic Swift containers such as Array or String.
Domain specific optimizations require a defined interface between
the standard library and the optimizer. More details can be found here:
`HighLevelSILOptimizations <HighLevelSILOptimizations.rst>`_
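To make the idea of generic specialization more concrete, the sketch below
pairs a generic function with a hand-written stand-in for the clone that the
optimizer might produce for ``T == Int``. This is only an analogy written in
Swift source: the real transformation happens on SIL, and the name
``totalSpecializedForInt`` is an assumption made for this example.

```swift
// Generic entry point: works for any Numeric element type.
func total<T: Numeric>(_ values: [T]) -> T {
    var result: T = 0
    for value in values { result += value }
    return result
}

// Hand-written stand-in for the clone the specializer might create for
// T == Int. The name `totalSpecializedForInt` is hypothetical.
func totalSpecializedForInt(_ values: [Int]) -> Int {
    var result = 0
    for value in values { result += value }
    return result
}

// A call such as total([1, 2, 3]) could conceptually be rewritten into a
// direct call to the specialized function; both produce the same value.
let viaGeneric = total([1, 2, 3])
let viaSpecialized = totalSpecializedForInt([1, 2, 3])
print(viaGeneric == viaSpecialized)  // prints "true"
```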
Syntax
------
SIL is reliant on Swift's type system and declarations, so SIL syntax
is an extension of Swift's. A ``.sil`` file is a Swift source file
with added SIL definitions. The Swift source is parsed only for its
declarations; Swift ``func`` bodies (except for nested declarations)
and top-level code are ignored by the SIL parser. In a ``.sil`` file,
there are no implicit imports; the ``Swift`` and/or ``Builtin``
standard modules must be imported explicitly if used.
Here is an example of a ``.sil`` file::
sil_stage canonical
import Swift
// Define types used by the SIL function.
struct Point {
var x : Double
var y : Double
}
class Button {
func onClick()
func onMouseDown()
func onMouseUp()
}
// Declare a Swift function. The body is ignored by SIL.
func taxicabNorm(_ a:Point) -> Double {
return a.x + a.y
}
// Define a SIL function.
// The name @_T5norms11taxicabNormfT1aV5norms5Point_Sd is the mangled name
// of the taxicabNorm Swift function.
sil @_T5norms11taxicabNormfT1aV5norms5Point_Sd : $(Point) -> Double {
bb0(%0 : $Point):
// func Swift.+(Double, Double) -> Double
%1 = function_ref @_Tsoi1pfTSdSd_Sd
%2 = struct_extract %0 : $Point, #Point.x
%3 = struct_extract %0 : $Point, #Point.y
%4 = apply %1(%2, %3) : $(Double, Double) -> Double
return %4 : $Double
}
// Define a SIL vtable. This matches dynamically-dispatched method
// identifiers to their implementations for a known static class type.
sil_vtable Button {
#Button.onClick: @_TC5norms6Button7onClickfS0_FT_T_
#Button.onMouseDown: @_TC5norms6Button11onMouseDownfS0_FT_T_
#Button.onMouseUp: @_TC5norms6Button9onMouseUpfS0_FT_T_
}
SIL Stage
~~~~~~~~~
::
decl ::= sil-stage-decl
sil-stage-decl ::= 'sil_stage' sil-stage
sil-stage ::= 'raw'
sil-stage ::= 'canonical'
There are different invariants on SIL depending on what stage of processing
has been applied to it.
* **Raw SIL** is the form produced by SILGen that has not been run through
guaranteed optimizations or diagnostic passes. Raw SIL may not have a
fully-constructed SSA graph. It may contain dataflow errors. Some instructions
may be represented in non-canonical forms, such as ``assign`` and
``destroy_addr`` for non-address-only values. Raw SIL should not be used
for native code generation or distribution.
* **Canonical SIL** is SIL as it exists after guaranteed optimizations and
diagnostics. Dataflow errors must be eliminated, and certain instructions
must be canonicalized to simpler forms. Performance optimization and native
code generation are derived from this form, and a module can be distributed
containing SIL in this (or later) forms.
SIL files declare the processing stage of the included SIL with one of the
declarations ``sil_stage raw`` or ``sil_stage canonical`` at top level. Only
one such declaration may appear in a file.
SIL Types
~~~~~~~~~
::
sil-type ::= '$' '*'? generic-parameter-list? type
SIL types are introduced with the ``$`` sigil. SIL's type system is
closely related to Swift's, and so the type after the ``$`` is parsed
largely according to Swift's type grammar.
Type Lowering
`````````````
A *formal type* is the type of a value in Swift, such as an expression
result. Swift's formal type system intentionally abstracts over a
large number of representational issues like ownership transfer
conventions and directness of arguments. However, SIL aims to
represent most such implementation details, and so these differences
deserve to be reflected in the SIL type system. *Type lowering* is
the process of turning a formal type into its *lowered type*.
It is important to be aware that the lowered type of a declaration
need not be the lowered type of the formal type of that declaration.
For example, the lowered type of a declaration reference:
- will usually be thin,
- may have a non-Swift calling convention,
- may use bridged types in its interface, and
- may use ownership conventions that differ from Swift's default
conventions.
Abstraction Difference
``````````````````````
Generic functions working with values of unconstrained type must
generally work with them indirectly, e.g. by allocating sufficient
memory for them and then passing around pointers to that memory.
Consider a generic function like this:
::
func generateArray<T>(n : Int, generator : () -> T) -> [T]
The function ``generator`` will be expected to store its result
indirectly into an address passed in an implicit parameter. There's
really just no reasonable alternative when working with a value of
arbitrary type:
- We don't want to generate a different copy of ``generateArray`` for
every type ``T``.
- We don't want to give every type in the language a common
representation.
- We don't want to dynamically construct a call to ``generator``
depending on the type ``T``.
But we also don't want the existence of the generic system to force
inefficiencies on non-generic code. For example, we'd like a function
of type ``() -> Int`` to be able to return its result directly; and
yet, ``() -> Int`` is a valid substitution of ``() -> T``, and a
caller of ``generateArray<Int>`` should be able to pass an arbitrary
``() -> Int`` in as the generator.
Therefore, the representation of a formal type in a generic context
may differ from the representation of a substitution of that formal type.
We call such differences *abstraction differences*.
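The situation can be reproduced in ordinary Swift. In the sketch below,
``generateArray`` is compiled once for all ``T``, while ``counter`` is a
plain ``() -> Int`` closure; passing one to the other is exactly the case in
which the two representations may differ. The function body and the
``counter`` closure are assumptions added for illustration, and any
conversion between representations is left to the compiler.

```swift
// Generic in T: the body cannot know how T is represented, so it calls
// `generator` through the most general convention.
func generateArray<T>(n: Int, generator: () -> T) -> [T] {
    var result: [T] = []
    result.reserveCapacity(n)
    for _ in 0..<n { result.append(generator()) }
    return result
}

// A plain, non-generic closure of type () -> Int.
var next = 0
let counter: () -> Int = {
    next += 1
    return next
}

// The same closure value is accepted where () -> T is expected, with T == Int.
let numbers = generateArray(n: 3, generator: counter)
print(numbers)  // prints "[1, 2, 3]"
```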
SIL's type system is designed to make abstraction differences always
result in differences between SIL types. The goal is that a properly-
abstracted value should be correctly usable at any level of substitution.
In order to achieve this, the formal type of a generic entity should
always be lowered using the abstraction pattern of its unsubstituted
formal type. For example, consider the following generic type:
::
struct Generator<T> {
var fn : () -> T
}
var intGen : Generator<Int>
``intGen.fn`` has the substituted formal type ``() -> Int``, which
would normally lower to the type ``@callee_owned () -> Int``, i.e.
returning its result directly. But if that type is properly lowered
with the pattern of its unsubstituted type ``() -> T``, it becomes
``@callee_owned () -> @out Int``.
When a type is lowered using the abstraction pattern of an
unrestricted type, it is lowered as if the pattern were replaced with
a type sharing the same structure but replacing all materializable
types with fresh type variables.
For example, if ``g`` has type ``Generator<(Int, Int) -> Float>``, ``g.fn`` is
lowered using the pattern ``() -> T``, which eventually causes ``(Int, Int)
-> Float`` to be lowered using the pattern ``T``, which is the same as
lowering it with the pattern ``U -> V``; the result is that ``g.fn``
has the following lowered type::
@callee_owned () -> @owned @callee_owned (@in (Int, Int)) -> @out Float
As another example, suppose that ``h`` has type
``Generator<(Int, inout Int) -> Float>``. Neither ``(Int, inout Int)``
nor ``inout Int`` is a potential result of substitution because neither
is materializable, so ``h.fn`` has the following lowered type::
@callee_owned () -> @owned @callee_owned (@in Int, @inout Int) -> @out Float
This system has the property that abstraction patterns are preserved
through repeated substitutions. That is, you can consider a lowered
type to encode an abstraction pattern; lowering ``T`` by ``R`` is
equivalent to lowering ``T`` by (``S`` lowered by ``R``).
SILGen has procedures for converting values between abstraction
patterns.
At present, only function and tuple types are changed by abstraction
differences.
Legal SIL Types
```````````````
The type of a value in SIL is either:
- an *object type* ``$T``, where ``T`` is a legal loadable type, or
- an *address type* ``$*T``, where ``T`` is a legal SIL type (loadable or
address-only).
A type ``T`` is a *legal SIL type* if:
- it is a function type which satisfies the constraints (below) on
function types in SIL,
- it is a metatype type which describes its representation,
- it is a tuple type whose element types are legal SIL types,
- it is ``Optional<U>``, where ``U`` is a legal SIL type,
- it is a legal Swift type that is not a function, tuple, optional,
metatype, or l-value type, or
- it is a ``@box`` containing a legal SIL type.
Note that types in other recursive positions in the type grammar are
still formal types. For example, the instance type of a metatype or
the type arguments of a generic type are still formal Swift types, not
lowered SIL types.
Address Types
`````````````
The *address of T* ``$*T`` is a pointer to memory containing a value
of any reference or value type ``$T``. This can be an internal
pointer into a data structure. Addresses of loadable types can be
loaded and stored to access values of those types.
Addresses of address-only types (see below) can only be used with
instructions that manipulate their operands indirectly by address, such
as ``copy_addr`` or ``destroy_addr``, or as arguments to functions.
It is illegal to have a value of type ``$T`` if ``T`` is address-only.
Addresses are not reference-counted pointers like class values are. They
cannot be retained or released.
Address types are not *first-class*: they cannot appear in recursive
positions in type expressions. For example, the type ``$**T`` is not
a legal type.
The address of an address cannot be directly taken. Values of address
type thus cannot be allocated, loaded, or stored
(though addresses can of course be loaded from and stored to).
Addresses can be passed as arguments to functions if the corresponding
parameter is indirect. They cannot be returned.
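At the source level, the closest everyday analogue of passing an address to
an indirect parameter is an ``inout`` argument. The sketch below is only that
analogy: the ``Point`` type and ``scale`` function are hypothetical, and the
exact SIL emitted for them is not shown here.

```swift
struct Point {
    var x: Double
    var y: Double
}

// `inout` lets the callee mutate the caller's storage in place; the callee
// receives access to the caller's variable rather than a copy of its value.
func scale(_ point: inout Point, by factor: Double) {
    point.x *= factor
    point.y *= factor
}

var p = Point(x: 1.0, y: 2.0)
scale(&p, by: 3.0)   // `&p` marks that the variable itself is being passed
print(p.x, p.y)      // prints "3.0 6.0"
```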
Box Types
`````````
Captured local variables and the payloads of ``indirect`` value types are stored
on the heap. The type ``@box T`` is a reference-counted type that references
a box containing a mutable value of type ``T``. Boxes always use Swift-native
reference counting, so they can be queried for uniqueness and cast to the
``Builtin.NativeObject`` type.
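Both situations mentioned above are easy to produce in Swift source. In this
sketch, ``count`` is captured by an escaping closure and so must outlive the
call that created it, and ``List`` uses ``indirect`` so that its payload is
stored behind a reference. The names are hypothetical, and whether a given
box survives optimization depends on the passes described earlier.

```swift
// `count` outlives the call to makeCounter because the returned closure
// captures it, so it cannot live in the function's stack frame.
func makeCounter() -> () -> Int {
    var count = 0
    return {
        count += 1
        return count
    }
}

// `indirect` stores case payloads behind a reference, which is what makes
// the recursive definition possible.
indirect enum List {
    case empty
    case node(Int, List)
}

func length(_ list: List) -> Int {
    switch list {
    case .empty:
        return 0
    case .node(_, let rest):
        return 1 + length(rest)
    }
}

let counter = makeCounter()
_ = counter()
print(counter())                           // prints "2"
print(length(.node(1, .node(2, .empty))))  // prints "2"
```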
Metatype Types
``````````````
A concrete or existential metatype in SIL must describe its representation.
This can be:
- ``@thin``, meaning that it requires no storage and thus necessarily
represents an exact type (only allowed for concrete metatypes);
- ``@thick``, meaning that it stores a reference to a type or (if a
concrete class) a subclass of that type; or
- ``@objc``, meaning that it stores a reference to a class type (or a
subclass thereof) using an Objective-C class object representation
rather than the native Swift type-object representation.
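The difference between an exact metatype and one that may refer to a
subclass shows up in ordinary Swift code. In the sketch below, a value of
type ``Shape.Type`` can hold ``Circle.self``, so it has to carry dynamic type
information, whereas ``Pixel.Type`` can only ever denote ``Pixel``. The types
are hypothetical, and which representation the compiler actually picks for
each metatype is an assumption based on the descriptions above.

```swift
class Shape {
    required init() {}
    class var name: String { "shape" }
}

final class Circle: Shape {
    override class var name: String { "circle" }
}

struct Pixel {}

// A class metatype value can refer to the class or any subclass, so it has
// to carry a reference to the dynamic type.
let shapeType: Shape.Type = Circle.self
print(shapeType.name)                     // prints "circle"

// Construction through the metatype dispatches on the dynamic type.
let instance = shapeType.init()
print(type(of: instance) == Circle.self)  // prints "true"

// A struct metatype can only ever denote exactly one type.
let pixelType: Pixel.Type = Pixel.self
print(pixelType == Pixel.self)            // prints "true"
```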
Function Types
``````````````
Function types in SIL are different from function types in Swift in a
number of ways:
- A SIL function type may be generic. For example, accessing a
generic function with ``function_ref`` will give a value of
generic function type.
- A SIL function type may be declared ``@noescape``. This is required for any
function type passed to a parameter not declared with ``@escaping``
declaration modifier. ``@noescape`` function types may be either
``@convention(thin)`` or ``@callee_guaranteed``. They have an
unowned context--the context's lifetime must be independently guaranteed.
- A SIL function type declares its conventional treatment of its
context value:
- If it is ``@convention(thin)``, the function requires no context value.
Such types may also be declared ``@noescape``, which trivially has no effect
on passing the context value.
- If it is ``@callee_guaranteed``, the context value is treated as a direct
parameter. This implies ``@convention(thick)``. If the function type is also
``@noescape``, then the context value is unowned, otherwise it is
guaranteed.
- If it is ``@callee_owned``, the context value is treated as an owned direct
parameter. This implies ``@convention(thick)`` and is mutually exclusive
with ``@noescape``.
- If it is ``@convention(block)``, the context value is treated as an unowned
direct parameter.
- Other function type conventions are described in ``Properties of Types`` and
``Calling Convention``.
- A SIL function type declares the conventions for its parameters.
The parameters are written as an unlabeled tuple; the elements of that
tuple must be legal SIL types, optionally decorated with one of the
following convention attributes.
The value of an indirect parameter has type ``*T``; the value of a
direct parameter has type ``T``.
- An ``@in`` parameter is indirect. The address must be of an
initialized object; the function is responsible for destroying
the value held there.
- An ``@inout`` parameter is indirect. The address must be of an
initialized object. The memory must remain initialized for the duration
of the call until the function returns. The function may mutate the
pointee, and furthermore may weakly assume that there are no aliasing
reads from or writes to the argument, though must preserve a valid
value at the argument so that well-ordered aliasing violations do not
compromise memory safety. This allows for optimizations such as local
load and store propagation, introduction or elimination of temporary
copies, and promotion of the ``@inout`` parameter to an ``@owned`` direct
parameter and result pair, but does not admit "take" optimization out
of the parameter or other optimization that would leave memory in an
uninitialized state.
- An ``@inout_aliasable`` parameter is indirect. The address must be of an
initialized object. The memory must remain initialized for the duration
of the call until the function returns. The function may mutate the
pointee, and must assume that other aliases may mutate it as well. These
aliases however can be assumed to be well-typed and well-ordered; ill-typed
accesses and data races to the parameter are still undefined.
- An ``@owned`` parameter is an owned direct parameter.
- A ``@guaranteed`` parameter is a guaranteed direct parameter.
- An ``@in_guaranteed`` parameter is indirect. The address must be of an
initialized object; both the caller and callee promise not to mutate the
pointee, allowing the callee to read it.
- An ``@in_constant`` parameter is indirect. The address must be of an
initialized object; the function will treat the value held there as read-only.
- Otherwise, the parameter is an unowned direct parameter.
- A SIL function type declares the conventions for its results.
The results are written as an unlabeled tuple; the elements of that
tuple must be legal SIL types, optionally decorated with one of the
following convention attributes. Indirect and direct results may
be interleaved.
Indirect results correspond to implicit arguments of type ``*T`` in
function entry blocks and in the arguments to ``apply`` and ``try_apply``
instructions. These arguments appear in the order in which they appear
in the result list, always before any parameters.
Direct results correspond to direct return values of type ``T``. A
SIL function type has a ``return type`` derived from its direct results
in the following way: when there is a single direct result, the return
type is the type of that result; otherwise, it is the tuple type of the
types of all the direct results, in the order they appear in the results
list. The return type is the type of the operand of ``return``
instructions, the type of ``apply`` instructions, and the type of
the normal result of ``try_apply`` instructions.
- An ``@out`` result is indirect. The address must be of an
uninitialized object. The function is required to leave an
initialized value there unless it terminates with a ``throw``
instruction or it has a non-Swift calling convention.
- An ``@owned`` result is an owned direct result.
- An ``@autoreleased`` result is an autoreleased direct result.
If there is an autoreleased result, it must be the only direct result.
- Otherwise, the result is an unowned direct result.
A direct parameter or result of trivial type must always be unowned.
An owned direct parameter or result is transferred to the recipient,
which becomes responsible for destroying the value. This means that
the value is passed at +1.
An unowned direct parameter or result is instantaneously valid at the
point of transfer. The recipient does not need to worry about race
conditions immediately destroying the value, but should copy it
(e.g. by ``strong_retain``\ ing an object pointer) if the value will be
needed beyond that point.
A guaranteed direct parameter is like an unowned direct parameter
value, except that it is guaranteed by the caller to remain valid
throughout the execution of the call. This means that any
``strong_retain``, ``strong_release`` pairs in the callee on the
argument can be eliminated.
An autoreleased direct result must have a type with a retainable
pointer representation. Autoreleased results are nominally transferred
at +0, but the runtime takes steps to ensure that a +1 can be safely
transferred, and those steps require precise code-layout control.
Accordingly, the SIL pattern for an autoreleased convention looks exactly
like the SIL pattern for an owned convention, and the extra runtime
instrumentation is inserted on both sides when the SIL is lowered into
LLVM IR. An autoreleased ``apply`` of a function that is defined with
an autoreleased result has the effect of a +1 transfer of the result.
An autoreleased ``apply`` of a function that is not defined with
an autoreleased result has the effect of performing a strong retain in
the caller. A non-autoreleased ``apply`` of a function that is defined
with an autoreleased result has the effect of performing an
autorelease in the callee.
- SIL function types may provide an optional error result, written by
placing ``@error`` on a result. An error result is always
implicitly ``@owned``. Only functions with a native calling
convention may have an error result.
A function with an error result cannot be called with ``apply``.
It must be called with ``try_apply``.
There is one exception to this rule: a function with an error result can be
called with ``apply [nothrow]`` if the compiler can prove that the function
does not actually throw.
``return`` produces a normal result of the function. To return
an error result, use ``throw``.
Type lowering lowers the ``throws`` annotation on formal function
types into more concrete error propagation:
- For native Swift functions, ``throws`` is turned into an error
result.
- For non-native Swift functions, ``throws`` is turned into an
explicit error-handling mechanism based on the imported API. The
importer only imports non-native methods and types as ``throws``
when it is possible to do this automatically.
- SIL function types may provide a pattern signature and substitutions
to express that values of the type use a particular generic abstraction
pattern. Both must be provided together. If a pattern signature is
present, the component types (parameters, yields, and results) must be
expressed in terms of the generic parameters of that signature.
The pattern substitutions should be expressed in terms of the generic
parameters of the overall generic signature, if any, or else
the enclosing generic context, if any.
A pattern signature follows the ``@substituted`` attribute, which
must be the final attribute preceding the function type. Pattern
substitutions follow the function type, preceded by the ``for``
keyword. For example::
@substituted <T: Collection> (@in T) -> @out T.Element for Array<Int>
The low-level representation of a value of this type may not match
the representation of a value of the substituted-through version of it::
(@in Array<Int>) -> @out Int
Substitution differences at the outermost level of a function value
may be adjusted using the ``convert_function`` instruction. Note that
this only works at the outermost level and not in nested positions.
For example, a function which takes a parameter of the first type above
cannot be converted by ``convert_function`` to a function which takes
a parameter of the second type; such a conversion must be done with a
thunk.
Type substitution on a function type with a pattern signature and
substitutions only substitutes into the substitutions; the component
types are preserved with their exact original structure.
- In the implementation, a SIL function type may also carry substitutions
for its generic signature. This is a convenience for working with
applied generic types and is not generally a formal part of the SIL
language; in particular, values should not have such types. Such a
type behaves like a non-generic type, as if the substitutions were
actually applied to the underlying function type.
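To relate the error-result convention from the list above back to source
code, the following sketch shows a native Swift ``throws`` function together
with a caller that handles both outcomes, which loosely mirrors the normal
and error successors of ``try_apply``. ``ParseError``, ``parsePositive``, and
``describe`` are hypothetical names, and the SIL actually produced for them
is not shown.

```swift
struct ParseError: Error {
    let input: String
}

// A native Swift `throws` function: in SIL terms it has a normal result and
// an error result.
func parsePositive(_ text: String) throws -> Int {
    guard let value = Int(text), value > 0 else {
        throw ParseError(input: text)  // error path
    }
    return value                       // normal path
}

// The caller must handle both outcomes, much as `try_apply` has a normal
// and an error successor.
func describe(_ text: String) -> String {
    do {
        let value = try parsePositive(text)
        return "ok: \(value)"
    } catch let error as ParseError {
        return "failed: \(error.input)"
    } catch {
        return "failed: unexpected error"
    }
}

print(describe("12"))  // prints "ok: 12"
print(describe("-4"))  // prints "failed: -4"
```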
Async Functions
```````````````
SIL function types may be ``@async``. ``@async`` functions run inside async
tasks, and can have explicit *suspend points* where they suspend execution.
``@async`` functions can only be called from other ``@async`` functions, but
otherwise can be invoked with the normal ``apply`` and ``try_apply``
instructions (or ``begin_apply`` if they are coroutines).
In Swift, the ``withUnsafeContinuation`` primitive is used to implement
primitive suspend points. In SIL, ``@async`` functions represent this
abstraction using the ``get_async_continuation[_addr]`` and
``await_async_continuation`` instructions. ``get_async_continuation[_addr]``
accesses a *continuation* value that can be used to resume the coroutine after
it suspends. The resulting continuation value can then be passed into a
completion handler, registered with an event loop, or scheduled by some other
mechanism. Operations on the continuation can resume the async function's
execution by passing a value back to the async function, or passing in an error
that propagates as an error in the async function's context.
The ``await_async_continuation`` instruction suspends execution of
the coroutine until the continuation is invoked to resume it. A use of
``withUnsafeContinuation`` in Swift::
func waitForCallback() async -> Int {
return await withUnsafeContinuation { cc in
registerCallback { cc.resume(returning: $0) }
}
}
might lower to the following SIL::
sil @waitForCallback : $@convention(thin) @async () -> Int {
entry:
%cc = get_async_continuation $Int
%closure = function_ref @waitForCallback_closure
: $@convention(thin) (UnsafeContinuation<Int>) -> ()
apply %closure(%cc)
await_async_continuation %cc, resume resume_cc
resume_cc(%result : $Int):
return %result
}
The closure may then be inlined into the ``waitForCallback`` function::
sil @waitForCallback : $@convention(thin) @async () -> Int {
entry:
%cc = get_async_continuation $Int
%registerCallback = function_ref @registerCallback
: $@convention(thin) (@convention(thick) () -> ()) -> ()
%callback_fn = function_ref @waitForCallback_callback
%callback = partial_apply %callback_fn(%cc)
apply %registerCallback(%callback)
await_async_continuation %cc, resume resume_cc
resume_cc(%result : $Int):
return %result
}
Every continuation value must be used exactly once to resume its associated
async coroutine. It is undefined behavior to attempt to resume the same
continuation more than once. On the flip side, failing to resume a continuation
will leave the async task stuck in the suspended state, leaking any memory or
other resources it owns.
Coroutine Types
```````````````
A coroutine is a function which can suspend itself and return control to
its caller without terminating the function. That is, it does not need to
obey a strict stack discipline. SIL coroutines have control flow that is
tightly integrated with their callers, and they pass information back and forth
between caller and callee in a structured way through yield points.
*Generalized accessors* and *generators* in Swift fit this description: a
``read`` or ``modify`` accessor coroutine projects a single value, yields
ownership of that one value temporarily to the caller, and then takes ownership
back when resumed, allowing the coroutine to clean up resources or otherwise
react to mutations done by the caller. *Generators* similarly yield a stream of
values one at a time to their caller, temporarily yielding ownership of each
value in turn to the caller. The tight coupling of the caller's control flow
with these coroutines allows the caller to *borrow* values produced by the
coroutine, where a normal function return would need to transfer ownership of
its return value, since a normal function's context ceases to exist after it
returns and so cannot maintain ownership of the value.
To support these concepts, SIL supports two kinds of coroutine:
``@yield_many`` and ``@yield_once``. Either of these attributes may be
written before a function type to indicate that it is a coroutine type.
``@yield_many`` and ``@yield_once`` coroutines are allowed to also be
``@async``. (Note that ``@async`` functions are not themselves modeled
explicitly as coroutines in SIL, although the implementation may use a coroutine
lowering strategy.)
A coroutine type may declare any number of *yielded values*, which is to
say, values which are provided to the caller at a yield point. Yielded
values are written in the result list of a function type, prefixed by
the ``@yields`` attribute. A yielded value may have a convention attribute,
taken from the set of parameter attributes and interpreted as if the yield
site were calling back to the calling function.
Currently, a coroutine may not have normal results.
Coroutine functions may be used in many of the same ways as normal
function values. However, they cannot be called with the standard
``apply`` or ``try_apply`` instructions. A non-throwing yield-once
coroutine can be called with the ``begin_apply`` instruction. There
is no support yet for calling a throwing yield-once coroutine or for
calling a yield-many coroutine of any kind.
Coroutines may contain the special ``yield`` and ``unwind``
instructions.
A ``@yield_many`` coroutine may yield as many times as it desires.
A ``@yield_once`` coroutine may yield exactly once before returning,
although it may also ``throw`` before reaching that point.
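A yield-once accessor can be sketched in Swift source with the underscored
``_modify`` accessor. Note the assumptions: ``_modify`` is an unofficial,
underscore-prefixed feature whose spelling may change, and the ``Tracked``
type is hypothetical. The accessor yields access to ``value`` exactly once,
the caller mutates it in place, and the code after ``yield`` runs when the
caller resumes the coroutine.

```swift
struct Tracked {
    private var value = 0
    private(set) var modifications = 0

    var current: Int {
        get { value }
        _modify {
            yield &value        // hand temporary access to the caller
            modifications += 1  // runs after the caller resumes the accessor
        }
    }
}

var t = Tracked()
t.current += 5   // mutates `value` in place through the yielded access
t.current *= 2
print(t.current, t.modifications)  // prints "10 2"
```

The bookkeeping after ``yield`` corresponds to the clean-up that the text
describes as happening once the coroutine takes ownership back.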
Properties of Types
```````````````````
SIL classifies types into additional subgroups based on ABI stability and
generic constraints:
- *Loadable types* are types with a fully exposed concrete representation:
* Reference types
* Builtin value types
* Fragile struct types in which all element types are loadable
* Tuple types in which all element types are loadable
* Class protocol types
* Archetypes constrained by a class protocol
Values of loadable types are loaded and stored by loading and storing
individual components of their representation. As a consequence:
* values of loadable types can be loaded into SIL SSA values and stored
from SSA values into memory without running any user-written code,
although compiler-generated reference counting operations can happen.
* values of loadable types can be take-initialized (moved between
memory locations) with a bitwise copy.
A *loadable aggregate type* is a tuple or struct type that is loadable.
A *trivial type* is a loadable type with trivial value semantics.
Values of trivial type can be loaded and stored without any retain or
release operations and do not need to be destroyed.
- *Runtime-sized types* are restricted value types for which the compiler
does not know the size of the type statically:
* Resilient value types
* Fragile struct or tuple types that contain resilient types as elements at
any depth
* Archetypes not constrained by a class protocol
- *Address-only types* are restricted value types which cannot be
loaded or otherwise worked with as SSA values:
* Runtime-sized types
* Non-class protocol types
* ``@weak`` types
* Types that can't satisfy the requirements for being loadable because they
care about the exact location of their value in memory and need to run some
user-written code when they are copied or moved. Most commonly, types "care"
about the addresses of values because addresses of values are registered in
some global data structure, or because values may contain pointers into
themselves. For example:
* Addresses of values of Swift ``@weak`` types are registered in a global
table. That table needs to be adjusted when a ``@weak`` value is copied
or moved to a new address.
* A non-COW collection type with a heap allocation (like ``std::vector`` in
C++) needs to allocate memory and copy the collection elements when the
collection is copied.
* A non-COW string type that implements a small string optimization (like
many implementations of ``std::string`` in C++) can contain a pointer
into the value itself. That pointer needs to be recomputed when the
string is copied or moved.
Values of address-only type ("address-only values") must reside in
memory and can only be referenced in SIL by address. Addresses of
address-only values cannot be loaded from or stored to. SIL provides
special instructions for indirectly manipulating address-only
values, such as ``copy_addr`` and ``destroy_addr``.
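
The practical difference between loadable and address-only types can be
sketched as follows. Both function names are hypothetical; the first copies a
loadable value through an SSA value, while the second copies an unconstrained
archetype, which is address-only, entirely in memory::

  // Loadable: the value is loaded into an SSA value and stored back.
  sil @copy_loadable : $@convention(thin) (@inout Builtin.Int64, @inout Builtin.Int64) -> () {
  bb0(%dst : $*Builtin.Int64, %src : $*Builtin.Int64):
    %0 = load %src : $*Builtin.Int64
    store %0 to %dst : $*Builtin.Int64
    %2 = tuple ()
    return %2 : $()
  }

  // Address-only: the value is only ever referenced by address.
  sil @copy_address_only : $@convention(thin) <T> (@in_guaranteed T) -> @out T {
  bb0(%dst : $*T, %src : $*T):
    copy_addr %src to [initialization] %dst : $*T
    %1 = tuple ()
    return %1 : $()
  }
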
Some additional meaningful categories of type:
- A *heap object reference* type is a type whose representation consists of a
single strong-reference-counted pointer. This includes all class types,
the ``Builtin.NativeObject`` and ``AnyObject`` types, and
archetypes that conform to one or more class protocols.
- A *reference type* is more general in that its low-level representation may
include additional global pointers alongside a strong-reference-counted
pointer. This includes all heap object reference types and adds
thick function types and protocol/protocol composition types that conform to
one or more class protocols. All reference types can be ``retain``-ed and
``release``-d. Reference types also have *ownership semantics* for their
referenced heap object; see `Reference Counting`_ below.
- A type with *retainable pointer representation* is guaranteed to
be compatible (in the C sense) with the Objective-C ``id`` type.
The value at runtime may be ``nil``. This includes classes,
class metatypes, block functions, and class-bounded existentials with
only Objective-C-compatible protocol constraints, as well as one
level of ``Optional`` or ``ImplicitlyUnwrappedOptional`` applied to any of the
above. Types with retainable pointer representation can be returned
via the ``@autoreleased`` return convention.
SILGen does not always map Swift function types one-to-one to SIL function
types. Function types are transformed in order to encode additional attributes:
- The **convention** of the function, indicated by the
.. parsed-literal::
@convention(*convention*)
attribute. This is similar to the language-level ``@convention``
attribute, though SIL extends the set of supported conventions with
additional distinctions not exposed at the language level:
- ``@convention(thin)`` indicates a "thin" function reference, which uses
the Swift calling convention with no special "self" or "context" parameters.
- ``@convention(thick)`` indicates a "thick" function reference, which
uses the Swift calling convention and carries a reference-counted context
object used to represent captures or other state required by the function.
This attribute is implied by ``@callee_owned`` or ``@callee_guaranteed``.
- ``@convention(block)`` indicates an Objective-C compatible block reference.
The function value is represented as a reference to the block object,
which is an ``id``-compatible Objective-C object that embeds its invocation
function within the object. The invocation function uses the C calling
convention.
- ``@convention(c)`` indicates a C function reference. The function value
carries no context and uses the C calling convention.
- ``@convention(objc_method)`` indicates an Objective-C method implementation.
The function uses the C calling convention, with the SIL-level ``self``
parameter (by SIL convention mapped to the final formal parameter)
mapped to the ``self`` and ``_cmd`` arguments of the implementation.
- ``@convention(method)`` indicates a Swift instance method implementation.
The function uses the Swift calling convention, using the special ``self``
parameter.
- ``@convention(witness_method)`` indicates a Swift protocol method
implementation. The function's polymorphic convention is emitted in such
a way as to guarantee that it is polymorphic across all possible
implementors of the protocol.
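
As a small illustration of the thin and thick conventions, the fragment below
references a hypothetical function ``@add_one`` as a thin function and then
converts that reference to a thick one. It is a sketch of instructions inside
a function body, not a complete function::

  // A thin reference: a bare function pointer with no context.
  %0 = function_ref @add_one : $@convention(thin) (Builtin.Int64) -> Builtin.Int64
  // A thick reference: the same function paired with a (here empty) context.
  %1 = thin_to_thick_function %0 : $@convention(thin) (Builtin.Int64) -> Builtin.Int64 to $@callee_guaranteed (Builtin.Int64) -> Builtin.Int64
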
Layout Compatible Types
```````````````````````
(This section applies only to Swift 1.0 and will hopefully be obviated in
future releases.)
SIL tries to be ignorant of the details of type layout, and low-level
bit-banging operations such as pointer casts are generally undefined. However,
as a concession to implementation convenience, some types are allowed to be
considered **layout compatible**. Type ``T`` is *layout compatible* with type
``U`` iff:
- an address of type ``$*U`` can be cast by
``address_to_pointer``/``pointer_to_address`` to ``$*T`` and a valid value
of type ``T`` can be loaded out (or indirectly used, if ``T`` is
address-only),
- if ``T`` is a nontrivial type, then ``retain_value``/``release_value`` of
the loaded ``T`` value is equivalent to ``retain_value``/``release_value`` of
the original ``U`` value.
This is not always a commutative relationship; ``T`` can be layout-compatible
with ``U`` whereas ``U`` is not layout-compatible with ``T``. If the layout
compatible relationship does extend both ways, ``T`` and ``U`` are
**commutatively layout compatible**. It is however always transitive; if ``T``
is layout-compatible with ``U`` and ``U`` is layout-compatible with ``V``, then
``T`` is layout-compatible with ``V``. All types are layout-compatible with
themselves.
The following types are considered layout-compatible:
- ``Builtin.RawPointer`` is commutatively layout compatible with all heap
object reference types, and ``Optional`` of heap object reference types.
(Note that ``RawPointer`` is a trivial type, so does not have ownership
semantics.)
- ``Builtin.RawPointer`` is commutatively layout compatible with
``Builtin.Word``.
- Structs containing a single stored property are commutatively layout
compatible with the type of that property.
- A heap object reference is commutatively layout compatible with any type
that can correctly reference the heap object. For instance, given a class
``B`` and a derived class ``D`` inheriting from ``B``, a value of
type ``B`` referencing an instance of type ``D`` is layout compatible with
both ``B`` and ``D``, as well as ``Builtin.NativeObject`` and
``AnyObject``. It is not layout compatible with an unrelated class
type ``E``.
- For payloaded enums, the payload type of the first payloaded case is
layout-compatible with the enum (*not* commutatively).
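
The single-stored-property rule can be sketched as follows. The example
assumes that ``Int`` is a fragile struct whose only stored property has type
``Builtin.Int64``; the function name ``@read_wrapped`` is hypothetical::

  sil @read_wrapped : $@convention(thin) (@inout Int) -> Builtin.Int64 {
  bb0(%0 : $*Int):
    // Reinterpret the address of the struct as the address of its only field.
    %1 = address_to_pointer %0 : $*Int to $Builtin.RawPointer
    %2 = pointer_to_address %1 : $Builtin.RawPointer to [strict] $*Builtin.Int64
    %3 = load %2 : $*Builtin.Int64
    return %3 : $Builtin.Int64
  }
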
Values and Operands
~~~~~~~~~~~~~~~~~~~
::
sil-identifier ::= [A-Za-z_0-9]+
sil-value-name ::= '%' sil-identifier
sil-value ::= sil-value-name
sil-value ::= 'undef'
sil-operand ::= sil-value ':' sil-type
SIL values are introduced with the ``%`` sigil and named by an
alphanumeric identifier, which references the instruction or basic block
argument that produces the value. SIL values may also refer to the keyword
``undef``, which is a value of undefined contents.
Unlike LLVM IR, SIL instructions that take value operands *only* accept
value operands. References to literal constants, functions, global variables, or
other entities require specialized instructions such as ``integer_literal``,
``function_ref``, ``global_addr``, etc.
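
For example, calling a function with a constant argument takes three
instructions, because neither the literal nor the function can appear
directly as an operand. The callee ``@negate`` is a hypothetical name::

  %0 = integer_literal $Builtin.Int64, 42
  %1 = function_ref @negate : $@convention(thin) (Builtin.Int64) -> Builtin.Int64
  %2 = apply %1(%0) : $@convention(thin) (Builtin.Int64) -> Builtin.Int64
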
Functions
~~~~~~~~~
::
decl ::= sil-function
sil-function ::= 'sil' sil-linkage? sil-function-attribute+
sil-function-name ':' sil-type
'{' sil-basic-block+ '}'
sil-function-name ::= '@' [A-Za-z_0-9]+
SIL functions are defined with the ``sil`` keyword. SIL function names
are introduced with the ``@`` sigil and named by an alphanumeric
identifier. This name will become the LLVM IR name for the function,
and is usually the mangled name of the originating Swift declaration.
The ``sil`` syntax declares the function's name and SIL type, and
defines the body of the function inside braces. The declared type must
be a function type, which may be generic.
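
A minimal sketch of a generic function definition, using the hypothetical
name ``@identity``. The generic signature is part of the declared function
type, and the body moves the argument into the indirect result::

  sil hidden @identity : $@convention(thin) <T> (@in T) -> @out T {
  bb0(%result : $*T, %value : $*T):
    copy_addr [take] %value to [initialization] %result : $*T
    %2 = tuple ()
    return %2 : $()
  }
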
Function Attributes
```````````````````
::
sil-function-attribute ::= '[canonical]'
The function is in canonical SIL even if the module is still in raw SIL.
::
sil-function-attribute ::= '[ossa]'
The function is in OSSA (ownership SSA) form.
::
sil-function-attribute ::= '[transparent]'
Transparent functions are always inlined and don't keep their source
information when inlined.
::
sil-function-attribute ::= '[' sil-function-thunk ']'
sil-function-thunk ::= 'thunk'
sil-function-thunk ::= 'signature_optimized_thunk'
sil-function-thunk ::= 'reabstraction_thunk'
The function is a compiler generated thunk.
::
sil-function-attribute ::= '[dynamically_replacable]'
The function can be replaced at runtime with a different implementation.
Optimizations must not assume anything about such a function, even if the SIL
of the function body is available.
::
sil-function-attribute ::= '[dynamic_replacement_for' identifier ']'
sil-function-attribute ::= '[objc_replacement_for' identifier ']'
Specifies for which function this function is a replacement.
::
sil-function-attribute ::= '[exact_self_class]'
The function is a designated initializer, where it is known that the static
type being allocated is the type of the class that defines the designated
initializer.
::
sil-function-attribute ::= '[without_actually_escaping]'
The function is a thunk for closures which are not actually escaping.
::
sil-function-attribute ::= '[' sil-function-purpose ']'
sil-function-purpose ::= 'global_init'
The implied semantics are:
- side-effects can occur any time before the first invocation.
- all calls to the same ``global_init`` function have the same side-effects.
- any operation that may observe the initializer's side-effects must be
preceded by a call to the initializer.
This is currently true if the function is an addressor that was lazily
generated from a global variable access. Note that the initialization
function itself does not need this attribute. It is private and only
called within the addressor.
::
sil-function-purpose ::= 'lazy_getter'
The function is a getter of a lazy property for which the backing storage is
an ``Optional`` of the property's type. The getter contains a top-level
`switch_enum`_ (or `switch_enum_addr`_), which tests if the lazy property
is already computed. In the ``None``-case, the property is computed and stored
to the backing storage of the property.
After the first call of a lazy property getter, it is guaranteed that the
property is computed and consecutive calls always execute the ``Some``-case of
the top-level `switch_enum`_.
::
sil-function-attribute ::= '[weak_imported]'
Cross-module references to this function should always use weak linking.
::
sil-function-attribute ::= '[available' sil-version-tuple ']'
sil-version-tuple ::= [0-9]+ ('.' [0-9]+)*
The minimum OS version on which the function is available.
::
sil-function-attribute ::= '[' sil-function-inlining ']'
sil-function-inlining ::= 'never'
The function is never inlined.
::
sil-function-inlining ::= 'always'
The function is always inlined, even in an ``Onone`` build.
::
sil-function-attribute ::= '[' sil-function-optimization ']'
sil-function-optimization ::= 'Onone'
sil-function-optimization ::= 'Ospeed'
sil-function-optimization ::= 'Osize'
The function is optimized according to this attribute, overriding the setting
from the command line.
::
sil-function-attribute ::= '[' sil-function-effects ']'
sil-function-effects ::= 'readonly'
sil-function-effects ::= 'readnone'
sil-function-effects ::= 'readwrite'
sil-function-effects ::= 'releasenone'
The specified memory effects of the function.
::
sil-function-attribute ::= '[_semantics "' [A-Za-z._0-9]+ '"]'
The specified high-level semantics of the function. The optimizer can use this
information to perform high-level optimizations before such functions are
inlined. For example, ``Array`` operations are annotated with semantic
attributes to let the optimizer perform redundant bounds check elimination and
similar optimizations.
::
sil-function-attribute ::= '[_specialize "' [A-Za-z._0-9]+ '"]'
Specifies for which types specialized code should be generated.
::
sil-function-attribute ::= '[clang "' identifier '"]'
The clang node owner.
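
Attributes are written in brackets between the linkage and the function name,
and several may be combined. The sketch below uses hypothetical function names
and a made-up semantics string purely for illustration::

  // Always inlined, and declared to have no memory effects.
  sil [transparent] [readnone] @is_enabled : $@convention(thin) () -> Builtin.Int1 {
  bb0:
    %0 = integer_literal $Builtin.Int1, -1
    return %0 : $Builtin.Int1
  }

  // A declaration tagged with a (hypothetical) semantics string.
  sil [_semantics "example.hypothetical"] @tagged : $@convention(thin) () -> ()
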
Basic Blocks
~~~~~~~~~~~~
::
sil-basic-block ::= sil-label sil-instruction-def* sil-terminator
sil-label ::= sil-identifier ('(' sil-argument (',' sil-argument)* ')')? ':'
sil-value-ownership-kind ::= @owned
sil-value-ownership-kind ::= @guaranteed
sil-value-ownership-kind ::= @unowned
sil-argument ::= sil-value-name ':' sil-value-ownership-kind? sil-type
sil-instruction-result ::= sil-value-name
sil-instruction-result ::= '(' (sil-value-name (',' sil-value-name)*)? ')'
sil-instruction-source-info ::= (',' sil-scope-ref)? (',' sil-loc)?
sil-instruction-def ::=
(sil-instruction-result '=')? sil-instruction sil-instruction-source-info
A function body consists of one or more basic blocks that correspond
to the nodes of the function's control flow graph. Each basic block
contains one or more instructions and ends with a terminator
instruction. The function's entry point is always the first basic
block in its body.
In SIL, basic blocks take arguments, which are used as an alternative to LLVM's
phi nodes. Basic block arguments are bound by the branch from the predecessor
block::
sil @iif : $(Builtin.Int1, Builtin.Int64, Builtin.Int64) -> Builtin.Int64 {
bb0(%cond : $Builtin.Int1, %ifTrue : $Builtin.Int64, %ifFalse : $Builtin.Int64):
cond_br %cond : $Builtin.Int1, then, else
then:
br finish(%ifTrue : $Builtin.Int64)
else:
br finish(%ifFalse : $Builtin.Int64)
finish(%result : $Builtin.Int64):
return %result : $Builtin.Int64
}
Arguments to the entry point basic block, which has no predecessor,
are bound by the function's caller::
sil @foo : $@convention(thin) (Int) -> Int {
bb0(%x : $Int):
return %x : $Int
}
sil @bar : $@convention(thin) (Int, Int) -> () {
bb0(%x : $Int, %y : $Int):
%foo = function_ref @foo : $@convention(thin) (Int) -> Int
%1 = apply %foo(%x) : $@convention(thin) (Int) -> Int
%2 = apply %foo(%y) : $@convention(thin) (Int) -> Int
%3 = tuple ()
return %3 : $()
}
When a function is in Ownership SSA, arguments additionally carry an explicit
ownership annotation that describes the ownership semantics of the argument
value::
sil [ossa] @baz : $@convention(thin) (Int, @owned String, @guaranteed String, @unowned String) -> () {
bb0(%x : $Int, %y : @owned $String, %z : @guaranteed $String, %w : @unowned $String):
...
}
Note that the first argument (``%x``) has an implicit ownership kind of
``@none`` since all trivial values have ``@none`` ownership.
Debug Information
~~~~~~~~~~~~~~~~~
::
sil-scope-ref ::= 'scope' [0-9]+
sil-scope ::= 'sil_scope' [0-9]+ '{'
sil-loc
'parent' scope-parent
('inlined_at' sil-scope-ref)?
'}'
scope-parent ::= sil-function-name ':' sil-type
scope-parent ::= sil-scope-ref
sil-loc ::= 'loc' string-literal ':' [0-9]+ ':' [0-9]+
Each instruction may have a debug location and a SIL scope reference
at the end. Debug locations consist of a filename, a line number, and
a column number. If the debug location is omitted, it defaults to the
location in the SIL source file. A SIL scope describes the position within the
lexical scope structure of the Swift expression from which a SIL instruction
was originally generated. SIL scopes also hold inlining information.
Declaration References
~~~~~~~~~~~~~~~~~~~~~~
::
sil-decl-ref ::= '#' sil-identifier ('.' sil-identifier)* sil-decl-subref?
sil-decl-subref ::= '!' sil-decl-subref-part ('.' sil-decl-lang)? ('.' sil-decl-autodiff)?
sil-decl-subref ::= '!' sil-decl-lang
sil-decl-subref ::= '!' sil-decl-autodiff
sil-decl-subref-part ::= 'getter'
sil-decl-subref-part ::= 'setter'
sil-decl-subref-part ::= 'allocator'
sil-decl-subref-part ::= 'initializer'
sil-decl-subref-part ::= 'enumelt'
sil-decl-subref-part ::= 'destroyer'
sil-decl-subref-part ::= 'deallocator'
sil-decl-subref-part ::= 'globalaccessor'
sil-decl-subref-part ::= 'ivardestroyer'
sil-decl-subref-part ::= 'ivarinitializer'
sil-decl-subref-part ::= 'defaultarg' '.' [0-9]+
sil-decl-lang ::= 'foreign'
sil-decl-autodiff ::= sil-decl-autodiff-kind '.' sil-decl-autodiff-indices
sil-decl-autodiff-kind ::= 'jvp'
sil-decl-autodiff-kind ::= 'vjp'
sil-decl-autodiff-indices ::= [SU]+
Some SIL instructions need to reference Swift declarations directly. These
references are introduced with the ``#`` sigil followed by the fully qualified
name of the Swift declaration. Some Swift declarations are
decomposed into multiple entities at the SIL level. These are distinguished by
following the qualified name with ``!`` and one or more ``.``-separated component
entity discriminators:
- ``getter``: the getter function for a ``var`` declaration
- ``setter``: the setter function for a ``var`` declaration
- ``allocator``: a ``struct`` or ``enum`` constructor, or a ``class``\ 's *allocating constructor*
- ``initializer``: a ``class``\ 's *initializing constructor*
- ``enumelt``: a member of an ``enum`` type
- ``destroyer``: a class's destroying destructor
- ``deallocator``: a class's deallocating destructor
- ``globalaccessor``: the addressor function for a global variable
- ``ivardestroyer``: a class's ivar destroyer
- ``ivarinitializer``: a class's ivar initializer
- ``defaultarg.``\ *n*: the default argument-generating function for
the *n*\ -th argument of a Swift ``func``
- ``foreign``: a specific entry point for C/Objective-C interoperability
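
As an illustration, the fragment below looks up a method and a property
getter on a class instance ``%0``. The class ``A`` and its members are assumed
for the example, and the fragment is a sketch of instructions inside a
function body::

  // Assumed Swift source:
  //   class A { var x: Int; func foo() }

  // The method itself is referenced by its qualified name alone.
  %1 = class_method %0 : $A, #A.foo : (A) -> () -> (), $@convention(method) (@guaranteed A) -> ()
  // The getter of the property is selected with a discriminator.
  %2 = class_method %0 : $A, #A.x!getter : (A) -> () -> Int, $@convention(method) (@guaranteed A) -> Int
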
Linkage
~~~~~~~
::
sil-linkage ::= 'public'
sil-linkage ::= 'hidden'
sil-linkage ::= 'shared'
sil-linkage ::= 'private'
sil-linkage ::= 'public_external'
sil-linkage ::= 'hidden_external'
sil-linkage ::= 'non_abi'
A linkage specifier controls the situations in which two objects in
different SIL modules are *linked*, i.e. treated as the same object.
A linkage is *external* if it ends with the suffix ``external``. An
object must be a definition if its linkage is not external.
All functions, global variables, and witness tables have linkage.
The default linkage of a definition is ``public``. The default linkage of a
declaration is ``public_external``. (These may eventually change to ``hidden``
and ``hidden_external``, respectively.)
On a global variable, an external linkage is what indicates that the
variable is not a definition. A variable lacking an explicit linkage
specifier is presumed a definition (and thus gets the default linkage
for definitions, ``public``).
Definition of the *linked* relation
```````````````````````````````````
Two objects are linked if they have the same name and are mutually
visible:
- An object with ``public`` or ``public_external`` linkage is always
visible.
- An object with ``hidden``, ``hidden_external``, or ``shared``
linkage is visible only to objects in the same Swift module.
- An object with ``private`` linkage is visible only to objects in
the same SIL module.
Note that the *linked* relationship is an equivalence relation: it is
reflexive, symmetric, and transitive.
Requirements on linked objects
``````````````````````````````
If two objects are linked, they must have the same type.
If two objects are linked, they must have the same linkage, except:
- A ``public`` object may be linked to a ``public_external`` object.
- A ``hidden`` object may be linked to a ``hidden_external`` object.
If two objects are linked, at most one may be a definition, unless:
- both objects have ``shared`` linkage or
- at least one of the objects has an external linkage.
If two objects are linked, and both are definitions, then the
definitions must be semantically equivalent. This equivalence may
exist only on the level of user-visible semantics of well-defined
code; it should not be taken to guarantee that the linked definitions
are exactly operationally equivalent. For example, one definition of
a function might copy a value out of an address parameter, while
another may have had an analysis applied to prove that said value is
not needed.
If an object has any uses, then it must be linked to a definition
with non-external linkage.
Public non-ABI linkage
``````````````````````
The ``non_abi`` linkage is a special linkage used for definitions which
only exist in serialized SIL, and do not define visible symbols in the
object file.
A definition with ``non_abi`` linkage behaves like it has ``shared`` linkage,
except that it must be serialized in the SIL module even if not referenced
from anywhere else in the module. For example, this means it is considered
a root for dead function elimination.
When a ``non_abi`` definition is deserialized, it will have ``shared_external``
linkage.
There is no ``non_abi_external`` linkage. Instead, when referencing a
``non_abi`` declaration that is defined in a different translation unit from
the same Swift module, you must use ``hidden_external`` linkage.
Summary
```````
- ``public`` definitions are unique and visible everywhere in the
program. In LLVM IR, they will be emitted with ``external``
linkage and ``default`` visibility.
- ``hidden`` definitions are unique and visible only within the
current Swift module. In LLVM IR, they will be emitted with
``external`` linkage and ``hidden`` visibility.
- ``private`` definitions are unique and visible only within the
current SIL module. In LLVM IR, they will be emitted with
``private`` linkage.
- ``shared`` definitions are visible only within the current Swift
module. They can be linked only with other ``shared``
definitions, which must be equivalent; therefore, they only need
to be emitted if actually used. In LLVM IR, they will be emitted
with ``linkonce_odr`` linkage and ``hidden`` visibility.
- ``public_external`` and ``hidden_external`` objects always have
visible definitions somewhere else. If this object nonetheless
has a definition, it's only for the benefit of optimization or
analysis. In LLVM IR, declarations will have ``external`` linkage
and definitions (if actually emitted as definitions) will have
``available_externally`` linkage.
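
The following sketch shows how these linkages might appear across two SIL
modules. The function names and the split into a library and a client are
hypothetical::

  // In the SIL module that owns the function: the unique public definition.
  sil public @lib_entry : $@convention(thin) () -> () {
  bb0:
    %0 = tuple ()
    return %0 : $()
  }

  // In a client SIL module: a declaration linked to the definition above.
  sil public_external @lib_entry : $@convention(thin) () -> ()

  // A helper that several SIL modules of one Swift module may each define;
  // all such definitions must be semantically equivalent.
  sil shared @helper : $@convention(thin) () -> () {
  bb0:
    %0 = tuple ()
    return %0 : $()
  }
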
VTables
~~~~~~~
::
decl ::= sil-vtable
sil-vtable ::= 'sil_vtable' identifier '{' sil-vtable-entry* '}'
sil-vtable-entry ::= sil-decl-ref ':' sil-linkage? sil-function-name
SIL represents dynamic dispatch for class methods using the `class_method`_,
`super_method`_, `objc_method`_, and `objc_super_method`_ instructions.
The potential destinations for `class_method`_ and `super_method`_ are
tracked in ``sil_vtable`` declarations for every class type. The declaration
contains a mapping from every method of the class (including those inherited
from its base class) to the SIL function that implements the method for that
class::
class A {
func foo()
func bar()
func bas()
}
sil @A_foo : $@convention(thin) (@owned A) -> ()
sil @A_bar : $@convention(thin) (@owned A) -> ()
sil @A_bas : $@convention(thin) (@owned A) -> ()
sil_vtable A {
#A.foo: @A_foo
#A.bar: @A_bar
#A.bas: @A_bas
}
class B : A {
func bar()
}
sil @B_bar : $@convention(thin) (@owned B) -> ()
sil_vtable B {
#A.foo: @A_foo
#A.bar: @B_bar
#A.bas: @A_bas
}
class C : B {
func bas()
}
sil @C_bas : $@convention(thin) (@owned C) -> ()
sil_vtable C {
#A.foo: @A_foo
#A.bar: @B_bar
#A.bas: @C_bas
}
Note that the declaration reference in the vtable is to the least-derived method
visible through that class (in the example above, ``B``'s vtable references
``A.bar`` and not ``B.bar``, and ``C``'s vtable references ``A.bas`` and not
``C.bas``). The Swift AST maintains override relationships between declarations
that can be used to look up overridden methods in the SIL vtable for a derived
class (such as ``C.bas`` in ``C``'s vtable).
If the SIL function is a thunk, the function name is preceded by the
linkage of the original implementing function.
Witness Tables
~~~~~~~~~~~~~~
::
decl ::= sil-witness-table
sil-witness-table ::= 'sil_witness_table' sil-linkage?
normal-protocol-conformance '{' sil-witness-entry* '}'
SIL encodes the information needed for dynamic dispatch of generic types into
witness tables. This information is used to produce runtime dispatch tables when
generating binary code. It can also be used by SIL optimizations to specialize
generic functions. A witness table is emitted for every declared explicit
conformance. Generic types share one generic witness table for all of their
instances. Derived classes inherit the witness tables of their base class.
::
protocol-conformance ::= normal-protocol-conformance
protocol-conformance ::= 'inherit' '(' protocol-conformance ')'
protocol-conformance ::= 'specialize' '<' substitution* '>'
'(' protocol-conformance ')'
protocol-conformance ::= 'dependent'
normal-protocol-conformance ::= identifier ':' identifier 'module' identifier
Witness tables are keyed by *protocol conformance*, which is a unique identifier
for a concrete type's conformance to a protocol.
- A *normal protocol conformance* names a (potentially unbound generic) type,
the protocol it conforms to, and the module in which the type or extension
declaration that provides the conformance appears. These correspond 1:1 to
protocol conformance declarations in the source code.
- If a derived class conforms to a protocol through inheritance from its base
class, this is represented by an *inherited protocol conformance*, which
simply references the protocol conformance for the base class.
- If an instance of a generic type conforms to a protocol, it does so with a
*specialized conformance*, which provides the generic parameter bindings
to the normal conformance, which should be for a generic type.
Witness tables are only directly associated with normal conformances.
Inherited and specialized conformances indirectly reference the witness table of
the underlying normal conformance.
::
sil-witness-entry ::= 'base_protocol' identifier ':' protocol-conformance
sil-witness-entry ::= 'method' sil-decl-ref ':' sil-function-name
sil-witness-entry ::= 'associated_type' identifier
sil-witness-entry ::= 'associated_type_protocol'
'(' identifier ':' identifier ')' ':' protocol-conformance
Witness tables consist of the following entries:
- *Base protocol entries* provide references to the protocol conformances that
satisfy the witnessed protocols' inherited protocols.
- *Method entries* map a method requirement of the protocol to a SIL function
that implements that method for the witness type. One method entry must exist
for every required method of the witnessed protocol.
- *Associated type entries* map an associated type requirement of the protocol
to the type that satisfies that requirement for the witness type. Note that
the witness type is a source-level Swift type and not a SIL type. One
associated type entry must exist for every required associated type of the
witnessed protocol.
- *Associated type protocol entries* map a protocol requirement on an associated
type to the protocol conformance that satisfies that requirement for the
associated type.
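
A minimal sketch of a witness table with a single method entry, written to
follow the grammar above. The protocol, the conforming type, the module name
``main``, and the function name are all assumed for the example, and tables
printed by a given compiler version may carry additional detail::

  // Assumed Swift source, in a module named "main":
  //   protocol P { func foo() }
  //   struct S : P { func foo() }

  sil @S_foo_witness : $@convention(witness_method) (@in_guaranteed S) -> ()

  sil_witness_table hidden S: P module main {
    method #P.foo: @S_foo_witness
  }
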
Default Witness Tables
~~~~~~~~~~~~~~~~~~~~~~
::
decl ::= sil-default-witness-table
sil-default-witness-table ::= 'sil_default_witness_table'
identifier minimum-witness-table-size
'{' sil-default-witness-entry* '}'
minimum-witness-table-size ::= integer
SIL encodes requirements with resilient default implementations in a default
witness table. We say a requirement has a resilient default implementation if
the following conditions hold:
- The requirement has a default implementation
- The requirement is either the last requirement in the protocol, or all
subsequent requirements also have resilient default implementations
The set of requirements with resilient default implementations is stored in
protocol metadata.
The minimum witness table size is the size of the witness table, in words,
not including any requirements with resilient default implementations.
Any conforming witness table must have a size between the minimum size and
the maximum size, which is equal to the minimum size plus the number of
default requirements.
At load time, if the runtime encounters a witness table with fewer than the
maximum number of witnesses, the witness table is copied, with default
witnesses copied in. This ensures that callers can always expect to find
the correct number of requirements in each witness table, and new
requirements can be added by the framework author, without breaking client
code, as long as the new requirements have resilient default implementations.
Default witness tables are keyed by the protocol itself. Only protocols with
public visibility need a default witness table; private and internal protocols
are never seen outside the module, therefore there are no resilience issues
with adding new requirements.
::
sil-default-witness-entry ::= 'method' sil-decl-ref ':' sil-function-name
Default witness tables currently contain only one type of entry:
- *Method entries* map a method requirement of the protocol to a SIL function
that implements that method in a manner suitable for all witness types.
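
A sketch of a default witness table that follows the grammar above. The
protocol ``P``, the function ``@P_bar_default``, and the minimum size of ``1``
are hypothetical, and the size assumes, for illustration only, that each
requirement occupies one word of the witness table::

  // Assumed Swift source:
  //   public protocol P {
  //     func foo()   // no default implementation
  //     func bar()   // default implementation provided in an extension
  //   }

  sil @P_bar_default : $@convention(witness_method) <Self where Self : P> (@in_guaranteed Self) -> ()

  sil_default_witness_table P 1 {
    method #P.bar: @P_bar_default
  }

Under these assumptions a conforming witness table holds between one and two
entries, and a table that supplies only ``foo`` would have the default for
``bar`` filled in at load time.
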
Global Variables
~~~~~~~~~~~~~~~~
::
decl ::= sil-global-variable
static-initializer ::= '=' '{' sil-instruction-def* '}'
sil-global-variable ::= 'sil_global' sil-linkage identifier ':' sil-type
(static-initializer)?
A ``sil_global`` declaration is the SIL representation of a global variable.
Global variable access is performed by the ``alloc_global``, ``global_addr``
and ``global_value`` instructions.
A global can have a static initializer if its initial value can be
composed of literals. The static initializer is represented as a list of
literal and aggregate instructions where the last instruction is the top-level
value of the static initializer::
sil_global hidden @$S4test3varSiv : $Int = {
%0 = integer_literal $Builtin.Int64, 27
%initval = struct $Int (%0 : $Builtin.Int64)
}
If a global does not have a static initializer, the ``alloc_global``
instruction must be performed prior to an access to initialize the storage.
Once a global's storage has been initialized, ``global_addr`` is used to
project the value.
If the last instruction in the static initializer is an ``object`` instruction,
the global variable is a statically initialized object. In this case the
variable cannot be used as an l-value, i.e. the reference to the object cannot be
modified. As a consequence the variable cannot be accessed with ``global_addr``
but only with ``global_value``.
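
For a global without a static initializer, the access sequence described
above can be sketched as follows. The global ``@counter`` and the function
``@init_counter`` are hypothetical names::

  sil_global hidden @counter : $Builtin.Int64

  sil @init_counter : $@convention(thin) () -> () {
  bb0:
    // Initialize the storage before the first access.
    alloc_global @counter
    // Project the address of the global and store its initial value.
    %1 = global_addr @counter : $*Builtin.Int64
    %2 = integer_literal $Builtin.Int64, 0
    store %2 to %1 : $*Builtin.Int64
    %4 = tuple ()
    return %4 : $()
  }
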
Differentiability Witnesses
~~~~~~~~~~~~~~~~~~~~~~~~~~~
::
decl ::= sil-differentiability-witness
sil-differentiability-witness ::=
'sil_differentiability_witness'
sil-linkage?
'[' 'parameters' sil-differentiability-witness-function-index-list ']'
'[' 'results' sil-differentiability-witness-function-index-list ']'
generic-parameter-clause?
sil-function-name ':' sil-type
sil-differentiability-witness-body?
sil-differentiability-witness-body ::=
'{' sil-differentiability-witness-entry?
sil-differentiability-witness-entry? '}'
sil-differentiability-witness-entry ::=
sil-differentiability-witness-entry-kind ':'
sil-entry-name ':' sil-type
sil-differentiability-witness-entry-kind ::= 'jvp' | 'vjp'
SIL encodes function differentiability via differentiability witnesses.
Differentiability witnesses map a "key" (including an "original" SIL function)
to derivative SIL functions.
Differentiability witnesses are keyed by the following:
- An "original" SIL function name.
- Differentiability parameter indices.
- Differentiability result indices.
- A generic parameter clause, representing differentiability generic
requirements.
Differentiability witnesses may have a body, specifying derivative functions for
the key. Verification checks that derivative functions have the expected type
based on the key.
::
sil_differentiability_witness hidden [parameters 0] [results 0] <T where T : Differentiable> @id : $@convention(thin) (T) -> T {
jvp: @id_jvp : $@convention(thin) (T) -> (T, @owned @callee_guaranteed (T.TangentVector) -> T.TangentVector)
vjp: @id_vjp : $@convention(thin) (T) -> (T, @owned @callee_guaranteed (T.TangentVector) -> T.TangentVector)
}
During SILGen, differentiability witnesses are emitted for the following:
- ``@differentiable`` declaration attributes.
- ``@derivative`` declaration attributes. Registered derivative functions
become differentiability witness entries.
The SIL differentiation transform canonicalizes differentiability witnesses,
filling in missing entries.
Differentiability witness entries are accessed via the
``differentiability_witness_function`` instruction.
Dataflow Errors
---------------
*Dataflow errors* may exist in raw SIL. Swift's semantics defines these
conditions as errors, so they must be diagnosed by diagnostic
passes and must not exist in canonical SIL.
Definitive Initialization
~~~~~~~~~~~~~~~~~~~~~~~~~
Swift requires that all local variables be initialized before use. In
constructors, all instance variables of a struct, enum, or class type must
be initialized before the object is used and before the constructor is returned
from.
Unreachable Control Flow
~~~~~~~~~~~~~~~~~~~~~~~~
The ``unreachable`` terminator is emitted in raw SIL to mark incorrect control
flow, such as a non-``Void`` function failing to ``return`` a value, or a
``switch`` statement failing to cover all possible values of its subject.
The guaranteed dead code elimination pass can eliminate truly unreachable
basic blocks, or ``unreachable`` instructions may be dominated by applications
of functions returning uninhabited types. An ``unreachable`` instruction that
survives guaranteed DCE and is not immediately preceded by a no-return
application is a dataflow error.
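The rule above can be sketched as a small check over a single basic block. This is only an illustrative model in Python, not the compiler's implementation: the ``(opcode, callee)`` encoding, the function name, and the set of no-return callees are all assumptions made for the example, and the block is assumed to have already survived guaranteed DCE.

```python
from typing import List, Optional, Set, Tuple

# Each instruction is modeled as (opcode, callee name or None). This encoding
# and the names below are hypothetical, chosen only for illustration.
Inst = Tuple[str, Optional[str]]

def is_unreachable_dataflow_error(block: List[Inst],
                                  no_return_callees: Set[str]) -> bool:
    """Check a basic block that is assumed to have survived guaranteed DCE."""
    if not block or block[-1][0] != "unreachable":
        return False  # The block does not end in ``unreachable``.
    if len(block) >= 2:
        opcode, callee = block[-2]
        if opcode == "apply" and callee in no_return_callees:
            return False  # Immediately preceded by a no-return application.
    return True

no_return = {"fatalError"}
ok_block = [("apply", "fatalError"), ("unreachable", None)]
bad_block = [("integer_literal", None), ("unreachable", None)]
print(is_unreachable_dataflow_error(ok_block, no_return))   # False
print(is_unreachable_dataflow_error(bad_block, no_return))  # True
```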
Ownership SSA
-------------
A SILFunction marked with the ``[ossa]`` function attribute is considered to be
in Ownership SSA form. Ownership SSA is an augmented version of SSA that
enforces ownership invariants by imbuing value-operand edges with semantic
ownership information. Every SIL value is assigned a constant ownership kind
that defines the ownership semantics that the value models. The operands that
use a SIL value must be semantically partitionable into "non-lifetime ending
uses", which only require the value to be live, and "lifetime ending uses",
which end the lifetime of the value and after which the value can no longer be
used. Since lifetime ending uses by definition end their value's lifetime, the
lifetime ending use points must jointly post-dominate all non-lifetime ending
use points, and a value must have exactly one lifetime ending use along every
reachable program path. Together these rules prevent leaks and
use-after-frees. As an example, consider the following SIL with the
partitioned defs and uses annotated inline::
sil @stash_and_cast : $@convention(thin) (@owned Klass) -> @owned SuperKlass {
bb0(%kls1 : @owned $Klass): // Definition of %kls1
// "Normal Use" of %kls1.
// Definition of %kls2.
%kls2 = copy_value %kls1 : $Klass
// "Consuming Use" of %kls2 to store it into a global. Stores in ossa are
// consuming since memory is generally assumed to have "owned"
// semantics. After this instruction executes, we can no longer use %kls2
// without triggering an ownership violation.
store %kls2 to [init] %globalMem : $*Klass
// "Consuming Use" of %kls1.
// Definition of %kls1Casted.
%kls1Casted = upcast %kls1 : $Klass to $SuperKlass
// "Consuming Use" of %kls1Casted
return %kls1Casted : $SuperKlass
}
Notice how every value in the SIL above has a partitionable set of uses, with
normal uses always preceding consuming uses. Any violation of these ownership
semantics would trigger a SILVerifier error, so passing verification tells us
that the above code has no leaks or use-after-frees.
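To make the partitioning rule concrete, here is a minimal sketch in Python that classifies the ordered uses of one owned value along a single program path. It is an illustrative model only: the ``(name, ends_lifetime)`` encoding and the ``check_path`` helper are hypothetical, and a real verifier works over the whole control-flow graph rather than one path at a time.

```python
from typing import List, Tuple

# A use is modeled as (instruction name, ends_lifetime). Hypothetical encoding.
Use = Tuple[str, bool]

def check_path(uses: List[Use]) -> str:
    """Classify one program path's ordered uses of a single owned value."""
    consumed = False
    for _name, ends_lifetime in uses:
        if consumed:
            # Any use after the lifetime-ending use is a use-after-free;
            # a second lifetime-ending use is a double consume.
            return "double consume" if ends_lifetime else "use after free"
        consumed = ends_lifetime
    # A path that never ends the value's lifetime leaks it.
    return "ok" if consumed else "leak"

# %kls1 from @stash_and_cast: copied (normal use), then consumed by upcast.
print(check_path([("copy_value", False), ("upcast", True)]))   # ok
print(check_path([("copy_value", False)]))                     # leak
print(check_path([("upcast", True), ("copy_value", False)]))   # use after free
```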
Ownership Kind
~~~~~~~~~~~~~~
The semantics in the previous example are just one of the forms of ownership
semantics that SIL supports: "owned" semantics. In SIL, we map these "ownership
semantics" into a form that a compiler can reason about by placing them on a
lattice with the following elements: `None`_, `Owned`_, `Guaranteed`_,
`Unowned`_, ``Any``. We call this the lattice of "Ownership Kinds" and each
individual element an "Ownership Kind". This lattice is defined as a 3-level
lattice with::
1. None being Top.
2. Any being Bottom.
3. All non-Any, non-None OwnershipKinds being defined as mid-level elements of the lattice
We can graphically represent the lattice via a diagram like the following::
+------+
+-------- | None | ---------+
| +------+ |
| | |
v v v ^
+-------+ +-----+------+ +---------+ |
| Owned | | Guaranteed | | Unowned | +--- Value Ownership Kinds and
+-------+ +-----+------+ +---------+ Ownership Constraints
| | |
| v | +--- Only Ownership Constraints
| +-----+ | |
+-------->| Any |<----------+ v
+-----+
One moves down the lattice by performing a "meet" operation::
None meet OtherOwnershipKind -> OtherOwnershipKind
Unowned meet Owned -> Any
Owned meet Guaranteed -> Any
and one moves up the lattice by performing a "join" operation, e.g.::
Any join OtherOwnershipKind -> OtherOwnershipKind
Owned join Any -> Owned
Owned join Guaranteed -> None
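The two operations can be sketched in a few lines of Python. This is an illustrative model of the 3-level lattice as described here (None at the top, Any at the bottom), not the compiler's data structure; the ``OwnershipKind`` enum and the ``meet`` and ``join`` functions are names chosen for the example.

```python
from enum import Enum

class OwnershipKind(Enum):
    NONE = "None"              # Top of the lattice.
    OWNED = "Owned"            # Mid-level element.
    GUARANTEED = "Guaranteed"  # Mid-level element.
    UNOWNED = "Unowned"        # Mid-level element.
    ANY = "Any"                # Bottom of the lattice.

def meet(a: OwnershipKind, b: OwnershipKind) -> OwnershipKind:
    """Move down the lattice: the greatest kind below both a and b."""
    if a == b:
        return a
    if a is OwnershipKind.NONE:
        return b
    if b is OwnershipKind.NONE:
        return a
    # Two distinct non-Top elements only share Bottom.
    return OwnershipKind.ANY

def join(a: OwnershipKind, b: OwnershipKind) -> OwnershipKind:
    """Move up the lattice: the least kind above both a and b."""
    if a == b:
        return a
    if a is OwnershipKind.ANY:
        return b
    if b is OwnershipKind.ANY:
        return a
    # Two distinct non-Bottom elements only share Top.
    return OwnershipKind.NONE

print(meet(OwnershipKind.UNOWNED, OwnershipKind.OWNED))     # OwnershipKind.ANY
print(join(OwnershipKind.OWNED, OwnershipKind.GUARANTEED))  # OwnershipKind.NONE
```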
This lattice is applied to SIL by requiring well-formed SIL to:
1. Define a map from each SIL value to a constant OwnershipKind that classifies
the semantics that the SIL value obeys. This ownership kind may be static
(i.e. the same for all instances of an instruction) or dynamic (e.g.
forwarding instructions set their ownership upon construction). We call the
subset of OwnershipKinds that values may take on the set of `Value Ownership
Kind`_: `None`_, `Unowned`_, `Guaranteed`_, `Owned`_ (note the conspicuously
missing ``Any``). This is because in our model ``Any`` represents unknown
ownership semantics and, since our model is strict, we do not allow values to
have unknown ownership.
2. Define a map from each operand of a SILInstruction, ``i``, to a constant
(Ownership Kind, Boolean) pair called the operand's `Ownership
Constraint`_. The Ownership Kind element of the `Ownership Constraint`_
determines semantically which ownership kinds the operand's value can take
on. The Boolean value records whether the operand ends the lifetime of the
incoming value, which is needed when checking dataflow rules. The dataflow
rules that each `Value Ownership Kind`_ obeys are documented in that kind's
detailed description below.
Then we take these two maps and require that valid SIL has the following
property: given an operand ``operand(i)`` of an instruction ``i`` and a value
``v``, ``operand(i)`` can only use ``v`` if the ``join`` of
``OwnershipConstraint(operand(i))`` with ``ValueOwnershipKind(v)`` is equal to
the ``ValueOwnershipKind`` of ``v``. In symbols, we must have that::
join : (OwnershipConstraint, ValueOwnershipKind) -> ValueOwnershipKind
OwnershipConstraint(operand(i)) join ValueOwnershipKind(v) = ValueOwnershipKind(v)
In words, a value can be passed to an operand if applying the operand's
ownership constraint to the value's ownership does not change the value's
ownership. Operationally this has a few interesting effects on SIL::
1. We have defined away invalid value-operand (aka def-use) pairings since the
SILVerifier validates the aforementioned relationship on all SIL values and
uses at all points of the pipeline until ossa is lowered.
2. Many SIL instructions do not care about the ownership kind that their value
will take. They can just define all of their operands as having an
ownership constraint of Any.
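As a rough illustration of the compatibility rule, the following Python sketch checks whether an operand may use a value. It is a model under stated assumptions, not the SILVerifier's code: kinds are plain strings, and ``OwnershipConstraint``, ``join`` and ``can_use`` are hypothetical names. Under the rule as stated, a constraint of Any accepts every value, and a value with None ownership is accepted by every constraint because None is the top of the lattice.

```python
from dataclasses import dataclass

def join(a: str, b: str) -> str:
    """Join on the 3-level lattice: None is Top, Any is Bottom."""
    if a == b:
        return a
    if a == "Any":
        return b
    if b == "Any":
        return a
    return "None"

@dataclass(frozen=True)
class OwnershipConstraint:
    kind: str            # One of None, Owned, Guaranteed, Unowned, Any.
    ends_lifetime: bool  # True if the operand is a lifetime ending use.

def can_use(constraint: OwnershipConstraint, value_kind: str) -> bool:
    """An operand may use a value iff the join leaves the value's kind unchanged."""
    if value_kind == "Any":
        raise ValueError("values never have Any ownership")
    return join(constraint.kind, value_kind) == value_kind

consuming = OwnershipConstraint("Owned", ends_lifetime=True)
print(can_use(consuming, "Owned"))       # True
print(can_use(consuming, "Guaranteed"))  # False
print(can_use(OwnershipConstraint("Any", False), "Unowned"))  # True
```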
Now let's go into more depth on `Value Ownership Kind`_ and `Ownership Constraint`_.
Value Ownership Kind
~~~~~~~~~~~~~~~~~~~~
As mentioned above, each SIL value is statically mapped to an `Ownership Kind`_
called the value's "ValueOwnershipKind" that classifies the semantics of the
value. Below, we map each ValueOwnershipKind to a short summary of the semantics
implied upon the parent value:
* **None**. This is used to represent values that do not require memory
management and are outside of Ownership SSA invariants. Examples: trivial
values (e.g. Int, Float), non-payloaded cases of non-trivial enums (e.g.
Optional<T>.none), all address types.
* **Owned**. A value that exists independently of any other value and is
consumed exactly once along all paths through a function by either a
destroy_value (actually destroying the value) or by a consuming instruction
that rebinds the value in some manner (e.g. apply, casts, store).
* **Guaranteed**. A value with a scoped lifetime whose liveness is dependent on
the lifetime of some other "base" owned or guaranteed value. Consumed by
instructions like `end_borrow`_. The "base" value is statically guaranteed to
be live at all of the value's paired end_borrow instructions.
* **Unowned**. A value that is only guaranteed to be instantaneously valid and
must be copied before the value is used in an ``@owned`` or ``@guaranteed``
context. This is needed both to model argument values with the ObjC unsafe
unowned argument convention and also to model the ownership resulting from
bitcasting a trivial type to a non-trivial type. This value should never be
consumed.
We describe each of these semantics below in more detail.
Owned
`````
Owned ownership models "move only" values. We require that each such value is
consumed exactly once along all program paths. The IR verifier will flag values
that are not consumed along a path as a leak and any double consumes as
use-after-frees. We model move operations via `forwarding uses`_ such as casts
and transforming terminators (e.g. `switch_enum`_, `checked_cast_br`_) that
transform the input value, consuming it in the process, and producing a new
transformed owned value as a result.
Putting this all together, one can view each owned SIL value as being
effectively a "move only value" except when explicitly copied by a
copy_value. This of course implies that ARC operations can be assumed to only
semantically affect the specific value that they are applied to *and* that each
ARC constraint is able to be verified independently for each owned SILValue
derived from the ARC object. As an example, consider the following Swift/SIL::
// testcase.swift.
func doSomething(x : Klass) -> OtherKlass? {
return x as? OtherKlass
}
// testcase.sil. A possible SILGen lowering
sil [ossa] @doSomething : $@convention(thin) (@guaranteed Klass) -> @owned Optional<OtherKlass> {
bb0(%0 : @guaranteed $Klass):
// Definition of '%1'
%1 = copy_value %0 : $Klass
// Consume '%1'. This means '%1' can no longer be used after this point. We
// rebind '%1' in the destination blocks (bbYes, bbNo).
checked_cast_br %1 : $Klass to $OtherKlass, bbYes, bbNo
bbYes(%2 : @owned $OtherKlass): // On success, the checked_cast_br forwards
// '%1' into '%2' after casting to OtherKlass.
// Forward '%2' into '%3'. '%2' can not be used past this point in the
// function.
%3 = enum $Optional<OtherKlass>, case #Optional.some!enumelt, %2 : $OtherKlass
// Forward '%3' into the branch. '%3' can not be used past this point.
br bbEpilog(%3 : $Optional<OtherKlass>)
bbNo(%4 : @owned $Klass): // On failure, since we consumed '%1' already, we
// return the original '%1' as a new value '%4'
// so we can use it below.
// Actually destroy the underlying copy (``%1``) created by the copy_value
// in bb0.
destroy_value %4 : $Klass
// We want to return nil here. So we create a new non-payloaded enum and
// pass it off to bbEpilog.
%5 = enum $Optional<OtherKlass>, case #Optional.none!enumelt
br bbEpilog(%5 : $Optional<OtherKlass>)
bbEpilog(%6 : @owned $Optional<OtherKlass>):
// Consumes '%6' to return to caller.
return %6 : $Optional<OtherKlass>
}
Notice how our individual copy (``%1``) threads its way through the IR using
`forwarding uses`_ of ``@owned`` ownership. These `forwarding uses`_ partition the
lifetime of the result of the copy_value into a set of disjoint individual owned
lifetimes (``%2``, ``%3``, ``%4``, ``%6``).
Guaranteed
``````````
Guaranteed ownership models values that have a scoped dependent lifetime on a
"base value" with owned or guaranteed ownership. Due to this lifetime
dependence, the base value is required to be statically live over the entire
scope where the guaranteed value is valid.
These explicit scopes are introduced into SIL by begin scope instructions (e.g.
`begin_borrow`_, `load_borrow`_) that are paired with sets of jointly
post-dominating scope ending instructions (e.g. `end_borrow`_)::
sil [ossa] @guaranteed_values : $@convention(thin) (@owned Klass) -> () {
bb0(%0 : @owned $Klass):
%1 = begin_borrow %0 : $Klass
cond_br ..., bb1, bb2
bb1:
...
end_borrow %1 : $Klass
destroy_value %0 : $Klass
br bb3
bb2:
...
end_borrow %1 : $Klass
destroy_value %0 : $Klass
br bb3
bb3:
...
}
Notice how the `end_borrow`_ instructions allow a SIL generator to communicate
to optimizations that they can never shrink the lifetime of ``%0`` by moving
its `destroy_value`_ above the `end_borrow`_ of ``%1``.
Values with guaranteed ownership follow a dataflow rule that states that
non-consuming `forwarding uses`_ of the guaranteed value are also guaranteed and
are recursively validated as being in the original value's scope. This was a
choice we made to reduce idempotent scopes in the IR::
sil [ossa] @get_first_elt : $@convention(thin) (@guaranteed (String, String)) -> @owned String {
bb0(%0 : @guaranteed $(String, String)):
// %1 is validated as if it were a part of %0 and does not need its own begin_borrow/end_borrow.
%1 = tuple_extract %0 : $(String, String), 0
// So this copy_value is treated as a use of %0.
%2 = copy_value %1 : $String
return %2 : $String
}
None
````
Values with None ownership are inert values that exist outside of the guarantees
of Ownership SSA. Some examples of such values are:
* Trivially typed values such as: Int, Float, Double
* Non-payloaded non-trivial enums.
* Address types.
Since values with none ownership exist outside of ownership SSA, they can be
used like normal SSA without violating ownership SSA invariants. This does not
mean that code does not potentially violate other SIL rules (consider memory
lifetime invariants)::
sil @none_values : $@convention(thin) (Int, @in Klass) -> Int {
bb0(%0 : $Int, %1 : $*Klass):
// %0, %1 are normal SSA values that can be used anywhere in the function
// without breaking Ownership SSA invariants. It could violate other
// invariants if for instance, we load from %1 after we destroy the object
// there.
destroy_addr %1 : $*Klass
// If uncommented, this would violate memory lifetime invariants due to
// the ``destroy_addr %1`` above. But this would not violate the rules of
// Ownership SSA since addresses exist outside of the guarantees of
// Ownership SSA.
//
// %2 = load [take] %1 : $*Klass
// I can return this object without worrying about needing to copy since
// none objects can be arbitrarily returned.
return %0 : $Int
}
Unowned
```````
This is a form of ownership that is used to model two different use cases:
* Arguments of functions with ObjC convention. This convention requires the
callee to copy the value before using it (preferably before any other code
runs). We do not model this flow-sensitive property in SIL today, but we do
not allow unowned values to be passed as owned or guaranteed values
without copying them first.
* Values that are a conversion from a trivial value with None ownership to a
non-trivial value. As an example of this consider an unsafe bit cast of a
trivial pointer to a class. In that case, since we have no reason to assume
that the object will remain alive, we need to make a copy of the value.
Ownership Constraint
~~~~~~~~~~~~~~~~~~~~
NOTE: We assume that one has read the section above on `Ownership Kind`_.
As mentioned above, every operand ``operand(i)`` of a SIL instruction ``i`` has
statically mapped to it:
1. An ownership kind that acts as an "Ownership Constraint" upon what "Ownership
Kind" a value can take.
2. A boolean value that defines whether or not the execution of the operand's
instruction will cause the operand's value to be invalidated. This is often
referred to as an operand acting as a "lifetime ending use".
Forwarding Uses
~~~~~~~~~~~~~~~
NOTE: In the following, we assume that one has read the sections above on
`Ownership Kind`_, `Value Ownership Kind`_ and `Ownership Constraint`_.
A subset of SIL instructions define the value ownership kind of their results in
terms of the value ownership kind of their operands. Such an instruction is
called a "forwarding instruction" and any use with such a user instruction a
"forwarding use". This inference generally occurs upon instruction construction
and as a result:
* When manipulating forwarding instructions programmatically, one must manually
update their forwarded ownership since most of the time the ownership is
stored in the instruction itself. Don't worry though because the SIL verifier
will catch this error for you if you forget to do so!
* Textual SIL does not represent the ownership of forwarding instructions
explicitly. Instead, the instruction's ownership is inferred normally from the
parsed operand. Since the SILVerifier runs on Textual SIL after parsing, you
can feel confident that ownership constraints were inferred correctly.
Forwarding has slightly different ownership semantics depending on the value
ownership kind of the operand on construction and the result's type. We go
through each below:
* Given an ``@owned`` operand, the forwarding instruction is assumed to end the
lifetime of the operand and produce an ``@owned`` value if non-trivially typed
and ``@none`` if trivially typed. Example: This is used to represent the
semantics of casts::
sil @unsafelyCastToSubClass : $@convention(thin) (@owned Klass) -> @owned SubKlass {
bb0(%0 : @owned $Klass): // %0 is defined here.
// %0 is consumed here and can no longer be used after this point.
// %1 is defined here and after this point must be used to access the object
// passed in via %0.
%1 = unchecked_ref_cast %0 : $Klass to $SubKlass
// Then %1's lifetime ends here and we return the casted argument to our
// caller as an @owned result.
return %1 : $SubKlass
}
* Given a ``@guaranteed`` operand, the forwarding instruction is assumed to
produce ``@guaranteed`` non-trivially typed values and ``@none`` trivially
typed values. Given the non-trivial case, the instruction is assumed to begin
a new implicit borrow scope for the incoming value. Since the borrow scope is
implicit, we validate the uses of the result as if they were uses of the
operand (recursively). This of course means that one should never see
end_borrows on any guaranteed forwarded results, the end_borrow is always on
the instruction that "introduces" the borrowed value. An example of a
guaranteed forwarding instruction is ``struct_extract``::
// In this function, I have a pair of Klasses and I want to grab some state
// and then call the hand off function for someone else to continue
// processing the pair.
sil @accessLHSStateAndHandOff : $@convention(thin) (@owned KlassPair) -> @owned State {
bb0(%0 : @owned $KlassPair): // %0 is defined here.
// Begin the borrow scope for %0. We want to access %1's subfield in a
// read only way that doesn't involve destructuring and extra copies. So
// we construct a guaranteed scope here so we can safely use a
// struct_extract.
%1 = begin_borrow %0 : $KlassPair
// Now we perform our struct_extract operation. This operation
// structurally grabs a value out of a struct, relying for safety on
// the guaranteed ownership of its operand to know that %1 is live at all
// use points of %2, its result.
%2 = struct_extract %1 : $KlassPair, #KlassPair.lhs
// Then grab the state from our left hand side klass and copy it so we
// can pass off our klass pair to handOff for continued processing.
%3 = ref_element_addr %2 : $Klass, #Klass.state
%4 = load [copy] %3 : $*State
// Now that we have finished accessing %1, we end the borrow scope for %1.
end_borrow %1 : $KlassPair
%handOff = function_ref @handOff : $@convention(thin) (@owned KlassPair) -> ()
apply %handOff(%0) : $@convention(thin) (@owned KlassPair) -> ()
return %4 : $State
}
* Given a ``@none`` operand, the result value must have ``@none`` ownership.
* Given an ``@unowned`` operand, the result value will have ``@unowned``
ownership. It will be validated just like any other ``@unowned`` value, namely
that it must be copied before use.
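The four cases above can be summarized as a small table-like function. The following Python sketch is only a restatement of the rules in this section under stated assumptions: the function name, the string encoding of ownership kinds, and the returned pair are hypothetical and do not correspond to a compiler API.

```python
from typing import Tuple

def forwarded_ownership(operand_kind: str,
                        result_is_trivial: bool) -> Tuple[str, bool]:
    """Return (result ownership, whether the operand's lifetime is ended)."""
    if operand_kind == "owned":
        # The operand is consumed; a trivially typed result has no ownership.
        return ("none" if result_is_trivial else "owned", True)
    if operand_kind == "guaranteed":
        # The result lives in an implicit borrow scope of the operand.
        return ("none" if result_is_trivial else "guaranteed", False)
    if operand_kind in ("none", "unowned"):
        # These kinds are forwarded unchanged and are never consumed.
        return (operand_kind, False)
    raise ValueError(f"not a value ownership kind: {operand_kind}")

print(forwarded_ownership("owned", result_is_trivial=False))       # ('owned', True)
print(forwarded_ownership("guaranteed", result_is_trivial=True))   # ('none', False)
```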
An additional wrinkle here is that even though the vast majority of forwarding
instructions forward all types of ownership, this is not true in general. To see
why this is necessary, let's compare and contrast `struct_extract`_ (which does
not forward ``@owned`` ownership) and `unchecked_enum_data`_ (which can forward
*all* ownership kinds). The reason for this difference is that `struct_extract`_
inherently can only extract a single field of a larger object, implying that
the instruction could only represent consuming a sub-field of a value instead of
the entire value at once. This violates our constraint that owned values can
never be partially consumed: a value is either completely alive or completely
dead. In contrast, enums always represent their payloads as elements in a single
tuple value. This means that `unchecked_enum_data`_, when it extracts that
payload from an enum, can consume the entire enum+payload.
To handle cases where we want to use `struct_extract`_ in a consuming way, we
can instead use the `destructure_struct`_ instruction, which consumes the
entire struct at once and gives back the struct's individual constituent
parts::
struct KlassPair {
var fieldOne: Klass
var fieldTwo: Klass
}
sil @getFirstPairElt : $@convention(thin) (@owned KlassPair) -> @owned Klass {
bb0(%0 : @owned $KlassPair):
// If we were to just do this directly and consume KlassPair to access
// fieldOne... what would happen to fieldTwo? Would it be consumed?
//
// %1 = struct_extract %0 : $KlassPair, #KlassPair.fieldOne
//
// Instead we need to destructure to ensure we consume the entire owned value at once.
(%1, %2) = destructure_struct %0 : $KlassPair
// We only want to return %1, so we need to cleanup %2.
destroy_value %2 : $Klass
// Then return %1 to our caller
return %1 : $Klass
}
Borrowed Object based Safe Interior Pointers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
What is an "Unsafe Interior Pointer"
````````````````````````````````````
An unsafe interior pointer is a bare pointer into the innards of an object. A
simple example of this in C++ would be using the method std::vector::data() to
get to the innards of a std::vector. In general interior pointers are unsafe to
use since languages do not provide any guarantees that the interior pointer will
not be used after the underlying object has been deallocated. To see this,
consider the following C++ example::
#include <cstdio>
#include <vector>
int unfortunateFunction() {
int *unsafeInteriorPointer = nullptr;
{
std::vector<int> vector;
vector.push_back(5);
unsafeInteriorPointer = vector.data();
printf("%d\n", *unsafeInteriorPointer); // Prints "5".
} // vector deallocated here
return *unsafeInteriorPointer; // Kaboom
}
In words, C++ allows for us to get the interior pointer into the vector, but
then lets us do whatever we want with the pointer, including use it after the
underlying memory has been invalidated.
From a user's perspective, interior pointers are really useful since one can use
them to pass data to other APIs that only expect a pointer and since
one can sometimes use them to get better performance. But from a language
designer's perspective, this sort of API is verboten and leads to bugs, crashes,
and security vulnerabilities. That being said, clearly users have a need for such
functionality, so we, as language designers, should figure out ways to
express these sorts of patterns in our various languages in a safe way that
prevents users from foot-gunning themselves. In SIL, we have solved this
problem via the direct modeling of interior pointer instructions as a high level
concept in our IR.
Safe Interior Pointers in SIL
`````````````````````````````
In contrast to LLVM-IR, SIL provides mechanisms that language designers can use
to express concepts like the above in a manner that allows the language to
define away compiler generated unsafe interior pointer usage using "Safe
Interior Pointers". This is implemented in SIL by:
1. Classifying a set of instructions as being "interior pointer" instructions.
2. Enforcing in the SILVerifier that all "interior pointer" instructions can
only have operands with `Guaranteed`_ ownership.
3. Enforcing in the SILVerifier that any transitive address use of the interior
pointer is a liveness requirement of the "interior pointer"'s
operand.
Note that the transitive address use verifier from (3) does not attempt to
classify uses directly. Instead the verifier:
1. Has an explicit list of instructions that it understands as requiring
liveness of the base object.
2. Has a second list of instructions that require liveness and produce an address
whose transitive uses need to be recursively processed.
3. Asserts on any instructions that are not known to the verifier. This ensures
that the verifier is kept up to date with new instructions.
Note that typically instructions in category (1) are instructions whose uses do
not propagate the pointer value, so they are safe. In contrast, some other
instructions in category (1) are escaping uses of the address such as
`pointer_to_address`_. Those uses are unsafe--the user is responsible for
managing unsafe pointer lifetimes and the compiler must not extend those pointer
lifetimes.
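The three verifier categories described above can be sketched as a worklist traversal. This Python model is illustrative only: the ``Inst`` class, the function name, and in particular the contents of the two instruction lists are assumptions made for the example and are not the verifier's actual tables.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical classification tables, for illustration only.
LEAF_USERS = {"load", "store"}                                  # Category (1).
PROJECTION_USERS = {"struct_element_addr", "tuple_element_addr"}  # Category (2).

@dataclass
class Inst:
    opcode: str
    users: List["Inst"] = field(default_factory=list)

def gather_liveness_uses(interior_pointer: Inst) -> List[Inst]:
    """Collect every transitive address use that requires the base to be live."""
    found: List[Inst] = []
    worklist = list(interior_pointer.users)
    while worklist:
        user = worklist.pop()
        if user.opcode in LEAF_USERS:
            found.append(user)
        elif user.opcode in PROJECTION_USERS:
            # The projection itself requires liveness, and so do its users.
            found.append(user)
            worklist.extend(user.users)
        else:
            # Category (3): unknown instructions are rejected outright.
            raise AssertionError(f"unknown address user: {user.opcode}")
    return found

load = Inst("load")
field_addr = Inst("struct_element_addr", [load])
interior = Inst("ref_element_addr", [field_addr])
print(sorted(i.opcode for i in gather_liveness_uses(interior)))
# ['load', 'struct_element_addr']
```

Every instruction returned by the traversal would then be checked as if it were a use of the borrowed base object, so it has to occur before the base's scope ends.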
These rules ensure statically that any uses of the address that are not escaped
explicitly by an instruction like `pointer_to_address`_ are within the
guaranteed value's scope, where the guaranteed value is statically known to be
live. As a result, in SIL it is impossible to express such a bug in compiler
generated code. As an example, consider the following unsafe interior pointer
SIL::
class Klass { var k: KlassWrapper }
struct KlassWrapper { var k: Klass }
// ...
// Today SIL restricts interior pointer instructions to only have operands
// with guaranteed ownership.
%1 = begin_borrow %0 : $Klass
// %2 is an interior pointer into %1. Since %2 is an address, its uses are
// not treated as uses of underlying borrowed object %1 in the ownership
// system. This is because at the ownership level objects with None
// ownership are not verified and do not have any constraints on how they
// are used from the ownership system.
//
// Instead the ownership verifier gathers up all such uses and treats them
// as uses of the object from which the interior pointer was projected
// transitively. This means that this is a constraint on the guaranteed
// object's use, not on the trivial values.
%2 = ref_element_addr %1 : $Klass, #Klass.k // %2 is a $*KlassWrapper
%3 = struct_element_addr %2 : $*KlassWrapper, #KlassWrapper.k // %3 is a $*Klass
// If we end the borrow of %1 at this point, we invalidate the addresses
// ``%2`` and ``%3``.
end_borrow %1 : $Klass
// We would here be loading from an invalidated address. This would cause a
// verifier error since %3's use here is a regular use that is attributed
// to %1.
%4 = load [copy] %3 : $*Klass
// ...
Notice how, due to a possible bug in the compiler, we are loading from
potentially invalidated memory through ``%3``. This would have caused a verifier
error stating that the load ``%4`` was an interior pointer based use-after-free
of ``%1``, implying this is malformed SIL.
NOTE: This is a constraint on the base object, not on the addresses themselves
which are viewed as outside of the ownership system since they have `None`_
ownership.
In contrast to the previous example, the following example follows ownership
invariants and is valid SIL::
class Klass { var k: KlassWrapper }
struct KlassWrapper { var k: Klass }
// ...
%1 = begin_borrow %0 : $Klass
// %2 is an interior pointer into the Klass k. Since %2 is an address and
// addresses have None ownership, its uses are not treated as uses of the
// underlying object %1.
%2 = ref_element_addr %1 : $Klass, #Klass.k // %2 is a $*KlassWrapper
// Destroying %1 at this location would result in a verifier error since
// %2's uses are considered to be uses of %1.
//
// end_lifetime %1 : $Klass
// We are statically not loading from an invalidated address here since we
// are within the lifetime of ``%1``.
%3 = struct_element_addr %2 : $*KlassWrapper, #KlassWrapper.k
%4 = load [copy] %3 : $*Klass // %1 must be live here transitively
// ``%1``'s lifetime ends. Importantly we know that within the lifetime of
// ``%1``, ``%0``'s lifetime can not shrink past this point, implying
// transitive static safety.
end_borrow %1 : $Klass
In the second example, we show a well-formed SIL program showing off SIL's Safe
Interior Pointers. All of the uses of ``%2``, the interior pointer, are
transitively uses of the base underlying object, ``%0``.
The current list of interior pointer SIL instructions is:
* `project_box`_ - projects a pointer out of a reference counted box. (*)
* `ref_element_addr`_ - projects a field out of a reference counted class.
* `ref_tail_addr`_ - projects out a pointer to a class’s tail allocated array
memory (assuming the class was initialized to have such an array).
* `open_existential_box`_ - projects the address of the value out of a boxed
existential container using the current function context/protocol conformance
to create an "opened archetype".
* `project_existential_box`_ - projects a pointer to the value inside a boxed
existential container. Must be the type for which the box was initially
allocated and not an "opened" archetype.
(*) We still need to finish adding support for project_box, but all other
interior pointers are guarded already.
Runtime Failure
---------------
Some operations, such as failed unconditional `checked conversions`_ or the
``Builtin.trap`` compiler builtin, cause a *runtime failure*, which
unconditionally terminates the current actor. If it can be proven that a
runtime failure will occur or did occur, runtime failures may be reordered so
long as they remain well-ordered relative to operations external to the actor
or the program as a whole. For instance, with overflow checking on integer
arithmetic enabled, a simple ``for`` loop that reads inputs in from one or more
arrays and writes outputs to another array, all local
to the current actor, may cause runtime failure in the update operations::
// Given unknown start and end values, this loop may overflow
var i = unknownStartValue
while i != unknownEndValue {
...
i += 1
}
It is permitted to hoist the overflow check and associated runtime failure out
of the loop itself and check the bounds of the loop prior to entering it, so
long as the loop body has no observable effect outside of the current actor.
Undefined Behavior
------------------
Incorrect use of some operations is *undefined behavior*, such as invalid
unchecked casts involving ``Builtin.RawPointer`` types, or use of compiler
builtins that lower to LLVM instructions with undefined behavior at the LLVM
level. A SIL program with undefined behavior is meaningless, much like undefined
behavior in C, and has no predictable semantics. Undefined behavior should not
be triggered by valid SIL emitted by a correct Swift program using a correct
standard library, but cannot in all cases be diagnosed or verified at the SIL
level.
Calling Convention
------------------
This section describes how Swift functions are emitted in SIL.
Swift Calling Convention @convention(swift)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The Swift calling convention is the one used by default for native Swift
functions.
Tuples in the input type of the function are recursively destructured into
separate arguments, both in the entry point basic block of the callee, and
in the ``apply`` instructions used by callers::
func foo(_ x:Int, y:Int)
sil @foo : $(x:Int, y:Int) -> () {
entry(%x : $Int, %y : $Int):
...
}
func bar(_ x:Int, y:(Int, Int))
sil @bar : $(x:Int, y:(Int, Int)) -> () {
entry(%x : $Int, %y0 : $Int, %y1 : $Int):
...
}
func call_foo_and_bar() {
foo(1, y: 2)
bar(4, y: (5, 6))
}
sil @call_foo_and_bar : $() -> () {
entry:
...
%foo = function_ref @foo : $(x:Int, y:Int) -> ()
%foo_result = apply %foo(%1, %2) : $(x:Int, y:Int) -> ()
...
%bar = function_ref @bar : $(x:Int, y:(Int, Int)) -> ()
%bar_result = apply %bar(%4, %5, %6) : $(x:Int, y:(Int, Int)) -> ()
}
Calling a function with trivial value types as inputs and outputs
simply passes the arguments by value. This Swift function::
func foo(_ x:Int, y:Float) -> UnicodeScalar
foo(x, y: y)
gets called in SIL as::
%foo = function_ref @foo : $(Int, Float) -> UnicodeScalar
%z = apply %foo(%x, %y) : $(Int, Float) -> UnicodeScalar
Reference Counts
````````````````
*NOTE* This section speaks only in terms of rules of thumb. The
actual behavior of arguments with respect to reference counts is defined by
the argument's convention attribute (e.g. ``@owned``), not the
calling convention itself.
Reference type arguments are passed in at +1 retain count and consumed
by the callee. A reference type return value is returned at +1 and
consumed by the caller. Value types with reference type components
have their reference type components each retained and released the
same way. This Swift function::
class A {}
func bar(_ x:A) -> (Int, A) { ... }
bar(x)
gets called in SIL as::
%bar = function_ref @bar : $(A) -> (Int, A)
strong_retain %x : $A
%z = apply %bar(%x) : $(A) -> (Int, A)
// ... use %z ...
%z_1 = tuple_extract %z : $(Int, A), 1
strong_release %z_1 : $A
When applying a thick function value as a callee, the function value is also
consumed at +1 retain count.
Address-Only Types
``````````````````
For address-only arguments, the caller allocates a copy and passes the address
of the copy to the callee. The callee takes ownership of the copy and is
responsible for destroying or consuming the value, though the caller must still
deallocate the memory. For address-only return values, the
caller allocates an uninitialized buffer and passes its address as the first
argument to the callee. The callee must initialize this buffer before
returning. This Swift function::
@API struct A {}
func bas(_ x:A, y:Int) -> A { return x }
var z = bas(x, y: y)
// ... use z ...
gets called in SIL as::
%bas = function_ref @bas : $(A, Int) -> A
%z = alloc_stack $A
%x_arg = alloc_stack $A
copy_addr %x to [initialize] %x_arg : $*A
apply %bas(%z, %x_arg, %y) : $(A, Int) -> A
dealloc_stack %x_arg : $*A // callee consumes %x_arg, caller deallocs
// ... use %z ...
destroy_addr %z : $*A
dealloc_stack %z : $*A
The implementation of ``@bas`` is then responsible for consuming ``%x_arg`` and
initializing ``%z``.
Tuple arguments are destructured regardless of the
address-only-ness of the tuple type. The destructured fields are passed
individually according to the above convention. This Swift function::
@API struct A {}
func zim(_ x:Int, y:A, (z:Int, w:(A, Int)))
zim(x, y, (z, w))
gets called in SIL as::
%zim = function_ref @zim : $(x:Int, y:A, (z:Int, w:(A, Int))) -> ()
%y_arg = alloc_stack $A
copy_addr %y to [initialize] %y_arg : $*A
%w_0_addr = element_addr %w : $*(A, Int), 0
%w_0_arg = alloc_stack $A
copy_addr %w_0_addr to [initialize] %w_0_arg : $*A
%w_1_addr = element_addr %w : $*(A, Int), 1
%w_1 = load %w_1_addr : $*Int
apply %zim(%x, %y_arg, %z, %w_0_arg, %w_1) : $(x:Int, y:A, (z:Int, w:(A, Int))) -> ()
dealloc_stack %w_0_arg : $*A
dealloc_stack %y_arg : $*A
Variadic Arguments
``````````````````
Variadic arguments and tuple elements are packaged into an array and passed as
a single array argument. This Swift function::
func zang(_ x:Int, (y:Int, z:Int...), v:Int, w:Int...)
zang(x, (y, z0, z1), v, w0, w1, w2)
gets called in SIL as::
%zang = function_ref @zang : $(x:Int, (y:Int, z:Int...), v:Int, w:Int...) -> ()
%zs = <<make array from %z0, %z1>>
%ws = <<make array from %w0, %w1, %w2>>
apply %zang(%x, %y, %zs, %v, %ws) : $(x:Int, (y:Int, z:Int...), v:Int, w:Int...) -> ()
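At the source level, the same packaging is visible inside the callee, which
sees a variadic parameter as an ordinary array. A minimal sketch, using a
simplified and hypothetical signature for ``zang`` (no tuple parameter, and a
return value added purely for illustration)::

  // Simplified stand-in for `zang`: one variadic parameter only.
  func zang(_ x: Int, v: Int, w: Int...) -> Int {
      // Inside the callee, `w` has type [Int].
      let ws: [Int] = w
      return x + v + ws.reduce(0, +)
  }

  // The three trailing values arrive as a single array argument.
  let total = zang(1, v: 2, w: 3, 4, 5)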
@inout Arguments
````````````````
``@inout`` arguments are passed into the entry point by address. The callee
does not take ownership of the referenced memory. The referenced memory must
be initialized upon function entry and exit. If the ``@inout`` argument
refers to a fragile physical variable, then the argument is the address of that
variable. If the ``@inout`` argument refers to a logical property, then the
argument is the address of a caller-owned writeback buffer. It is the caller's
responsibility to initialize the buffer by storing the result of the property
getter prior to calling the function and to write back to the property
on return by loading from the buffer and invoking the setter with the final
value. This Swift function::
func inout(_ x: inout Int) {
x = 1
}
gets lowered to SIL as::
sil @inout : $(@inout Int) -> () {
entry(%x : $*Int):
%1 = integer_literal $Int, 1
store %1 to %x : $*Int
%2 = tuple ()
return %2 : $()
}
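The writeback protocol for logical properties can be observed from Swift
source. In the following sketch the ``Thermometer`` and ``AccessLog`` types
and the ``warm`` function are hypothetical names introduced for illustration;
the counters exist only to make the getter and setter calls visible::

  final class AccessLog {
      var gets = 0
      var sets = 0
  }

  struct Thermometer {
      let log = AccessLog()
      var kelvin = 273

      // A logical (computed) property: it has no storage of its own.
      var celsius: Int {
          get { log.gets += 1; return kelvin - 273 }
          set { log.sets += 1; kelvin = newValue + 273 }
      }
  }

  // The callee receives an address and may write through it repeatedly.
  func warm(_ value: inout Int) {
      value += 1
      value += 1
  }

  var t = Thermometer()
  warm(&t.celsius)

Although ``warm`` writes twice, the caller invokes the getter once before the
call and the setter once after it, because the callee only ever sees the
address of the caller-owned buffer.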
Swift Method Calling Convention @convention(method)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The method calling convention is currently identical to the freestanding
function convention. Methods are considered to be curried functions, taking
the "self" argument as their outer argument clause, and the method arguments
as the inner argument clause(s). The "self" argument is thus passed last::
struct Foo {
func method(_ x:Int) -> Int {}
}
sil @Foo_method_1 : $((x : Int), @inout Foo) -> Int { ... }
Witness Method Calling Convention @convention(witness_method)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The witness method calling convention is used by protocol witness methods in
`witness tables`_. It is identical to the ``method`` calling convention
except for its handling of generic type parameters. For non-witness methods,
the machine-level convention for passing type parameter metadata may be
arbitrarily dependent on static aspects of the function signature, but because
witnesses must be polymorphically dispatchable on their ``Self`` type,
the ``Self``-related metadata for a witness must be passed in a maximally
abstracted manner.
C Calling Convention @convention(c)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In Swift's C module importer, C types are always mapped to Swift types
considered trivial by SIL. SIL does not concern itself with platform
ABI requirements for indirect return, register vs. stack passing, etc.; C
function arguments and returns in SIL are always by value regardless of the
platform calling convention.
SIL (and therefore Swift) cannot currently invoke variadic C functions.
Objective-C Calling Convention @convention(objc_method)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Reference Counts
````````````````
Objective-C methods use the same argument and return value ownership rules as
ARC Objective-C. Selector families and the ``ns_consumed``,
``ns_returns_retained``, etc. attributes from imported Objective-C definitions
are honored.
Applying a ``@convention(block)`` value does not consume the block.
Method Currying
```````````````
In SIL, the "self" argument of an Objective-C method is uncurried to the last
argument of the uncurried type, just like a native Swift method::
@objc class NSString {
func stringByPaddingToLength(Int) withString(NSString) startingAtIndex(Int)
}
sil @NSString_stringByPaddingToLength_withString_startingAtIndex \
: $((Int, NSString, Int), NSString)
That ``self`` is passed as the first argument at the IR level is abstracted
away in SIL, as is the existence of the ``_cmd`` selector argument.
Type Based Alias Analysis
-------------------------
SIL supports two types of Type Based Alias Analysis (TBAA): Class TBAA and
Typed Access TBAA.
Class TBAA
~~~~~~~~~~
Class instances and other *heap object references* are pointers at the
implementation level, but unlike SIL addresses, they are first class values and
can be ``capture``-d and aliased. Swift, however, is memory-safe and statically
typed, so aliasing of classes is constrained by the type system as follows:
* A ``Builtin.NativeObject`` may alias any native Swift heap object,
including a Swift class instance, a box allocated by ``alloc_box``,
or a thick function's closure context.
It may not alias native Objective-C class instances.
* An ``AnyObject`` or ``Builtin.BridgeObject`` may alias any class instance,
whether Swift or Objective-C, but may not alias non-class-instance
heap objects.
* Two values of the same class type ``$C`` may alias. Two values of related
class type ``$B`` and ``$D``, where there is a subclass relationship between
``$B`` and ``$D``, may alias. Two values of unrelated class types may not
alias. This includes different instantiations of a generic class type, such
as ``$C<Int>`` and ``$C<Float>``, which currently may never alias.
* Without whole-program visibility, values of archetype or protocol type must
be assumed to potentially alias any class instance. Even if it is locally
apparent that a class does not conform to that protocol, another component
may introduce a conformance by an extension. Similarly, a generic class
instance, such as ``$C<T>`` for archetype ``T``, must be assumed to
potentially alias concrete instances of the generic type, such as
``$C<Int>``, because ``Int`` is a potential substitution for ``T``.
A violation of the above aliasing rules only results in undefined
behavior if the aliasing references are dereferenced within Swift code.
For example,
``__SwiftNativeNS[Array|Dictionary|String]`` classes alias with
``NS[Array|Dictionary|String]`` classes even though they are not
statically related. Since Swift never directly accesses stored
properties on the Foundation classes, this aliasing does not pose a
danger.
Typed Access TBAA
~~~~~~~~~~~~~~~~~
Define a *typed access* of an address or reference as one of the following:
* Any instruction that performs a typed read or write operation upon the memory
at the given location (e.g. ``load``, ``store``).
* Any instruction that yields a typed offset of the pointer by performing a
typed projection operation (e.g. ``ref_element_addr``,
``tuple_element_addr``).
With limited exceptions, it is undefined behavior to perform a typed access to
an address or reference if the addressed memory is not bound to the relevant type.
This allows the optimizer to assume that two addresses cannot alias if
there does not exist a substitution of archetypes that could cause one
of the types to be the type of a subobject of the other. Additionally,
this applies to the types of the values from which the addresses were
derived via a typed projection.
Consider the following SIL::
struct Element {
var i: Int
}
struct S1 {
var elt: Element
}
struct S2 {
var elt: Element
}
%adr1 = struct_element_addr %ptr1 : $*S1, #S1.elt
%adr2 = struct_element_addr %ptr2 : $*S2, #S2.elt
The optimizer may assume that ``%adr1`` does not alias with ``%adr2``
because the values that the addresses are derived from (``%ptr1`` and
``%ptr2``) have unrelated types. However, in the following example,
the optimizer cannot assume that ``%adr1`` does not alias with
``%adr2`` because ``%adr2`` is derived from a cast, and any subsequent
typed operations on the address will refer to the common ``Element`` type::
%adr1 = struct_element_addr %ptr1 : $*S1, #S1.elt
%adr2 = pointer_to_address %ptr2 : $Builtin.RawPointer to $*Element
Exceptions to typed access TBAA rules are only allowed for blessed
alias-introducing operations. This permits limited type-punning. The only
current exception is the non-strict ``pointer_to_address`` variant. The
optimizer must be able to defensively determine that none of the *roots* of an
address are alias-introducing operations. An address root is the operation that
produces the address prior to applying any typed projections, indexing, or
casts. The following are valid address roots:
* Object allocation that generates an address, such as ``alloc_stack``
and ``alloc_box``.
* Address-type function arguments. These are crucially *not* considered
alias-introducing operations. It is illegal for the SIL optimizer to
form a new function argument from an arbitrary address-type
value. Doing so would require the optimizer to guarantee that the
new argument both has a non-alias-introducing address root and
can be properly represented by the calling convention (address types
do not have a fixed representation).
* A strict cast from an untyped pointer, ``pointer_to_address [strict]``. It is
illegal for ``pointer_to_address [strict]`` to derive its address from an
alias-introducing operation's value. A type punned address may only be
produced from an opaque pointer via a non-strict ``pointer_to_address`` at the
point of conversion.
Address-to-address casts, via ``unchecked_addr_cast``, transparently
forward their source's address root, just like typed projections.
Address-type basic block arguments can be conservatively considered
alias-introducing operations; they are uncommon enough not to
matter and may eventually be prohibited altogether.
Although some pointer producing intrinsics exist, they do not need to be
considered alias-introducing exceptions to TBAA rules. ``Builtin.inttoptr``
produces a ``Builtin.RawPointer`` which is not interesting because by definition
it may alias with everything. Similarly, the LLVM builtins ``Builtin.bitcast``
and ``Builtin.trunc|sext|zextBitCast`` cannot produce typed pointers. These
pointer values must be converted to an address via ``pointer_to_address`` before
typed access can occur. Whether the ``pointer_to_address`` is strict determines
whether aliasing may occur.
Memory may be rebound to an unrelated type. Addresses to unrelated types may
alias as long as typed access only occurs while memory is bound to the relevant
type. Consequently, the optimizer cannot outright assume that addresses accessed
as unrelated types are nonaliasing. For example, pointer comparison cannot be
eliminated simply because the two addresses derived from those pointers are
accessed as unrelated types at different program points.
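At the Swift source level, binding and rebinding are exposed through the
standard library's unsafe pointer APIs. The following is a minimal sketch, not
a description of how the compiler lowers these calls; the function name
``reinterpretAllOnes`` is hypothetical::

  // Allocates one word, binds it to UInt32, then reads it as Int32
  // while the memory is temporarily rebound.
  func reinterpretAllOnes() -> Int32 {
      let raw = UnsafeMutableRawPointer.allocate(
          byteCount: MemoryLayout<UInt32>.size,
          alignment: MemoryLayout<UInt32>.alignment)
      defer { raw.deallocate() }

      // Typed access as UInt32 is valid once the memory is bound to UInt32.
      let words = raw.bindMemory(to: UInt32.self, capacity: 1)
      words.initialize(to: UInt32.max)

      // Inside the closure the memory is bound to Int32, so the typed
      // read through the Int32 pointer is well defined.
      return words.withMemoryRebound(to: Int32.self, capacity: 1) { $0.pointee }
  }

The two pointers refer to the same memory yet have unrelated pointee types,
which is why the optimizer cannot treat such addresses as non-aliasing on type
alone.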
Value Dependence
----------------
In general, analyses can assume that independent values are
independently assured of validity. For example, a class method may
return a class reference::
bb0(%0 : $MyClass):
%1 = class_method %0 : $MyClass, #MyClass.foo
%2 = apply %1(%0) : $@convention(method) (@guaranteed MyClass) -> @owned MyOtherClass
// use of %2 goes here; no use of %0
strong_release %2 : $MyOtherClass
strong_release %0 : $MyClass
The optimizer is free to move the release of ``%0`` to immediately
after the call here, because ``%2`` can be assumed to be an
independently-managed value, and because Swift generally permits the
reordering of destructors.
However, some instructions do create values that are intrinsically
dependent on their operands. For example, the result of
``ref_element_addr`` will become a dangling pointer if the base is
released too soon. This is captured by the concept of *value dependence*,
and any transformation which can reorder the destruction of a value
around another operation must remain conscious of it.
A value ``%1`` is said to be *value-dependent* on a value ``%0`` if:
- ``%1`` is the result and ``%0`` is the first operand of one of the
following instructions:
- ``ref_element_addr``
- ``struct_element_addr``
- ``tuple_element_addr``
- ``unchecked_take_enum_data_addr``
- ``pointer_to_address``
- ``address_to_pointer``
- ``index_addr``
- ``index_raw_pointer``
- possibly some other conversions
- ``%1`` is the result of ``mark_dependence`` and ``%0`` is either of
the operands.
- ``%1`` is the value address of a box allocation instruction of which
``%0`` is the box reference.
- ``%1`` is the result of a ``struct``, ``tuple``, or ``enum``
instruction and ``%0`` is an operand.
- ``%1`` is the result of projecting out a subobject of ``%0``
with ``tuple_extract``, ``struct_extract``, ``unchecked_enum_data``,
``select_enum``, or ``select_enum_addr``.
- ``%1`` is the result of ``select_value`` and ``%0`` is one of the cases.
- ``%1`` is a basic block parameter and ``%0`` is the corresponding
argument from a branch to that block.
- ``%1`` is the result of a ``load`` from ``%0``. However, the value
dependence is cut after the first attempt to manage the value of
``%1``, e.g. by retaining it.
- Transitivity: there exists a value ``%2`` which ``%1`` depends on
and which depends on ``%0``. However, transitivity does not apply
to different subobjects of a struct, tuple, or enum.
Note, however, that an analysis is not required to track dependence
through memory. Nor is it required to consider the possibility of
dependence being established "behind the scenes" by opaque code, such
as by a method returning an unsafe pointer to a class property. The
dependence is required to be locally obvious in a function's SIL
instructions. Precautions must be taken against this either by SIL
generators (by using ``mark_dependence`` appropriately) or by the user
(by using the appropriate intrinsics and attributes with unsafe
language or library features).
Only certain types of SIL value can carry value-dependence:
- SIL address types
- unmanaged pointer types:
- ``@sil_unmanaged`` types
- ``Builtin.RawPointer``
- aggregates containing such a type, such as ``UnsafePointer``,
possibly recursively
- non-trivial types (but they can be independently managed)
This rule means that casting a pointer to an integer type breaks
value-dependence. This restriction is necessary so that reading an
``Int`` from a class doesn't force the class to be kept around!
A class holding an unsafe reference to an object must use some
sort of unmanaged pointer type to do so.
This rule does not include generic or resilient value types which
might contain unmanaged pointer types. Analyses are free to assume
that e.g. a ``copy_addr`` of a generic or resilient value type yields
an independently-managed value. The extension of value dependence to
types containing obvious unmanaged pointer types is an affordance to
make the use of such types more convenient; it does not shift the
ultimate responsibility for assuring the safety of unsafe
language/library features away from the user.
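One way a user can take that responsibility in Swift source is to pin the
owning object explicitly while an unmanaged reference is in use. A minimal
sketch, assuming a hypothetical ``Node`` class and helper function::

  final class Node {
      var value = 42
  }

  func readThroughUnmanaged() -> Int {
      let node = Node()
      // An unmanaged reference does not keep `node` alive by itself.
      let unmanaged = Unmanaged.passUnretained(node)
      // withExtendedLifetime guarantees that `node` is not destroyed
      // before the closure returns.
      return withExtendedLifetime(node) {
          unmanaged.takeUnretainedValue().value
      }
  }

Without the ``withExtendedLifetime`` call, nothing in the function would make
the relationship between ``unmanaged`` and ``node`` locally obvious.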
Copy-on-Write Representation
----------------------------
Copy-on-Write (COW) data structures are implemented by a reference to an object
which is copied on mutation in case it's not uniquely referenced.
A COW mutation sequence in SIL typically looks like::
(%uniq, %buffer) = begin_cow_mutation %immutable_buffer : $BufferClass
cond_br %uniq, bb_uniq, bb_not_unique
bb_uniq:
br bb_mutate(%buffer : $BufferClass)
bb_not_unique:
%copied_buffer = apply %copy_buffer_function(%buffer) : ...
br bb_mutate(%copied_buffer : $BufferClass)
bb_mutate(%mutable_buffer : $BufferClass):
%field = ref_element_addr %mutable_buffer : $BufferClass, #BufferClass.Field
store %value to %field : $*ValueType
%new_immutable_buffer = end_cow_mutation %mutable_buffer : $BufferClass
Loading from a COW data structure looks like::
%field1 = ref_element_addr [immutable] %immutable_buffer : $BufferClass, #BufferClass.Field
%value1 = load %field1 : $*FieldType
...
%field2 = ref_element_addr [immutable] %immutable_buffer : $BufferClass, #BufferClass.Field
%value2 = load %field2 : $*FieldType
The ``immutable`` attribute means that loads from ``ref_element_addr``
and ``ref_tail_addr`` instructions that have the *same* operand are
equivalent.
In other words, it's guaranteed that a buffer's properties are not mutated
between two ``ref_element/tail_addr [immutable]`` as long as they have the
same buffer reference as operand.
This is even true if e.g. the buffer 'escapes' to an unknown function.
In the example above, ``%value2`` is equal to ``%value1`` because the operand
of both ``ref_element_addr`` instructions is the same ``%immutable_buffer``.
Conceptually, the content of a COW buffer object can be seen as part of
the same *static* (immutable) SSA value as the buffer reference.
The lifetime of a COW value is strictly separated into *mutable* and
*immutable* regions by ``begin_cow_mutation`` and
``end_cow_mutation`` instructions::
%b1 = alloc_ref $BufferClass
// The buffer %b1 is mutable
%b2 = end_cow_mutation %b1 : $BufferClass
// The buffer %b2 is immutable
(%u1, %b3) = begin_cow_mutation %b2 : $BufferClass
// The buffer %b3 is mutable
%b4 = end_cow_mutation %b3 : $BufferClass
// The buffer %b4 is immutable
...
Both ``begin_cow_mutation`` and ``end_cow_mutation`` consume their operand
and return the new buffer as an *owned* value.
The ``begin_cow_mutation`` will compile down to a uniqueness check and
``end_cow_mutation`` will compile to a no-op.
Although the physical pointer value of the returned buffer reference is the
same as the operand, it's important to generate a *new* buffer reference in
SIL. It prevents the optimizer from moving buffer accesses from a *mutable* into
an *immutable* region and vice versa.
Because the buffer *content* is conceptually part of the
buffer *reference* SSA value, there must be a new buffer reference every time
the buffer content is mutated.
To illustrate this, let's look at an example, where a COW value is mutated in
a loop. As with a scalar SSA value, mutating a COW buffer requires a
phi-argument in the loop header block (for simplicity the code for copying a
non-unique buffer is not shown)::
header_block(%b_phi : $BufferClass):
(%u, %b_mutate) = begin_cow_mutation %b_phi : $BufferClass
// Store something to %b_mutate
%b_immutable = end_cow_mutation %b_mutate : $BufferClass
cond_br %loop_cond, exit_block, backedge_block
backedge_block:
br header_block(%b_immutable : $BufferClass)
exit_block:
Two adjacent ``begin_cow_mutation`` and ``end_cow_mutation`` instructions
don't need to be in the same function.
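For orientation, the same pattern at the Swift source level is usually written
with ``isKnownUniquelyReferenced``. The ``Storage`` and ``CowBox`` types below
are hypothetical, and the sketch does not claim to show how the standard
library's own containers are implemented::

  final class Storage {
      var values: [Int]
      init(_ values: [Int]) { self.values = values }
  }

  struct CowBox {
      private var storage: Storage
      init(_ values: [Int]) { storage = Storage(values) }

      // Reads go through the shared buffer without copying.
      var values: [Int] { storage.values }

      mutating func append(_ x: Int) {
          // Uniqueness check before entering the mutable region.
          if !isKnownUniquelyReferenced(&storage) {
              // Shared buffer: copy it before mutating.
              storage = Storage(storage.values)
          }
          storage.values.append(x)
      }
  }

  let a = CowBox([1, 2])
  var b = a        // shares the buffer with `a`
  b.append(3)      // buffer is not unique, so it is copied first

After the mutation, ``a`` still observes the original contents, which is the
guarantee the mutable and immutable regions are designed to preserve.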
Instruction Set
---------------
Allocation and Deallocation
~~~~~~~~~~~~~~~~~~~~~~~~~~~
These instructions allocate and deallocate memory.
alloc_stack
```````````
::
sil-instruction ::= 'alloc_stack' '[dynamic_lifetime]'? sil-type (',' debug-var-attr)*
%1 = alloc_stack $T
// %1 has type $*T
Allocates uninitialized memory that is sufficiently aligned on the stack
to contain a value of type ``T``. The result of the instruction is the address
of the allocated memory.
``alloc_stack`` always allocates memory on the stack even for runtime-sized types.
``alloc_stack`` marks the start of the lifetime of the value; the
allocation must be balanced with a ``dealloc_stack`` instruction to
mark the end of its lifetime. All ``alloc_stack`` allocations must be
deallocated prior to returning from a function. If a block has multiple
predecessors, the stack height and order of allocations must be consistent
coming from all predecessor blocks. ``alloc_stack`` allocations must be
deallocated in last-in, first-out stack order.
The ``dynamic_lifetime`` attribute specifies that the initialization and
destruction of the stored value cannot be verified at compile time.
This is the case, e.g. for conditionally initialized objects.
The memory is not retainable. To allocate a retainable box for a value
type, use ``alloc_box``.
alloc_ref
`````````
::
sil-instruction ::= 'alloc_ref'
('[' 'objc' ']')?
('[' 'stack' ']')?
('[' 'tail_elems' sil-type '*' sil-operand ']')*
sil-type
%1 = alloc_ref [stack] $T
%1 = alloc_ref [tail_elems $E * %2 : $Builtin.Word] $T
// $T must be a reference type
// %1 has type $T
// $E is the type of the tail-allocated elements
// %2 must be of a builtin integer type
Allocates an object of reference type ``T``. The object will be initialized
with retain count 1; its state will be otherwise uninitialized. The
optional ``objc`` attribute indicates that the object should be
allocated using Objective-C's allocation methods (``+allocWithZone:``).
The optional ``stack`` attribute indicates that the object can be allocated
on the stack instead of on the heap. In this case the instruction must be
balanced with a ``dealloc_ref [stack]`` instruction to mark the end of the
object's lifetime.
Note that the ``stack`` attribute only specifies that stack allocation is
possible. The final decision on stack allocation is made during LLVM IR
generation. This is because the decision also depends on the object size,
which is not necessarily known at SIL level.
The optional ``tail_elems`` attributes specify the amount of space to be
reserved for tail-allocated arrays of given element types and element counts.
If there is more than one ``tail_elems`` attribute, the tail arrays are
allocated in the specified order.
The count-operand must be of a builtin integer type.
The instructions ``ref_tail_addr`` and ``tail_addr`` can be used to project
the tail elements.
The ``objc`` attribute cannot be used together with ``tail_elems``.
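At the Swift source level, tail allocation is available through the standard
library's ``ManagedBuffer`` class. The sketch below assumes that such a buffer
corresponds to an object with tail-allocated elements as described here; the
function name ``makeSquares`` is hypothetical::

  // Creates an object whose header stores the element count and whose
  // elements are stored in the tail allocation that follows it.
  func makeSquares(count: Int) -> [Int] {
      let buffer = ManagedBuffer<Int, Int>.create(minimumCapacity: count) { _ in
          count
      }
      // Initialize the tail elements in place.
      buffer.withUnsafeMutablePointerToElements { elements in
          for i in 0..<count {
              (elements + i).initialize(to: i * i)
          }
      }
      // Read them back using the count stored in the header.
      return buffer.withUnsafeMutablePointers { header, elements in
          (0..<header.pointee).map { elements[$0] }
      }
  }

The elements are of the trivial type ``Int``, so the sketch does not need to
deinitialize them before the buffer is destroyed.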
alloc_ref_dynamic
`````````````````
::
sil-instruction ::= 'alloc_ref_dynamic'
('[' 'objc' ']')?
('[' 'tail_elems' sil-type '*' sil-operand ']')*
sil-operand ',' sil-type
%1 = alloc_ref_dynamic %0 : $@thick T.Type, $T
%1 = alloc_ref_dynamic [objc] %0 : $@objc_metatype T.Type, $T
%1 = alloc_ref_dynamic [tail_elems $E * %2 : $Builtin.Word] %0 : $@thick T.Type, $T
// $T must be a class type
// %1 has type $T
// $E is the type of the tail-allocated elements
// %2 must be of a builtin integer type
Allocates an object of class type ``T`` or a subclass thereof. The
dynamic type of the resulting object is specified via the metatype
value ``%0``. The object will be initialized with retain count 1; its
state will be otherwise uninitialized.
The optional ``tail_elems`` and ``objc`` attributes have the same effect as
for ``alloc_ref``. See ``alloc_ref`` for details.
alloc_box
`````````
::
sil-instruction ::= 'alloc_box' sil-type (',' debug-var-attr)*
%1 = alloc_box $T
// %1 has type $@box T
Allocates a reference-counted ``@box`` on the heap large enough to hold a value
of type ``T``, along with a retain count and any other metadata required by the
runtime. The result of the instruction is the reference-counted ``@box``
reference that owns the box. The ``project_box`` instruction is used to retrieve
the address of the value inside the box.
The box will be initialized with a retain count of 1; the storage will be
uninitialized. The box owns the contained value, and releasing it to a retain
count of zero destroys the contained value as if by ``destroy_addr``.
Releasing a box is undefined behavior if the box's value is uninitialized.
To deallocate a box whose value has not been initialized, ``dealloc_box``
should be used.
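A common source-level situation that calls for a box is a mutable local
variable captured by an escaping closure, because the variable must outlive
the function that declared it. A minimal sketch, with the hypothetical name
``makeCounter``; whether a box survives optimization depends on the passes
that run afterwards::

  func makeCounter() -> () -> Int {
      // `count` is captured by the returned closure, so its storage
      // cannot live in this function's stack frame.
      var count = 0
      return {
          count += 1
          return count
      }
  }

  let next = makeCounter()
  let first = next()
  let second = next()

Each call to ``next`` mutates the same captured storage, which is kept alive
by the closure context rather than by the original call to ``makeCounter``.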
alloc_value_buffer
``````````````````
::
sil-instruction ::= 'alloc_value_buffer' sil-type 'in' sil-operand
%1 = alloc_value_buffer $(Int, T) in %0 : $*Builtin.UnsafeValueBuffer
// The operand must have the exact type shown.
// The result has type $*(Int, T).
Given the address of an unallocated value buffer, allocate space in it
for a value of the given type. This instruction has undefined
behavior if the value buffer is currently allocated.
The type operand must be a lowered object type.
alloc_global
````````````
::
sil-instruction ::= 'alloc_global' sil-global-name
alloc_global @foo
Initialize the storage for a global variable. This instruction has
undefined behavior if the global variable has already been initialized.
The type operand must be a lowered object type.
get_async_continuation
``````````````````````
::
sil-instruction ::= 'get_async_continuation' '[throws]'? sil-type
%0 = get_async_continuation $T
%0 = get_async_continuation [throws] $U
Begins a suspension of an ``@async`` function. This instruction can only be
used inside an ``@async`` function. The result of the instruction is an
``UnsafeContinuation<T>`` value, where ``T`` is the formal type argument to the
instruction, or an ``UnsafeThrowingContinuation<T>`` if the instruction
carries the ``[throws]`` attribute. ``T`` must be a loadable type.
The continuation must be consumed by an ``await_async_continuation`` terminator
on all paths. Between ``get_async_continuation`` and
``await_async_continuation``, the following restrictions apply:
- The function cannot ``return``, ``throw``, ``yield``, or ``unwind``.
- There cannot be nested suspend points; namely, the function cannot call
another ``@async`` function, nor can it initiate another suspend point with
``get_async_continuation``.
The function suspends execution when the matching ``await_async_continuation``
terminator is reached, and resumes execution when the continuation is resumed.
The continuation resumption operation takes a value of type ``T`` which is
passed back into the function when it resumes execution in the ``await_async_continuation`` instruction's
``resume`` successor block. If the instruction
has the ``[throws]`` attribute, it can also be resumed in an error state, in
which case the matching ``await_async_continuation`` instruction must also
have an ``error`` successor.
Within the enclosing SIL function, the result continuation is consumed by the
``await_async_continuation``, and cannot be referenced after the
``await_async_continuation`` executes. Dynamically, the continuation value must
be resumed exactly once in the course of the program's execution; it is
undefined behavior to resume the continuation more than once. Conversely,
failing to resume the continuation will leave the suspended async coroutine
hung in its suspended state, leaking any resources it may be holding.
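The resume-exactly-once rule is the same one that applies to the standard
library's ``withUnsafeContinuation`` function. The sketch below assumes that
this function is the source-level counterpart of the instruction; the function
name ``fetchAnswer`` is hypothetical, and the continuation type is spelled
with the two generic parameters used by the current standard library::

  func fetchAnswer() async -> Int {
      await withUnsafeContinuation { (continuation: UnsafeContinuation<Int, Never>) in
          // Resume exactly once. A second resume is undefined behavior,
          // and never resuming leaves the caller suspended forever.
          continuation.resume(returning: 42)
      }
  }

  let answer = await fetchAnswer()

Here the continuation is resumed before the function actually suspends, which
is allowed; the awaited call then simply continues with the resumed value.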
get_async_continuation_addr
```````````````````````````
::
sil-instruction ::= 'get_async_continuation_addr' '[throws]'? sil-type ',' sil-operand
%1 = get_async_continuation_addr $T, %0 : $*T
%1 = get_async_continuation_addr [throws] $U, %0 : $*U
Begins a suspension of an ``@async`` function, like ``get_async_continuation``,
additionally binding a specific memory location for receiving the value
when the result continuation is resumed. The operand must be an address whose
type is the maximally-abstracted lowered type of the formal resume type. The
memory must be uninitialized, and must remain allocated until the matching
``await_async_continuation`` instruction(s) consuming the result continuation
have executed. The behavior is otherwise the same as
``get_async_continuation``, and the same restrictions apply on code appearing
between ``get_async_continuation_addr`` and ``await_async_continuation`` as
apply between ``get_async_continuation`` and ``await_async_continuation``.
Additionally, the state of the memory referenced by the operand is indefinite
between the execution of ``get_async_continuation_addr`` and
``await_async_continuation``, and it is undefined behavior to read or modify
the memory during this time. After the ``await_async_continuation`` resumes
normally to its ``resume`` successor, the memory referenced by the operand is
initialized with the resume value, and that value is then owned by the current
function. If ``await_async_continuation`` instead resumes to its ``error``
successor, then the memory remains uninitialized.
hop_to_executor
```````````````
::
sil-instruction ::= 'hop_to_executor' sil-operand
hop_to_executor %0 : $T
// $T must conform to the Actor protocol
Ensures that all instructions that need to run on the actor's executor
actually run on that executor.
This instruction can only be used inside an ``@async`` function.
Checks if the current executor is the one which is bound to the operand actor.
If not, begins a suspension point and enqueues the continuation to the executor
which is bound to the operand actor.
The operand is a guaranteed operand, i.e. not consumed.
dealloc_stack
`````````````
::
sil-instruction ::= 'dealloc_stack' sil-operand
dealloc_stack %0 : $*T
// %0 must be of $*T type
Deallocates memory previously allocated by ``alloc_stack``. The
allocated value in memory must be uninitialized or destroyed prior to
being deallocated. This instruction marks the end of the lifetime for
the value created by the corresponding ``alloc_stack`` instruction. The operand
must be the shallowest live ``alloc_stack`` allocation preceding the
deallocation. In other words, deallocations must be in last-in, first-out
stack order.
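
As an illustrative sketch (the value names and the types ``Int`` and ``Float``
are chosen only for this example), two nested stack allocations must be
deallocated in the reverse order of their allocation::

  %a = alloc_stack $Int
  %b = alloc_stack $Float
  // ... initialize, use, and destroy the values in %a and %b ...
  dealloc_stack %b : $*Float   // %b is the shallowest live allocation
  dealloc_stack %a : $*Int     // %a may only be deallocated after %b
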
dealloc_box
```````````
::
sil-instruction ::= 'dealloc_box' sil-operand
dealloc_box %0 : $@box T
Deallocates a box, bypassing the reference counting mechanism. The box
variable must have a retain count of one. The boxed type must match the
type passed to the corresponding ``alloc_box`` exactly, or else
undefined behavior results.
This does not destroy the boxed value. The contents of the
value must have been fully uninitialized or destroyed before
``dealloc_box`` is applied.
project_box
```````````
::
sil-instruction ::= 'project_box' sil-operand
%1 = project_box %0 : $@box T
// %1 has type $*T
Given a ``@box T`` reference, produces the address of the value inside the box.
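
A minimal sketch of how the box instructions fit together, assuming the
``$@box T`` notation used in this section and a boxed value of type ``Int``;
the value names are illustrative::

  %box = alloc_box $Int                    // assumed to yield a $@box Int
  %addr = project_box %box : $@box Int     // %addr has type $*Int
  store %0 to %addr : $*Int                // initialize the boxed value
  %1 = load %addr : $*Int
  strong_release %box : $@box Int          // normal path: release the box

``dealloc_box`` would take the place of the final ``strong_release`` only when
the box has a retain count of one and its contents are uninitialized or have
already been destroyed.
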
dealloc_ref
```````````
::
sil-instruction ::= 'dealloc_ref' ('[' 'stack' ']')? sil-operand
dealloc_ref [stack] %0 : $T
// $T must be a class type
Deallocates an uninitialized class type instance, bypassing the reference
counting mechanism.
The type of the operand must match the allocated type exactly, or else
undefined behavior results.
The instance must have a retain count of one.
This does not destroy stored properties of the instance. The contents
of stored properties must be fully uninitialized at the time
``dealloc_ref`` is applied.
The ``stack`` attribute indicates that the instruction is the balanced
deallocation of its operand, which must be an ``alloc_ref [stack]``.
In this case the instruction marks the end of the object's lifetime but
has no other effect.
dealloc_partial_ref
```````````````````
::
sil-instruction ::= 'dealloc_partial_ref' sil-operand sil-metatype
dealloc_partial_ref %0 : $T, %1 : $U.Type
// $T must be a class type
// $T must be a subclass of U
Deallocates a partially-initialized class type instance, bypassing
the reference counting mechanism.
The type of the operand must be a supertype of the allocated type, or
else undefined behavior results.
The instance must have a retain count of one.
All stored properties in classes more derived than the given metatype
value must be initialized, and all other stored properties must be
uninitialized. The initialized stored properties are destroyed before
deallocating the memory for the instance.
This does not destroy the reference type instance. The contents of the
heap object must have been fully uninitialized or destroyed before
``dealloc_ref`` is applied.
dealloc_value_buffer
````````````````````
::
sil-instruction ::= 'dealloc_value_buffer' sil-type 'in' sil-operand
dealloc_value_buffer $(Int, T) in %0 : $*Builtin.UnsafeValueBuffer
// The operand must have the exact type shown.
Given the address of a value buffer, deallocate the storage in it.
This instruction has undefined behavior if the value buffer is not
currently allocated, or if it was allocated with a type other than the
type operand.
The type operand must be a lowered object type.
project_value_buffer
````````````````````
::
sil-instruction ::= 'project_value_buffer' sil-type 'in' sil-operand
%1 = project_value_buffer $(Int, T) in %0 : $*Builtin.UnsafeValueBuffer
// The operand must have the exact type shown.
// The result has type $*(Int, T).
Given the address of a value buffer, return the address of the value
storage in it. This instruction has undefined behavior if the value
buffer is not currently allocated, or if it was allocated with a type
other than the type operand.
The result is the same value as was originally returned by
``alloc_value_buffer``.
The type operand must be a lowered object type.
Debug Information
~~~~~~~~~~~~~~~~~
Debug information is generally associated with allocations (``alloc_stack`` or
``alloc_box``) by having a Decl node attached to the allocation with a
SILLocation. For declarations that have no allocation, explicit instructions
carry this information. They are used for 'let' declarations, which bind a
value to a name, and for var decls that are promoted into registers. The decl
they refer to is attached to the instruction with a SILLocation.
debug_value
```````````
::
sil-instruction ::= 'debug_value' sil-operand (',' debug-var-attr)*
debug_value %1 : $Int
Indicates that the value of a declaration with loadable type has changed to
the specified operand. The declaration in question is identified by
the SILLocation attached to the ``debug_value`` instruction.
The operand must have loadable type.
::
debug-var-attr ::= 'var'
debug-var-attr ::= 'let'
debug-var-attr ::= 'name' string-literal
debug-var-attr ::= 'argno' integer-literal
There are a number of attributes that provide details about the source
variable that is being described, including the name of the
variable. For function and closure arguments ``argno`` is the number
of the function argument starting with 1.
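
For instance, an immutable first function argument named ``x`` might be
described as follows; the name and argument number are assumptions made for
illustration::

  debug_value %0 : $Int, let, name "x", argno 1
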
debug_value_addr
````````````````
::
sil-instruction ::= 'debug_value_addr' sil-operand (',' debug-var-attr)*
debug_value_addr %7 : $*SomeProtocol
Indicates that the value of a declaration with address-only type
has changed to the specified operand. The declaration in
question is identified by the SILLocation attached to the
``debug_value_addr`` instruction.
Accessing Memory
~~~~~~~~~~~~~~~~
load
````
::
sil-instruction ::= 'load' sil-operand
%1 = load %0 : $*T
// %0 must be of a $*T address type for loadable type $T
// %1 will be of type $T
Loads the value at address ``%0`` from memory. ``T`` must be a loadable type.
This does not affect the reference count, if any, of the loaded value; the
value must be retained explicitly if necessary. It is undefined behavior to
load from uninitialized memory or to load from an address that points to
deallocated storage.
store
`````
::
sil-instruction ::= 'store' sil-value 'to' sil-operand
store %0 to %1 : $*T
// $T must be a loadable type
Stores the value ``%0`` to memory at address ``%1``. The type of ``%1`` is ``*T``
and the type of ``%0`` is ``T``, which must be a loadable type. This will
overwrite the memory at ``%1``. If ``%1`` already references a value that
requires ``release`` or other cleanup, that value must be loaded before being
stored over and cleaned up. It is undefined behavior to store to an address
that points to deallocated storage.
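
As a sketch of the cleanup described above, assume ``%1`` is the address of an
initialized reference of a hypothetical class type ``C`` and ``%new`` is a
reference that is already owned. The old value is loaded, overwritten, and
then released explicitly::

  %old = load %1 : $*C          // take the current value out of memory
  store %new to %1 : $*C        // overwrite; no implicit release happens
  strong_release %old : $C      // clean up the previous value
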
load_borrow
```````````
::
sil-instruction ::= 'load_borrow' sil-operand
%1 = load_borrow %0 : $*T
// $T must be a loadable type
Loads the value ``%1`` from the memory location ``%0``. The ``load_borrow``
instruction creates a borrowed scope in which a read-only borrow value ``%1``
can be used to read the value stored in ``%0``. The end of scope is delimited
by an ``end_borrow`` instruction. All ``load_borrow`` instructions must be
paired with exactly one ``end_borrow`` instruction along any path through the
program. Until ``end_borrow``, it is illegal to invalidate or store to ``%0``.
begin_borrow
````````````
TODO
end_borrow
``````````
::
sil-instruction ::= 'end_borrow' sil-value 'from' sil-value ':' sil-type ',' sil-type
end_borrow %1 from %0 : $T, $T
end_borrow %1 from %0 : $T, $*T
end_borrow %1 from %0 : $*T, $T
end_borrow %1 from %0 : $*T, $*T
// We allow for end_borrow to be specified in between values and addresses
// all of the same type T.
Ends the scope for which the SILValue ``%1`` is borrowed from the SILValue
``%0``. Must be paired with at most 1 borrowing instruction (like
``load_borrow``) along any path through the program. In the region in between
the borrow instruction and the ``end_borrow``, the original SILValue cannot be
modified. This means that:
1. If ``%0`` is an address, ``%0`` cannot be written to.
2. If ``%0`` is a non-trivial value, ``%0`` cannot be destroyed.
We require that ``%1`` and ``%0`` have the same type ignoring SILValueCategory.
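
A sketch of a complete borrow scope, using the ``end_borrow`` syntax shown
above and a hypothetical class type ``C``::

  %1 = load_borrow %0 : $*C
  // ... read-only uses of %1; %0 must not be stored to or invalidated ...
  end_borrow %1 from %0 : $C, $*C
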
assign
``````
::
sil-instruction ::= 'assign' sil-value 'to' sil-operand
assign %0 to %1 : $*T
// $T must be a loadable type
Represents an abstract assignment of the value ``%0`` to memory at address
``%1`` without specifying whether it is an initialization or a normal store.
The type of ``%1`` is ``*T`` and the type of ``%0`` is ``T``, which must be a
loadable type. This will overwrite the memory at ``%1`` and destroy the value
currently held there.
The purpose of the ``assign`` instruction is to simplify the
definitive initialization analysis on loadable variables by removing
what would otherwise appear to be a load and use of the current value.
It is produced by SILGen, which cannot know which assignments are
meant to be initializations. If it is deemed to be an initialization,
it can be replaced with a ``store``; otherwise, it must be replaced
with a sequence that also correctly destroys the current value.
This instruction is only valid in Raw SIL and is rewritten as appropriate
by the definitive initialization pass.
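
The two possible rewrites can be sketched as follows for a hypothetical class
type ``C``. The exact instructions emitted by the pass may differ; this only
illustrates the distinction between initialization and reassignment::

  // Raw SIL, as produced by SILGen:
  assign %0 to %1 : $*C

  // If %1 is known to be uninitialized here, a plain store suffices:
  store %0 to %1 : $*C

  // Otherwise the current value must also be destroyed:
  %old = load %1 : $*C
  store %0 to %1 : $*C
  strong_release %old : $C
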
assign_by_wrapper
``````````````````
::
sil-instruction ::= 'assign_by_wrapper' sil-operand 'to' sil-operand ',' 'init' sil-operand ',' 'set' sil-operand
assign_by_wrapper %0 : $S to %1 : $*T, init %2 : $F, set %3 : $G
// $S can be a value or address type
// $T must be the type of a property wrapper.
// $F must be a function type, taking $S as a single argument (or multiple arguments in case of a tuple) and returning $T
// $G must be a function type, taking $S as a single argument (or multiple arguments in case of a tuple) and without a return value
Similar to the ``assign`` instruction, but the assignment is done via a
delegate.
In case of an initialization, the function ``%2`` is called with ``%0`` as
argument. The result is stored to ``%1``. If ``$T`` is an address-only type,
``%1`` is instead passed as the first out-argument to ``%2``.
In case of a re-assignment, the function ``%3`` is called with ``%0`` as
argument. As ``%3`` is a setter (e.g. for the property in the containing
nominal type), the destination address ``%1`` is not used in this case.
This instruction is only valid in Raw SIL and is rewritten as appropriate
by the definitive initialization pass.
mark_uninitialized
``````````````````
::
sil-instruction ::= 'mark_uninitialized' '[' mu_kind ']' sil-operand
mu_kind ::= 'var'
mu_kind ::= 'rootself'
mu_kind ::= 'crossmodulerootself'
mu_kind ::= 'derivedself'
mu_kind ::= 'derivedselfonly'
mu_kind ::= 'delegatingself'
mu_kind ::= 'delegatingselfallocated'
%2 = mark_uninitialized [var] %1 : $*T
// $T must be an address
Indicates that a symbolic memory location is uninitialized, and must be
explicitly initialized before it escapes or before the current function returns.
This instruction returns its operand, and all accesses within the function must
be performed against the return value of the mark_uninitialized instruction.
The kind of mark_uninitialized instruction specifies the type of data
the mark_uninitialized instruction refers to:
- ``var``: designates the start of a normal variable live range
- ``rootself``: designates ``self`` in a struct, enum, or root class
- ``crossmodulerootself``: same as ``rootself``, but in a case where it's not
really safe to treat ``self`` as a root because the original module might add
more stored properties. This is only used for Swift 4 compatibility.
- ``derivedself``: designates ``self`` in a derived (non-root) class
- ``derivedselfonly``: designates ``self`` in a derived (non-root) class whose stored properties have already been initialized
- ``delegatingself``: designates ``self`` on a struct, enum, or class in a delegating constructor (one that calls self.init)
- ``delegatingselfallocated``: designates ``self`` on a class convenience initializer's initializing entry point
The purpose of the ``mark_uninitialized`` instruction is to enable
definitive initialization analysis for global variables (when marked as
'globalvar') and instance variables (when marked as 'rootinit'), which need to
be distinguished from simple allocations.
It is produced by SILGen, and is only valid in Raw SIL. It is rewritten as
appropriate by the definitive initialization pass.
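
A sketch of how a local variable might be marked, assuming it is stored in an
``alloc_stack`` of type ``Int``. Every access uses the result ``%2`` rather
than the allocation ``%1``::

  %1 = alloc_stack $Int
  %2 = mark_uninitialized [var] %1 : $*Int
  assign %0 to %2 : $*Int      // initialization or reassignment, decided later
  %3 = load %2 : $*Int         // must only be reached after initialization
  dealloc_stack %1 : $*Int
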
mark_function_escape
````````````````````
::
sil-instruction ::= 'mark_function_escape' sil-operand (',' sil-operand)*
%2 = mark_function_escape %1 : $*T
Indicates that a function definition closes over a symbolic memory location.
This instruction is variadic, and all of its operands must be addresses.
The purpose of the ``mark_function_escape`` instruction is to enable
definitive initialization analysis for global variables and instance variables,
which are not represented as box allocations.
It is produced by SILGen, and is only valid in Raw SIL. It is rewritten as
appropriate by the definitive initialization pass.
mark_uninitialized_behavior
```````````````````````````
::
init-case ::= sil-value sil-apply-substitution-list? '(' sil-value ')' ':' sil-type
set-case ::= sil-value sil-apply-substitution-list? '(' sil-value ')' ':' sil-type
sil-instruction ::= 'mark_uninitialized_behavior' init-case set-case
mark_uninitialized_behavior %init<Subs>(%storage) : $T -> U,
%set<Subs>(%self) : $V -> W
Indicates that a logical property is uninitialized at this point and needs to be
initialized by the end of the function and before any escape point for this
instruction. Assignments to the property trigger the behavior's ``init`` or
``set`` logic based on the logical initialization state of the property.
It is expected that the ``init-case`` is passed some sort of storage and the
``set`` case is passed ``self``.
This is only valid in Raw SIL.
copy_addr
`````````
::
sil-instruction ::= 'copy_addr' '[take]'? sil-value
'to' '[initialization]'? sil-operand
copy_addr [take] %0 to [initialization] %1 : $*T
// %0 and %1 must be of the same $*T address type
Loads the value at address ``%0`` from memory and assigns a copy of it back into
memory at address ``%1``. A bare ``copy_addr`` instruction when ``T`` is a
non-trivial type::
copy_addr %0 to %1 : $*T
is equivalent to::
%new = load %0 : $*T // Load the new value from the source
%old = load %1 : $*T // Load the old value from the destination
strong_retain %new : $T // Retain the new value
strong_release %old : $T // Release the old
store %new to %1 : $*T // Store the new value to the destination
except that ``copy_addr`` may be used even if ``%0`` is of an address-only
type. The ``copy_addr`` may be given one or both of the ``[take]`` or
``[initialization]`` attributes:
* ``[take]`` destroys the value at the source address in the course of the
copy.
* ``[initialization]`` indicates that the destination address is uninitialized.
Without the attribute, the destination address is treated as already
initialized, and the existing value will be destroyed before the new value
is stored.
The three attributed forms thus behave like the following loadable type
operations::
// take-assignment
copy_addr [take] %0 to %1 : $*T
// is equivalent to:
%new = load %0 : $*T
%old = load %1 : $*T
// no retain of %new!
strong_release %old : $T
store %new to %1 : $*T
// copy-initialization
copy_addr %0 to [initialization] %1 : $*T
// is equivalent to:
%new = load %0 : $*T
strong_retain %new : $T
// no load/release of %old!
store %new to %1 : $*T
// take-initialization
copy_addr [take] %0 to [initialization] %1 : $*T
// is equivalent to:
%new = load %0 : $*T
// no retain of %new!
// no load/release of %old!
store %new to %1 : $*T
If ``T`` is a trivial type, then ``copy_addr`` is always equivalent to its
take-initialization form.
destroy_addr
````````````
::
sil-instruction ::= 'destroy_addr' sil-operand
destroy_addr %0 : $*T
// %0 must be of an address $*T type
Destroys the value in memory at address ``%0``. If ``T`` is a non-trivial type,
this is equivalent to::
%1 = load %0
strong_release %1
except that ``destroy_addr`` may be used even if ``%0`` is of an
address-only type. This does not deallocate memory; it only destroys the
pointed-to value, leaving the memory uninitialized.
If ``T`` is a trivial type, then ``destroy_addr`` can be safely
eliminated. However, a memory location ``%a`` must not be accessed
after ``destroy_addr %a`` (which has not yet been eliminated)
regardless of its type.
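
The difference between destroying a value and deallocating its storage can be
sketched with a temporary of a hypothetical address-only protocol type ``P``::

  %tmp = alloc_stack $P
  copy_addr %0 to [initialization] %tmp : $*P   // %tmp now holds its own copy
  // ... use the value at %tmp ...
  destroy_addr %tmp : $*P                       // value destroyed; memory is uninitialized
  dealloc_stack %tmp : $*P                      // the storage itself is released
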
index_addr
``````````
::
sil-instruction ::= 'index_addr' sil-operand ',' sil-operand
%2 = index_addr %0 : $*T, %1 : $Builtin.Int<n>
// %0 must be of an address type $*T
// %1 must be of a builtin integer type
// %2 will be of type $*T
Given an address that references into an array of values, returns the address
of the ``%1``-th element relative to ``%0``. The address must reference into
a contiguous array. It is undefined behavior to reference offsets within a
non-array value, such as fields within a homogeneous struct or tuple type, or
bytes within a value, using ``index_addr``. (``Int8`` address types have no
special behavior in this regard, unlike ``char*`` or ``void*`` in C.) It is
also undefined behavior to index out of bounds of an array, except to index
the "past-the-end" address of the array.
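
For example, assuming ``%0`` is the address of the first element of a
contiguous array of four ``Int`` values (the array and its length are
assumptions of this sketch)::

  %1 = integer_literal $Builtin.Word, 2
  %2 = index_addr %0 : $*Int, %1 : $Builtin.Word   // address of element 2
  %3 = load %2 : $*Int
  %4 = integer_literal $Builtin.Word, 4
  %5 = index_addr %0 : $*Int, %4 : $Builtin.Word   // past-the-end address; forming it is allowed
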
tail_addr
`````````
::
sil-instruction ::= 'tail_addr' sil-operand ',' sil-operand ',' sil-type
%2 = tail_addr %0 : $*T, %1 : $Builtin.Int<n>, $E
// %0 must be of an address type $*T
// %1 must be of a builtin integer type
// %2 will be of type $*E
Given an address of an array of ``%1`` values, returns the address of an
element which is tail-allocated after the array.
This instruction is equivalent to ``index_addr`` except that the resulting
address is aligned-up to the tail-element type ``$E``.
This instruction is used to project the N-th tail-allocated array from an
object which is created by an ``alloc_ref`` with multiple ``tail_elems``.
The first operand is the address of an element of the (N-1)-th array, usually
the first element. The second operand is the number of elements until the end
of that array. The result is the address of the first element of the N-th array.
It is undefined behavior if the provided address, count and type do not match
the actual layout of tail-allocated arrays of the underlying object.
index_raw_pointer
`````````````````
::
sil-instruction ::= 'index_raw_pointer' sil-operand ',' sil-operand
%2 = index_raw_pointer %0 : $Builtin.RawPointer, %1 : $Builtin.Int<n>
// %0 must be of $Builtin.RawPointer type
// %1 must be of a builtin integer type
// %2 will be of type $Builtin.RawPointer
Given a ``Builtin.RawPointer`` value ``%0``, returns a pointer value at the
byte offset ``%1`` relative to ``%0``.
bind_memory
```````````
::
sil-instruction ::= 'bind_memory' sil-operand ',' sil-operand 'to' sil-type
bind_memory %0 : $Builtin.RawPointer, %1 : $Builtin.Word to $T
// %0 must be of $Builtin.RawPointer type
// %1 must be of $Builtin.Word type
Binds memory at ``Builtin.RawPointer`` value ``%0`` to type ``$T`` with enough
capacity to hold ``%1`` values. See SE-0107: UnsafeRawPointer.
begin_access
````````````
::
sil-instruction ::= 'begin_access' '[' sil-access ']' '[' sil-enforcement ']' '[no_nested_conflict]'? '[builtin]'? sil-operand ':' sil-type
sil-access ::= init
sil-access ::= read
sil-access ::= modify
sil-access ::= deinit
sil-enforcement ::= unknown
sil-enforcement ::= static
sil-enforcement ::= dynamic
sil-enforcement ::= unsafe
%1 = begin_access [read] [unknown] %0 : $*T
// %0 must be of $*T type.
Begins an access to the target memory.
The operand must be a *root address derivation*:
- a function argument,
- an ``alloc_stack`` instruction,
- a ``project_box`` instruction,
- a ``global_addr`` instruction,
- a ``ref_element_addr`` instruction, or
- another ``begin_access`` instruction.
It will eventually become a basic structural rule of SIL that no memory
access instructions can be directly applied to the result of one of these
instructions; they can only be applied to the result of a ``begin_access``
on them. For now, this rule will be conditional based on compiler settings
and the SIL stage.
An access is ended with a corresponding ``end_access``. Accesses must be
uniquely ended on every control flow path which leads to either a function
exit or back to the ``begin_access`` instruction. The set of active
accesses must be the same on every edge into a basic block.
An ``init`` access takes uninitialized memory and initializes it.
It must always use ``static`` enforcement.
A ``deinit`` access takes initialized memory and leaves it uninitialized.
It must always use ``static`` enforcement.
``read`` and ``modify`` accesses take initialized memory and leave it
initialized. They may use ``unknown`` enforcement only in the ``raw``
SIL stage.
A ``no_nested_conflict`` access has no potentially conflicting access within
its scope (on any control flow path between it and its corresponding
``end_access``). Consequently, the access will not need to be tracked by the
runtime for the duration of its scope. This access may still conflict with an
outer access scope; therefore, it may still require dynamic enforcement at a
single point.
A ``builtin`` access was emitted for a user-controlled Builtin (e.g. the
standard library's KeyPath access). Non-builtin accesses are auto-generated by
the compiler to enforce formal access that derives from the language. A
``builtin`` access is always fully enforced regardless of the compilation mode
because it may be used to enforce access outside of the current module.
end_access
``````````
::
sil-instruction ::= 'end_access' ( '[' 'abort' ']' )? sil-operand
Ends an access. The operand must be a ``begin_access`` instruction.
If the ``begin_access`` is ``init`` or ``deinit``, the ``end_access``
may be an ``abort``, indicating that the described transition did not
in fact take place.
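
A sketch of a paired access, assuming ``%0`` is a valid root address of type
``$*Int`` (for example the result of a ``global_addr``) and ``%2`` is an
``Int`` value::

  %1 = begin_access [modify] [dynamic] %0 : $*Int
  store %2 to %1 : $*Int       // memory is accessed through %1, not %0
  end_access %1 : $*Int
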
begin_unpaired_access
`````````````````````
::
sil-instruction ::= 'begin_unpaired_access' '[' sil-access ']' '[' sil-enforcement ']' '[no_nested_conflict]'? '[builtin]'? sil-operand : sil-type, sil-operand : $*Builtin.UnsafeValueBuffer
sil-access ::= init
sil-access ::= read
sil-access ::= modify
sil-access ::= deinit
sil-enforcement ::= unknown
sil-enforcement ::= static
sil-enforcement ::= dynamic
sil-enforcement ::= unsafe
%2 = begin_unpaired_access [read] [dynamic] %0 : $*T, %1 : $*Builtin.UnsafeValueBuffer
// %0 must be of $*T type.
Begins an access to the target memory. This has the same semantics and obeys all
the same constraints as ``begin_access``, with the following exceptions:
- ``begin_unpaired_access`` has an additional operand for the scratch buffer
used to uniquely identify this access within its scope.
- An access initiated by ``begin_unpaired_access`` must end with
``end_unpaired_access`` unless it has the ``no_nested_conflict`` flag. A
``begin_unpaired_access`` with ``no_nested_conflict`` is effectively an
instantaneous access with no associated scope.
- The associated ``end_unpaired_access`` must use the same scratch buffer.
end_unpaired_access
```````````````````
::
sil-instruction ::= 'end_unpaired_access' ( '[' 'abort' ']' )? '[' sil-enforcement ']' sil-operand : $*Builtin.UnsafeValueBuffer
sil-enforcement ::= unknown
sil-enforcement ::= static
sil-enforcement ::= dynamic
sil-enforcement ::= unsafe
%1 = end_unpaired_access [dynamic] %0 : $*Builtin.UnsafeValueBuffer
Ends an access. This has the same semantics and constraints as ``end_access`` with the following exceptions:
- The single operand refers to the scratch buffer that uniquely identified the
access with this scope.
- The enforcement level is reiterated, since the corresponding
``begin_unpaired_access`` may not be statically discoverable. It must be
identical to the ``begin_unpaired_access`` enforcement.
Reference Counting
~~~~~~~~~~~~~~~~~~
These instructions handle reference counting of heap objects. Values of
strong reference type have ownership semantics for the referenced heap
object. Retain and release operations, however,
are never implicit in SIL and always must be explicitly performed where needed.
Retains and releases on the value may be freely moved, and balancing
retains and releases may be deleted, so long as an owning retain count is
maintained for the uses of the value.
All reference-counting operations are defined to work correctly on
null references (whether strong, unowned, or weak). A non-null
reference must actually refer to a valid object of the indicated type
(or a subtype). Address operands are required to be valid and non-null.
While SIL makes reference-counting operations explicit, the SIL type
system also fully represents strength of reference. This is useful
for several reasons:
1. Type-safety: it is impossible to erroneously emit SIL that naively
uses a ``@weak`` or ``@unowned`` reference as if it were a strong
reference.
2. Consistency: when a reference is kept in memory, instructions like
``copy_addr`` and ``destroy_addr`` implicitly carry the right
semantics in the type of the address, rather than needing special
variants or flags.
3. Ease of tooling: SIL directly stores the user's intended strength
of reference, making it straightforward to generate instrumentation
that would convey this to a memory profiler. In principle, with
only a modest number of additions and restrictions on SIL, it would
even be possible to drop all reference-counting instructions and
use the type information to feed a garbage collector.
strong_retain
`````````````
::
sil-instruction ::= 'strong_retain' sil-operand
strong_retain %0 : $T
// $T must be a reference type
Increases the strong retain count of the heap object referenced by ``%0``.
strong_release
``````````````
::
sil-instruction ::= 'strong_release' sil-operand
strong_release %0 : $T
// $T must be a reference type.
Decrements the strong reference count of the heap object referenced by ``%0``.
If the release operation brings the strong reference count of the object to
zero, the object is destroyed and ``@weak`` references are cleared. When both
its strong and unowned reference counts reach zero, the object's memory is
deallocated.
set_deallocating
````````````````
::
sil-instruction ::= 'set_deallocating' sil-operand
set_deallocating %0 : $T
// $T must be a reference type.
Explicitly sets the state of the object referenced by ``%0`` to deallocated.
This is the same operation that a ``strong_release`` performs immediately
before it calls the deallocator of the object.
It is expected that the strong reference count of the object is one.
Furthermore, no other thread may increment the strong reference count during
execution of this instruction.
strong_copy_unowned_value
`````````````````````````
::
sil-instruction ::= 'strong_copy_unowned_value' sil-operand
%1 = strong_copy_unowned_value %0 : $@unowned T
// %1 will be a strong @owned value of type $T.
// $T must be a reference type
Asserts that the strong reference count of the heap object referenced by ``%0``
is still positive, then increments the reference count and returns a new strong
reference to ``%0``. The intention is that this instruction is used as a "safe
ownership conversion" from ``unowned`` to ``strong``.
strong_retain_unowned
`````````````````````
::
sil-instruction ::= 'strong_retain_unowned' sil-operand
strong_retain_unowned %0 : $@unowned T
// $T must be a reference type
Asserts that the strong reference count of the heap object referenced by ``%0``
is still positive, then increases it by one.
unowned_retain
``````````````
::
sil-instruction ::= 'unowned_retain' sil-operand
unowned_retain %0 : $@unowned T
// $T must be a reference type
Increments the unowned reference count of the heap object underlying ``%0``.
unowned_release
```````````````
::
sil-instruction ::= 'unowned_release' sil-operand
unowned_release %0 : $@unowned T
// $T must be a reference type
Decrements the unowned reference count of the heap object referenced by
``%0``. When both its strong and unowned reference counts reach zero,
the object's memory is deallocated.
load_weak
`````````
::
sil-instruction ::= 'load_weak' '[take]'? sil-operand
load_weak [take] %0 : $*@sil_weak Optional<T>
// $T must be an optional wrapping a reference type
Increments the strong reference count of the heap object held in the operand,
which must be an initialized weak reference. The result is a value of type
``$Optional<T>``, except that it is ``null`` if the heap object has begun
deallocation.
If ``[take]`` is specified, then the underlying weak reference is invalidated,
implying that the weak reference count of the loaded value is decremented. If
``[take]`` is not specified then the underlying weak reference count is not
affected by this operation (i.e. it is a +0 weak ref count operation). In either
case, the strong reference count will be incremented before any changes to the
weak reference count.
This operation must be atomic with respect to the final ``strong_release`` on
the operand heap object. It need not be atomic with respect to ``store_weak``
operations on the same address.
store_weak
``````````
::
sil-instruction ::= 'store_weak' sil-value 'to' '[initialization]'? sil-operand
store_weak %0 to [initialization] %1 : $*@sil_weak Optional<T>
// $T must be an optional wrapping a reference type
Initializes or reassigns a weak reference. The operand may be ``nil``.
If ``[initialization]`` is given, the weak reference must currently either be
uninitialized or destroyed. If it is not given, the weak reference must
currently be initialized. After the evaluation:
* The value that was originally referenced by the weak reference will have
its weak reference count decremented by 1.
* If the optionally typed operand is non-nil, the strong reference wrapped in
the optional has its weak reference count incremented by 1. In contrast, the reference's
strong reference count is not touched.
This operation must be atomic with respect to the final ``strong_release`` on
the operand (source) heap object. It need not be atomic with respect to
``store_weak`` or ``load_weak`` operations on the same address.
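
A sketch of a weak reference's life cycle for a hypothetical class type ``C``,
assuming ``%0`` has type ``$Optional<C>``::

  %w = alloc_stack $@sil_weak Optional<C>
  store_weak %0 to [initialization] %w : $*@sil_weak Optional<C>   // strong count of %0 is unchanged
  %1 = load_weak %w : $*@sil_weak Optional<C>                      // %1 : $Optional<C>, strongly retained if non-nil
  // ... use %1 ...
  release_value %1 : $Optional<C>                                  // balance the retain from load_weak
  destroy_addr %w : $*@sil_weak Optional<C>
  dealloc_stack %w : $*@sil_weak Optional<C>
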
load_unowned
````````````
TODO: Fill this in
store_unowned
`````````````
TODO: Fill this in
fix_lifetime
````````````
::
sil-instruction ::= 'fix_lifetime' sil-operand
fix_lifetime %0 : $T
// Fix the lifetime of a value %0
fix_lifetime %1 : $*T
// Fix the lifetime of the memory object referenced by %1
Acts as a use of a value operand, or of the value in memory referenced by an
address operand. Optimizations may not move operations that would destroy the
value, such as ``release_value``, ``strong_release``, ``copy_addr [take]``, or
``destroy_addr``, past this instruction.
mark_dependence
```````````````
::
sil-instruction ::= 'mark_dependence' sil-operand 'on' sil-operand
%2 = mark_dependence %0 : $*T on %1 : $Builtin.NativeObject
Indicates that the validity of the first operand depends on the value
of the second operand. Operations that would destroy the second value
must not be moved before any instructions which depend on the result
of this instruction, exactly as if the address had been obviously
derived from that operand (e.g. using ``ref_element_addr``).
The result is always equal to the first operand. The first operand
will typically be an address, but it could be an address in a
non-obvious form, such as a Builtin.RawPointer or a struct containing
the same. Transformations should be somewhat forgiving here.
The second operand may have either object or address type. In the
latter case, the dependency is on the current value stored in the
address.
is_unique
`````````
::
sil-instruction ::= 'is_unique' sil-operand
%1 = is_unique %0 : $*T
// $T must be a reference-counted type
// %1 will be of type Builtin.Int1
Checks whether %0 is the address of a unique reference to a memory
object. Returns 1 if the strong reference count is 1, and 0 if the
strong reference count is greater than 1.
A discussion of the semantics can be found here:
:ref:`arcopts.is_unique`.
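The result is typically used to choose between mutating in place and copying
first. A minimal sketch, assuming a hypothetical class type ``Buffer`` and an
address ``%0`` that holds a reference to it::

      // %0 : $*Buffer
      %1 = is_unique %0 : $*Buffer
      cond_br %1, bb1, bb2
    bb1:
      // Uniquely referenced: the object can be mutated in place.
      br bb3
    bb2:
      // Shared: copy the object and store the copy to %0 before mutating.
      br bb3
    bb3:
      // Continue with the mutation through the reference stored at %0.

The copy logic in ``bb2`` is elided because it depends on the type in question.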
begin_cow_mutation
``````````````````
::
sil-instruction ::= 'begin_cow_mutation' '[native]'? sil-operand
(%1, %2) = begin_cow_mutation %0 : $C
// $C must be a reference-counted type
// %1 will be of type Builtin.Int1
// %2 will be of type C
Checks whether %0 is a unique reference to a memory object. Returns 1 in the
first result if the strong reference count is 1, and 0 if the strong reference
count is greater than 1.
Returns the reference operand in the second result. The returned reference can
be used to mutate the object. Technically, the returned reference is the same
as the operand, but optimizations must treat the result as a different SSA
value than the operand. This is required to ensure the correctness of
``ref_element_addr [immutable]``.
The operand is consumed and the second result is returned as owned.
The optional ``native`` attribute specifies that the operand has native Swift
reference counting.
end_cow_mutation
````````````````
::
sil-instruction ::= 'end_cow_mutation' '[keep_unique]'? sil-operand
%1 = end_cow_mutation %0 : $C
// $C must be a reference-counted type
// %1 will be of type C
Marks the end of the mutation of a reference-counted object.
Returns the reference operand. Technically, the returned reference is the same
as the operand, but optimizations must treat the result as a different SSA
value than the operand. This is required to ensure the correctness of
``ref_element_addr [immutable]``.
The operand is consumed and the result is returned as owned. The result is
guaranteed to be uniquely referenced.
The optional ``keep_unique`` attribute indicates that the optimizer must not
replace this reference with a reference to an object that is not uniquely
referenced.
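The pairing of ``begin_cow_mutation`` and ``end_cow_mutation`` can be sketched
as follows. This is an illustration only, written in non-OSSA form for brevity;
the class ``Buffer``, its stored property ``count``, and the incoming values
``%0`` and ``%1`` are assumptions, and the copy performed when the buffer is
not unique is elided::

    // %0 : $Buffer, %1 : $Int (the new value to store)
    (%2, %3) = begin_cow_mutation %0 : $Buffer
    // ... if %2 is 0, %3 must first be replaced by a uniquely referenced copy ...
    %4 = ref_element_addr %3 : $Buffer, #Buffer.count
    store %1 to %4 : $*Int
    %5 = end_cow_mutation %3 : $Buffer
    // After the mutation has ended, reads go through the new SSA value %5.
    %6 = ref_element_addr [immutable] %5 : $Buffer, #Buffer.count
    %7 = load %6 : $*Int

Because ``%3`` and ``%5`` are distinct SSA values, the store through ``%4`` and
the immutable read through ``%6`` are not treated as accesses to the same
immutable location.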
is_escaping_closure
```````````````````
::
sil-instruction ::= 'is_escaping_closure' sil-operand
%1 = is_escaping_closure %0 : $@callee_guaranteed () -> ()
// %0 must be an escaping Swift closure.
// %1 will be of type Builtin.Int1
Checks whether the closure's context reference is non-nil and has a strong
reference count greater than one, and returns true if so.
copy_block
``````````
::
sil-instruction ::= 'copy_block' sil-operand
%1 = copy_block %0 : $@convention(block) T -> U
Performs a copy of an Objective-C block. Unlike retains of other
reference-counted types, this can produce a different value from the operand
if the block is copied from the stack to the heap.
copy_block_without_escaping
```````````````````````````
::
sil-instruction ::= 'copy_block_without_escaping' sil-operand 'withoutEscaping' sil-operand
%2 = copy_block_without_escaping %0 : $@convention(block) T -> U withoutEscaping %1 : $T -> U
Performs a copy of an Objective-C block. Unlike retains of other
reference-counted types, this can produce a different value from the operand if
the block is copied from the stack to the heap.
Additionally, consumes the ``withoutEscaping`` operand ``%1``, which is the
closure sentinel. SILGen emits these instructions when it passes ``@noescape``
Swift closures to Objective-C. A mandatory SIL pass lowers this instruction
into a ``copy_block`` followed by an
``is_escaping_closure``/``cond_fail``/``destroy_value`` sequence at the end of
the lifetime of the Objective-C closure parameter, which checks whether the
sentinel closure was escaped.
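The lowering described above can be sketched roughly as follows. This is an
illustration assembled from the description, not the exact output of the pass;
the function types and the ``cond_fail`` message are assumptions::

    // Before lowering: %0 is the block, %1 is the sentinel closure.
    %2 = copy_block_without_escaping %0 : $@convention(block) () -> () withoutEscaping %1 : $@callee_guaranteed () -> ()

    // After lowering:
    %2 = copy_block %0 : $@convention(block) () -> ()
    // ... %2 is passed to Objective-C and its lifetime ends ...
    %3 = is_escaping_closure %1 : $@callee_guaranteed () -> ()
    cond_fail %3 : $Builtin.Int1, "closure argument was escaped"
    destroy_value %1 : $@callee_guaranteed () -> ()

If Objective-C code kept a reference to the block past the call, the sentinel
is still referenced elsewhere, ``is_escaping_closure`` returns true, and the
``cond_fail`` traps.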
builtin "unsafeGuaranteed"
``````````````````````````
::
sil-instruction ::= 'builtin' '"unsafeGuaranteed"' '<' sil-type '>' '(' sil-operand ')' ':' sil-type