 :orphan:

 .. _CallingConvention:

 The Swift Calling Convention
 ****************************

 .. contents::

 This whitepaper discusses the Swift calling convention, at least as we
 want it to be.

 It's a basic assumption in this paper that Swift shouldn't make an
 implicit promise to exactly match the default platform calling
 convention.  That is, if a C or Objective-C programmer manages to derive the
 address of a Swift function, we don't have to promise that an obvious
 translation of the type of that function will be correctly callable
 from C.  For example, this wouldn't be guaranteed to work::

   // In Swift:
   func foo(_ x: Int, y: Double) -> MyClass { ... }

   // In Objective-C:
   extern id _TF4main3fooFTSiSd_CS_7MyClass(intptr_t x, double y);

 We do sometimes need to be able to match C conventions, both to use
 them and to generate implementations of them, but that level of
 compatibility should be opt-in and site-specific.  If Swift would
 benefit from internally using a better convention than C/Objective-C uses,
 and switching to that convention doesn't damage the dynamic abilities
 of our target platforms (debugging, dtrace, stack traces, unwinding,
 etc.), there should be nothing preventing us from doing so.  (If we
 did want to guarantee compatibility on this level, this paper would be
 a lot shorter!)

 Function call rules in high-level languages have three major
 components, each operating on a different abstraction level:

 * the high-level semantics of the call (pass-by-reference
   vs. pass-by-value),

 * the ownership and validity conventions about argument and result
   values ("+0" vs. "+1", etc.), and

 * the "physical" representation conventions of how values are actually
   communicated between functions (in registers, on the stack, etc.).

 We'll tackle each of these in turn, then conclude with a detailed
 discussion of function signature lowering.

 High-level semantic conventions
 ===============================

 The major division in argument passing conventions between languages
 is between pass-by-reference and pass-by-value languages.  It's a
 distinction that only really makes sense in languages with the concept
 of an l-value; Swift has l-values, so it's pertinent.

 In general, the terms "pass-by-X" and "call-by-X" are used
 interchangeably.  It's unfortunate, because these conventions are
 argument-specific, and functions can be passed multiple arguments
 that are each handled in a different way.  As such, we'll prefer
 "pass-by-X" for consistency and to emphasize that these conventions
 are argument-specific.

 Pass-by-reference
 -----------------

 In pass-by-reference (also called pass-by-address), if
 `A` is an l-value expression, `foo(A)` is passed some sort of opaque
 reference through which the original l-value can be modified.  If `A`
 is not an l-value, the language may prohibit this, or (if
 pass-by-reference is the default convention) it may pass a temporary
 variable containing the result of `A`.

 Don't confuse pass-by-reference with the concept of a *reference
 type*.  A reference type is a type whose value is a reference to a
 different object; for example, a pointer type in C, or a class type in
 Java or Swift.  A variable of reference type can be passed by value
 (copying the reference itself) or by reference (passing the variable
 itself, allowing it to be changed to refer to a different object).
 Note that references in C++ are a generalization of pass-by-reference,
 not really a reference type; in C++, a variable of reference type
 behaves completely unlike any other variable in the language.

 Also, don't confuse pass-by-reference with the physical convention of
 passing an argument value indirectly.  In pass-by-reference, what's
 logically being passed is a reference to a tangible, user-accessible
 object; changes to the original object will be visible in the
 reference, and changes to the reference will be reflected in the
 original object.  In an indirect physical convention, the argument is
 still logically an independent value, no longer associated with the
 original object (if there was one).

 If every object in the language is stored in addressable memory,
 pass-by-reference can be easily implemented by simply passing the
 address of the object.  If an l-value can have more structure than
 just a single, independently-addressable object, more information may
 be required from the caller.  For example, an array argument in
 FORTRAN can be a row or column vector from a matrix, and so arrays are
 generally passed as both an address and a stride.  C and C++ do have
 unaddressable l-values because of bitfields, but they forbid passing
 bitfields by reference (in C++) or taking their address (in either
 language), which greatly simplifies pointer and reference types in
 those languages.

 FORTRAN is the last remaining example of a language that defaults to
 pass-by-reference.  Early FORTRAN implementations famously passed
 constants by passing the address of mutable global memory initialized
 to the constant; if the callee modified its parameter (illegal under
 the standard, but...), it literally changed the constant for future
 uses.  FORTRAN now allows procedures to explicitly take arguments by
 value and explicitly declare that arguments must be l-values.

 However, many languages do allow parameters to be explicitly marked as
 pass-by-reference.  As mentioned for C++, sometimes only certain kinds
 of l-values are allowed.

 Swift allows parameters to be marked as pass-by-reference with
 `inout`.  Arbitrary l-values can be passed.  The Swift convention is
 to always pass an address; if the parameter is not addressable, it
 must be materialized into a temporary and then written back.  See the
 accessors proposal for more details about the high-level semantics of
 `inout` arguments.
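
 To make the materialize-and-write-back rule concrete, here is a small
 Swift sketch.  `Temperature`, `AccessLog`, and `bump` are hypothetical
 names invented for illustration.  Passing a stored property `inout`
 runs no accessors, since its address can be passed directly; passing a
 computed property runs the getter into a temporary before the call and
 the setter afterwards::

   // Records which accessors run (illustration only).
   final class AccessLog {
       var entries: [String] = []
   }

   struct Temperature {
       var celsius: Double      // stored: addressable
       let log: AccessLog

       // Computed: no storage of its own, so an `inout` access must
       // materialize it into a temporary and write it back afterwards.
       var fahrenheit: Double {
           get { log.entries.append("get"); return celsius * 9 / 5 + 32 }
           set { log.entries.append("set"); celsius = (newValue - 32) * 5 / 9 }
       }
   }

   func bump(_ value: inout Double, by delta: Double) {
       value += delta
   }

   let accessLog = AccessLog()
   var reading = Temperature(celsius: 0, log: accessLog)

   bump(&reading.celsius, by: 5)      // stored property: its address is passed
   bump(&reading.fahrenheit, by: 9)   // computed: get, call, then set
   print(accessLog.entries)           // ["get", "set"]
   print(reading.celsius)             // 10.0

 The sketch shows only the observable order of accessor calls, not how
 the compiler actually allocates the temporary.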

 Pass-by-value
 -------------

 In pass-by-value, if `A` is an l-value expression, `foo(A)` copies the
 current value there.  Any modifications `foo` makes to its parameter
 are made to this copy, not to the original l-value.

 Most modern languages are pass-by-value, with specific functions able
 to opt in to pass-by-reference semantics.  This is exactly what Swift
 does.
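
 A minimal Swift sketch of the difference (the function names are
 hypothetical): the by-value callee mutates only its own copy, while
 the `inout` callee opts in to pass-by-reference and its change is
 visible to the caller::

   // Pass-by-value: the callee works on its own copy of the argument.
   func appendingZero(_ values: [Int]) -> [Int] {
       var copy = values        // parameters are immutable; mutate a local copy
       copy.append(0)
       return copy
   }

   // Opt-in pass-by-reference: the change is made to the caller's l-value.
   func appendZero(_ values: inout [Int]) {
       values.append(0)
   }

   var scores = [1, 2, 3]
   let extended = appendingZero(scores)
   let afterByValue = scores    // [1, 2, 3]: the caller's l-value is untouched
   appendZero(&scores)          // scores is now [1, 2, 3, 0]
   print(extended, afterByValue, scores)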

 There's not much room for variation in the high-level semantics of
 passing arguments by value; all the variation is in the ownership and
 physical conventions.

 Ownership transfer conventions
 ==============================

 Arguments and results that require cleanup, like an Objective-C object
 reference or a non-POD C++ object, raise two questions about
 responsibility: who is responsible for cleaning it up, and when?

 These questions arise even when the cleanup is explicit in code.  C's
 `strdup` function returns newly-allocated memory which the caller is
 responsible for freeing, but `strtok` does not.  Objective-C has
 standard naming conventions that describe which functions return
 objects that the caller is responsible for releasing, and outside of
 ARC these must be followed manually.  Of course, conventions designed
 to be implemented by programmers are often designed around the
 simplicity of that implementation, rather than necessarily being more
 efficient.

 Pass-by-reference arguments
 ---------------------------

 Pass-by-reference arguments generally don't involve a *transfer* of
 ownership.  It's assumed that the caller will ensure that the referent
 is valid at the time of the call, and that the callee will ensure that
 the referent is still valid at the time of return.

 FORTRAN does actually allow parameters to be tagged as out-parameters,
 where the caller doesn't guarantee the validity of the argument before
 the call.  Objective-C has something similar, where an indirect method
 argument can be marked `out`; ARC takes advantage of this with
 autoreleasing parameters to avoid a copy into the writeback temporary.
 Neither of these is something we semantically care about supporting
 in Swift.

 There is one other theoretically interesting convention question here:
 the argument has to be valid before the call and after the call, but
 does it have to be valid during the call?  Swift's answer to this is
 generally "yes".  Swift does have `inout` aliasing rules that allow a
 certain amount of optimization, but the compiler is forbidden from
 exploiting these rules in any way that could cause memory corruption
 (at least in the absence of race conditions).  So Swift has to ensure
 that an `inout` argument is valid whenever it does something
 (including calling an opaque function) that could potentially access
 the original l-value.

 If Swift allowed local variables to be captured through `inout`
 parameters, and therefore needed to pass an implicit owner parameter
 along with an address, this owner parameter would behave like a
 pass-by-value argument and could use any of the conventions listed
 below.  However, the optimal convention for this is obvious: it should
 be `guaranteed`, since captures are very unlikely and callers are
 almost always expected to use the value of an `inout` variable
 afterwards.

 Pass-by-value arguments
 -----------------------

 All conventions for this have performance trade-offs.

 We're only going to discuss *static* conventions, where the transfer
 is picked at compile time.  It's possible to have a *dynamic*
 convention, where the caller passes a flag indicating whether it's
 okay to directly take responsibility for the value, and the callee can
 (conceptually) return a flag indicating whether it actually did take
 responsibility for it.  If copying is extremely expensive, that can be
 worthwhile; otherwise, the code cost may overwhelm any other benefits.

 This discussion will ignore one particular impact of these conventions
 on code size.  If a function has many callers, conventions that
 require more code in the caller are worse, all else aside.  If a
 single call site has many possible targets, conventions that require
 more code in the callee are worse, all else aside.  It's not really
 reasonable to decide this in advance for unknown code; we could maybe
 make rules about code calling system APIs, except that system APIs are
 by definition locked down, and we can't change them.  It's a
 reasonable thing to consider changing with PGO, though.

 Responsibility
 ~~~~~~~~~~~~~~

 A common refrain in this performance analysis will be whether a
 function has responsibility for a value.  A function has to get a
 value from *somewhere*:

 * A caller is usually responsible for the return values it receives:
   the callee generated the value and the caller is responsible for
   destroying it.  Any other convention has to rely on heavily
   restricting what kind of value can be returned.  (If you're thinking
   about Objective-C autoreleased results, just accept this for now;
   we'll talk about that later.)

 * A function isn't necessarily responsible for a value it loads from
   memory.  Ignoring race conditions, the function may be able to
   immediately use the value without taking any specific action to keep
   it valid.

 * A callee may or may not be responsible for a value passed as a
   parameter, depending on the convention it was passed with.

 * A value might come from a source that doesn't necessarily make
   the function responsible, but if the function takes an action which
   invalidates the source before using the value, the function has to
   take action to keep the value valid.  At that point, the function
   has responsibility for the value despite its original source.

   For example, a function `foo()` might load a reference `r` from a
   global variable `x`, call an unknown function `bar()`, and then use
   `r` in some way.  If `bar()` can't possibly overwrite `x`, `foo()`
   doesn't have to do anything to keep `r` alive across the call;
   otherwise it does (e.g. by retaining it in a refcounted
   environment).  This is a situation where humans are often much
   smarter than compilers.  Of course, it's also a situation where
   humans are sometimes insufficiently conservative.
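
 The `foo()`/`bar()` situation can be sketched in Swift as follows.
 `Node` and `Globals` are hypothetical, and the "global" is modelled as
 a property of an object passed in, so that `bar` is genuinely opaque to
 `foo`.  Swift source can't observe the retain itself; what the sketch
 shows is that `r` must outlive the overwrite of `x`, so the compiler
 has to take responsibility for it::

   final class Node {
       let name: String
       init(name: String) { self.name = name }
   }

   // Hypothetical stand-in for the global variable `x` from the text.
   final class Globals {
       var x = Node(name: "first")
   }

   // `bar` is opaque to `foo`: it arrives as a closure parameter.
   func foo(_ globals: Globals, calling bar: (Globals) -> Void) -> String {
       let r = globals.x   // loaded from memory; foo isn't yet responsible for it
       bar(globals)        // may overwrite `x`, dropping the other reference
       return r.name       // so `r` must be kept alive (retained) across bar()
   }

   let g = Globals()
   let name = foo(g) { $0.x = Node(name: "second") }
   print(name, g.x.name)   // first second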

 A function may also require responsibility for a value as part of its
 operation:

 * Since a variable is always responsible for the current value it
   stores, a function which stores a value into memory must first gain
   responsibility for that value.

 * A callee normally transfers responsibility for its return value to
   its caller; therefore it must gain responsibility for its return
   value before returning it.

 * A caller may need to gain responsibility for a value before passing
   it as an argument, depending on the parameter's ownership-transfer
   convention.

 Known conventions
 ~~~~~~~~~~~~~~~~~

 There are three static parameter conventions for ownership worth
 considering here:

 * The caller may transfer responsibility for the value to the callee.
   In SIL, we call this an **owned** parameter.

   This is optimal if the caller has responsibility for the value and
   doesn't need it after the call.  This is an extremely common
   situation; for example, it comes up whenever a call result is
   immediately used as an argument.  By giving the callee responsibility
   for the value, this convention allows the callee to use the value at
   a later point without taking any extra action to keep it alive.

   The flip side is that this convention requires a lot of extra work
   when a single value is used multiple times in the caller.  For
   example, a value passed in every iteration of a loop will need to be
   copied/retained/whatever each time.

 * The caller may provide the value without any responsibility on
   either side.  In SIL, we call this an **unowned** parameter.  The
   value is guaranteed to be valid at the moment of the call, and in
   the absence of race conditions, that guarantee can be assumed to
   continue unless the callee does something that might invalidate it.
   As discussed above, humans are often much smarter than computers
   about knowing when that's possible.

   This is optimal if the caller can acquire the value without
   responsibility and the callee doesn't require responsibility for it.
   In very simple code --- e.g., loading values from an array and
   passing them to a comparator function which just reads a few fields
   from each and returns --- this can be extremely efficient.

   Unfortunately, this convention is completely undermined if either
   side has to do anything that forces it to take action to keep the
   value alive.  Also, if that happens on the caller side, the
   convention can keep values alive longer than is necessary.  It's
   very easy for both sides of the convention to end up doing extra
   work because of this.

 * The caller may assert responsibility for the value.  In SIL, we call
   this a **guaranteed** parameter.  The callee can rely on the value
   staying valid for the duration of the call.

   This is optimal if the caller needs to use the value after the call
   and either has responsibility for it or has a guarantee like this
   for it.  Therefore, this convention is particularly nice when a
   value is likely to be forwarded by value a great deal.

   However, this convention does generally keep values alive longer
   than is necessary, since the outermost function which passed it as
   an argument will generally be forced to hold a reference for the
   duration.  By the same mechanism, in refcounted systems, this
   convention tends to cause values to have multiple retains active at
   once; for example, if a copy-on-write array is created in one
   function, passed to another, stored in a mutable variable, and then
   modified, the callee will see a reference count of 2 and be forced
   to do a structural copy.  This can occur even if the caller
   literally constructed the array for the sole and immediate purpose
   of passing it to the callee.
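
 The trade-offs above can be summarized with a toy cost model.  This is
 a hypothetical sketch, not a description of actual Swift code
 generation: it assumes the caller starts out responsible for exactly
 one reference, counts one retain or release per transfer of
 responsibility, and ignores everything an optimizer might do::

   // Toy cost model (hypothetical; not real Swift codegen).
   // Assumption: the caller starts out responsible for one reference.
   enum Convention { case owned, unowned, guaranteed }

   enum CalleeBehavior {
       case readsOnly        // reads a few fields and returns
       case callsOpaqueCode  // calls something that might invalidate the value
       case storesValue      // keeps the value beyond the call
   }

   struct Traffic: Equatable {
       var retains = 0
       var releases = 0
   }

   // Retains/releases needed for one call under each convention.
   func traffic(_ convention: Convention,
                callee: CalleeBehavior,
                callerNeedsValueAfter: Bool) -> Traffic {
       var t = Traffic()
       switch convention {
       case .owned:
           // The caller hands over its +1; if it still needs the value, it copies first.
           if callerNeedsValueAfter { t.retains += 1 }
           // The callee is responsible: it moves the value into storage or releases it.
           if callee != .storesValue { t.releases += 1 }
       case .guaranteed:
           // The caller's reference keeps the value alive for the whole call.
           if callee == .storesValue { t.retains += 1 }
           // Afterwards the caller drops its reference if it has no further use for it.
           if !callerNeedsValueAfter { t.releases += 1 }
       case .unowned:
           // The callee only knows the value is valid at the moment of the call,
           // so anything that might invalidate it forces a defensive copy.
           switch callee {
           case .readsOnly:
               break
           case .callsOpaqueCode:
               t.retains += 1
               t.releases += 1
           case .storesValue:
               t.retains += 1
           }
           if !callerNeedsValueAfter { t.releases += 1 }
       }
       return t
   }

   // A value passed in every iteration of a loop: `owned` pays on each call.
   print(traffic(.owned, callee: .readsOnly, callerNeedsValueAfter: true))
   // A call result passed straight to a storing callee: `owned` pays nothing.
   print(traffic(.owned, callee: .storesValue, callerNeedsValueAfter: false))

 Under these assumptions, `owned` is the only convention that costs
 nothing when a fresh value is handed to a callee that keeps it, and it
 pays a retain/release pair on every call when the caller keeps using
 the value and the callee merely reads it, which is exactly the loop
 case above.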

 Analysis
 ~~~~~~~~

 Objective-C generally uses the unowned convention for object-pointer
 parameters.  It is possible to mark a parameter as being consumed,
 which is basically the owned convention.  As a special case, in ARC we
 assume that callers are responsible for keeping `self` values alive
 (including in blocks), which is effectively the `guaranteed`
 convention.

 `unowned` causes a lot of problems without really solving any, in my
 experience looking at ARC-generated code and optimizer output.  A
 human can take advantage of it, but the compiler is so frequently
 blocked.  There are many common idioms (like chains of functions that
 just add default arguments at each step) that have really awful
 performance because the compiler is adding retains and releases at
 every single level.  It's just not a good convention to adopt by
 default.  However,
 we might want to consider allowing specific function parameters to opt
 into it; sort comparators are a particularly interesting candidate
 for this.  `unowned` is very similar to C++'s `const &` for things
 like that.

 `guaranteed` is good for some things, but it causes a lot of silly
 code bloat when values are really only used in one place, which is
 quite common.  The liveness / refcounting issues are also pretty
 problematic.  But there is one example that's very nice for
 `guaranteed`: `self`.  It's quite common for clients of a type to call
 multiple methods on a single value, or for methods to dispatch to
 multiple other methods, which are exactly the situations where
 `guaranteed` excels.  And it's relatively uncommon (but not
 unimaginable) for a non-mutating method on a copy-on-write struct to
 suddenly store `self` aside and start mutating that copy.

 `owned` is a good default for other parameters.  It has some minor
 performance disadvantages (unnecessary retains if you have an
 unoptimizable call in a loop) and some minor code size benefits (in
 common straight-line code), but frankly, both of those points pale in
 importance to the ability to transfer copy-on-write structures around
 without spuriously increasing reference counts.  It doesn't take too
 many unnecessary structural copies before any amount of
 reference-counting traffic (especially the Swift-native
 reference-counting used in copy-on-write structures) is basically
 irrelevant in comparison.
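
 The copy-on-write hazard behind this choice is easy to reproduce with a
 hand-rolled copy-on-write type.  `Buffer`, `CoWArray`, and
 `appendingFour` are hypothetical; `isKnownUniquelyReferenced` is the
 standard library's uniqueness check.  When the caller keeps its own
 reference alive across the call, as the `guaranteed` convention
 requires, the callee's mutation finds a reference count of 2 and has
 to make a structural copy::

   // A hand-rolled copy-on-write array (illustration only).
   final class Buffer {
       var elements: [Int]
       init(_ elements: [Int]) { self.elements = elements }
   }

   struct CoWArray {
       private var buffer: Buffer
       private(set) var structuralCopies = 0

       init(_ elements: [Int]) { buffer = Buffer(elements) }

       var elements: [Int] { buffer.elements }

       mutating func append(_ x: Int) {
           // Another live reference to the buffer forces a copy before mutating.
           if !isKnownUniquelyReferenced(&buffer) {
               buffer = Buffer(buffer.elements)
               structuralCopies += 1
           }
           buffer.elements.append(x)
       }
   }

   // The callee stores its argument in a mutable variable and modifies it.
   func appendingFour(_ array: CoWArray) -> CoWArray {
       var local = array
       local.append(4)
       return local
   }

   let original = CoWArray([1, 2, 3])
   let result = appendingFour(original)   // the caller still holds `original`

   var solo = CoWArray([1, 2, 3])
   solo.append(4)                         // single owner: mutated in place
   print(original.elements, result.elements, result.structuralCopies, solo.structuralCopies)

 `appendingFour` pays for a structural copy because `original` is still
 alive; `solo`, whose buffer has a single owner, is mutated in place.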

 Result values
 -------------

 There's no major semantic split in result conventions like that
 between pass-by-reference and pass-by-value.  In most languages, a
 function has to return a value (or nothing).  There are languages like
 C++ where functions can return references, but that's inherently
 limited, because the reference has to refer to something that exists
 outside the function.  If Swift ever adds a similar language
 mechanism, it'll have to be memory-safe and extremely opaque, and
 it'll be easy to just think of that as a kind of weird value result.
 So we'll just consider value results here.

 Value results raise some of the same ownership-transfer questions as
 value arguments.  There's one major limitation: just like a
 by-reference result, an actual `unowned` convention is inherently
 limited, because something other than the result value must be
 keeping it valid.  So that's off the table for Swift.

 What Objective-C does is something more dynamic.  Most APIs in
 Objective-C give you a very ephemeral guarantee about the validity of
 the result: it's valid now, but you shouldn't count on it being valid
 indefinitely later.  This might be because the result is actually
 owned by some other object somewhere, or it might be because the
 result has been placed in the autorelease pool, a thread-local data
 structure which will (when explicitly drained by something up the call
 chain) eventually release everything that's been put into it.  This autorelease
 pool can be a major source of spurious memory growth, and in classic
 manual reference-counting it was important to drain it fairly
 frequently.  ARC's response to this convention was to add an
 optimization which attempts to prevent things from ending up in the
 autorelease pool; the net effect of this optimization is that ARC ends
 up with an owned reference regardless of whether the value was
 autoreleased.  So in effect, from ARC's perspective, these APIs still
 return an owned reference, mediated through some extra runtime calls
 to undo the damage of the convention.

 So there's really no compelling alternative to an owned return
 convention as the default in Swift.

 Physical conventions
 ====================

 The lowest abstraction level for a calling convention is the actual
 "physical" rules for the call:

 * where the caller should place argument values in registers and
   memory before the call,

 * how the callee should pass back the return values in registers
   and/or memory after the call, and

 * what invariants hold about registers and memory over the call.

 In theory, all of these could be changed in the Swift ABI.  In
 practice, it's best to avoid changes to the invariant rules, because
 those rules could complicate Swift-to-C interoperation:

 * Assuming a higher stack alignment would require dynamic realignment
   whenever Swift code is called from C.

 * Assuming a different set of callee-saved registers would require
   additional saves and restores when either Swift code calls C or is
   called from C, depending on the exact change.  That would then
   inhibit some kinds of tail call.

 So we will limit ourselves to considering the rules for allocating
 parameters and results to registers.  Our platform C ABIs are usually
 quite good at this, and it's fair to ask why Swift shouldn't just use
 C's rules.  There are three general answers:

 * Platform C ABIs are specified in terms of the C type system, and the
   Swift type system allows things to be expressed which don't have
   direct analogues in C (for example, enums with payloads).

 * The layout of structures in Swift does not necessarily match their
   layout in C, which means that the C rules don't necessarily cover
   all the cases in Swift.

 * Swift places a larger emphasis on first-class structs than C does.
   C ABIs often fail to allocate even small structs to registers, or
   use inefficient registers for them, and we would like to be somewhat
   more aggressive than that.

 Accordingly, the Swift ABI is defined largely in terms of lowering: a
 Swift function signature is translated to a C function signature with
 all the aggregate arguments and results eliminated (possibly by
 deciding to pass them indirectly).  This lowering will be described in
 detail in the final section of this whitepaper.

 However, there are some specific circumstances where we'd like to
 deviate from the platform ABI:

 Aggregate results
 -----------------

 As mentioned above, Swift puts a lot of focus on first-class value
 types.  As part of this, it's very valuable to be able to return
 common value types fully in registers instead of indirectly.  The
 magic number here is three: it's very common for copy-on-write value
 types to want about three pointers' worth of data, because that's just
 enough for some sort of owner pointer plus a begin/end pair.

 Unfortunately, many common C ABIs fall slightly short of that.  Even
 those ABIs that do allow small structs to be returned in registers
 tend to only allow two pointers' worth.  So in general, Swift would
 benefit from a very slightly-tweaked calling convention that allocates
 one or two more registers to the result.
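
 To make the "three pointers' worth" figure concrete, here is a
 hypothetical Swift value type with an owner reference and a begin/end
 pair; `Storage` and `BufferSlice` are illustrative names, not standard
 library types.  The sketch only measures the type's size; whether a
 value like this is returned in registers is decided by the compiler
 and the target ABI, and can't be observed from Swift source::

   // Hypothetical owner that keeps the underlying storage alive.
   final class Storage {}

   // An owner pointer plus a begin/end pair.
   struct BufferSlice {
       var owner: Storage
       var begin: UnsafeRawPointer
       var end: UnsafeRawPointer
   }

   let words = MemoryLayout<BufferSlice>.size / MemoryLayout<UnsafeRawPointer>.size
   print(words)   // 3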

 Implicit parameters
 -------------------

 There are several language features in Swift which require implicit
 parameters:

 Closures
 ~~~~~~~~

 Swift's function types are "thick" by default, meaning that a function
 value carries an optional context object which is implicitly passed to
 the function when it is called.  This context object is
 reference-counted, and it should be passed `guaranteed` for
 straightforward reasons:

 * It's not uncommon for closures to be called many times, in which
   case an `owned` convention would be unnecessarily expensive.

 * While it's easy to imagine a closure which would want to take
   responsibility for its captured values, giving it responsibility for
   a retain of the context object doesn't generally allow that.  The
   closure would only be able to take ownership of the captured values
   if it had responsibility for a *unique* reference to the context.
   So the closure would have to be written to do different things based
   on the uniqueness of the reference, and it would have to be able to
   tear down and deallocate the context object after stealing values
   from it.  The optimization just isn't worth it.

 * It's usually straightforward for the caller to guarantee the
   validity of the context reference; worst case, a single extra
   Swift-native retain/release is pretty cheap.  Meanwhile, not having
   that guarantee would force many closure functions to retain their
   contexts, since many closures do multiple things with values from
   the context object.  So `unowned` would not be a good convention.
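
 A short Swift sketch of a thick function value.  `makeCounter` is a
 hypothetical example: its captured variable lives in the context
 object, and the same context is passed on every call.  The size
 comparison reflects the layout of Swift function values (a code
 pointer plus a context reference) against a `@convention(c)` function
 pointer, which has no context::

   // The captured `count` lives in the closure's heap-allocated context.
   func makeCounter() -> () -> Int {
       var count = 0
       return {
           count += 1
           return count
       }
   }

   let next = makeCounter()
   _ = next()
   _ = next()
   let third = next()   // the same context is used on every call

   typealias ThickFn = () -> Int
   typealias CFn = @convention(c) () -> Int

   let thickWords = MemoryLayout<ThickFn>.size / MemoryLayout<Int>.size
   let cWords = MemoryLayout<CFn>.size / MemoryLayout<Int>.size
   print(third, thickWords, cWords)   // 3 2 1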

 Many functions don't actually need a context, however; they are
 naturally "thin".  It would be best if it were possible to construct a
 thick function directly from a thin function without having to
 introduce a thunk just to move parameters around the missing context
 parameter.  In the worst case, a thunk would actually require the
 allocation of a context object just to store the original function
 pointer; but that's only necessary when converting from a completely
 opaque function value.  When the source function is known statically,
 which is far more likely, the thunk can just be a global function
 which immediately calls the target with the correctly shuffled
 arguments.  Still, it'd be better to be able to avoid creating such
 thunks entirely.
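
 The thunk in question can be modelled in Swift.  This is a
 hypothetical sketch: `ThickFunction` stands in for a lowered thick
 function value, with a C function pointer for the code and a raw
 pointer for the context, and `doubled` is a naturally thin function.
 Because `doubled` has no context parameter, storing it in a
 `ThickFunction` requires a closure that exists only to accept and
 discard the context::

   // Toy model of a lowered thick function (hypothetical).
   typealias Code = @convention(c) (Int, UnsafeRawPointer?) -> Int

   struct ThickFunction {
       let code: Code
       let context: UnsafeRawPointer?

       func callAsFunction(_ x: Int) -> Int {
           code(x, context)   // the context is always passed
       }
   }

   // A naturally thin function: it takes no context at all.
   func doubled(_ x: Int) -> Int { x * 2 }

   // The thunk: its only job is to accept and drop the context argument.
   let thunk: Code = { x, _ in doubled(x) }
   let thick = ThickFunction(code: thunk, context: nil)
   print(thick(21))   // 42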

 In order to reliably avoid creating thunks, it must be possible for
 code invoking an opaque thick function to pass the context pointer in
 a way that can be safely and implicitly ignored if the function
 happens to actually be thin.  There are two ways to achieve this:

 * The context can be passed as the final parameter.  In most C calling
   conventions, extra arguments can be safely ignored; this is because
   most C calling conventions support variadic arguments, and such
   conventions inherently can't rely on the callee knowing the extent
   of the arguments.

   However, this is sub-optimal because the context is often used
   repeatedly in a closure, especially at the beginning, and putting it
   at the end of the argument list makes it more likely to be passed on
   the stack.

 * The context can be passed in a register outside of the normal
   argument sequence.  Some ABIs actually even reserve a register for
   this purpose; for example, on x86-64 it's `%r10`.  Neither of the
   ARM ABIs do, however.

 Having an out-of-band register would be the best solution.

 (Surprisingly, the ownership transfer convention for the context
 doesn't actually matter here.  You might think that an `owned`
 convention would be prohibited, since the callee would fail to release
 the context and would therefore leak it.  However, a thin function
 should always have a `nil` context, so this would be harmless.)

 Either solution works acceptably with curried partial application,
 since the inner parameters can be left in place while transforming the
 context into the outer parameters.  However, an `owned` convention
 would either prevent the uncurrying forwarder from tail-calling the
 main function or force all the arguments to be spilled.  Neither is
 really acceptable; one more argument against an `owned` convention.
 (This is another example where `guaranteed` works quite nicely, since
 the guarantees are straightforward to extend to the main function.)

 `self`
 ~~~~~~

 Methods (both static and instance) require a `self` parameter.  In all
 of these cases, it's reasonable to expect that `self` will be used
 frequently, so it's best to pass it in a register.  Also, many methods
 call other methods on the same object, so it's also best if the
 register storing `self` is stable across different method signatures.

 In static methods on value types, `self` doesn't require any dynamic
 information: there's only one value of the metatype, and there's
 usually no point in passing it.

 In static methods on class types, `self` is a reference to the class
 metadata, a single pointer.  This is necessary because it could
 actually be the class object of a subclass.

 In instance methods on class types, `self` is a reference to the
 instance, again a single pointer.

 In mutating instance methods on value types, `self` is the address of
 an object.

 In non-mutating instance methods on value types, `self` is a value; it
 may require multiple registers, or none, or it may need to be passed
 indirectly.

 All of these cases except mutating instance methods on value types can
 be partially applied to create a function closure whose type is the
 formal type of the method.  That is, if class `A` has a method
 declared `func foo(_ x: Int) -> Double` and `a` is an instance of `A`,
 then `a.foo` yields a function of type `(Int) -> Double`.  Assuming
 that we continue to feel that this is a useful language feature, it's
 worth considering how we could support it efficiently.  The expenses
 associated with a partial
 application are (1) the allocation of a context object and (2) needing
 to introduce a thunk to forward to the original function.  All else
 aside, we can avoid the allocation if the representation of `self` is
 compatible with the representation of a context object reference; this
 is essentially true only if `self` is a class instance using Swift
 reference counting.  Avoiding the thunk is possible only if we
 successfully avoided the allocation (since otherwise a thunk is
 required in order to extract the correct `self` value from the
 allocated context object) and `self` is passed in exactly the same
 manner as a closure context would be.
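
 A brief Swift sketch of method partial application (`Scaler` is a
 hypothetical class).  Binding `self` yields a function of the method's
 formal type, with the instance playing the role of the closure
 context; referencing the method through the type instead yields a
 curried function that takes `self` first::

   final class Scaler {
       let factor: Double
       init(factor: Double) { self.factor = factor }
       func scale(_ x: Int) -> Double { Double(x) * factor }
   }

   let s = Scaler(factor: 1.5)

   // Partially applying `self`: the formal type of the method.
   let bound: (Int) -> Double = s.scale

   // Referencing the method through the type: a curried function.
   let unbound: (Scaler) -> (Int) -> Double = Scaler.scale

   print(bound(4), unbound(s)(2))   // 6.0 3.0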

 It's unclear whether making this more efficient would really be
 worthwhile on its own, but if we do support an out-of-band context
 parameter, taking advantage of it for methods is essentially trivial.

 Error handling
 --------------

 The calling convention implications of Swift's error handling design
 aren't yet settled.  It may involve extra parameters; it may involve
 extra return values.  Considerations:

 * Callers will generally need to immediately check for an error.
   Being able to quickly check a register would be extremely
   convenient.

 * If the error is returned as a component of the result value, it
   shouldn't be physically combined with the normal result.  If the
   normal result is returned in registers, it would be unfortunate to
   have to do complicated logic to test for error.  If the normal
   result is returned indirectly, contorting the indirect result with
   the error would likely prevent the caller from evaluating the call
   in-place.

 * It would be very convenient to be able to trivially turn a function
   which can't produce an error into a function which can.  This is an
   operation that we expect higher-order code to have to do frequently, if
   it isn't completely inlined away.  For example::

     // foo() expects its argument to follow the conventions of a
     // function that's capable of throwing.  (The draft spelling
     // `throwsIf(fn)` is written `rethrows` in Swift.)
     func foo(_ fn: () throws -> ()) rethrows {
       try fn()
     }

     // Here we're passing foo() a function that can't throw; this is
     // allowed by the subtyping rules of the language.  We'd like to be
     // able to do this without having to introduce a thunk that maps
     // between the conventions.
     func bar(_ fn: () -> ()) {
       foo(fn)   // no `try` needed: fn can't throw, so neither can foo()
     }

 We'll consider two ways to satisfy this.

 The first is to pass a pointer argument that doesn't interfere with
 the normal argument sequence.  The caller would initialize the memory
 to a zero value.  If the callee is a throwing function, it would be
 expected to write the error value into this argument; otherwise, it
 would naturally ignore it.  Of course, the caller then has to load
 from memory to see whether there's an error.  This would also either
 consume yet another register not in the normal argument sequence or
 have to be placed at the end of the argument list, making it more
 likely to be passed on the stack.

 The second is basically the same idea, but using a register that's
 otherwise callee-save.  The caller would initialize the register to a
 zero value.  A throwing function would write the error into it; a
 non-throwing function would consider it callee-save and naturally
 preserve it.  It would then be extremely easy to check it for an
 error.  Of course, this would take away a callee-save register in the
 caller when calling throwing functions.  Also, if the caller itself
 isn't throwing, it would have to save and restore that register.

 Both solutions would allow tail calls, and the zero store could be
 eliminated for direct calls to known functions that can throw.  The
 second is clearly the superior solution, but it definitely requires more
 work in the backend.

 Default argument generators
 ---------------------------

 By default, Swift is resilient about default arguments and treats them
 as essentially one part of the implementation of the function.  This
 means that, in general, a caller using a default argument must call a
 function to emit the argument, instead of simply inlining that
 emission directly into the call.
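
 As an illustration only, the Swift sketch below writes that lowering
 out by hand.  The generator name `connectPortDefault` is hypothetical
 (the compiler emits a mangled symbol instead); the point is that the
 default value lives in a function owned by the callee's module, and
 every call site that omits the argument calls it.

 ```swift
 // Source as the user writes it (shown for reference):
 //   func connect(host: String, port: Int = 443) -> String
 //
 // Hypothetical generator for the default value of `port`.  Because it
 // belongs to the callee's module, the library can change the default
 // without recompiling its clients.
 func connectPortDefault() -> Int {
     return 443
 }

 // The function itself always receives an explicit `port`.
 func connect(host: String, port: Int) -> String {
     return "\(host):\(port)"
 }

 // A call written as `connect(host: "example.com")` is emitted roughly as:
 let address = connect(host: "example.com", port: connectPortDefault())
 ```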

 These default argument generation functions are unlike any other
 because they have very precise information about how their result will
 be used: it will be placed into a specific position in a specific
 argument list.  The only reason the caller would ever want to do
 anything else with the result is if it needs to spill the value before
 emitting the call.

 Therefore, in principle, it would be really nice if it were possible
 to tell these functions to return in a very specific way, e.g. to
 return two values in the second and third argument registers, or to
 return a value at a specific location relative to the stack pointer
 (although this might be excessively constraining; it would be
 reasonable to simply opt into an indirect return instead).  The
 function should also preserve earlier argument registers (although
 this could be tricky if the default argument generator is in a generic
 context and therefore needs to be passed type-argument information).

 This enhancement is very easy to postpone because it doesn't affect
 any basic language mechanics.  The generators are always called
 directly, and they're inherently attached to a declaration, so it's
 quite easy to take any particular generator and compatibly enhance it
 with a better convention.

 ARM32
 -----

 Most of the platforms we support have pretty good C calling
 conventions.  The exceptions are i386 (for the iOS simulator) and
 ARM32 (for iOS).  We really, really don't care about i386, but iOS on
 ARM32 is still an important platform.  Switching to a better physical
 calling convention (only for calls from Swift to Swift, of course)
 would be a major improvement.

 It would be great if this were as simple as flipping a switch, but
 unfortunately the obvious convention to switch to (AAPCS-VFP) has a
 slightly different set of callee-save registers: iOS treats `r9` as a
 scratch register.  So we'd really want a variant of AAPCS-VFP that did
 the same.  We'd also need to make sure that SJ/LJ exceptions weren't
 disturbed by this calling convention; we aren't really *supporting*
 exception propagation through Swift frames, but completely breaking
 propagation would be unfortunate, and we may need to be able to
 *catch* exceptions.

 So this would also require some amount of additional support from the
 backend.

 Function signature lowering
 ===========================

 Function signatures in Swift are lowered in two phases.

 Semantic lowering
 -----------------

 The first phase is a high-level semantic lowering, which does a number
 of things:

 * It determines a high-level calling convention: specifically, whether
   the function must match the C calling convention or the Swift
   calling convention.

 * It decides the types of the parameters:

   * Functions exported for the purposes of C or Objective-C may need
     to use bridged types rather than Swift's native types.  For
     example, a function that formally returns Swift's `String` type
     may be bridged to return an `NSString` reference instead.

   * Functions which are values, not simply immediately called, may
     need their types lowered to match a specific generic
     abstraction pattern.  This applies to functions that are
     parameters or results of the outer function signature.

 * It identifies specific arguments and results which *must* be passed
   indirectly:

   * Some types are inherently address-only:

     * The address of a weak reference must be registered with the
       runtime at all times; therefore, any `struct` with a weak field
       must always be passed indirectly.

     * An existential type (if not class-bounded) may contain an
       inherently address-only value, or its layout may be sensitive to
       its current address.

     * A value type containing an inherently address-only type as a
       field or case payload becomes itself inherently address-only.

   * Some types must be treated as address-only because their layout is
     not known statically:

     * The layout of a resilient value type may change in a later
       release; the type may even become inherently address-only by
       adding a weak reference.

     * In a generic context, the layout of a type may be dependent on a
       type parameter.  The type parameter might even be inherently
       address-only at runtime.

     * A value type containing a type whose layout isn't statically
       known generally won't have a statically known layout itself.

   * Other types must be passed or returned indirectly because the
     function type uses an abstraction pattern that requires it.  For
     example, a generic `map` function expects a function that takes a
     `T` and returns a `U`; the generic implementation of `map` will
     expect these values to be passed indirectly because their layout
     isn't statically known.  Therefore, the signature of a function
     intended to be passed as this argument must pass them indirectly,
     even if they are actually known statically to be non-address-only
     types such as `Int` and `Float`.

 * It expands tuples in the parameter and result types.  This is done
   at this level both because it is affected by abstraction patterns
   and because different tuple elements may use different ownership
   conventions.  (This is most likely for imported APIs, where it's the
   tuple elements that correspond to specific C or Objective-C parameters.)

   This completely eliminates top-level tuple types from the function
   signature except when they are a target of abstraction and thus are
   passed indirectly.  (A function with type `(Float, Int) -> Float`
   can be abstracted as `(T) -> U`, where `T == (Float, Int)`.)

 * It determines ownership conventions for all parameters and results.
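
 The indirectness rules above can be modeled as a pair of recursive
 predicates.  This is a minimal sketch over a hypothetical toy type
 description (`TypeDesc` is not the compiler's representation); it
 treats every non-class-bounded existential as address-only and assumes
 that any aggregate containing a statically unknown layout is itself
 unknown, which the text only says is generally true.

 ```swift
 /// Toy description of a Swift type (hypothetical, for illustration only).
 enum TypeDesc {
     case int, float, classReference
     case weakReference                  // a `weak` stored reference
     case existential(classBounded: Bool)
     case typeParameter                  // layout depends on a generic parameter
     case resilientValueType             // layout may change in a later release
     case aggregate([TypeDesc])          // struct/tuple fields or enum payloads
 }

 /// Inherently address-only: the value's address matters at all times.
 func isInherentlyAddressOnly(_ t: TypeDesc) -> Bool {
     switch t {
     case .weakReference:
         return true
     case .existential(let classBounded):
         return !classBounded
     case .aggregate(let members):
         return members.contains(where: isInherentlyAddressOnly)
     default:
         return false
     }
 }

 /// Layout not known statically, so the value is treated as address-only.
 func hasUnknownLayout(_ t: TypeDesc) -> Bool {
     switch t {
     case .typeParameter, .resilientValueType:
         return true
     case .aggregate(let members):
         return members.contains(where: hasUnknownLayout)
     default:
         return false
     }
 }

 /// `abstractionRequiresIndirect` stands for the third rule: the function's
 /// abstraction pattern (e.g. a generic `map`) demands indirect passing.
 func mustPassIndirectly(_ t: TypeDesc, abstractionRequiresIndirect: Bool = false) -> Bool {
     return abstractionRequiresIndirect || isInherentlyAddressOnly(t) || hasUnknownLayout(t)
 }
 ```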

 After this phase, a function type consists of an abstract calling
 convention, a list of parameters, and a list of results.  A parameter
 is a type, a flag for indirectness, and an ownership convention.  A
 result is a type, a flag for indirectness, and an ownership
 convention.  (Results need ownership conventions only for non-Swift
 calling conventions.)  Types will not be tuples unless they are
 indirect.
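
 The resulting shape can be sketched in Swift as follows.  All names
 here (`LoweredType`, `LoweredValue`, `LoweredFunctionType`, and the
 `Ownership` cases standing for "+0" and "+1") are hypothetical, and for
 simplicity every expanded tuple element inherits the tuple's ownership
 convention, whereas real semantic lowering may choose a convention per
 element.

 ```swift
 enum AbstractConvention { case c, swift }
 enum Ownership { case plusZero, plusOne }   // "+0" / "+1"

 /// Toy type model: a named (non-tuple) type or a tuple of types.
 enum LoweredType: Equatable {
     case named(String)
     case tuple([LoweredType])
 }

 /// A lowered parameter or result: a type, an indirectness flag, an ownership convention.
 struct LoweredValue: Equatable {
     var type: LoweredType
     var isIndirect: Bool
     var ownership: Ownership
 }

 struct LoweredFunctionType {
     var convention: AbstractConvention
     var parameters: [LoweredValue]
     var results: [LoweredValue]
 }

 /// Expands tuples into separate values, recursively.  A tuple that must be
 /// passed indirectly (a target of abstraction) is kept whole.
 func expandTuples(_ type: LoweredType, isIndirect: Bool, ownership: Ownership) -> [LoweredValue] {
     if case .tuple(let elements) = type, !isIndirect {
         return elements.flatMap { expandTuples($0, isIndirect: false, ownership: ownership) }
     }
     return [LoweredValue(type: type, isIndirect: isIndirect, ownership: ownership)]
 }

 // `(Float, (Int, Bool)) -> Float`, with the argument passed directly:
 let fn = LoweredFunctionType(
     convention: .swift,
     parameters: expandTuples(.tuple([.named("Float"), .tuple([.named("Int"), .named("Bool")])]),
                              isIndirect: false, ownership: .plusZero),
     results: [LoweredValue(type: .named("Float"), isIndirect: false, ownership: .plusOne)])
 ```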

 Semantic lowering may also need to mark certain parameters and results
 as special, for the purposes of the special-case physical treatments
 of `self`, closure contexts, and error results.

 Physical lowering
 -----------------

 The second phase of lowering translates a function type produced by
 semantic lowering into a C function signature.  If the function
 involves a parameter or result with special physical treatment,
 physical lowering initially ignores this value, then adds in the
 special treatment as agreed upon with the backend.

 General expansion algorithm
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~

 Central to the operation of the physical-lowering algorithm is the
 **generic expansion algorithm**.  This algorithm turns any
 non-address-only Swift type into a sequence of zero or more **legal
 types**, where a legal type is either:

 * an integer type, with a power-of-two size no larger than the maximum
   integer size supported by C on the target,

 * a floating-point type supported by the target, or

 * a vector type supported by the target.

 Obviously, this is target-specific.  The target also specifies a
 maximum voluntary integer size.  The legal type sequence only contains
 vector types or integer types larger than the maximum voluntary size
 when the type was explicit in the input.

 Pointers are represented as integers in the legal type sequence.  We
 assume there's never a reason to differentiate them in the ABI as long
 as the effect of address spaces on pointer size is taken into account.
 If that's not true, this algorithm should be adjusted.

 The result of the algorithm also associates each legal type with an
 offset.  This information is sufficient to reconstruct an object in
 memory from a series of values and vice-versa.
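
 One possible representation of legal types and of the (type, offset)
 pairs, as a sketch with hypothetical names.  Sizes are in bytes, so
 sub-byte types such as `i1` are not modeled, and the 64-bit class
 existential below assumes a single protocol (an object pointer plus one
 witness table pointer).

 ```swift
 /// A legal type; pointers are represented as integers of pointer size.
 enum LegalType: Equatable {
     case integer(bytes: Int)
     case floatingPoint(bytes: Int)
     case vector(bytes: Int)
 }

 /// One element of a legal type sequence, with its byte offset in the object.
 struct LegalEntry: Equatable {
     var type: LegalType
     var offset: Int
 }

 /// An integer is legal if its size is a power of two no larger than
 /// the largest integer C supports on the target.
 func isLegalInteger(bytes: Int, maxCIntegerBytes: Int) -> Bool {
     return bytes > 0 && bytes & (bytes - 1) == 0 && bytes <= maxCIntegerBytes
 }

 // A class existential on a 64-bit target: two pointer-sized integers.
 let classExistential = [
     LegalEntry(type: .integer(bytes: 8), offset: 0),
     LegalEntry(type: .integer(bytes: 8), offset: 8),
 ]
 ```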

 The algorithm proceeds in two steps.

 Typed layouts
 ^^^^^^^^^^^^^

 First, the type is recursively analyzed to produce a **typed layout**.
 A typed layout associates ranges of bytes with either (1) a legal type
 (whose storage size must match the size of the associated byte
 range), (2) the special type **opaque**, or (3) the special type
 **empty**.  Adjacent ranges mapped to **opaque** or **empty** can be
 combined.

 For most of the types in Swift, this process is obvious: they either
 correspond to an obvious legal type (e.g. thick metatypes are
 pointer-sized integers), or to an obvious sequence of scalars
 (e.g. class existentials are a sequence of pointer-sized integers).
 Only a few cases remain:

 * Integer types that are not legal types should be mapped as opaque.

 * Vector types that are not legal types should be broken into smaller
   vectors, if their size is an even multiple of a legal vector type,
   or else broken into their components.  (This rule may need some
   tinkering.)

 * Tuples and structs are mapped by merging the typed layouts of the
   fields, as padded out to the extents of the aggregate with
   empty-mapped ranges.  Note that, if fields do not overlap, this is
   equivalent to concatenating the typed layouts of the fields, in
   address order, mapping internal padding to empty.  Bit-fields should
   map the bits they occupy to opaque.

   For example, given the following struct type::

     struct FlaggedPair {
       var flag: Bool
       var pair: (MyClass, Float)
     }

   If Swift performs naive, C-like layout of this structure, and this
   is a 64-bit platform, typed layout is mapped as follows::

     FlaggedPair.flag := [0: i1,                        ]
     FlaggedPair.pair := [       8-15: i64, 16-19: float]
     FlaggedPair      := [0: i1, 8-15: i64, 16-19: float]

   If Swift instead allocates `flag` into the spare (little-endian) low
   bits of `pair.0`, the typed layout map would be::

     FlaggedPair.flag := [0: i1                   ]
     FlaggedPair.pair := [0-7: i64,    8-11: float]
     FlaggedPair      := [0-7: opaque, 8-11: float]

 * Unions (imported from C) are mapped by merging the typed layouts of
   the fields, as padded out to the extents of the aggregate with
   empty-mapped ranges.  This will often result in a fully-opaque
   mapping.

 * Enums are mapped by merging the typed layouts of the cases, as
   padded out to the extents of the aggregate with empty-mapped ranges.
   A case's typed layout consists of the typed layout of the case's
   directly-stored payload (if any), merged with the typed layout for
   its discriminator.  We assume that checking for a discriminator
   involves a series of comparisons of bits extracted from
   non-overlapping ranges of the value; the typed layout of a
   discriminator maps all these bits to opaque and the rest to empty.

   For example, given the following enum type::

     enum Sum {
       case Yes(MyClass)
       case No(Float)
       case Maybe
     }

   If Swift, in its infinite wisdom, decided to lay this out
   sequentially, and to use invalid values of the class pointer to
   indicate that the other cases are present, the layout would look as
   follows::

     Sum.Yes.payload        := [0-7: i64                ]
     Sum.Yes.discriminator  := [0-7: opaque             ]
     Sum.Yes                := [0-7: opaque             ]
     Sum.No.payload         := [             8-11: float]
     Sum.No.discriminator   := [0-7: opaque             ]
     Sum.No                 := [0-7: opaque, 8-11: float]
     Sum.Maybe              := [0-7: opaque             ]
     Sum                    := [0-7: opaque, 8-11: float]

   If Swift instead chose to just use a discriminator byte, the layout
   would look as follows::

     Sum.Yes.payload        := [0-7: i64             ]
     Sum.Yes.discriminator  := [            8: opaque]
     Sum.Yes                := [0-7: i64,   8: opaque]
     Sum.No.payload         := [0-3: float           ]
     Sum.No.discriminator   := [            8: opaque]
     Sum.No                 := [0-3: float, 8: opaque]
     Sum.Maybe              := [            8: opaque]
     Sum                    := [0-8: opaque          ]

   If Swift chose to use spare low (little-endian) bits in the class
   pointer, and to offset the float to make this possible, the layout
   would look as follows::

     Sum.Yes.payload        := [0-7: i64             ]
     Sum.Yes.discriminator  := [0: opaque            ]
     Sum.Yes                := [0-7: opaque          ]
     Sum.No.payload         := [           4-7: float]
     Sum.No.discriminator   := [0: opaque            ]
     Sum.No                 := [0: opaque, 4-7: float]
     Sum.Maybe              := [0: opaque            ]
     Sum                    := [0-7: opaque          ]

 The merge algorithm for typed layouts is as follows.  Consider two
 typed layouts `L` and `R`.  A range from `L` is said to *conflict*
 with a range from `R` if they intersect and they are mapped as
 different non-empty types.  If two ranges conflict, and either range
 is mapped to a vector, replace it with mapped ranges for the vector
 elements.  If two ranges conflict, and neither range is mapped to a
 vector, map them both to opaque, combining them with adjacent opaque
 ranges as necessary.  If a range is mapped to a non-empty type, and
 the bytes in the range are all mapped as empty in the other map, add
 that range-mapping to the other map.  `L` and `R` should now match
 perfectly; this is the result of the merge.  Note that this algorithm
 is both associative and commutative.
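
 A simplified Swift sketch of this merge follows.  It assumes inclusive
 byte ranges with uncovered bytes treated as empty, does not model
 vectors (so the vector-splitting rule is omitted), and reads "conflict"
 as any intersection between ranges that are not identical and not both
 opaque.  All names are hypothetical.

 ```swift
 enum Mapping: Equatable {
     case opaque
     case legal(String)    // e.g. "i64", "float"
 }

 struct MappedRange: Equatable {
     var lo: Int           // first byte, inclusive
     var hi: Int           // last byte, inclusive
     var mapping: Mapping
 }

 /// A typed layout lists its non-empty ranges; uncovered bytes are empty.
 typealias TypedLayout = [MappedRange]

 /// Intersecting ranges conflict unless they are identical or both opaque.
 func conflicts(_ a: MappedRange, _ b: MappedRange) -> Bool {
     guard a.lo <= b.hi && b.lo <= a.hi else { return false }
     if a.mapping == .opaque && b.mapping == .opaque { return false }
     return a != b
 }

 /// Sorts ranges, drops duplicates, and fuses overlapping or adjacent opaque ranges.
 func coalesce(_ ranges: [MappedRange]) -> TypedLayout {
     var out: TypedLayout = []
     for r in ranges.sorted(by: { ($0.lo, $0.hi) < ($1.lo, $1.hi) }) {
         if let last = out.last, last == r { continue }
         if r.mapping == .opaque, let last = out.last, last.mapping == .opaque, r.lo <= last.hi + 1 {
             out[out.count - 1].hi = max(last.hi, r.hi)
         } else {
             out.append(r)
         }
     }
     return out
 }

 func merge(_ l: TypedLayout, _ r: TypedLayout) -> TypedLayout {
     var all = l + r
     // Map conflicting ranges to opaque until none remain; this terminates
     // because every conflict turns at least one legal range opaque.
     var changed = true
     while changed {
         changed = false
         for i in all.indices {
             for j in all.indices where i < j && conflicts(all[i], all[j]) {
                 all[i].mapping = .opaque
                 all[j].mapping = .opaque
                 changed = true
             }
         }
     }
     // Ranges that only overlap empty bytes on the other side carry over as-is.
     return coalesce(all)
 }
 ```

 Applied to the spare-bits `FlaggedPair` layouts, this yields
 `[0-7: opaque, 8-11: float]`, and swapping the arguments gives the same
 answer.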

 Forming a legal type sequence
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 Once the typed layout is constructed, it can be turned into a legal
 type sequence.

 Note that this transformation is sensitive to the offsets of ranges in
 the complete type.  It's possible that the simplifications described
 here could be integrated directly into the construction of the typed
 layout without changing the results, but that's not yet proven.

 In all of these examples, the maximum voluntary integer size is 4
 (`i32`) unless otherwise specified.

 If any range is mapped as a non-empty, non-opaque type, but its start
 offset is not a multiple of its natural alignment, remap it as opaque.
 For these purposes, the natural alignment of an integer type is the
 minimum of its size and the maximum voluntary integer size; the
 natural alignment of any other type is its C ABI alignment.  Combine
 adjacent opaque ranges.

 For example::

   [1-2: i16, 4: i8, 6-7: i16]  ==>  [1-2: opaque, 4: i8, 6-7: i16]

 If any range is mapped as an integer type that is not larger than the
 maximum voluntary size, remap it as opaque.  Combine adjacent opaque
 ranges.

 For example::

   [1-2: opaque, 4: i8, 6-7: i16]  ==>  [1-2: opaque, 4: opaque, 6-7: opaque]
   [0-3: i32, 4-11: i64, 12-13: i16]  ==>  [0-3: opaque, 4-11: i64, 12-13: opaque]
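
 The two remapping rules can be sketched as follows, assuming inclusive
 byte ranges and a hypothetical `Kind` type in which non-integer legal
 types carry their C ABI alignment explicitly.  Run on the first example
 above with a maximum voluntary size of 4, the first rule yields
 `[1-2: opaque, 4: i8, 6-7: i16]` and the second makes all three ranges
 opaque.

 ```swift
 enum Kind: Equatable {
     case opaque
     case integer(bytes: Int)
     case other(name: String, alignment: Int)   // float, vector, ...: C ABI alignment
 }

 struct MappedRange: Equatable {
     var lo: Int      // first byte, inclusive
     var hi: Int      // last byte, inclusive
     var kind: Kind
 }

 func naturalAlignment(_ kind: Kind, maxVoluntary: Int) -> Int {
     switch kind {
     case .integer(let bytes): return min(bytes, maxVoluntary)
     case .other(_, let alignment): return alignment
     case .opaque: return 1
     }
 }

 /// Fuses opaque ranges that touch end-to-start.
 func combineAdjacentOpaque(_ ranges: [MappedRange]) -> [MappedRange] {
     var out: [MappedRange] = []
     for r in ranges.sorted(by: { $0.lo < $1.lo }) {
         if r.kind == .opaque, let last = out.last, last.kind == .opaque, last.hi + 1 == r.lo {
             out[out.count - 1].hi = r.hi
         } else {
             out.append(r)
         }
     }
     return out
 }

 /// Rule 1: a non-opaque range whose start isn't naturally aligned becomes opaque.
 func remapMisaligned(_ ranges: [MappedRange], maxVoluntary: Int) -> [MappedRange] {
     return combineAdjacentOpaque(ranges.map { (range: MappedRange) -> MappedRange in
         var r = range
         if r.kind != .opaque && r.lo % naturalAlignment(r.kind, maxVoluntary: maxVoluntary) != 0 {
             r.kind = .opaque
         }
         return r
     })
 }

 /// Rule 2: integers no larger than the maximum voluntary size become opaque.
 func remapSmallIntegers(_ ranges: [MappedRange], maxVoluntary: Int) -> [MappedRange] {
     return combineAdjacentOpaque(ranges.map { (range: MappedRange) -> MappedRange in
         var r = range
         if case .integer(let bytes) = r.kind, bytes <= maxVoluntary {
             r.kind = .opaque
         }
         return r
     })
 }
 ```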

 An *aligned storage unit* is an N-byte-aligned range of N bytes, where
 N is a power of 2 no greater than the maximum voluntary integer size.
 A *maximal* aligned storage unit has a size equal to the maximum
 voluntary integer size.

 Note that any remaining ranges mapped as integers must fully occupy
 multiple maximal aligned storage units.

 Split all opaque ranges at the boundaries of maximal aligned storage
 units.  From this point on, never combine adjacent opaque ranges
 across these boundaries.

 For example::

   [1-6: opaque]  ==> [1-3: opaque, 4-6: opaque]
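
 Splitting can be sketched as a walk over the range, one maximal aligned
 storage unit at a time (hypothetical names; offsets are assumed to be
 non-negative).

 ```swift
 struct OpaqueRange: Equatable {
     var lo: Int   // first byte, inclusive
     var hi: Int   // last byte, inclusive
 }

 /// Splits an opaque range wherever it crosses the boundary of a maximal
 /// aligned storage unit (a `maxVoluntary`-aligned run of `maxVoluntary` bytes).
 func splitAtUnitBoundaries(_ range: OpaqueRange, maxVoluntary: Int) -> [OpaqueRange] {
     var pieces: [OpaqueRange] = []
     var lo = range.lo
     while lo <= range.hi {
         // Last byte of the maximal unit that contains `lo`.
         let unitEnd = (lo / maxVoluntary + 1) * maxVoluntary - 1
         let hi = min(unitEnd, range.hi)
         pieces.append(OpaqueRange(lo: lo, hi: hi))
         lo = hi + 1
     }
     return pieces
 }
 ```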

 Within each maximal aligned storage unit, find the smallest aligned
 storage unit which contains all the opaque ranges.  Replace the first
 opaque range in the maximal aligned storage unit with a mapping from
 that aligned storage unit to an integer of the aligned storage unit's
 size.  Remove any other opaque ranges in the maximal aligned storage
 unit.  Note that this can create overlapping ranges in some cases.
 For the purposes of this calculation, the last maximal aligned
 storage unit should be considered "full", as if the type had an
 infinite amount of empty tail-padding.

 For example::

   [1-2: opaque]  ==>  [0-3: i32]
   [0-1: opaque]  ==>  [0-1: i16]
   [0: opaque, 2: opaque]  ==>  [0-3: i32]
   [0-9: fp80, 10: opaque]  ==>  [0-9: fp80, 10: i8]

   // If maximum voluntary size is 8 (i64):
   [0-9: fp80, 11: opaque, 13: opaque]  ==>  [0-9: fp80, 8-15: i64]

 (This assumes that `fp80` is a legal type for illustrative purposes.
 It would probably be a better policy for the actual x86-64 target to
 consider it illegal and treat it as opaque from the start, at least
 when lowering for the Swift calling convention; for C, it is important
 to produce an `fp80` mapping for ABI interoperation with C functions
 that take or return `long double` by value.)

 The final legal type sequence is the sequence of types for the
 non-empty ranges in the map.  The associated offset for each type is
 the offset of the start of the corresponding range.
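
 The replacement step and the final sequence can be sketched as follows.
 This assumes the opaque ranges have already been split at maximal-unit
 boundaries and that the surviving non-opaque entries are passed in
 separately; the names, and spelling integer types as `i` plus a bit
 count, are illustrative only.

 ```swift
 struct OpaqueRange { var lo: Int; var hi: Int }      // inclusive byte range
 struct IntegerUnit: Equatable { var offset: Int; var bytes: Int }
 struct SequenceEntry: Equatable { var type: String; var offset: Int }

 /// Smallest N-byte-aligned run of N bytes (N a power of two, at most
 /// `maxVoluntary`) containing lo...hi.  Assumes lo...hi lies within one
 /// maximal aligned storage unit, so N == maxVoluntary always suffices.
 func smallestAlignedUnit(lo: Int, hi: Int, maxVoluntary: Int) -> IntegerUnit {
     var size = 1
     while size < maxVoluntary && lo / size != hi / size {
         size *= 2
     }
     return IntegerUnit(offset: lo / size * size, bytes: size)
 }

 /// One integer per maximal unit that holds opaque bytes.  The object's size
 /// is never consulted, which matches treating the last unit as "full".
 func integersForOpaque(_ ranges: [OpaqueRange], maxVoluntary: Int) -> [IntegerUnit] {
     var spanByUnit: [Int: OpaqueRange] = [:]
     for r in ranges {
         let unit = r.lo / maxVoluntary
         if let s = spanByUnit[unit] {
             spanByUnit[unit] = OpaqueRange(lo: min(s.lo, r.lo), hi: max(s.hi, r.hi))
         } else {
             spanByUnit[unit] = r
         }
     }
     return spanByUnit.keys.sorted().map { unit in
         let span = spanByUnit[unit]!
         return smallestAlignedUnit(lo: span.lo, hi: span.hi, maxVoluntary: maxVoluntary)
     }
 }

 /// Final step: the kept entries plus the new integers, ordered by offset.
 func legalTypeSequence(kept: [SequenceEntry], opaque: [OpaqueRange],
                        maxVoluntary: Int) -> [SequenceEntry] {
     let ints = integersForOpaque(opaque, maxVoluntary: maxVoluntary).map {
         SequenceEntry(type: "i\($0.bytes * 8)", offset: $0.offset)
     }
     return (kept + ints).sorted { $0.offset < $1.offset }
 }
 ```

 With a maximum voluntary size of 8, an `fp80` at offset 0 plus opaque
 bytes 11 and 13 yields `[0: fp80, 8: i64]`, reproducing the overlapping
 example above.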

 Only the final step can introduce overlapping ranges, and this is only
 possible if there's a non-integer legal type which:

 * has a natural alignment less than half of the size of the maximum
   voluntary integer size or

 * has a store size that is not a multiple of half the size of the maximum
   voluntary integer size.

 On our supported platforms, these conditions are only true on x86-64,
 and only of `long double`.

 Deconstruction and Reconstruction
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 Given the address of an object and a legal type sequence for its type,
 it's straightforward to load a valid sequence or store the sequence
 back into memory.  For the most part, it's sufficient to simply load
 or store each value at its appropriate offset.  There are two
 subtleties:

 * If the legal type sequence had any overlapping ranges, the integer
   values should be stored first to prevent overwriting parts of the
   other values they overlap.

 * Care must be taken with the final values in the sequence; integer
   values may extend slightly beyond the ordinary storage size of the
   argument type.  This is usually easy to compensate for.

 The value sequence essentially has the same semantics that the value
 in memory would have: any bits that aren't part of the actual
 representation of the original type have a completely unspecified
 value.
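
 A byte-level sketch of both directions, assuming a hypothetical
 `LegalValue` that carries each value's raw little-endian bytes rather
 than a typed value.  It applies both subtleties: integers are stored
 first, and bytes that fall past the end of the object are dropped on
 store and read back as zero (standing in for "unspecified") on load.

 ```swift
 enum ValueKind { case integer, nonInteger }

 /// One value of a deconstructed object: kind, byte offset, raw bytes.
 struct LegalValue {
     var kind: ValueKind
     var offset: Int
     var bytes: [UInt8]
 }

 /// Reconstruction: store every value at its offset.  Integers go first so
 /// that an overlapping non-integer value (such as an fp80) is written last.
 func store(_ values: [LegalValue], into memory: inout [UInt8]) {
     let ordered = values.filter { $0.kind == .integer } + values.filter { $0.kind != .integer }
     for value in ordered {
         // A trailing integer may extend past the object; drop those bytes.
         for (i, byte) in value.bytes.enumerated() where value.offset + i < memory.count {
             memory[value.offset + i] = byte
         }
     }
 }

 /// Deconstruction: load `size` bytes at `offset`.
 func load(_ kind: ValueKind, offset: Int, size: Int, from memory: [UInt8]) -> LegalValue {
     let bytes = (0..<size).map { i -> UInt8 in
         offset + i < memory.count ? memory[offset + i] : 0
     }
     return LegalValue(kind: kind, offset: offset, bytes: bytes)
 }
 ```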

 Forming a C function signature
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 As mentioned before, in principle the process of physical lowering
 turns a semantically-lowered Swift function type (in implementation
 terms, a SILFunctionType) into a C function signature, which can then
 be lowered according to the usual rules for the ABI.  This is, in
 fact, what we do when trying to match a C calling convention.
 However, for the native Swift calling convention, because we actively
 want to use more aggressive rules for results, we instead build an
 LLVM function type directly.  We first construct a direct result type
 that we're certain the backend knows how to interpret according to our
 more aggressive desired rules, and then we use the expansion algorithm
 to construct a parameter sequence consisting solely of types with
 obvious ABI lowering that the backend can reliably handle.  This
 bypasses the need to consult Clang for our own native calling
 convention.

 We have this generic expansion algorithm, but it's important to
 understand that the physical lowering process does not just naively
 use the results of this algorithm.  The expansion algorithm will
 happily expand an arbitrary structure; if that structure is very
 large, the algorithm might turn it into hundreds of values.  It would
 be foolish to pass it as an argument that way; it would use up all the
 argument registers and basically turn into a very inefficient memcpy,
 and if the callee wanted it all in one place, it would have to
 painstakingly reassemble it.  It's much better to pass large structures
 indirectly.  And with result values, we really just don't have a
 choice; there are only so many registers you can use before you have to
 give up and return indirectly.  Therefore, even in the Swift native
 convention, the expansion algorithm is basically used as a first pass.
 A second pass then decides whether the expanded sequence is actually
 reasonable to pass directly.

 Recall that one aspect of the semantically-lowered Swift function type
 is whether we should be matching the C calling convention or not.  The
 following algorithm assumes that the importer and semantic
 lowering have conspired in a very particular way to make that
 possible.  Specifically, we assume that an imported C function
 type, lowered semantically by Swift, will follow some simple
 structural rules:

 * If there was a by-value `struct` or `union` parameter or result in
   the imported C type, it will correspond to a by-value direct
   parameter or return type in Swift, and the Swift type will be a
   nominal type whose declaration links back to the original C
   declaration.

 * Any other parameter or result will be transformed by the importer
   and semantic lowering to a type that the generic expansion algorithm
   will expand to a single legal type whose representation is
   ABI-compatible with the original parameter.  For example, an
   imported pointer type will eventually expand to an integer of
   pointer size.

 * There will be at most one result in the lowered Swift type, and it
   will be direct.

 Given this, we go about lowering the function type as follows.  Recall
 that, when matching the C calling convention, we're building a C
 function type; but that when matching the Swift native calling
 convention, we're building an LLVM function type directly.

 Results
 ^^^^^^^

 The first step is to consider the results of the function.

 There's a different set of rules here when we're matching the C
 calling convention.  If there's a single direct result type, and it's
 a nominal type imported from Clang, then the result type of the C
 function type is that imported Clang type.  Otherwise, concatenate the
 legal type sequences from the direct results.  If this yields an empty
 sequence, the result type is `void`.  If it yields a single legal
 type, the result type is the corresponding Clang type.  No other case could
 actually have come from an imported C declaration, so we don't have
 any real compatibility requirements; for the convenience of
 interoperation, this is handled by constructing a new C struct which
 contains the corresponding Clang types for the legal type sequence as
 its fields.

 Otherwise, we are matching the Swift calling convention.  Concatenate
 the legal type sequences from all the direct results.  If
 target-specific logic decides that this is an acceptable collection to
 return directly, construct the appropriate IR result type to convince
 the backend to handle it.  Otherwise, use the `void` IR result type
 and return the "direct" results indirectly by passing the address of a
 tuple combining the original direct results (*not* the types from the
 legal type sequence).

 Finally, any indirect results from the semantically-lowered function
 type are simply added as pointer parameters.

 Parameters
 ^^^^^^^^^^

 After all the results are collected, it's time to collect the
 parameters.  This is done one at a time, from left to right, adding
 parameters to our physically-lowered type.

 If semantic lowering has decided that we have to pass the parameter
 indirectly, we simply add a pointer to the type.  This covers both
 mandatory-indirect pass-by-value parameters and pass-by-reference
 parameters.  The latter can arise even in C and Objective-C.

 Otherwise, the rules are somewhat different if we're matching the C
 calling convention.  If the parameter is a nominal type imported from
 Clang, then we just add the imported Clang type to the Clang function
 type as a parameter.  Otherwise, we derive the legal type sequence for
 the parameter type.  Again, we should only have compatibility
 requirements if the legal type sequence has a single element, but for
 the convenience of interoperation, we collect the corresponding Clang
 types for all of the elements of the sequence.

 Finally, if we're matching the Swift calling convention, derive the
 legal type sequence.  If the result appears to be a reasonably small
 and efficient set of parameters, add their corresponding IR types to
 the function type we're building; otherwise, ignore the legal type
 sequence and pass the address of the original type indirectly.
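
 For the native Swift convention, the result and parameter rules can be
 sketched together.  This is an illustrative model, not the compiler's
 code: each semantically lowered value is reduced to its indirectness
 and a precomputed legal type sequence, and the hypothetical
 `reasonable` predicate stands in for the target-specific decision about
 what may be passed or returned directly.  The C-convention branch,
 which defers to Clang types, is not modeled.

 ```swift
 /// Toy IR types for the physically lowered signature.
 enum IRType: Equatable {
     case legal(String)    // a legal type, e.g. "i64" or "float"
     case pointer
 }

 /// A semantically lowered parameter or result, reduced to what physical
 /// lowering needs here.
 struct SemanticValue {
     var isIndirect: Bool
     var legalSequence: [String]
 }

 struct IRSignature: Equatable {
     var results: [IRType]        // empty means `void`
     var parameters: [IRType]
 }

 func lowerNative(results: [SemanticValue], parameters: [SemanticValue],
                  reasonable: ([String]) -> Bool) -> IRSignature {
     var sig = IRSignature(results: [], parameters: [])
     // Results: concatenate the legal sequences of all direct results.
     let direct = results.filter { !$0.isIndirect }.flatMap { $0.legalSequence }
     if direct.isEmpty || reasonable(direct) {
         sig.results = direct.map { IRType.legal($0) }
     } else {
         // `void` result; the caller passes the address of a tuple of the
         // original direct results instead.
         sig.parameters.append(.pointer)
     }
     // Indirect results are simply pointer parameters.
     for result in results where result.isIndirect {
         sig.parameters.append(.pointer)
     }
     // Parameters, left to right.
     for param in parameters {
         if param.isIndirect || !reasonable(param.legalSequence) {
             sig.parameters.append(.pointer)
         } else {
             sig.parameters.append(contentsOf: param.legalSequence.map { IRType.legal($0) })
         }
     }
     return sig
 }
 ```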

 Considerations for whether a legal type sequence is reasonable to pass
 directly:

 * There probably ought to be a maximum size.  Unless it's a single
   256-bit vector, it's hard to imagine wanting to pass more than, say,
   32 bytes of data as individual values.  The callee may decide that
   it needs to reconstruct the value for some reason, and the larger
   the type gets, the more expensive this is.  It may also be
   reasonable for this cap to be lower on 32-bit targets, but that
   might be dealt with better by the next restriction.

 * There should also be a cap on the number of values.  A 32-byte limit
   might be reasonable for passing 4 doubles.  It's probably not
   reasonable for passing 8 pointers.  That many values will exhaust
   all the parameter registers for just a single value.  4 is probably
   a reasonable cap here.

 * There's no reason to require the data to be homogeneous.  If a
   struct contains three floats and a pointer, why force it to be
   passed in memory?
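
 A minimal sketch of such a check, using the figures suggested above (at
 most 32 bytes and at most 4 values) as hypothetical defaults rather
 than settled limits.  Only element sizes are considered, so
 heterogeneous sequences are accepted as readily as homogeneous ones; a
 32-bit target could simply pass smaller caps.

 ```swift
 /// `elementSizes` holds the size in bytes of each element of an expanded
 /// legal type sequence.  Element kinds are ignored on purpose.
 func isReasonableToPassDirectly(_ elementSizes: [Int],
                                 maxBytes: Int = 32,
                                 maxValues: Int = 4) -> Bool {
     return elementSizes.count <= maxValues && elementSizes.reduce(0, +) <= maxBytes
 }
 ```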

 When all of the parameters have been processed in this manner,
 the function type is complete.