docs/proposals/Accessors.rst - third_party/swift - Git at Google

 :orphan:

 Optimizing Accessors in Swift
 =============================

 Definitions
 -----------

 An abstract storage declaration is a language construct that declares
 a means of accessing some sort of abstract entity.  I'll just say
 "storage" hereafter.

 Swift provides three storage declarations:

 * a single named entity, called a *variable* and declared with ``var``
 * a single named entity which can never be reassigned, called a *constant* and declared with ``let``
 * a compound unnamed entity accessed with an index, called a *subscript* and declared with ``subscript``

 These features are similar to those in other languages.  Swift notably
 lacks compound named entities, such as C#'s indexed properties; the
 design team intentionally chose to favor the use of named single
 entities of subscriptable type.

 It's useful to lump the two kinds of single named entities together;
 I'll just call them both "variables" hereafter.

 Subscripts must always be instance type members.  When a variable is
 a type member, it's called a *property*.

 When I say that these entities are *abstract*, I mean that they're not
 directly tied to any particular implementation.  All of them may be
 backed directly by memory storing values of the entity's type, or they
 may simply provide a way of invoking arbitrary code to access a
 logical memory, or they may be a little of both.

 Full-value accesses
 -------------------

 All accesses to storage, no matter the implementation, can be performed
 with two primitive operations:

 * a full-value load, which creates a new copy of the current value
 * a full-value store, which overwrites the current value with a
   different, independent value

 A function which implements a full-value load is called a *getter*;
 a full-value store, a *setter*.

 An operation which calls for a full-value load into a temporary, then
 a modification of the temporary, then a full-value store of the
 temporary into the original entity, is called a *write-back*.

 Implementing accesses with full-value accesses introduces two
 problems: subobject clobbering and performance.

 Subobject clobbering
 ~~~~~~~~~~~~~~~~~~~~

 Subobject clobbering is a semantic issue.  It occurs when there are
 two changes to an entity in flight at once, as in::

    swap(&point.x, &point.y)

 (By "at once", I mean synchronously.  Unlike Java, Swift is willing to
 say that an *asynchronous* simultaneous access (mutating or not) to an
 entity that's being modified is completely undefined behavior, with
 any sort of failure permitted, up to and including memory corruption
 and crashes.  As Swift transitions towards being a systems language,
 it can selectively choose to define some of this behavior,
 e.g. allowing racing accesses to different elements of an array.)

 Subobject clobbering is a problem with user expectation.  Users are
 unlikely to write code that intentionally modifies two obviously
 overlapping entities, but they might very reasonably write code that
 modifies two "separate" sub-entities.  For example, they might write
 code to swap two properties of a struct, or two elements of an array.
 The natural expansion of these subobject accesses using whole-object
 accesses generates code that "loses" the changes made to one of the
 objects::

   var point0 = point
   var x = point0.x
   var point1 = point
   var y = point1.y
   swap(&x, &y)
   point1.y = y
   point = point1
   point0.x = x
   point = point0

 Note that ``point.y`` is left unchanged.

 Local analysis
 ^^^^^^^^^^^^^^

 There are two straightforward solutions to subobject clobbering, both
 reliant on doing local analysis to recognize two accesses which
 obviously alias (like the two references to ``point`` in the above
 example).  Once you've done this, you can either:

 * Emit an error, outlawing such code.  This is what Swift currently
   does (but only when the aliasing access must be implemented with
   full-value loads and stores).
 * Use a single temporary, potentially changing the semantics.

 It's impossible to guarantee the absence of subobject clobbering with
 this analysis without extremely heavy-handed languages changes.
 Fortunately, subobject clobbering is "only" a semantics issue, not a
 memory-safety issue, at least as long as it's implemented with
 full-value accesses.

 Reprojection
 ^^^^^^^^^^^^

 A more general solution is to re-project the modified subobject from
 scratch before writing it back.  That is, you first acquire an initial
 value like normal for the subobject you wish to modify, then modify
 that temporary copy in-place.  But instead of then recursively
 consuming all the intermediate temporaries when writing them back, you
 drop them all and recompute the current value, then write the modified
 subobject to it, then write back all the way up.  That is::

   var point0 = point
   var x = point0.x
   var point1 = point
   var y = point1.y
   swap(&x, &y)
   point1 = point      // reload point1
   point1.y = y
   point = point1
   point0 = point      // reload point0
   point0.x = x
   point = point0

 In this example, I've applied the solution consistently to all
 accesses, which protects against unseen modifications (e.g. during the
 call to ``swap``) at the cost of performing two extra full-value
 loads.

 You can heuristically lower this by combining it with a simple local
 analysis and only re-projecting when writing back to other l-values
 besides the last.  In other words, generate code that will work as
 long as the entity is not modified behind abstraction barriers or
 through unexpected aliases::

   var point0 = point
   var x = point0.x
   var point1 = point
   var y = point1.y
   swap(&x, &y)
   point1.y = y        // do not reload point1
   point = point1
   point0 = point      // reload point0
   point0.x = x
   point = point0

 Note that, in either solution, you've introduced extra full-value
 loads.  This may be quite expensive, and it's not guaranteed to be
 semantically equivalent.

 Performance
 ~~~~~~~~~~~

 There are three major reasons why full-value accesses are inefficient.

 Unnecessary subobject accesses
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 The first is that they may load or store more than is necessary.

 As an obvious example, imagine a variable of type ``(Int, Int)``; even
 if my code only accesses the first element of the tuple, full-value
 accesses force me to read or write the second element as well.  That
 means that, even if I'm purely overwriting the first element, I
 actually have to perform a full-value load first so that I know what
 value to use for the second element when performing the full-value
 store.

 Additionally, while unnecessarily loading the second element of an
 ``(Int, Int)`` pair might seem trivial, consider that the tuple could
 actually have twenty elements, or that the second element might be
 non-trivial to copy (e.g. if it's a retainable pointer).

 Abstraction barriers
 ^^^^^^^^^^^^^^^^^^^^

 A full-value load or store which you can completely reason about is one
 thing, but if it has to be performed as a call, it can be a major
 performance drag.

 For one, calls do carry a significant amount of low-level overhead.

 For another, optimizers must be extremely conservative about what a
 call might do.  A retainable pointer might have to be retained and
 later released purely to protect against the possibility that a getter
 might, somehow, cause the pointer to otherwise be deallocated.

 Furthermore, the conventions of the call might restrict performance.
 One way or another, a getter for a retainable pointer generally
 returns at +1, meaning that as part of the return, it is retained,
 forcing the caller to later release.  If the access were instead
 direct to memory, this retain might be avoidable, depending on what
 the caller does with the pointer.

 Copy-on-write
 ^^^^^^^^^^^^^

 These problems are compounded by copy-on-write (COW) types.  In Swift,
 a copy-on-write value embeds an object reference.  Copying the value
 has low immediate cost, because it simply retains the existing
 reference.  However, modifying a value requires the reference to be
 made unique, generally by copying the data held by the value into a
 fresh object.  I'll call this operation a *structural copy* in an
 effort to avoid the more treacherous term "deep copy".

 COW types are problematic with full-value accesses for several reasons.

 First, COW types are often used to implement aggregates and thus often
 have several distinguishable subobjects which users are likely to
 think of as independent.  This heightens the dangers of subobject
 clobbering.

 Second, a full-value load of a COW type implies making the object
 reference non-unique.  Changing the value at this point will force a
 structural copy.  This means that modifying a temporary copy has
 dramatically worse performance compared to modifying the original
 entity in-place.  For example::

   window.name += " (closing)"

 If ``&window.name`` can be passed directly to the operator, and the
 string buffer is uniquely referenced by that string, then this
 operation may be as cheap as copying a few characters into the tail of
 the buffer.  But if this must be done with a write-back, then the
 temporary will never have a unique reference, and there will always
 be an unneeded structural copy.

 Conservative access patterns
 ----------------------------

 When you know how storage is implemented, it's straightforward to
 generate an optimal access to it.  There are several major reasons why
 you might not know how a storage declaration is implemented, though:

 * It might be an abstract declaration, not a concrete declaration.
   Currently this means a protocol member, but Swift may someday add
   abstract class members.

 * It might be a non-final class member, where the implementation you
   can see is potentially overridable by a subclass.

 * It might be a resilient declaration, where you know only that the
   entity exists and know nothing statically about its implementation.

 In all of these cases, you must generate code that will handle the
 worst possible case, which is that the entity is implemented with a
 getter and a setter.  Therefore, the conservative access pattern
 includes opaque getter and setter functions.

 However, for all the reasons discussed above, using unnecessary
 full-value accesses can be terrible for performance.  It's really bad
 if a little conservatism --- e.g. because Swift failed to devirtualize
 a property access --- causes asymptotic inefficiencies.  Therefore,
 Swift's native conservative access pattern also includes a third
 accessor which permits direct access to storage when possible.  This
 accessor is called ``materializeForSet``.

 ``materializeForSet`` receives an extra argument, which is an
 uninitialized buffer of the value type, and it returns a pointer and a
 flag.  When it can provide direct access to storage for the entity, it
 constructs a pointer to the storage and returns false.  When it can't,
 it performs a full-value load into the buffer and returns true.  The
 caller performs the modification in-place on the returned pointer and
 then, if the flag is true, passes the value to the setter.

 The overall effect is to enable direct storage access as a dynamic
 optimization when it's impossible as a static optimization.

 For now, ``materializeForSet`` is always automatically generated based
 on whether the entity is implemented with a computed setter.  It is
 possible to imagine data structures that would benefit from having
 this lifted to a user-definable feature; for example, a data structure
 which sometimes holds its elements in memory but sometimes does not.

 ``materializeForSet`` can provide direct access whenever an address
 for the storage can be derived.  This includes when the storage is
 implemented with a ``mutableAddress`` accessor, as covered below.
 Observing accessors currently prevent ``materializeForSet`` from
 offering direct access; that's fixable for ``didSet`` using a slightly
 different code pattern, but ``willSet`` is an inherent obstacle.

 Independent of any of the other optimizations discussed in this
 whitepaper, ``materializeForSet`` had the potential to immediately
 optimize the extremely important case of mutations to COW values in
 un-devirtualized class properties, with fairly minimal risk.
 Therefore, ``materializeForSet`` was implemented first, and it shipped
 in Xcode 6.1.

 Direct access at computed addresses
 -----------------------------------

 What entities can be directly accessed in memory?  Non-computed
 variables make up an extremely important set of cases; Swift has
 enough built-in knowledge to know that it can provide direct access to
 them.  But there are a number of other important cases where the
 address of an entity is not built-in to the compiler, but where direct
 access is nonetheless possible.  For example, elements of a simple
 array always have independent storage in memory.  Most benchmarks on
 arrays would profit from being able to modify array elements in-place.

 There's a long chain of proposals in this area, many of which are
 refinement on previous proposals.  None of these proposals has yet
 shipped in Xcode.

 Addressors
 ~~~~~~~~~~

 For something like a simple array (or any similar structure, like a
 deque) which is always backed by a buffer, it makes sense for the
 implementor to simply define accessors which return the address of
 the element.  Such accessors are called *addressors*, and there are
 two: ``address`` and ``mutableAddress``.

 The conservative access pattern can be generated very easily from
 this: the getter calls ``address`` and loads from it, the setter calls
 ``mutableAddress`` and stores to it, and ``materializeForSet``
 provides direct access to the address returned from
 ``mutableAddress``.

 If the entity has type ``T``, then ``address`` returns an
 ``UnsafePointer<T>`` and ``mutableAddress`` returns an
 ``UnsafeMutablePointer<T>``.  This means that the formal type of the
 entity must exactly match the formal type of the storage.  Thus, the
 standard subscript on ``Dictionary<K,V>`` cannot be implemented using
 addressors, because the formal type of the entity is ``V?``, but the
 backing storage holds a ``V``.  (And this is in keeping with user
 expectations about the data structure: assigning ``nil`` at a key is
 supposed to erase any existing entry there, not create a new entry to
 hold ``nil``.)

 This simple addressor proposal was the first prong of our efforts to
 optimize array element access.  Unfortunately, while it is useful for
 several other types (such as ``ContiguousArray`` and
 ``UnsafeMutablePointer``), it is not flexible enough for the ``Array``
 type.

 Mixed addressors
 ~~~~~~~~~~~~~~~~

 Swift's chief ``Array`` type is only a simple array when it is not
 interacting with Objective-C.  Type bridging requires ``Array`` to be
 able to store an immutable ``NSArray`` instance, and the ``NSArray``
 interface does not expose the details of how it stores elements.  An
 ``NSArray`` is even permitted to dynamically generate its values in
 its ``objectAtIndex:`` method.  And it would be absurd for ``Array``
 to perform a structural copy during a load just to make non-mutating
 accesses more efficient!  So the load access pattern for ``Array``'s
 subscript declaration must use a getter.

 Fortunately, this requirement does not preclude using an addressor for
 mutating accesses.  Mutations to ``Array`` always transition the array
 to a unique contiguous buffer representation as their first step.
 This means that the subscript operator can sensibly return an address
 when it's used for the purposes of mutation: in other words, exactly
 when ``mutableAddress`` would be invoked.

 Therefore, the second prong of our efforts to optimize array element
 access was to allow entities to be implemented with the combination of
 a ``get`` accessor and a ``mutableAddress`` accessor.  This is
 straightforward in the user model, where it simply means lifting a
 restriction.  It's more complex behind the scenes because it broke
 what was previously a clean conceptual division between "physical" and
 "logical" l-values.

 Mixed addressors have now been adopted by ``Array`` to great success.
 As expected, they substantially improved performance mutating COW
 array elements.  But they also fix an important instance of subobject
 clobbering, because modifications to different subobjects (notably,
 different elements of the same array) can occur simultaneously by
 simply projecting out their addresses in the unique buffer.  For
 example, this means that it's possible to simply swap two elements
 of an array directly::

   swap(&array[i], &array[j])

   // Expanded:
   array.transitionToUniquelyReferenced()
   let address_i = array.buffer.storage + i
   array.transitionToUniquelyReferenced()
   let address_j = array.buffer.storage + j
   swap(address_i, address_j)

 Mixed addressors weren't completely implemented until very close to
 the Xcode 6.1 deadline, and they changed code-generation patterns
 enough to break a number of important array-specific optimizations.
 Therefore, the team sensibly decided that they were too risky for that
 release, and that there wasn't enough benefit from other applications
 to justify including any of the addressor work.

 In a way, that was a fortunate decision, because the naive version of
 addressors implemented so far in Swift creates a safety hole which
 would otherwise have been exposed to users.

 Memory unsafety of addressors
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 The semantics and memory safety of operations on COW types rely on a
 pair of simple rules:

 * A non-mutating operation must own a reference to the buffer for
   the full course of the read.

 * A mutating operation must own a unique reference to the buffer
   for the full course of the mutation.

 Both rules tend to be naturally satisfied by the way that operations
 are organized into methods.  A value must own a reference to its
 buffer at the moment that a method is invoked on it.  A mutating
 operation immediately transitions the buffer to a unique reference,
 performing a structural copy if necessary.  This reference will remain
 valid for the rest of the method as long as the method is *atomic*: as
 long as it does not synchronously invoke arbitrary user code.

 (This is a single-threaded notion of atomicity.  A second thread which
 modifies the value simultaneously can clearly invalidate the
 assumption.  But that would necessarily be a data race, and the
 language design team is willing to say that such races have fully
 undefined behavior, and arbitrary consequences like memory corruption
 and crashes are acceptable in their wake.)

 However, addressors are not atomic in this way: they return an address
 to the caller, which may then interleave arbitrary code before
 completing the operation.  This can present the opportunity for
 corruption if the interleaved code modifies the original value.
 Consider the following code::

   func operate(value: inout Int, count: Int) { ... }

   var array: [Int] = [1,2,3,4]
   operate(&array[0], { array = []; return 0 }())

 The dynamic sequence of operations performed here will expand like so::

   var array: [Int] = [1,2,3,4]
   let address = array.subscript.mutableAddress(0)
   array = []
   operate(address, 0)

 The assignment to ``array`` within the closure will release the buffer
 containing ``address``, thus passing ``operate`` a dangling pointer.

 Nor can this be fixed with a purely local analysis; consider::

   class C { var array: [Int] }
   let global_C = C()

   func assign(value: inout Int) {
     C.array = []
     value = 0
   }

   assign(&global_C.array[0])

 Fixing the memory safety hole
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 Conceptually, the correct fix is to guarantee that the rules are
 satisfied by ensuring that the buffer is retained for the duration of
 the operation.  Any interleaving modifications will then see a
 non-uniquely-referenced buffer and perform a structural copy::

   // Project the array element.
   let address = array.subscript.mutableAddress(0)

   // Remember the new buffer value and keep it retained.
   let newArrayBuffer = array.buffer
   retain(newArrayBuffer)

   // Reassign the variable.
   release(array.buffer)
   array.buffer = ...

   // Perform the mutation.  These changes will be silently lost, but
   // they at least won't be using deallocated memory.
   operate(address, 0)

   // Release the "new" buffer.
   release(newArrayBuffer)

 Note that this still leaves a semantic hole if the original value is
 copied in interleaving code before the modification, because the
 subsequent modification will be reflected in the copy::

   // Project the array element.
   let address = array.subscript.mutableAddress(0)

   // Remember the new buffer value and keep it retained.
   let newArrayBuffer = array.buffer
   retain(newArrayBuffer)

   // Copy the value.  Note that arrayCopy uses the same buffer that
   // 'address' points into.
   let arrayCopy = array
   retain(arrayCopy.buffer)

   // Perform the mutation.
   operate(address, 0)

   // Release the "new" buffer.
   release(newArrayBuffer)

 This might be unexpected behavior, but the language team is willing to
 accept unexpected behavior for this code.  What's non-negotiable is
 breaking memory safety.

 Unfortunately, applying this fix naively reintroduces the problem of
 subobject clobbering: since a modification of one subobject
 immediately retains a buffer that's global to the entire value, an
 interleaved modification of a different subobject will see a
 non-unique buffer reference and therefore perform a structural copy.
 The modifications to the first subobject will therefore be silently
 lost.

 Unlike the interleaving copy case, this is seen as unacceptable.
 Notably, it breaks swapping two array elements::

   // Original:
   swap(&array[i], &array[j])

   // Expanded:

   // Project array[i].
   array.transitionToUniquelyReferenced()
   let address_i = array.buffer.storage + i
   let newArrayBuffer_i = array.buffer
   retain(newArrayBuffer_i)

   // Project array[j].  Note that this transition is guaranteed
   // to have to do a structural copy.
   array.transitionToUniquelyReferenced()
   let address_j = array.buffer.storage + j
   let newArrayBuffer_j = array.buffer
   retain(newArrayBuffer_j)

   // Perform the mutations.
   swap(address_i, address_j)

   // Balance out the retains.
   release(newArrayBuffer_j)
   release(newArrayBuffer_i)

 get- and setForMutation
 ~~~~~~~~~~~~~~~~~~~~~~~

 Some collections need finer-grained control over the entire mutation
 process. For instance, to support divide-and-conquer algorithms using
 slices, sliceable collections must "pin" and "unpin" their buffers
 while a slice is being mutated to grant permission for the slice
 to mutate the collection in-place while sharing ownership. This
 flexibility can be exposed by a pair of accessors that are called
 before and after a mutation. The "get" stage produces both the
 value to mutate and a state value (whose type must be declared) to
 forward to the "set" stage. A pinning accessor can then look something
 like this::

   extension Array {
     subscript(range: Range<Int>) -> Slice<Element> {
       // `getForMutation` must declare its return value, a pair of both
       // the value to mutate and a state value that is passed to
       // `setForMutation`.
       getForMutation() -> (Slice<Element>, PinToken) {
         let slice = _makeSlice(range)
         let pinToken = _pin()
         return (slice, pinToken)
       }

       // `setForMutation` receives two arguments--the result of the
       // mutation to write back, and the state value returned by
       // `getForMutation`.
       setForMutation(slice, pinToken) {
         _unpin(pinToken)
         _writeSlice(slice, backToRange: range)
       }
     }
   }

 ``getForMutation`` and ``setForMutation`` must appear as a pair;
 neither one is valid on its own. A ``get`` and ``set`` accessor
 should also still be provided for simple read and write operations.
 When the compiler has visibility that storage is implemented in
 terms of ``getForMutation`` and ``setForMutation``, it lowers a mutable
 projection using those accessors as follows::

   // A mutation like this (assume `reverse` is a mutating method):
   array[0...99].reverse()
   // Decomposes to:
   let index = 0...99
   (var slice, let state) = array.`subscript.getForMutation`(index)
   slice.reverse()
   array.`subscript.setForMutation`(index, slice, state)

 To support the conservative access pattern,
 a `materializeForSet` accessor can be generated from `getForMutation`
 and `setForMutation` in an obvious fashion: perform `getForMutation`
 and store the state result in its scratch space, and return a
 callback that loads the state and hands it off to `setForMutation`.

 The beacon of hope for a user-friendly future: Inversion of control
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 Addressors and ``{get,set}ForMutation`` expose important optimizations
 to the standard library, but are undeniably fiddly and unsafe constructs
 to expose to users. A more natural model would be to
 recognize that a compound mutation is a composition of nested scopes, and
 express it in the language that way. A strawman model might look something
 like this::

   var foo: T {
     get { return getValue() }
     set { setValue(newValue) }

     // Perform a full in-out mutation. The `next` continuation is of
     // type `(inout T) -> ()` and must be called exactly once
     // with the value to hand off to the nested mutation operation.
     mutate(next) {
       var value = getValue()
       next(&value)
       setValue(value)
     }
   }

 This presents a natural model for expressing the lifetime extension concerns
 of addressors, and the state maintenance necessary for pinning ``getForMutation``
 accessors::

   // An addressing mutator
   mutate(next) {
     withUnsafePointer(&resource) {
       next(&$0.memory)
     }
   }

   // A pinning mutator
   mutate(next) {
     var slice = makeSlice()
     let token = pin()
     next(&slice)
     unpin(token)
     writeBackSlice(slice)
   }

 For various semantic and implementation efficiency reasons, we don't want to
 literally implement every access as a nesting of closures like this. Doing so
 would allow for semantic surprises (a mutate() operation never invoking its
 continuation, or doing so multiple times would be disastrous), and would
 interfere with the ability for `inout` and `mutating` functions to throw or
 otherwise nonlocally exit. However, we could present this model using
 *inversion of control*, similar to Python generators or async-await.
 A `mutate` operation could `yield` the `inout` reference to its inner value,
 and the compiler could enforce that a `yield` occurs exactly once on every
 control flow path::

   // An addressing, yielding mutator
   mutate {
     withUnsafePointer(&resource) {
       yield &$0.memory
     }
   }

   // A pinning mutator
   mutate {
     var slice = makeSlice()
     let token = pin()
     yield &slice
     unpin(token)
     writeBackSlice(slice)
   }

 This obviously requires more implementation infrastructure than we currently
 have, and raises language and library design issues (in particular,
 lifetime-extending combinators like ``withUnsafePointer`` would need either
 a ``reyields`` kind of decoration, or to become macros), but represents a
 promising path toward exposing the full power of the accessor model to
 users in an elegant way.

 Acceptability
 -------------

 This whitepaper has mentioned several times that the language team is
 prepared to accept such-and-such behavior but not prepared to accept
 some other kind of behavior.  Clearly, there is a policy at work.
 What is it?

 General philosophy
 ~~~~~~~~~~~~~~~~~~

 For any given language problem, a perfect solution would be one which:

 * guarantees that all operations complete without crashing or
   corrupting the program state,

 * guarantees that all operations produce results according to
   consistent, reliable, and intuitive rules,

 * does not limit or require complex interactions with the remainder
   of the language, and

 * imposes no performance cost.

 These goals are, however, not all simultaneously achievable, and
 different languages reach different balances.  Swift's particular
 philosophy is as follows:

 * The language should be as dynamically safe as possible.
   Straightforward uses of ordinary language features may cause
   dynamic failure, but the should never corrupt the program state.
   Any unsafe language or library features (other than simply calling
   into C code) should be explicitly labeled as unsafe.

   A dynamic failure should mean that the program reliably halts,
   ideally with a message clearly describing the source of the
   failure.  In the future, the language may allow for emergency
   recovery from such failures.

 * The language should sit on top of C, relying only on a relatively
   unobtrusive runtime.  Accordingly, the language's interactions
   with C-based technologies should be efficient and obvious.

 * The language should allow a static compiler to produce efficient
   code without dynamic instrumentation.  Accordingly, static
   analysis should only be blocked by incomplete information when
   the code uses an obviously abstract language feature (such as
   calling a class method or an unknown function), and the language
   should provide tools to allow programmers to limit such cases.

   (Dynamic instrumentation can, of course, still help, but it
   shouldn't be required for excellent performance.)

 General solutions
 ~~~~~~~~~~~~~~~~~

 A language generally has six tools for dealing with code it considers
 undesirable.  Some of this terminology is taken from existing
 standards, others not.

 * The language may nonetheless take steps to ensure that the code
   executes with a reliable result.  Such code is said to have
   *guaranteed behavior*.

 * The language may report the code as erroneous before it executes.
   Such code is said to be *ill formed*.

 * The language may reliably report the code as having performed an
   illegal operation when it executes.  Such code is said to be
   *asserting* or *aborting*.

 * The language may allow the code to produce an arbitrary-but-sound
   result.  Such code is said to have *unspecified behavior* or to
   have produced an *unspecified value*.

 * The language may allow the code to produce an unsound result which
   will result in another of these behaviors, but only if used.
   Such code is said to have produced a *trap value*.

 * The language may declare the code to be completely outside of the
   guarantees of the language.  Such code is said to have
   *undefined behavior*.

 In keeping with its design philosophy, Swift has generally limited
 itself to the first four solutions, with two significant exceptions.

 The first exception is that Swift provides several explicitly unsafe
 language and library features, such as ``UnsafePointer<T>`` and
 ``unowned(unsafe)``.  The use of these features is generally subject
 to undefined behavior rules.

 The second exception is that Swift does not make any guarantees about
 programs in the presence of race conditions.  It is extremely
 difficult to make even weak statements about the behavior of a program
 with a race condition without either:

 * heavily restricting shared mutable state on a language level, which
   would require invasive changes to how the language interacts with C;

 * forcing implicit synchronization when making any change to
   potentially shared memory, which would cripple performance and
   greatly complicate library implementation; or

 * using a garbage collector to manage all accessible memory, which
   would impose a very large burden on almost all of Swift's language
   goals.

 Therefore, Swift does surrender safety in the presence of races.

 Acceptability conditions for storage accesses
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 Storage access involves a tension between four goals:

 * Preserving all changes when making simultaneous modifications to
   distinct subobjects; in other words, avoiding subobject clobbering

 * Performing a predictable and intuitive sequence of operations when
   modifying storage that's implemented with a getter and setter

 * Avoiding unnecessary copies of a value during a modification,
   especially when this forces a structural copy of a COW value

 * Avoiding memory safety holes when accessing storage that's been
   implemented with memory.

 Reprojection_ is good at preserving changes, but it introduces extra
 copies, and it's less intuitive about how many times getters and
 setters will be called.  There may be a place for it anyway, if we're
 willing to accept the extra conceptual complexity for computed
 storage, but it's not a reasonable primary basis for optimizing the
 performance of storage backed by memory.

 Solutions permitting in-place modification are more efficient, but
 they do have the inherent disadvantage of having to vend the address
 of a value before arbitrary interleaving code.  Even if the address
 remains valid, and the solution to that avoids subobject clobbering,
 there's an unavoidable issue that the write can be lost because the
 address became dissociated from the storage.  For example, if your
 code passes ``&array[i]`` to a function, you might plausibly argue
 that changes to that argument should show up in the ``i``\ th element of
 ``array`` even if you completely reassign ``array``.  Reprojection_
 could make this work, but in-place solutions cannot efficiently do so.
 So, for any in-place solution to be acceptable, there does need to be
 some rule specifying when it's okay to "lose track" of a change.

 Furthermore, the basic behavior of COW means that it's possible to
 copy an array with an element under modification and end up sharing
 the same buffer, so that the modification will be reflected in a value
 that was technically copied beforehand.  For example::

   var array = [1,2,3]
   var oldArray : [Int] = []

   // This function copies array before modifying it, but because that
   // copy is of a value undergoing modification, the copy will use
   // the same buffer and therefore observe updates to the element.
   func foo(element: inout Int) {
     oldArray = array
     element = 4
   }

   // Therefore, oldArray[2] will be 4 after this call.
   foo(&array[2])

 Nor can this be fixed by temporarily moving the modified array aside,
 because that would prevent simultaneous modifications to different
 elements (and, in fact, likely cause them to assert).  So the rule
 will also have to allow this.

 However, both of these possibilities already come up in the design of
 both the library and the optimizer.  The optimizer makes a number of
 assumptions about aliasing; for example, the general rule is that
 storage bound to an ``inout`` parameter cannot be accessed through
 other paths, and while the optimizer is not permitted to compromise
 memory safety, it is permitted to introduce exactly this kind of
 unexpected behavior where aliasing accesses may or may not the storage
 as a consistent entity.

 Formal accesses
 ^^^^^^^^^^^^^^^

 That rule leads to an interesting generalization.  Every modification
 of storage occurs during a *formal access* (FA) to that storage.  An
 FA is also associated with zero or more *designated storage names*
 (DSNs), which are ``inout`` arguments in particular execution records.
 An FA arises from an l-value expression, and its duration and DSN set
 depend on how the l-value is used:

 * An l-value which is simply loaded from creates an instantaneous FA
   at the time of the load.  The DSN set is empty.

   Example::

     foo(array)
     // instantaneous FA reading array

 * An l-value which is assigned to with ``=`` creates an
   instantaneous FA at the time of the primitive assignment.  The DSN
   set is empty.

   Example::

     array = []
     // instantaneous FA assigning array

   Note that the primitive assignment strictly follows the evaluation
   of both the l-value and r-value expressions of the assignment.
   For example, the following code::

     // object is a variable of class type
     object.array = object.array + [1,2,3]

   produces this sequence of formal accesses::

     // instantaneous FA reading object (in the left-hand side)
     // instantaneous FA reading object (in the right-hand side)
     // instantaneous FA reading object.array (in the right-hand side)
     // evaluation of [1,2,3]
     // evaluation of +
     // instantaneous FA assigning object.array

 * An l-value which is passed as an ``inout`` argument to a call
   creates an FA beginning immediately before the call and ending
   immediately after the call.  (This includes calls where an
   argument is implicitly passed ``inout``, such as to mutating
   methods or user-defined assignment operators such as ``+=`` or
   ``++``.) The DSN set contains the ``inout`` argument within the
   call.

   Example::

     func swap<T>(lhs: inout T, rhs: inout T) {}

     // object is a variable of class type
     swap(&leftObject.array, &rightObject.array)

   This results in the following sequence of formal accesses::

     // instantaneous FA reading leftObject
     // instantaneous FA reading rightObject
     // begin FA for inout argument leftObject.array (DSN={lhs})
     // begin FA for inout argument rightObject.array (DSN={rhs})
     // evaluation of swap
     // end FA for inout argument rightObject.array
     // end FA for inout argument leftObject.array

 * An l-value which is used as the base of a member storage access
   begins an FA whose duration is the same as the duration of the FA
   for the subobject l-value.  The DSN set is empty.

   Example::

     swap(&leftObject.array[i], &rightObject.array[j])

   This results in the following sequence of formal accesses::

     // instantaneous FA reading leftObject
     // instantaneous FA reading i
     // instantaneous FA reading rightObject
     // instantaneous FA reading j
     // begin FA for subobject base leftObject.array (DSN={})
     // begin FA for inout argument leftObject.array[i] (DSN={lhs})
     // begin FA for subobject base rightObject.array (DSN={})
     // begin FA for inout argument rightObject.array[j] (DSN={rhs})
     // evaluation of swap
     // end FA for subobject base rightObject.array[j]
     // end FA for inout argument rightObject.array
     // end FA for subobject base leftObject.array[i]
     // end FA for inout argument leftObject.array

 * An l-value which is the base of an ! operator begins an FA whose
   duration is the same the duration of the FA for the resulting
   l-value.  The DSN set is empty.

   Example::

     // left is a variable of type T
     // right is a variable of type T?
     swap(&left, &right!)

   This results in the following sequence of formal accesses::

     // begin FA for inout argument left (DSN={lhs})
     // begin FA for ! operand right (DSN={})
     // begin FA for inout argument right! (DSN={rhs})
     // evaluation of swap
     // end FA for inout argument right!
     // end FA for ! operand right
     // end FA for inout argument left

 * An l-value which is the base of a ? operator begins an FA whose
   duration begins during the formal evaluation of the l-value
   and ends either immediately (if the operand was nil) or at the
   end of the duration of the FA for the resulting l-value.
   In either case, the DSN set is empty.

   Example::

     // left is a variable of optional struct type
     // right is a variable of type Int
     left?.member += right

   This results in the following sequence of formal accesses, assuming
   that ``left`` contains a value::

     // begin FA for ? operand left (DSN={})
     // instantaneous FA reading right (DSN={})
     // begin FA for inout argument left?.member (DSN={lhs})
     // evaluation of +=
     // end FA for inout argument left?.member
     // end FA for ? operand left

 The FAs for all ``inout`` arguments to a call begin simultaneously at
 a point strictly following the evaluation of all the argument
 expressions.  For example, in the call ``foo(&array, array)``, the
 evaluation of the second argument produces a defined value, because
 the FA for the first argument does not begin until after all the
 arguments are formally evaluated.  No code should actually be emitted
 during the formal evaluation of ``&array``, but for an expression like
 ``someClassReference.someArray[i]``, the class r-value and index
 expressions would be fully evaluated at that time, and then the
 l-value would be kept abstract until the FA begins.  Note that this
 requires changes in SILGen's current code generation patterns.

 The FA rule for the chaining operator ``?`` is an exception to the
 otherwise-simple intuition that formal accesses begin immediately
 before the modification begins.  This is necessary because the
 evaluation rules for ``?`` may cause arbitrary computation to be
 short-circuited, and therefore the operand must be accessed during the
 formal evaluation of the l-value.  There were three options here:

 * Abandon short-circuiting for assignments to optional l-values.  This
   is a very high price; short-circuiting fits into user intuitions
   about the behavior of the chaining operator, and it can actually be
   quite awkward to replicate with explicit accesses.

 * Short-circuit using an instantaneous formal access, then start a
   separate formal access before the actual modification.  In other
   words, evaluation of ``X += Y`` would proceed by first determining
   whether ``X`` exists (capturing the results of any r-value
   components), discarding any projection information derived from
   that, evaluating ``Y``, reprojecting ``X`` again (using the saved
   r-value components and checking again for whether the l-value
   exists), and finally calling the ``+=`` operator.

   If ``X`` involves any sort of computed storage, the steps required
   to evaluate this might be... counter-intuitive.

 * Allow the formal access to begin during the formal evaluation of the
   l-value. This means that code like the following will have
   unspecified behavior::

     array[i]?.member = deriveNewValueFrom(array)

 In the end, I've gone with the third option.  The intuitive
 explanation is that ``array`` has to be accessed early in order to
 continue the evaluation of the l-value.  I think that's
 comprehensible to users, even if it's not immediately obvious.

 Disjoint and non-disjoint formal accesses
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 I'm almost ready to state the core rule about formal accesses, but
 first I need to build up a few more definitions.

 An *abstract storage location* (ASL) is:

 * a global variable declaration;

 * an ``inout`` parameter declaration, along with a reference
   to a specific execution record for that function;

 * a local variable declaration, along with a reference to a
   specific execution record for that declaration statement;

 * a static/class property declaration, along with a type having
   that property;

 * a struct/enum instance property declaration, along with an
   ASL for the base;

 * a struct/enum subscript declaration, along with a concrete index
   value and an ASL for the base;

 * a class instance property declaration, along with an instance of
   that class; or

 * a class instance subscript declaration, along with a concrete
   index value and an instance of that class.

 Two abstract storage locations may be said to *overlap*.  Overlap
 corresponds to the imprecise intuition that a modification of one
 location directly alters the value of another location.  Overlap
 is an "open" property of the language: every new declaration may
 introduce its own overlap behavior.  However, the language and
 library make certain assertions about the overlap of some locations:

 * An ``inout`` parameter declaration overlaps exactly the set
   of ASLs overlapped by the ASL which was passed as an argument.

 * If two ASLs are both implemented with memory, then they overlap
   only if they have the same kind in the above list and the
   corresponding data match:

     * execution records must represent the same execution
     * types must be the same
     * class instances must be the same
     * ASLs must overlap

 * For the purposes of the above rule, the subscript of a standard
   library array type is implemented with memory, and the two
   indexes match if they have the same integer value.

 * For the purposes of the above rule, the subscript of a standard
   library dictionary type is implemented with memory, and the two
   indexes match if they compare equal with ``==``.

 Because this definition is open, it is impossible to completely
 statically or dynamically decided it.  However, it would still be
 possible to write a dynamic analysis which decided it for common
 location kinds.  Such a tool would be useful as part of, say, an
 ASan-like dynamic tool to diagnose violations of the
 unspecified-behavior rule below.

 The overlap rule is vague about computed storage partly because
 computed storage can have non-obvious aliasing behavior and partly
 because the subobject clobbering caused by the full-value accesses
 required by computed storage can introduce unexpected results that can
 be reasonably glossed as unspecified values.

 This notion of abstract storage location overlap can be applied to
 formal accesses as well.  Two FAs ``x`` and ``y`` are said to be
 *disjoint* if:

 * they refer to non-overlapping abstract storage locations or

 * they are the base FAs of two disjoint member storage accesses
   ``x.a`` and ``y.b``.

 Given these definitions, the core unspecified-behavior rule is:

   If two non-disjoint FAs have intersecting durations, and neither FA
   is derived from a DSN for the other, then the program has
   unspecified behavior in the following way: if the second FA is a
   load, it yields an unspecified value; otherwise, both FAs store an
   unspecified value in the storage.

 Note that you cannot have two loads with intersecting durations,
 because the FAs for loads are instantaneous.

 Non-overlapping subobject accesses make the base accesses disjoint
 because otherwise code like ``swap(&a[0], &a[1])`` would have
 unspecified behavior, because the two base FAs are to clearly
 overlapping locations and have intersecting durations.

 Note that the optimizer's aliasing rule falls out from this rule.  If
 storage has been bound as an ``inout`` argument, accesses to it
 through any path not derived from the ``inout`` argument will start a
 new FA for overlapping storage, the duration of which will necessarily
 intersect duration with that of the FA through which the ``inout``
 argument was bound, causing unspecified behavior.  If the ``inout``
 argument is forwarded to another call, that will start a new FA which
 is validly based on a DSN of the first; but an attempt to modify the
 storage through the first ``inout`` argument while the second call is
 active will create a third FA not based on the DSN from the second
 ``inout`` call, causing a conflict there.  Therefore a function may
 assume that it can see all accesses to the storage bound to an
 ``inout`` argument.

 If you didn't catch all that...
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 That may have been a somewhat intense description, so here's a simple
 summary of the rule being proposed.

 If storage is passed to an ``inout`` argument, then any other
 simultaneous attempt to read or write to that storage, including to
 the storage containing it, will have unspecified behavior.  Reads
 from it may see partially-updated values, or even values which will
 change as modifications are made to the original storage; and writes
 may be clobbered or simply disappear.

 But this only applies during the call with the ``inout`` argument: the
 evaluation of other arguments to the call will not be interfered with,
 and as soon as the call ends, all these modifications will resolve
 back to a quiescent state.

 And this unspecified behavior has limits.  The storage may end up with
 an unexpected value, with only a subset of the writes made to it, and
 copies from it may unexpectedly reflect modifications made after they
 were copied.  However, the program will otherwise remain in a
 consistent and uncorrupted state.  This means that execution will be
 able to continue apace as long as these unexpected values don't trip
 up some higher-level invariant.

 Tracking formal accesses
 ------------------------

 Okay, now that I've analyzed this to death, it's time to make a
 concrete proposal about the implementation.

 As discussed above, the safety hole with addressors can be fixed by
 always retaining the buffer which keeps the address valid.  Assuming
 that other uses of the buffer follow the general copy-on-write
 pattern, this retain will prevent structural changes to the buffer
 while the address is in use.

 But, as I also discussed above, this introduces two problems:

 Copies during modification
 ~~~~~~~~~~~~~~~~~~~~~~~~~~

 Copying a COW aggregate value always shares the same buffer that was
 stored there at the time of the copy; there is no uniqueness check
 done as part of the copy.  Changes to subobjects will then be
 instantly reflected in the "copy" as they are made to the original.
 The structure of the copy will stay the same, but the values of
 its subobjects will appear to spontaneously change.

 I want to say that this behavior is acceptable according to the
 formal-access rule I laid out above.  How does that reasoning work?

 First, I need to establish what kind of behavior is at work here.  It
 clearly isn't guaranteed behavior: copies of COW values are normally
 expected to be independent.  The code wasn't rejected by the compiler,
 nor did it dynamically assert; it simply seems to misbehave.  But
 there are limits to the misbehavior:

 * By general COW rules, there's no way to change the structure of an
   existing buffer unless the retain count is 1.  For the purposes of
   this analysis, that means that, as long as the retain count is
   above 1, there's no way to invalidate the address returned by the
   addressor.

 * The buffer will be retained for as long as the returned address
   is being modified.  This retain is independent of any storage
   which might hold the aggregate value (and thus also retain the buffer).

 * Because of this retain, the only way for the retain count to drop
   to 1 is for no storage to continue to refer to the buffer.

 * But if no storage refers to the buffer, there is no way to
   initiate an operation which would change the buffer structure.

 Thus the address will remain valid, and there's no danger of memory
 corruption.  The only thing is that the program no longer makes useful
 guarantees about the value of the copied aggregate.  In other words,
 the copy yielded an unspecified value.

 The formal-access rule allows loads from storage to yield an
 unspecified value if there's another formal access to that storage in
 play and the load is (1) not from an l-value derived from a name in
 the other FA's DSN set and (2) not from a non-overlapping subobject.
 Are these conditions true?

 Recall that an addressor is invoked for an l-value of the form::

   base.pointee

 or::

   base[i]

 Both cases involve a formal access to the storage ``base`` as the base
 of a subobject formal access.  This kind of formal access always has
 an empty DSN set, regardless of how the subobject is used.  A COW
 mutable addressor will always ensure that the buffer is uniquely
 referenced before returning, so the only way that a value containing
 that buffer can be copied is if the load is a non-subobject access to
 ``base``.  Therefore, there are two simultaneous formal accesses to
 the same storage, and the load is not from an l-value derived from the
 modification's DSN set (which is empty), nor is it for a
 non-overlapping subobject.  So the formal-access rule applies, and
 an unspecified value is an acceptable result.

 The implementation requirement here, then, is simply that the
 addressor must be called, and the buffer retained, within the duration
 of the formal access.  In other words, the addressor must only
 be called immediately prior to the call, rather than at the time
 of the formal evaluation of the l-value expression.

 What would happen if there *were* a simultaneous load from a
 non-overlapping subobject?  Accessing the subobject might cause a
 brief copy of ``base``, but only for the duration of copying the
 subobject.  If the subobject does not overlap the subobject which was
 projected out for the addressor, then this is harmless, because the
 addressor will not allow modifications to those subobjects; there
 might be other simultaneous formal accesses which do conflict, but
 these two do not.  If the subobject does overlap, then a recursive
 analysis must be applied; but note that the exception to the
 formal-access rule will only apply if non-overlapping subobjects were
 projected out from *both* formal accesses.  Otherwise, it will be
 acceptable for the access to the overlapping subobject to yield an
 unspecified value.

 Avoiding subobject clobbering during parallel modification
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 The other problem is that the retain will prevent simultaneous changes
 to the same buffer.  The second change will cause a structural copy,
 and the first address will end up modifying a buffer which is no
 longer referenced: in other words, the program will observe subobject
 clobbering.  A similar analysis to the one from the last section
 suggests that this can be described as unspecified behavior.

 Unfortunately, this unspecified behavior is unwanted: it violates the
 guarantees of the formal-access rule as I laid it out above, because
 it occurs even if you have formal accesses to two non-overlapping
 subobjects.  So something does need to be done here.

 One simple answer is to dynamically track whether a COW buffer is
 currently undergoing a non-structural mutation.  I'll call this *NSM
 tracking*, and I'll call buffers which are undergoing non-structural
 mutations *NSM-active*.

 The general rules of COW say that mutating operations must ensure that
 their buffer is uniquely referenced before performing the
 modification.  NSM tracking works by having non-structural mutations
 perform a weaker check: the buffer must be either uniquely referenced
 or be NSM-active.  If the non-structural mutation allows arbitrary
 code to run between the start of the mutation and the end --- as an
 addressor does --- it must both retain the buffer and flag it as
 NSM-active for the entire duration.

 Because the retain still occurs, and because any *structural* changes
 to the buffer that might invalidate the addresses of subobjects are
 still blocked by that retain, all of the earlier analysis about the
 memory safety of simultaneous accesses still applies.  The only change
 is that simultaneous non-structural modifications, as would be created
 by simultaneous formal accesses to subobjects, will now be able to
 occur on a single buffer.

 A set of simultaneous formal accesses on a single thread follows a
 natural stack protocol, or can be made to do so with straightforward
 SILGen and SIL optimizer consideration.  Therefore, the runtime can
 track whether a buffer is NSM-active on a thread using a single bit,
 which nested modifications can be told not to clear.  Call this the
 *NSM bit*.  Ignoring multithreading considerations for a moment, since
 the NSM bit is only ever set at the same as a retain and only ever
 cleared at the same time as a release, it makes sense to pack this
 into the strong reference count.  There is no need to support this
 operation on non-Swift objects.  The runtime should provide three new
 functions:

 * A function to test whether an object is either uniquely referenced
   or NSM-active.  Call this ``swift_isUniquelyReferencedForNSM``.

 * A function to perform the above test and, if the test passes and
   the NSM bit is not set, atomically retain the object and set
   the NSM bit.  It should return both the result of the test and an
   object to later set as NSM-inactive.  That object will be nil if
   the test failed or the NSM bit was already set.  Call this
   ``swift_tryRetainForNSM``.

 * A function to atomically clear the NSM bit and release the object.
   Call this ``swift_releaseForNSM``.

 These operations should also be reflected in SIL.

 Concurrent modifications and the non-structural modification bit
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 What about concurrency?  Two concurrent non-structural modifications
 could race to set the NSM bit, and then the winning thread could clear
 it before the other thread's modification is complete.  This could
 cause memory-unsafe behavior, since the losing thread would be
 modifying the object through an address while not retaining the value.

 The major question here is whether this is a significant objection.
 It's accepted that race conditions have undefined behavior.  Is such
 code inherently racy?

 The answer appears to be "no", and that it is possible to write code
 which concurrently writes to existing non-overlapping elements of a
 COW aggregate without causing races; but that such code is extremely
 fraught, and moreover it is extremely fraught regardless of whether
 NSM-activeness is tracked with a single bit or a wider count.  Consider:

 * If the shared aggregate value is ever non-uniquely referenced, two
   threads concurrently modifying it will race to unique the array.
   This unavoidably has undefined behavior, because uniquing the
   array requires the previous value to eventually be released, and a
   race may cause an over-release.

 * Assume that it's possible to guarantee that the aggregate value's
   buffer is uniquely referenced before any threads concurrently
   access it.  Now, all of the threads are performing different
   concurrent accesses.

   * If any of the accesses is a structural modification, there will
     be a race to re-unique the buffer.

   * If all of the accesses are non-structural modifications, then
     there will be no races as long as the retain-and-set and
     release-and-clear operations are atomic: when starting any
     particular operation, the buffer will always either be uniquely
     referenced or have the bit set.

   * If any of the accesses is a read, and that read does not occur
     during a non-structural modification, then the buffer may
     briefly become non-uniquely referenced and there will be a
     race from concurrent modifications to re-unique it.

   * If any of the accesses is a read, and that read occurs during a
     non-structural modification, and the optimizer does not re-order
     the read's retain/release around the retainForNSM/releaseForNSM
     operations, then it matters how NSM-activeness is tracked.

     If there is complete tracking (i.e. a count, not just a single
     bit), the retain for the read will only occur while the buffer
     is flagged as NSM-active, and so it will have no effect.

     If there is incomplete tracking (i.e. just a single NSM bit),
     then there is a potential for undefined behavior.  Suppose two
     threads race to set the NSM bit.  The loser then initiates a
     read and retains the buffer.  Before the loser releases the
     buffer, the winner clears the NSM bit.  Now another thread might
     see that the buffer is non-uniquely referenced and not
     NSM-active, and so it will attempt to unique the buffer.

     It is probably unreasonable to require the optimizer to never
     reorder ordinary retains and releases past retainForNSM and
     releaseForNSM operations.

 More importantly, the use case here (many threads concurrently
 accessing different elements of a shared data structure) just
 inherently doesn't really work well with a COW data structure.  Even
 if the library were able to make enough guarantees to ensure that,
 with the right pattern of accesses, there would never be a structural
 copy of the aggregate, it would still be extremely inefficient,
 because all of the threads would be competing for atomic access to the
 strong reference count.

 In short, I think it's reasonable for the library to say that programs
 which want to do this should always use a type with reference
 semantics.  Therefore, it's reasonable to ignore concurrent accesses
 when deciding how to best track whether an aggregate is undergoing
 non-structural modification.  This removes the only objection I
 can see to tracking this with a single NSM bit.

 Code generation patterns
 ~~~~~~~~~~~~~~~~~~~~~~~~

 The signatures and access patterns for addressors will need to change
 in order to ensure memory-safety.

 ``mutableAddress`` currently returns an ``UnsafeMutablePointer``; it
 will need to return ``(Builtin.NativeObject?, UnsafeMutablePointer)``.
 The owner pointer must be a native object; we cannot efficiently
 support either uniqueness checking or the NSM bit on non-Swift
 objects.  SILGen will mark that the address depends on the owner
 reference and push a cleanup to ``releaseForNSM`` it.

 ``address`` currently returns an ``UnsafePointer``; it will need to
 return ``(Builtin.NativeObject?, UnsafePointer)``.  I do not currently
 see a reason to allow non-Swift owners, but the model doesn't depend
 on that.  SILGen will mark that the address depends on the owner
 reference and push a cleanup to ``release`` it.

 In order to support ultimately calling an addressor in the
 conservative access path, ``materializeForSet`` must also return an
 owner reference.  Since ``materializeForSet`` calls ``mutableAddress``
 in this case, SILGen will follow that pattern for calls.  SILGen will
 also assume that the need to perform a ``releaseForNSM`` is exclusive
 with the need to call the setter.

 Mutating operations on COW types will now have two different paths for
 making a buffer mutable and unique: one for structural mutations and
 another for non-structural mutations.  I expect that this will require
 separate semantics annotations, and the optimizer will have to
 recognize both.

 ``releaseForNSM`` operations will not be reorderable unless the
 optimizer can prove that the objects are distinct.

 Summary of proposal and plan
 ----------------------------

 Let me summarize what I'm proposing:

 * Swift's core approach to optimizing accesses should be based
   around providing direct access to memory, either statically or
   dynamically.  In other words, Swift should adopt addressors on
   core data structures as much as possible.

 * Swift should fix the current memory hole with addressors by
   retaining for the duration of the access and, for modifications,
   flagging the buffer as NSM-active.  The implementation plan
   follows:

   * The runtime implements the NSM-bit and its entrypoints.

   * SIL provides operations for manipulating and querying the NSM
     bit.  IRGen implements these operations using the runtime
     functions.  Builtins are exposed.

   * The standard library changes data structures to do different
     uniquing for structural and non-structural modifications.  This
     patch is not yet committed.

   * The optimizer reacts to the above.  When both are settled, they
     can be committed.

   * SILGen changes the emission patterns for l-values so that
     addresses and writebacks are live only during the formal
     access.

   * Sema changes the signature of ``address``, ``mutableAddress``,
     and ``materializeForSet`` to return an optional owner reference.
     Sema changes ``materializeForSet`` synthesis to return the
     owner correctly.  SILGen implements the desired code patterns.

     The standard library changes its addressor implementations
     to continue to compile, but for staging purposes, it only uses
     nil owners.

   * The standard library changes addressor implementations to
     use meaningful owners.  This patch is not yet committed.

   * The optimizer reacts to the above.  When both are settled, they
     can be committed.