docs/SequencesAndCollections.rst - third_party/swift - Git at Google

 :orphan:

 .. default-role:: code

 ====================================
  Sequences And Collections in Swift
 ====================================

 Unlike many languages, Swift provides a rich taxonomy of abstractions
 for processing series of elements.  This document explains why that
 taxonomy exists and how it is structured.

 Sequences
 =========

 It all begins with Swift's `for`\ ...\ `in` loop::

   for x in s {
     doSomethingWith(x)
   }

 Because this construct is generic, `s` could be

 * an array
 * a set
 * a linked list
 * a series of UI events
 * a file on disk
 * a stream of incoming network packets
 * an infinite series of random numbers
 * a user-defined data structure
 * etc.

 In Swift, all of the above are called **sequences**, an abstraction
 represented by the `SequenceType` protocol::

   protocol SequenceType {
     typealias Iterator : IteratorProtocol
     func makeIterator() -> Iterator
   }

 .. sidebar:: Hiding Iterator Type Details

   A sequence's iterator is an associated type--rather than something
   like |AnyIterator|__ that depends only on the element type--for
   performance reasons.  Although the alternative design has
   significant usability benefits, it requires one dynamic
   allocation/deallocation pair and *N* dynamic dispatches to traverse
   a sequence of length *N*.  That said, our optimizer has improved to
   the point where it can sometimes remove these overheads completely,
   and we are `considering <rdar://19755076>`_ changing the design
   accordingly.

   .. |AnyIterator| replace:: `AnyIterator<T>`

   __ http://swiftdoc.org/v3.0/type/AnyIterator/

 As you can see, sequence does nothing more than deliver an iterator.
 To understand the need for iterators, it's important to distinguish
 the two kinds of sequences.

 * **Volatile** sequences like "stream of network packets," carry
   their own traversal state, and are expected to be "consumed" as they
   are traversed.

 * **Stable** sequences, like arrays, should *not* be mutated by `for`\
   ...\ `in`, and thus require *separate traversal state*.

 To get an initial traversal state for an arbitrary sequence `x`, Swift
 calls `x.makeIterator()`.  The sequence delivers that state, along with
 traversal logic, in the form of an **iterator**.

 Iterators
 ==========

 `for`\ ...\ `in` needs three operations from the iterator:

 * get the current element
 * advance to the next element
 * detect whether there are more elements

 If we literally translate the above into protocol requirements, we get
 something like this::

   protocol NaiveIteratorProtocol {
     typealias Element
     var current() -> Element      // get the current element
     mutating func advance()       // advance to the next element
     var isExhausted: Bool         // detect whether there are more elements
   }

 Such a protocol, though, places a burden on implementors of volatile
 sequences: either the iterator must buffer the current element
 internally so that `current` can repeatedly return the same value, or
 it must trap when `current` is called twice without an intervening
 call to `moveToNext`.  Both semantics have a performance cost, and
 the latter unnecessarily adds the possibility of incorrect usage.

 .. sidebar:: `NSEnumerator`

   You might recognize the influence on iterators of the `NSEnumerator` API::

     class NSEnumerator : NSObject {
       func nextObject() -> AnyObject?
     }

 Therefore, Swift's `IteratorProtocol` merges the three operations into one,
 returning `nil` when the iterator is exhausted::

   protocol IteratorProtocol {
     typealias Element
     mutating func next() -> Element?
   }

 Combined with `SequenceType`, we now have everything we need to
 implement a generic `for`\ ...\ `in` loop.

 .. sidebar:: Adding a Buffer

   The use-cases for singly-buffered iterators are rare enough that it
   is not worth complicating `IteratorProtocol`, [#input_iterator]_ but
   support for buffering would fit nicely into the scheme, should it
   prove important::

     public protocol BufferedIteratorProtocol
       : IteratorProtocol {
       var latest: Element? {get}
     }

   The library could easily offer a generic wrapper that adapts any
   `IteratorProtocol` to create a `BufferedIteratorProtocol`::

     /// Add buffering to any IteratorProtocol I
     struct BufferedIterator<I : IteratorProtocol>
       : BufferedIteratorProtocol {

       public init(_ baseIterator: I) {
         self._baseIterator = baseIterator
       }
       public func next() -> Element? {
         latest = _baseIterator.next() ?? latest
         return latest
       }
       public private(set) var latest: I.Element?
       private var _baseIterator: I
     }

 Operating on Sequences Generically
 ----------------------------------

 Given an arbitrary `SequenceType`, aside from a simple `for`\ ...\ `in` loop,
 you can do anything that requires reading elements from beginning to
 end.  For example::

   // Return an array containing the elements of `source`, with
   // `separator` interposed between each consecutive pair.
   func array<S: SequenceType>(
     _ source: S,
     withSeparator separator: S.Iterator.Element
   ) -> [S.Iterator.Element] {
     var result: [S.Iterator.Element] = []
     var iterator = source.makeIterator()
     if let start = iterator.next() {
       result.append(start)
       while let next = iterator.next() {
         result.append(separator)
         result.append(next)
       }
     }
     return result
   }

   let s = String(array("Swift", withSeparator: "|"))
   print(s)        // "S|w|i|f|t"

 Because sequences may be volatile, though, you can--in general--only
 make a single traversal.  This capability is quite enough for many
 languages: the iteration abstractions of Java, C#, Python, and Ruby
 all go about as far as `SequenceType`, and no further.  In Swift,
 though, we want to do much more generically.  All of the following
 depend on stability that an arbitrary sequence can't provide:

 * Finding a sub-sequence
 * Finding the element that occurs most often
 * Meaningful in-place element mutation (including sorting,
   partitioning, rotations, etc.)

 .. sidebar:: Iterators Should Be Sequences

   In principle, every iterator is a volatile sequence containing
   the elements it has yet to return from `next()`.  Therefore, every
   iterator *could* satisfy the requirements of `SequenceType` by
   simply declaring conformance, and returning `self` from its
   `makeIterator()` method.  In fact, if it weren't for `current language
   limitations <rdar://17986597>`_, `IteratorProtocol` would refine
   `SequenceType`, as follows:

   .. parsed-literal::

        protocol IteratorProtocol **: SequenceType** {
          typealias Element
          mutating func next() -> Element?
        }

   Though we may not currently be able to *require* that every
   `IteratorProtocol` refines `SequenceType`, most iterators in the
   standard library do conform to `SequenceType`.

 Fortunately, many real sequences *are* stable. To take advantage of
 that stability in generic code, we'll need another protocol.

 Collections
 ===========

 A **collection** is a stable sequence with addressable "positions,"
 represented by an associated `Index` type::

   protocol CollectionType : SequenceType {
     typealias Index : ForwardIndexType             // a position
     subscript(i: Index) -> Iterator.Element {get}

     var startIndex: Index {get}
     var endIndex: Index {get}
   }

 The way we address positions in a collection is a generalization of
 how we interact with arrays: we subscript the collection using its
 `Index` type::

   let ith = c[i]

 An **index**\ --which must model `ForwardIndexType`\ --is a type with a
 linear series of discrete values that can be compared for equality:

 .. sidebar:: Dictionary Keys

    Although dictionaries overload `subscript` to also operate on keys,
    a `Dictionary`\ 's `Key` type is distinct from its `Index` type.
    Subscripting on an index is expected to offer direct access,
    without introducing overheads like searching or hashing.

 ::

   protocol ForwardIndexType : Equatable {
     typealias Distance : SignedIntegerType
     func successor() -> Self
   }

 While one can use `successor()` to create an incremented index value,
 indices are more commonly advanced using an in-place increment
 operator, just as one would when traversing an array: `++i` or `i++`.
 These operators are defined generically, for all models of
 `ForwardIndexType`, in terms of the `successor()` method.

 Every collection has two special indices: a `startIndex` and an
 `endIndex`.  In an empty collection, `startIndex == endIndex`.
 Otherwise, `startIndex` addresses the collection's first element, and
 `endIndex` is the successor of an index addressing the collection's
 last element.  A collection's `startIndex` and `endIndex` form a
 half-open range containing its elements: while a collection's
 `endIndex` is a valid index value for comparison, it is not a valid
 index for subscripting the collection::

   if c.startIndex != c.endIndex { } // OK
   c[c.endIndex]                     // Oops! (index out-of-range)

 Mutable Collections
 -------------------

 A **mutable collection** is a collection that supports in-place element
 mutation.  The protocol is a simple refinement of `CollectionType` that adds a
 subscript setter:

 .. parsed-literal::

   protocol MutableCollectionType : CollectionType {
     subscript(i: Index) -> Iterator.Element { get **set** }
   }

 The `CollectionType` protocol does not require collection to support mutation,
 so it is not possible to tell from the protocol itself whether the order of
 elements in an instance of a type that conforms to `CollectionType` has a
 domain-specific meaning or not.  (Note that since elements in collections have
 stable indices, the element order within the collection itself is stable; the
 order sometimes does not have a meaning and is not chosen by the code that uses
 the collection, but by the implementation details of the collection itself.)

 `MutableCollectionType` protocol allows the caller to replace a specific element,
 identified by an index, with another one in the same position.  This capability
 essentially allows to rearrange the elements inside the collection in any
 order, thus types that conform to `MutableCollectionType` can represent
 collections with a domain-specific element order (not every instance of a
 `MutableCollectionType` has an interesting order, though).

 Range Replaceable Collections
 -----------------------------

 The `MutableCollectionType` protocol implies only mutation of content, not of
 structure (for example, changing the number of elements).  The
 `RangeReplaceableCollectionType` protocol adds the capability to perform
 structural mutation, which in its most general form is expressed as replacing a
 range of elements, denoted by two indices, by elements from a collection with a
 **different** length.

 ::

   public protocol RangeReplaceableCollectionType : MutableCollectionType {
     mutating func replaceSubrange<
       C: CollectionType where C.Iterator.Element == Self.Iterator.Element
     >(
       _ subRange: Range<Index>, with newElements: C
     )
   }


 Index Protocols
 ---------------

 As a generalization designed to cover diverse data structures,
 `CollectionType` provides weaker guarantees than arrays do.  In
 particular, an arbitrary collection does not necessarily offer
 efficient random access; that property is determined by the protocol
 conformances of its `Index` type.

 **Forward indices** are the simplest and most general, capturing the
 capabilities of indices into a singly-linked list:

 1. advance to the next position
 2. detect the end position

 **Bidirectional indices** are a refinement of forward indices that
 additionally support reverse traversal::

   protocol BidirectionalIndexType : ForwardIndexType {
     func predecessor() -> Self
   }

 Indices into a doubly-linked list would be bidirectional, as are the
 indices that address `Character`\ s and `UnicodeScalar`\ s in a
 `String`.  Reversing the order of a collection's elements is a simple
 example of a generic algorithm that depends on bidirectional traversal.

 **Random access indices** have two more requirements: the ability to
 efficiently measure the number of steps between arbitrary indices
 addressing the same collection, and the ability to advance an index by
 a (possibly negative) number of steps::

   public protocol RandomAccessIndexType : BidirectionalIndexType {
     func distance(to other: Self) -> Distance
     func advanced(by n: Distance) -> Self
   }

 From these methods, the standard library derives several other
 features such as `Comparable` conformance, index subtraction, and
 addition/subtraction of integers to/from indices.

 The indices of a `deque
 <https://en.wikipedia.org/wiki/Double-ended_queue>`_ can provide random
 access, as do the indices into `String.UTF16View` (when Foundation is
 loaded) and, of course, array indices.  Many common sorting and
 selection algorithms, among others, depend on these capabilities.

 All direct operations on indices are intended to be lightweight, with
 amortized O(1) complexity.  In fact, indices into `Dictionary` and
 `Set` *could* be bidirectional, but are limited to modeling
 `ForwardIndexType` because the APIs of `NSDictionary` and
 `NSSet`--which can act as backing stores of `Dictionary` and `Set`--do
 not efficiently support reverse traversal.

 Conclusion
 ==========

 Swift's sequence, collection, and index protocols allow us to write
 general algorithms that apply to a wide variety of series and data
 structures.  The system has been both easy to extend, and predictably
 performant.  Thanks for taking the tour!

 ------

 .. [#input_iterator] This trade-off is not as obvious as it might
    seem.  For example, the C# and C++ analogues for `IteratorProtocol`
    (`IEnumerable` and `input iterator`) are saddled with the
    obligation to provide buffering.
	:orphan:

	.. default-role:: code

	====================================
	Sequences And Collections in Swift
	====================================

	Unlike many languages, Swift provides a rich taxonomy of abstractions
	for processing series of elements. This document explains why that
	taxonomy exists and how it is structured.

	Sequences
	=========

	It all begins with Swift's `for`\ ...\ `in` loop::

	for x in s {
	doSomethingWith(x)
	}

	Because this construct is generic, `s` could be

	* an array
	* a set
	* a linked list
	* a series of UI events
	* a file on disk
	* a stream of incoming network packets
	* an infinite series of random numbers
	* a user-defined data structure
	* etc.

	In Swift, all of the above are called sequences, an abstraction
	represented by the `SequenceType` protocol::

	protocol SequenceType {
	typealias Iterator : IteratorProtocol
	func makeIterator() -> Iterator
	}

	.. sidebar:: Hiding Iterator Type Details

	A sequence's iterator is an associated type--rather than something
	like \|AnyIterator\|__ that depends only on the element type--for
	performance reasons. Although the alternative design has
	significant usability benefits, it requires one dynamic
	allocation/deallocation pair and N dynamic dispatches to traverse
	a sequence of length N. That said, our optimizer has improved to
	the point where it can sometimes remove these overheads completely,
	and we are `considering <rdar://19755076>`_ changing the design
	accordingly.

	.. \|AnyIterator\| replace:: `AnyIterator<T>`

	__ http://swiftdoc.org/v3.0/type/AnyIterator/

	As you can see, sequence does nothing more than deliver an iterator.
	To understand the need for iterators, it's important to distinguish
	the two kinds of sequences.

	* Volatile sequences like "stream of network packets," carry
	their own traversal state, and are expected to be "consumed" as they
	are traversed.

	* Stable sequences, like arrays, should not be mutated by `for`\
	...\ `in`, and thus require separate traversal state.

	To get an initial traversal state for an arbitrary sequence `x`, Swift
	calls `x.makeIterator()`. The sequence delivers that state, along with
	traversal logic, in the form of an iterator.

	Iterators
	==========

	`for`\ ...\ `in` needs three operations from the iterator:

	* get the current element
	* advance to the next element
	* detect whether there are more elements

	If we literally translate the above into protocol requirements, we get
	something like this::

	protocol NaiveIteratorProtocol {
	typealias Element
	var current() -> Element // get the current element
	mutating func advance() // advance to the next element
	var isExhausted: Bool // detect whether there are more elements
	}

	Such a protocol, though, places a burden on implementors of volatile
	sequences: either the iterator must buffer the current element
	internally so that `current` can repeatedly return the same value, or
	it must trap when `current` is called twice without an intervening
	call to `moveToNext`. Both semantics have a performance cost, and
	the latter unnecessarily adds the possibility of incorrect usage.

	.. sidebar:: `NSEnumerator`

	You might recognize the influence on iterators of the `NSEnumerator` API::

	class NSEnumerator : NSObject {
	func nextObject() -> AnyObject?
	}

	Therefore, Swift's `IteratorProtocol` merges the three operations into one,
	returning `nil` when the iterator is exhausted::

	protocol IteratorProtocol {
	typealias Element
	mutating func next() -> Element?
	}

	Combined with `SequenceType`, we now have everything we need to
	implement a generic `for`\ ...\ `in` loop.

	.. sidebar:: Adding a Buffer

	The use-cases for singly-buffered iterators are rare enough that it
	is not worth complicating `IteratorProtocol`, [#input_iterator]_ but
	support for buffering would fit nicely into the scheme, should it
	prove important::

	public protocol BufferedIteratorProtocol
	: IteratorProtocol {
	var latest: Element? {get}
	}

	The library could easily offer a generic wrapper that adapts any
	`IteratorProtocol` to create a `BufferedIteratorProtocol`::

	/// Add buffering to any IteratorProtocol I
	struct BufferedIterator<I : IteratorProtocol>
	: BufferedIteratorProtocol {

	public init(_ baseIterator: I) {
	self._baseIterator = baseIterator
	}
	public func next() -> Element? {
	latest = _baseIterator.next() ?? latest
	return latest
	}
	public private(set) var latest: I.Element?
	private var _baseIterator: I
	}

	Operating on Sequences Generically
	----------------------------------

	Given an arbitrary `SequenceType`, aside from a simple `for`\ ...\ `in` loop,
	you can do anything that requires reading elements from beginning to
	end. For example::

	// Return an array containing the elements of `source`, with
	// `separator` interposed between each consecutive pair.
	func array<S: SequenceType>(
	_ source: S,
	withSeparator separator: S.Iterator.Element
	) -> [S.Iterator.Element] {
	var result: [S.Iterator.Element] = []
	var iterator = source.makeIterator()
	if let start = iterator.next() {
	result.append(start)
	while let next = iterator.next() {
	result.append(separator)
	result.append(next)
	}
	}
	return result
	}

	let s = String(array("Swift", withSeparator: "\|"))
	print(s) // "S\|w\|i\|f\|t"

	Because sequences may be volatile, though, you can--in general--only
	make a single traversal. This capability is quite enough for many
	languages: the iteration abstractions of Java, C#, Python, and Ruby
	all go about as far as `SequenceType`, and no further. In Swift,
	though, we want to do much more generically. All of the following
	depend on stability that an arbitrary sequence can't provide:

	* Finding a sub-sequence
	* Finding the element that occurs most often
	* Meaningful in-place element mutation (including sorting,
	partitioning, rotations, etc.)

	.. sidebar:: Iterators Should Be Sequences

	In principle, every iterator is a volatile sequence containing
	the elements it has yet to return from `next()`. Therefore, every
	iterator could satisfy the requirements of `SequenceType` by
	simply declaring conformance, and returning `self` from its
	`makeIterator()` method. In fact, if it weren't for `current language
	limitations <rdar://17986597>`_, `IteratorProtocol` would refine
	`SequenceType`, as follows:

	.. parsed-literal::

	protocol IteratorProtocol : SequenceType {
	typealias Element
	mutating func next() -> Element?
	}

	Though we may not currently be able to require that every
	`IteratorProtocol` refines `SequenceType`, most iterators in the
	standard library do conform to `SequenceType`.

	Fortunately, many real sequences are stable. To take advantage of
	that stability in generic code, we'll need another protocol.

	Collections
	===========

	A collection is a stable sequence with addressable "positions,"
	represented by an associated `Index` type::

	protocol CollectionType : SequenceType {
	typealias Index : ForwardIndexType // a position
	subscript(i: Index) -> Iterator.Element {get}

	var startIndex: Index {get}
	var endIndex: Index {get}
	}

	The way we address positions in a collection is a generalization of
	how we interact with arrays: we subscript the collection using its
	`Index` type::

	let ith = c[i]

	An index\ --which must model `ForwardIndexType`\ --is a type with a
	linear series of discrete values that can be compared for equality:

	.. sidebar:: Dictionary Keys

	Although dictionaries overload `subscript` to also operate on keys,
	a `Dictionary`\ 's `Key` type is distinct from its `Index` type.
	Subscripting on an index is expected to offer direct access,
	without introducing overheads like searching or hashing.

	::

	protocol ForwardIndexType : Equatable {
	typealias Distance : SignedIntegerType
	func successor() -> Self
	}

	While one can use `successor()` to create an incremented index value,
	indices are more commonly advanced using an in-place increment
	operator, just as one would when traversing an array: `++i` or `i++`.
	These operators are defined generically, for all models of
	`ForwardIndexType`, in terms of the `successor()` method.

	Every collection has two special indices: a `startIndex` and an
	`endIndex`. In an empty collection, `startIndex == endIndex`.
	Otherwise, `startIndex` addresses the collection's first element, and
	`endIndex` is the successor of an index addressing the collection's
	last element. A collection's `startIndex` and `endIndex` form a
	half-open range containing its elements: while a collection's
	`endIndex` is a valid index value for comparison, it is not a valid
	index for subscripting the collection::

	if c.startIndex != c.endIndex { } // OK
	c[c.endIndex] // Oops! (index out-of-range)

	Mutable Collections
	-------------------

	A mutable collection is a collection that supports in-place element
	mutation. The protocol is a simple refinement of `CollectionType` that adds a
	subscript setter:

	.. parsed-literal::

	protocol MutableCollectionType : CollectionType {
	subscript(i: Index) -> Iterator.Element { get set }
	}

	The `CollectionType` protocol does not require collection to support mutation,
	so it is not possible to tell from the protocol itself whether the order of
	elements in an instance of a type that conforms to `CollectionType` has a
	domain-specific meaning or not. (Note that since elements in collections have
	stable indices, the element order within the collection itself is stable; the
	order sometimes does not have a meaning and is not chosen by the code that uses
	the collection, but by the implementation details of the collection itself.)

	`MutableCollectionType` protocol allows the caller to replace a specific element,
	identified by an index, with another one in the same position. This capability
	essentially allows to rearrange the elements inside the collection in any
	order, thus types that conform to `MutableCollectionType` can represent
	collections with a domain-specific element order (not every instance of a
	`MutableCollectionType` has an interesting order, though).

	Range Replaceable Collections
	-----------------------------

	The `MutableCollectionType` protocol implies only mutation of content, not of
	structure (for example, changing the number of elements). The
	`RangeReplaceableCollectionType` protocol adds the capability to perform
	structural mutation, which in its most general form is expressed as replacing a
	range of elements, denoted by two indices, by elements from a collection with a
	different length.

	::

	public protocol RangeReplaceableCollectionType : MutableCollectionType {
	mutating func replaceSubrange<
	C: CollectionType where C.Iterator.Element == Self.Iterator.Element
	>(
	_ subRange: Range<Index>, with newElements: C
	)
	}


	Index Protocols
	---------------

	As a generalization designed to cover diverse data structures,
	`CollectionType` provides weaker guarantees than arrays do. In
	particular, an arbitrary collection does not necessarily offer
	efficient random access; that property is determined by the protocol
	conformances of its `Index` type.

	Forward indices are the simplest and most general, capturing the
	capabilities of indices into a singly-linked list:

	1. advance to the next position
	2. detect the end position

	Bidirectional indices are a refinement of forward indices that
	additionally support reverse traversal::

	protocol BidirectionalIndexType : ForwardIndexType {
	func predecessor() -> Self
	}

	Indices into a doubly-linked list would be bidirectional, as are the
	indices that address `Character`\ s and `UnicodeScalar`\ s in a
	`String`. Reversing the order of a collection's elements is a simple
	example of a generic algorithm that depends on bidirectional traversal.

	Random access indices have two more requirements: the ability to
	efficiently measure the number of steps between arbitrary indices
	addressing the same collection, and the ability to advance an index by
	a (possibly negative) number of steps::

	public protocol RandomAccessIndexType : BidirectionalIndexType {
	func distance(to other: Self) -> Distance
	func advanced(by n: Distance) -> Self
	}

	From these methods, the standard library derives several other
	features such as `Comparable` conformance, index subtraction, and
	addition/subtraction of integers to/from indices.

	The indices of a `deque
	<https://en.wikipedia.org/wiki/Double-ended_queue>`_ can provide random
	access, as do the indices into `String.UTF16View` (when Foundation is
	loaded) and, of course, array indices. Many common sorting and
	selection algorithms, among others, depend on these capabilities.

	All direct operations on indices are intended to be lightweight, with
	amortized O(1) complexity. In fact, indices into `Dictionary` and
	`Set` could be bidirectional, but are limited to modeling
	`ForwardIndexType` because the APIs of `NSDictionary` and
	`NSSet`--which can act as backing stores of `Dictionary` and `Set`--do
	not efficiently support reverse traversal.

	Conclusion
	==========

	Swift's sequence, collection, and index protocols allow us to write
	general algorithms that apply to a wide variety of series and data
	structures. The system has been both easy to extend, and predictably
	performant. Thanks for taking the tour!

	------

	.. [#input_iterator] This trade-off is not as obvious as it might
	seem. For example, the C# and C++ analogues for `IteratorProtocol`
	(`IEnumerable` and `input iterator`) are saddled with the
	obligation to provide buffering.