:orphan:
Text Formatting in Swift
========================
:Author: Dave Abrahams
:Author: Chris Lattner
:Author: Dave Zarzycki
:Date: 2013-08-12
.. contents:: Index
**Abstract:** We propose a system for creating textual representations
of Swift objects. Our system unifies conversion to ``String``, string
interpolation, printing, and representation in the REPL and debugger.
Scope
-----
Goals
.....
* The REPL and LLDB ("debuggers") share formatting logic
* All types are "debug-printable" automatically
* Making a type "printable for humans" is super-easy
* ``toString()``-ability is a consequence of printability.
* Customizing a type's printed representations is super-easy
* Format variations such as numeric radix are explicit and readable
* Large textual representations do not (necessarily) ever need to be
stored in memory, e.g. if they're being streamed into a file or over
a remote-debugging channel.
Non-Goals
.........
.. sidebar:: Rationale
Localization (including single-locale linguistic processing such as
what's found in Clang's diagnostics subsystem) is the only major
application we can think of for dynamically-constructed format
strings, [#dynamic]_ and is certainly the most important consumer of
that feature. Therefore, localization and dynamic format strings
should be designed together, and *under this proposal* the only
format strings are string literals containing interpolations
("``\(...)``"). Cocoa programmers can still use Cocoa localization
APIs for localization jobs.
In Swift, only the most common cases need to be very terse.
Anything "fancy" can afford to be a bit more verbose. If and when
we address localization and design a full-featured dynamic string
formatter, it may make sense to incorporate features of ``printf``
into the design.
* **Localization** issues such as pluralizing and argument
presentation order are beyond the scope of this proposal.
* **Dynamic format strings** are beyond the scope of this proposal.
* **Matching the terseness of C**\ 's ``printf`` is a non-goal.
CustomStringConvertible Types
-----------------------------
``CustomStringConvertible`` types can be used in string literal interpolations,
printed with ``print(x)``, and can be converted to ``String`` with
``x.toString()``.
The simple extension story for beginners is as follows:
"To make your type ``CustomStringConvertible``, simply declare conformance to
``CustomStringConvertible``::
extension Person : CustomStringConvertible {}
and it will have the same printed representation you see in the
interpreter (REPL). To customize the representation, give your type
a ``func format()`` that returns a ``String``::
extension Person : CustomStringConvertible {
func format() -> String {
return "\(lastName), \(firstName)"
}
}
The formatting protocols described below allow more efficient and
flexible formatting as a natural extension of this simple story.
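For comparison, here is a minimal sketch of the same beginner story in the Swift that eventually shipped. It assumes the released standard library, where the customization point is a ``description`` property rather than the ``format()`` method proposed here, and where ``String(describing:)`` plays the role of ``toString()``. The ``Person`` fields and sample values are illustrative only.

```swift
struct Person {
    var firstName: String
    var lastName: String
}

// Assumption: the shipped protocol requires `description`, not `format()`.
extension Person: CustomStringConvertible {
    var description: String {
        return "\(lastName), \(firstName)"
    }
}

let person = Person(firstName: "Ada", lastName: "Lovelace")
let interpolated = "Author: \(person)"        // interpolation uses `description`
let converted = String(describing: person)    // stands in for `toString()`
print(person)                                 // printing uses `description` too
```

One conformance covers interpolation, printing, and conversion to ``String``, which is the unification the abstract asks for.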
Formatting Variants
-------------------
``CustomStringConvertible`` types with parameterized textual representations
(e.g. number types) *additionally* support a ``format(...)`` method
parameterized according to that type's axes of variability::
print(offset)
print(offset.format()) // equivalent to previous line
print(offset.format(radix: 16, width: 5, precision: 3))
Although ``format(...)`` is intended to provide the most general
interface, specialized formatting interfaces are also possible::
print(offset.hex())
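As a rough sketch of what such a parameterized ``format(...)`` could look like on ``Int``, the following uses the shipped ``String(_:radix:uppercase:)`` initializer. The ``format`` and ``hex`` methods are hypothetical helpers written for this illustration, and ``precision`` is left out because it has no obvious meaning for integers.

```swift
extension Int {
    /// Hypothetical helper mirroring the proposal's `format(...)`:
    /// renders the value in `radix`, left-padded with `fill` to `width`.
    func format(radix: Int = 10, width: Int = 0, fill: Character = " ") -> String {
        let digits = String(self, radix: radix, uppercase: true)
        let padding = String(repeating: fill, count: max(0, width - digits.count))
        return padding + digits
    }

    /// Hypothetical specialized interface built on the general one.
    func hex() -> String {
        return format(radix: 16)
    }
}

let offset = 255
let plain = offset.format()                       // all defaults
let padded = offset.format(radix: 16, width: 5)   // explicit, readable variation
let hexText = offset.hex()
```

Labeled arguments with defaults keep the common case terse while making each variation explicit at the call site.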
Design Details
--------------
Output Streams
..............
The most fundamental part of this design is ``TextOutputStream``, a thing
into which we can stream text: [#character1]_
::
protocol TextOutputStream {
func append(_ text: String)
}
Every ``String`` can be used as a ``TextOutputStream`` directly::
extension String : TextOutputStream {
func append(_ text: String)
}
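The sketch below shows the same idea with the ``TextOutputStream`` protocol as it ships today, where the requirement is spelled ``mutating func write(_:)`` rather than ``append``. ``CountingStream`` is a hypothetical stream invented for this example; it shows that a stream need not store the text it receives.

```swift
/// Hypothetical stream that only counts characters, so a large
/// representation is never held in memory.
struct CountingStream: TextOutputStream {
    var characterCount = 0

    mutating func write(_ string: String) {
        characterCount += string.count
    }
}

// `String` is itself a `TextOutputStream`.
var buffer = ""
buffer.write("offset = ")
print(42, terminator: "", to: &buffer)

// The same `print` call can target any other stream.
var counter = CountingStream()
print("hello", terminator: "", to: &counter)
```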
Debug Printing
..............
Via compiler magic, *everything* conforms to the ``CustomDebugStringConvertible``
protocol. To change the debug representation for a type, you don't
need to declare conformance: simply give the type a ``debugFormat()``::
/// A thing that can be printed in the REPL and the Debugger
protocol CustomDebugStringConvertible {
typealias DebugRepresentation : TextOutputStreamable = String
/// Produce a textual representation for the REPL and
/// Debugger.
func debugFormat() -> DebugRepresentation
}
Because ``String`` is a ``TextOutputStreamable``, your implementation of
``debugFormat`` can just return a ``String``. If you want to write
directly to the ``TextOutputStream`` for efficiency reasons
(e.g. if your representation is huge), you can return a custom
``DebugRepresentation`` type.
.. Admonition:: Guideline
Producing a representation that can be consumed by the REPL
and LLDB to produce an equivalent object is strongly encouraged
where possible! For example, ``String.debugFormat()`` produces
a representation starting and ending with "``"``", where special
characters are escaped, etc. A ``struct Point { var x, y: Int }``
might be represented as "``Point(x: 3, y: 5)``".
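A small sketch of the guideline, written against the shipped standard library, where the debug customization point is the ``debugDescription`` property and ``String(reflecting:)`` requests the debug form. The ``Point`` type follows the example in the guideline.

```swift
struct Point {
    var x: Int
    var y: Int
}

// Assumption: shipped Swift spells the proposal's `debugFormat()`
// as the `debugDescription` property.
extension Point: CustomDebugStringConvertible {
    var debugDescription: String {
        return "Point(x: \(x), y: \(y))"
    }
}

let pointText = String(reflecting: Point(x: 3, y: 5))
let quoted = String(reflecting: "hi")     // debug form of a String is quoted
let unquoted = String(describing: "hi")   // "pretty" form is the string itself
```

The debug text for ``Point`` is itself a valid initializer expression, so it could be pasted back into the REPL to recreate the value.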
(Non-Debug) Printing
....................
The ``CustomStringConvertible`` protocol provides a "pretty" textual representation
that can be distinct from the debug format. For example, when ``s``
is a ``String``, ``s.format()`` returns the string itself,
without quoting.
Conformance to ``CustomStringConvertible`` is explicit, but if you want to use the
``debugFormat()`` results for your type's ``format()``, all you
need to do is declare conformance to ``CustomStringConvertible``; there's nothing to
implement::
/// A thing that can be print()ed and toString()ed.
protocol CustomStringConvertible : CustomDebugStringConvertible {
typealias PrintRepresentation : TextOutputStreamable = DebugRepresentation
/// produce a "pretty" textual representation.
///
/// In general you can return a String here, but if you need more
/// control, return a custom TextOutputStreamable type
func format() -> PrintRepresentation {
return debugFormat()
}
/// Simply convert to String
///
/// You'll never want to reimplement this
func toString() -> String {
var result = ""
self.format().writeTo(&result)
return result
}
}
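The fallback from the "pretty" form to the debug form can be observed in the shipped standard library as well. In this sketch, ``String(describing:)`` stands in for ``toString()``; the ``Celsius`` and ``Fahrenheit`` types are hypothetical. Note that, unlike in this proposal, the shipped ``CustomStringConvertible`` does not refine ``CustomDebugStringConvertible``; the fallback lives inside ``String(describing:)``.

```swift
/// Supplies only a debug representation.
struct Celsius: CustomDebugStringConvertible {
    var degrees: Int
    var debugDescription: String { return "Celsius(degrees: \(degrees))" }
}

/// Supplies distinct "pretty" and debug representations.
struct Fahrenheit: CustomStringConvertible, CustomDebugStringConvertible {
    var degrees: Int
    var description: String { return "\(degrees) F" }
    var debugDescription: String { return "Fahrenheit(degrees: \(degrees))" }
}

// With no `description`, the debug form is used for ordinary printing.
let fallback = String(describing: Celsius(degrees: 21))

// With both, each request gets its own form.
let pretty = String(describing: Fahrenheit(degrees: 70))
let debug = String(reflecting: Fahrenheit(degrees: 70))
```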
``TextOutputStreamable``
........................
Because it's not always efficient to construct a ``String``
representation before writing an object to a stream, we provide a
``TextOutputStreamable`` protocol for types that can write themselves into a
``TextOutputStream``. Every ``TextOutputStreamable`` is also a
``CustomStringConvertible``, naturally::
protocol TextOutputStreamable : CustomStringConvertible {
func writeTo<T: TextOutputStream>(_ target: [inout] T)
// You'll never want to reimplement this
func format() -> PrintRepresentation {
return self
}
}
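To illustrate why streaming can beat building a ``String`` first, here is a sketch using the shipped ``TextOutputStreamable`` protocol, whose requirement is spelled ``write(to:)`` instead of ``writeTo``. ``Repeated`` is a hypothetical type made up for this example.

```swift
/// Hypothetical value whose text is `unit` repeated `count` times.
/// It writes piece by piece and never builds the full string itself.
struct Repeated: TextOutputStreamable {
    var unit: String
    var count: Int

    func write<Target: TextOutputStream>(to target: inout Target) {
        for _ in 0..<count {
            target.write(unit)
        }
    }
}

var out = ""
Repeated(unit: "ab", count: 3).write(to: &out)

// A streamable value can still be converted to a String on demand.
let converted = String(describing: Repeated(unit: "-", count: 4))
```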
How ``String`` Fits In
......................
``String``\ 's ``debugFormat()`` yields a ``TextOutputStreamable`` that
adds surrounding quotes and escapes special characters::
extension String : CustomDebugStringConvertible {
func debugFormat() -> EscapedStringRepresentation {
return EscapedStringRepresentation(self)
}
}
struct EscapedStringRepresentation : TextOutputStreamable {
var _value: String
func writeTo<T: TextOutputStream>(_ target: [inout] T) {
target.append("\"")
for c in _value {
target.append(c.escape())
}
target.append("\"")
}
}
Besides modeling ``TextOutputStream``, ``String`` also conforms to
``TextOutputStreamable``::
extension String : TextOutputStreamable {
func writeTo<T: TextOutputStream>(_ target: [inout] T) {
target.append(self) // Append yourself to the stream
}
func format() -> String {
return self
}
}
This conformance allows *most* formatting code to be written entirely
in terms of ``String``, simplifying usage. Types with other needs can
expose lazy representations like ``EscapedStringRepresentation``
above.
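A runnable approximation of ``EscapedStringRepresentation`` in current Swift might look like the following. The set of escaped characters is an assumption chosen for brevity, since the proposal does not define what ``escape()`` covers, and the property is named ``value`` here rather than ``_value``.

```swift
/// Lazily quoted and escaped view of a string (sketch).
struct EscapedStringRepresentation: TextOutputStreamable {
    var value: String

    func write<Target: TextOutputStream>(to target: inout Target) {
        target.write("\"")
        for character in value {
            // Assumed escape set: quote, backslash, newline, tab.
            switch character {
            case "\"": target.write("\\\"")
            case "\\": target.write("\\\\")
            case "\n": target.write("\\n")
            case "\t": target.write("\\t")
            default: target.write(String(character))
            }
        }
        target.write("\"")
    }
}

var escaped = ""
EscapedStringRepresentation(value: "a\"b\n").write(to: &escaped)
```

The escaped text is produced one piece at a time, so a very long string never needs a second, fully escaped copy unless the target stream chooses to store one.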
Extended Formatting Example
---------------------------
The following code is a scaled-down version of the formatting code
used for ``Int``. It represents an example of how a relatively
complicated ``format(...)`` might be written::
protocol CustomStringConvertibleInteger
: ExpressibleByIntegerLiteral, Comparable, SignedNumber, CustomStringConvertible {
func %(lhs: Self, rhs: Self) -> Self
func /(lhs: Self, rhs: Self) -> Self
constructor(x: Int)
func toInt() -> Int
func format(_ radix: Int = 10, fill: String = " ", width: Int = 0)
-> RadixFormat<Self> {
return RadixFormat(self, radix: radix, fill: fill, width: width)
}
}
struct RadixFormat<T: CustomStringConvertibleInteger> : TextOutputStreamable {
var value: T, radix = 10, fill = " ", width = 0
func writeTo<S: TextOutputStream>(_ target: [inout] S) {
_writeSigned(value, &target)
}
// Write the given positive value to stream
func _writePositive<T:CustomStringConvertibleInteger, S: TextOutputStream>(
_ value: T, stream: [inout] S
) -> Int {
if value == 0 { return 0 }
var radix: T = T.fromInt(self.radix)
var rest: T = value / radix
var nDigits = _writePositive(rest, &stream)
var digit = UInt32((value % radix).toInt())
var baseCharOrd : UInt32 = digit <= 9 ? '0'.value : 'A'.value - 10
stream.append(String(UnicodeScalar(baseCharOrd + digit)))
return nDigits + 1
}
func _writeSigned<T:CustomStringConvertibleInteger, S: TextOutputStream>(
_ value: T, target: [inout] S
) {
var width = 0
var result = ""
if value == 0 {
result = "0"
++width
}
else {
var absVal = abs(value)
if (value < 0) {
target.append("-")
++width
}
width += _writePositive(absVal, &result)
}
while width < self.width {
++width
target.append(fill)
}
target.append(result)
}
}
extension Int : CustomStringConvertibleInteger {
func toInt() -> Int { return self }
}
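For readers who want something that compiles today, the following is a condensed sketch of the same idea against the shipped ``BinaryInteger`` and ``TextOutputStreamable`` protocols. It is not the proposal's code: the digit loop is iterative rather than recursive, ``fill`` is a ``Character``, and the radix is assumed to lie in 2 through 36. As in the listing above, the sign is written before the padding.

```swift
struct RadixFormat<Value: BinaryInteger>: TextOutputStreamable {
    var value: Value
    var radix = 10
    var fill: Character = " "
    var width = 0

    func write<Target: TextOutputStream>(to target: inout Target) {
        precondition((2...36).contains(radix), "radix must be in 2...36")
        let digitTable = Array("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ")

        // Collect digits least-significant first, working on the magnitude
        // so that negative values need no special casing.
        var digits: [Character] = []
        var rest = value.magnitude
        let base = Value.Magnitude(radix)
        repeat {
            let (quotient, remainder) = rest.quotientAndRemainder(dividingBy: base)
            digits.append(digitTable[Int(remainder)])
            rest = quotient
        } while rest != 0

        let sign = value < 0 ? "-" : ""
        let padding = max(0, width - digits.count - sign.count)

        target.write(sign)
        target.write(String(repeating: fill, count: padding))
        target.write(String(digits.reversed()))
    }
}

var hexText = ""
RadixFormat(value: 255, radix: 16, width: 6).write(to: &hexText)

var negativeText = ""
RadixFormat(value: -5, fill: "0", width: 4).write(to: &negativeText)
```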
Possible Extensions (a.k.a. Complications)
------------------------------------------
We are not proposing these extensions. Since we have given them
considerable thought, they are included here for completeness and to
ensure our proposed design doesn't rule out important directions of
evolution.
``TextOutputStream`` Adapters
.............................
Most text transformations can be expressed as adapters over generic
``TextOutputStream``\ s. For example, it's easy to imagine an upcasing
adapter that transforms its input to upper case before writing it to
an underlying stream::
struct UpperStream<UnderlyingStream:TextOutputStream> : TextOutputStream {
func append(_ x: String) { base.append(x.toUpper()) }
var base: UnderlyingStream
}
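In the shipped standard library the upcasing adapter can be written almost verbatim. The sketch below assumes only the renamed requirement (``mutating func write(_:)``) and the ``uppercased()`` method on ``String``.

```swift
/// Adapter that upcases text before forwarding it to `base`.
struct UpperStream<Base: TextOutputStream>: TextOutputStream {
    var base: Base

    mutating func write(_ string: String) {
        base.write(string.uppercased())
    }
}

var upper = UpperStream(base: "")
print("radix \(16)", terminator: "", to: &upper)
let shouted = upper.base   // the underlying String now holds the result
```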
However, upcasing is a trivial example: many such transformations--such
as ``trim()`` or regex replacement--are stateful, which implies some
way of indicating "end of input" so that buffered state can be
processed and written to the underlying stream:
.. parsed-literal::
struct TrimStream<UnderlyingStream:TextOutputStream> : TextOutputStream {
func append(_ x: String) { ... }
**func close() { ... }**
var base: UnderlyingStream
var bufferedWhitespace: String
}
This makes general ``TextOutputStream`` adapters more complicated to write
and use than ordinary ``TextOutputStream``\ s.
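A sketch of one such stateful adapter follows. ``TrailingTrimStream`` is a hypothetical, simplified cousin of ``TrimStream`` that removes only trailing whitespace; it has to buffer whitespace because it cannot know whether more text will follow until ``close()`` is called.

```swift
/// Hypothetical adapter that drops trailing whitespace.
struct TrailingTrimStream<Base: TextOutputStream>: TextOutputStream {
    var base: Base
    var bufferedWhitespace = ""

    mutating func write(_ string: String) {
        for character in string {
            if character.isWhitespace {
                // Hold whitespace back until we know it is not trailing.
                bufferedWhitespace.append(character)
            } else {
                base.write(bufferedWhitespace)
                bufferedWhitespace = ""
                base.write(String(character))
            }
        }
    }

    /// Signals end of input: discards pending whitespace and
    /// returns the underlying stream.
    mutating func close() -> Base {
        bufferedWhitespace = ""
        return base
    }
}

var trimmer = TrailingTrimStream(base: "")
trimmer.write("a  ")
trimmer.write(" b")
trimmer.write("  \n")
let trimmed = trimmer.close()
```

Without the ``close()`` call the adapter could never decide whether the last run of whitespace was trailing, which is exactly the extra protocol burden described above.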
``TextOutputStreamable`` Adapters
.................................
For every conceivable ``TextOutputStream`` adapter there's a corresponding
``TextOutputStreamable`` adapter. For example::
struct UpperStreamable<UnderlyingStreamable : TextOutputStreamable> {
var base: UnderlyingStreamable
func writeTo<T: TextOutputStream>(_ target: [inout] T) {
var adaptedStream = UpperStream(target)
self.base.writeTo(&adaptedStream)
target = adaptedStream.base
}
}
Then, we could extend ``TextOutputStreamable`` as follows::
extension TextOutputStreamable {
typealias Upcased : TextOutputStreamable = UpperStreamable<Self>
func toUpper() -> UpperStreamable<Self> {
return Upcased(self)
}
}
and, finally, we'd be able to write:
.. parsed-literal::
print(n.format(radix:16)\ **.toUpper()**)
The complexity of this back-and-forth adapter dance is daunting, and
might well be better handled in the language once we have some formal
model--such as coroutines--of inversion-of-control. We think it makes
more sense to build the important transformations directly into
``format()`` methods, allowing, e.g.:
.. parsed-literal::
print(n.format(radix:16, **case:.upper**))
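The adapter dance described in this section can be reproduced with the shipped protocols. The sketch below repeats a minimal ``UpperStream`` so that it is self-contained, and uses ``write(to:)`` in place of ``writeTo``. The explicit write-back on the last line of ``write(to:)`` is needed because the target may be a value type.

```swift
struct UpperStream<Base: TextOutputStream>: TextOutputStream {
    var base: Base
    mutating func write(_ string: String) {
        base.write(string.uppercased())
    }
}

struct UpperStreamable<Base: TextOutputStreamable>: TextOutputStreamable {
    var base: Base

    func write<Target: TextOutputStream>(to target: inout Target) {
        var adaptedStream = UpperStream(base: target)   // copies the target
        base.write(to: &adaptedStream)
        target = adaptedStream.base                     // explicit write-back
    }
}

extension TextOutputStreamable {
    /// Hypothetical convenience matching the proposal's `toUpper()`.
    func toUpper() -> UpperStreamable<Self> {
        return UpperStreamable(base: self)
    }
}

let hexDigits: String = String(255, radix: 16)   // lowercase digits
var out = ""
hexDigits.toUpper().write(to: &out)
```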
Possible Simplifications
------------------------
One obvious simplification might be to fearlessly use ``String`` as
the universal textual representation type, rather than having a
separate ``TextOutputStreamable`` protocol that doesn't necessarily
create a fully-stored representation. This approach would trade some
efficiency for considerable design simplicity. It is reasonable to
ask whether the efficiency cost would be significant in real cases,
and the truth is that we don't have enough information to know. At
least until we do, we opt not to trade away any CPU, memory, or
power.
If we were willing to say that only ``class``\ es can conform to
``TextOutputStream``, we could eliminate the explicit ``[inout]`` where
``TextOutputStream``\ s are passed around. Then, we'd simply need a
``class StringStream`` for creating ``String`` representations. It
would also make ``TextOutputStream`` adapters a *bit* simpler to use
because you'd never need to "write back" explicitly onto the target
stream. However, stateful ``TextOutputStream`` adapters would still need a
``close()`` method, which makes a perfect place to return a copy of
the underlying stream, which can then be "written back":
.. parsed-literal::
struct AdaptedStreamable<T : TextOutputStreamable> {
...
func writeTo<Target: TextOutputStream>(_ target: [inout] Target) {
// create the stream that transforms the representation
var adaptedTarget = adapt(target, adapter)
// write the Base object to the target stream
base.writeTo(&adaptedTarget)
// Flush the adapted stream and, in case Target is a value type,
// write its new value
**target = adaptedTarget.close()**
}
...
}
We think anyone writing such adapters can handle the need for explicit
write-back, and the ability to use ``String`` as a ``TextOutputStream``
without additionally allocating a ``StringStream`` on the heap seems
to tip the balance in favor of the current design.
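For contrast, here is a sketch of the class-based alternative weighed above, written with the shipped ``TextOutputStream`` protocol. ``StringStream`` is the hypothetical heap-allocated stream mentioned in the text; because it has reference semantics, an adapter wrapped around it needs no write-back.

```swift
/// Hypothetical class-based stream for building a String.
final class StringStream: TextOutputStream {
    private(set) var contents = ""

    func write(_ string: String) {
        contents += string
    }
}

struct UpperStream<Base: TextOutputStream>: TextOutputStream {
    var base: Base
    mutating func write(_ string: String) {
        base.write(string.uppercased())
    }
}

let sink = StringStream()
var adapted = UpperStream(base: sink)   // shares `sink`, does not copy it
adapted.write("abc")
let result = sink.contents              // visible without any write-back
```

The cost is the extra allocation for ``sink``, which is the trade-off that the paragraph above resolves in favor of value-type streams.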
--------
.. [#format] Whether ``format(...)`` is to be a real protocol or merely
an ad-hoc convention is TBD. So far, there's no obvious use for a
generic ``format`` with arguments that depend on the type being
formatted, so an ad-hoc convention would be just fine.
.. [#character1] We don't support streaming individual code points
directly because it's possible to create invalid sequences of code
points. For any code point that, on its own, represents a valid
``Character`` (a.k.a. Unicode `extended grapheme cluster`__), it is
trivial and inexpensive to create a ``String``. For more
information on the relationship between ``String`` and
``Character`` see the (forthcoming, as of this writing) document
*Swift Strings State of the Union*.
__ http://www.unicode.org/glossary/#extended_grapheme_cluster
.. [#dynamic] In fact it's possible to imagine a workable system for
localization that does away with dynamic format strings altogether,
so that all format strings are fully statically-checked and some of
the same formatting primitives can be used by localizers as by
fully-privileged Swift programmers. This approach would involve
compiling/JIT-ing localizations into dynamically-loaded modules.
In any case, that will wait until we have native Swift dylibs.