:orphan:
==============================
Interaction with Objective-C
==============================
:Authors: John McCall
I propose some elementary semantics and limitations when exposing
Swift code to Objective-C and vice versa.
Dynamism in Objective-C
=======================
Objective-C intentionally defines its semantics almost entirely around
its implementation model. The implementation supports two basic
operations that you can perform on any object:
Instance variable access
------------------------
An access to an ivar ``foo`` is type-checked by looking for a declared
ivar with the name ``foo`` in the static type of the base object. To
succeed, that search must find an ivar from some class ``C``.
It is undefined behavior if, at runtime, the base expression does not
evaluate to a valid object of type ``C`` or some subclass thereof.
This restriction is necessary to allow the compiler to compute the
address of the ivar at (relatively) minimal expense.
Because of that restriction, ivar accesses are generally seen by users
as "primitive", and we probably have some additional flexibility about
optimizing them. For example, we could probably impose a rule
insisting that the entire static type of the base be accurate, thus
forbidding a user from accessing an ivar using a pointer to an
incorrect subclass of ``C``. This would theoretically allow stronger
alias analysis: for example, if we have ``D *d;`` and ``E *e;`` where
those types are subclasses of ``C``, we would be able to prove that
``d->foo`` does not alias ``e->foo``. However, in practice, ObjC
ivars tend to be directly accessed only within the implementation of
the class which defines them, which means that the stronger rule would
basically never kick in.
Message sends
-------------
A message send is type-checked by looking for a declared method with
that selector in the static type of the receiver, or, if the receiver
has type ``id`` or ``Class``, in any known class in the translation
unit. The arguments and expression result are then determined by that
method declaration.
At runtime, if the receiver is actually ``nil``, the message send
returns immediately with a zero-initialized result; otherwise the
runtime searches for a method implementation (and, sometimes, an
alternate receiver) via a somewhat elaborate algorithm:
* The runtime first searches the method tables of the object for a
method with that selector, starting from the object's dynamic
class and proceeding up the hierarchy.
* If that search fails, the runtime gives the class an opportunity
to dynamically resolve the method by adding it to the method lists
of one of the classes in the hierarchy.
* If dynamic resolution fails, the runtime invokes the forwarding
handler; the standard handler installed by CoreFoundation follows
a complex rule that needn't be described here other than to note
that it can involve actually sending the message to a completely
different object if the receiver agrees.
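To make the sequence concrete, here is a toy model of the lookup
written in Swift; the types and names (``ModelClass``, ``Impl``,
``resolver``, ``dispatch``) are invented for illustration and are not
the Objective-C runtime's actual API::

  typealias Impl = () -> Void

  final class ModelClass {
    let superclass: ModelClass?
    var methods: [String: Impl] = [:]        // selector -> implementation
    var resolver: ((String) -> Bool)? = nil  // models +resolveInstanceMethod:

    init(superclass: ModelClass? = nil) { self.superclass = superclass }

    // Walk the method tables from the dynamic class up the hierarchy.
    func search(_ selector: String) -> Impl? {
      var node: ModelClass? = self
      while let c = node {
        if let impl = c.methods[selector] { return impl }
        node = c.superclass
      }
      return nil
    }
  }

  func dispatch(_ selector: String, on cls: ModelClass) -> Impl? {
    if let impl = cls.search(selector) { return impl }
    // Dynamic resolution: the class may add an implementation and ask
    // for the search to be retried.
    if cls.resolver?(selector) == true, let impl = cls.search(selector) {
      return impl
    }
    // At this point the real runtime would invoke the forwarding handler.
    return nil
  }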
It is undefined behavior if the signature of the method implementation
actually found is not compatible with the signature of the method
against which the message send was type-checked. This notion of
compatibility extends to semantic annotations like ARC ownership
conventions and the ``noreturn`` attribute. The restriction is
necessary in order to avoid the need to perform dynamic signature
checking and automatic implicit conversion on arguments, which would
substantially slow down the method-call mechanism.
Note that common practice requires this sense of "compatible" to be
much looser than C's. For example, ``performSelector`` is documented
as expecting the target method to return an object reference, but it
is regularly used to invoke methods that actually return ``void``.
Otherwise, the language provides no guarantees which would allow the
compiler to reason accurately about what the invoked method will
actually do. Moreover, this is not just a formal oversight; there is
quite a bit of code relying on the ability to break each of these
non-guarantees. We can classify them into five categories:
* The language does not guarantee that the static type of an object
reference will be accurate. Just because a value is typed ``C*``
does not mean it is dynamically a reference to a ``C`` object.
Proxy objects are frequently passed around as if they were values
of their target class. (Some people would argue that users
should only proxy protocols, not entire concrete classes; but
that's not universally followed.)
* The language does not guarantee that the complete class hierarchy
is statically knowable. Even ignoring the limitations of the C
compilation model, it is possible to dynamically construct classes
with new methods that may override the declared behavior of the
class. Core Data relies on this kind of dynamic class generation.
* The language does not guarantee that an object's dynamic class
will be the type that a user actually appeared to allocate. Even
ignoring proxies, it is relatively common for a factory method on
a class to return an instance of a subclass (see the sketch just
after this list).
* The language does not guarantee that an object's dynamic class
will not change. KVO works by changing an object's dynamic class
to a dynamically-generated subclass with new methods that replace
the observed accessors.
However, it is probably reasonable to call it undefined behavior
to change a dynamic class in a way that would add or remove ivars.
* The language does not guarantee that a class will remain constant.
Loading a new dynamic library may introduce categories that add
and replace methods on existing classes, and the runtime provides
public functions to do the same. These features are often used to
introduce dynamic instrumentation, for example when debugging.
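The third of these categories is easy to demonstrate even in
Swift-flavored code; the names below are invented, but the pattern is
ubiquitous::

  class Document {
    class func make(at path: String) -> Document {
      // A factory method is free to hand back an instance of a subclass;
      // the caller's static type stays Document.
      return path.hasSuffix(".rtf") ? RichTextDocument() : Document()
    }
  }
  class RichTextDocument : Document {}

  let doc = Document.make(at: "notes.rtf")  // statically Document,
                                            // dynamically RichTextDocument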
All of these constraints combined --- actually, many of them
individually --- make devirtualization completely impossible in
Objective-C [1]_.
.. [1] Not completely. We could optimistically apply techniques
typically used in dynamic language implementations. For
example, we could directly call an expected method body, then
guard that call with a check of the actual dispatched
implementation against the expected method pointer. But since
the inlined code would necessarily be one side of a "diamond"
in the CFG, and the branches in that diamond would
overwhelmingly be unthreadable, it is not clear that the
optimization would gain much, and it would significantly bloat
the call.
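In code form, the guard-and-call pattern described in the footnote has
roughly the following shape; the ``Impl`` wrapper and the names here
are invented stand-ins for what the optimizer would actually emit::

  final class Impl {                     // stands in for a method pointer
    let body: () -> Void
    init(_ body: @escaping () -> Void) { self.body = body }
  }

  func guardedCall(dispatched: Impl, expected: Impl,
                   inlinedFastPath: () -> Void) {
    if dispatched === expected {
      inlinedFastPath()    // fast path: the statically expected body,
                           // which the optimizer is free to inline
    } else {
      dispatched.body()    // slow path: fall back to the dynamic call
    }
  }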
Devirtualization in Swift
=========================
Method devirtualization [2]_ is likely to be a critically important
optimization in Swift.
.. [2] In contrast to generic or existential devirtualization, which
are also important, but which aren't affected by the Objective-C
interoperation model.
A Missing Optimization
----------------------
For one, it is an important missing optimization even in Objective-C.
Any program that tries to separate its concerns will usually introduce
some extra abstraction in its formal model. For example:
* A class might provide multiple convenience initializers that all
delegate to each other so that all initialization will flow
through a single point.
* A large operation might be simpler to reason about when split into
several smaller methods.
* A property might be abstracted behind a getter/setter to make it
easier to change the representation (or do additional work on set)
later.
In each of these examples (sketched briefly in the code below), the
user has made a totally reasonable decision about code organization
and reserved flexibility, and
Objective-C proceeds to introduce unnecessary runtime costs which
might force a performance-sensitive programmer to choose a different
path.
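A small, invented Swift example combining the first and third patterns
above::

  class Request {
    private var storedTimeout: Double = 30

    // All initialization funnels through one designated initializer.
    init(url: String, timeout: Double) {
      self.storedTimeout = timeout
    }
    convenience init(url: String) {
      self.init(url: url, timeout: 30)
    }

    // The timeout is abstracted behind accessors so that its
    // representation (or validation) can change later.
    var timeout: Double {
      get { return storedTimeout }
      set { storedTimeout = max(0, newValue) }
    }
  }

  // Without devirtualization, r.timeout below is a full dynamic call.
  let r = Request(url: "https://example.org")
  print(r.timeout)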
Swift-Specific Concerns
-----------------------
The lack of devirtualization would hit Swift much harder because of
its property model. With a synthesized property, Objective-C provides
a way to either call the getter/setter (with dot syntax) or directly
access the underlying ivar (with arrow syntax). By design, Swift
hides that difference, and the abstract language model is that all
accesses go through a getter or setter.
Using a getter or setter instead of a direct access is a major
regression for several reasons. The first is the direct one: the
generated code must call a function, which prevents the compiler from
keeping values live in the most efficient way, and which inhibits most
compiler analyses. The second is a by-product of value types: if a
value is read, modified, and then written back, the modification will
take place on the temporary copy, forcing a copy-on-write. Any
proposal to improve on that relies on having a richer API for the
access than merely a getter/setter pair, which cannot be guaranteed.
For properties of a value type, this isn't a performance problem,
because we can simply look at the implementation (ignoring resilience
for now) and determine whether we can access the property directly.
But for properties of a class type, polymorphism requires us to
defensively handle the possibility that a subclass might add arbitrary
logic to either the getter or setter. If our implementation model
is as unrestricted as Objective-C's, that's a serious problem.
I think that this is such a massive regression from Objective-C that
we have to address it.
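For example (an invented sketch), nothing in the language model stops
a subclass from interposing on an inherited property's accessors, so
an access through the base class cannot simply become a load or
store::

  class Counter {
    var count: Int = 0
  }

  class LoggingCounter : Counter {
    override var count: Int {
      didSet { print("count is now \(count)") }
    }
  }

  func bump(_ c: Counter) {
    // Statically this is Counter.count, but c may be a LoggingCounter,
    // so the compiler must defensively call the setter (and getter).
    c.count += 1
  }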
Requirements for Devirtualization
---------------------------------
There are several different ways to achieve devirtualization, each
with its own specific requirements. But they all rely on a common
guarantee: we must remove or constrain the ability to dynamically
add and replace method implementations.
Restricting Method Replacement
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
There are two supported ways to add or replace methods in Objective-C.
The first is via the runtime API. If we do have to support doing this
to replace Swift methods --- and we should try to avoid that --- then
I think restricting it to require a ``@dynamic`` annotation on the
replaceable method (or its lexical context) is reasonable. We should
try to get the Objective-C runtime to complain about attempts to
replace non-dynamic methods.
The second is via categories. It's generally understood that a
category replacing an existing method implementation is "rude", and
it's arguable whether we should support doing that to Swift methods at
all. If we do, I don't think that requiring some sort of ``@dynamic``
annotation on the replaceable method (or its lexical context) is much
of a problem. Similarly, I don't think anybody will weep too heavily
if we scale back the ObjC runtime functions to say that either they
cannot be used on Swift classes at all, or they can only replace
methods marked ``@dynamic``.
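If we go that way, the annotation might look something like this; the
``@dynamic`` spelling is the hypothetical attribute discussed above,
not an existing one::

  class EventController {
    @dynamic func handleEvent() {}   // may be added to or replaced through
                                     // the ObjC runtime; stays fully dynamic
    func update() {}                 // not replaceable, so calls to it are
                                     // candidates for devirtualization
  }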
Point of Allocation
~~~~~~~~~~~~~~~~~~~
If we can see the true point of allocation of an object, then we know
its dynamic class at that point and can reason forward from there.
Note that the true point of allocation is not the point at which we
call some factory method on a specific type; it has to be an actual
allocation: an ``alloc_ref`` SIL instruction. It's also questionable
whether an ObjC allocation counts, because ObjC allocation methods
sometimes don't return what you might expect
(``+[NSManagedObject alloc]`` does this).
Once we know the dynamic class at a particular point, we can
devirtualize calls on the object only if there is no supported way to
replace the relevant method implementations and no supported way to
change the object's dynamic class out from under us (or, at least,
none that isn't guaranteed to preserve the behavior of the methods we
devirtualized).
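A small Swift illustration of the difference::

  class Widget {
    func draw() {}
  }

  func renderFresh() {
    let w = Widget()   // the allocation is visible: dynamic class is Widget
    w.draw()           // may become a direct call, given the restrictions above
  }

  func render(_ w: Widget) {
    w.draw()           // w may really be a subclass; dispatch is required
  }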
Access Control
--------------
Swift does give us one big tool for devirtualization that Objective-C
lacks: access control. In Swift, access control determines
visibility, and it doesn't make sense to override something that you
can't see. Therefore:
* A declaration which is private to a file can only be overridden
within that file.
* A declaration which is private to a module can only be overridden
within that module.
* A public declaration can be overridden anywhere [3]_.
.. [3] We've talked about having two levels of public access control
for classes, so that you have to opt in to subclassability. I
still think this is a good idea.
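In later Swift spellings (which postdate this proposal), the three
rules look roughly like this::

  open class Shape {
    fileprivate func fileLocal() {}   // overridable only within this file
    internal func moduleLocal() {}    // overridable only within this module
    open func anywhere() {}           // overridable from any importing module
  }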
This means that a private stored property can always be
"devirtualized" into a direct access [4]_. Unfortunately, ``private``
is not the default access control: module-private is. And if the
current module can contain Objective-C code, then even that module-wide
reasoning raises the question of what ObjC interop actually means.
.. [4] Assuming we don't introduce a supported way of dynamically
replacing the implementation of a private Swift method!
We propose a new attribute, ``@public``, that can adorn any
declaration not local to a function. For the purpose of standard
library development, even just parsing this attribute without
implementing semantics would be extremely useful in the near term.
Basic Semantics
===============
``@public`` makes a declaration visible in code where the enclosing
module is imported. So, given this declaration in the ``Satchel``
module::
  @public struct Bag<T> : ... {
    ...
  }
We could write, in any other module, ::
  import Satchel
  typealias SwingMe = Bag<Cat>
The difference from the status quo is that without ``@public`` on
the declaration of ``Bag``, the use of ``Bag`` above would be
ill-formed.
Type-Checking
=============
The types of all parameters and the return type of a ``func`` marked
``@public`` (including the implicit ``self`` of methods) must also be
``@public``::
  struct X {}                    // not @public
  @public struct Y {}
  func f(_: X) {}                // OK; also not @public
  @public func g(_: Y) {}        // OK; uses only @public types
  @public func h(_: X, _: Y) {}  // Ill-formed; non-public X in public signature
A ``typealias`` marked ``@public`` must refer to a type marked
``@public``::
  typealias XX = X           // OK; not @public
  @public typealias YY = Y   // OK; Y is @public
  @public typealias XXX = X  // Ill-formed; public typealias refers to non-public type
There is a straightforward and obvious rule for composing the
``@public``\ -ness of any compound type, including function types,
tuple types and instances of generic types: The compound type is
public if and only if all of the component types are ``@public`` and
either defined in this module or re-exported from this module.
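Continuing the earlier example, where ``X`` is not ``@public`` and
``Y`` is::

  @public func pair(_ a: Y, _ b: Y) -> (Y, Y) { return (a, b) }   // OK
  @public func mixed(_ a: X, _ b: Y) -> (X, Y) { return (a, b) }  // Ill-formed; X is not @public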
Enums
=====
The cases of an ``enum`` are ``@public`` if and only if the ``enum``
is declared ``@public``.
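For example (illustrative declarations only)::

  @public enum Direction {   // Direction is @public, so its cases
    case North, South        // North and South are too
  }

  enum Toggle {              // not @public, so neither are its cases
    case On, Off
  }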
Derived Classes
===============
A method that overrides an ``@public`` method must be declared
``@public``, even if the enclosing class is non-``@public``.
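For example (illustrative only)::

  @public class Base {
    @public func run() {}
  }

  class Hidden : Base {             // Hidden itself is not @public...
    @public override func run() {}  // ...but its override of an @public
  }                                 // method must be declared @public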
Protocols
=========
A ``@public`` protocol can have ``@public`` and non-``@public``
requirements. ``@public`` requirements can only be satisfied by
``@public`` declarations. Non-``@public`` requirements can be
satisfied by ``@public`` or non-``@public`` declarations.
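For example (illustrative only)::

  @public protocol Codec {
    @public func encode()   // must be satisfied by an @public declaration
    func reset()            // may be satisfied by a non-@public declaration
  }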
Conformances
============
The conformance of a type to a protocol is ``@public`` if that
conformance is part of an ``@public`` declaration. The program is
ill-formed if any declaration required to satisfy a ``@public``
conformance is not also declared ``@public``::
  @public protocol P {
    @public func f() { g() }
    func g()
  }

  struct X : P {     // OK, X is not @public, so neither is its
    func f() {}      // conformance to P, and therefore f
    func g() {}      // can be non-@public
  }

  protocol P1 {}

  @public struct Y : P1 {}   // Y is @public so its
                             // conformance to P1 is, too.

  @public
  extension Y : P {        // This extension is @public, so
    @public func f() {}    // Y's conformance to P is also, and
    func g() {}            // thus f must be @public too
  }

  protocol P2 {}

  extension Y : P2 {}        // Y's conformance to P2 is non-@public
.. Note:: It's our expectation that in the near term, and probably for
v1.0, non-``@public`` conformances on ``@public`` types will be
diagnosed as ill-formed/unsupported.
A Related Naming Change
=======================
The existing ``@exported`` attribute for imports should be renamed
``@public`` with no change in functionality.
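That is, an import that is currently written with ``@exported`` would
instead be written as::

  @public import Satchel   // clients of this module also see Satchel

which fits the general theme that ``@public`` marks the parts of a
module that are visible to its clients.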
Future Directions
=================
Some obvious directions to go in this feature space, which we are not
proposing today, but with which we tried to make this proposal
compatible:
* non-``@public`` conformances
* file-private accessibility
* explicit non-``@public`` overrides, e.g. ``@!public``