| :orphan: |
| |
| .. _ProgramStructureAndCompilationModel: |
| |
| .. highlight:: none |
| |
| Swift Program Structure and Compilation Model |
| ============================================= |
| |
| .. warning:: This is a very early design document discussing the features of |
| a Swift build model and modules system. It should not be taken as a plan of |
| record. |
| |
| Commentary |
| ---------- |
| |
| The C spec only describes things up to translation unit granularity: no |
| discussion of file system layout, build system, linking, runtime concepts of |
| code (dynamic libraries, executables, plugins), dependence between parts of a |
| program, Versioning + SDKs, human factors like management units, etc. It leaves |
| all of this up to implementors to sort out, and we got what the unix world |
| defined in the 60's and 70's with some minor stuff that could be shoehorned into |
| the old unix toolchain model without too much trouble. C also doesn't help with |
| resources (images etc), has a miserable incremental compilation model and many, |
| many, other issues. |
| |
| Swift should strive to make trivial programs really simple. Hello world should |
| just be something like:: |
| |
| print("hello world") |
| |
| while also acknowledging and strongly supporting the real world demands and |
| requirements that library implementors (hey, that's us!) face every day. In |
| particular, note how the language elements (described below) correspond directly |
| to the business and management reality of the world: |
| |
| **Ownership Domain / Top Level Component**: corresponds to a product that is |
| shipped as a unit (Mac OS/X, iWork, Xcode), is a collection of frameworks/dylibs |
| and resources. Only acyclic dependencies between different domains is |
| allowed. There is some correlation in concept here to "umbrella headers" or |
| "dyld shared cache" though it isn't exact. |
| |
| **Namespace**: Organizational structure within a domain, similar to C++ or |
| Java. Programmers can use or abuse them however they wish. |
| |
| **Subcomponent**: corresponds to an individual team or management unit, is one |
| dylib + optional resources. All contributing source files and resources live in |
| one directory (with optional subdirs), and have a single "project file". Can |
| contribute to multiple namespaces. The division of a domain into components is |
| an implementation detail, not something externally visible as API. Can have |
| cyclic dependencies between other components. Components roughly correspond to |
| "xcode project" or "B&I project" granularity at Apple. Can rebuild a "debug |
| version" of a subcomponent and drop it into an app without rebuilding the entire |
| world. |
| |
| **Source File**: Organizational unit within a component. |
| |
| In the trivial hello world example, the source file gets implicitly dropped into |
| a default component (since it doesn't have a component declaration). The default |
| component has settings that corresponds to an executable. As the app grows and |
| wants to start using sub-libraries, the author would have to know about |
| components. This ensures a simple model for new people, because they don't need |
| to know anything about components until they want to define a library and stable |
| APIs. |
| |
| We'll also eventually build tools to do things like: |
| |
| * Inspect and maintain dependence graphs between components and subcomponents. |
| |
| * Diff API [semantically, not "by symbol" like 'nm'] across versions of products |
| |
| * Provide code migration tools, like "rewrite rules" to update clients that use |
| obsoleted and removed API. |
| |
| * Pure swift apps won't be able to use SPI (they just won't build), but mixed |
| swift/C apps could (through the C parts, similar to using things like "extern |
| int Z3fooi(int)" to access C++ mangled symbols from C today). It will be |
| straight-forward to write a binary verifier that cross references the NM |
| output with the manifest file of the components it legitimately depends on. |
| |
| * Lots of other cool stuff I'm sure. |
| |
| Anyway, that's the high-level thoughts and motivation, this is what I'm |
| proposing: |
| |
| Program structure |
| ----------------- |
| |
| Programs and frameworks in swift consist of declarations (functions, variables, |
| types) that are (optionally) defined in possibly nested namespaces, which are |
| nested in a component, which are (optionally) split into |
| subcomponents. Components can also have associated resources like images and |
| plists, as well as code written in C/C++/ObjC. |
| |
| A "**Top Level Component**" (also referred to as "an ownership domain") is a |
| unit of code that is owned by a single organization and is updated (shipped to |
| customers) as a whole. Examples of different top-level components are products |
| like the swift standard libraries, Mac OS/X, iOS, Xcode, iWork, and even small |
| things like a theoretical third-party Perforce plugin to Xcode. |
| |
| Components are explicitly declared, and these declarations can include: |
| |
| * whether the component should be built into a dylib or executable, or is a |
| subcomponent. |
| |
| * the version of the component (which are used for "availability macros" etc) |
| |
| * an explicit list of dependencies on other top-level components (whose |
| dependence graph is required to be acyclic) optionally with specific versions: |
| "I depend on swift standard libs 1.4 or later" |
| |
| * a list of subcomponents that contribute to the component: "mac os consists of |
| appkit, coredata, ..." |
| |
| * a list of resource files and other stuff that makes up the framework |
| |
| * A list of subdirectories to get source files out of (see filesystem layout |
| below) if the component is more that one directory full of code. |
| |
| * A list of any .c/.m/.cpp files that are built and linked into the component, |
| along with build flags etc. |
| |
| Top-Level Components define the top level of the namespace stack. This means |
| everything in the swift libraries are "swift.array.xyz", everything in MacOS/X |
| is "macosx.whatever". Thus you can't have naming conflicts across components. |
| |
| **Namespaces** are for organization within a component, and are left up to the |
| developer to handle however they want. They will work similarly to C++ |
| namespaces and aren't described in detail here. For example, you could have a |
| macosx.coredata namespace that coredata drops all its stuff into. |
| |
| Components can optionally be broken into a set of "**Subcomponents**", which are |
| organizational units within a top-level component. Subcomponents exist to |
| support extremely large components that have multiple different teams |
| contributing to a single large product. Subcomponents are purely an |
| implementation detail of top-level components and have no runtime, |
| naming/namespace, or other externally visible artifacts that persist once the |
| entire domain is built. If version 1.0 of a domain is shipped, version 1.1 can |
| completely reshuffle the internal subcomponent organization without affecting |
| its published API or anything else a client can see. |
| |
| Subcomponents are explicitly declared, and these declarations can include: |
| |
| * The component they belong to. |
| |
| * The set of other (optionally versioned) top-level components they depend on. |
| |
| * The set of components (within the current top-level component) that this |
| subcomponent depends on. This dependence is an acyclic dependence: "core data |
| depends on foundation". |
| |
| * A list of declarations they use within the current top-level component that |
| aren't provided by the subcomponents they explicitly depend on. This is used |
| to handle cyclic dependencies across subcomponents within an ownership domain: |
| for example: "libsystem depends on libcompiler_rt", however, "libcompiler_rt |
| depends on 'func abort();' in libsystem". This preserves the acyclic |
| compilation order across components. |
| |
| * A list of subdirectories to get source files out of (see filesystem layout |
| below) if the component is more that one directory full of code. |
| |
| * A list of any .c/.m/.cpp files that are linked into the component, with build |
| flags. |
| |
| **Source Files** and **Resources** make up a component. Swift source files can |
| include: |
| |
| * The component they belong to. |
| |
| * Import declarations that affect their local scope lookups (similar to java |
| import statements) |
| |
| * A set of declarations of variables, functions, types etc. |
| |
| * C and other language files are just another kind of resource to be built. |
| |
| **Declarations** of variables, functions and types are the meat of the program, |
| and populate source files. Declarations can be scoped to be externally exported |
| from the component (aka API), internal to the component (aka SPI), local to a |
| subcomponent (aka "visibility hidden", the default), or local to the file (aka |
| static). Top-level components also have a simple runtime representation which is |
| used to ensure that reflection only returns API and decls within the current |
| ownership domain: "App's can't get at iOS SPI". |
| |
| **Executable expressions** can also be included at file scope (outside other |
| declarations). This global code is run at startup time (same as static |
| constructors), eliminating the need for "main". This initialization code is |
| correctly run bottom-up in the explicit dependence graph. Order of |
| initialization between multiple cyclicly dependent files within a single |
| component is not defined (and perhaps we can make it be an outright error). |
| |
| File system layout and compiler UI |
| ---------------------------------- |
| |
| The filesystem layout of a component is a directory with at least one .swift |
| file in it that has the same name as the directory. A common case is that the |
| component is a single directory with a bunch of .swift files and resources in |
| it. The "large component" case can break up its source files and resources into |
| subdirectories. |
| |
| Here is the minimal hello world example written as a proper app:: |
| |
| myapp/ |
| myapp.swift |
| |
| You'd compile it like this:: |
| |
| $ swift myapp |
| myapp compiled successfully! |
| |
| or:: |
| |
| $ cd myapp |
| $ swift |
| myapp compiled successfully! |
| |
| and it would produce this filesystem layout:: |
| |
| myapp/ |
| myapp.swift |
| products/ |
| myapp |
| myapp.manifest |
| buildcache/ |
| <stuff> |
| |
| Here is a moderately complicated example of a library:: |
| |
| mylib/ |
| mylib.swift |
| a.swift |
| b.swift |
| UserManual.html |
| subdir/ |
| c.swift |
| d.swift |
| e.png |
| |
| mylib.swift tells the compiler about your sub directories, resources, how to |
| process them, where to put them, etc. After compiling it you'd keep your source |
| files and get:: |
| |
| mylib/ |
| products/ |
| mylib.dylib |
| mylib.manifest |
| e.png |
| docs/ |
| UserManual.html |
| buildcache/ |
| <more stuff> |
| |
| Swift compiler command line is very simple: "swift mylib" is enough for most |
| uses. For more complex use cases we'll support specifying paths to search for |
| components (similar to clang -F or -L) etc. We'll also support a "clean" command |
| that nukes buildcache/ and products/. |
| |
| The BuildCache directory holds object files, dependence information and other |
| stuff needed for incremental [re]builds within the component. The generated |
| manifest file is used by the compiler when a client lib/app import mylib (it |
| contains type information for all the stuff exported from mylib) but also at |
| runtime by the runtime library (e.g. for reflection). It needs to be a |
| fast-to-read but extensible format. |
| |
| What the build system does, how it works |
| ---------------------------------------- |
| |
| Assuming that we're starting with an empty build cache, the build system starts |
| by parsing the mylib.swift file (the main file for the directory). This file |
| contains the component declaration. If this is a subcomponent, the subcomponent |
| declares which super-component it is in (in which case, the super-component info |
| is loaded). In either case, the compiler verifies that all of the depended-on |
| components are built, if not, it goes off and recursively builds them before |
| handling this one: the component dependence graph is acyclic, and cycles are |
| diagnosed here. |
| |
| If this directory is a subcomponent (as opposed to a top-level component), the |
| subcomponent declaration has already been read. If this subcomponent depends on |
| any other components that are not up-to-date, those are recursively |
| rebuilt. Explicit subcomponent dependencies are acyclic and cycles are diagnosed |
| here. Now all depended-on top-level components and subcomponents are built. |
| |
| Now the compiler parses each swift file into an AST. We'll keep the swift |
| grammar carefully factored to keep types and values distinct, so it is possible |
| to parse (but not fully typecheck) the files without first reading "all the |
| headers they depend on". This is important because we want to allow arbitrary |
| type and value cyclic dependencies between files in a component. As each file is |
| parsed, the compiler resolves as many intra-file references as it can, and ends |
| up with a list of (namespace qualified) types and values that are imported by |
| the file that are not satisfied by other components. This is the list of things |
| the file requires that some other files in the component provide. |
| |
| Now that the compiler has the full set of dependence information between files |
| in a component, it processes the files in strongly connected component (SCC) |
| order processing an SCC of dependent files at a time. Given the entire SCC it is |
| able to resolve values and types across the files (without needing prototypes) |
| and complete type checking. Assuming type checking is successful (no errors) it |
| generates code for each file in the SCC, emits a .o file for them, and emits |
| some extra metadata to accelerate incremental builds. If there are .c files in |
| the component, they are compiled to .o files now (they are also described in the |
| component declaration). |
| |
| Once all of the source files are compiled into .o files, they are linked into a |
| final linked image (dylib or executable). At this point, a couple of other |
| random things are done: 1) metadata is checked to ensure that any explicitly |
| declared cyclic dependencies match the given and actual prototype. 2) resources |
| are copied or processed into the product directory. 3) the explicit dependence |
| graph is verified, extraneous edges are warned about, missing edges are errors. |
| |
| In terms of implementation, this should be relatively straight-forward, and is |
| carefully layered to be memory efficient (e.g. only processing an SCC at a time |
| instead of an entire component) as well as highly parallel for multicore |
| machines. For incremental builds, we will have a huge win because the |
| fine-grained dependence information between .o files is tracked and we know |
| exactly what dependencies to rebuild if anything changes. The build cache will |
| accelerate most of this, which will eventually be a hybrid on-disk/in-memory |
| data structure. |
| |
| The build system should be scalable enough for B&I to eventually do a "swift |
| macos" and have it do a full incremental (and parallel) build of something the |
| scale of Mac OS. Actually implementing this will obviously be a big project that |
| can happen as the installed base of swift code grows. |
| |
| SDKs |
| ---- |
| |
| The manifest file generated as a build product describes (among other things) |
| the full list of decls exported by the top-level component (which includes their |
| type information, not just symbol names). This manifest file is used when a |
| client builds against the component to type check the client and ensure that its |
| references are resolved. |
| |
| Because we have the version number as well as the full interface to the |
| component available in a consumable format is that we can build an SDK generation |
| tool. This tool would take manifest files for a set of releases (e.g. iOS 4.0, |
| 4.0.1, 4.0.2, 4.1, 4.1.1, 4.2) and build a single SDK manifest which would have |
| a mapping from symbol+type -> version list that indicates what the versions a |
| given symbol are available in. This means that framework authors don't have to |
| worry about availability macros etc, it just naturally falls out of the system. |
| |
| This tool can also produce warnings/errors about cases where API is in version N |
| but removed in version N+1, or when some declaration has an invalid change |
| (e.g. an argument added or something else "fragile"). Blue sky idea: We could |
| conceivable extend it so that the SDK manifest file contains rewrite rules for |
| obsolete APIs that the compiler could automatically apply to upgrade user's |
| source code. |
| |
| Future optimization opportunities |
| --------------------------------- |
| |
| The system has been carefully designed to allow fast builds at -O0 (including |
| keeping cached dependence information and the compiler around in memory "across |
| builds"), allowing a very incremental compilation model and allowing carefully |
| limited/understood cyclic dependencies across components. However, we also care |
| about really fast runtime performance (better than our current system), and we |
| should be able to get that as well. |
| |
| There are several different possibilities to look at in the future: |
| |
| 1. Components are a natural unit to do "link time" optimization. Since the |
| entire thing is shipped as a unit, we know that it is safe to inline |
| functions and analyze side effects within the bounds of the component. This |
| current LTO model should scale to the component level, but we'd need new |
| (more scalable/parallel and memory efficient) approaches to optimize across |
| the entire mac os product. Processing components bottom-up within a large |
| component allows efficient context sensitive (and summary-based) analyzes, |
| like mod/ref, interprocedural constant prop, inlining, and nocapture |
| propagation. I expect nocapture to be specifically important to get stuff on |
| the stack instead of causing them to get promoted to the heap all the time. |
| |
| 2. The dyld shared cache can be seen as an optimization across components within |
| the mac os top-level component. Though it has the capability to include third |
| party and other dylibs, in practice it is rooted from a few key apps, so it |
| doesn't get "everything" in macos and it isn't used for other stuff (like |
| xcode). The proposed (but never implemented) "per-app shared cache" is a |
| straight-forward extension if this were based on optimizing across |
| components. |
| |
| 3. There are a bunch of optimizations to take advantage of known fragility |
| levels for devirtualization, inlining, and other stuff that I'm not going to |
| describe here. Generalization of DaveZ's positive/negative ivar/vtable idea. |
| |
| 4. The low level tools are already factored to be mostly object file format |
| independent. There is no reason that we need to keep using actual macho .o |
| files if it turns out to be inconvenient. We obviously must keep around macho |
| executables and dylibs. |