===============================
ORC Design and Implementation
===============================

Introduction
============

This document aims to provide a high-level overview of the design and
implementation of the ORC JIT APIs. Except where otherwise stated, all
discussion applies to the design of the APIs as of LLVM version 9 (ORCv2).

.. contents::
   :local:

Use-cases
=========

ORC provides a modular API for building JIT compilers. There is a range
of use cases for such an API:

1. The LLVM tutorials use a simple ORC-based JIT class to execute expressions
   compiled from a toy language: Kaleidoscope.

2. The LLVM debugger, LLDB, uses a cross-compiling JIT for expression
   evaluation. In this use case, cross compilation allows expressions compiled
   in the debugger process to be executed on the debug target process, which
   may be on a different device/architecture.

3. High-performance JITs (e.g. JVMs, Julia) that want to make use of LLVM's
   optimizations within an existing JIT infrastructure.

4. Interpreters and REPLs, e.g. Cling (C++) and the Swift interpreter.

By adopting a modular, library-based design we aim to make ORC useful in as
many of these contexts as possible.

Features
========

ORC provides the following features:

- *JIT-linking* links relocatable object files (COFF, ELF, MachO) [1]_ into a
  target process at runtime. The target process may be the same process that
  contains the JIT session object and jit-linker, or may be another process
  (even one running on a different machine or architecture) that communicates
  with the JIT via RPC.

- *LLVM IR compilation*, which is provided by off-the-shelf components
  (IRCompileLayer, SimpleCompiler, ConcurrentIRCompiler) that make it easy to
  add LLVM IR to a JIT'd process.

- *Eager and lazy compilation*. By default, ORC will compile symbols as soon
  as they are looked up in the JIT session object (``ExecutionSession``).
  Compiling eagerly by default makes it easy to use ORC as a simple in-memory
  compiler for an existing JIT. ORC also provides a simple mechanism,
  lazy-reexports, for deferring compilation until first call.

- *Support for custom compilers and program representations*. Clients can
  supply custom compilers for each symbol that they define in their JIT
  session. ORC will run the user-supplied compiler when a definition of a
  symbol is needed. ORC is actually fully language agnostic: LLVM IR is not
  treated specially, and is supported via the same wrapper mechanism (the
  ``MaterializationUnit`` class) that is used for custom compilers.

- *Concurrent JIT'd code* and *concurrent compilation*. JIT'd code may spawn
  multiple threads, and may re-enter the JIT (e.g. for lazy compilation)
  concurrently from multiple threads. The ORC APIs also support running
  multiple compilers concurrently, and provide off-the-shelf infrastructure to
  track dependencies on running compiles (e.g. to ensure that we never call
  into code until it is safe to do so, even if that involves waiting on
  multiple compiles).

- *Orthogonality* and *composability*: Each of the features above can be used
  (or not) independently. It is possible to put ORC components together to
  make a non-lazy, in-process, single-threaded JIT or a lazy, out-of-process,
  concurrent JIT, or anything in between.

LLJIT and LLLazyJIT
===================

ORC provides two basic JIT classes off-the-shelf. These are useful both as
examples of how to assemble ORC components to make a JIT, and as replacements
for earlier LLVM JIT APIs (e.g. MCJIT).

The LLJIT class uses an IRCompileLayer and RTDyldObjectLinkingLayer to support
compilation of LLVM IR and linking of relocatable object files. All operations
are performed eagerly on symbol lookup (i.e. a symbol's definition is compiled
as soon as you attempt to look up its address). LLJIT is a suitable
replacement for MCJIT in most cases (note: some more advanced features, e.g.
JITEventListeners, are not supported yet).

The LLLazyJIT class extends LLJIT and adds a CompileOnDemandLayer to enable
lazy compilation of LLVM IR. When an LLVM IR module is added via the
addLazyIRModule method, function bodies in that module will not be compiled
until they are first called. LLLazyJIT aims to provide a replacement for
LLVM's original (pre-MCJIT) JIT API.

LLJIT and LLLazyJIT instances can be created using their respective builder
classes: LLJITBuilder and LLLazyJITBuilder. For example, assuming you have a
module ``M`` loaded onto a ThreadSafeContext ``Ctx``:
.. code-block:: c++

  // Try to detect the host arch and construct an LLJIT instance.
  auto JIT = LLJITBuilder().create();

  // If we could not construct an instance, return an error.
  if (!JIT)
    return JIT.takeError();

  // Add the module.
  if (auto Err = JIT->addIRModule(ThreadSafeModule(std::move(M), Ctx)))
    return Err;

  // Look up the JIT'd code entry point.
  auto EntrySym = JIT->lookup("entry");
  if (!EntrySym)
    return EntrySym.takeError();

  // Cast the entry point address to a function pointer and call it.
  auto *Entry = (void(*)())EntrySym->getAddress();

  Entry();

The builder classes provide a number of configuration options that can be
specified before the JIT instance is constructed. For example:

.. code-block:: c++

  // Build an LLLazyJIT instance that uses four worker threads for
  // compilation, and jumps to a specific error handler (rather than null)
  // on lazy compile failures.

  void handleLazyCompileFailure() {
    // JIT'd code will jump here if lazy compilation fails, giving us an
    // opportunity to exit or throw an exception into JIT'd code.
    throw JITFailed();
  }

  auto JIT = LLLazyJITBuilder()
               .setNumCompileThreads(4)
               .setLazyCompileFailureAddr(
                   toJITTargetAddress(&handleLazyCompileFailure))
               .create();

  // ...

For users wanting to get started with LLJIT, a minimal example program can be
found at ``llvm/examples/HowToUseLLJIT``.

Design Overview
===============

ORC's JIT'd program model aims to emulate the linking and symbol resolution
rules used by the static and dynamic linkers. This allows ORC to JIT
arbitrary LLVM IR, including IR produced by an ordinary static compiler (e.g.
clang) that uses constructs like symbol linkage and visibility, and weak and
common symbol definitions.

To see how this works, imagine a program ``myapp`` which links against a pair
of dynamic libraries: ``libA`` and ``libB``. On the command line, building
this system might look like:

.. code-block:: bash

  $ clang++ -shared -o libA.dylib a1.cpp a2.cpp
  $ clang++ -shared -o libB.dylib b1.cpp b2.cpp
  $ clang++ -o myapp myapp.cpp -L. -lA -lB
  $ ./myapp

In ORC, this would translate into API calls on a hypothetical
"CXXCompilingLayer" (with error checking omitted for brevity) as:

.. code-block:: c++

  ExecutionSession ES;
  RTDyldObjectLinkingLayer ObjLinkingLayer(
      ES, []() { return llvm::make_unique<SectionMemoryManager>(); });
  CXXCompileLayer CXXLayer(ES, ObjLinkingLayer);

  // Create JITDylib "A" and add code to it using the CXX layer.
  auto &LibA = ES.createJITDylib("A");
  CXXLayer.add(LibA, MemoryBuffer::getFile("a1.cpp"));
  CXXLayer.add(LibA, MemoryBuffer::getFile("a2.cpp"));

  // Create JITDylib "B" and add code to it using the CXX layer.
  auto &LibB = ES.createJITDylib("B");
  CXXLayer.add(LibB, MemoryBuffer::getFile("b1.cpp"));
  CXXLayer.add(LibB, MemoryBuffer::getFile("b2.cpp"));

  // Specify the search order for the main JITDylib. This is equivalent to a
  // "links against" relationship in a command-line link.
  ES.getMainJITDylib().setSearchOrder({{&LibA, false}, {&LibB, false}});
  CXXLayer.add(ES.getMainJITDylib(), MemoryBuffer::getFile("myapp.cpp"));

  // Look up the JIT'd main, cast it to a function pointer, then call it.
  auto MainSym = ExitOnErr(ES.lookup({&ES.getMainJITDylib()}, "main"));
  auto *Main = (int(*)(int, char*[]))MainSym.getAddress();

  int Result = Main(...);

This example tells us nothing about *how* or *when* compilation will happen.
That will depend on the implementation of the hypothetical CXXCompilingLayer,
but the linking rules will be the same regardless. For example, if a1.cpp and
a2.cpp both define a function "foo" then the API should generate a duplicate
definition error. On the other hand, if a1.cpp and b1.cpp both define "foo"
there is no error (different dynamic libraries may define the same symbol). If
myapp.cpp refers to "foo", it should bind to the definition in LibA rather
than the one in LibB, since myapp.cpp is part of the "main" dylib, and the
main dylib links against LibA before LibB.

Many JIT clients will have no need for this strict adherence to the usual
ahead-of-time linking rules, and should be able to get by just fine by putting
all of their code in a single JITDylib. However, clients who want to JIT code
for languages/projects that traditionally rely on ahead-of-time linking (e.g.
C++) will find that this feature makes life much easier.

Symbol lookup in ORC serves two other important functions, beyond basic
lookup: (1) It triggers compilation of the symbol(s) searched for, and (2) it
provides the synchronization mechanism for concurrent compilation. The
pseudo-code for the lookup process is:

.. code-block:: none

  construct a query object from a query set and query handler
  lock the session
  lodge query against requested symbols, collect required materializers (if any)
  unlock the session
  dispatch materializers (if any)

In this context a materializer is something that provides a working definition
of a symbol upon request. Generally materializers wrap compilers, but they may
also wrap a linker directly (if the program representation backing the
definitions is an object file), or even just a class that writes bits directly
into memory (if the definitions are stubs). Materialization is the blanket
term for any action (compiling, linking, splatting bits, registering with
runtimes, etc.) that is required to generate a symbol definition that is safe
to call or access.

As each materializer completes its work it notifies the JITDylib, which in
turn notifies any query objects that are waiting on the newly materialized
definitions. Each query object maintains a count of the number of symbols that
it is still waiting on, and once this count reaches zero the query object
calls the query handler with a *SymbolMap* (a map of symbol names to
addresses) describing the result. If any symbol fails to materialize, the
query immediately calls the query handler with an error.

The collected materialization units are sent to the ExecutionSession to be
dispatched, and the dispatch behavior can be set by the client. By default
each materializer is run on the calling thread. Clients are free to create new
threads to run materializers, or to send the work to a work queue for a thread
pool (this is what LLJIT/LLLazyJIT do).

Top Level APIs
==============

Many of ORC's top-level APIs are visible in the example above:

- *ExecutionSession* represents the JIT'd program and provides context for the
  JIT: It contains the JITDylibs, error reporting mechanisms, and dispatches
  the materializers.

- *JITDylibs* provide the symbol tables.

- *Layers* (ObjLinkingLayer and CXXLayer) are wrappers around compilers and
  allow clients to add uncompiled program representations supported by those
  compilers to JITDylibs.

Several other important APIs are used implicitly. JIT clients need not be
aware of them, but Layer authors will use them:

- *MaterializationUnit* - When XXXLayer::add is invoked it wraps the given
  program representation (in this example, C++ source) in a
  MaterializationUnit, which is then stored in the JITDylib.
  MaterializationUnits are responsible for describing the definitions they
  provide, and for unwrapping the program representation and passing it back
  to the layer when compilation is required (this ownership shuffle makes
  writing thread-safe layers easier, since the ownership of the program
  representation will be passed back on the stack, rather than having to be
  fished out of a Layer member, which would require synchronization).

- *MaterializationResponsibility* - When a MaterializationUnit hands a program
  representation back to the layer it comes with an associated
  MaterializationResponsibility object. This object tracks the definitions
  that must be materialized and provides a way to notify the JITDylib once
  they are either successfully materialized or a failure occurs.

Handy utilities
===============

TBD: absolute symbols, aliases, off-the-shelf layers.

Laziness
========

Laziness in ORC is provided by a utility called "lazy-reexports". The aim of
this utility is to re-use the synchronization provided by the symbol lookup
mechanism to make it safe to lazily compile functions, even if calls to the
stub occur simultaneously on multiple threads of JIT'd code. It does this by
reducing lazy compilation to symbol lookup: The lazy stub performs a lookup of
its underlying definition on first call, updating the function body pointer
once the definition is available. If additional calls arrive on other threads
while compilation is ongoing they will be safely blocked by the normal lookup
synchronization guarantee (no result until the result is safe), and can
proceed as soon as compilation completes.

TBD: Usage example.

Supporting Custom Compilers
===========================

TBD.

Low Level (MCJIT style) Use
===========================

TBD.

Future Features
===============

TBD: Speculative compilation. Object Caches.

.. [1] Formats/architectures vary in terms of supported features. MachO and
   ELF tend to have better support than COFF. Patches very welcome!