============================================================
Extending LLVM: Adding instructions, intrinsics, types, etc.
============================================================

Introduction and Warning
========================

During the course of using LLVM, you may wish to customize it for your research
project or for experimentation. At this point, you may realize that you need to
add something to LLVM, whether it be a new fundamental type, a new intrinsic
function, or a whole new instruction.

When you come to this realization, stop and think. Do you really need to extend
LLVM? Is it a new fundamental capability that LLVM does not support in its
current incarnation, or can it be synthesized from existing LLVM elements? If
you are not sure, ask on the `LLVM forums <https://discourse.llvm.org>`_.
Extending LLVM is an involved process: you need to update every pass that you
intend to use with your extension, and because there are *many* LLVM analyses
and transformations, this may be quite a bit of work.

Adding an `intrinsic function`_ is far easier than adding an
instruction, and is transparent to optimization passes.  If your added
functionality can be expressed as a function call, an intrinsic function is the
method of choice for LLVM extension.

Before you invest a significant amount of effort into a non-trivial extension,
**ask on the forums** whether what you are looking to do can be done with
already-existing infrastructure, or whether someone else is already working on
it. You will save yourself a lot of time and effort by doing so.

.. _intrinsic function:

Adding a new intrinsic function
===============================

Adding a new intrinsic function to LLVM is much easier than adding a new
instruction.  Almost all extensions to LLVM should start as an intrinsic
function and then be turned into an instruction if warranted.

#. ``llvm/docs/LangRef.rst``:

   Document the intrinsic.  Decide whether it is code generator specific and
   what the restrictions are.  Talk to other people about it so that you are
   sure it's a good idea.

#. ``llvm/include/llvm/IR/Intrinsics*.td``:

   Add an entry for your intrinsic.  Describe its memory access
   characteristics for optimization (this controls whether it will be
   DCE'd, CSE'd, etc.). If any arguments need to be immediates, these
   must be indicated with the ``ImmArg`` property. Note that TableGen treats
   any intrinsic that uses one of the ``llvm_any*_ty`` types for an argument
   or return type as overloaded, and the corresponding type suffix will be
   required on the intrinsic's name (for example, ``llvm.ctpop.i32``).

#. ``llvm/lib/Analysis/ConstantFolding.cpp``:

   If it is possible to constant fold your intrinsic, add support for it in the
   ``canConstantFoldCallTo`` and ``ConstantFoldCall`` functions.

#. ``llvm/test/*``:

   Add test cases for your intrinsic to the test suite.
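
The constant-folding step uses two functions. ``canConstantFoldCallTo`` is a cheap "can this ever fold?" query, and ``ConstantFoldCall`` does the actual evaluation. The self-contained toy below illustrates that split. It does not use LLVM headers: ``ToyIntrinsic``, ``canFoldToyCall``, and ``foldToyCall`` are hypothetical names, and the arguments are plain ``uint32_t`` values rather than ``llvm::Constant`` objects.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical intrinsic IDs; in LLVM these would come from the
// TableGen-generated Intrinsic::ID enumeration.
enum class ToyIntrinsic { Bswap32, Ctpop32, OpaqueCall };

// Cheap query, analogous in spirit to canConstantFoldCallTo: can a call to
// this intrinsic ever be folded when all of its arguments are constants?
bool canFoldToyCall(ToyIntrinsic id) {
  switch (id) {
  case ToyIntrinsic::Bswap32:
  case ToyIntrinsic::Ctpop32:
    return true;
  case ToyIntrinsic::OpaqueCall:
    return false;  // e.g. has side effects, so it must never be folded
  }
  return false;
}

// Actual evaluation, analogous in spirit to ConstantFoldCall: returns the
// folded value, or std::nullopt if this call cannot be folded.
std::optional<uint32_t> foldToyCall(ToyIntrinsic id,
                                    const std::vector<uint32_t> &args) {
  if (!canFoldToyCall(id))
    return std::nullopt;
  // Both foldable toy intrinsics are unary; a well-formed call has one operand.
  assert(args.size() == 1 && "malformed toy intrinsic call");
  uint32_t x = args[0];
  switch (id) {
  case ToyIntrinsic::Bswap32:
    return ((x & 0x000000FFu) << 24) | ((x & 0x0000FF00u) << 8) |
           ((x & 0x00FF0000u) >> 8) | ((x & 0xFF000000u) >> 24);
  case ToyIntrinsic::Ctpop32: {
    uint32_t count = 0;
    for (; x != 0; x &= x - 1)  // clear the lowest set bit each iteration
      ++count;
    return count;
  }
  default:
    return std::nullopt;
  }
}
```

Keeping the query separate lets a caller reject unfoldable calls before it inspects any operands.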

Once the intrinsic has been added to the system, you must add code generator
support for it.  Generally you must do the following:

Add support to the .td file for the target(s) of your choice in
``lib/Target/*/*.td``.

  This is usually a matter of adding a pattern to the .td file that matches the
  intrinsic, though it may also require adding the instructions you want to
  generate.  There are lots of examples to follow in the PowerPC and X86
  backends.

Adding a new SelectionDAG node
==============================

As with intrinsics, adding a new SelectionDAG node to LLVM is much easier than
adding a new instruction.  New nodes are often added to help represent
instructions common to many targets.  These nodes often map to an LLVM
instruction (add, sub) or intrinsic (byteswap, population count).  In other
cases, new nodes have been added to allow many targets to perform a common task
(converting between floating-point and integer representations) or to capture
more complicated behavior in a single node (rotate).

#. ``include/llvm/CodeGen/ISDOpcodes.h``:

   Add an enum value for the new SelectionDAG node.

#. ``lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp``:

   Add code to ``getOperationName`` to print the node.  If your new node can
   be evaluated at compile time when given constant arguments (such as an add
   of a constant with another constant), find the ``getNode`` method in
   ``lib/CodeGen/SelectionDAG/SelectionDAG.cpp`` that takes the appropriate
   number of arguments, and add a case for your node to the switch statement
   that performs constant folding for nodes that take the same number of
   arguments as your new node.

#. ``lib/CodeGen/SelectionDAG/LegalizeDAG.cpp``:

   Add code to `legalize, promote, and expand
   <CodeGenerator.html#selectiondag_legalize>`_ the node as necessary.  At a
   minimum, you will need to add a case statement for your node in
   ``LegalizeOp`` which calls ``LegalizeOp`` on the node's operands, and returns
   a new node if any of the operands changed as a result of being legalized.  It
   is likely that not all targets supported by the SelectionDAG framework will
   natively support the new node.  In this case, you must also add code in your
   node's case statement in ``LegalizeOp`` to Expand your node into simpler,
   legal operations.  The case for ``ISD::UREM``, which expands a remainder into
   a divide, a multiply, and a subtract, is a good example.

#. ``lib/CodeGen/SelectionDAG/LegalizeDAG.cpp``:

   If targets may support the new node being added only at certain sizes, you
   will also need to add code to your node's case statement in ``LegalizeOp``
   to Promote your node's operands to a larger size and perform the correct
   operation.  You will also need to add code to ``PromoteOp`` to do this.  For
   a good example, see ``ISD::BSWAP``, which promotes its operand to a wider
   size, performs the byteswap, and then shifts the correct bytes right to
   emulate the narrower byteswap in the wider type.

#. ``lib/CodeGen/SelectionDAG/LegalizeDAG.cpp``:

   Add a case for your node in ``ExpandOp`` to teach the legalizer how to
   perform the action represented by the new node on a value that has been split
   into high and low halves.  This case will be used to support your node with a
   64-bit operand on a 32-bit target.

#. ``lib/CodeGen/SelectionDAG/DAGCombiner.cpp``:

   If your node can be combined with itself, or other existing nodes in a
   peephole-like fashion, add a visit function for it, and call that function
   from ``DAGCombiner::visit``.  There are several good examples for simple
   combines you can do; ``visitFABS`` and ``visitSRL`` are good starting places.

#. ``lib/Target/PowerPC/PPCISelLowering.cpp``:

   Each target has an implementation of the ``TargetLowering`` class, usually in
   its own file (although some targets include it in the same file as the
   DAGToDAGISel).  The default behavior for a target is to assume that your new
   node is legal for all types that are legal for that target.  If this target
   does not natively support your node, then tell the target to either Promote
   it (if it is supported at a larger type) or Expand it, using
   ``setOperationAction``.  This will cause the code you wrote in
   ``LegalizeOp`` above to decompose your new node into other legal nodes for
   this target.

#. ``include/llvm/Target/TargetSelectionDAG.td``:

   Most current targets supported by LLVM generate code using the DAGToDAG
   method, where SelectionDAG nodes are pattern matched to target-specific
   nodes, which represent individual instructions.  In order for the targets to
   match an instruction to your new node, you must add a def for that node to
   the list in this file, with the appropriate type constraints. Look at
   ``add``, ``bswap``, and ``fadd`` for examples.

#. ``lib/Target/PowerPC/PPCInstrInfo.td``:

   Each target has a tablegen file that describes the target's instruction set.
   For targets that use the DAGToDAG instruction selection framework, add a
   pattern for your new node that uses one or more target nodes.  Documentation
   for this is a bit sparse right now, but there are several decent examples.
   See the patterns for ``rotl`` in ``PPCInstrInfo.td``.

#. TODO: document complex patterns.

#. ``llvm/test/CodeGen/*``:

   Add test cases for your new node to the test suite.
   ``llvm/test/CodeGen/X86/bswap.ll`` is a good example.
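
The legalization strategies described above are Expand, Promote, and splitting a value into high and low halves for ``ExpandOp``. Each one rests on an arithmetic identity that can be checked on plain integers. The sketch below is a self-contained C++ model, not ``LegalizeDAG.cpp`` code. The function names are hypothetical, and each function computes on scalars what the legalizer would emit as a sequence of SelectionDAG nodes.

```cpp
#include <cassert>
#include <cstdint>

// Expand: a target without a remainder instruction can compute
// UREM(a, b) as a - (a / b) * b, i.e. a divide, a multiply, and a subtract.
uint32_t expandURem(uint32_t a, uint32_t b) {
  assert(b != 0 && "remainder by zero is undefined");
  uint32_t quotient = a / b;        // ISD::UDIV
  uint32_t product = quotient * b;  // ISD::MUL
  return a - product;               // ISD::SUB
}

// A 32-bit byteswap, standing in for a target's native 32-bit BSWAP.
uint32_t bswap32(uint32_t x) {
  return (x << 24) | ((x & 0xFF00u) << 8) | ((x >> 8) & 0xFF00u) | (x >> 24);
}

// Promote: a 16-bit BSWAP on a target that can only swap 32-bit values.
// Widen, swap in the wide type (the two interesting bytes land in the high
// half), then shift them back down and truncate.
uint16_t promoteBswap16(uint16_t x) {
  uint32_t wide = static_cast<uint32_t>(x);     // widen the operand
  uint32_t swapped = bswap32(wide);             // BSWAP in the wider type
  return static_cast<uint16_t>(swapped >> 16);  // ISD::SRL, then truncate
}

// Split into halves: a 64-bit BSWAP on a 32-bit target swaps each half
// and exchanges the halves.
uint64_t expandBswap64(uint64_t x) {
  uint32_t lo = static_cast<uint32_t>(x);
  uint32_t hi = static_cast<uint32_t>(x >> 32);
  return (static_cast<uint64_t>(bswap32(lo)) << 32) | bswap32(hi);
}
```

Each line in these functions corresponds to a node the legalizer would create. On a real target, the legalizer builds those nodes rather than computing values.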

Adding a new instruction
========================

.. warning::

  Adding instructions changes the bitcode format, and it will take some effort
  to maintain compatibility with the previous version. Only add an instruction
  if it is absolutely necessary.

#. ``llvm/include/llvm/IR/Instruction.def``:

   add a number for your instruction and an enum name

#. ``llvm/include/llvm/IR/Instructions.h``:

   add a definition for the class that will represent your instruction

#. ``llvm/include/llvm/IR/InstVisitor.h``:

   add a prototype for a visitor to your new instruction type

#. ``llvm/lib/AsmParser/LLLexer.cpp``:

   add a new token to parse your instruction from textual assembly files

#. ``llvm/lib/AsmParser/LLParser.cpp``:

   add the grammar describing how your instruction can be read and what it will
   construct as a result

#. ``llvm/lib/Bitcode/Reader/BitcodeReader.cpp``:

   add a case for your instruction and how it will be parsed from bitcode

#. ``llvm/lib/Bitcode/Writer/BitcodeWriter.cpp``:

   add a case for your instruction and how it will be written to bitcode

#. ``llvm/lib/IR/Instruction.cpp``:

   add a case to ``Instruction::getOpcodeName`` for how your instruction will be
   printed out to assembly

#. ``llvm/lib/IR/Instructions.cpp``:

   implement the class you defined in ``llvm/include/llvm/IR/Instructions.h``

#. Test your instruction

#. ``llvm/lib/Target/*``:

   add support for your instruction to code generators, or add a lowering pass.

#. ``llvm/test/*``:

   add your test cases to the test suite.

Also, you need to implement (or modify) any analyses or passes that you want to
understand this new instruction.
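
``Instruction.def`` follows the "X-macro" pattern. A single list of instructions is expanded several times with different macro definitions. That produces, for example, the opcode enumeration and the dispatch switch in ``InstVisitor.h``, so one new entry propagates everywhere. The self-contained sketch below imitates the pattern for a hypothetical instruction set. ``TOY_INSTRUCTIONS``, ``ToyOpcode``, ``ToyVisitor``, and ``SwizzleAwarePass`` are illustrative names. The real ``.def`` file is included repeatedly with different macros defined; here the per-entry macro is passed as an argument so the example fits in one file.

```cpp
#include <cassert>
#include <string>

// The single list of instructions: (number, opcode name). "Swizzle" plays
// the role of the newly added instruction.
#define TOY_INSTRUCTIONS(X) \
  X(1, Add)                 \
  X(2, Mul)                 \
  X(3, Swizzle)

// Expansion 1: the opcode enumeration.
enum class ToyOpcode {
#define TOY_ENUM(NUM, NAME) NAME = NUM,
  TOY_INSTRUCTIONS(TOY_ENUM)
#undef TOY_ENUM
};

struct ToyVisitor {
  virtual ~ToyVisitor() = default;

  // Expansion 2: one overridable hook per opcode, with a default fallback.
#define TOY_VISIT_HOOK(NUM, NAME) \
  virtual std::string visit##NAME() { return "unhandled " #NAME; }
  TOY_INSTRUCTIONS(TOY_VISIT_HOOK)
#undef TOY_VISIT_HOOK

  // Expansion 3: dispatch from a runtime opcode to the matching hook.
  std::string visit(ToyOpcode op) {
    switch (op) {
#define TOY_DISPATCH(NUM, NAME) \
  case ToyOpcode::NAME:         \
    return visit##NAME();
      TOY_INSTRUCTIONS(TOY_DISPATCH)
#undef TOY_DISPATCH
    }
    assert(false && "unknown opcode");
    return {};
  }
};

// A pass that understands the new instruction overrides only its hook.
struct SwizzleAwarePass : ToyVisitor {
  std::string visitSwizzle() override { return "handled Swizzle"; }
};
```

Adding one line to the list updates every expansion at once, and passes that never override the new hook fall back to the default. In ``InstVisitor``, the default hooks delegate to the visitor for the instruction's class, and eventually to ``visitInstruction``, rather than returning a string.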

Adding a new type
=================

.. warning::

  Adding new types changes the bitcode format, and will break compatibility with
  currently existing LLVM installations. Only add new types if it is absolutely
  necessary.

Adding a fundamental type
-------------------------

#. ``llvm/include/llvm/IR/Type.h``:

   add a ``TypeID`` enumerator for the new type; add a static ``Type*`` for
   this type

#. ``llvm/lib/IR/Type.cpp`` and ``llvm/lib/CodeGen/ValueTypes.cpp``:

   add mapping from ``TypeID`` => ``Type*``; initialize the static ``Type*``

#. ``llvm/include/llvm-c/Core.h`` and ``llvm/lib/IR/Core.cpp``:

   add an ``LLVMTypeKind`` enumerator and modify
   ``LLVMTypeKind LLVMGetTypeKind(LLVMTypeRef Ty)`` for the new type

#. ``llvm/lib/AsmParser/LLLexer.cpp``:

   add a keyword so that the type can be lexed from textual assembly

#. ``llvm/lib/AsmParser/LLParser.cpp``:

   add the ability to parse the type's token from textual assembly

#. ``llvm/lib/Bitcode/Writer/BitcodeWriter.cpp``:

   modify ``void ModuleBitcodeWriter::writeTypeTable()`` to serialize your type

#. ``llvm/lib/Bitcode/Reader/BitcodeReader.cpp``:

   modify ``Error BitcodeReader::parseTypeTableBody()`` to read your data type

#. ``llvm/include/llvm/Bitcode/LLVMBitCodes.h``:

   add a ``TypeCodes`` enumerator for the new type
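
The writer and the reader must agree on every type code. For that reason, the two bitcode steps and the new ``TypeCodes`` enumerator are best treated as one change. The self-contained sketch below models a type table as a list of records, each a code followed by operands. The ``ToyTypeCode`` values, including ``TOY_TYPE_CODE_MYFLOAT`` for the new type, are hypothetical and do not match the numbers in ``LLVMBitCodes.h``. The sketch also illustrates the compatibility warning above: a reader built before the new code existed has no way to decode it.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical type codes. Codes should only ever be appended: renumbering an
// existing value would change the meaning of previously written files.
enum ToyTypeCode : uint64_t {
  TOY_TYPE_CODE_VOID = 1,
  TOY_TYPE_CODE_FLOAT = 2,
  TOY_TYPE_CODE_INTEGER = 3,  // record carries one operand: the bit width
  TOY_TYPE_CODE_MYFLOAT = 4,  // the newly added fundamental type
};

enum class ToyTypeID { Void, Float, Integer, MyFloat };

struct ToyType {
  ToyTypeID id;
  unsigned bitWidth = 0;  // only meaningful for Integer
};

// Writer side: emit one record (code followed by operands) per type.
std::vector<uint64_t> writeTypeRecord(const ToyType &ty) {
  switch (ty.id) {
  case ToyTypeID::Void:    return {TOY_TYPE_CODE_VOID};
  case ToyTypeID::Float:   return {TOY_TYPE_CODE_FLOAT};
  case ToyTypeID::Integer: return {TOY_TYPE_CODE_INTEGER, ty.bitWidth};
  case ToyTypeID::MyFloat: return {TOY_TYPE_CODE_MYFLOAT};
  }
  assert(false && "unhandled type in writer");
  return {};
}

// Reader side. 'knowsMyFloat' simulates an older reader built before the new
// code existed; it rejects the record instead of guessing.
std::optional<ToyType> readTypeRecord(const std::vector<uint64_t> &record,
                                      bool knowsMyFloat) {
  if (record.empty())
    return std::nullopt;
  switch (record[0]) {
  case TOY_TYPE_CODE_VOID:  return ToyType{ToyTypeID::Void};
  case TOY_TYPE_CODE_FLOAT: return ToyType{ToyTypeID::Float};
  case TOY_TYPE_CODE_INTEGER:
    if (record.size() < 2)
      return std::nullopt;  // malformed: missing bit width
    return ToyType{ToyTypeID::Integer, static_cast<unsigned>(record[1])};
  case TOY_TYPE_CODE_MYFLOAT:
    if (!knowsMyFloat)
      return std::nullopt;
    return ToyType{ToyTypeID::MyFloat};
  default:
    return std::nullopt;  // unknown code: malformed or too-new bitcode
  }
}
```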

Adding a derived type
---------------------

#. ``llvm/include/llvm/IR/Type.h``:

   add a ``TypeID`` enumerator for the new type; also add a forward declaration
   of the type

#. ``llvm/include/llvm/IR/DerivedTypes.h``:

   add a new class to represent the new type in the hierarchy; add a forward
   declaration to the TypeMap value type

#. ``llvm/lib/IR/Type.cpp`` and ``llvm/lib/CodeGen/ValueTypes.cpp``:

   add support for the derived type, notably ``enum TypeID`` and the ``is`` and
   ``get`` methods

#. ``llvm/include/llvm-c/Core.h`` and ``llvm/lib/IR/Core.cpp``:

   add an ``LLVMTypeKind`` enumerator and modify
   ``LLVMTypeKind LLVMGetTypeKind(LLVMTypeRef Ty)`` for the new type

#. ``llvm/lib/AsmParser/LLLexer.cpp``:

   modify ``lltok::Kind LLLexer::LexIdentifier()`` to add the ability to
   parse the type from textual assembly

#. ``llvm/lib/Bitcode/Writer/BitcodeWriter.cpp``:

   modify ``void ModuleBitcodeWriter::writeTypeTable()`` to serialize your type

#. ``llvm/lib/Bitcode/Reader/BitcodeReader.cpp``:

   modify ``Error BitcodeReader::parseTypeTableBody()`` to read your data type

#. ``llvm/include/llvm/Bitcode/LLVMBitCodes.h``:

   add a ``TypeCodes`` enumerator for the new type

#. ``llvm/lib/IR/AsmWriter.cpp``:

   modify ``void TypePrinting::print(Type *Ty, raw_ostream &OS)``
   to output the new derived type
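
Most of the derived-type steps wire a new class into machinery that already exists. That means a ``TypeID`` value, a ``classof`` hook so that LLVM-style ``isa<>``/``dyn_cast<>`` checks work, a uniquing ``get`` method, and a printer case. The sketch below reproduces that shape without LLVM headers. ``ToyPairType`` is a hypothetical derived type holding two element types, and ``ToyContext`` stands in for ``LLVMContext``. The small ``isa``/``dyn_cast`` templates only mimic ``llvm/Support/Casting.h``.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Base class with a TypeID discriminator; PairTyID is the newly added entry.
class ToyType {
public:
  enum TypeID { IntegerTyID, PairTyID };
  explicit ToyType(TypeID id) : id(id) {}
  virtual ~ToyType() = default;
  TypeID getTypeID() const { return id; }

private:
  TypeID id;
};

struct ToyIntegerType : ToyType {
  explicit ToyIntegerType(unsigned bits) : ToyType(IntegerTyID), bits(bits) {}
  static bool classof(const ToyType *t) { return t->getTypeID() == IntegerTyID; }
  unsigned bits;
};

// The new derived type: a pair of element types. Clients should obtain
// instances through ToyContext::getPair so that equal pairs are uniqued.
struct ToyPairType : ToyType {
  ToyPairType(ToyType *first, ToyType *second)
      : ToyType(PairTyID), first(first), second(second) {}
  static bool classof(const ToyType *t) { return t->getTypeID() == PairTyID; }
  ToyType *first;
  ToyType *second;
};

// Minimal stand-ins for isa<> / dyn_cast<>, driven by each class's classof.
template <typename To> bool isa(const ToyType *t) { return To::classof(t); }
template <typename To> const To *dyn_cast(const ToyType *t) {
  return isa<To>(t) ? static_cast<const To *>(t) : nullptr;
}

// Owns and uniques types, standing in for LLVMContext.
class ToyContext {
public:
  ToyIntegerType *getInt(unsigned bits) {
    std::unique_ptr<ToyIntegerType> &slot = ints[bits];
    if (!slot)
      slot = std::make_unique<ToyIntegerType>(bits);
    return slot.get();
  }
  ToyPairType *getPair(ToyType *first, ToyType *second) {
    std::unique_ptr<ToyPairType> &slot = pairs[{first, second}];
    if (!slot)
      slot = std::make_unique<ToyPairType>(first, second);
    return slot.get();
  }

private:
  using Key = std::pair<ToyType *, ToyType *>;
  // std::less provides a guaranteed total order on pointers.
  struct KeyLess {
    bool operator()(const Key &l, const Key &r) const {
      std::less<ToyType *> lt;
      if (l.first != r.first)
        return lt(l.first, r.first);
      return lt(l.second, r.second);
    }
  };
  std::map<unsigned, std::unique_ptr<ToyIntegerType>> ints;
  std::map<Key, std::unique_ptr<ToyPairType>, KeyLess> pairs;
};

// Printer case for the new type, analogous in spirit to TypePrinting::print.
std::string printType(const ToyType *ty) {
  if (const auto *i = dyn_cast<ToyIntegerType>(ty))
    return "i" + std::to_string(i->bits);
  if (const auto *p = dyn_cast<ToyPairType>(ty))
    return "pair<" + printType(p->first) + ", " + printType(p->second) + ">";
  assert(false && "unhandled type in printer");
  return "<unknown>";
}
```

Because ``getPair`` uniques by structure, two requests for the same pair return the same pointer. This mirrors why most LLVM types can be compared by pointer.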