|  | ==================== | 
|  | Clang Linker Wrapper | 
|  | ==================== | 
|  |  | 
|  | .. contents:: | 
|  | :local: | 
|  |  | 
|  | .. _clang-linker-wrapper: | 
|  |  | 
|  | Introduction | 
|  | ============ | 
|  |  | 
|  | This tool wraps the normal host linking job. It creates linked device images |
|  | for offloading and emits the runtime calls needed to register them. It first |
|  | scans the linker's input for embedded device offloading data stored in the |
|  | ``.llvm.offloading`` section, which contains binary data created by the |
|  | :doc:`ClangOffloadPackager`. The extracted device files are then linked, and |
|  | the linked modules are wrapped into a new object file containing the code |
|  | necessary to register them with the offloading runtime. |
|  |  | 
|  | Usage | 
|  | ===== | 
|  |  | 
|  | This tool accepts the following options. Any argument that is not specific to |
|  | the linker wrapper is forwarded to the wrapped linker job. |
|  |  | 
|  | .. code-block:: console | 
|  |  | 
|  | USAGE: clang-linker-wrapper [options] -- <options to be passed to the linker> |
|  |  | 
|  | OPTIONS: | 
|  | --cuda-path=<dir>      Set the system CUDA path | 
|  | --device-debug         Use debugging | 
|  | --device-linker=<value> or <triple>=<value> | 
|  | Arguments to pass to the device linker invocation | 
|  | --dry-run              Print program arguments without running | 
|  | --help-hidden          Display all available options | 
|  | --help                 Display available options (--help-hidden for more) | 
|  | --host-triple=<triple> Triple to use for the host compilation | 
|  | --linker-path=<path>   The linker executable to invoke | 
|  | -L <dir>               Add <dir> to the library search path | 
|  | -l <libname>           Search for library <libname> | 
|  | --opt-level=<O0, O1, O2, or O3> | 
|  | Optimization level for LTO | 
|  | --override-image=<kind>=<file> |
|  | Uses the provided file as if it were the output of the device link step | 
|  | -o <path>              Path to file to write output | 
|  | --pass-remarks-analysis=<value> | 
|  | Pass remarks for LTO | 
|  | --pass-remarks-missed=<value> | 
|  | Pass remarks for LTO | 
|  | --pass-remarks=<value> Pass remarks for LTO | 
|  | --print-wrapped-module Print the wrapped module's IR for testing | 
|  | --ptxas-arg=<value>    Argument to pass to the 'ptxas' invocation | 
|  | --relocatable          Link device code to create a relocatable offloading application |
|  | --save-temps           Save intermediate results | 
|  | --sysroot=<value>      Set the system root |
|  | --verbose              Verbose output from tools | 
|  | --v                    Display the version number and exit | 
|  | --                     The separator for the wrapped linker arguments | 
|  |  | 
|  | Relocatable Linking | 
|  | =================== | 
|  |  | 
|  | The ``clang-linker-wrapper`` handles linking embedded device code and then | 
|  | registering it with the appropriate runtime. Normally, this is only done when | 
|  | the executable is created so other files containing device code can be linked | 
|  | together. This can be somewhat problematic for users who wish to ship static | 
|  | libraries that contain offloading code to users without a compatible offloading | 
|  | toolchain. | 
|  |  | 
|  | When using a relocatable link with ``-r``, the ``clang-linker-wrapper`` |
|  | performs the device linking and registration eagerly. This removes the |
|  | embedded device code and registers it with the runtime. Semantically, this is |
|  | similar to creating a shared library object. If standard relocatable linking |
|  | is desired, do not run the binaries through the ``clang-linker-wrapper``; a |
|  | plain relocatable link appends the embedded device code so that it can be |
|  | linked later. |
|  |  | 
|  | Matching | 
|  | ======== | 
|  |  | 
|  | The linker wrapper links together extracted device code that is mutually |
|  | compatible. Generally, this requires that the target triple and architecture |
|  | match. An exception is made when the architecture is listed as ``generic``, |
|  | which causes the code to be linked with any other device code that has the |
|  | same target triple. |
|  |  | 
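The matching rule above can be illustrated with a small shell sketch. The
``images_compatible`` function is a hypothetical helper written for this
example; it is not part of ``clang-linker-wrapper``, and it models only the
triple, architecture, and ``generic`` rule stated here. The actual tool may
apply additional target-specific checks.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only: succeeds when two device images,
# each described by a target triple and an architecture, would be linked
# together under the rule described in this section.
images_compatible() {
    triple_a=$1
    arch_a=$2
    triple_b=$3
    arch_b=$4

    # Images built for different target triples are never linked together.
    [ "$triple_a" = "$triple_b" ] || return 1

    # A "generic" architecture matches any architecture with the same triple.
    if [ "$arch_a" = "generic" ] || [ "$arch_b" = "generic" ]; then
        return 0
    fi

    # Otherwise the architectures must match exactly.
    [ "$arch_a" = "$arch_b" ]
}

images_compatible amdgcn-amd-amdhsa gfx90a amdgcn-amd-amdhsa gfx90a \
    && echo "same triple and arch: linked together"
images_compatible amdgcn-amd-amdhsa gfx90a amdgcn-amd-amdhsa generic \
    && echo "generic arch: linked together"
images_compatible amdgcn-amd-amdhsa gfx90a amdgcn-amd-amdhsa gfx908 \
    || echo "different arch: kept separate"
images_compatible amdgcn-amd-amdhsa gfx90a nvptx64-nvidia-cuda sm_80 \
    || echo "different triple: kept separate"
```

Checking the triple first keeps the ``generic`` exception from ever pairing
images built for different targets.
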
|  | Debugging | 
|  | ========= | 
|  |  | 
|  | The linker wrapper performs many steps internally, such as input matching, |
|  | symbol resolution, and image registration, which can make it difficult to |
|  | debug. The behavior of the linker wrapper is controlled mostly through |
|  | metadata, described in the `Clang documentation |
|  | <https://clang.llvm.org/docs/OffloadingDesign.html>`_. Intermediate output can |
|  | be obtained from the linker wrapper using the ``--save-temps`` flag. These |
|  | files can then be modified. |
|  |  | 
|  | .. code-block:: sh | 
|  |  | 
|  | $> clang openmp.c -fopenmp --offload-arch=gfx90a -c | 
|  | $> clang openmp.o -fopenmp --offload-arch=gfx90a -Wl,--save-temps | 
|  | $> # Modify temp files. |
|  | $> llvm-objcopy --update-section=.llvm.offloading=out.bc openmp.o | 
|  |  | 
|  | Doing this will allow you to override one of the input files by replacing its | 
|  | embedded offloading metadata with a user-modified version. However, this will be | 
|  | more difficult when there are multiple input files. For a very large hammer, the | 
|  | ``--override-image=<kind>=<file>`` flag can be used. | 
|  |  | 
|  | In the following example, we use ``--save-temps`` to obtain the LLVM-IR just |
|  | before running the backend. We then modify it to test altered behavior and |
|  | compile it to a binary. This binary can be passed to the linker wrapper, which |
|  | ignores all embedded metadata and uses the provided image as if it were the |
|  | result of the device linking phase. |
|  |  | 
|  | .. code-block:: sh | 
|  |  | 
|  | $> clang openmp.c -fopenmp --offload-arch=gfx90a -Wl,--save-temps | 
|  | $> # Modify temp files. |
|  | $> clang --target=amdgcn-amd-amdhsa -mcpu=gfx90a -nogpulib out.bc -o a.out | 
|  | $> clang openmp.c -fopenmp --offload-arch=gfx90a -Wl,--override-image=openmp=a.out | 
|  |  | 
|  | Example | 
|  | ======= | 
|  |  | 
|  | This tool links object files that have offloading images embedded within them |
|  | using the ``-fembed-offload-object`` flag in Clang. Given an input file |
|  | containing the magic section, we can pass it to this tool to extract the data |
|  | stored in that section and run a device linking job on it. |
|  |  | 
|  | .. code-block:: console | 
|  |  | 
|  | clang-linker-wrapper --host-triple=x86_64 --linker-path=/usr/bin/ld -- <Args> |