Add option to separate debug information (DWARF) from executable (#1973)

When building Debug or RelWithDebInfo, the binaries (e.g.
`stablehlo-opt`) can be quite beefy.
I noticed they can exceed **2GiB** in size, and attaching GDB is quite
slow; in fact, even just loading the process to run it is slow.
_This is much larger, and much more noticeable, than with the Bazel build._

This change introduces a common idiom for speeding up large binaries:
splitting the debug information (DWARF) out into separate files (`*.dwo`).
To speed up GDB attachment, a GDB index (`.gdb_index`) is also created and
stored in the executable itself.

One could achieve a similar effect for GDB with `.gdbinit` settings; however,
the cached index files are never pruned, and it's a bit cumbersome to rely on
every developer to set it up.

For reference, here is a possible `.gdbinit`:
```
# History.
set history filename ~/.gdb_history
set history save on
set history size 100000

# Makes multiple invocations much faster
set index-cache on

# Allow per-project gdbinit files
set auto-load local-gdbinit on
add-auto-load-safe-path /
```

**Note**: It looks like you also need to build MLIR with split DWARF. To be
honest, the interplay between the settings on LLVM and those on StableHLO is
not always clear. Some settings seem to propagate based on how the previous
object code was built, while others do not.
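
For anyone configuring LLVM/MLIR by hand instead of through
`build_tools/build_mlir.sh`, a minimal configure sketch with split DWARF might
look like the following. `LLVM_USE_SPLIT_DWARF` is LLVM's own CMake option; the
remaining flags and the `llvm-project`/`llvm-build` paths are illustrative
assumptions and not necessarily what `build_mlir.sh` passes.

```shell
# Sketch: configure llvm-project (MLIR only) with split DWARF, run from the
# StableHLO checkout. Flags other than LLVM_USE_SPLIT_DWARF are illustrative.
# RelWithDebInfo is used because split DWARF only matters when debug info is emitted.
# Set LLVM_ENABLE_LLD to OFF on macOS.
cmake -GNinja -S llvm-project/llvm -B llvm-build \
  -DLLVM_ENABLE_PROJECTS=mlir \
  -DCMAKE_BUILD_TYPE=RelWithDebInfo \
  -DLLVM_ENABLE_ASSERTIONS=ON \
  -DLLVM_ENABLE_LLD=ON \
  -DLLVM_USE_SPLIT_DWARF=ON
```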

For GDB launch specifically, the benchmarks below show a very large time
saving: **~74s vs ~1.4s** (mean).

## Inline debug info
**Size**: 3.2GiB
```shell
❯ ll -h bin/stablehlo-opt
Permissions Size User   Date Modified Name
.rwxr-xr-x  3.2G 780412 31 Jan 18:14  bin/stablehlo-opt
```

Most of the space is taken up by debug information:
```shell
❯ /google/bin/releases/protobuf-team/bloaty/bloaty bin/stablehlo-opt --allow_unsafe_non_google3_input
    FILE SIZE        VM SIZE    
 --------------  -------------- 
  60.2%  1.80Gi   0.0%       0    .debug_info
  12.4%   378Mi   0.0%       0    .debug_str
  10.3%   314Mi   0.0%       0    .debug_loclists
   5.9%   180Mi   0.0%       0    .debug_line
```

**Benchmark**
```
# benchmark just running the binary
❯ hyperfine './build/bin/stablehlo-opt --version'
Benchmark 1: ./build/bin/stablehlo-opt --version
  Time (mean ± σ):     181.6 ms ±   3.8 ms    [User: 73.4 ms, System: 108.2 ms]
  Range (min … max):   173.8 ms … 188.3 ms    16 runs


# benchmark with GDB
❯ hyperfine 'gdb -ex run --args ./build/bin/stablehlo-opt --version' --warmup 1 --runs 3
Benchmark 1: gdb -ex run --args ./build/bin/stablehlo-opt --version
  Time (mean ± σ):     74.063 s ±  2.381 s    [User: 71.044 s, System: 3.958 s]
  Range (min … max):   72.361 s … 76.784 s    3 runs
```

## With separate debug info

**Size**: 847MiB
```shell
❯ ll -h build/bin/stablehlo-opt
Permissions Size User   Date Modified Name
.rwxr-xr-x  847M 780412 31 Jan 17:29  build/bin/stablehlo-opt
```

Much of the remaining space is now the `.gdb_index` section, which makes
attaching the debugger much faster:
```shell
❯ /google/bin/releases/protobuf-team/bloaty/bloaty build/bin/stablehlo-opt --allow_unsafe_non_google3_input
    FILE SIZE        VM SIZE    
 --------------  -------------- 
  38.0%   306Mi   0.0%       0    .gdb_index
  24.2%   195Mi   0.0%       0    .debug_addr
  20.6%   166Mi   0.0%       0    .debug_line
   6.5%  52.5Mi  55.5%  52.5Mi    .text
```
**Benchmark**
```
# benchmark just running the binary
❯ hyperfine './build/bin/stablehlo-opt --version'
Benchmark 1: ./build/bin/stablehlo-opt --version
  Time (mean ± σ):      38.6 ms ±   1.6 ms    [User: 15.7 ms, System: 23.1 ms]
  Range (min … max):    36.5 ms …  44.6 ms    68 runs

# benchmark with GDB
❯ hyperfine 'gdb -ex run --args ./build/bin/stablehlo-opt --version'
Benchmark 1: gdb -ex run --args ./build/bin/stablehlo-opt --version
  Time (mean ± σ):      1.401 s ±  0.037 s    [User: 1.532 s, System: 0.883 s]
  Range (min … max):    1.345 s …  1.445 s    10 runs
```
4 files changed
tree: 853d9b10ae3f960a269aedda0e408228f0582675
  1. .github/
  2. build_tools/
  3. cmake/
  4. docs/
  5. rfcs/
  6. stablehlo/
  7. .bazelignore
  8. .bazelrc
  9. .bazelversion
  10. .clang-format
  11. .gitignore
  12. .markdownlint.yaml
  13. BUILD.bazel
  14. CMakeLists.txt
  15. CODE_OF_CONDUCT.md
  16. CONTRIBUTING.md
  17. LICENSE
  18. README.md
  19. WORKSPACE.bazel
README.md

StableHLO

StableHLO is an operation set for high-level operations (HLO) in machine learning (ML) models. Essentially, it's a portability layer between different ML frameworks and ML compilers: ML frameworks that produce StableHLO programs are compatible with ML compilers that consume StableHLO programs.

Our goal is to simplify and accelerate ML development by creating more interoperability between various ML frameworks (such as TensorFlow, JAX and PyTorch) and ML compilers (such as XLA and IREE).

StableHLO is based on the MHLO dialect and enhances it with additional functionality, including serialization and versioning. We use MLIR bytecode as the serialization format and provide backward and forward compatibility guarantees. This ensures compatibility between frameworks and compilers, even as StableHLO continues to evolve.

This repository includes the StableHLO specification along with an MLIR-based implementation in C++ and Python, which you can use to define StableHLO programs for consumption by compilers such as XLA and IREE.

Build instructions

Here's how to build the StableHLO repo on Linux or macOS:

  1. CMake is our primary build tool, so before you begin make sure that you have CMake and Ninja installed.

    If you're using Linux, we recommend installing lld as well - we have observed it to be noticeably faster than alternatives on our typical software and hardware configurations.

    # On Linux
    sudo apt install cmake ninja-build lld
    
    # On macOS
    brew install cmake ninja
    
  2. Set the LLVM_ENABLE_LLD shell variable depending on your preferences. We recommend setting it to ON on Linux and to OFF on macOS.

    [[ "$(uname)" != "Darwin" ]] && LLVM_ENABLE_LLD="ON" || LLVM_ENABLE_LLD="OFF"
    
  3. Clone the StableHLO repo and the LLVM repository:

    git clone https://github.com/openxla/stablehlo
    
    cd stablehlo && git clone https://github.com/llvm/llvm-project.git
    

    Cloning the LLVM repository may take a few minutes.

  4. Make sure you check out the correct commit in the LLVM repository:

    (cd llvm-project && git fetch && git checkout $(cat ../build_tools/llvm_version.txt))
    

    You need to do this every time llvm_version.txt changes.

  5. Configure and build MLIR:

    MLIR_ENABLE_BINDINGS_PYTHON=ON build_tools/build_mlir.sh ${PWD}/llvm-project/ ${PWD}/llvm-build
    

    This will take a considerable amount of time. For example, on a MacBook Pro with an M1 Pro chip, building MLIR took around 10 minutes at the time of writing.

    Again, you need to do this every time llvm_version.txt changes.

  6. Build StableHLO as a standalone library:

    mkdir -p build && cd build
    
    cmake .. -GNinja \
      -DLLVM_ENABLE_LLD="$LLVM_ENABLE_LLD" \
      -DCMAKE_BUILD_TYPE=Release \
      -DLLVM_ENABLE_ASSERTIONS=ON \
      -DSTABLEHLO_ENABLE_BINDINGS_PYTHON=ON \
      -DMLIR_DIR=${PWD}/../llvm-build/lib/cmake/mlir
    

    If you are actively developing StableHLO, you may want the following additional CMake settings:

    cmake .. -GNinja \
      -DSTABLEHLO_ENABLE_LLD=ON \
      -DCMAKE_BUILD_TYPE=RelWithDebInfo \
      -DLLVM_ENABLE_ASSERTIONS=ON \
      -DSTABLEHLO_ENABLE_BINDINGS_PYTHON=OFF \
      -DSTABLEHLO_ENABLE_SPLIT_DWARF=ON \
      -DCMAKE_CXX_COMPILER_LAUNCHER=ccache \
      -DCMAKE_C_COMPILER_LAUNCHER=ccache \
      -DSTABLEHLO_ENABLE_SANITIZER=address \
      -DMLIR_DIR=${PWD}/../llvm-build/lib/cmake/mlir
    

    This enables debug symbols, split DWARF, AddressSanitizer, and ccache, which can speed up incremental builds. Split DWARF also creates a GDB index (.gdb_index) in the binary to speed up debugging.

    If you build MLIR using the script above, it should already set LLVM_USE_SPLIT_DWARF by default. That setting accounts for the majority of the binary size savings, so make sure it is also enabled if you build MLIR another way.

  7. Now you can make sure it works by running some tests:

    ninja check-stablehlo-tests
    

    You should see results like this:

    Testing Time: 5.99s
      Passed: 47
    

    This runs all the tests in stablehlo/tests/.
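
Since the size savings depend on split DWARF being enabled in both the LLVM/MLIR build and the StableHLO build, it can help to check what each CMake cache actually recorded. The sketch below assumes the llvm-build and build directories from the steps above; the function name report_split_dwarf is hypothetical, and it assumes STABLEHLO_ENABLE_SPLIT_DWARF is stored in the cache like any other CMake option().

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the split-DWARF-related entries recorded in the
# LLVM/MLIR and StableHLO CMake caches. Defaults assume the README layout.
report_split_dwarf() {
  local llvm_build="${1:-llvm-build}"
  local stablehlo_build="${2:-build}"
  local cache
  for cache in "$llvm_build/CMakeCache.txt" "$stablehlo_build/CMakeCache.txt"; do
    if [[ ! -f "$cache" ]]; then
      echo "$cache: not found" >&2
      continue
    fi
    # Cache entries have the form NAME:TYPE=VALUE, e.g. LLVM_USE_SPLIT_DWARF:BOOL=ON.
    # -H prefixes each match with the cache file it came from.
    grep -HE '^(CMAKE_BUILD_TYPE|LLVM_USE_SPLIT_DWARF|STABLEHLO_ENABLE_SPLIT_DWARF):' "$cache" \
      || echo "$cache: no matching entries"
  done
}
```

Run report_split_dwarf from the repository root, or pass the two build directories explicitly. CMAKE_BUILD_TYPE is included because split DWARF only has an effect for build types that emit debug info, such as Debug and RelWithDebInfo.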

Python

If you'd like to build the Python bindings, you'll need to install a few additional dependencies.

pip install -r ./llvm-project/mlir/python/requirements.txt

If you've built MLIR & StableHLO using the script above, the Python bindings for MLIR may already be built.

After you have built the project, you can import the Python bindings by adding them to your PYTHONPATH:

$ PYTHONPATH="./build/python_packages/stablehlo" python3
Python 3.11.6 (main, Oct  8 2023, 05:06:43) [GCC 13.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import mlir.dialects.stablehlo
>>> from mlir.ir import Context, Location
>>> import mlir.dialects.arith

You can also build a wheel yourself using the setup.py file. We also make nightly wheels available on our GitHub Releases page.

pip install stablehlo -f https://github.com/openxla/stablehlo/releases/expanded_assets/dev-wheels

Community

Building an amazing portability layer between ML frameworks and ML compilers requires collaboration across the whole ML industry, so we're happy to have your help on the StableHLO project.

We're using GitHub issues / pull requests to organize development and openxla-discuss to have longer discussions. We also have a #stablehlo channel on the OpenXLA Discord server.