```
commit 03c09d9262e53e2359b1f74e42cab20bc93b47a8
Author: Farid Zakaria <fmzakari@google.com>
Date:   Wed Jan 31 16:07:43 2024 -0800
```
Add option to separate debug information (DWARF) from executable (#1973)

When building Debug or RelWithDebInfo, the size of the binaries (i.e. `stablehlo-opt`) can be quite beefy. I noticed it can be up to **2GiB** in size, and attaching GDB is quite slow -- in fact, even loading the process to run is slow. _This is much larger and more noticeable than the Bazel build._

This change introduces a common idiom for speeding up large binaries: separating the debug information (DWARF) into separate files (`*.dwo`). To speed up GDB attachment, a gdb-index is created and stored in the file itself. One could achieve the same effect with `.gdbinit` settings; however, the index files are never pruned, and it's a bit cumbersome to rely on developers to set it up. For reference, here is a possible `.gdbinit`:

```
# History.
set history filename ~/.gdb_history
set history save on
set history size 100000

# Makes multiple invocations much faster
set index-cache on

# Allow per-project gdbinit files
set auto-load local-gdbinit on
add-auto-load-safe-path /
```

**Note**: It looks like you also need to build MLIR with split DWARF as well. To be honest, the interplay between settings on LLVM and on StableHLO is not always clear. Some settings seem to propagate based on how the previous object code was created, and others do not.

Specifically for GDB launch, you can see a very big time saving in the benchmark below: **76s vs 1.3s**.

## Inline debug info

**Size**: 3.2GiB

```shell
❯ ll -h bin/stablehlo-opt
Permissions Size User   Date Modified Name
.rwxr-xr-x  3.2G 780412 31 Jan 18:14  bin/stablehlo-opt
```

Most of the space is the debug information.

```shell
❯ /google/bin/releases/protobuf-team/bloaty/bloaty bin/stablehlo-opt --allow_unsafe_non_google3_input
    FILE SIZE        VM SIZE
 --------------  --------------
  60.2%  1.80Gi   0.0%       0    .debug_info
  12.4%   378Mi   0.0%       0    .debug_str
  10.3%   314Mi   0.0%       0    .debug_loclists
   5.9%   180Mi   0.0%       0    .debug_line
```

**Benchmark**

```
# benchmark just running the build
❯ hyperfine './build/bin/stablehlo-opt --version'
Benchmark 1: ./build/bin/stablehlo-opt --version
  Time (mean ± σ):     181.6 ms ±   3.8 ms    [User: 73.4 ms, System: 108.2 ms]
  Range (min … max):   173.8 ms … 188.3 ms    16 runs

# benchmark with GDB
❯ hyperfine 'gdb -ex run --args ./build/bin/stablehlo-opt --version' --warmup 1 --runs 3
Benchmark 1: gdb -ex run --args ./build/bin/stablehlo-opt --version
  Time (mean ± σ):     74.063 s ±  2.381 s    [User: 71.044 s, System: 3.958 s]
  Range (min … max):   72.361 s … 76.784 s    3 runs
```

## With separate debug info

**Size**: 847M

```shell
❯ ll -h build/bin/stablehlo-opt
Permissions Size User   Date Modified Name
.rwxr-xr-x  847M 780412 31 Jan 17:29  build/bin/stablehlo-opt
```

Much of the space is now a `gdb-index` that makes attaching the debugger much faster.

```shell
❯ /google/bin/releases/protobuf-team/bloaty/bloaty build/bin/stablehlo-opt --allow_unsafe_non_google3_input
    FILE SIZE        VM SIZE
 --------------  --------------
  38.0%   306Mi   0.0%       0    .gdb_index
  24.2%   195Mi   0.0%       0    .debug_addr
  20.6%   166Mi   0.0%       0    .debug_line
   6.5%  52.5Mi  55.5%  52.5Mi    .text
```

**Benchmark**:

```
# benchmark just running the build
❯ hyperfine './build/bin/stablehlo-opt --version'
Benchmark 1: ./build/bin/stablehlo-opt --version
  Time (mean ± σ):      38.6 ms ±   1.6 ms    [User: 15.7 ms, System: 23.1 ms]
  Range (min … max):    36.5 ms …  44.6 ms    68 runs

# benchmark with GDB
❯ hyperfine 'gdb -ex run --args ./build/bin/stablehlo-opt --version'
Benchmark 1: gdb -ex run --args ./build/bin/stablehlo-opt --version
  Time (mean ± σ):      1.401 s ±  0.037 s    [User: 1.532 s, System: 0.883 s]
  Range (min … max):    1.345 s …  1.445 s    10 runs
```
StableHLO is an operation set for high-level operations (HLO) in machine learning (ML) models. Essentially, it's a portability layer between different ML frameworks and ML compilers: ML frameworks that produce StableHLO programs are compatible with ML compilers that consume StableHLO programs.
Our goal is to simplify and accelerate ML development by creating more interoperability between various ML frameworks (such as TensorFlow, JAX and PyTorch) and ML compilers (such as XLA and IREE).
StableHLO is based on the MHLO dialect and enhances it with additional functionality, including serialization and versioning. We use MLIR bytecode as the serialization format and provide backward and forward compatibility guarantees. This ensures compatibility between frameworks and compilers, even as StableHLO continues to evolve.
This repository includes the StableHLO specification along with an MLIR-based implementation in C++ and Python, which you can use to define StableHLO programs for consumption by compilers such as XLA and IREE.
Here's how to build the StableHLO repo on Linux or macOS:
CMake is our primary build tool, so before you begin make sure that you have CMake and Ninja installed.
If you're using Linux, we recommend installing `lld` as well - we have observed it to be noticeably faster than alternatives on our typical software and hardware configurations.
```shell
# On Linux
sudo apt install cmake ninja-build lld

# On macOS
brew install cmake ninja
```
Set the `LLVM_ENABLE_LLD` shell variable depending on your preferences. We recommend setting it to `ON` on Linux and to `OFF` on macOS.
```shell
[[ "$(uname)" != "Darwin" ]] && LLVM_ENABLE_LLD="ON" || LLVM_ENABLE_LLD="OFF"
```
Clone the StableHLO repo and the LLVM repository:
```shell
git clone https://github.com/openxla/stablehlo
cd stablehlo && git clone https://github.com/llvm/llvm-project.git
```
Cloning the LLVM repository may take a few minutes.
Make sure you check out the correct commit in the LLVM repository:
```shell
(cd llvm-project && git fetch && git checkout $(cat ../build_tools/llvm_version.txt))
```
You need to do this every time `llvm_version.txt` changes.
Configure and build MLIR:
```shell
MLIR_ENABLE_BINDINGS_PYTHON=ON build_tools/build_mlir.sh ${PWD}/llvm-project/ ${PWD}/llvm-build
```
This will take a considerable amount of time. For example, on a MacBook Pro with an M1 Pro chip, building MLIR took around 10 minutes at the time of writing.
Again, you need to do this every time `llvm_version.txt` changes.
Build StableHLO as a standalone library:
```shell
mkdir -p build && cd build

cmake .. -GNinja \
  -DLLVM_ENABLE_LLD="$LLVM_ENABLE_LLD" \
  -DCMAKE_BUILD_TYPE=Release \
  -DLLVM_ENABLE_ASSERTIONS=ON \
  -DSTABLEHLO_ENABLE_BINDINGS_PYTHON=ON \
  -DMLIR_DIR=${PWD}/../llvm-build/lib/cmake/mlir
```
If you are actively developing StableHLO, you may want the following additional CMake settings:
```shell
cmake .. -GNinja \
  -DSTABLEHLO_ENABLE_LLD=ON \
  -DCMAKE_BUILD_TYPE=RelWithDebInfo \
  -DLLVM_ENABLE_ASSERTIONS=ON \
  -DSTABLEHLO_ENABLE_BINDINGS_PYTHON=OFF \
  -DSTABLEHLO_ENABLE_SPLIT_DWARF=ON \
  -DCMAKE_CXX_COMPILER_LAUNCHER=ccache \
  -DCMAKE_C_COMPILER_LAUNCHER=ccache \
  -DSTABLEHLO_ENABLE_SANITIZER=address \
  -DMLIR_DIR=${PWD}/../llvm-build/lib/cmake/mlir
```
This will enable debug symbols and ccache, which can speed up incremental builds. It also creates a GDB index file in the binary to speed up debugging.
If you build MLIR using the script above, it should also set `LLVM_USE_SPLIT_DWARF` by default; this flag accounts for the majority of the binary size savings and should also be set when configuring MLIR yourself.
Now you can make sure it works by running some tests:
```shell
ninja check-stablehlo-tests
```
You should see results like this:
```
Testing Time: 5.99s
  Passed: 47
```

This runs all the tests in `stablehlo/tests/`.
If you'd like to build the Python bindings, you'll need to install a few additional dependencies.
```shell
pip install -r ./llvm-project/mlir/python/requirements.txt
```
If you've built MLIR & StableHLO using the script above, the Python bindings for MLIR may already be built.
After you have built the project, you can use the Python bindings by adding the build output to your `PYTHONPATH`:
```shell
$ PYTHONPATH="./build/python_packages/stablehlo" python3
Python 3.11.6 (main, Oct  8 2023, 05:06:43) [GCC 13.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import mlir.dialects.stablehlo
>>> from mlir.ir import Context, Location
>>> import mlir.dialects.arith
```
You can also build a wheel yourself using the `setup.py` file. We also make nightly wheels available on our GitHub Releases page.
```shell
pip install stablehlo -f https://github.com/openxla/stablehlo/releases/expanded_assets/dev-wheels
```
Building an amazing portability layer between ML frameworks and ML compilers requires collaboration across the whole ML industry, so we're happy to have your help on the StableHLO project.
We're using GitHub issues / pull requests to organize development and openxla-discuss to have longer discussions. We also have a `#stablehlo` channel on the OpenXLA Discord server.