## Overview
In this demo we will look at:
* Building LLVM with the correct settings to allow for model training
* Collecting a training corpus for the regalloc model based on Chromium
* Training the regalloc model on the collected corpus
* Compiling the trained model into LLVM
## Preliminaries
Set up some environment variables according to where you want to clone/build
all of the code:
```bash
export WORKING_DIR=~
```
Change `~` to whatever directory you'd like to put everything in.
## Get repositories
### ml-compiler-opt (this repository)
Clone the GitHub repo:
```bash
cd $WORKING_DIR
git clone https://github.com/google/ml-compiler-opt
```
### LLVM
Grabbing LLVM should be as simple as running the below command, but if
something goes awry, make sure to check the
[official documentation](https://llvm.org/docs/GettingStarted.html).
```bash
cd $WORKING_DIR
git clone https://github.com/llvm/llvm-project.git
```
### Chromium
Grabbing Chromium is a bit more involved. The
[official documentation](https://chromium.googlesource.com/chromium/src/+/main/docs/linux/build_instructions.md)
covers building on Linux-based systems (the only platform currently supported
by MLGO). Cloning the code should be as simple as downloading depot_tools and
adding it to your `$PATH`:
```bash
cd $WORKING_DIR
git clone https://chromium.googlesource.com/chromium/tools/depot_tools.git
export PATH="$PATH:$WORKING_DIR/depot_tools"
```
Create a folder for Chromium and clone it using the `fetch` utility:
```bash
mkdir chromium
cd chromium
fetch --nohooks --no-history chromium
```
**Note:** Running `fetch` with the `--no-history` flag makes your local
checkout significantly smaller and speeds up the underlying git clone
considerably. However, if you actually need the history for any reason
(e.g., to revert to a previous commit or to work on the Chromium side of
MLGO), make sure to omit this flag to do a full checkout.

The `fetch` command will take at least a couple of minutes on a fast internet
connection and much longer on slower ones.
Next, we need to modify the `.gclient` file that `fetch` created in the
directory you ran it in, so that the Chromium PGO profiles get checked out:
```bash
sed -i 's/"custom_vars": {},/"custom_vars": { "checkout_pgo_profiles" : True },/' .gclient
```
This `sed` command sets the necessary variable correctly. After this, you can
move into the `src` directory that `fetch` created, which contains the actual
Chromium codebase:
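```bash
cd $WORKING_DIR/chromium/src
```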
Now that this is all in place, you need to run the Chromium hooks in order to
get the development environment ready for a full compilation:
```bash
gclient runhooks
```
## Install Dependencies
If you're working in a Debian-based Docker container, it most likely will not
come with `sudo` by default. It isn't strictly necessary to install, but it
makes it easier to copy-paste the installation commands below and it also
enables the use of the Chromium dependency auto-installation script:
```bash
apt-get install sudo
```
First, install some base dependencies that will be needed when building
LLVM:
```bash
sudo apt-get install cmake ninja-build lld
```
Now, install the Chromium dependencies using the auto-installation script:
```bash
$WORKING_DIR/chromium/src/build/install-build-deps.sh
```
**Note:** These installation commands are all designed to be run on
Debian-based distros. However, adapting them to other distros with alternative
package management systems should not be too difficult. The packages for the first
command should be very similarly named and the
[official Chromium documentation](https://chromium.googlesource.com/chromium/src/+/main/docs/linux/build_instructions.md)
has info on dependency installation for their build process on other common
distros.
Also make sure that you install the Python dependencies for the
ml-compiler-opt repository:
```bash
cd $WORKING_DIR/ml-compiler-opt
pip3 install pipenv
pipenv sync --system
```
If you plan on doing development work on this checkout of ml-compiler-opt,
use the `requirements-ci.txt` file in the repository root to install some
additional development dependencies.
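For example, one way to install them (assuming you're still in the
ml-compiler-opt root from the previous step):
```bash
pip3 install -r requirements-ci.txt
```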
## Building Chromium
**WARNING:** Currently, Chromium only builds with protobuf 4.x while TensorFlow
requires protobuf 3.x. To make sure that Chromium compiles correctly, you can
either use a virtual environment and install protobuf 4.x there, or install
protobuf 4.x over the currently installed version and undo the change later,
after the compile is complete. This tutorial assumes no virtual environments
are used. Install a compatible version of protobuf:
```bash
pip3 install protobuf==4.21.7
```
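If you would rather use the virtual environment approach mentioned above, a
minimal sketch looks like the following (the `chromium-venv` directory name is
arbitrary):
```bash
# Create and activate a throwaway virtual environment for the Chromium build.
python3 -m venv $WORKING_DIR/chromium-venv
source $WORKING_DIR/chromium-venv/bin/activate
pip3 install protobuf==4.21.7
# Run `deactivate` after the Chromium compile to return to the system Python.
```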
To build Chromium, make sure you are in the `/chromium/src` directory and then
run the following command to open a CLI text editor that will allow you to
configure build settings:
```bash
cd $WORKING_DIR/chromium/src
gn args ./out/Release
```
Then, you need to specify the configuration options to use when compiling Chromium.
This will depend upon the type of training corpus that you want to extract. If you
want to extract a non-thinLTO corpus, you can use the configuration listed below:
```
is_official_build=true
use_thin_lto=false
is_cfi=false
clang_embed_bitcode=true
is_debug=false
symbol_level=0
enable_nacl=false
```
But if you want to extract a thinLTO corpus, you need to use the following config:
```
is_official_build=true
lld_emit_indexes_and_imports=true
is_debug=false
symbol_level=0
enable_nacl=false
```
Immediately after closing the editor, `gn` will generate all of the files that
`ninja` needs to execute the compilation steps. However, to extract a corpus
for ML training, we also need a database of compilation commands, which can be
obtained by running the following command:
```bash
gn gen ./out/Release --export-compile-commands
```
Then you can build Chromium with the `autoninja` utility:
```bash
autoninja -C ./out/Release
```
A full Chromium compile will take at least an hour even on well-specced
hardware (e.g., a 96-thread workstation) and much longer on lower-specced
hardware.
**Note:** If the build fails in the last couple of steps by tripping an
assertion in the linker when compiling a non-thinLTO corpus, you can safely
ignore it. Preparing a corpus for ML training in the non-thinLTO case only
requires the object files that get fed to the linker.
TODO(boomanaiden154): Investigate the source of this assertion error.
**WARNING:** Make sure to reinstall a version of protobuf compatible with the
current tf-nightly release used by ml-compiler-opt if you changed versions
earlier to get the Chromium compile working. Reinstalling using the
ml-compiler-opt lockfile should work:
```bash
cd $WORKING_DIR/ml-compiler-opt
pip3 uninstall protobuf
pipenv sync --system
```
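If you want to double-check which protobuf version is now active, a quick
sanity check:
```bash
python3 -c "import google.protobuf; print(google.protobuf.__version__)"
```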
## Building LLVM
To build LLVM to train ML models, we first need to build TFLite and some
dependencies so that we can embed it within LLVM to load and execute models
on the fly during reinforcement learning. There is a script within this
repository that clones and builds everything automatically and prepares
a CMake cache file that can be passed to CMake during the LLVM build
configuration. Running the script looks like this:
```bash
cd $WORKING_DIR
mkdir tflite
cd tflite
$WORKING_DIR/ml-compiler-opt/buildbot/build_tflite.sh
```
This script should only take a couple minutes to execute as all the libraries
that it pulls and builds are relatively small.
Now, create a new folder to do an LLVM build and configure it using CMake:
```bash
mkdir $WORKING_DIR/llvm-build
cd $WORKING_DIR/llvm-build
cmake -G Ninja \
-DCMAKE_BUILD_TYPE=Release \
-DLLVM_ENABLE_PROJECTS="clang" \
-C $WORKING_DIR/tflite/tflite.cmake \
$WORKING_DIR/llvm-project/llvm
```
Now you can run the actual build with the following command:
```bash
cmake --build .
```
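Once the build finishes, a quick way to confirm that clang built successfully
is to check the new binary:
```bash
$WORKING_DIR/llvm-build/bin/clang --version
```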
## ML training
All of the following example commands assume you are working from within
your checkout of the ml-compiler-opt repository:
```bash
cd $WORKING_DIR/ml-compiler-opt
```
To start off training, we need to extract a corpus from the Chromium compile.
The procedure for this depends on how you configured the Chromium build,
particularly whether or not you used thinLTO.
### Installing the corpus extraction tooling
Install the corpus extraction tooling:
```bash
pip3 install mlgo-utils
```
Make sure that the local binary directory that Python installs executables into
(typically `~/.local/bin`) is on your `$PATH` so that the invocations below will
work as expected.
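For example:
```bash
export PATH="$PATH:$HOME/.local/bin"
```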
### Corpus extraction (non-thinLTO case)
For corpus extraction in the non-thinLTO case, you can simply run the following
command:
```bash
extract_ir \
--cmd_filter="^-O2|-O3" \
--input=$WORKING_DIR/chromium/src/out/Release/compile_commands.json \
--input_type=json \
--llvm_objcopy_path=$WORKING_DIR/llvm-build/bin/llvm-objcopy \
--output_dir=$WORKING_DIR/corpus
```
This command will extract all the relevant bitcode and compilation flags from
the Chromium compile and put them in the `$WORKING_DIR/corpus` directory. No
further processing on the corpus should be needed.
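If you want to sanity check the result, the output directory should contain
the extracted bitcode and command-line files along with a
`corpus_description.json` file, for example:
```bash
ls $WORKING_DIR/corpus
head $WORKING_DIR/corpus/corpus_description.json
```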
### Corpus extraction (thinLTO case)
Corpus extraction for the thinLTO case is slightly more involved. Start off by
running the following command to do the initial step in the corpus extraction
process:
```bash
extract_ir \
--cmd_filter="^-O2|-O3" \
--llvm_objcopy_path=$WORKING_DIR/llvm-build/bin/llvm-objcopy \
--output_dir=$WORKING_DIR/corpus \
--thinlto_build=local \
--obj_base_dir=$WORKING_DIR/chromium/src/out/Release/obj
```
After this, it is necessary to grab the flags passed to the linker and add
them to the `corpus_description.json` file in the `$WORKING_DIR/corpus` folder.
To find this, it is helpful to look at the actual invocation of the linker. To
see the linker invocation for a target like `chrome`, go to the Chromium build
directory and run the following command:
```bash
cd $WORKING_DIR/chromium/src/out/Release
ninja -t commands chrome
```
(**Note:** use `libchrome` instead of `chrome` if targeting Android.)
The last command should look something like this:
```bash
python3 "../../build/toolchain/gcc_link_wrapper.py" --output="./chrome" -- ../../third_party/llvm-build/Release+Asserts/bin/clang++ -Wl,--version-script=../../build/linux/chrome.map -Werror -fuse-ld=lld -Wl,--fatal-warnings -Wl,--build-id=sha1 -fPIC -Wl,-z,noexecstack -Wl,-z,relro -Wl,-z,now -Wl,--icf=all -Wl,--color-diagnostics -Wl,-mllvm,-instcombine-lower-dbg-declare=0 -Wl,--save-temps=import -Wl,--thinlto-emit-index-files -flto=thin -Wl,--thinlto-jobs=all -Wl,--thinlto-cache-dir=thinlto-cache -Wl,--thinlto-cache-policy=cache_size=10\%:cache_size_bytes=40g:cache_size_files=100000 -Wl,-mllvm,-import-instr-limit=30 -fwhole-program-vtables -Wl,--undefined-version -m64 -no-canonical-prefixes -Wl,-O2 -Wl,--gc-sections -rdynamic -Wl,-z,defs -Wl,--as-needed -nostdlib++ --sysroot=../../build/linux/debian_bullseye_amd64-sysroot -fsanitize=cfi-vcall -fsanitize=cfi-icall -pie -Wl,--disable-new-dtags -Wl,--lto-O2 -o "./chrome" -Wl,--start-group @"./chrome.rsp" -Wl,--end-group -ldl -lpthread -lrt -lgmodule-2.0 -lgobject-2.0 -lgthread-2.0 -lglib-2.0 -lnss3 -lnssutil3 -lsmime3 -lplds4 -lplc4 -lnspr4 -latk-1.0 -latk-bridge-2.0 -lcups -lgio-2.0 -ldrm -ldbus-1 -latspi -lresolv -lm -lX11 -lXcomposite -lXdamage -lXext -lXfixes -lXrender -lXrandr -lXtst -lgbm -lEGL -lexpat -luuid -lxcb -lxkbcommon -lXi -lpci -l:libffi_pic.a -lpangocairo-1.0 -lpango-1.0 -lharfbuzz -lcairo -lasound -lz -lstdc++ -lxshmfence
```
From this, we can pull out the handful of flags that we need for our
`corpus_description.json`. A more precise description of the flags that are
needed is available in the [MLGO ThinLTO documentation](../thinlto.md). Given
the linker command above, the flags that we need consist of the following:
```
-fPIC
-mllvm,-instcombine-lower-dbg-declare=0
-mllvm,-import-instr-limit=30
-no-canonical-prefixes
-O2
-nostdlib++
--sysroot=../../build/linux/debian_bullseye_amd64-sysroot
-c
```
Make sure to rewrite the `--sysroot` flag to be an absolute path. Setting it
to the output of the following should work:
```bash
echo $WORKING_DIR/chromium/src/build/linux/debian_bullseye_amd64-sysroot
```
Now, add these flags to the `global_command_override` section in the
`corpus_description.json` file. Afterwards, the `global_command_override`
section of the file should look something like the following:
```json
"global_command_override": [
"-fPIC",
"-mllvm",
"-instcombine-lower-dbg-declare=0",
"-mllvm",
"-import-instr-limit=30",
"-no-canonical-prefixes",
"-O2",
"-nostdlib++",
"--sysroot=/path/to/workdir/build/linux/debian_bullseye_amd64-sysroot",
"-c"
]
```
Now you should have a properly prepared Chromium thinLTO corpus.
### PGO Path Rewriting
**NOTE:** This step is only necessary if you are working on a non-thinLTO
corpus. If you are working on a thinLTO corpus, making the changes outlined
below will result in an error when you try to generate a default trace.
It is essential to have PGO data when training the regalloc model. However,
the Chromium build process uses relative paths when referencing PGO profiles.
This needs to be fixed by replacing the default `-fprofile-instrument-use-path`
flag with one that uses an absolute path. First we need to know the profile
being used. The profiles are located in
`$WORKING_DIR/chromium/src/chrome/build/pgo_profiles` and they will have a
`*.profdata` extension. If you have never resynced your checkout, there should
be just one file. Then to find the absolute path of the file, run the following
command:
```bash
realpath -s $WORKING_DIR/chromium/src/chrome/build/pgo_profiles/*.profdata
```
This should output an absolute path to a `*.profdata` file. Next, open the
gin config for the regalloc problem, which should be located at
`compiler_opt/rl/regalloc/gin_configs/common.gin` within the ml-compiler-opt
root, and replace the line
```
problem_config.flags_to_replace.replace_flags={}
```
with this:
```
problem_config.flags_to_replace.replace_flags = {
'-fprofile-instrument-use-path': '<path to profdata from above command>'
}
```
For example:
```
problem_config.flags_to_replace.replace_flags = {
'-fprofile-instrument-use-path': '/home/aiden/chromium/src/chrome/build/pgo_profiles/chrome-linux-main-1665359392-180123bdd1fedc45c1cbb781e2ad04bd98ab1546.profdata'
}
```
Adding a flag to suppress warnings about PGO profdata hash mismatches can
also help keep your output uncluttered. This can be done by adjusting this
line:
```
problem_config.flags_to_add.add_flags=()
```
to
```
problem_config.flags_to_add.add_flags=('-Wno-backend-plugin',)
```
### Collect the Default Trace and Generate Vocab
Before we run reinforcement training, it is best to train the model using
behavioral cloning on the default heuristic. First off, we need to collect
a trace of the decisions the default heuristic is making:
```bash
PYTHONPATH=$PYTHONPATH:. python3 compiler_opt/tools/generate_default_trace.py \
--data_path=$WORKING_DIR/corpus \
--output_path=$WORKING_DIR/default_trace \
--gin_files=compiler_opt/rl/regalloc/gin_configs/common.gin \
--gin_bindings=clang_path="'$WORKING_DIR/llvm-build/bin/clang'" \
--sampling_rate=0.2
```
This will compile 20% of the corpus and save all of the regalloc eviction
problem instances it encounters into the `$WORKING_DIR/default_trace` file.
After we have collected a default trace, an optional step is to regenerate the
vocab that is used to normalize some of the values that get fed to the ML
model:
```bash
rm -rf ./compiler_opt/rl/regalloc/vocab
PYTHONPATH=$PYTHONPATH:. python3 compiler_opt/tools/generate_vocab.py \
--input=$WORKING_DIR/default_trace \
--output_dir=./compiler_opt/rl/regalloc/vocab \
--gin_files=compiler_opt/rl/regalloc/gin_configs/common.gin
```
This isn't strictly necessary, as default values are already stored within
the repository in the `./compiler_opt/rl/regalloc/vocab` folder, but it
doesn't hurt to regenerate them. If you add or modify features, regenerating
the vocab is required.
Now that the vocab is present (or has been regenerated) and we have a default
trace, we can start to train the model using behavioral cloning to mimic the
default heuristic:
```bash
PYTHONPATH=$PYTHONPATH:. python3 compiler_opt/rl/train_bc.py \
--root_dir=$WORKING_DIR/warmstart \
--data_path=$WORKING_DIR/default_trace \
--gin_files=compiler_opt/rl/regalloc/gin_configs/behavioral_cloning_nn_agent.gin
```
This script shouldn't take too long to run on decently powerful hardware.
It will output a trained model in the directory specified by the `--root_dir`
flag, in this case `$WORKING_DIR/warmstart`.
## Reinforcement Learning
Now that we have a model that has been warmstarted on the default heuristic,
we can proceed with reinforcement learning so that the model can improve
beyond the performance of the default heuristic:
```bash
PYTHONPATH=$PYTHONPATH:. python3 compiler_opt/rl/train_locally.py \
--root_dir=$WORKING_DIR/output_model \
--data_path=$WORKING_DIR/corpus \
--gin_bindings=clang_path="'$WORKING_DIR/llvm-build/bin/clang'" \
--gin_files=compiler_opt/rl/regalloc/gin_configs/ppo_nn_agent.gin \
--gin_bindings=train_eval.warmstart_policy_dir=\"$WORKING_DIR/warmstart/saved_policy\"
```
This script will take quite a while to run: probably most of a day on fairly
powerful hardware (~100+ vCPUs), and potentially several days on less powerful
hardware.
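If you want to keep an eye on training progress, the trainer should be writing
TensorBoard summaries under `--root_dir`, so pointing TensorBoard (installed
alongside the TensorFlow Python dependencies) at that directory is one way to
monitor it:
```bash
tensorboard --logdir=$WORKING_DIR/output_model
```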
## Evaluating the Policy
If you are interested in seeing how the trained policy performs, there are
two different avenues. You can run the `generate_default_trace.py` script to
get info on the reward (reduction in the number of instructions) over a
specific corpus. However, this still doesn't tell the whole story for the
regalloc case, and actual benchmarking is needed. There is also some tooling
available in this repository to run benchmarks in Chromium and the
llvm-test-suite using performance counters to track instructions executed,
loads, and stores, all of which are metrics that show how the model is
performing.
### Evaluating the Model With Reward Metrics
To evaluate a trained policy (for example looking at the output from
RL training in `$WORKING_DIR/output_model/saved_policy`), run the
`generate_default_trace.py` script with some flags to tell it to output
performance data:
```bash
PYTHONPATH=$PYTHONPATH:. python3 compiler_opt/tools/generate_default_trace.py \
--data_path=$WORKING_DIR/corpus \
--gin_files=compiler_opt/rl/regalloc/gin_configs/common.gin \
--gin_bindings=config_registry.get_configuration.implementation=@configs.RegallocEvictionConfig \
--gin_bindings=clang_path="'$WORKING_DIR/llvm-build/bin/clang'" \
--output_performance_path=$WORKING_DIR/performance_data.csv \
--policy_path=$WORKING_DIR/output_model/saved_policy
```
This will collect reward data over the entire corpus. If you want it to run
faster and don't care about collecting data over the whole corpus, you can
set the `--sampling_rate` flag to a desired value to only operate over a portion
of the corpus.
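For a quick look at the collected data (the exact columns depend on the
problem configuration), you can inspect the CSV directly:
```bash
head $WORKING_DIR/performance_data.csv
```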
### Evaluating the Model With Benchmarking
See the documentation available [here](../benchmarking.md).
## Deploying the New Policy
To compile the model into LLVM using TensorFlow AOT compilation,
create a new folder and run a CMake configuration with the following
commands:
```bash
mkdir $WORKING_DIR/llvm-release-build
cd $WORKING_DIR/llvm-release-build
cmake -G Ninja \
-DCMAKE_BUILD_TYPE=Release \
-DTENSORFLOW_AOT_PATH=$(python3 -c "import tensorflow; import os; print(os.path.dirname(tensorflow.__file__))") \
-DLLVM_ENABLE_PROJECTS="clang" \
-DLLVM_RAEVICT_MODEL_PATH="$WORKING_DIR/output_model/saved_policy" \
$WORKING_DIR/llvm-project/llvm
```
Then run the actual build:
```bash
cmake --build .
```
Now, you should have a build of clang in `$WORKING_DIR/llvm-release-build/bin` that
you can use to compile projects using the ML regalloc eviction heuristic.
To compile with the ML regalloc eviction heuristic, all you need to do is
pass the `-mllvm -regalloc-enable-advisor=release` flag to `clang` whenever
you compile something.
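For example, to compile a single source file with the released model enabled
(`hello.cc` here is just a placeholder; substitute any C++ file):
```bash
# hello.cc is a placeholder source file used for illustration.
$WORKING_DIR/llvm-release-build/bin/clang++ -O2 \
  -mllvm -regalloc-enable-advisor=release \
  -c hello.cc -o hello.o
```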