| ## Overview |
| |
| In this demo we will look at: |
| |
| * Building LLVM with the correct settings to allow for model training |
| * Collecting a training corpus for the regalloc model based on Chromium |
| * Training the regalloc model on the collected corpus |
| * Compiling the trained model into LLVM |
| |
| ## Preliminaries |
| |
| Set up some environment variables according to where you want to clone/build |
| all of the code: |
| ```bash |
| export WORKING_DIR=~ |
| ``` |
| |
Change this path to wherever you'd like to put everything.
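
For example, to keep everything in a dedicated directory (the `mlgo-demo` name here is just an illustration):

```bash
export WORKING_DIR=$HOME/mlgo-demo
mkdir -p $WORKING_DIR
```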
| |
| ## Get repositories |
| |
| ### ml-compiler-opt (this repository) |
| |
| Clone the github repo: |
| |
| ```bash |
| cd $WORKING_DIR |
| git clone https://github.com/google/ml-compiler-opt |
| ``` |
| |
| ### LLVM |
| |
Grabbing LLVM should be as simple as running the command below, but if
something goes awry, make sure to check the
[official documentation](https://llvm.org/docs/GettingStarted.html).
| |
| ```bash |
| git clone https://github.com/llvm/llvm-project.git |
| ``` |
| |
| ### Chromium |
| |
Grabbing Chromium is a bit more involved. The
[official documentation](https://chromium.googlesource.com/chromium/src/+/main/docs/linux/build_instructions.md)
for Linux-based systems (the only platform currently supported by MLGO) is
available at that link. Cloning the code should be as simple as downloading
depot_tools and adding it to your `$PATH`:
| |
| ```bash |
| git clone https://chromium.googlesource.com/chromium/tools/depot_tools.git |
| export PATH="$PATH:$WORKING_DIR/depot_tools" |
| ``` |
| |
Create a folder for Chromium and clone it using the `fetch` utility:
| ```bash |
| mkdir chromium |
| cd chromium |
| fetch --nohooks --no-history chromium |
| ``` |
| |
**Note:** Running `fetch` with the `--no-history` flag makes your local
checkout significantly smaller and speeds up the underlying git clone
considerably. However, if you actually need the history for any reason
(e.g., to revert to a previous commit or to work on the Chromium side of
MLGO), omit this flag to do a full checkout.
| |
The fetch command will take at least a couple of minutes on a fast internet
connection and much longer on slower ones.
| |
| Next, we need to modify the `.gclient` file that `fetch` creates in the |
| directory that you run it in to make sure that the Chromium PGO profiles |
| get checked out: |
| |
| ```bash |
| sed -i 's/"custom_vars": {},/"custom_vars": { "checkout_pgo_profiles" : True },/' .gclient |
| ``` |
| |
This `sed` command sets the necessary variable. After this, you can move
into the `src` directory that `fetch` created, which contains the actual
Chromium codebase:
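
```bash
cd $WORKING_DIR/chromium/src
```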
| |
| Now that this is all in place, you need to run the Chromium hooks in order to |
| get the development environment ready for a full compilation: |
| |
| ```bash |
| gclient runhooks |
| ``` |
| |
| ## Install Dependencies |
| |
If you're working in a Debian-based Docker container, it most likely won't
come with `sudo` by default. Installing it isn't strictly necessary, but it
makes it easier to copy and paste the installation commands below, and it
also enables the use of the Chromium dependency auto-installation script:
| |
| ```bash |
| apt-get install sudo |
| ``` |
| |
| First, install some base dependencies that will be needed when building |
| LLVM: |
| |
| ```bash |
| sudo apt-get install cmake ninja-build lld |
| ``` |
| |
| Now, install the Chromium dependencies using the auto-installation script: |
| ```bash |
| $WORKING_DIR/chromium/src/build/install-build-deps.sh |
| ``` |
| |
**Note:** These installation commands are all designed to be run on Debian-based
distros. However, adapting them to other distros with alternative package
management systems should not be too difficult. The packages for the first
command should be very similarly named, and the
[official Chromium documentation](https://chromium.googlesource.com/chromium/src/+/main/docs/linux/build_instructions.md)
has info on dependency installation for their build process on other common
distros.
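
For instance, a sketch of the first command on a Fedora-like distro (exact package names may differ slightly between distros) might look like:

```bash
sudo dnf install cmake ninja-build lld
```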
| |
| Also make sure that you install the Python dependencies for the |
| ml-compiler-opt repository: |
| |
| ```bash |
| cd $WORKING_DIR/ml-compiler-opt |
| pip3 install pipenv |
| pipenv sync --system |
| ``` |
| |
If you plan on doing development work on this checkout of ml-compiler-opt,
use the `requirements-ci.txt` file in the repository root to install some
additional development dependencies.
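
A minimal sketch of that install, assuming you are still in the ml-compiler-opt root from the previous step:

```bash
pip3 install -r requirements-ci.txt
```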
| |
| ## Building Chromium |
| |
**WARNING:** Currently, Chromium only builds with protobuf 4.x.x, while Tensorflow
requires protobuf 3.x.x. To make sure that Chromium compiles correctly, you can
either install protobuf 4.x.x (currently just the latest version) in a virtual
environment, or install it over the currently installed version and undo the
change after the compile is complete. This tutorial assumes no virtual
environments are used. Install a compatible version of protobuf:
| |
| ```bash |
| pip3 install protobuf==4.21.7 |
| ``` |
| |
To build Chromium, make sure you are in the `$WORKING_DIR/chromium/src`
directory, then run the following command to open a CLI text editor where
you can configure the build settings:
| |
| ```bash |
| cd $WORKING_DIR/chromium/src |
| gn args ./out/Release |
| ``` |
| |
Then, specify the configuration options to use when compiling Chromium. These
depend on the type of training corpus you want to extract. To extract a
non-thinLTO corpus, use the configuration listed below:
| |
| ``` |
| is_official_build=true |
| use_thin_lto=false |
| is_cfi=false |
| clang_embed_bitcode=true |
| is_debug=false |
| symbol_level=0 |
| enable_nacl=false |
| ``` |
| |
If you instead want to extract a thinLTO corpus, use the following config:
| |
| ``` |
| is_official_build=true |
| lld_emit_indexes_and_imports=true |
| is_debug=false |
| symbol_level=0 |
| enable_nacl=false |
| ``` |
| |
Immediately after you close the editor, `gn` will generate all of the files
`ninja` needs to carry out the compilation. However, to extract a corpus for
ML training, we also need a database of compilation commands, which can be
obtained by running the following command:

```bash
gn gen ./out/Release --export-compile-commands
```
| |
| Then you can build Chromium with the `autoninja` utility: |
| |
| ```bash |
| autoninja -C ./out/Release |
| ``` |
| |
A full Chromium compile will take at least an hour on pretty well-specced
hardware (e.g., a 96-thread workstation) and much longer on lower-specced
hardware.
| |
**Note:** If the build fails in the last couple of steps by tripping an
assertion in the linker when compiling a non-thinLTO corpus, you can safely
ignore it. Preparing a corpus for ML training in the non-thinLTO case only
requires the object files that get fed to the linker.

TODO(boomanaiden154): Investigate the source of this assertion error.
| |
**WARNING:** Make sure to reinstall a version of protobuf compatible with the
current tf-nightly release used by ml-compiler-opt if you changed versions
earlier to get the Chromium compile working. Reinstalling using the
ml-compiler-opt lockfile should work:
| |
| ```bash |
| pip3 uninstall protobuf |
| pipenv sync --system |
| ``` |
| |
| ## Building LLVM |
| |
To build LLVM for training ML models, we first need to build TFLite and some
dependencies so that we can embed it within LLVM to load and execute models
on the fly during reinforcement learning. There is a script within this
repository that clones and builds everything automatically and prepares
a CMake cache file that can be passed to CMake during the LLVM build
configuration. Running the script looks like this:
| |
| ```bash |
| cd $WORKING_DIR |
| mkdir tflite |
| cd tflite |
| $WORKING_DIR/ml-compiler-opt/buildbot/build_tflite.sh |
| ``` |
| |
This script should only take a couple of minutes to run, as all the libraries
that it pulls and builds are relatively small.
| |
| Now, create a new folder to do an LLVM build and configure it using CMake: |
| ```bash |
| mkdir $WORKING_DIR/llvm-build |
| cd $WORKING_DIR/llvm-build |
| cmake -G Ninja \ |
| -DCMAKE_BUILD_TYPE=Release \ |
| -DLLVM_ENABLE_PROJECTS="clang" \ |
| -C $WORKING_DIR/tflite/tflite.cmake \ |
| $WORKING_DIR/llvm-project/llvm |
| ``` |
| |
| Now you can run the actual build with the following command: |
| ```bash |
| cmake --build . |
| ``` |
| |
| ## ML training |
| |
| All of the following example commands assume you are working from within |
| your checkout of the ml-compiler-opt repository: |
| |
| ```bash |
| cd $WORKING_DIR/ml-compiler-opt |
| ``` |
| |
To start off training, we need to extract a corpus from the Chromium compile.
The procedure for this depends upon how you built Chromium, particularly
whether or not you used thinLTO.
| |
| ### Installing the corpus extraction tooling |
| |
| Install the corpus extraction tooling: |
| |
| ```bash |
| pip3 install mlgo-utils |
| ``` |
| |
| Make sure that the local binary directory that Python installs executables into |
| (typically `~/.local/bin`) is on your `$PATH` so that the invocations below will |
| work as expected. |
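
If it isn't, something like the following should work for the current shell session (adjust the path if your system installs Python scripts elsewhere):

```bash
export PATH="$PATH:$HOME/.local/bin"
```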
| |
| ### Corpus extraction (non-thinLTO case) |
| |
| For corpus extraction in the non-thinLTO case, you can simply run the following |
| command: |
| |
| ```bash |
| extract_ir \ |
| --cmd_filter="^-O2|-O3" \ |
| --input=$WORKING_DIR/chromium/src/out/Release/compile_commands.json \ |
| --input_type=json \ |
| --llvm_objcopy_path=$WORKING_DIR/llvm-build/bin/llvm-objcopy \ |
| --output_dir=$WORKING_DIR/corpus |
| ``` |
| |
| This command will extract all the relevant bitcode and compilation flags from |
| the Chromium compile and put them in the `$WORKING_DIR/corpus` directory. No |
| further processing on the corpus should be needed. |
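
As a quick sanity check, you can peek at the generated corpus description (the exact contents will depend on your build):

```bash
head $WORKING_DIR/corpus/corpus_description.json
```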
| |
| ### Corpus extraction (thinLTO case) |
| |
| Corpus extraction for the thinLTO case is slightly more involved. Start off by |
| running the following command to do the initial step in the corpus extraction |
| process: |
| |
| ```bash |
| extract_ir \ |
| --cmd_filter="^-O2|-O3" \ |
| --llvm_objcopy_path=$WORKING_DIR/llvm-build/bin/llvm-objcopy \ |
| --output_dir=$WORKING_DIR/corpus \ |
| --thinlto_build=local \ |
| --obj_base_dir=$WORKING_DIR/chromium/src/out/Release/obj |
| ``` |
| |
After this, it is necessary to grab the flags passed to the linker and add
them to the `corpus_description.json` file in the `$WORKING_DIR/corpus` folder.
To find these flags, it is helpful to look at the actual linker invocation.
To see the invocation for a target like `chrome`, go to the Chromium build
directory and run the following command:
| |
| ```bash |
| cd $WORKING_DIR/chromium/src/out/Release |
| ninja -t commands chrome |
| ``` |
**Note:** Use `libchrome` instead of `chrome` if targeting Android.
| |
| The last command should look something like this: |
| |
| ```bash |
| python3 "../../build/toolchain/gcc_link_wrapper.py" --output="./chrome" -- ../../third_party/llvm-build/Release+Asserts/bin/clang++ -Wl,--version-script=../../build/linux/chrome.map -Werror -fuse-ld=lld -Wl,--fatal-warnings -Wl,--build-id=sha1 -fPIC -Wl,-z,noexecstack -Wl,-z,relro -Wl,-z,now -Wl,--icf=all -Wl,--color-diagnostics -Wl,-mllvm,-instcombine-lower-dbg-declare=0 -Wl,--save-temps=import -Wl,--thinlto-emit-index-files -flto=thin -Wl,--thinlto-jobs=all -Wl,--thinlto-cache-dir=thinlto-cache -Wl,--thinlto-cache-policy=cache_size=10\%:cache_size_bytes=40g:cache_size_files=100000 -Wl,-mllvm,-import-instr-limit=30 -fwhole-program-vtables -Wl,--undefined-version -m64 -no-canonical-prefixes -Wl,-O2 -Wl,--gc-sections -rdynamic -Wl,-z,defs -Wl,--as-needed -nostdlib++ --sysroot=../../build/linux/debian_bullseye_amd64-sysroot -fsanitize=cfi-vcall -fsanitize=cfi-icall -pie -Wl,--disable-new-dtags -Wl,--lto-O2 -o "./chrome" -Wl,--start-group @"./chrome.rsp" -Wl,--end-group -ldl -lpthread -lrt -lgmodule-2.0 -lgobject-2.0 -lgthread-2.0 -lglib-2.0 -lnss3 -lnssutil3 -lsmime3 -lplds4 -lplc4 -lnspr4 -latk-1.0 -latk-bridge-2.0 -lcups -lgio-2.0 -ldrm -ldbus-1 -latspi -lresolv -lm -lX11 -lXcomposite -lXdamage -lXext -lXfixes -lXrender -lXrandr -lXtst -lgbm -lEGL -lexpat -luuid -lxcb -lxkbcommon -lXi -lpci -l:libffi_pic.a -lpangocairo-1.0 -lpango-1.0 -lharfbuzz -lcairo -lasound -lz -lstdc++ -lxshmfence |
| ``` |
| |
From this, we can pick out the handful of flags that we need for our
`corpus_description.json`. A more precise description of the required flags
is available in the [MLGO ThinLTO documentation](../thinlto.md). Given the
linker command above, the flags that we need consist of the following:
| |
| ``` |
| -fPIC |
| -mllvm,-instcombine-lower-dbg-declare=0 |
| -mllvm,-import-instr-limit=30 |
| -no-canonical-prefixes |
| -O2 |
| -nostdlib++ |
| --sysroot=../../build/linux/debian_bullseye_amd64-sysroot |
| -c |
| ``` |
| |
| Make sure to rewrite the `--sysroot` flag to be an absolute path. Setting it |
| to the output of the following should work: |
| |
| ```bash |
| echo $WORKING_DIR/chromium/src/build/linux/debian_bullseye_amd64-sysroot |
| ``` |
| |
Now, add the flags to the `global_command_override` section in the
`corpus_description.json` file. Afterwards, the `global_command_override`
section in the file should look something like the following:
| |
| ```json |
| "global_command_override": [ |
| "-fPIC", |
| "-mllvm", |
| "-instcombine-lower-dbg-declare=0", |
| "-mllvm", |
| "-import-instr-limit=30", |
| "-no-canonical-prefixes", |
| "-O2", |
| "-nostdlib++", |
| "--sysroot=/path/to/workdir/build/linux/debian_bullseye_amd64-sysroot", |
| "-c" |
| ] |
| ``` |
| |
| Now you should have a properly prepared Chromium thinLTO corpus. |
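
Since `corpus_description.json` was edited by hand, it can be worth confirming that it is still valid JSON, for example with Python's built-in parser:

```bash
python3 -m json.tool $WORKING_DIR/corpus/corpus_description.json > /dev/null && echo valid
```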
| |
| ### PGO Path Rewriting |
| |
**NOTE:** This step is only necessary if you are working on a non-thinLTO
corpus. If you are working on a thinLTO corpus, making the changes outlined
below will result in an error when you try to generate a default trace.
| |
It is essential to have PGO data when training the regalloc model. However,
the Chromium build process uses relative paths when referencing PGO profiles.
This needs to be fixed by replacing the default `-fprofile-instrument-use-path`
flag with one that uses an absolute path. First, we need to know which profile
is being used. The profiles are located in
`$WORKING_DIR/chromium/src/chrome/build/pgo_profiles` and they will have a
`*.profdata` extension. If you have never resynced your checkout, there should
be just one file. To find the absolute path of the file, run the following
command:
| |
| ```bash |
| realpath -s $WORKING_DIR/chromium/src/chrome/build/pgo_profiles/*.profdata |
| ``` |
| |
This should output an absolute path to a `*.profdata` file. Next, open the
gin config for the regalloc problem, located at
`compiler_opt/rl/regalloc/gin_configs/common.gin` within the ml-compiler-opt
root, and replace the line
| |
| ``` |
| problem_config.flags_to_replace.replace_flags={} |
| ``` |
| |
with this:
| ``` |
| problem_config.flags_to_replace.replace_flags = { |
| '-fprofile-instrument-use-path': '<path to profdata from above command>' |
| } |
| ``` |
| |
For example:
| |
| ``` |
| problem_config.flags_to_replace.replace_flags = { |
| '-fprofile-instrument-use-path': '/home/aiden/chromium/src/chrome/build/pgo_profiles/chrome-linux-main-1665359392-180123bdd1fedc45c1cbb781e2ad04bd98ab1546.profdata' |
| } |
| ``` |
| |
It can also be helpful to add a flag that silences warnings about PGO
profdata hash mismatches, so they don't clutter up your output. Add it by
adjusting this line:
| |
| ``` |
| problem_config.flags_to_add.add_flags=() |
| ``` |
| |
| to |
| |
| ``` |
| problem_config.flags_to_add.add_flags=('-Wno-backend-plugin',) |
| ``` |
| |
| ### Collect the Default Trace and Generate Vocab |
| |
Before we run reinforcement learning, it is best to train the model using
behavioral cloning on the default heuristic. First off, we need to collect
a trace of the decisions the default heuristic makes:
| |
| ```bash |
| PYTHONPATH=$PYTHONPATH:. python3 compiler_opt/tools/generate_default_trace.py \ |
| --data_path=$WORKING_DIR/corpus \ |
| --output_path=$WORKING_DIR/default_trace \ |
| --gin_files=compiler_opt/rl/regalloc/gin_configs/common.gin \ |
| --gin_bindings=clang_path="'$WORKING_DIR/llvm-build/bin/clang'" \ |
| --sampling_rate=0.2 |
| ``` |
| |
This will compile 20% of the corpus and save all of the regalloc eviction
problem instances it encounters into the `$WORKING_DIR/default_trace` file.
| |
| After we have collected a default trace, an optional step is to regenerate the |
| vocab that is used to normalize some of the values that get fed to the ML |
| model: |
| |
| ```bash |
| rm -rf ./compiler_opt/rl/regalloc/vocab |
| PYTHONPATH=$PYTHONPATH:. python3 compiler_opt/tools/generate_vocab.py \ |
| --input=$WORKING_DIR/default_trace \ |
| --output_dir=./compiler_opt/rl/regalloc/vocab \ |
| --gin_files=compiler_opt/rl/regalloc/gin_configs/common.gin |
| ``` |
| |
This isn't strictly necessary, as default values are already stored within
the repository in the `./compiler_opt/rl/regalloc/vocab` folder, but it
doesn't hurt to regenerate them. If you add or modify features, however,
regenerating the vocab is required.
| |
| Now that the vocab is present (or has been regenerated) and we have a default |
| trace, we can start to train the model using behavioral cloning to mimic the |
| default heuristic: |
| |
| ```bash |
| PYTHONPATH=$PYTHONPATH:. python3 compiler_opt/rl/train_bc.py \ |
| --root_dir=$WORKING_DIR/warmstart \ |
| --data_path=$WORKING_DIR/default_trace \ |
| --gin_files=compiler_opt/rl/regalloc/gin_configs/behavioral_cloning_nn_agent.gin |
| ``` |
| |
This script shouldn't take too long to run on decently powerful hardware.
It will output a trained model in the directory specified by the `--root_dir`
flag, in this case `$WORKING_DIR/warmstart`.
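
You can quickly verify that the warmstart policy was written out, since the reinforcement learning step below expects it at this path:

```bash
ls $WORKING_DIR/warmstart/saved_policy
```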
| |
| ## Reinforcement Learning |
| |
Now that we have a model warmstarted from the default heuristic, we can
proceed with reinforcement learning so that the model can improve beyond
the performance of the default heuristic:
| |
| ```bash |
| PYTHONPATH=$PYTHONPATH:. python3 compiler_opt/rl/train_locally.py \ |
| --root_dir=$WORKING_DIR/output_model \ |
| --data_path=$WORKING_DIR/corpus \ |
| --gin_bindings=clang_path="'$WORKING_DIR/llvm-build/bin/clang'" \ |
| --gin_files=compiler_opt/rl/regalloc/gin_configs/ppo_nn_agent.gin \ |
| --gin_bindings=train_eval.warmstart_policy_dir=\"$WORKING_DIR/warmstart/saved_policy\" |
| ``` |
| |
This script will take quite a while to run: likely most of a day on pretty
powerful hardware (~100+ vCPUs), and potentially many days on less powerful
hardware.
| |
| ## Evaluating the Policy |
| |
If you're interested in seeing how the trained policy performs, there are
two avenues you can go through. You can run the `generate_default_trace.py`
script to get info on the reward (reduction in the number of instructions)
over a specific corpus. However, this still doesn't tell the whole story for
the regalloc case, and actual benchmarking is needed. There is also some
tooling available in this repository to run benchmarks in Chromium and the
llvm-test-suite using performance counters to track instructions executed,
loads, and stores, all of which are metrics that show how the model is
performing.
| |
| ### Evaluating the Model With Reward Metrics |
| |
| To evaluate a trained policy (for example looking at the output from |
| RL training in `$WORKING_DIR/output_model/saved_policy`), run the |
| `generate_default_trace.py` script with some flags to tell it to output |
| performance data: |
| |
| ```bash |
| PYTHONPATH=$PYTHONPATH:. python3 compiler_opt/tools/generate_default_trace.py \ |
| --data_path=$WORKING_DIR/corpus \ |
| --gin_files=compiler_opt/rl/regalloc/gin_configs/common.gin \ |
| --gin_bindings=config_registry.get_configuration.implementation=@configs.RegallocEvictionConfig \ |
| --gin_bindings=clang_path="'$WORKING_DIR/llvm-build/bin/clang'" \ |
| --output_performance_path=$WORKING_DIR/performance_data.csv \ |
| --policy_path=$WORKING_DIR/output_model/saved_policy |
| ``` |
| |
This will collect reward data over the entire corpus. If you want it to run
faster and don't care about collecting data over the whole corpus, you can
set the `--sampling_rate` flag (e.g., `--sampling_rate=0.1` to use 10% of the
corpus) to only operate over a portion of it.
| |
| ### Evaluating the Model With Benchmarking |
| |
See the documentation available [here](../benchmarking.md).
| |
| ## Deploying the New Policy |
| |
| To compile the model into LLVM using Tensorflow AOT compilation, |
| create a new folder and run a CMake configuration with the following |
| commands: |
| |
| ```bash |
| mkdir $WORKING_DIR/llvm-release-build |
| cd $WORKING_DIR/llvm-release-build |
| cmake -G Ninja \ |
| -DCMAKE_BUILD_TYPE=Release \ |
| -DTENSORFLOW_AOT_PATH=$(python3 -c "import tensorflow; import os; print(os.path.dirname(tensorflow.__file__))") \ |
| -DLLVM_ENABLE_PROJECTS="clang" \ |
| -DLLVM_RAEVICT_MODEL_PATH="$WORKING_DIR/output_model/saved_policy" \ |
| $WORKING_DIR/llvm-project/llvm |
| ``` |
| |
| Then run the actual build: |
| |
| ```bash |
| cmake --build . |
| ``` |
| |
Now, you should have a build of clang in `$WORKING_DIR/llvm-release-build/bin`
that you can use to compile projects with the ML regalloc eviction heuristic.
To enable it, pass the `-mllvm -regalloc-enable-advisor=release` flag to
`clang` whenever you compile something.
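
For example, a minimal sketch of compiling a single file with the model-based advisor (`hello.cpp` here is just a placeholder for your own source file):

```bash
$WORKING_DIR/llvm-release-build/bin/clang++ -O2 \
  -mllvm -regalloc-enable-advisor=release \
  hello.cpp -o hello
```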