# Infrastructure for MLGO - a Machine Learning Guided Compiler Optimizations Framework
| |
| MLGO is a framework for integrating ML techniques systematically in LLVM. It |
| replaces human-crafted optimization heuristics in LLVM with machine learned |
| models. The MLGO framework currently supports two optimizations: |
| |
1. inlining-for-size ([LLVM RFC](https://lists.llvm.org/pipermail/llvm-dev/2020-April/140763.html));
2. register-allocation-for-performance ([LLVM RFC](https://lists.llvm.org/pipermail/llvm-dev/2021-November/153639.html)).
| |
| The compiler components are both available in the main LLVM repository. This |
| repository contains the training infrastructure and related tools for MLGO. |
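For reference, the learned policies are enabled at compile time through LLVM
flags. The sketch below is illustrative, assuming a clang built with the
corresponding models embedded; `foo.c` is a placeholder input, and flag names
may vary across LLVM versions.

```sh
# Illustrative only: enabling the learned policies when compiling with a
# suitably built clang.
clang -Oz -mllvm -enable-ml-inliner=release -c foo.c          # inlining-for-size
clang -O2 -mllvm -regalloc-enable-advisor=release -c foo.c    # regalloc-for-performance
```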
| |
We train policies with two different ML algorithms: Policy Gradient and
Evolution Strategies. This repository currently supports only Policy Gradient
training; the release of Evolution Strategies training is on our roadmap.
| |
Check out this [demo](docs/inlining-demo/demo.md) for an end-to-end demonstration of how
to train your own inlining-for-size policy from scratch with Policy Gradient,
or this [demo](docs/regalloc-demo/demo.md) for a demonstration of how to train
your own regalloc-for-performance policy.
| |
| For more details about MLGO, please refer to our paper |
| [MLGO: a Machine Learning Guided Compiler Optimizations Framework](https://arxiv.org/abs/2101.04808). |
| |
| For more details about how to contribute to the project, please refer to |
| [contributions](docs/contributing.md). |
| |
| ## Pretrained models |
| |
We occasionally release pretrained models that may be used as-is with LLVM.
Models are released as GitHub releases, and are named
`[task]-[major-version].[minor-version]`. The versions are semantic: the major
version corresponds to breaking changes on the LLVM/compiler side, and the
minor version corresponds to model updates that are independent of the
compiler. For example, a hypothetical `inlining-2.1` release would target the
same compiler interface as `inlining-2.0`, but ship an updated model.
| |
When building LLVM, the CMake flag `-DLLVM_INLINER_MODEL_PATH` may be set to
the path of your inlining model. If the path is set to `download`, then CMake
will download the most recent compatible model from GitHub and use it. Example
values for the flag:
| |
| ```sh |
| # Model is in /tmp/model, i.e. there is a file /tmp/model/saved_model.pb along |
| # with the rest of the tensorflow saved_model files produced from training. |
| -DLLVM_INLINER_MODEL_PATH=/tmp/model |
| |
| # Download the most recent compatible model |
| -DLLVM_INLINER_MODEL_PATH=download |
| ``` |
| |
| ## Prerequisites |
| |
| Currently, the assumptions for the system are: |
| |
| * Recent Ubuntu distro, e.g. 22.04 |
* Python 3.10.x/3.11.x
| * for local training, which is currently the only supported mode, we recommend |
| a high-performance workstation (e.g. 96 hardware threads). |
| |
| Training assumes a clang build with ML 'development-mode'. Please refer to: |
| |
| * [LLVM documentation](https://llvm.org/docs/CMake.html) |
| * the build |
| [bot script](https://github.com/google/ml-compiler-opt/blob/main/buildbot/buildbot_init.sh) |
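As a rough sketch of what the bot script sets up (paths here are assumptions;
`buildbot/build_tflite.sh` in this repository produces the `tflite.cmake`
cache file referenced below, which wires up the TFLite dependency that
development mode needs):

```sh
# Sketch of a development-mode clang configure, loosely following the
# buildbot script. ${TFLITE_BUILD} is assumed to be the directory in which
# buildbot/build_tflite.sh was run.
cmake -G Ninja -S llvm -B build \
  -DCMAKE_BUILD_TYPE=Release \
  -DLLVM_ENABLE_PROJECTS=clang \
  -C ${TFLITE_BUILD}/tflite.cmake
```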
| |
The model-training-specific prerequisites are:
| |
| Pipenv: |
| ```shell |
| pip3 install pipenv |
| ``` |
| |
| The actual dependencies: |
| ```shell |
| ./versioned_pipenv sync --system --categories "packages dev-packages ci" |
| ``` |
Note that the above command will only work from the root of the repository,
since it needs `Pipfile.lock` to be present in the working directory at the
time of execution.
| |
The above command installs all the packages, including development packages
(the `dev-packages` category) and packages needed only in CI (the `ci`
category). If you do not need those, you can omit them from the `--categories`
option.
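For instance, a runtime-only installation, assuming you need neither the
development nor the CI tooling, would be:

```shell
# Install only the runtime dependencies, skipping dev and CI packages.
./versioned_pipenv sync --system --categories packages
```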
| |
| Optionally, to run tests (run_tests.sh), you also need: |
| |
| ```shell |
| sudo apt-get install virtualenv |
| ``` |
| |
Note that the same TensorFlow package is also needed when building LLVM in
'release' mode.
| |
| ## Docs |
| |
* An end-to-end [demo](docs/inlining-demo/demo.md) using Fuchsia as a codebase from
  which we extract a corpus and train a model.
* [How to add a feature](docs/adding_features.md) guide.
* [Extensibility model](docs/extensibility.md).