## Extending to new optimization problems
This guide describes how to extend the training tools to support new
optimization problems. It assumes the necessary LLVM changes have already been
made - i.e. the optimization pass is instrumented so that it can make decisions
via a trained model and collect training logs - see
lib/Analysis/MLInlineAdvisor.cpp and lib/CodeGen/MLRegallocEvictAdvisor.cpp for
examples.
### Extensibility steps
Refer to `compiler_opt/rl/inlining` or `compiler_opt/rl/regalloc`.
1) create a directory peer to `inlining` and `regalloc`. This placement is not
required; it is just where we place it for illustration.
2) define a problem-specific implementation of
`compiler_opt.rl.compilation_runner.CompilationRunner`. Refer to the examples,
and note how we always start processes via the
`compiler_opt.rl.start_cancellable_process()` utility. A sketch follows the
list.
3) define the ML interface - see the `config.py` file in each of the examples.
A sketch follows the list.
4) extend `compiler_opt.rl.problem_configuration.ProblemConfiguration` and make
the new class gin-configurable. By convention, define it in the `__init__.py`.
A sketch follows the list.
5) place problem-specific gin configs in the subdirectory, as well as the vocab
(both are optional, but you will likely need them). A convention here is to
make sure your gin files bind the configurable
`config_registry.get_configuration.implementation` to your implementation of
`ProblemConfiguration`. See the `common.gin` files in our examples. This allows
any tool to pick up your problem simply by being pointed (via `--gin_files`) at
your gin files.
You can have multiple gin files for different algorithm configurations, and
reuse common settings (like the above) via gin's `import` mechanism. See our
examples, where we have different configs for PPO and behavioral cloning.
6) add your module to the list in `compiler_opt/rl/registry.py`, under the
"Register implementations" comment.
A 'compilation problem' is an optimization problem with a specific way of
invoking clang, specific features, and specific TensorFlow topologies. The
component model requires that all of these be exported in a class implementing
`ProblemConfiguration`; however, to avoid cyclic dependencies in Bazel
environments, do not explicitly inherit from it. Internally, all of the
module's implementation parameters are expected to be gin-initialized.
### Use
Existing tools (e.g. `train_locally.py`) will transparently use your new
component if you point them at one of your gin files. This assumes your gin
file binds `config_registry.get_configuration.implementation` as described:
`--gin_bindings=config_registry.get_configuration.implementation=@configs.InliningConfig`
To use it in a new tool:
* get a `ProblemConfiguration` object in your Python code:
`config = problem_configuration.get_configuration()`
* make sure your tool also exposes `--gin_files` and `--gin_bindings` and
bootstraps gin. A sketch follows.
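A minimal sketch of such a tool, assuming the absl-style flag handling used by
the existing tools:
```python
# Sketch of a new tool that picks up whatever problem the gin files configure.
from absl import app
from absl import flags
import gin

from compiler_opt.rl import problem_configuration

_GIN_FILES = flags.DEFINE_multi_string(
    'gin_files', [], 'Paths to gin configuration files.')
_GIN_BINDINGS = flags.DEFINE_multi_string(
    'gin_bindings', [], 'Individual gin parameter bindings.')


def main(_):
  # Bootstrap gin; this is what makes
  # config_registry.get_configuration.implementation resolve to your
  # ProblemConfiguration (bound in your common.gin or via --gin_bindings).
  gin.parse_config_files_and_bindings(
      _GIN_FILES.value, bindings=_GIN_BINDINGS.value, skip_unknown=False)
  config = problem_configuration.get_configuration()
  print('Loaded problem configuration:', type(config).__name__)


if __name__ == '__main__':
  app.run(main)
```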
### Conventions
* to avoid long binding names, use the `runners` module name for the
`CompilationRunner` implementation, and use the `configs` module name for the
implementation of `ProblemConfiguration`.
* the `CompilationRunner` gin initialization should bind the `clang_path` and
`launcher_path` parameters to macros
(https://github.com/google/gin-config#syntax-quick-reference) that default to
`None`:
```
clang_path = None
launcher_path = None
runners.MyCompilationRunner.clang_path = %clang_path
runners.MyCompilationRunner.launcher_path = %launcher_path
```
Use a similar pattern for problem-specific additional flags (see inlining's
`llvm_size_path` for example). When running tools, this lets the user pass
common flags transparently, regardless of the underlying runner - i.e. if
swapping two runners, the clang flag stays the same:
`--gin_bindings=clang_path="'/foo/bar/clang'"`