docs/extensibility.md

Extending to new optimization problems

This guide is about extending the training tools to support new optimization problems. It is assumed the necessary LLVM changes have been made - i.e. instrumenting the optimization pass with a way to carry out decision making via a trained model, training log collection - see the lib/Analysis/MLInlineAdvisor.cpp and lib/CodeGen/MLRegallocEvictAdvisor.cpp for examples.

Extensibility steps

Refer to compiler_opt/rl/inlining or compiler_opt/rl/regalloc.

create a directory peer to inlining and regalloc. This placement is not necessary, but sufficient for illustration.
define the implementation of compiler_opt.rl.compilation_runner.CompilationRunner that's specific to your problem. Refer to the examples. Note how we always start processes via the compiler_opt.rl.start_cancellable_process() utility.
define the ML interface - see the config.py file in each of the examples.
extend compiler_opt.rl.problem_configuration.ProblemConfiguration. Make the new class gin-configurable. By convention, define this in the __init__.py.
place specific gin configs in the subdirectory, as well as vocab (these are optional, but likely necessary). A convention here is to make sure your gin files make the configurable config_registry.get_configuration.implementation point to your implementation of ProblemConfiguration. See the common.gin files in our examples. This allows any tool to just pick up your problem when pointing it (via --gin_files) to your problem.

You can have multiple gin files for different algorithm configurations, and reuse common settings (like the above) via gin's import mechanism. See our examples where we have different configs for PPO or behavioral cloning.

add your module to the list in compiler_opt.rl.registry.py, under the “Register implementations” comment.

‘compilation problem’ is an optimization problem with a specific way of invoking clang and specific features and tensorflow topologies. The component model requires all these be exported in a class implementing ProblemConfiguration below, however, to avoid cycle dependencies in Bazel environments, do not explicitly inherit from it.

Internally, all the module's implementation parameters are expected to be gin-initialized.

Use

Existing tools (e.g. train_locally.py) will just transparently use your new component if you point the tool to one of your gin files. This assumes your gin file binds config_registry.get_configuration.implementation as described:

--gin_bindings=config_registry.get_configuration.implementation=@configs.InliningConfig

To use in a new tool:

just get a ProblemConfiguration object in your python:
config = problem_configuration.get_configuration()
make sure your tool also exposes --gin_files and --gin_bindings and bootstraps gin.

Conventions

to avoid long binding names, use the runners module name for the CompilationRunner implementation, and use the configs module name for the implementation of ProblemConfiguration.
the CompilationRunner gin initialization should initialize to None, and use, the clang_path and launcher_path macros (https://github.com/google/gin-config#syntax-quick-reference):

  clang_path = None
  launcher_path = None
  runners.MyCompilationRunner.clang_path = %clang_path
  runners.MyCompilationRunner.launcher_path = %launcher_path

Use a similar pattern for problem-specific additional flags (see inlining's llvm_size_path for example). When running tools, this allows the user pass common flags transparently wrt the underlying runner - i.e. if swapping 2 runners, the clang flag stays the same: --gin_bindings=clang_path="'/foo/bar/clang'"