## Extending to new optimization problems
This guide describes how to extend the training tools to support new
optimization problems. It assumes the necessary LLVM changes have already been
made - i.e. the optimization pass is instrumented so that it can make decisions
via a trained model and collect training logs - see
lib/Analysis/MLInlineAdvisor.cpp and lib/CodeGen/MLRegallocEvictAdvisor.cpp for
examples.
### Extensibility steps
Refer to `compiler_opt/rl/inlining` or `compiler_opt/rl/regalloc`.
1) create a directory peer to `inlining` and `regalloc`. This placement is not
required; it is just where we place it for illustration.
2) define a problem-specific implementation of
`compiler_opt.rl.compilation_runner.CompilationRunner`. Refer to the examples,
and note how we always start processes via the
`compiler_opt.rl.start_cancellable_process()` utility. A sketch follows the
list.
3) define the ML interface - see the `config.py` file in each of the examples.
A sketch follows the list.
4) extend `compiler_opt.rl.problem_configuration.ProblemConfiguration` and make
the new class gin-configurable. By convention, define it in the `__init__.py`.
A sketch follows the list.
5) place problem-specific gin configs in the subdirectory, as well as the vocab
(both are optional, but you will likely need them). A convention here is to
make sure your gin files bind the configurable
`config_registry.get_configuration.implementation` to your implementation of
`ProblemConfiguration`. See the `common.gin` files in our examples. This allows
any tool to pick up your problem simply by being pointed (via `--gin_files`) at
your gin files.
You can have multiple gin files for different algorithm configurations, and
reuse common settings (like the above) via gin's `import` mechanism. See our
examples, where we have different configs for PPO and behavioral cloning.
6) add your module to the list in `compiler_opt/rl/registry.py`, under the
"Register implementations" comment.
A 'compilation problem' is an optimization problem with a specific way of
invoking clang, specific features, and specific TensorFlow topologies. The
component model requires that all of these be exported in a class implementing
`ProblemConfiguration`; however, to avoid cyclic dependencies in Bazel
environments, do not explicitly inherit from it. Internally, all of the
module's implementation parameters are expected to be gin-initialized.
### Use
Existing tools (e.g. `train_locally.py`) will transparently use your new
component if you point them at one of your gin files. This assumes your gin
file binds `config_registry.get_configuration.implementation` as described:
`--gin_bindings=config_registry.get_configuration.implementation=@configs.InliningConfig`
To use it in a new tool:
* get a `ProblemConfiguration` object in your Python code:
`config = problem_configuration.get_configuration()`
* make sure your tool also exposes `--gin_files` and `--gin_bindings` and
bootstraps gin. A sketch follows.
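A minimal sketch of such a tool, assuming the absl-style flag handling used by
the existing tools:
```python
# Sketch of a new tool that picks up whatever problem the gin files configure.
from absl import app
from absl import flags
import gin

from compiler_opt.rl import problem_configuration

_GIN_FILES = flags.DEFINE_multi_string(
    'gin_files', [], 'Paths to gin configuration files.')
_GIN_BINDINGS = flags.DEFINE_multi_string(
    'gin_bindings', [], 'Individual gin parameter bindings.')


def main(_):
  # Bootstrap gin; this is what makes
  # config_registry.get_configuration.implementation resolve to your
  # ProblemConfiguration (bound in your common.gin or via --gin_bindings).
  gin.parse_config_files_and_bindings(
      _GIN_FILES.value, bindings=_GIN_BINDINGS.value, skip_unknown=False)
  config = problem_configuration.get_configuration()
  print('Loaded problem configuration:', type(config).__name__)


if __name__ == '__main__':
  app.run(main)
```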
### Conventions
* to avoid long binding names, use the `runners` module name for the
`CompilationRunner` implementation, and use the `configs` module name for the
implementation of `ProblemConfiguration`.
* the `CompilationRunner` gin initialization should bind the `clang_path` and
`launcher_path` parameters to macros
(https://github.com/google/gin-config#syntax-quick-reference) that default to
`None`:
```
clang_path = None
launcher_path = None
runners.MyCompilationRunner.clang_path = %clang_path
runners.MyCompilationRunner.launcher_path = %launcher_path
```
Use a similar pattern for problem-specific additional flags (see inlining's
`llvm_size_path` for example). When running tools, this lets the user pass
common flags transparently, regardless of the underlying runner - i.e. if
swapping two runners, the clang flag stays the same:
`--gin_bindings=clang_path="'/foo/bar/clang'"`