| ## How to add a new feature |
| |
| TL;DR; 3 steps: |
| |
| - add the feature on the LLVM side |
| - tell training about the new feature |
| - retrain |
| |
| A reference for the first two: LLVM [side](https://github.com/llvm/llvm-project/commit/99f00635d7acf1cbcdba35e7621f3a211aa3f237); and associated ml-compiler-opt [side](https://github.com/google/ml-compiler-opt/commit/882674933ce1c7a141591dfce0f2ae6e54a9fb9c) |
| |
| ## Adding the feature on the LLVM side |
| |
Most of the work here is choosing the feature and extracting it from IR. After
that, follow the existing pattern in `MLInlineAdvisor.cpp` (see `MLInlineAdvisor::getAdviceImpl`) or `MLRegallocEvictAdvisor.cpp` (see `MLEvictAdvisor::extractFeatures`). Note that passing the feature to the ML model
happens generically, regardless of how the model is evaluated (AOT or
development mode); populating the training log also happens generically, so you
do not need to worry about logging.
| |
The key is to remember the name, type, and dimensions of the feature; they need
to match what we do in the next step.
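
For illustration, a minimal sketch of what "matching" means on the training
side, assuming a hypothetical scalar `int64` feature named `my_new_feature`
(the scalar `int64` shape mirrors the existing inlining features):

```python
import tensorflow as tf

# Hedged sketch: 'my_new_feature' is a placeholder name. The name, dtype, and
# shape declared here must match what the LLVM side registers for the feature.
my_new_feature_spec = tf.TensorSpec(
    dtype=tf.int64, shape=(), name='my_new_feature')
```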
| |
| ## Training side |
| |
| For each policy we train, there should be a `config.py`, e.g. `compiler_opt/rl/inlining/config.py`. Follow the example there. |
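
As a hedged sketch only (the feature names below are illustrative, and the
real `config.py` wires this spec into further time-step and preprocessing
machinery), registering the new feature amounts to extending the observation
spec with an entry that matches the LLVM side:

```python
import tensorflow as tf

# Illustrative only: the real list in compiler_opt/rl/inlining/config.py is
# much longer; each entry's name/dtype/shape must match the LLVM side.
observation_spec = {
    key: tf.TensorSpec(dtype=tf.int64, shape=(), name=key)
    for key in (
        'caller_basic_block_count',  # an existing feature
        'my_new_feature',            # the feature added on the LLVM side
    )
}
```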
| |
| ## Retrain |
| |
First and foremost, **you must regenerate the vocabulary**. Technically you
just need a vocab file for the new feature, but it's simpler to regenerate it
all; see the [demo section](inlining-demo/demo.md#collect-trace-and-generate-vocab).
| |
**Note:** You only need to regenerate the vocabulary if the feature is going
to be normalized by a preprocessing layer for your model. If your feature does
not need to go through a lambda normalization preprocessing layer, instead make
sure it is added to the list returned by `get_nonnormalized_features()` in
`config.py`. In either case, it is still quite simple and fast to just run the
vocab generation again.
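
For instance, a sketch of what that might look like (the entries other than
the new feature are placeholders; check the actual list in your policy's
`config.py`):

```python
def get_nonnormalized_features():
  # Features listed here bypass the lambda normalization preprocessing layer.
  # 'my_new_feature' is the feature added on the LLVM side; the other entries
  # stand in for whatever the existing list contains.
  return ['reward', 'inlining_default', 'my_new_feature']
```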
| |
| After that, retrain from [scratch](inlining-demo/demo.md#train-a-new-model). |
| |
| ## Notes |
| |
Currently, the LLVM side insists that all features it knows of be supported by
a model. This means that we can't add a new feature and then use the previously
trained policy as a baseline for training. We plan to relax this requirement
and support using a previous policy as a baseline, as long as the new feature
set is a superset of the old one.