How to add a new feature

TL;DR: 3 steps:

  • add the feature on the LLVM side
  • tell training about the new feature
  • retrain

For a worked reference covering the first two steps, see an example change on the LLVM side and the associated ml-compiler-opt change.

Adding the feature on the LLVM side

Most of the work here is choosing the feature and extracting it from IR. After that, follow the existing pattern in MLInlineAdvisor.cpp (see MLInlineAdvisor::getAdviceImpl) or MLRegallocEvictAdvisor.cpp (see MLEvictAdvisor::extractFeatures). Note that passing the feature to the ML model happens generically, regardless of how the model is evaluated (AOT or development mode). Populating the training log also happens generically, so you do not need to worry about logging.

The key is to remember the name, type, and dimensions of the feature; they must match what we do in the next step.
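To make the matching requirement concrete, here is an illustrative sketch (not actual repo code; the feature name is a real inlining feature, but the dict form is just a stand-in for the TensorSpec declarations on each side):

```python
# Illustrative only: the same (name, dtype, shape) triple must be declared
# identically on the LLVM side and on the training side.

# What the LLVM side declares for the feature (conceptually):
llvm_side = {"name": "callee_basic_block_count", "dtype": "int64", "shape": (1,)}

# What the training-side config.py must declare for the same feature:
training_side = {"name": "callee_basic_block_count", "dtype": "int64", "shape": (1,)}

# If any of the three components differ, model loading or training will fail.
assert llvm_side == training_side
print("feature specs match")
```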

Training side

For each policy we train, there should be a config.py, e.g. compiler_opt/rl/inlining/config.py. Follow the example there.
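As a sketch of what that looks like (assuming the TensorFlow-based spec style the existing configs use; the feature names and dtype here are illustrative, and `my_new_feature` is hypothetical):

```python
import tensorflow as tf

# Sketch, not verbatim repo code: extend the list of feature names the
# observation spec is built from. The new entry's name, dtype, and shape
# must match what the LLVM side declared for the feature.
_FEATURE_KEYS = [
    'callee_basic_block_count',
    'callsite_height',
    # ... existing features ...
    'my_new_feature',  # hypothetical new feature
]

observation_spec = {
    key: tf.TensorSpec(dtype=tf.int64, shape=(), name=key)
    for key in _FEATURE_KEYS
}
```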

Retrain

First and foremost, regenerate the vocabulary. Technically you only need a vocab file for the new feature, but it is simpler to regenerate all of it. See the demo section.

Note: You only need to regenerate the vocabulary if the feature is going to be normalized by a preprocessing layer in your model. If your feature does not need to go through a normalization preprocessing layer, you can skip the regeneration; instead, make sure your feature is added to the list returned by get_nonnormalized_features() in config.py. That said, vocab generation is quick and simple, so rerunning it in either case is harmless.
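For the non-normalized case, the change is a one-line addition. A sketch (the function lives in the repo's config.py; the list entries shown here are illustrative and `my_new_feature` is hypothetical):

```python
# Sketch: the real list lives in config.py for the policy being trained.
def get_nonnormalized_features():
  return [
      'inlining_default',   # illustrative existing entry
      'my_new_feature',     # hypothetical new feature, skipped by normalization
  ]
```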

After that, retrain from scratch.

Notes

Currently, the LLVM side insists that every feature it knows of be supported by the model. This means we cannot add a new feature and then use the previously trained policy as the baseline for training. We plan to relax this requirement and support using a previous policy as a baseline, as long as the new feature set is a superset of the old one.