auto-sync is the architecture update tool for Capstone. Because the architecture modules of Capstone use mostly code from LLVM, we need to update this part with every LLVM release. auto-sync helps with this synchronization between LLVM and Capstone's modules by automating most of it.
Please refer to intro.md for an introduction about this tool.
Set up Python environment and Tree-sitter
cd <root-dir-Capstone>
# Python version must be at least 3.11
sudo apt install python3-venv
# Set up virtual environment in Capstone root dir
python3 -m venv ./.venv
source ./.venv/bin/activate
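The version requirement above can be checked from inside the activated virtual environment before installing anything. The following is a minimal sketch; the helper name `python_is_supported` is hypothetical and not part of Auto-Sync.

```python
import sys

# Minimum version stated in the setup instructions above.
MINIMUM_VERSION = (3, 11)


def python_is_supported(version=sys.version_info) -> bool:
    """Return True if the (major, minor) pair is at least 3.11."""
    return tuple(version[:2]) >= MINIMUM_VERSION


if python_is_supported():
    print("Python version is new enough for Auto-Sync.")
else:
    print("Python 3.11 or newer is required, found", sys.version.split()[0])
```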
Install Auto-Sync framework
cd suite/auto-sync/
pip install -e .
Please read ARCHITECTURE.md to understand how Auto-Sync works.
This step is essential! Please don't skip it.
Updating an architecture module to the newest LLVM release is only possible if the module uses Auto-Sync. Not all arch-modules support Auto-Sync yet.
Check if your architecture is supported.
./src/autosync/ASUpdater.py -h
Clone Capstone's LLVM fork and build llvm-tblgen
git clone https://github.com/capstone-engine/llvm-capstone vendor/llvm_root/
# The clone above lands in vendor/llvm_root/, not in llvm-capstone/
cd vendor/llvm_root/
git checkout auto-sync
mkdir build
cd build
# You can also build the "Release" version
cmake -G Ninja -DCMAKE_BUILD_TYPE=Debug ../llvm
cmake --build . --target llvm-tblgen --config Debug
# Return to suite/auto-sync/
cd ../../../
Run the updater
./src/autosync/ASUpdater.py -a <ARCH>
After running the ASUpdater.py script, compare the functions in <ARCH>DisassemblerExtension.* to LLVM (search the function names in the LLVM root) and update them if necessary.
This update translates some LLVM C++ files to C. Because the translation is not perfect (maybe it will be some day), you will get build errors if you try to compile Capstone.
The last step to finish the update is to fix those build errors by hand.
This is a rough overview of which files of an architecture are updated and where they come from.
Files originating from LLVM (Automatically updated)
These files are LLVM source files which were translated from C++ to C. Not all of the files listed below are used by every architecture, but these are the most common ones.
<ARCH>Disassembler.*: Bytes to MCInst decoder.
<ARCH>InstPrinter.* or <ARCH>AsmPrinter.*: MCInst to asm string decoder.
<ARCH>BaseInfo.*: Commonly used functions and definitions.
*.inc files are exclusively generated by LLVM TableGen backends:
*.inc files for the LLVM component are named like this: <ARCH>Gen*.inc (note: no CS in the name)
Additionally, we generate more details for Capstone with llvm-tblgen, such as enums, operand details and other things. They are also saved to *.inc files, but have CS in the name to make them distinct from the LLVM generated files:
<ARCH>GenCS*.inc
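The naming scheme above can be turned into a small check that tells the two kinds of .inc files apart. This is only a sketch: the function `classify_inc_file` is hypothetical, the file names in the loop are illustrative, and an LLVM-generated file whose suffix happens to start with CS would be misclassified.

```python
from fnmatch import fnmatchcase


def classify_inc_file(filename: str, arch: str) -> str:
    """Return 'capstone', 'llvm', or 'other' based on the naming scheme."""
    # Check the more specific Capstone pattern first, because
    # <ARCH>GenCS*.inc also matches the broader <ARCH>Gen*.inc pattern.
    if fnmatchcase(filename, f"{arch}GenCS*.inc"):
        return "capstone"
    if fnmatchcase(filename, f"{arch}Gen*.inc"):
        return "llvm"
    return "other"


# Illustrative file names only, not a complete list.
for name in ["ARMGenAsmWriter.inc", "ARMGenCSMappingInsn.inc", "ARMMapping.c"]:
    print(name, "->", classify_inc_file(name, "ARM"))
```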
Capstone module files (Not automatically updated)
Those files are written by us:
<ARCH>DisassemblerExtension.*: All kinds of functions which are needed by the LLVM component, but could not be generated or translated.
<ARCH>Mapping.*: Binding code between the architecture module and the LLVM files. This is also where the detail is set.
<ARCH>Module.*: Interface to the Capstone core.
LLVM file translation
For details about the C++ to C translation of the LLVM files, refer to CppTranslator/README.md.
Generated .inc files
Documentation about the .inc file generation is in the llvm-capstone repository.
Troubleshooting
If some features aren't generated and are missing in the .inc files, make sure they are defined as AssemblerPredicate in the .td files.
Correct:
def In32BitMode : Predicate<"!Subtarget->isPPC64()">, AssemblerPredicate<(all_of (not Feature64Bit)), "64bit">;
Incorrect:
def In32BitMode : Predicate<"!Subtarget->isPPC64()">;
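To find candidates for this problem, the .td sources can be scanned for definitions that use Predicate without an AssemblerPredicate. The sketch below is a rough heuristic, not part of Auto-Sync: the function name is hypothetical, and the naive split on `;` does not handle records with bodies or semicolons inside strings.

```python
import re

# Matches the record name in "def <Name> :".
DEF_NAME = re.compile(r"\bdef\s+(\w+)\s*:")


def predicates_missing_assembler_predicate(td_source: str) -> list[str]:
    """List defs that use Predicate<...> but no AssemblerPredicate<...>."""
    missing = []
    # Naive statement split; good enough for simple one-statement defs.
    for statement in td_source.split(";"):
        match = DEF_NAME.search(statement)
        if match is None:
            continue
        # \b keeps "Predicate<" from matching inside "AssemblerPredicate<".
        has_predicate = re.search(r"\bPredicate<", statement) is not None
        has_asm_predicate = "AssemblerPredicate<" in statement
        if has_predicate and not has_asm_predicate:
            missing.append(match.group(1))
    return missing


correct = (
    'def In32BitMode : Predicate<"!Subtarget->isPPC64()">, '
    'AssemblerPredicate<(all_of (not Feature64Bit)), "64bit">;'
)
incorrect = 'def In32BitMode : Predicate<"!Subtarget->isPPC64()">;'

print(predicates_missing_assembler_predicate(correct))
print(predicates_missing_assembler_predicate(incorrect))
```

A hit from this scan is only a hint to look at the definition; whether a predicate needs an AssemblerPredicate still has to be decided by hand.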
Formatting
If you change the CppTranslator, please format the files with black and usort:
pip3 install black usort
python3 -m usort format src/autosync
python3 -m black src/autosync
Not all architecture modules support Auto-Sync yet. Here is an overview of the steps to add support for it.
To refactor one of them to use auto-sync, you need to add it to the configuration.
Add the architecture to ASUpdater.py.
Configure the CppTranslator for your architecture (suite/auto-sync/CppTranslator/arch_config.json).
Now, manually run the update commands within ASUpdater.py but skip the Differ step:
./src/autosync/ASUpdater.py -a <ARCH> -s IncGen Translate
The tasks after this are to:
Implement the add_cs_detail() handler in <ARCH>Mapping for each operand type.
Edit the main header file of the architecture (include/capstone/<ARCH>.h) to include the generated enums (see below).
Notes:
Some generated enums must be included in the include/capstone/<ARCH>.h header. At the position where the enum should be inserted, add a comment like this (don't remove the <> brackets):
// generate content <FILENAME.inc> begin
// generate content <FILENAME.inc> end
The update script will insert the content of the .inc file at this place.
If you find yourself fixing the same syntax error multiple times, please consider adding a Patch to the CppTranslator for this case.
Please check out the implementation of ARM's add_cs_detail() before implementing your own.
Running the Differ after everything is done preserves your version of the syntax corrections, so the next user can auto-apply them.
Sometimes the LLVM code uses a single function from a larger source file. It is not worth it to translate the whole file just for this function. Bundle those lonely functions in <ARCH>DisassemblerExtension.c.
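To illustrate the "generate content" markers described in the notes above, here is a minimal sketch of how text between such a marker pair could be replaced. It is not the update script's actual implementation: the function `insert_generated_content` is hypothetical, and the sketch does not preserve indentation in front of the end marker.

```python
import re


def insert_generated_content(header: str, filename: str, generated: str) -> str:
    """Replace the text between the begin/end markers for `filename`."""
    begin = f"// generate content <{filename}> begin"
    end = f"// generate content <{filename}> end"
    # Lazily match everything between the two marker comments.
    pattern = re.compile(
        re.escape(begin) + r"\n.*?" + re.escape(end), flags=re.DOTALL
    )
    replacement = f"{begin}\n{generated.rstrip()}\n{end}"
    # A function is used as replacement so backslashes in the generated
    # text are not interpreted as regex escapes.
    result, count = pattern.subn(lambda _match: replacement, header, count=1)
    if count == 0:
        raise ValueError(f"no marker pair found for {filename}")
    return result


# Illustrative header and enum content, not taken from a real architecture.
header = (
    "typedef enum {\n"
    "// generate content <FILENAME.inc> begin\n"
    "// generate content <FILENAME.inc> end\n"
    "} example_enum;\n"
)
print(insert_generated_content(header, "FILENAME.inc", "EXAMPLE_A,\nEXAMPLE_B,\n"))
```

Because the markers stay in place, running the replacement again with the same content leaves the header unchanged, which is why the <> brackets and marker comments must not be removed.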
Adding a new architecture follows the same steps as above, with the exception that you need to implement all the Capstone files from scratch.
Check out an architecture that already supports auto-sync for guidance, and open an issue if you need help.