asm
The tracking issue for this feature is: #72016
For extremely low-level manipulations and performance reasons, one might wish to control the CPU directly. Rust supports using inline assembly to do this via the asm!
macro.
Rust provides support for inline assembly via the asm!
macro. It can be used to embed handwritten assembly in the assembly output generated by the compiler. Generally this should not be necessary, but might be where the required performance or timing cannot be otherwise achieved. Accessing low level hardware primitives, e.g. in kernel code, may also demand this functionality.
Note: the examples here are given in x86/x86-64 assembly, but other architectures are also supported.
Inline assembly is currently supported on the following architectures:
Let us start with the simplest possible example:
```rust
#![feature(asm)]
unsafe {
    asm!("nop");
}
```
This will insert a NOP (no operation) instruction into the assembly generated by the compiler. Note that all asm!
invocations have to be inside an unsafe
block, as they could insert arbitrary instructions and break various invariants. The instructions to be inserted are listed in the first argument of the asm!
macro as a string literal.
Now inserting an instruction that does nothing is rather boring. Let us do something that actually acts on data:
```rust
#![feature(asm)]
let x: u64;
unsafe {
    asm!("mov {}, 5", out(reg) x);
}
assert_eq!(x, 5);
```
This will write the value 5
into the u64
variable x
. You can see that the string literal we use to specify instructions is actually a template string. It is governed by the same rules as Rust format strings. The arguments that are inserted into the template, however, look a bit different than you may be familiar with. First we need to specify whether the variable is an input or an output of the inline assembly. In this case it is an output. We declared this by writing out
. We also need to specify in what kind of register the assembly expects the variable. In this case we put it in an arbitrary general purpose register by specifying reg
. The compiler will choose an appropriate register to insert into the template and will read the variable from there after the inline assembly finishes executing.
Let us see another example that also uses an input:
```rust
#![feature(asm)]
let i: u64 = 3;
let o: u64;
unsafe {
    asm!(
        "mov {0}, {1}",
        "add {0}, {number}",
        out(reg) o,
        in(reg) i,
        number = const 5,
    );
}
assert_eq!(o, 8);
```
This will add 5
to the input in variable i
and write the result to variable o
. The particular way this assembly does this is first copying the value from i
to the output, and then adding 5
to it.
The example shows a few things:
First, we can see that asm!
allows multiple template string arguments; each one is treated as a separate line of assembly code, as if they were all joined together with newlines between them. This makes it easy to format assembly code.
Second, we can see that inputs are declared by writing in
instead of out
.
Third, one of our operands has a type we haven't seen yet, const
. This tells the compiler to expand this argument to its value directly inside the assembly template. This is only possible for constants and literals.
Fourth, we can see that we can specify an argument number, or name as in any format string. For inline assembly templates this is particularly useful as arguments are often used more than once. For more complex inline assembly using this facility is generally recommended, as it improves readability, and allows reordering instructions without changing the argument order.
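The named-operand style can be sketched as a small wrapper function. This is only an illustration: the function name `add_named` is hypothetical, the sketch assumes a toolchain where the macro is available as `std::arch::asm!` (stable Rust 1.59 or later) rather than behind the feature gate used in the examples here, and a plain-Rust fallback is included so that it also builds on targets other than x86-64.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::asm;

/// Hypothetical helper: returns `a + b` (wrapping on overflow).
#[cfg(target_arch = "x86_64")]
pub fn add_named(a: u64, b: u64) -> u64 {
    let sum: u64;
    unsafe {
        asm!(
            // Each template string is one line of assembly.
            "mov {sum}, {a}",
            "add {sum}, {b}",
            // Named operands can be listed in any order and reused freely.
            sum = out(reg) sum,
            a = in(reg) a,
            b = in(reg) b,
        );
    }
    sum
}

/// Portable fallback (an assumption of this sketch, not part of the feature).
#[cfg(not(target_arch = "x86_64"))]
pub fn add_named(a: u64, b: u64) -> u64 {
    a.wrapping_add(b)
}
```

Because the template refers to operands by name, the two instructions could be rearranged or extended without renumbering any placeholders.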
We can further refine the above example to avoid the mov
instruction:
```rust
#![feature(asm)]
let mut x: u64 = 3;
unsafe {
    asm!("add {0}, {number}", inout(reg) x, number = const 5);
}
assert_eq!(x, 8);
```
We can see that inout
is used to specify an argument that is both input and output. This is different from specifying an input and output separately in that it is guaranteed to assign both to the same register.
It is also possible to specify different variables for the input and output parts of an inout
operand:
```rust
#![feature(asm)]
let x: u64 = 3;
let y: u64;
unsafe {
    asm!("add {0}, {number}", inout(reg) x => y, number = const 5);
}
assert_eq!(y, 8);
```
The Rust compiler is conservative with its allocation of operands. It is assumed that an out
can be written at any time, and can therefore not share its location with any other argument. However, to guarantee optimal performance it is important to use as few registers as possible, so they won't have to be saved and reloaded around the inline assembly block. To achieve this Rust provides a lateout
specifier. This can be used on any output that is written only after all inputs have been consumed. There is also an inlateout
variant of this specifier.
Here is an example where inlateout
cannot be used:
```rust
#![feature(asm)]
let mut a: u64 = 4;
let b: u64 = 4;
let c: u64 = 4;
unsafe {
    asm!(
        "add {0}, {1}",
        "add {0}, {2}",
        inout(reg) a,
        in(reg) b,
        in(reg) c,
    );
}
assert_eq!(a, 12);
```
Here the compiler is free to allocate the same register for inputs b
and c
since it knows they have the same value. However it must allocate a separate register for a
since it uses inout
and not inlateout
. If inlateout
was used, then a
and c
could be allocated to the same register, in which case the first instruction would overwrite the value of c
and cause the assembly code to produce the wrong result.
However the following example can use inlateout
since the output is only modified after all input registers have been read:
```rust
#![feature(asm)]
let mut a: u64 = 4;
let b: u64 = 4;
unsafe {
    asm!("add {0}, {1}", inlateout(reg) a, in(reg) b);
}
assert_eq!(a, 8);
```
As you can see, this assembly fragment will still work correctly if a
and b
are assigned to the same register.
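A plain `lateout` follows the same reasoning. The sketch below uses a single `lea` instruction, which reads both inputs before it writes its destination, so the output may safely share a register with either input. The function name `add_lea` is hypothetical, the code assumes the macro is importable as `std::arch::asm!` (stable Rust 1.59 or later), and the non-x86-64 fallback exists only to keep the example portable.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::asm;

/// Hypothetical helper: returns `a + b` (wrapping), computed with one `lea`.
#[cfg(target_arch = "x86_64")]
pub fn add_lea(a: u64, b: u64) -> u64 {
    let sum: u64;
    unsafe {
        asm!(
            // `lea` consumes both inputs before writing the destination,
            // so the output is only written after all inputs are read.
            "lea {sum}, [{a} + {b}]",
            sum = lateout(reg) sum,
            a = in(reg) a,
            b = in(reg) b,
        );
    }
    sum
}

/// Portable fallback (an assumption of this sketch).
#[cfg(not(target_arch = "x86_64"))]
pub fn add_lea(a: u64, b: u64) -> u64 {
    a.wrapping_add(b)
}
```

Had the template used two instructions that write the output before the last input is read, `out` would be required instead.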
Some instructions require that the operands be in a specific register. Therefore, Rust inline assembly provides some more specific constraint specifiers. While reg
is generally available on any architecture, these are highly architecture specific. E.g. for x86 the general purpose registers eax
, ebx
, ecx
, edx
, ebp
, esi
, and edi
among others can be addressed by their name.
```rust
#![feature(asm)]
let cmd = 0xd1;
unsafe {
    asm!("out 0x64, eax", in("eax") cmd);
}
```
In this example we call the out
instruction to output the content of the cmd
variable to port 0x64
. Since the out
instruction only accepts eax
(and its sub registers) as operand we had to use the eax
constraint specifier.
Note that unlike other operand types, explicit register operands cannot be used in the template string: you can't use {}
and should write the register name directly instead. Also, they must appear at the end of the operand list after all other operand types.
Consider this example which uses the x86 mul
instruction:
```rust
#![feature(asm)]
fn mul(a: u64, b: u64) -> u128 {
    let lo: u64;
    let hi: u64;

    unsafe {
        asm!(
            // The x86 mul instruction takes rax as an implicit input and writes
            // the 128-bit result of the multiplication to rdx:rax
            // (high half in rdx, low half in rax).
            "mul {}",
            in(reg) a,
            inlateout("rax") b => lo,
            lateout("rdx") hi
        );
    }

    ((hi as u128) << 64) + lo as u128
}
```
This uses the mul
instruction to multiply two 64-bit inputs with a 128-bit result. The only explicit operand is a register that we fill from the variable a
. The second operand is implicit, and must be the rax
register, which we fill from the variable b
. The lower 64 bits of the result are stored in rax
from which we fill the variable lo
. The higher 64 bits are stored in rdx
from which we fill the variable hi
.
In many cases inline assembly will modify state that is not needed as an output. Usually this is either because we have to use a scratch register in the assembly, or instructions modify state that we don't need to further examine. This state is generally referred to as being “clobbered”. We need to tell the compiler about this since it may need to save and restore this state around the inline assembly block.
```rust
#![feature(asm)]
let ebx: u32;
let ecx: u32;

unsafe {
    asm!(
        "cpuid",
        // EAX 4 selects the "Deterministic Cache Parameters" CPUID leaf
        inout("eax") 4 => _,
        // ECX 0 selects the L0 cache information.
        inout("ecx") 0 => ecx,
        lateout("ebx") ebx,
        lateout("edx") _,
    );
}

println!(
    "L1 Cache: {}",
    ((ebx >> 22) + 1) * (((ebx >> 12) & 0x3ff) + 1) * ((ebx & 0xfff) + 1) * (ecx + 1)
);
```
In the example above we use the cpuid
instruction to get the L1 cache size. This instruction writes to eax
, ebx
, ecx
, and edx
, but for the cache size we only care about the contents of ebx
and ecx
.
However we still need to tell the compiler that eax
and edx
have been modified so that it can save any values that were in these registers before the asm. This is done by declaring these as outputs but with _
instead of a variable name, which indicates that the output value is to be discarded.
This can also be used with a general register class (e.g. reg
) to obtain a scratch register for use inside the asm code:
```rust
#![feature(asm)]
// Multiply x by 6 using shifts and adds
let mut x: u64 = 4;
unsafe {
    asm!(
        "mov {tmp}, {x}",
        "shl {tmp}, 1",
        "shl {x}, 2",
        "add {x}, {tmp}",
        x = inout(reg) x,
        tmp = out(reg) _,
    );
}
assert_eq!(x, 4 * 6);
```
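Discarded outputs are also how an explicit register is declared as clobbered. The sketch below keeps only the low 64 bits of a product: `mul` always writes `rdx`, so `rdx` is listed as a `lateout` bound to `_`. The name `mul_lo` is hypothetical, the code assumes the macro is importable as `std::arch::asm!` (stable Rust 1.59 or later), and the fallback for other targets is there only for portability.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::asm;

/// Hypothetical helper: low 64 bits of `a * b`.
#[cfg(target_arch = "x86_64")]
pub fn mul_lo(a: u64, b: u64) -> u64 {
    let lo: u64;
    unsafe {
        asm!(
            "mul {}",
            in(reg) a,
            // rax is the implicit second factor and receives the low half.
            inlateout("rax") b => lo,
            // The high half lands in rdx; it is not needed, so it is
            // declared as a discarded output, which marks rdx as clobbered.
            lateout("rdx") _,
        );
    }
    lo
}

/// Portable fallback (an assumption of this sketch).
#[cfg(not(target_arch = "x86_64"))]
pub fn mul_lo(a: u64, b: u64) -> u64 {
    a.wrapping_mul(b)
}
```

Leaving out the `rdx` operand would let the compiler keep a live value in `rdx` across the block, which the `mul` instruction would then silently destroy.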
A special operand type, sym
, allows you to use the symbol name of a fn
or static
in inline assembly code. This allows you to call a function or access a global variable without needing to keep its address in a register.
```rust
#![feature(asm)]
extern "C" fn foo(arg: i32) {
    println!("arg = {}", arg);
}

fn call_foo(arg: i32) {
    unsafe {
        asm!(
            "call {}",
            sym foo,
            // 1st argument in rdi, which is caller-saved
            inout("rdi") arg => _,
            // All caller-saved registers must be marked as clobbered
            out("rax") _, out("rcx") _, out("rdx") _, out("rsi") _,
            out("r8") _, out("r9") _, out("r10") _, out("r11") _,
            out("xmm0") _, out("xmm1") _, out("xmm2") _, out("xmm3") _,
            out("xmm4") _, out("xmm5") _, out("xmm6") _, out("xmm7") _,
            out("xmm8") _, out("xmm9") _, out("xmm10") _, out("xmm11") _,
            out("xmm12") _, out("xmm13") _, out("xmm14") _, out("xmm15") _,
        )
    }
}
```
Note that the fn
or static
item does not need to be public or #[no_mangle]
: the compiler will automatically insert the appropriate mangled symbol name into the assembly code.
In some cases, fine control is needed over the way a register name is formatted when inserted into the template string. This is needed when an architecture's assembly language has several names for the same register, each typically being a “view” over a subset of the register (e.g. the low 32 bits of a 64-bit register).
By default the compiler will always choose the name that refers to the full register size (e.g. rax
on x86-64, eax
on x86, etc).
This default can be overridden by using modifiers on the template string operands, just like you would with format strings:
```rust
#![feature(asm)]
let mut x: u16 = 0xab;

unsafe {
    asm!("mov {0:h}, {0:l}", inout(reg_abcd) x);
}

assert_eq!(x, 0xabab);
```
In this example, we use the reg_abcd
register class to restrict the register allocator to the 4 legacy x86 registers (ax
, bx
, cx
, dx
) of which the first two bytes can be addressed independently.
Let us assume that the register allocator has chosen to allocate x
in the ax
register. The h
modifier will emit the register name for the high byte of that register and the l
modifier will emit the register name for the low byte. The asm code will therefore be expanded as mov ah, al
which copies the low byte of the value into the high byte.
If you use a smaller data type (e.g. u16
) with an operand and forget to use template modifiers, the compiler will emit a warning and suggest the correct modifier to use.
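As a rough illustration of matching the modifier to the operand type, the sketch below adds two `u32` values and uses the `e` modifier so that the template names the 32-bit view of each register. The name `add_u32` is hypothetical, the code assumes the macro is importable as `std::arch::asm!` (stable Rust 1.59 or later), and the fallback for other targets is only for portability.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::asm;

/// Hypothetical helper: returns `a + b` (wrapping) for 32-bit values.
#[cfg(target_arch = "x86_64")]
pub fn add_u32(a: u32, b: u32) -> u32 {
    let mut acc = a;
    unsafe {
        asm!(
            // `:e` formats each operand as its 32-bit name (e.g. eax),
            // so only the bits that actually hold the u32 are touched.
            "add {0:e}, {1:e}",
            inout(reg) acc,
            in(reg) b,
        );
    }
    acc
}

/// Portable fallback (an assumption of this sketch).
#[cfg(not(target_arch = "x86_64"))]
pub fn add_u32(a: u32, b: u32) -> u32 {
    a.wrapping_add(b)
}
```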
Sometimes assembly instructions require operands passed via memory addresses/memory locations. You have to manually use the memory address syntax specified by the respective architecture. For example, in x86/x86_64 with Intel assembly syntax, you should wrap inputs/outputs in []
to indicate they are memory operands:
```rust
#![feature(asm, llvm_asm)]
fn load_fpu_control_word(control: u16) {
    unsafe {
        asm!("fldcw [{}]", in(reg) &control, options(nostack));

        // Previously this would have been written with the deprecated `llvm_asm!` like this
        llvm_asm!("fldcw $0" :: "m" (control) :: "volatile");
    }
}
```
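The same bracket syntax works for an ordinary load. In the sketch below the address is passed in a register and dereferenced inside the template. The name `load_u64` is hypothetical, the reference is cast to a raw pointer as a conservative choice of operand type, and the code assumes the macro is importable as `std::arch::asm!` (stable Rust 1.59 or later); the fallback for other targets is only for portability.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::asm;

/// Hypothetical helper: reads a u64 through a pointer held in a register.
#[cfg(target_arch = "x86_64")]
pub fn load_u64(src: &u64) -> u64 {
    let value: u64;
    unsafe {
        asm!(
            // The brackets turn the register operand into a memory operand.
            "mov {value}, qword ptr [{ptr}]",
            value = out(reg) value,
            ptr = in(reg) src as *const u64,
            // The block reads memory but never writes it or uses the stack.
            options(readonly, nostack),
        );
    }
    value
}

/// Portable fallback (an assumption of this sketch).
#[cfg(not(target_arch = "x86_64"))]
pub fn load_u64(src: &u64) -> u64 {
    *src
}
```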
By default, an inline assembly block is treated the same way as an external FFI function call with a custom calling convention: it may read/write memory, have observable side effects, etc. However in many cases, it is desirable to give the compiler more information about what the assembly code is actually doing so that it can optimize better.
Let's take our previous example of an add
instruction:
```rust
#![feature(asm)]
let mut a: u64 = 4;
let b: u64 = 4;
unsafe {
    asm!(
        "add {0}, {1}",
        inlateout(reg) a,
        in(reg) b,
        options(pure, nomem, nostack),
    );
}
assert_eq!(a, 8);
```
Options can be provided as an optional final argument to the asm!
macro. We specified three options here:
- `pure` means that the asm code has no observable side effects and that its output depends only on its inputs. This allows the compiler optimizer to call the inline asm fewer times or even eliminate it entirely.
- `nomem` means that the asm code does not read or write to memory. By default the compiler will assume that inline assembly can read or write any memory address that is accessible to it (e.g. through a pointer passed as an operand, or a global).
- `nostack` means that the asm code does not push any data onto the stack. This allows the compiler to use optimizations such as the stack red zone on x86-64 to avoid stack pointer adjustments.

These allow the compiler to better optimize code using `asm!`, for example by eliminating pure `asm!` blocks whose outputs are not needed.
See the reference for the full list of available options and their effects.
Inline assembler is implemented as an unsafe macro asm!()
. The first argument to this macro is a template string literal used to build the final assembly. The following arguments specify input and output operands. When required, options are specified as the final argument.
The following ABNF specifies the general syntax:
```text
dir_spec := "in" / "out" / "lateout" / "inout" / "inlateout"
reg_spec := <register class> / "<explicit register>"
operand_expr := expr / "_" / expr "=>" expr / expr "=>" "_"
reg_operand := dir_spec "(" reg_spec ")" operand_expr
operand := reg_operand / "const" const_expr / "sym" path
option := "pure" / "nomem" / "readonly" / "preserves_flags" / "noreturn" / "nostack" / "att_syntax"
options := "options(" option *["," option] [","] ")"
asm := "asm!(" format_string *("," format_string) *("," [ident "="] operand) ["," options] [","] ")"
```
The macro will initially be supported only on ARM, AArch64, Hexagon, x86, x86-64 and RISC-V targets. Support for more targets may be added in the future. The compiler will emit an error if asm!
is used on an unsupported target.
The assembler template uses the same syntax as format strings (i.e. placeholders are specified by curly braces). The corresponding arguments are accessed in order, by index, or by name. However, implicit named arguments (introduced by RFC #2795) are not supported.
An asm!
invocation may have one or more template string arguments; an asm!
with multiple template string arguments is treated as if all the strings were concatenated with a \n
between them. The expected usage is for each template string argument to correspond to a line of assembly code. All template string arguments must appear before any other arguments.
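The equivalence between several template strings and one string with embedded newlines can be sketched with two functions that should assemble to the same instructions. Both names (`double_multi`, `double_single`) are hypothetical, the code assumes the macro is importable as `std::arch::asm!` (stable Rust 1.59 or later), and the fallbacks for other targets are only for portability.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::asm;

/// Hypothetical helper: doubles `x` using two template string arguments.
#[cfg(target_arch = "x86_64")]
pub fn double_multi(x: u64) -> u64 {
    let r: u64;
    unsafe {
        asm!("mov {0}, {1}", "add {0}, {0}", out(reg) r, in(reg) x);
    }
    r
}

/// Hypothetical helper: the same code as one string joined with `\n`.
#[cfg(target_arch = "x86_64")]
pub fn double_single(x: u64) -> u64 {
    let r: u64;
    unsafe {
        asm!("mov {0}, {1}\nadd {0}, {0}", out(reg) r, in(reg) x);
    }
    r
}

/// Portable fallbacks (an assumption of this sketch).
#[cfg(not(target_arch = "x86_64"))]
pub fn double_multi(x: u64) -> u64 {
    x.wrapping_add(x)
}

#[cfg(not(target_arch = "x86_64"))]
pub fn double_single(x: u64) -> u64 {
    x.wrapping_add(x)
}
```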
As with format strings, named arguments must appear after positional arguments. Explicit register operands must appear at the end of the operand list, after named arguments if any.
Explicit register operands cannot be used by placeholders in the template string. All other named and positional operands must appear at least once in the template string, otherwise a compiler error is generated.
The exact assembly code syntax is target-specific and opaque to the compiler except for the way operands are substituted into the template string to form the code passed to the assembler.
The 5 targets specified in this RFC (x86, ARM, AArch64, RISC-V, Hexagon) all use the assembly code syntax of the GNU assembler (GAS). On x86, the .intel_syntax noprefix
mode of GAS is used by default. On ARM, the .syntax unified
mode is used. These targets impose an additional restriction on the assembly code: any assembler state (e.g. the current section which can be changed with .section
) must be restored to its original value at the end of the asm string. Assembly code that does not conform to the GAS syntax will result in assembler-specific behavior.
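On x86 the default Intel syntax can be swapped for AT&T syntax with the `att_syntax` option listed in the grammar above. The sketch below is one possible use: operand order becomes source first, destination second, and register operands are substituted with a leading `%`. The name `add_att` is hypothetical, the code assumes the macro is importable as `std::arch::asm!` (stable Rust 1.59 or later), and the fallback for other targets is only for portability.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::asm;

/// Hypothetical helper: returns `a + b` (wrapping), written in AT&T syntax.
#[cfg(target_arch = "x86_64")]
pub fn add_att(a: u64, b: u64) -> u64 {
    let mut acc = a;
    unsafe {
        asm!(
            // AT&T order: `addq source, destination`.
            "addq {src}, {dst}",
            dst = inout(reg) acc,
            src = in(reg) b,
            options(att_syntax),
        );
    }
    acc
}

/// Portable fallback (an assumption of this sketch).
#[cfg(not(target_arch = "x86_64"))]
pub fn add_att(a: u64, b: u64) -> u64 {
    a.wrapping_add(b)
}
```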
Several types of operands are supported:
- `in(<reg>) <expr>`
  - `<reg>` can refer to a register class or an explicit register. The allocated register name is substituted into the asm template string.
  - The allocated register will contain the value of `<expr>` at the start of the asm code.
  - … `lateout` is allocated to the same register).
- `out(<reg>) <expr>`
  - `<reg>` can refer to a register class or an explicit register. The allocated register name is substituted into the asm template string.
  - `<expr>` must be a (possibly uninitialized) place expression, to which the contents of the allocated register are written at the end of the asm code.
  - An underscore (`_`) may be specified instead of an expression, which will cause the contents of the register to be discarded at the end of the asm code (effectively acting as a clobber).
- `lateout(<reg>) <expr>`
  - Same as `out` except that the register allocator can reuse a register allocated to an `in`.
- `inout(<reg>) <expr>`
  - `<reg>` can refer to a register class or an explicit register. The allocated register name is substituted into the asm template string.
  - The allocated register will contain the value of `<expr>` at the start of the asm code.
  - `<expr>` must be a mutable initialized place expression, to which the contents of the allocated register are written at the end of the asm code.
- `inout(<reg>) <in expr> => <out expr>`
  - Same as `inout` except that the initial value of the register is taken from the value of `<in expr>`.
  - `<out expr>` must be a (possibly uninitialized) place expression, to which the contents of the allocated register are written at the end of the asm code.
  - An underscore (`_`) may be specified instead of an expression for `<out expr>`, which will cause the contents of the register to be discarded at the end of the asm code (effectively acting as a clobber).
  - `<in expr>` and `<out expr>` may have different types.
- `inlateout(<reg>) <expr>` / `inlateout(<reg>) <in expr> => <out expr>`
  - Same as `inout` except that the register allocator can reuse a register allocated to an `in` (this can happen if the compiler knows the `in` has the same initial value as the `inlateout`).
- `const <expr>`
  - `<expr>` must be an integer or floating-point constant expression.
- `sym <path>`
  - `<path>` must refer to a `fn` or `static`.
  - `<path>` is allowed to point to a `#[thread_local]` static, in which case the asm code can combine the symbol with relocations (e.g. `@plt`, `@TPOFF`) to read from thread-local data.

Operand expressions are evaluated from left to right, just like function call arguments. After the `asm!`
has executed, outputs are written to in left to right order. This is significant if two outputs point to the same place: that place will contain the value of the rightmost output.
Input and output operands can be specified either as an explicit register or as a register class from which the register allocator can select a register. Explicit registers are specified as string literals (e.g. "eax"
) while register classes are specified as identifiers (e.g. reg
). Using string literals for register names enables support for architectures that use special characters in register names, such as MIPS ($0
, $1
, etc).
Note that explicit registers treat register aliases (e.g. r14
vs lr
on ARM) and smaller views of a register (e.g. eax
vs rax
) as equivalent to the base register. It is a compile-time error to use the same explicit register for two input operands or two output operands. Additionally, it is also a compile-time error to use overlapping registers (e.g. ARM VFP) in input operands or in output operands.
Only the following types are allowed as operands for inline assembly:
- SIMD vectors (structs defined with `#[repr(simd)]` and which implement `Copy`). This includes architecture-specific vector types defined in `std::arch` such as `__m128` (x86) or `int8x16_t` (ARM).

Here is the list of currently supported register classes:
Architecture | Register class | Registers | LLVM constraint code |
---|---|---|---|
x86 | reg | ax , bx , cx , dx , si , di , r[8-15] (x86-64 only) | r |
x86 | reg_abcd | ax , bx , cx , dx | Q |
x86-32 | reg_byte | al , bl , cl , dl , ah , bh , ch , dh | q |
x86-64 | reg_byte | al , bl , cl , dl , sil , dil , r[8-15]b , ah *, bh *, ch *, dh * | q |
x86 | xmm_reg | xmm[0-7] (x86) xmm[0-15] (x86-64) | x |
x86 | ymm_reg | ymm[0-7] (x86) ymm[0-15] (x86-64) | x |
x86 | zmm_reg | zmm[0-7] (x86) zmm[0-31] (x86-64) | v |
x86 | kreg | k[1-7] | Yk |
AArch64 | reg | x[0-28] , x30 | r |
AArch64 | vreg | v[0-31] | w |
AArch64 | vreg_low16 | v[0-15] | x |
ARM | reg | r[0-5] r7 *, r[8-10] , r11 *, r12 , r14 | r |
ARM (Thumb) | reg_thumb | r[0-r7] | l |
ARM (ARM) | reg_thumb | r[0-r10] , r12 , r14 | l |
ARM | sreg | s[0-31] | t |
ARM | sreg_low16 | s[0-15] | x |
ARM | dreg | d[0-31] | w |
ARM | dreg_low16 | d[0-15] | t |
ARM | dreg_low8 | d[0-8] | x |
ARM | qreg | q[0-15] | w |
ARM | qreg_low8 | q[0-7] | t |
ARM | qreg_low4 | q[0-3] | x |
MIPS | reg | $[2-25] | r |
MIPS | freg | $f[0-31] | f |
NVPTX | reg16 | None* | h |
NVPTX | reg32 | None* | r |
NVPTX | reg64 | None* | l |
RISC-V | reg | x1 , x[5-7] , x[9-15] , x[16-31] (non-RV32E) | r |
RISC-V | freg | f[0-31] | f |
Hexagon | reg | r[0-28] | r |
Note: On x86 we treat `reg_byte` differently from `reg` because the compiler can allocate `al` and `ah` separately whereas `reg` reserves the whole register.

Note #2: On x86-64 the high byte registers (e.g. `ah`) are only available when used as an explicit register. Specifying the `reg_byte` register class for an operand will always allocate a low byte register.

Note #3: NVPTX doesn't have a fixed register set, so named registers are not supported.

Note #4: On ARM the frame pointer is either `r7` or `r11` depending on the platform.
Additional register classes may be added in the future based on demand (e.g. MMX, x87, etc).
Each register class has constraints on which value types it can be used with. This is necessary because the way a value is loaded into a register depends on its type. For example, on big-endian systems, loading an i32x4
and an i8x16
into a SIMD register may result in different register contents even if the byte-wise memory representation of both values is identical. The availability of supported types for a particular register class may depend on what target features are currently enabled.
Architecture | Register class | Target feature | Allowed types |
---|---|---|---|
x86-32 | reg | None | i16 , i32 , f32 |
x86-64 | reg | None | i16 , i32 , f32 , i64 , f64 |
x86 | reg_byte | None | i8 |
x86 | xmm_reg | sse | i32 , f32 , i64 , f64 , i8x16 , i16x8 , i32x4 , i64x2 , f32x4 , f64x2 |
x86 | ymm_reg | avx | i32 , f32 , i64 , f64 , i8x16 , i16x8 , i32x4 , i64x2 , f32x4 , f64x2 , i8x32 , i16x16 , i32x8 , i64x4 , f32x8 , f64x4 |
x86 | zmm_reg | avx512f | i32 , f32 , i64 , f64 , i8x16 , i16x8 , i32x4 , i64x2 , f32x4 , f64x2 , i8x32 , i16x16 , i32x8 , i64x4 , f32x8 , f64x4 , i8x64 , i16x32 , i32x16 , i64x8 , f32x16 , f64x8 |
x86 | kreg | avx512f | i8 , i16 |
x86 | kreg | avx512bw | i32 , i64 |
AArch64 | reg | None | i8 , i16 , i32 , f32 , i64 , f64 |
AArch64 | vreg | fp | i8 , i16 , i32 , f32 , i64 , f64 , i8x8 , i16x4 , i32x2 , i64x1 , f32x2 , f64x1 , i8x16 , i16x8 , i32x4 , i64x2 , f32x4 , f64x2 |
ARM | reg | None | i8 , i16 , i32 , f32 |
ARM | sreg | vfp2 | i32 , f32 |
ARM | dreg | vfp2 | i64 , f64 , i8x8 , i16x4 , i32x2 , i64x1 , f32x2 |
ARM | qreg | neon | i8x16 , i16x8 , i32x4 , i64x2 , f32x4 |
MIPS32 | reg | None | i8 , i16 , i32 , f32 |
MIPS32 | freg | None | f32 , f64 |
MIPS64 | reg | None | i8 , i16 , i32 , i64 , f32 , f64 |
MIPS64 | freg | None | f32 , f64 |
NVPTX | reg16 | None | i8 , i16 |
NVPTX | reg32 | None | i8 , i16 , i32 , f32 |
NVPTX | reg64 | None | i8 , i16 , i32 , f32 , i64 , f64 |
RISC-V32 | reg | None | i8 , i16 , i32 , f32 |
RISC-V64 | reg | None | i8 , i16 , i32 , f32 , i64 , f64 |
RISC-V | freg | f | f32 |
RISC-V | freg | d | f64 |
Hexagon | reg | None | i8 , i16 , i32 , f32 |
Note: For the purposes of the above table pointers, function pointers and
isize
/usize
are treated as the equivalent integer type (i16
/i32
/i64
depending on the target).
If a value is of a smaller size than the register it is allocated in then the upper bits of that register will have an undefined value for inputs and will be ignored for outputs. The only exception is the freg
register class on RISC-V where f32
values are NaN-boxed in a f64
as required by the RISC-V architecture.
When separate input and output expressions are specified for an inout
operand, both expressions must have the same type. The only exception is if both operands are pointers or integers, in which case they are only required to have the same size. This restriction exists because the register allocators in LLVM and GCC sometimes cannot handle tied operands with different types.
Some registers have multiple names. These are all treated by the compiler as identical to the base register name. Here is the list of all supported register aliases:
Architecture | Base register | Aliases |
---|---|---|
x86 | ax | eax , rax |
x86 | bx | ebx , rbx |
x86 | cx | ecx , rcx |
x86 | dx | edx , rdx |
x86 | si | esi , rsi |
x86 | di | edi , rdi |
x86 | bp | bpl , ebp , rbp |
x86 | sp | spl , esp , rsp |
x86 | ip | eip , rip |
x86 | st(0) | st |
x86 | r[8-15] | r[8-15]b , r[8-15]w , r[8-15]d |
x86 | xmm[0-31] | ymm[0-31] , zmm[0-31] |
AArch64 | x[0-30] | w[0-30] |
AArch64 | x29 | fp |
AArch64 | x30 | lr |
AArch64 | sp | wsp |
AArch64 | xzr | wzr |
AArch64 | v[0-31] | b[0-31] , h[0-31] , s[0-31] , d[0-31] , q[0-31] |
ARM | r[0-3] | a[1-4] |
ARM | r[4-9] | v[1-6] |
ARM | r9 | rfp |
ARM | r10 | sl |
ARM | r11 | fp |
ARM | r12 | ip |
ARM | r13 | sp |
ARM | r14 | lr |
ARM | r15 | pc |
RISC-V | x0 | zero |
RISC-V | x1 | ra |
RISC-V | x2 | sp |
RISC-V | x3 | gp |
RISC-V | x4 | tp |
RISC-V | x[5-7] | t[0-2] |
RISC-V | x8 | fp , s0 |
RISC-V | x9 | s1 |
RISC-V | x[10-17] | a[0-7] |
RISC-V | x[18-27] | s[2-11] |
RISC-V | x[28-31] | t[3-6] |
RISC-V | f[0-7] | ft[0-7] |
RISC-V | f[8-9] | fs[0-1] |
RISC-V | f[10-17] | fa[0-7] |
RISC-V | f[18-27] | fs[2-11] |
RISC-V | f[28-31] | ft[8-11] |
Hexagon | r29 | sp |
Hexagon | r30 | fr |
Hexagon | r31 | lr |
Some registers cannot be used for input or output operands:
Architecture | Unsupported register | Reason |
---|---|---|
All | sp | The stack pointer must be restored to its original value at the end of an asm code block. |
All | bp (x86), x29 (AArch64), x8 (RISC-V), fr (Hexagon), $fp (MIPS) | The frame pointer cannot be used as an input or output. |
ARM | r7 or r11 | On ARM the frame pointer can be either r7 or r11 depending on the target. The frame pointer cannot be used as an input or output. |
ARM | r6 | r6 is used internally by LLVM as a base pointer and therefore cannot be used as an input or output. |
x86 | k0 | This is a constant zero register which can't be modified. |
x86 | ip | This is the program counter, not a real register. |
x86 | mm[0-7] | MMX registers are not currently supported (but may be in the future). |
x86 | st([0-7]) | x87 registers are not currently supported (but may be in the future). |
AArch64 | xzr | This is a constant zero register which can't be modified. |
ARM | pc | This is the program counter, not a real register. |
MIPS | $0 or $zero | This is a constant zero register which can't be modified. |
MIPS | $1 or $at | Reserved for assembler. |
MIPS | $26 /$k0 , $27 /$k1 | OS-reserved registers. |
MIPS | $28 /$gp | Global pointer cannot be used as inputs or outputs. |
MIPS | $ra | Return address cannot be used as inputs or outputs. |
RISC-V | x0 | This is a constant zero register which can't be modified. |
RISC-V | gp , tp | These registers are reserved and cannot be used as inputs or outputs. |
Hexagon | lr | This is the link register which cannot be used as an input or output. |
In some cases LLVM will allocate a “reserved register” for reg
operands even though this register cannot be explicitly specified. Assembly code making use of reserved registers should be careful since reg
operands may alias with those registers. Reserved registers are:
- `r6` on ARM.

The placeholders can be augmented by modifiers which are specified after the :
in the curly braces. These modifiers do not affect register allocation, but change the way operands are formatted when inserted into the template string. Only one modifier is allowed per template placeholder.
The supported modifiers are a subset of LLVM's (and GCC's) asm template argument modifiers, but do not use the same letter codes.
Architecture | Register class | Modifier | Example output | LLVM modifier |
---|---|---|---|---|
x86-32 | reg | None | eax | k |
x86-64 | reg | None | rax | q |
x86-32 | reg_abcd | l | al | b |
x86-64 | reg | l | al | b |
x86 | reg_abcd | h | ah | h |
x86 | reg | x | ax | w |
x86 | reg | e | eax | k |
x86-64 | reg | r | rax | q |
x86 | reg_byte | None | al / ah | None |
x86 | xmm_reg | None | xmm0 | x |
x86 | ymm_reg | None | ymm0 | t |
x86 | zmm_reg | None | zmm0 | g |
x86 | *mm_reg | x | xmm0 | x |
x86 | *mm_reg | y | ymm0 | t |
x86 | *mm_reg | z | zmm0 | g |
x86 | kreg | None | k1 | None |
AArch64 | reg | None | x0 | x |
AArch64 | reg | w | w0 | w |
AArch64 | reg | x | x0 | x |
AArch64 | vreg | None | v0 | None |
AArch64 | vreg | v | v0 | None |
AArch64 | vreg | b | b0 | b |
AArch64 | vreg | h | h0 | h |
AArch64 | vreg | s | s0 | s |
AArch64 | vreg | d | d0 | d |
AArch64 | vreg | q | q0 | q |
ARM | reg | None | r0 | None |
ARM | sreg | None | s0 | None |
ARM | dreg | None | d0 | P |
ARM | qreg | None | q0 | q |
ARM | qreg | e / f | d0 / d1 | e / f |
MIPS | reg | None | $2 | None |
MIPS | freg | None | $f0 | None |
NVPTX | reg16 | None | rs0 | None |
NVPTX | reg32 | None | r0 | None |
NVPTX | reg64 | None | rd0 | None |
RISC-V | reg | None | x1 | None |
RISC-V | freg | None | f0 | None |
Hexagon | reg | None | r0 | None |
Notes:
- on ARM `e` / `f`: this prints the low or high doubleword register name of a NEON quad (128-bit) register.
- on x86: our behavior for `reg` with no modifiers differs from what GCC does. GCC will infer the modifier based on the operand value type, while we default to the full register size.
- on x86 `xmm_reg`: the `x`, `t` and `g` LLVM modifiers are not yet implemented in LLVM (they are supported by GCC only), but this should be a simple change.
As stated in the previous section, passing an input value smaller than the register width will result in the upper bits of the register containing undefined values. This is not a problem if the inline asm only accesses the lower bits of the register, which can be done by using a template modifier to use a subregister name in the asm code (e.g. ax
instead of rax
). Since this is an easy pitfall, the compiler will suggest a template modifier to use where appropriate given the input type. If all references to an operand already have modifiers then the warning is suppressed for that operand.
Flags are used to further influence the behavior of the inline assembly block. Currently the following options are defined:

- `pure`: The `asm` block has no side effects, and its outputs depend only on its direct inputs (i.e. the values themselves, not what they point to) or values read from memory (unless the `nomem` option is also set). This allows the compiler to execute the `asm` block fewer times than specified in the program (e.g. by hoisting it out of a loop) or even eliminate it entirely if the outputs are not used.
- `nomem`: The `asm` block does not read or write to any memory. This allows the compiler to cache the values of modified global variables in registers across the `asm` block since it knows that they are not read or written to by the `asm`.
- `readonly`: The `asm` block does not write to any memory. This allows the compiler to cache the values of unmodified global variables in registers across the `asm` block since it knows that they are not written to by the `asm`.
- `preserves_flags`: The `asm` block does not modify the flags register (defined in the rules below). This allows the compiler to avoid recomputing the condition flags after the `asm` block.
- `noreturn`: The `asm` block never returns, and its return type is defined as `!` (never). Behavior is undefined if execution falls through past the end of the asm code. A `noreturn` asm block behaves just like a function which doesn't return; notably, local variables in scope are not dropped before it is invoked.
- `nostack`: The `asm` block does not push data to the stack, or write to the stack red-zone (if supported by the target). If this option is not used then the stack pointer is guaranteed to be suitably aligned (according to the target ABI) for a function call.
- `att_syntax`: This option is only valid on x86, and causes the assembler to use the `.att_syntax prefix` mode of the GNU assembler. Register operands are substituted in with a leading `%`.

The compiler performs some additional checks on options:
- The `nomem` and `readonly` options are mutually exclusive: it is a compile-time error to specify both.
- The `pure` option must be combined with either the `nomem` or `readonly` options, otherwise a compile-time error is emitted.
- It is a compile-time error to specify `pure` on an asm block with no outputs or only discarded outputs (`_`).
- It is a compile-time error to specify `noreturn` on an asm block with outputs.

- … `undef` which can have a different value every time you read it (since such a concept does not exist in assembly code).
- … `lateout` may be allocated to the same register as an `in`, in which case this rule does not apply. Code should not rely on this however since it depends on the results of register allocation.
- If the `readonly` option is set, then only memory reads are allowed.
- If the `nomem` option is set then no reads or writes to memory are allowed.
- … `asm!` as a black box and only take the interface specification into account, not the instructions themselves.
- Unless the `nostack` option is set, asm code is allowed to use stack space below the stack pointer.
- If the `noreturn` option is set then behavior is undefined if execution falls through to the end of the asm block.
- If the `pure` option is set then behavior is undefined if the `asm` has side-effects other than its direct outputs. Behavior is also undefined if two executions of the `asm` code with the same inputs result in different outputs.
  - When used with the `nomem` option, “inputs” are just the direct inputs of the `asm!`.
  - When used with the `readonly` option, “inputs” comprise the direct inputs of the `asm!` and any memory that the `asm!` block is allowed to read.
- … `preserves_flags` option is set:
  - … `EFLAGS` (CF, PF, AF, ZF, SF, OF).
  - … `MXCSR` (PE, UE, OE, ZE, DE, IE).
  - … `CPSR` (N, Z, C, V)
  - … `CPSR` (Q)
  - … `CPSR` (GE).
  - … `FPSCR` (N, Z, C, V)
  - … `FPSCR` (QC)
  - … `FPSCR` (IDC, IXC, UFC, OFC, DZC, IOC).
  - … `NZCV` register).
  - … `FPSR` register).
  - … `fcsr` (`fflags`).
- … `EFLAGS`) is clear on entry to an asm block and must be clear on exit.
- … `asm!` block.
  - … `asm!` blocks that never return (even if not marked `noreturn`) don't need to preserve these registers.
  - … `asm!` block than you entered (e.g. for context switching), these registers must contain the value they had upon entering the `asm!` block that you are exiting.
    - … `asm!` block that has not been entered. Neither can you exit an `asm!` block that has already been exited.
    - … `asm!` blocks you entered and exited.
- … `asm!` block will appear exactly once in the output binary. The compiler is allowed to instantiate multiple copies of the `asm!` block, for example when the function containing it is inlined in multiple places.

Note: As a general rule, the flags covered by `preserves_flags` are those which are not preserved when performing a function call.