December 26, 2021
When you want to write part of your C code in assembly, you have two options. Either you can write and compile your assembly code separately, and then link it with the compiled C code, or you can write it inline. For the second option, GCC provides an extension that allows you to embed snippets of assembly code directly into your C code. However, its syntax is (perhaps infamously) complicated and unintuitive. This blog post will give you a more intuitive explanation for practical use, so that you can understand and use inline assembly if needed.
In this post, I assume you’re familiar with X86 assembly and C programming.
In assembly code, I will be using AT&T syntax, i.e., source before destination, registers are prefixed with percent signs (%), and immediate values are prefixed with a dollar sign ($).
GCC uses AT&T assembly syntax by default (Intel syntax can be selected with the -masm=intel option).
Specifically, I’ll be using extended inline assembly.
Finally, all examples are intended for X86_64 GNU/Linux.
First, it’s good to look at the alternative: compiling C and assembly code separately, and linking them together. This may be preferred if you’re writing long functions in assembly, as it might be easier to read and maintain.
Let’s say we want a program that takes two numbers, adds them, and prints the result, with the ‘add’ function implemented in assembly.
First, write the assembly program, in the file add.s:
    .global add          # Make sure the label is visible to the linker
add:                     # The label, i.e. the start of the function
    push %rbp            # Function prologue
    mov %rsp, %rbp
    mov %rdi, %rax       # RAX will hold the result, place the first parameter there
    add %rsi, %rax       # Add the second parameter
    pop %rbp             # Epilogue: restore the base pointer
    ret                  # Return from the function (jump to return address)
Then, write the ‘controlling’ C program that will call the function, in the file main.c:
#include <stdio.h>
// Declare the assembly function: because we use rsi/rdi/rax (not esi/edi/eax), we use 64-bit values, so 'long' (not int).
extern long add(long a, long b);
int main() {
    printf("%ld\n", add(7, 4)); // prints 11
}
Then, you can compile the program like this:
gcc add.s main.c -o main
You will get an executable called main, and if you run it, the result will be printed on screen.
OK, now what if you don’t want entire functions in assembly, but only a few instructions, such as rdtscp or cpuid?
In most cases you might be better off using compiler intrinsics, but let’s say you really want to go with inline assembly.
In this section of the post, I’ll give you an overview of inline assembly for practical use (though its capabilities are larger than can be covered in a single post, feel free to check the references below).
asm statements
Although it looks like a bit of a mess, asm statements are actually quite simple.
A statement looks like this:
asm (instructions : output operands : input operands : clobbers);
Any part may be omitted if empty, so for example if you have no operands or clobbers, you can just write:
asm (instructions);
Or if you have no output operands or clobbers, you can write something like this:
asm (instructions : : input operands);
For more complex inline assembly, I prefer writing it on multiple lines, like this:
asm (instructions
     : output operands
     : input operands
     : clobbers);
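Keep in mind that the compiler might try to optimise asm statements, and might move or effectively delete your code (for example, if it decides that the outputs are never used).
To prevent the statement from being removed, use the qualifier volatile (so, instead of asm (...);, you write asm volatile (...);).
This is important if your code has side-effects that you want to preserve; otherwise the compiler might rewrite it in a way that doesn’t yield those side-effects.
Two details are worth knowing: asm statements without output operands are implicitly treated as volatile, and volatile on its own does not stop the compiler from moving the statement relative to other code.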
The instructions are a string (or several concatenated strings), with placeholders for input and/or output operands.
Each instruction is terminated by a literal newline (\n), or by a semicolon (I use semicolons).
Placeholders have two forms.
The first form is %N, with N being the number of the operand, starting from 0.
This includes both output and input operands, so if there are two output operands, the first input operand will be %2.
The second form is %[label], where label is a label you give an operand (see examples later in this post).
Because placeholders start with a percent sign, literal percent signs have to be doubled (e.g. %%rax instead of %rax).
Operands are separated by commas, and have the form:
[label] "constraints" (variable)
label is optional, and can make it easier to identify operands (I usually prefer the %[label] syntax over %N).
variable is the variable (or value for input operands) that’s substituted in place of the operand in the assembly instructions.
constraints are what determines how a variable is treated when the final instruction stream is produced.
You can specify whether the operand goes into a register (and which register), or memory, or if it’s immediate.
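Constraints I frequently use:
r: the operand goes into a general-purpose register
m: the operand is in memory
i: the operand is an immediate value
a, b, c, d: the operand goes into rax, rbx, rcx, or rdx, respectively
Specifically for output, two constraint modifiers I use often:
=: the operand is overwritten (write-only)
+: the operand is both read and written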
Clobbers are a list of locations that are modified by the instructions, apart from those used in operands. Here, you list the registers that are modified (e.g. “rax” or “rdx”), “cc” if the flags register is modified, and “memory” if some other memory is modified.
Let’s say we want a function that contains inline assembly to add two numbers. Here’s a program that does that:
#include <stdio.h>
int add(int a, int b) {
    int res;
    asm ("add %1, %2;"
         : "=r" (res)
         : "r" (a), "0" (b));
    return res;
}
int main() {
    printf("8+4 == %d\n", add(8,4)); // prints 8+4 == 12
}
The addition is done in the asm statement, which just executes an add instruction, on operands 1 and 2 (the two input operands).
In AT&T syntax, add %1, %2 adds operand 1 to operand 2 and stores the sum in operand 2.
The two input operands are specified as "r" (a), "0" (b): the function arguments a and b.
The constraints say that a should go in a register, and b should go in the same location as operand 0 (the output operand).
The output operand is specified as "=r" (res), which says that res will be overwritten, and will be in a register.
Since res and b will be in the same register (because of the 0 constraint on b), res will contain the result of the addition, which is then returned from the function.
Let’s take an example where we want to multiply two numbers, and the multiply function should use inline assembly for computation. You can implement it like this:
#include <stdio.h>
unsigned long mul(unsigned int a, unsigned int b) {
    unsigned int res[2] = {0};
    asm ("mul %[b];"
         : "=a" (res[0]), "=d" (res[1])
         : "0" (a), [b] "r" (b));
    // edx holds the upper 32 bits of the product, eax the lower 32 bits
    return ((unsigned long)res[1] << 32) | res[0];
}
int main() {
    printf("6*4 == %lu\n", mul(6, 4)); // prints 6*4 == 24
}
The mul instruction takes one operand and performs an unsigned multiplication. With a 32-bit operand, as here, it multiplies whatever is in eax by that operand, placing the result in two registers: edx (upper half) and eax (lower half).
This is why the arguments are declared as unsigned int: with 64-bit (long) arguments the compiler would substitute a 64-bit register for %[b], and mul would instead multiply rax and produce a 128-bit result in rdx and rax.
This means the result will be two 32-bit numbers, which are to be interpreted as a 64-bit number.
So, we first start by declaring an array of two unsigned ints, which are 32 bits in size.
The output operands are specified as "=a" (res[0]), "=d" (res[1]), which says that eax will overwrite the value at res[0], and edx will overwrite the value at res[1].
The input operands are specified as "0" (a), [b] "r" (b), which says that the first input operand (function argument a) should be stored in the same location as operand 0 (the first output operand, for which we specify the a register in this case), and the second input operand (function argument b) will be stored in some register and referred to using the label b (in the mul instruction, as %[b]).
Finally, after the asm statement, we shift the upper half (res[1]) left by 32 bits and OR it with the lower half (res[0]) to get the 64-bit result. This avoids casting the int array to a pointer to long, which would violate C’s strict aliasing rules.
cpuid to get the vendor string
For the third example, let’s say we want to get the CPU vendor string, using the cpuid instruction.
This is how you’d do it with inline assembly:
#include <stdio.h>
void get_cpuid() {
    int res[4] = {0};
    asm ("movq $0, %%rax;"
         "cpuid;"
         : "=b" (res[0]), "=d" (res[1]), "=c" (res[2])
         :: "eax");
    printf("Vendor string: %s\n", (char*)(&res)); // outputs GenuineIntel on my CPU
}
int main() {
    get_cpuid();
}
The behavior of cpuid is selected by the value in eax (the lower half of rax), and if it has the value 0, the vendor string is returned in registers ebx, edx, and ecx, in that order.
So, we declare an array of four 32-bit values to hold these results and a null terminator for the string.
The first instruction initializes rax to 0, to get the vendor string, and then the cpuid instruction executes.
The output operands are specified as "=b" (res[0]), "=d" (res[1]), "=c" (res[2]), which places the values from the registers ebx, edx, and ecx into the result array.
There are no input operands, and we list eax as a register that’s clobbered, because we explicitly modify it in the first movq instruction and cpuid itself also writes to it (and it’s not an output/input operand).
Finally, we cast a pointer to the result array (containing three integers and a null terminator) to a character pointer, which allows printf to read it as a string.
Here are some useful references for (inline) assembly: