Table of Contents

  1. Preamble
  2. The alternative: separate compilation
  3. Using GCC inline assembly
    1. The syntax of asm statements
    2. Example: add two numbers
    3. Example: multiplying two numbers
    4. Example: use cpuid to get the vendor string
    5. Useful references

A Practical Guide to GCC Inline Assembly

Guide, Programming

December 26, 2021

When you want to write part of your C code in assembly, you have two options. Either you can write and compile your assembly code separately, and then link it with the compiled C code, or you can write it inline. For the second option, GCC provides an extension that allows you to embed snippets of assembly code directly into your C code. However, its syntax is (perhaps infamously) complicated and unintuitive. This blog post will give you a more intuitive explanation for practical use, so that you can understand and use inline assembly if needed.


In this post, I assume you’re familiar with X86 assembly and C programming. In assembly code, I will be using AT&T syntax – i.e., source before destination, registers are prefixed with percent signs (%), and immediate values are prefixed with a dollar sign ($). GCC uses AT&T syntax by default (Intel syntax is available through the -masm=intel option, but I won’t use it here). I’ll be using specifically extended inline assembly. Finally, all examples are intended for X86_64 GNU/Linux.

The alternative: separate compilation

First, it’s good to look at the alternative: compiling C and assembly code separately, and linking them together. This may be preferred if you’re writing long functions in assembly, as it might be easier to read and maintain.

Let’s say we want a program that takes two numbers, adds them, and prints the result, with the ‘add’ function implemented in assembly. First, write the assembly program, in the file add.s:

.global add         # Make sure the label is visible to the linker

add:                # The label, i.e. the start of the function
   push %rbp        # Function prologue
   mov %rsp, %rbp
   mov %rdi, %rax   # RAX will hold the result, place the first parameter there
   add %rsi, %rax   # Add the second parameter
   pop %rbp         # Epilogue: restore the base pointer
   ret              # Return from the function (jump to return address)

Then, write the ‘controlling’ C program that will call the function, in the file main.c:

#include <stdio.h>
// Declare the assembly function: because we use rsi/rdi/rax (not esi/edi/eax), we use 64-bit values, so 'long' (not int).
extern long add(long a, long b);
int main() {
    printf("%ld\n", add(7, 4)); // prints 11
    return 0;
}

Then, you can compile the program like this:

gcc add.s main.c -o main

You will get an executable called main, and if you run it, the result will be printed on screen.

Using GCC inline assembly

OK, now what if you don’t want entire functions in assembly, but only a few instructions? Such as, for example, rdtscp or cpuid? In most cases you might be better off using compiler intrinsics, but let’s say you really want to go with inline assembly. In this section of the post, I’ll give you an overview of inline assembly for practical use (though its capabilities are larger than can be covered in a single post, feel free to check the references below).

The syntax of asm statements

Although they look like a bit of a mess, asm statements are actually quite simple. A statement looks like this:

asm (instructions : output operands : input operands : clobbers);

Any part may be omitted if empty, so for example if you have no operands or clobbers, you can just write:

asm (instructions);

Or if you have no output operands or clobbers, you can write something like this:

asm (instructions : : input operands);

For more complex inline assembly, I prefer writing it on multiple lines, like this:

asm (instructions
     : output operands
     : input operands
     : clobbers);

Keep in mind that the compiler might try to optimise asm statements, and might move or effectively delete your code. If you want your code to execute exactly the way you write it, you need to use the qualifier volatile (so, instead of asm (...);, you write asm volatile (...);). This is important if your code has side-effects that you want to preserve, otherwise the compiler might rewrite it in a way that doesn’t yield those side-effects.

Instructions (assembler templates)

The instructions are a string (or several concatenated strings), with placeholders for input and/or output operands. Each instruction is terminated by a literal newline (\n), or by a semicolon (I use semicolons).

Placeholders have two forms. The first form is %N, with N being the number of the operand, starting from 0. This includes both output and input operands, so if there are two output operands, the first input operand will be %2. The second form is %[label], where label is a label you give an operand (see examples later in this post). Because placeholders start with a percent sign, literal percent signs have to be doubled (e.g. %%rax instead of %rax).


Operands are separated by commas, and have the form:

[label] "constraints" (variable)

label is optional, and can make it easier to identify operands (I usually prefer the %[label] syntax over %N). variable is the variable (or value for input operands) that’s substituted in place of the operand in the assembly instructions.

constraints are what determines how a variable is treated when the final instruction stream is produced. You can specify whether the operand goes into a register (and which register), or memory, or if it’s immediate.

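Constraints I frequently use:

  "r": the operand goes in a general-purpose register chosen by the compiler
  "m": the operand is a memory location
  "i": the operand is an immediate integer whose value is known at compile time
  "a", "b", "c", "d": the operand goes in the a, b, c or d register respectively (e.g. eax or rax, depending on the operand's size)
  "0", "1", and so on: the operand goes in the same location as the operand with that number (a matching constraint)

Specifically for output, two constraint modifiers I use often:

  "=": the operand is write-only, so its previous value is discarded
  "+": the operand is both read and written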

Clobbers are a list of locations that are modified by the instructions, apart from those used in operands. Here, you list the registers that are modified (e.g. “rax” or “rdx”), “cc” if the flags register is modified, and “memory” if some other memory is modified.

Example: add two numbers

Let’s say we want a function that contains inline assembly to add two numbers. Here’s a program that does that:

#include <stdio.h>
int add(int a, int b) {
    int res;
    // Operand 0 is res, operand 1 is a, operand 2 is b (which shares res's register)
    asm ("add %1, %2;"
         : "=r" (res)
         : "r" (a), "0" (b));
    return res;
}

int main() {
    printf("8+4 == %d\n", add(8,4)); // prints 8+4 == 12
    return 0;
}

The addition is done in the asm statement, which just executes an add instruction, on operands 1 and 2 – the two input operands. The two input operands are specified as "r" (a), "0" (b): the function arguments a and b. The constraints say that a should go in a register, and b should go in the same location as operand 0 (the output operand). The output operand is specified as "=r" (res), which says that res will be overwritten, and will be in a register. Since res and b will be in the same register (because of the 0 constraint on b), res will contain the result of the addition, which is then returned from the function.

Example: multiplying two numbers

Let’s take an example where we want to multiply two numbers, and the multiply function should use inline assembly for computation. You can implement it like this:

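#include <stdio.h>
// Unsigned 32x32 -> 64-bit multiply. The operands are 32 bits wide, so the
// 32-bit form of mul is used, which writes its result to edx:eax.
unsigned long mul(unsigned int a, unsigned int b) {
    unsigned int res[2] = {0};
    asm ("mul %[b];"
         : "=a" (res[0]), "=d" (res[1])
         : "0" (a), [b] "r" (b)
         : "cc");
    // res[0] is the low half (eax), res[1] is the high half (edx)
    return ((unsigned long) res[1] << 32) | res[0];
}

int main() {
    printf("6*4 == %lu\n", mul(6, 4)); // prints 6*4 == 24
    return 0;
}

The mul instruction takes one operand, and multiplies whatever is in the a register by that operand. The width of the operand selects the width of the multiplication: because a and b are 32-bit unsigned ints here, the compiler substitutes a 32-bit register for %[b], so mul multiplies eax by it and places the 64-bit result in two registers, edx (high half) and eax (low half). With 64-bit operands, mul would instead multiply rax and write a 128-bit result to rdx and rax, so storing those halves in two 32-bit ints would silently drop bits. Note also that mul is an unsigned multiply, hence the unsigned types. So, we first start by declaring an array of two unsigned ints, which are 32 bits in size. The output operands are specified as "=a" (res[0]), "=d" (res[1]), which says that eax will overwrite res[0], and edx will overwrite res[1]. The input operands are specified as "0" (a), [b] "r" (b), which says that the first input operand (function argument a) should be stored in the same location as operand 0 (the first output operand, eax in this case), and the second input operand (function argument b) will be stored in some register and referred to using the label b (in the mul instruction, as %[b]). We also list "cc" as a clobber, because mul modifies the flags. Finally, after the asm statement, we combine the two halves with a shift and an or to get the 64-bit value. Casting res to a long pointer and dereferencing it would appear to work too, because X86_64 is little-endian, but it violates C's aliasing rules, so the shift is the safer choice.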

Example: use cpuid to get the vendor string

For the third example, let’s say we want to get the CPU vendor string, using the cpuid instruction. This is how you’d do it with inline assembly:

#include <stdio.h>
void get_cpuid() {
    int res[4] = {0};
    asm ("movq $0, %%rax;"
         "cpuid;"
         : "=b" (res[0]), "=d" (res[1]), "=c" (res[2])
         :: "eax");
    printf("Vendor string: %s\n", (char*)(&res)); // outputs GenuineIntel on my CPU
}

int main() {
    get_cpuid();
    return 0;
}
The behavior of cpuid is selected by the value in rax, and if rax has the value 0, the vendor string is returned in registers ebx, edx, and ecx, in that order. So, we declare an array of four 32-bit values to hold these results and a null terminator for the string. The first instruction initializes rax to 0, to get the vendor string, and then the cpuid instruction executes. The output operands are specified as "=b" (res[0]), "=d" (res[1]), "=c" (res[2]), which places the values from the registers ebx, edx, and ecx into the result array. There are no input operands, and we list eax as a register that’s clobbered, because we explicitly modify it in the first movq instruction (and it’s not an output/input operand). Finally, we cast a pointer to the result array (containing three integers and a null terminator) to a character pointer, which allows printf to read it as a string.

Useful references

Here are some useful references for (inline) assembly: