There are cases where writing assembly code is preferred or required to perform certain operations (e.g. interrupt handling).
NASM has a macro processor that supports conditional assembly, multi-level file inclusion, etc. Preprocessor directives start with the '%' symbol.
There are two types of macros: single-line (defined with %define) and multi-line (wrapped between %macro and %endmacro). This section explains multi-line macros.
A multi-line macro is defined as follows:
%macro my_first_macro 1
push ebp
mov ebp, esp
sub esp, %1
%endmacro
The code generated by a macro can be made reachable from C if needed. In this case we need to add a global label to it, so the macro above becomes:
%macro my_first_macro 1
[global my_first_macro_label_%1]
my_first_macro_label_%1:
push ebp
mov ebp, esp
sub esp, %1
%endmacro
In the code above we can see a few new things:
- The label my_first_macro_label_%1 is declared as global, so it is exported to the other object files at link time.
- The %1 in the label definition is replaced by the first parameter passed to the macro, which lets every invocation create a different label.
So if we now add a new line with the following code:
my_first_macro 42
it creates the global label my_first_macro_label_42, and since it is global it is also visible from our C code (provided the object files are linked together).
Defining a macro with NASM is similar to using the C #define statement: these special "instructions" are evaluated by the NASM preprocessor and replaced with their expansion before the code is assembled.
So for example my_first_macro 42 is transformed into the following statements:
[global my_first_macro_label_42]
my_first_macro_label_42:
push ebp
mov ebp, esp
sub esp, 42
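Since the text compares NASM macros with the C #define statement, here is a rough C analogue, offered only as a sketch. The ## operator pastes the macro argument into an identifier, much like %1 is pasted into the label name. The C macro MY_FIRST_MACRO and the functions it generates are hypothetical names for this illustration and are unrelated to the asm symbols.

```c
#include <assert.h>

/* Hypothetical C analogue of the NASM macro: the parameter n is pasted
 * into the function name, the same way %1 is pasted into the label. */
#define MY_FIRST_MACRO(n) \
    int my_first_macro_label_##n(void) { return (n); }

/* Each use is expanded by the preprocessor before compilation. */
MY_FIRST_MACRO(42) /* defines my_first_macro_label_42() */
MY_FIRST_MACRO(7)  /* defines my_first_macro_label_7()  */

/* Both generated functions exist as separate, distinct symbols. */
int sum_of_generated_functions(void)
{
    return my_first_macro_label_42() + my_first_macro_label_7();
}
```

As with the NASM version, using the macro twice with the same argument would define the same name twice and fail to build.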
In NASM, if we want to declare an initialized "variable" we can use the following directives:
Directive | Description |
---|---|
DB | Allocate a byte |
DW | Allocate 2 bytes (a word) |
DD | Allocate 4 bytes (a double word) |
DQ | Allocate 8 bytes (a quad word) |
These directives are intended to be used for initialized variables. The syntax is:
single_byte_var:
db 'y'
word_var:
dw 54321
double_var:
dd -54321
quad_var:
dq 133.463 ; Example with a real number
If we want to declare a string we need to use a different syntax for db:
string_var:
db "Hello", 10
The code above declares a variable (string_var) that starts at 'H' and fills the following bytes with the remaining letters. And what about the last number? It is just an extra byte that represents the newline character, so what we are really storing is the string "Hello\n". Note that, unlike a C string literal, no terminating zero byte is added.
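To make the byte layout concrete, the sketch below builds the same six bytes in C and compares them with the string "Hello\n". It assumes an ASCII character set (where the newline is 10); the function name matches_hello_newline is made up for this example.

```c
#include <assert.h>
#include <string.h>

/* The same six bytes that `db "Hello", 10` emits: five letters plus 10. */
const unsigned char string_var[] = { 'H', 'e', 'l', 'l', 'o', 10 };

/* db adds nothing on its own, while a C literal carries a trailing NUL. */
static_assert(sizeof(string_var) == 6, "db emits exactly six bytes");
static_assert(sizeof("Hello\n") == 7, "the C literal has one extra byte");

/* Returns 1 when the bytes match "Hello\n" (compared without the NUL). */
int matches_hello_newline(void)
{
    return memcmp(string_var, "Hello\n", sizeof(string_var)) == 0;
}
```

If the string has to be passed to C code that expects a terminator, a trailing 0 has to be written explicitly in the db line.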
What we have seen so far is valid for a variable that can be initialized with a value, but what if we don't know the value yet and just want to "label" some memory with a variable name? It's pretty simple: there are equivalent directives for reserving memory:
Directive | Description |
---|---|
RESB | Reserve a byte |
RESW | Reserve 2 bytes (a word) |
RESD | Reserve 4 bytes (a double word) |
RESQ | Reserve 8 bytes (a quad word) |
The syntax is similar to the previous examples:
single_byte_var:
resb 1
word_var:
resw 2
double_var:
resd 3
quad_var:
resq 4
One moment! What are those numbers after the directives? Well, it's pretty simple: they indicate how many bytes/words/dwords/qwords we want to reserve. In the example above:
- resb 1 is reserving one byte
- resw 2 is reserving 2 words, and each word is 2 bytes, so 4 bytes in total
- resd 3 is reserving 3 dwords, and again a dword is 4 bytes, so 12 bytes in total
- resq 4 is reserving... well you should know it now...
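The arithmetic above can be double checked with a small C sketch that declares arrays with the same element sizes and counts as the res* lines. The C arrays only mirror the sizes; they are not the asm variables, and total_reserved_bytes is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* C arrays with the same element size and count as each res* line. */
uint8_t  single_byte_var[1]; /* resb 1 */
uint16_t word_var[2];        /* resw 2 */
uint32_t double_var[3];      /* resd 3 */
uint64_t quad_var[4];        /* resq 4 */

/* count * element size, checked at compile time */
static_assert(sizeof(single_byte_var) == 1, "1 byte * 1");
static_assert(sizeof(word_var) == 4, "2 words * 2 bytes");
static_assert(sizeof(double_var) == 12, "3 dwords * 4 bytes");
static_assert(sizeof(quad_var) == 32, "4 qwords * 8 bytes");

/* Total number of bytes reserved by the four lines together. */
size_t total_reserved_bytes(void)
{
    return sizeof(single_byte_var) + sizeof(word_var)
         + sizeof(double_var) + sizeof(quad_var);
}
```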
In 64-bit mode, the System V ABI expects the direction flag to be clear on function entry, so the asm code should execute the cld instruction before calling an external C function.
So for example if we want to call the following C function from asm:
void my_c_function(unsigned int my_value){
printf("My shiny function called from nasm worth: %u\n", my_value);
}
The first thing is to let the assembler know that we want to reference an external function using extern, and then, just before calling the function, add the instruction cld.
Here is an example:
[extern my_c_function]
; Some magic asm stuff that we don't care of...
mov rdi, 42
cld
call my_c_function
; other magic asm stuff that we don't care of...
As mentioned in the multiboot chapter, argument passing from asm to C in 64-bit mode is a little different from 32-bit mode: the first parameter of a C function is taken from rdi (followed by rsi, rdx, rcx, r8, r9, then the stack), so mov rdi, 42 is setting the value of the my_value parameter to 42.
The output of the printf will then be:
My shiny function called from nasm worth: 42
Variable sizes are always important while coding, but in assembly they matter even more: there are no real types, so nothing tracks the size of a variable for us.
The important things to know when dealing with assembly code:
- when moving from memory to a register, using the wrong register size will cause the wrong value to be loaded into the register. Example:
mov rax, [memory_location_label]
is different from:
mov eax, [memory_location_label]
And it could potentially lead to two different values in the register. That is because the size of rax is 8 bytes, while eax is only 4 bytes, so in the first case the processor reads 8 bytes from memory, while in the second case it reads only 4 (and clears the upper half of rax). Of course the results can differ, unless we are lucky enough and the extra 4 bytes are all 0s.
This is easy to overlook if we mostly do register to memory, value to register, or value to memory moves, where the size is "implicit".
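The effect can be reproduced from C by reading the same address with two different widths. This is only a sketch: the array memory_location and its contents are invented for the example, and the values in the comments assume a little-endian machine such as x86.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Eight bytes in memory: a 32-bit value followed by unrelated data. */
const uint8_t memory_location[8] = {
    0x2A, 0x00, 0x00, 0x00,   /* the 4 bytes we actually care about (42) */
    0xFF, 0xFF, 0xFF, 0xFF    /* whatever happens to be stored next      */
};

/* Like `mov eax, [memory_location_label]`: reads 4 bytes. */
uint32_t read_dword(void)
{
    uint32_t value;
    memcpy(&value, memory_location, sizeof value);
    return value;   /* 0x0000002A on little-endian */
}

/* Like `mov rax, [memory_location_label]`: reads 8 bytes. */
uint64_t read_qword(void)
{
    uint64_t value;
    memcpy(&value, memory_location, sizeof value);
    return value;   /* 0xFFFFFFFF0000002A on little-endian */
}
```

The 4-byte read returns the value we stored, while the 8-byte read also drags in the neighbouring bytes.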
Below is an example showing a possible solution to a complex if statement. Let's assume that we have the following if statement in C and we want to translate it into assembly:
if ( var1==SOME_VALUE && var2 == SOME_VALUE2){
//do something
}
In asm we can do something like the following:
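cmp dword [var1], SOME_VALUE ; dword: assuming var1 and var2 are 32-bit variables
jne .else_label
cmp dword [var2], SOME_VALUE2
jne .else_label
;here code if both conditions are true
jmp .end_if ; skip the else part
.else_label:
;the else part
.end_if:
;rest of the code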
In a similar way we can have an if statement with a logical OR:
if (var1 == SOME_VALUE || var2 == SOME_VALUE){
//do_something
}
In asm it can be rendered with the following code:
cmp dword [var1], SOME_VALUE ; dword: assuming var1 and var2 are 32-bit variables
je .true_branch
cmp dword [var2], SOME_VALUE
je .true_branch
jmp .else_label
.true_branch:
;do something
.else_label:
;rest of the code
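The jump structure of both conditions can also be written in C with goto, which makes the short-circuit behaviour easy to follow: the AND form jumps away as soon as one test fails, the OR form jumps to the true branch as soon as one test passes. This is a sketch only; the function names and the values given to SOME_VALUE and SOME_VALUE2 are assumptions made for the example.

```c
#include <assert.h>

#define SOME_VALUE  1  /* hypothetical values, chosen for the sketch */
#define SOME_VALUE2 2

/* var1 == SOME_VALUE && var2 == SOME_VALUE2: one jump per failed test. */
int and_condition(int var1, int var2)
{
    if (var1 != SOME_VALUE)  goto else_label;  /* cmp + jne .else_label */
    if (var2 != SOME_VALUE2) goto else_label;  /* cmp + jne .else_label */
    return 1;                                  /* both conditions true  */
else_label:
    return 0;
}

/* var1 == SOME_VALUE || var2 == SOME_VALUE: one jump per passed test. */
int or_condition(int var1, int var2)
{
    if (var1 == SOME_VALUE) goto true_branch;  /* cmp + je .true_branch */
    if (var2 == SOME_VALUE) goto true_branch;  /* cmp + je .true_branch */
    goto else_label;                           /* jmp .else_label       */
true_branch:
    return 1;
else_label:
    return 0;
}
```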
The usual switch statement in C:
switch(variable){
case SOME_VALUE:
//do something
break;
case SOME_VALUE2:
//do something
break;
case SOME_VALUE3:
//do something
break;
}
can be rendered as:
cmp dword [variable], SOME_VALUE ; dword: assuming a 32-bit variable
je .value1_case
cmp dword [variable], SOME_VALUE2
je .value2_case
cmp dword [variable], SOME_VALUE3
je .value3_case
jmp .item_not_needed
.value1_case:
;do stuff for value1
jmp .item_not_needed
.value2_case:
;do stuff for value2
jmp .item_not_needed
.value3_case:
;do stuff for value3
.item_not_needed:
;rest of the code
Another typical scenario is loops. For example imagine we have the following while loop in C:
unsigned int counter = 0;
while (counter < SOME_VALUE) {
//do something
counter++;
}
Again in assembly we can use the jump instruction family:
mov ecx, 0 ; Loop counter
.loop_cycle:
; do something (the body runs at least once, so this assumes SOME_VALUE is not 0)
inc ecx
cmp ecx, SOME_VALUE
jne .loop_cycle
The inc instruction increases the value contained in the ecx register by one.
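One detail worth spelling out: the asm loop tests its condition after the body, while the C while loop tests it before, so the two only agree when the body is meant to run at least once. The sketch below writes both placements in C with goto so the difference can be observed. It uses an unsigned "below" comparison (jb/jae in asm) instead of jne, and the function names are invented for the example.

```c
#include <assert.h>

/* Test at the bottom (do-while shape): the body always runs once. */
unsigned int count_check_at_bottom(unsigned int limit)
{
    unsigned int counter = 0;     /* mov ecx, 0 */
    unsigned int iterations = 0;
loop_cycle:
    iterations++;                 /* do something */
    counter++;                    /* inc ecx */
    if (counter < limit)          /* cmp ecx, limit + jb loop_cycle */
        goto loop_cycle;
    return iterations;
}

/* Test at the top (while shape): the body may run zero times. */
unsigned int count_check_at_top(unsigned int limit)
{
    unsigned int counter = 0;     /* mov ecx, 0 */
    unsigned int iterations = 0;
loop_cycle:
    if (counter >= limit)         /* cmp ecx, limit + jae loop_end */
        goto loop_end;
    iterations++;                 /* do something */
    counter++;                    /* inc ecx */
    goto loop_cycle;              /* jmp loop_cycle */
loop_end:
    return iterations;
}
```

With a limit of 3 both versions run the body three times; with a limit of 0 only the second one matches the C while loop.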
Every language supports accessing data as a raw array of bytes, and C provides an abstraction over this in the form of structs. NASM also provides an abstraction over raw bytes that is similar to how C does it.
This section only gives a quick introduction to defining a basic struct. For more information and use cases it is better to check the official Netwide Assembler documentation (see the useful links appendix).
Let's for example assume we have the following C struct:
struct task {
uint32_t id;
char name[8];
};
NASM renders a struct by declaring a list of offset labels, so that we can use them to access the fields starting from the memory location of the struct (Authors note: yeah it is a trick...)
To create a struct in NASM we use the struc and endstruc keywords, and the fields are defined between them.
The example above can be rendered in the following way:
struc task
id: resd 1
name: resb 8
endstruc
This code creates the following symbols: id as 0, representing the offset from the beginning of a task structure, name as 4 (again an offset), and the task symbol itself, which is 0 too. NASM also defines task_size as 12, the total size of the structure. This notation has a drawback: it defines the labels as global constants, so you can't have another struct or label declared with the same name. To solve this problem you can use the following notation:
struc task
.id: resd 1
.name: resb 8
endstruc
Now we can refer to the fields inside our struct in a familiar way: struct_name.field_name. What's really happening here is that the assembler attaches the local label (.id) to the name of the struc (task), so task.id is still just the offset of the field; adding it to the base address of a structure gives us the real address of that field.
Now suppose we have a memory location or register that points to our structure. For example, let's say that the pointer to our structure is stored in the register rax and we want to copy the id field into the register rbx:
mov ebx, dword [rax + task.id] ; id is 32 bits wide; writing ebx also clears the upper half of rbx
This is how to access a struct, basically we add the label representing an offset to its base address.
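The same offsets can be checked from the C side with offsetof, which is a handy sanity check when a struct is shared between C and asm. The sketch below assumes the usual layout where uint32_t is 4-byte aligned and the struct has no padding; read_task_id is a hypothetical helper that mimics "base address plus offset".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct task {
    uint32_t id;
    char name[8];
};

/* The same numbers NASM assigns to the field labels and to task_size. */
static_assert(offsetof(struct task, id) == 0, "id starts at offset 0");
static_assert(offsetof(struct task, name) == 4, "name starts at offset 4");
static_assert(sizeof(struct task) == 12, "4 + 8 bytes, no padding");

/* Reads the id field the way the asm does: base address plus offset. */
uint32_t read_task_id(const void *base)
{
    uint32_t id;
    memcpy(&id, (const unsigned char *)base + offsetof(struct task, id),
           sizeof id);
    return id;
}
```

If one of the static assertions fails, the C compiler and the NASM struc disagree on the layout and the offsets on the asm side need to be adjusted.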
What if we want to create an instance of it? In this case we can use the macros istruc and iend, with at to set the fields. For example, if we want to create an instance of task with the value 1 for the id field and "hello123" for the name field, we can use the following syntax:
istruc task
at id, dd 1
at name, db 'hello123'
iend
In this way we have declared an instance of the struc defined in the first of the two examples. But again this doesn't work with the second one, because the labels are different. In that case we have to use the full label name (that means adding the prefix task):
istruc task
at task.id, dd 1
at task.name, db 'hello123'
iend