Introduction to x86 Assembly
Last updated
Shameless plug
This course is given to you for free by the Malcore team: https://m4lc.io/course/assembly/register
Consider registering, and using Malcore, so we can continue to provide free content for the entire community. You can also join our Discord server here: https://m4lc.io/course/assembly/discord
We offer free threat intel in our Discord via our custom designed Discord bot. Join the Discord to discuss this course in further detail or to ask questions.
You can also support us by buying us a coffee
NOTE: This course assumes that you are using Linux and have nasm installed.
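In a nutshell, assembly is a low-level programming language whose instructions correspond closely to the machine instructions a CPU can directly execute; an assembler translates the source into that machine code. Each instruction is composed of a mnemonic (which the assembler turns into an opcode) followed by zero or more operands, which can be registers, immediate values, or memory addresses. Depending on the syntax, some instructions also carry a prefix (such as rep or lock) or a size suffix (such as the l in AT&T's movl).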
The x86 in the name refers to the processor architecture. Each CPU architecture has its own assembly language, because each has its own instruction set. For example, Intel x86 chips use different instructions than ARM chips.
This course is a semi-deep dive into x86 assembly language and should give you enough information about how assembly works to build a program successfully.
In a sentence: registers are small storage locations in a CPU that are used to hold temporary data during execution.
Each register has a purpose. However, most of them can be used for general purposes or for various operations. In this section we will provide information on the registers in the x86 architecture.
EAX
Purpose:
The Accumulator register.
Common usage:
It is normally used for arithmetic operations such as add, sub, and mul (mul implicitly uses EAX as one of its operands), and to store the results of calculations.
Example of usage:
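Here is a minimal sketch. It assumes 32-bit Linux and the nasm -f elf32 / ld -m elf_i386 build commands shown later in this course. The result is returned as the process exit status, which you can inspect with echo $? after running the program.

```nasm
; EAX as the accumulator (32-bit Linux, illustrative values)
section .text
    global _start

_start:
    mov eax, 5          ; eax = 5
    add eax, 10         ; eax = 15
    sub eax, 3          ; eax = 12
    mov ecx, 2
    mul ecx             ; unsigned multiply: edx:eax = eax * ecx = 24
    mov ebx, eax        ; use the result as the exit status
    mov eax, 1          ; sys_exit (eax also carries the system call number)
    int 0x80
```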
EBX
Purpose:
The Base register
Common usage:
Usually used as a pointer to data in memory, but can also be used for arithmetic purposes.
Example of usage:
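Here is a minimal sketch under the same assumptions (32-bit Linux, result returned as the exit status). The array numbers is hypothetical data. EBX holds its address and is dereferenced with [ebx + offset].

```nasm
; EBX as a pointer to data in memory (32-bit Linux)
section .data
    numbers dd 7, 11, 13    ; three 4-byte integers (hypothetical data)

section .text
    global _start

_start:
    mov ebx, numbers        ; ebx = address of the first element
    mov eax, [ebx]          ; eax = 7
    add eax, [ebx + 4]      ; eax = 7 + 11 = 18
    add eax, [ebx + 8]      ; eax = 18 + 13 = 31
    mov ebx, eax            ; exit status = 31
    mov eax, 1              ; sys_exit
    int 0x80
```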
ECX
Purpose:
The Counter register
Common usage:
Mostly used as a loop counter (the loop instruction decrements it automatically) or as the repeat count for string/memory operations that use the rep prefix.
Example of usage:
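Here is a minimal sketch, assuming 32-bit Linux. It uses ECX as the counter for the loop instruction, which decrements ECX and jumps back until ECX reaches zero. The label .sum is illustrative, and the sum is returned as the exit status.

```nasm
; ECX as a loop counter (32-bit Linux)
section .text
    global _start

_start:
    xor eax, eax        ; eax = 0 (running total)
    mov ecx, 5          ; loop 5 times
.sum:
    add eax, ecx        ; adds 5, 4, 3, 2, 1 on successive passes
    loop .sum           ; ecx = ecx - 1; jump back while ecx != 0
    mov ebx, eax        ; exit status = 15
    mov eax, 1          ; sys_exit
    int 0x80
```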
EDX
Purpose:
The Data register
Common usage:
Works with the EAX register for multiplication and division, holding the upper half of large results (and the remainder after div).
Example of usage:
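Here is a minimal sketch, assuming 32-bit Linux. It shows EDX acting as the upper half of the EDX:EAX pair for both mul and div. The numbers are arbitrary. The exit status adds the two EDX results together so you can check them with echo $?.

```nasm
; EDX as the upper half of EDX:EAX (32-bit Linux)
section .text
    global _start

_start:
    ; 100000 * 100000 = 10,000,000,000, which does not fit in 32 bits
    mov eax, 100000
    mov ecx, 100000
    mul ecx             ; edx:eax = 0x00000002:0x540BE400, so edx = 2
    mov ebx, edx        ; keep the upper half: ebx = 2

    ; 23 / 5 with div: the dividend is edx:eax, so edx must be cleared first
    mov eax, 23
    xor edx, edx        ; otherwise the leftover 2 in edx would join the dividend
    mov ecx, 5
    div ecx             ; eax = 4 (quotient), edx = 3 (remainder)

    add ebx, edx        ; exit status = 2 + 3 = 5
    mov eax, 1          ; sys_exit
    int 0x80
```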
NOTE: A 32-bit register cannot hold a 64-bit value, which is why EDX and EAX are sometimes used together as a pair, written EDX:EAX. For example, a 32-bit mul stores its 64-bit product with the upper half in EDX and the lower half in EAX.
ESI
Purpose:
The Source Index register
Common usage:
Mostly used in string operations to hold the source address for mnemonics like movsb, movsw, and movsd.
Example of usage:
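Here is a minimal sketch, assuming 32-bit Linux. It measures a zero-terminated string with lodsb, another string instruction that reads from the address in ESI and then advances ESI. The string text is hypothetical data, and its length comes back as the exit status.

```nasm
; ESI as the source address for string instructions (32-bit Linux)
section .data
    text db "assembly", 0   ; zero-terminated string (hypothetical data)

section .text
    global _start

_start:
    mov esi, text       ; esi = source address
    xor ecx, ecx        ; ecx counts characters
    cld                 ; direction flag clear: esi moves forward
.next:
    lodsb               ; al = byte at [esi], then esi = esi + 1
    test al, al         ; a zero byte marks the end
    jz .done
    inc ecx
    jmp .next
.done:
    mov ebx, ecx        ; exit status = 8, the length of "assembly"
    mov eax, 1          ; sys_exit
    int 0x80
```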
EDI
Purpose:
The Destination Index register
Common usage:
Used in string operations to hold the destination address for the same instructions (movsb, movsw, movsd).
Example of usage:
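Here is a minimal sketch, assuming 32-bit Linux. It copies a buffer with rep movsb, where ESI holds the source address, EDI holds the destination address, and ECX holds the byte count. The names src and dst are hypothetical. The copied bytes are printed to show the result.

```nasm
; EDI as the destination address for rep movsb (32-bit Linux)
section .data
    src     db "copied!", 10    ; source bytes, ending in a newline
    src_len equ $ - src         ; 8 bytes

section .bss
    dst     resb src_len        ; destination buffer of the same size

section .text
    global _start

_start:
    cld                 ; copy forwards
    mov esi, src        ; source address
    mov edi, dst        ; destination address
    mov ecx, src_len    ; number of bytes to copy
    rep movsb           ; repeat movsb ecx times: [edi] = [esi], both advance

    ; print the destination buffer to show the copy worked
    mov eax, 4          ; sys_write
    mov ebx, 1          ; stdout
    mov ecx, dst
    mov edx, src_len
    int 0x80

    mov eax, 1          ; sys_exit
    xor ebx, ebx        ; exit status 0
    int 0x80
```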
EBP
Purpose:
Base Pointer register
Common usage:
Generally points to the base of the current stack frame, so that function parameters and local variables can be referenced at fixed offsets from it.
Example of usage:
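Here is a minimal sketch of a stack frame, assuming 32-bit Linux and cdecl-style argument passing on the stack. The function add_two is a hypothetical helper. Inside it, the arguments sit at positive offsets from EBP and the local variable sits at a negative offset.

```nasm
; EBP as the base of a stack frame (32-bit Linux)
section .text
    global _start

; add_two(a, b): hypothetical helper returning a + b in eax
add_two:
    push ebp                ; save the caller's base pointer
    mov ebp, esp            ; ebp = base of this stack frame
    sub esp, 4              ; reserve one local variable at [ebp - 4]
    mov eax, [ebp + 8]      ; first argument ([ebp + 4] is the return address)
    add eax, [ebp + 12]     ; second argument
    mov [ebp - 4], eax      ; store the sum in the local variable
    mov eax, [ebp - 4]      ; return value goes in eax
    mov esp, ebp            ; discard the local variable
    pop ebp                 ; restore the caller's base pointer
    ret

_start:
    push dword 30           ; second argument
    push dword 12           ; first argument
    call add_two
    add esp, 8              ; the caller removes its two arguments
    mov ebx, eax            ; exit status = 42
    mov eax, 1              ; sys_exit
    int 0x80
```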
ESP
Purpose:
Stack Pointer register
Common usage:
Points to the top of the stack and is updated automatically by instructions such as push, pop, call, and ret.
Example of usage:
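Here is a minimal sketch, assuming 32-bit Linux. It shows that each 32-bit push moves ESP down by 4 bytes and that [esp] is always the current top of the stack. The distance ESP moved is returned as the exit status.

```nasm
; ESP tracking the top of the stack (32-bit Linux)
section .text
    global _start

_start:
    mov ebx, esp        ; ebx = stack pointer before any pushes
    push dword 10       ; esp = esp - 4
    push dword 20       ; esp = esp - 4
    mov eax, [esp]      ; read the top of the stack without popping: eax = 20
    sub ebx, esp        ; ebx = 8: two 4-byte pushes moved esp down 8 bytes
    add esp, 8          ; discard both values by moving esp back up
    mov eax, 1          ; sys_exit
    int 0x80            ; exit status = 8
```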
EIP
Purpose:
Instruction pointer
Common usage:
Holds the address of the next instruction to be executed. This register cannot be written directly with ordinary instructions such as mov; it changes only through control flow instructions such as jmp, call, and ret (and the conditional jumps).
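EIP cannot be the destination of mov, so this minimal sketch (assuming 32-bit Linux) changes it only in the ways that are available. jmp loads a new address into EIP, call saves the return address before jumping, and ret restores it. The label skip and the helper set_status are hypothetical.

```nasm
; Changing EIP through control flow (32-bit Linux)
section .text
    global _start

_start:
    mov ebx, 1
    jmp skip            ; eip = address of skip
    mov ebx, 99         ; never executed: jmp moved eip past it
skip:
    call set_status     ; push the return address, then eip = set_status
    mov eax, 1          ; execution resumes here after ret (sys_exit)
    int 0x80            ; exit status = 7

; hypothetical helper that sets the exit status
set_status:
    mov ebx, 7
    ret                 ; pop the return address into eip
```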
EFLAGS
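Purpose:
Flags register
Common usage:
Stores status flags that describe the result of the most recent operation that affects them. Conditional jumps such as jz and jc read these flags.
Flags:
Zero Flag (ZF): set when the result of an operation is zero
Carry Flag (CF): set when an unsigned operation produces a carry out of, or a borrow into, the most significant bit
Sign Flag (SF): set when the result is negative (its most significant bit is 1)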
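Here is a minimal sketch, assuming 32-bit Linux. It triggers each of the three flags with sub and tests them with the conditional jumps jz (ZF), jc (CF), and js (SF). The program exits with status 0 only if every flag behaved as described. The labels are illustrative.

```nasm
; Setting and testing ZF, CF, and SF (32-bit Linux)
section .text
    global _start

_start:
    mov eax, 5
    sub eax, 5          ; result is 0: ZF = 1
    jz .was_zero        ; taken because ZF is set
    mov ebx, 1          ; exit status 1 would mean ZF was not set
    jmp .exit
.was_zero:
    mov eax, 1
    sub eax, 2          ; 1 - 2 needs a borrow (CF = 1) and is negative (SF = 1)
    jc .borrowed        ; taken because CF is set
    mov ebx, 2
    jmp .exit
.borrowed:
    js .negative        ; jumps and mov do not change flags, so SF is still set
    mov ebx, 3
    jmp .exit
.negative:
    mov ebx, 0          ; all three flags behaved as described
.exit:
    mov eax, 1          ; sys_exit
    int 0x80
```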
Now that we've gotten the registers out of the way, we need to learn about the stack. What's the stack? The stack is a region of memory that follows the last in, first out (LIFO) principle: the last item added is the first item removed.
The stack is used to store temporary data such as return addresses, local variables, and saved register values. The image below should give you a better understanding of the stack:
If you don't fully understand this yet, that's okay; it is a lot of information at once. Let's go through it. The basic principles are as follows:
The stack grows downwards in memory, from higher addresses to lower addresses
The PUSH mnemonic adds an item to the top of the stack (pushing a 32-bit value moves ESP down by 4 bytes).
The POP mnemonic removes the most recently added item from the top of the stack.
The ESP register is the stack pointer; it tracks the top of the stack. For example, after pushing 10, then pushing 20, then popping once, ESP points at the slot holding 10, because 20 was popped off the stack.
When a function is called, the caller typically pushes the function's arguments onto the stack, and the call instruction then pushes the return address.
So basically, as the stack grows downward in memory, the push and pop mnemonics manage the stack by adding and removing items at its top, while the esp register automatically tracks where that top is.
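Here is a minimal sketch of this LIFO behavior, assuming 32-bit Linux. It pushes 10 and then 20, then pops both values. The first value popped is returned as the exit status, so a result of 20 confirms that the last item pushed came off first.

```nasm
; LIFO order with push and pop (32-bit Linux)
section .text
    global _start

_start:
    push dword 10       ; stack: 10
    push dword 20       ; stack: 10, 20 (20 on top)
    pop ecx             ; ecx = 20, the last value pushed; stack: 10
    pop edx             ; edx = 10; the stack is back where it started
    mov ebx, ecx        ; exit status = 20, the first value popped
    mov eax, 1          ; sys_exit
    int 0x80
```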
When writing code in assembly, you will be adding sections to it. These sections organize the code and data into specific areas of memory, and those areas serve different purposes at runtime. Let's go through the sections and their responsibilities:
Breaking a program into sections lets each kind of content be handled in the way that suits it best; for example, modern processors cache instructions and data separately to optimize speed.
When the operating system loads the program, it does the following:
Loads the code marked as executable into memory
Loads the data into memory segments that are marked as writable
Sets the permissions accordingly to help with performance and security
Basically, sections group code and data into segments that are readable, writable, or executable. This helps the processor use memory efficiently, improves security, and makes management easier for both the assembler and the operating system.
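Here is a minimal sketch of a source file that uses all four sections, assuming 32-bit Linux. On ELF (Linux) the read-only data section is named .rodata, so this sketch uses that name in place of .rdata. The names greeting, counter, and buffer are hypothetical.

```nasm
; One of each section (32-bit Linux, ELF section names)
section .rodata                 ; read-only constants (PE/Windows calls this .rdata)
    greeting     db "hi", 10
    greeting_len equ $ - greeting

section .data                   ; initialized, writable data
    counter dd 3

section .bss                    ; uninitialized data, zero-filled at load time
    buffer resb 64

section .text                   ; executable code
    global _start

_start:
    ; print the read-only greeting
    mov eax, 4                  ; sys_write
    mov ebx, 1                  ; stdout
    mov ecx, greeting
    mov edx, greeting_len
    int 0x80

    inc dword [counter]         ; .data is writable: counter goes from 3 to 4
    mov ebx, [counter]          ; ebx = 4
    add ebx, [buffer]           ; .bss is zero-filled, so this adds 0
    mov eax, 1                  ; sys_exit
    int 0x80                    ; exit status = 4
```

Writing to greeting would crash the program, because .rodata is mapped without write permission.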
Now that we understand sections and what they are for, we can start writing some code. For this course we will write a basic 'Hello, World!' program using x86 assembly.
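Here is a minimal sketch of such a program. It assumes 32-bit Linux and the int 0x80 system call interface, where sys_write is call number 4 and sys_exit is call number 1. The labels msg and msg_len are illustrative names. global _start exports the entry point that ld uses by default.

```nasm
; hello.asm: a minimal 32-bit Linux "Hello, World!" sketch
section .data
    msg     db  "Hello, World!", 10   ; the string followed by a newline (ASCII 10)
    msg_len equ $ - msg               ; length computed by the assembler

section .text
    global _start                     ; entry point symbol expected by ld

_start:
    ; write(1, msg, msg_len)
    mov eax, 4          ; sys_write
    mov ebx, 1          ; file descriptor 1 = stdout
    mov ecx, msg        ; address of the string
    mov edx, msg_len    ; number of bytes to write
    int 0x80            ; trap into the kernel

    ; exit(0)
    mov eax, 1          ; sys_exit
    xor ebx, ebx        ; exit status 0
    int 0x80
```

The string lives in .data and the instructions live in .text, matching the sections described above.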
Now that we have written the program, we need to build it. To do so you will need NASM; installation instructions for NASM can be found here. Assembly code needs to be assembled and then linked into the correct executable format. Save the above code into hello.asm and follow the steps below:
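Let's break down what we just did, starting with nasm -f elf32 -o hello.o hello.asm:
-f elf32: output a 32-bit ELF object file, the object format used on Linux
-o hello.o: write the object file to hello.o
hello.asm: the source file to assemble
There are plenty of other output formats you can assemble into; running nasm -hf lists all of them. The next command is ld -m elf_i386 -o hello hello.o. Again, let's break it down:
-m elf_i386: link as a 32-bit x86 (i386) ELF executable, matching the elf32 object file
-o hello: name the resulting executable hello
hello.o: the object file produced by nasm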
Once all of these steps are done, you can run your output file like so: ./hello. If it prints Hello, World!, you have successfully assembled, linked, and run an assembly program.
That's all there is to it! Assembly code can be daunting at times, but it is pretty simple once you get the hang of it. In this course we have gone through the registers, the stack, sections, writing a simple program, and assembling and linking that program successfully. We hope that this course has been useful to you and that you have learned something from it. Once again:
This course is given to you for free by the Malcore team: https://m4lc.io/course/assembly/register
Consider registering, and using Malcore, so we can continue to provide free content for the entire community. You can also join our Discord server here: https://m4lc.io/course/assembly/discord
We offer free threat intel in our Discord via our custom designed Discord bot. Join the Discord to discuss this course in further detail or to ask questions.
Register | Purpose |
---|---|
EAX | Accumulator (arithmetic) |
EBX | Base register |
ECX | Counter (loops, shifts) |
EDX | Data register |
ESI | Source index |
EDI | Destination index |
ESP | Stack pointer |
EBP | Base pointer (stack frame) |
EIP | Instruction pointer |

Section Name | Description | Info | Notes |
---|---|---|---|
.text | Contains the code of the program | Executable by the CPU | Read-only to prevent modification |
.data | Holds the program's initialized static data, which the program may modify | Usually stores globals and initialized data | Read-write |
.bss | Stores uninitialized data | Variable sizes are known in advance, and the values start as 0 at runtime | Saves space in the executable file, because the zero values are not stored in it |
.rdata | Stores read-only data | Data is not modified during execution | Primary purpose is to hold constant data; ELF (Linux) binaries name this section .rodata |