Introduction
Hey there, fellow tech enthusiasts! Welcome to a wild ride through the world of assembly language programming. Before we dive in, let’s get comfortable. So, grab a cold beer, cup of coffee, or refreshing juice, and let’s embark on this journey together.
I know, I know. Assembly language sounds intimidating, like trying to understand ancient hieroglyphics or deciphering your doctor’s handwriting. But fear not! We will break down these dense concepts into bite-sized, digestible chunks. By the end of this post, you’ll not only understand assembly language but might even find it… fun! (Yes, I said it!)
Why Should You Stick Around?
I get it – low-level programming can seem tedious at first, but trust me, it’s worth the effort. Understanding assembly language will give you superpowers!
Explanation of Assembly Language
Assembly language is a low-level programming language closely related to machine code, the native language of a computer’s central processing unit (CPU). Each instruction in assembly language generally corresponds directly to a single machine code instruction, making it a powerful tool for understanding how software interacts directly with hardware.
To illustrate this better, I will present you with a scenario. Imagine you have a recipe and need to give the instructions for a dish to someone who only speaks French. You have two friends: one speaks Mandarin and French, and the other speaks Mandarin and English. Assuming you only speak English, you would tell the instructions to your English-speaking friend, who would translate them into Mandarin for your other friend, who would in turn relay the steps in French. You can probably already tell where I am going with this: that’s basically what happens when you write a program in a higher-level language like Python or C and it gets translated, layer by layer, into machine code. Given the blazing speeds of computers nowadays, it might still be better to use this chain rather than learning French. Heck, you might never have to speak a word, and everything will still work amazingly. Still, if you ever want to look at a French recipe and your friends are nowhere near, that’s the end of it. You will never figure out what delicious meal you are missing. So learn French! Well, I mean Assembly…
Importance of Knowing Assembly
Learning assembly language gives you a solid understanding of how computers work under the hood. You will also learn to appreciate the sophistication and abstraction of modern programming languages. Beyond that, knowing assembly is priceless for tasks that require fine-grained control over system resources, such as writing operating systems, developing embedded systems, and performing low-level hardware manipulation. Understanding assembly can help you optimize code and debug complex issues, even when you work in higher-level languages. Additionally, if you enjoy participating in CTFs, assembly knowledge will prove essential for understanding and solving pwn/binary exploitation challenges.
Practical Uses of Assembly Language
Assembly language is used in various fields, such as:
- Embedded Systems Development: Programming microcontrollers and other embedded devices.
- Operating System Development: Writing kernels and other low-level system components.
- Reverse Engineering: Analyzing software to understand its functionality and discover vulnerabilities.
- Performance Optimization: Fine-tuning critical sections of code for maximum efficiency.
How Assembly Works
Assembly language works by providing human-readable mnemonics that correspond to machine code instructions. The process typically involves writing assembly code, which an assembler converts into machine code. This machine code can then be executed directly by the CPU.
Understanding Registers
Registers are small, fast storage locations within a CPU that hold data and addresses. They are crucial in executing instructions by providing quick access to operands and intermediate results. Different types of registers include:
- General Purpose Registers (GPRs): Used for various functions.
- Special Purpose Registers: Used for specific tasks like instruction pointers and status flags.
32-bit vs. 64-bit vs. x86 Architecture
The main differences between 32-bit and 64-bit architectures lie in the size of the registers and the width of the memory addresses the CPU can work with. Think of it as a bus: the bigger the bus, the more people you can carry around, right? A 64-bit architecture can address far more memory (a 32-bit address reaches at most 4 GiB) and operate on larger values in a single instruction compared to a 32-bit architecture. The x86 architecture started out as 16-bit with the Intel 8086, grew to 32-bit, and has since evolved into x86-64, which supports 64-bit processing.
In the following sections, I will provide you with a basic understanding of registers, their use, and standard instructions for manipulating them in Assembly. The registers I am using are for 64-bit architecture, but if you understand them, you can easily transfer that knowledge to other types of architecture. Don’t let any of this intimidate you. Stick around; I promise to make it easier to understand further on.
Registers and Their Usage
- %rip (Instruction Pointer):
- Purpose: Holds the address of the next instruction to be executed.
- Usage: Automatically incremented as the CPU fetches and executes instructions.
- %rsp (Stack Pointer):
- Purpose: Points to the top of the stack, which is the most recently pushed value in the current stack frame.
- Usage: Used to manage the stack, especially during function calls and local variable storage.
- %rax (Return Value):
- Purpose: Stores the return value of functions.
- Usage: The primary register for storing function return values.
- %rdi (1st Argument):
- Purpose: Holds the first argument to functions.
- Usage: Used to pass the first argument to functions according to the calling convention.
- %rsi (2nd Argument):
- Purpose: Holds the second argument to functions.
- Usage: Used to pass the second argument to functions according to the calling convention.
- %rdx (3rd Argument):
- Purpose: Holds the third argument to functions.
- Usage: Used to pass the third argument to functions according to the calling convention.
- %rcx (4th Argument):
- Purpose: Holds the fourth argument to functions.
- Usage: Used to pass the fourth argument to functions according to the calling convention.
- %r8 (5th Argument):
- Purpose: Holds the fifth argument to functions.
- Usage: Used to pass the fifth argument to functions according to the calling convention.
- %r9 (6th Argument):
- Purpose: Holds the sixth argument to functions.
- Usage: Used to pass the sixth argument to functions according to the calling convention.
- %r10, %r11 (Callee-owned):
- Purpose: Temporary registers that the called function (callee) can freely use.
- Usage: The caller does not expect these registers to be preserved, which is why they are more commonly called caller-saved registers. They can be used for intermediate calculations and temporary storage within functions.
- %rbx, %rbp, %r12-%r15 (Caller-owned):
- Purpose: Registers that must be preserved across function calls, more commonly called callee-saved registers.
- Usage: If a function (callee) wants to use these registers, it must save their original values and restore them before returning to the caller. This ensures that the caller’s context is not disrupted.
- %eax (Extended Accumulator Register):
- Purpose: General-purpose register used for arithmetic, logic, and data transfer operations.
- Usage: Often used for arithmetic operations and as a general data register in 32-bit operations.
- %ebx (Base Register):
- Purpose: General-purpose register for base addressing.
- Usage: Often used to hold base addresses in memory access, especially in older calling conventions.
- %ecx (Count Register):
- Purpose: General-purpose register used for loop counters and shift operations.
- Usage: Commonly used in loops and for the rep prefix in string operations.
- %edx (Extended Data Register):
- Purpose: General-purpose register used in arithmetic operations and I/O operations.
- Usage: Often paired with %eax for multiply/divide operations and used to hold data in 32-bit operations.
- %edi (Destination Index):
- Purpose: Used as a destination pointer for string and memory operations.
- Usage: Often used in string operations like movs, stos, cmps, and scas.
- %esi (Source Index):
- Purpose: Used as a source pointer for string and memory operations.
- Usage: Often used in string operations like movs, lods, and cmps.
Detailed Usage in Function Calls:
- Function Prologue:
- When a function is called, the caller’s %rbp (base pointer) is typically pushed onto the stack, and %rbp is then set to the current %rsp (stack pointer) to create a new stack frame.
- Caller-owned registers like %rbx, %rbp, and %r12-%r15 are saved if they are to be used.
- Function Epilogue:
- Before returning from a function, the stack frame is dismantled, restoring %rsp and %rbp.
- Caller-owned registers are restored to their original values if they were modified.
Most Frequently Used Assembly Instructions
Data Transfer Instructions
- MOV
- Purpose: Copy data from one location to another.
- Example:
MOV RAX, RBX ; Copy the value in RBX to RAX
MOV RAX, 10 ; Load the immediate value 10 into RAX
MOV [RAX], RBX ; Copy the value in RBX to the memory location pointed to by RAX
- LEA (Load Effective Address)
- Purpose: Load the address of the source operand into the destination register.
- Example:
LEA RAX, [RBX+RCX*4] ; Load the effective address of the expression into RAX
LEA RDI, [RBP-0x20] ; Load the address of RBP-0x20 into RDI
- PUSH
- Purpose: Push a value onto the stack.
- Example:
PUSH RAX ; Push the value in RAX onto the stack
- POP
- Purpose: Pop a value from the stack.
- Example:
POP RAX ; Pop the top value from the stack into RAX
Arithmetic Instructions
- ADD
- Purpose: Add two values.
- Example:
ADD RAX, RBX ; Add the value in RBX to RAX
ADD RAX, 5 ; Add the immediate value 5 to RAX
- SUB
- Purpose: Subtract one value from another.
- Example:
SUB RAX, RBX ; Subtract the value in RBX from RAX
SUB RAX, 5 ; Subtract the immediate value 5 from RAX
- MUL
- Purpose: Multiply values.
- Example:
MUL RBX ; Multiply RAX by RBX (unsigned), 128-bit result in RDX:RAX
- DIV
- Purpose: Divide values.
- Example:
DIV RBX ; Divide RDX:RAX by RBX (unsigned), quotient in RAX, remainder in RDX
- INC
- Purpose: Increment a value by 1.
- Example:
INC RAX ; Increment RAX by 1
- DEC
- Purpose: Decrement a value by 1.
- Example:
DEC RAX ; Decrement RAX by 1
Control Flow Instructions
- CALL
- Purpose: Call a procedure (function).
- Example:
CALL MyFunction ; Call the procedure MyFunction
- RET
- Purpose: Return from a procedure (function).
- Example:
RET ; Return from the current procedure
- JMP
- Purpose: Jump to a specified location.
- Example:
JMP MyLabel ; Jump to the label MyLabel
- JE / JZ (Jump if Equal / Zero)
- Purpose: Jump if the comparison resulted in equality/zero.
- Example:
JE MyLabel ; Jump to MyLabel if the comparison (CMP) resulted in equality
JZ MyLabel ; Jump to MyLabel if the comparison (CMP) resulted in zero
- JNE / JNZ (Jump if Not Equal / Not Zero)
- Purpose: Jump if the comparison did not result in equality/zero.
- Example:
JNE MyLabel ; Jump to MyLabel if the comparison (CMP) did not result in equality
JNZ MyLabel ; Jump to MyLabel if the comparison (CMP) did not result in zero
- JG / JNLE (Jump if Greater / Not Less or Equal)
- Purpose: Jump if the first operand is greater than the second (signed comparison).
- Example:
JG MyLabel ; Jump to MyLabel if the first operand is greater than the second
JNLE MyLabel ; Jump to MyLabel if the first operand is not less or equal to the second
- JL / JNGE (Jump if Less / Not Greater or Equal)
- Purpose: Jump if the first operand is less than the second (signed comparison).
- Example:
JL MyLabel ; Jump to MyLabel if the first operand is less than the second
JNGE MyLabel ; Jump to MyLabel if the first operand is not greater or equal to the second
Logical Instructions
- AND
- Purpose: Perform a bitwise AND.
- Example:
AND RAX, RBX ; Perform a bitwise AND on RAX and RBX, result in RAX
- OR
- Purpose: Perform a bitwise OR.
- Example:
OR RAX, RBX ; Perform a bitwise OR on RAX and RBX, result in RAX
- XOR
- Purpose: Perform a bitwise XOR.
- Example:
XOR RAX, RBX ; Perform a bitwise XOR on RAX and RBX, result in RAX
- NOT
- Purpose: Perform a bitwise NOT.
- Example:
NOT RAX ; Perform a bitwise NOT on RAX
Shift and Rotate Instructions
- SHL (Shift Left)
- Purpose: Shift bits left.
- Example:
SHL RAX, 1 ; Shift the bits in RAX left by 1
- SHR (Shift Right)
- Purpose: Shift bits right.
- Example:
SHR RAX, 1 ; Shift the bits in RAX right by 1
Comparison and Test Instructions
- CMP
- Purpose: Compare two values.
- Example:
CMP RAX, RBX ; Compare RAX and RBX
CMP RAX, 10 ; Compare RAX and the immediate value 10
Miscellaneous Instructions
- NOP
- Purpose: No operation.
- Example:
NOP ; Do nothing
- TEST
- Purpose: Perform a bitwise AND between two operands and set the flags based on the result, but do not store the result.
- Example:
TEST RAX, RAX ; Perform a bitwise AND of RAX with itself and set flags
- LEAVE
- Purpose: Set the stack pointer to the base pointer and then pop the base pointer from the stack.
- Example:
LEAVE ; Equivalent to MOV RSP, RBP and then POP RBP
SIMD (Single Instruction, Multiple Data) Instructions
- MOVAPS (Move Aligned Packed Single-Precision Floating-Point Values)
- Purpose: Move 128-bit packed single-precision floating-point values from the source operand to the destination operand. When an operand is a memory location, it must be aligned on a 16-byte boundary.
- Example:
MOVAPS XMM1, [RAX] ; Move 128-bit value from memory at address RAX to XMM1
MOVAPS [RAX], XMM1 ; Move 128-bit value from XMM1 to memory at address RAX
MOVAPS XMM1, XMM2 ; Move 128-bit value from XMM2 to XMM1
Concepts to Understand Assembly Code
Now that we have all the jargon out of the way, I will attempt to make it easier to understand.
- Stack Frame: Imagine your program’s memory like a stack of plates at a buffet. Each plate represents a function call, with the top plate being the currently active function. When you call a new function, you add a new plate to the stack (push). When the function finishes, you remove the top plate (pop). The stack frame is that plate: the place where the function keeps its local variables, saved registers, return address, and other important data.
- Base Pointer (rbp): Think of the base pointer (rbp) as a bookmark in your favorite book. When you open a new chapter (function), you place a bookmark to remember where you started. This way, no matter how deep you go into sub-chapters (nested function calls), you can always find your way back to the beginning of the chapter.
- Stack Pointer (rsp): If the base pointer is the bookmark, the stack pointer (rsp) is your finger that moves along the page as you read. It always points to the top of the stack, the most recently added piece of data, so it marks where data will be added or removed next. It’s constantly changing as you push, pop, call, and return from functions.
- Registers: Registers are like super-fast sticky notes that the CPU uses to keep track of important information. They’re limited in number but incredibly quick to access. General-purpose registers (like eax, ebx, etc.) can hold anything from data to addresses, while special-purpose registers have specific jobs (e.g., instruction pointer rip keeps track of the next instruction to execute).
- Prologue and Epilogue: These are the opening and closing acts of a function. The prologue sets up the stage (creates the stack frame, saves the previous base pointer), and the epilogue cleans up (restores the base pointer, resets the stack pointer) before the function exits. It’s like setting up and breaking down a party – you prepare before guests arrive and clean up after they leave.
Imagine you’re organizing a treasure hunt at a summer camp. Each camper represents a function in your program.
Step 1: Prologue (Preparing for the Hunt)
- When a camper (function) starts the hunt, they grab a new backpack (create a stack frame) and put a name tag inside (push rbp).
- They write down the map coordinates of the camp entrance (mov rbp, rsp) so they can find their way back.
Step 2: The Hunt (Executing the Function)
- The camper collects clues (operands and results) and stores them in their pockets (registers like eax, ebx).
- They keep track of where they found each clue using sticky notes (general-purpose registers).
Step 3: Nested Hunts (Calling Another Function)
- If a camper needs help, they call another camper (function call), giving them a new backpack (new stack frame) and sharing some clues (passing arguments, in registers like rdi and rsi on x86-64, or on the stack in older 32-bit calling conventions).
- The new camper follows the same preparation steps (push rbp, mov rbp, rsp) and goes off to gather more clues.
Step 4: Epilogue (Cleaning Up)
- When a camper finishes their hunt, they place the final clue where everyone expects to find it (return value in eax).
- They remove their name tag from the backpack (pop rbp), ensuring everything is back in place.
Step 5: Returning with Clues (Returning from Function)
- The camper returns to the campfire (ret), giving back their backpack and sharing the final clue they found.
- The camp director (CPU) checks the final clue and decides the next step in the treasure hunt (next instruction).
By thinking of functions as campers on a treasure hunt, stack frames as their backpacks, base pointers as bookmarks, and stack pointers as their moving fingers, the concepts become more intuitive and fun to understand. Plus, who doesn’t love a good treasure hunt?
Applying the Knowledge
C Code
// Takes two numbers and returns their sum
int add(int a, int b) {
    return a + b;
}
// Main function and entry point of the program
int main() {
    int x = 2;
    int y = 10;
    int result = add(x, y);
    if (result > 11) {
        return result;
    } else {
        return 0;
    }
}
Assembly Code (x86-64)
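; add:
push rbp                        ; save the caller's base pointer
mov rbp,rsp                     ; start a new stack frame
mov DWORD PTR [rbp-0x4],edi     ; store the first argument (a) in the frame
mov DWORD PTR [rbp-0x8],esi     ; store the second argument (b) in the frame
mov edx,DWORD PTR [rbp-0x4]     ; edx = a
mov eax,DWORD PTR [rbp-0x8]     ; eax = b
add eax,edx                     ; eax = b + a, the return value
pop rbp                         ; restore the caller's base pointer
ret                             ; return to the caller
; main:
push rbp                        ; save the caller's base pointer
mov rbp,rsp                     ; start a new stack frame
sub rsp,0x10                    ; reserve 16 bytes for local variables
mov DWORD PTR [rbp-0x4],0x2     ; x = 2
mov DWORD PTR [rbp-0x8],0xa     ; y = 10
mov edx,DWORD PTR [rbp-0x8]     ; edx = y
mov eax,DWORD PTR [rbp-0x4]     ; eax = x
mov esi,edx                     ; second argument = y
mov edi,eax                     ; first argument = x
call 0x555555555129 <add>       ; call add(x, y)
mov DWORD PTR [rbp-0xc],eax     ; result = value returned in eax
cmp DWORD PTR [rbp-0xc],0xb     ; compare result with 11
jle 0x555555555170 <main+51>    ; if result <= 11, jump to the else branch
mov eax,DWORD PTR [rbp-0xc]     ; return value = result
jmp 0x555555555175 <main+56>    ; skip over the else branch
mov eax,0x0                     ; main+51: else branch, return value = 0
leave                           ; main+56: tear down the stack frame
ret                             ; return to the caller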
Let’s Break Down the Code!
Alright, let’s dive into our treasure hunt step by step. We’ll go over the assembly code, and I will explain what each part does in a friendly, easy-to-understand way.
And there you have it! You’ve taken your first steps into the fascinating world of assembly language. From registers and stack frames to the detailed mechanics of function calls, you’ve seen what really goes on under the hood of your programs.
Understanding assembly isn’t just for creating low-level magic—it’s about appreciating the intricacies of how your code actually runs. It’s like knowing how the gears and springs work in a watch; you get a deeper sense of the precision and complexity involved.
So, next time you write code in a high-level language, you’ll have a newfound respect for the tiny instructions and the diligent registers making it all happen. Whether you’re optimizing performance or just curious, assembly language gives you a closer look at the true nuts and bolts of computing.
Thanks for joining me on this journey through the basics of assembly. Keep exploring, stay curious, and happy coding!