
Intro to Assembly

  • Post category:Resources / Tutorials
  • Post last modified:June 16, 2024
  • Reading time:33 mins read

Introduction

Hey there, fellow tech enthusiasts! Welcome to a wild ride through the world of assembly language programming. Before we dive in, let’s get comfortable. So, grab a cold beer, cup of coffee, or refreshing juice, and let’s embark on this journey together.

I know, I know. Assembly language sounds intimidating, like trying to understand ancient hieroglyphics or deciphering your doctor’s handwriting. But fear not! We will break down these dense concepts into bite-sized, digestible chunks. By the end of this post, you’ll not only understand assembly language but might even find it… fun! (Yes, I said it!)

Why Should You Stick Around?

I get it – low-level programming can seem tedious at first, but trust me, it’s worth the effort. Understanding assembly language will give you superpowers!

Explanation of Assembly Language

Assembly language is a low-level programming language closely related to machine code, the native language of a computer’s central processing unit (CPU). Each assembly instruction generally corresponds to a single machine code instruction, making it a powerful tool for understanding how software interacts directly with hardware.

To illustrate this, here is a scenario. Imagine you have a recipe and need to give the instructions to someone who only speaks French. You have two friends: one speaks Mandarin and French, and the other speaks Mandarin and English. Assuming you only speak English, you would tell the instructions to your English-speaking friend, who would translate them into Mandarin for your other friend, who would finally relay the steps in French. You can probably tell where I am going with this: that’s basically what happens when you program in a higher-level language like Python and a compiler or interpreter translates your code, step by step, down to machine code. Given the blazing speed of computers nowadays, it might still be better to use this chain than to learn French. Heck, you might never have to speak a word, and everything will still work amazingly. Still, if you ever want to read a French recipe and your friends are nowhere near, that’s the end of it. You will never figure out what delicious meal you are missing. So learn French! Well, I mean Assembly…

Importance of Knowing Assembly

Learning assembly language gives you a solid understanding of how computers work under the hood. You will also learn to appreciate the sophistication and abstraction of modern programming languages. Beyond that, knowing assembly is invaluable for tasks that require fine-grained control over system resources, such as writing operating systems, developing embedded systems, and performing low-level hardware manipulation. Understanding assembly can also help you optimize code and debug complex issues, even in higher-level languages. Additionally, if you enjoy participating in CTFs, assembly knowledge will prove essential for understanding and solving pwn/binary exploitation challenges.

Practical Uses of Assembly Language

Assembly language is used in various fields, such as:

  • Embedded Systems Development: Programming microcontrollers and other embedded devices.
  • Operating System Development: Writing kernels and other low-level system components.
  • Reverse Engineering: Analyzing software to understand its functionality and discover vulnerabilities.
  • Performance Optimization: Fine-tuning critical sections of code for maximum efficiency.

How Assembly Works

Assembly language works by providing human-readable mnemonics that correspond to machine code instructions. The process typically involves writing assembly code, which an assembler converts into machine code. This machine code can then be executed directly by the CPU.
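To make the "mnemonic in, machine code out" idea concrete, here is a minimal sketch in C of a toy lookup table. It is not how a real assembler is built; the names `toy_assemble` and `struct encoding` are made up for illustration, and the table only covers a handful of x86-64 instructions whose encodings are fixed.

```c
#include <stddef.h>
#include <string.h>

/* One table entry: a mnemonic and the machine-code bytes it encodes to. */
struct encoding {
    const char *mnemonic;
    unsigned char bytes[3];
    size_t length;
};

/* A few x86-64 instructions with fixed encodings (toy table, not exhaustive). */
static const struct encoding TABLE[] = {
    { "nop",         { 0x90 },             1 },
    { "ret",         { 0xC3 },             1 },
    { "leave",       { 0xC9 },             1 },
    { "push rbp",    { 0x55 },             1 },
    { "pop rbp",     { 0x5D },             1 },
    { "mov rbp,rsp", { 0x48, 0x89, 0xE5 }, 3 },
};

/* Return the entry for a mnemonic, or NULL if the toy table does not know it. */
const struct encoding *toy_assemble(const char *text) {
    for (size_t i = 0; i < sizeof TABLE / sizeof TABLE[0]; i++) {
        if (strcmp(TABLE[i].mnemonic, text) == 0) {
            return &TABLE[i];
        }
    }
    return NULL;
}
```

Looking up "ret" gives the single byte 0xC3, while "mov rbp,rsp" needs three bytes. A real assembler also has to encode operands, addresses, and labels, which is where most of the complexity lives.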

Understanding Registers

Registers are small, fast storage locations within a CPU that hold data and addresses. They are crucial in executing instructions by providing quick access to operands and intermediate results. Different types of registers include:

  • General Purpose Registers (GPRs): Used for various functions.
  • Special Purpose Registers: Used for specific tasks like instruction pointers and status flags.

32-bit vs. 64-bit vs. x86 Architecture

The main differences between 32-bit and 64-bit architectures lie in the size of the registers and of memory addresses. The bigger the bus, the more people you can carry around, right? A 64-bit architecture can address far more memory and operate on larger values in a single instruction than a 32-bit architecture. The x86 architecture started out as 16-bit with the Intel 8086, grew to 32 bits, and later evolved into x86-64, which supports 64-bit processing.
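A quick way to feel the difference in register width is to watch a value overflow. The sketch below uses C fixed-width integers as stand-ins for a 32-bit and a 64-bit register; the function names `add32` and `add64` are mine, chosen for illustration.

```c
#include <stdint.h>

/* Addition in a 32-bit "register": the result wraps around modulo 2^32. */
uint32_t add32(uint32_t a, uint32_t b) {
    return a + b;
}

/* The same addition in a 64-bit "register" has room for the carry. */
uint64_t add64(uint64_t a, uint64_t b) {
    return a + b;
}
```

Adding 1 to 0xFFFFFFFF wraps to 0 in the 32-bit version but yields 0x100000000 in the 64-bit one.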

In the following sections, I will provide you with a basic understanding of registers, their use, and standard instructions for manipulating them in Assembly. The registers I am using are for 64-bit architecture, but if you understand them, you can easily transfer that knowledge to other types of architecture. Don’t let any of this intimidate you. Stick around; I promise to make it easier to understand further on.

Registers and Their Usage

  1. %rip (Instruction Pointer):
    • Purpose: Holds the address of the next instruction to be executed.
    • Usage: Automatically incremented as the CPU fetches and executes instructions.
  2. %rsp (Stack Pointer):
    • Purpose: Points to the top of the stack, which is the lowest address currently in use.
    • Usage: Used to manage the stack, especially during function calls and local variable storage.
  3. %rax (Return Value):
    • Purpose: Stores the return value of functions.
    • Usage: The primary register for storing function return values.
  4. %rdi (1st Argument):
    • Purpose: Holds the first argument to functions.
    • Usage: Used to pass the first argument to functions according to the calling convention.
  5. %rsi (2nd Argument):
    • Purpose: Holds the second argument to functions.
    • Usage: Used to pass the second argument to functions according to the calling convention.
  6. %rdx (3rd Argument):
    • Purpose: Holds the third argument to functions.
    • Usage: Used to pass the third argument to functions according to the calling convention.
  7. %rcx (4th Argument):
    • Purpose: Holds the fourth argument to functions.
    • Usage: Used to pass the fourth argument to functions according to the calling convention.
  8. %r8 (5th Argument):
    • Purpose: Holds the fifth argument to functions.
    • Usage: Used to pass the fifth argument to functions according to the calling convention.
  9. %r9 (6th Argument):
    • Purpose: Holds the sixth argument to functions.
    • Usage: Used to pass the sixth argument to functions according to the calling convention.
  10. %r10, %r11 (Callee-owned, also known as caller-saved):
    • Purpose: Temporary registers that the called function (callee) can freely use.
    • Usage: The caller does not expect these registers to be preserved. They can be used for intermediate calculations and temporary storage within functions. If the caller still needs their values after a call, it must save them itself.
  11. %rbx, %rbp, %r12-%r15 (Caller-owned, also known as callee-saved):
    • Purpose: Registers that must be preserved across function calls.
    • Usage: If a function (callee) wants to use these registers, it must save their original values and restore them before returning to the caller. This ensures that the caller’s context is not disrupted.
  12. %eax (Extended Accumulator Register):
    • Purpose: General-purpose register used for arithmetic, logic, and data transfer operations.
    • Usage: Often used for arithmetic operations and as a general data register in 32-bit operations.
  13. %ebx (Base Register):
    • Purpose: General-purpose register for base addressing.
    • Usage: Often used to hold base addresses in memory access, especially in older calling conventions.
  14. %ecx (Count Register):
    • Purpose: General-purpose register used for loop counters and shift operations.
    • Usage: Commonly used in loops and for the rep prefix in string operations.
  15. %edx (Extended Data Register):
    • Purpose: General-purpose register used in arithmetic operations and I/O operations.
    • Usage: Often paired with %eax for multiply/divide operations and used to hold data in 32-bit operations.
  16. %edi (Destination Index):
    • Purpose: Used as a destination pointer for string and memory operations.
    • Usage: Often used in string operations like movs, stos, cmps, and scas.
  17. %esi (Source Index):
    • Purpose: Used as a source pointer for string and memory operations.
    • Usage: Often used in string operations like movs, lods, and cmps.
Detailed Usage in Function Calls:
  • Function Prologue:
    • When a function is called, it typically pushes the caller’s %rbp (base pointer) onto the stack and then copies %rsp (stack pointer) into %rbp to create a new stack frame.
    • Caller-owned registers like %rbx and %r12-%r15 are also saved if the function is going to use them.
  • Function Epilogue:
    • Before returning from a function, the stack frame is dismantled, restoring %rsp and %rbp.
    • Caller-owned registers are restored to their original values if they were modified.
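The list above mixes 64-bit names (%rax) and 32-bit names (%eax). On x86-64 they are two views of the same register: %eax is the low 32 bits of %rax. Here is a small C sketch of that relationship; the helper names `view_eax` and `write_eax` are hypothetical and only model the behavior.

```c
#include <stdint.h>

/* Reading eax gives the low 32 bits of rax. */
uint32_t view_eax(uint64_t rax) {
    return (uint32_t)rax;
}

/* Writing eax on x86-64 replaces the low 32 bits and zeroes the upper 32 bits,
   so the old value of rax does not survive at all. */
uint64_t write_eax(uint64_t old_rax, uint32_t value) {
    (void)old_rax;  /* intentionally unused: the upper half is cleared */
    return (uint64_t)value;
}
```

This is why a function returning an int can leave its result in eax while the caller, if it wants, reads the same value from rax.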

Assembly Most Frequent Commands


Data Transfer Instructions

  1. MOV
    • Purpose: Copy data from one location to another.
    • Example:
      MOV RAX, RBX ; Copy the value in RBX to RAX
      MOV RAX, 10 ; Load the immediate value 10 into RAX
      MOV [RAX], RBX ; Copy the value in RBX to the memory location pointed to by RAX
  2. LEA (Load Effective Address)
    • Purpose: Load the address of the source operand into the destination register.
    • Example:
      LEA RAX, [RBX+RCX*4] ; Load the effective address of the expression into RAX
      LEA RDI, [RBP-0x20] ; Load the address of RBP-0x20 into RDI
  3. PUSH
    • Purpose: Push a value onto the stack.
    • Example:
      PUSH RAX ; Push the value in RAX onto the stack
  4. POP
    • Purpose: Pop a value from the stack.
    • Example:
      POP RAX ; Pop the top value from the stack into RAX
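PUSH and POP are easier to picture once you remember that the stack grows toward lower addresses. Below is a rough C model of the two instructions for 8-byte values. The `struct toy_cpu` type and the `toy_` functions are invented for this sketch, and there are no bounds checks, so treat it as an illustration only.

```c
#include <stdint.h>
#include <string.h>

#define STACK_BYTES 64

/* A tiny stack plus a stack pointer expressed as an offset into it. */
struct toy_cpu {
    unsigned char stack[STACK_BYTES];
    uint64_t rsp;  /* STACK_BYTES means "empty"; smaller means "more pushed" */
};

void toy_init(struct toy_cpu *cpu) {
    memset(cpu->stack, 0, sizeof cpu->stack);
    cpu->rsp = STACK_BYTES;
}

/* PUSH: move rsp down by 8, then store the value where rsp now points. */
void toy_push(struct toy_cpu *cpu, uint64_t value) {
    cpu->rsp -= 8;
    memcpy(&cpu->stack[cpu->rsp], &value, sizeof value);
}

/* POP: load the value rsp points to, then move rsp back up by 8. */
uint64_t toy_pop(struct toy_cpu *cpu) {
    uint64_t value;
    memcpy(&value, &cpu->stack[cpu->rsp], sizeof value);
    cpu->rsp += 8;
    return value;
}
```

Pushing two values and popping twice returns them in reverse order, and rsp ends up exactly where it started.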

Arithmetic Instructions

  1. ADD
    • Purpose: Add two values.
    • Example:
      ADD RAX, RBX ; Add the value in RBX to RAX
      ADD RAX, 5 ; Add the immediate value 5 to RAX
  2. SUB
    • Purpose: Subtract one value from another.
    • Example:
      SUB RAX, RBX ; Subtract the value in RBX from RAX
      SUB RAX, 5 ; Subtract the immediate value 5 from RAX
  3. MUL
    • Purpose: Multiply values.
    • Example:
      MUL RBX ; Unsigned multiply RAX by RBX, 128-bit result in RDX:RAX (high half in RDX, low half in RAX)
  4. DIV
    • Purpose: Divide values.
    • Example:
      DIV RBX ; Unsigned divide RDX:RAX by RBX, quotient in RAX, remainder in RDX
  5. INC
    • Purpose: Increment a value by 1.
    • Example:
      INC RAX ; Increment RAX by 1
  6. DEC
    • Purpose: Decrement a value by 1.
    • Example:
      DEC RAX ; Decrement RAX by 1
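MUL and DIV are the odd ones out because they work on a register pair. To keep the C portable, the sketch below models the 32-bit forms (MUL r32 and DIV r32, which use EDX:EAX) rather than the 64-bit forms shown above; the idea is the same with RDX:RAX. The `struct regs` type and function names are assumptions made for this example.

```c
#include <stdint.h>

struct regs {
    uint32_t eax;
    uint32_t edx;
};

/* MUL r32: EDX:EAX = EAX * operand (unsigned). */
void toy_mul32(struct regs *r, uint32_t operand) {
    uint64_t product = (uint64_t)r->eax * operand;
    r->eax = (uint32_t)product;          /* low half  */
    r->edx = (uint32_t)(product >> 32);  /* high half */
}

/* DIV r32: divide EDX:EAX by operand; quotient to EAX, remainder to EDX.
   Returns -1 in the cases where a real CPU would raise a divide error
   (division by zero, or a quotient that does not fit in 32 bits). */
int toy_div32(struct regs *r, uint32_t operand) {
    if (operand == 0) {
        return -1;
    }
    uint64_t dividend = ((uint64_t)r->edx << 32) | r->eax;
    uint64_t quotient = dividend / operand;
    if (quotient > UINT32_MAX) {
        return -1;
    }
    r->edx = (uint32_t)(dividend % operand);
    r->eax = (uint32_t)quotient;
    return 0;
}
```

One practical consequence: before a DIV you usually need to clear (or sign-extend into) the high register, otherwise leftover data in it becomes part of the dividend.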

Control Flow Instructions

  1. CALL
    • Purpose: Call a procedure (function).
    • Example:
      CALL MyFunction ; Call the procedure MyFunction
  2. RET
    • Purpose: Return from a procedure (function).
    • Example:
      RET ; Return from the current procedure
  3. JMP
    • Purpose: Jump to a specified location.
    • Example:
      JMP MyLabel ; Jump to the label MyLabel
  4. JE / JZ (Jump if Equal / Zero)
    • Purpose: Jump if the comparison resulted in equality/zero.
    • Example:
      JE MyLabel ; Jump to MyLabel if the comparison (CMP) resulted in equality
      JZ MyLabel ; Jump to MyLabel if the comparison (CMP) resulted in zero
  5. JNE / JNZ (Jump if Not Equal / Not Zero)
    • Purpose: Jump if the comparison did not result in equality/zero.
    • Example:
      JNE MyLabel ; Jump to MyLabel if the comparison (CMP) did not result in equality
      JNZ MyLabel ; Jump to MyLabel if the comparison (CMP) did not result in zero
  6. JG / JNLE (Jump if Greater / Not Less or Equal)
    • Purpose: Jump if the first operand is greater than the second (signed comparison).
    • Example:
      JG MyLabel ; Jump to MyLabel if the first operand is greater than the second
      JNLE MyLabel ; Jump to MyLabel if the first operand is not less than or equal to the second
  7. JL / JNGE (Jump if Less / Not Greater or Equal)
    • Purpose: Jump if the first operand is less than the second (signed comparison).
    • Example:
      JL MyLabel ; Jump to MyLabel if the first operand is less than the second
      JNGE MyLabel ; Jump to MyLabel if the first operand is not greater than or equal to the second

Logical Instructions

  1. AND
    • Purpose: Perform a bitwise AND.
    • Example:
      AND RAX, RBX ; Perform a bitwise AND on RAX and RBX, result in RAX
  2. OR
    • Purpose: Perform a bitwise OR.
    • Example:
      OR RAX, RBX ; Perform a bitwise OR on RAX and RBX, result in RAX
  3. XOR
    • Purpose: Perform a bitwise XOR.
    • Example:
      XOR RAX, RBX ; Perform a bitwise XOR on RAX and RBX, result in RAX
  4. NOT
    • Purpose: Perform a bitwise NOT.
    • Example:
      NOT RAX ; Perform a bitwise NOT on RAX

Shift and Rotate Instructions

  1. SHL (Shift Left)
    • Purpose: Shift bits left.
    • Example:
      SHL RAX, 1 ; Shift the bits in RAX left by 1
  2. SHR (Shift Right)
    • Purpose: Shift bits right.
    • Example:
      SHR RAX, 1 ; Shift the bits in RAX right by 1
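The logical and shift instructions map almost one-to-one onto C operators, which makes them a good place to build intuition. This is a minimal sketch with made-up function names; the only x86-specific detail modeled is that the shift count is masked to 6 bits for 64-bit operands.

```c
#include <stdint.h>

uint64_t toy_and(uint64_t a, uint64_t b) { return a & b; }  /* AND */
uint64_t toy_or(uint64_t a, uint64_t b)  { return a | b; }  /* OR  */
uint64_t toy_xor(uint64_t a, uint64_t b) { return a ^ b; }  /* XOR */
uint64_t toy_not(uint64_t a)             { return ~a; }     /* NOT */

/* SHL/SHR on a 64-bit register only look at the low 6 bits of the count. */
uint64_t toy_shl(uint64_t a, unsigned count) { return a << (count & 63u); }
uint64_t toy_shr(uint64_t a, unsigned count) { return a >> (count & 63u); }
```

Two idioms worth recognizing in real disassembly: XOR of a register with itself always gives 0 (a common way to zero a register), and shifting left by 1 doubles an unsigned value.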

Comparison and Test Instructions

  1. CMP
    • Purpose: Compare two values.
    • Example:
      CMP RAX, RBX ; Compare RAX and RBX
      CMP RAX, 10 ; Compare RAX and the immediate value 10
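CMP does not jump by itself. It subtracts, throws the result away, and leaves behind status flags that the conditional jumps read. The C sketch below models a 32-bit CMP and the conditions for JE, JG, JL, and JLE (JLE shows up in the worked example later in this post). The `struct flags` type and helper names are assumptions for illustration, and only three of the CPU's flags are modeled.

```c
#include <stdbool.h>
#include <stdint.h>

struct flags {
    bool zf;  /* zero flag: the result was 0 */
    bool sf;  /* sign flag: the top bit of the result is set */
    bool of;  /* overflow flag: signed subtraction overflowed */
};

/* CMP a, b: compute a - b, keep only the flags. */
struct flags toy_cmp32(int32_t a, int32_t b) {
    uint32_t ua = (uint32_t)a;
    uint32_t ub = (uint32_t)b;
    uint32_t diff = ua - ub;
    struct flags f;
    f.zf = (diff == 0);
    f.sf = (diff >> 31) != 0;
    /* Overflow: operands had different signs and the result's sign differs from a's. */
    f.of = (((ua ^ ub) & (ua ^ diff)) >> 31) != 0;
    return f;
}

bool takes_je(struct flags f)  { return f.zf; }
bool takes_jg(struct flags f)  { return !f.zf && f.sf == f.of; }
bool takes_jl(struct flags f)  { return f.sf != f.of; }
bool takes_jle(struct flags f) { return f.zf || f.sf != f.of; }
```

Comparing 12 with 11 leaves all three flags clear, so JG would be taken and JLE would not.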

Miscellaneous Instructions

  1. NOP
    • Purpose: No operation.
    • Example:
      NOP ; Do nothing
  2. TEST
    • Purpose: Perform a bitwise AND between two operands and set the flags based on the result, but do not store the result.
    • Example:
      TEST RAX, RAX ; Perform a bitwise AND of RAX with itself and set flags
  3. LEAVE
    • Purpose: Set the stack pointer to the base pointer and then pop the base pointer from the stack.
    • Example:
      LEAVE ; Equivalent to MOV RSP, RBP and then POP RBP

SIMD (Single Instruction, Multiple Data) Instructions

  1. MOVAPS (Move Aligned Packed Single-Precision Floating-Point Values)
    • Purpose: Move 128-bit packed single-precision floating-point values from the source operand to the destination operand. When one of the operands is a memory location, it must be aligned on a 16-byte boundary.
    • Example:
      MOVAPS XMM1, [RAX] ; Move 128-bit value from memory at address RAX to XMM1
      MOVAPS [RAX], XMM1 ; Move 128-bit value from XMM1 to memory at address RAX
      MOVAPS XMM1, XMM2 ; Move 128-bit value from XMM2 to XMM1
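"Aligned on a 16-byte boundary" simply means the address is a multiple of 16. A minimal C check, with a helper name I made up for this post, looks like this:

```c
#include <stdbool.h>
#include <stdint.h>

/* True when the address is a multiple of 16, as MOVAPS expects for memory operands. */
bool is_aligned16(const void *address) {
    return ((uintptr_t)address % 16) == 0;
}
```

In hexadecimal this is easy to eyeball: an address is 16-byte aligned when its last hex digit is 0.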

Concepts to Understand Assembly Code

Now that we have all the jargon out of the way, I will attempt to make it easier to understand.
  1. Stack Frame: Imagine your program’s memory like a stack of plates at a buffet. Each plate represents a function call, with the top plate being the currently active function. When you call a new function, you add a new plate to the stack (push). When the function finishes, you remove the top plate (pop). The stack frame is like a section on that plate where the function keeps all its variables, return addresses, and other important data.
  2. Base Pointer (rbp): Think of the base pointer (rbp) as a bookmark in your favorite book. When you open a new chapter (function), you place a bookmark to remember where you started. This way, no matter how deep you go into sub-chapters (nested function calls), you can always find your way back to the beginning of the chapter.
  3. Stack Pointer (rsp): If the base pointer is the bookmark, the stack pointer (rsp) is your finger that moves along the page as you read. It points to the current position in the stack where the next piece of data will be added or removed. It’s constantly changing as you call and return from functions.
  4. Registers: Registers are like super-fast sticky notes that the CPU uses to keep track of important information. They’re limited in number but incredibly quick to access. General-purpose registers (like eax, ebx, etc.) can hold anything from data to addresses, while special-purpose registers have specific jobs (e.g., instruction pointer rip keeps track of the next instruction to execute).
  5. Prologue and Epilogue: These are the opening and closing acts of a function. The prologue sets up the stage (creates the stack frame, saves the previous base pointer), and the epilogue cleans up (restores the base pointer, resets the stack pointer) before the function exits. It’s like setting up and breaking down a party – you prepare before guests arrive and clean up after they leave.
Imagine you’re organizing a treasure hunt at a summer camp. Each camper represents a function in your program.

Step 1: Prologue (Preparing for the Hunt)

  • When a camper (function) starts the hunt, they grab a new backpack (create a stack frame) and put a name tag inside (push rbp).
  • They write down the map coordinates of the camp entrance (mov rbp, rsp) so they can find their way back.

Step 2: The Hunt (Executing the Function)

  • The camper collects clues (operands and results) and stores them in their pockets (registers like eax, ebx).
  • They keep track of where they found each clue using sticky notes (general-purpose registers).

Step 3: Nested Hunts (Calling Another Function)

  • If a camper needs help, they call another camper (function call), giving them a new backpack (new stack frame) and sharing some clues (passing arguments, which on x86-64 usually travel in registers such as rdi and rsi).
  • The new camper follows the same preparation steps (push rbp, mov rbp, rsp) and goes off to gather more clues.

Step 4: Returning with Clues (Returning from Function)

  • When a camper finishes their hunt, they return to the campfire (return from function), giving back their backpack (pop rbp) and sharing the final clue they found (return value in eax).

Step 5: Epilogue (Cleaning Up)

  • They remove their name tag from the backpack (pop rbp), ensuring everything is back in place.
  • The camp director (CPU) checks the final clue and decides the next step in the treasure hunt (next instruction).

By thinking of functions as campers on a treasure hunt, stack frames as their backpacks, base pointers as bookmarks, and stack pointers as their moving fingers, the concepts become more intuitive and fun to understand. Plus, who doesn’t love a good treasure hunt?

Applying the knowledge

C Code
// Takes two numbers and returns their sum 
int add(int a, int b) {
    return a + b;
}

// Main function and entry point of the program
int main() {
    int x = 2;
    int y = 10;
    int result = add(x, y);
    if(result > 11) {
        return result;
    } else {
        return 0;
    }
}
Assembly Code (x86-64)
; add:
push   rbp
mov    rbp,rsp
mov    DWORD PTR [rbp-0x4],edi
mov    DWORD PTR [rbp-0x8],esi
mov    edx,DWORD PTR [rbp-0x4]
mov    eax,DWORD PTR [rbp-0x8]
add    eax,edx
pop    rbp
ret

; main:
push   rbp
mov    rbp,rsp
sub    rsp,0x10
mov    DWORD PTR [rbp-0x4],0x2
mov    DWORD PTR [rbp-0x8],0xa
mov    edx,DWORD PTR [rbp-0x8]
mov    eax,DWORD PTR [rbp-0x4]
mov    esi,edx
mov    edi,eax
call   0x555555555129 <add>
mov    DWORD PTR [rbp-0xc],eax
cmp    DWORD PTR [rbp-0xc],0xb
jle    0x555555555170 <main+51>
mov    eax,DWORD PTR [rbp-0xc]
jmp    0x555555555175 <main+56>
mov    eax,0x0
leave
ret

Let’s Break Down the Code!

Alright, let’s dive into our treasure hunt step by step. We’ll go over the assembly code, and I will explain what each part does in a friendly, easy-to-understand way.

Main

We start at the main function. For the rest of the slides, the assembly code is shown on the left and the stack on the right. The top of the stack is where RSP points, here 0xd98. Memory addresses are hex numbers; they are longer than that, but for simplicity we are only going to display the last part.

RIP points to the next instruction; we will see the effect of “push rbp” in the next slide.

1

push rbp: Save the base pointer (rbp) of the previous stack frame. This is the standard prologue in function calls to maintain the stack frame. Each small square represents a byte; since the rbp value is 8 bytes, it takes up the space in memory from 0xd90 to 0xd98.

Remember, RIP shows the next instruction, so what we are looking at is basically the effect of the first line.

2

mov rbp,rsp: Set the base pointer (rbp) to the current stack pointer (rsp), establishing a new stack frame for this function.

3

sub rsp,0x10: Allocate (reserve) 16 bytes (0x10 in hexadecimal) of space on the stack for local variables. This adjusts the stack pointer (rsp) down by 16 bytes (0xd90 – 16 bytes = 0xd80).

4

mov DWORD PTR [rbp-0x4],0x2: Store the value 2 into the memory location at rbp-0x4. This corresponds to the variable x in the C code. A DWORD (double word) is 4 bytes; since 2 is an int, which takes 4 bytes here, we reserve 4 bytes for it, and that is why the address is rbp – 0x4.

5

mov DWORD PTR [rbp-0x8],0xa: Store the value 10 into the memory location at rbp-0x8. This corresponds to the variable y in the C code. Very similar to the previous slide. We are doing rbp – 8 bytes since we have to account for the 4 bytes where we stored the number 2.
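If you want to double-check where the locals end up, the arithmetic is just the base pointer minus the offset. Here is a tiny C helper (the name `local_address` is mine) using the shortened addresses from the slides, where rbp is 0xd90 inside main:

```c
#include <stdint.h>

/* Address of a local stored at [rbp - offset]. */
uint64_t local_address(uint64_t rbp, uint64_t offset) {
    return rbp - offset;
}
```

With rbp at 0xd90, x at [rbp-0x4] lives at 0xd8c and y at [rbp-0x8] lives at 0xd88. The variable result, stored later at [rbp-0xc], lands at 0xd84.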

6

mov edx,DWORD PTR [rbp-0x8]: Move the value of y (which is stored at rbp-0x8) into the edx register.

7

mov eax,DWORD PTR [rbp-0x4]: Move the value of x (which is stored at rbp-0x4) into the eax register.

8

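mov esi,edx: Copy the value from the edx register into the esi register. In the System V AMD64 ABI, esi is used for the second argument of a function. Basically, we are setting up the parameters to be sent to the add function. The next instruction in the listing, mov edi,eax, does the same for the first argument, which travels in edi.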

9

call 0x555555555129 <add>: Call the add function, which is at memory address 0x555555555129. This instruction will push the return address onto the stack and jump to the add function.

add_1

Now we have moved into the add function. The return address was automatically pushed onto the stack, taking 8 bytes. The push also updated rsp, which moved down by 8 bytes accordingly.
If you are wondering why there are 8 bytes not being used (0xd88 – 0xd80), this is standard behavior on these systems and is done to keep the stack aligned. If you are interested in the why, I encourage you to look up stack alignment and understand why it is necessary.
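To see where the extra space comes from: main has three 4-byte locals (x, y, and result), which is 12 bytes, yet the listing reserves 0x10 = 16. One plausible way to get there, sketched below under the assumption that the compiler rounds the frame size up to the next multiple of 16, is the classic round-up trick. The function name is hypothetical, and a real compiler's frame layout involves more than this one formula.

```c
#include <stddef.h>

/* Round a byte count up to the next multiple of 16. */
size_t round_up_16(size_t bytes) {
    return (bytes + 15u) & ~(size_t)15u;
}
```

Rounding 12 up this way gives 16, which matches the sub rsp,0x10 in main.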

add_2

push rbp: Save the base pointer (rbp) of the previous stack frame. As routine we save the previous rbp value and next we start our stack frame.

add_3

mov rbp,rsp: Set the base pointer (rbp) to the current stack pointer (rsp), establishing a new stack frame for this function.

add_4

mov DWORD PTR [rbp-0x4],edi: Store the first argument (a) into the memory location at rbp-0x4. Saving the arguments to the stack so we can use them. Remember edi has our first argument and esi the second one.

add_5

mov DWORD PTR [rbp-0x8],esi: Store the second argument (b) into the memory location at rbp-0x8.

add_6

mov edx,DWORD PTR [rbp-0x4]: Move the value of a (stored at rbp-0x4) into the edx register. Now that we have saved the arguments to our stack frame, the function puts them in the proper registers to perform operations with them.

add_7

mov eax,DWORD PTR [rbp-0x8]: Move the value of b (stored at rbp-0x8) into the eax register.

add_8

add eax,edx: Add the values in eax and edx, and store the result in eax. This is the result of the add function.

add_9

pop rbp: Restore the previous base pointer (rbp) from the stack, effectively cleaning up the stack frame. Notice that rbp points to main’s stack frame again, and rsp now points at the return address.

main_11

ret: Return from the function. The return address is popped from the stack, and execution continues from there. Now we are back in the main function. Anything below rsp is considered garbage, and if another function is called it can overwrite it.

main_12

mov DWORD PTR [rbp-0xc],eax: Store the result of the add function (which is now in eax) into the memory location at rbp-0xc. This corresponds to the variable “result” in the C code.

main_13

cmp DWORD PTR [rbp-0xc],0xb: Compare the value of result (stored at rbp-0xc) with 11 (0xb in hexadecimal). It does this by performing a subtraction. Think of it as using the “sub” instruction but without overwriting any of the operands. Based on this comparison, the CPU sets its status flags, which the jump instructions further on rely on.

main_14

jle 0x555555555170 <main+51>: If the result is less than or equal to 11, jump to the address 0x555555555170, which corresponds to the code returning 0.

main_15

mov eax,DWORD PTR [rbp-0xc]: Since the result was greater than 11, we move the value of result into the eax register, which is used to store the return value of the main function.

main_16

jmp 0x555555555175 <main+56>: Jump to the instruction at 0x555555555175, which is the function epilogue for cleanup and return. The “mov eax,0x0” is the instruction that would have been executed if the result had been less than or equal to 11. It stores a 0 in eax, since the C code has a “return 0;” for when the condition isn’t met.
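If the jle/jmp pair feels backwards compared to the C source, it can help to rewrite the C the way the assembly is actually laid out. The sketch below is my own restructuring using goto, not compiler output; `main_like` takes x and y as parameters so it can be tried with different values, and the local named eax stands in for the register.

```c
/* Takes two numbers and returns their sum */
int add(int a, int b) {
    return a + b;
}

/* Same logic as main in the post, shaped like the assembly listing. */
int main_like(int x, int y) {
    int result = add(x, y);
    int eax;
    if (result <= 11) goto return_zero;  /* cmp [rbp-0xc],0xb ; jle <main+51> */
    eax = result;                        /* mov eax,[rbp-0xc] */
    goto epilogue;                       /* jmp <main+56> */
return_zero:
    eax = 0;                             /* mov eax,0x0 */
epilogue:
    return eax;                          /* leave ; ret */
}
```

Notice that the compiler inverted the condition: the C says "if result > 11, return result", while the assembly says "if result <= 11, skip ahead to the return 0 part". Both describe the same behavior.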

main_17

leave: This instruction is a shorthand for mov rsp, rbp and pop rbp. It cleans up the stack frame by restoring the stack pointer (rsp) and base pointer (rbp). Then we return from the main function. The return address is popped from the stack, and the program execution continues from there.
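To tie the prologue and leave together, here is a rough C model that tracks only rsp, rbp, and the saved base pointer, using the shortened addresses from the slides. Everything named `toy_` is invented for this sketch, the starting rbp value in the usage note is a placeholder I picked, and the model only handles 8-byte-aligned addresses between 0xd00 and 0xdff.

```c
#include <stdint.h>

#define TOY_BASE  0xd00u  /* lowest address the model covers */
#define TOY_SLOTS 32      /* 32 slots of 8 bytes: 0xd00 to 0xdff */

struct toy_machine {
    uint64_t rsp;
    uint64_t rbp;
    uint64_t slots[TOY_SLOTS];
};

/* Map an 8-byte-aligned address to its slot in the model. */
static uint64_t *slot_at(struct toy_machine *m, uint64_t address) {
    return &m->slots[(address - TOY_BASE) / 8];
}

/* push rbp ; mov rbp,rsp ; sub rsp,local_bytes */
void toy_prologue(struct toy_machine *m, uint64_t local_bytes) {
    m->rsp -= 8;
    *slot_at(m, m->rsp) = m->rbp;  /* save the caller's base pointer */
    m->rbp = m->rsp;               /* start the new frame here */
    m->rsp -= local_bytes;         /* reserve room for locals */
}

/* leave: mov rsp,rbp ; pop rbp */
void toy_leave(struct toy_machine *m) {
    m->rsp = m->rbp;               /* drop the locals */
    m->rbp = *slot_at(m, m->rsp);  /* restore the caller's base pointer */
    m->rsp += 8;
}
```

Starting with rsp at 0xd98 and some caller rbp (say 0xdc0), `toy_prologue(&m, 0x10)` leaves rbp at 0xd90 and rsp at 0xd80, just like slides 1 to 3. Calling `toy_leave(&m)` then puts rsp back at 0xd98 and rbp back at the caller's value, ready for ret to pop the return address.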


And there you have it! You’ve taken your first steps into the fascinating world of assembly language. From registers and stack frames to the detailed mechanics of function calls, you’ve seen what really goes on under the hood of your programs.

Understanding assembly isn’t just for creating low-level magic—it’s about appreciating the intricacies of how your code actually runs. It’s like knowing how the gears and springs work in a watch; you get a deeper sense of the precision and complexity involved.

So, next time you write code in a high-level language, you’ll have a newfound respect for the tiny instructions and the diligent registers making it all happen. Whether you’re optimizing performance or just curious, assembly language gives you a closer look at the true nuts and bolts of computing.

Thanks for joining me on this journey through the basics of assembly. Keep exploring, stay curious, and happy coding!