Pwn.College - Computing 101
Your First Register
- Registers = containers for data. Data can be moved between registers.
- In general, there are 10 to 20 general purpose (GP) registers that code can use freely, plus other registers reserved for special purposes.
- In the x86_64 (64-bit) architecture, programs can access 16 GP registers.
- Let's talk about rax, a single x86_64 register that can store a small amount of data (64 bits).
- To store the value 1337 in rax:
mov rax, 1337 # destination, source
This challenge requires us to write a program that moves the value 60 into rax. The program file has the extension .s (the typical extension for assembly).
mov rax,60
When we run it against /challenge/check, we get a segmentation fault! After the mov, nothing tells the program to stop, so the CPU carries on and tries to execute whatever bytes come next in memory. Those bytes are not valid code for our program, so it crashes.
First Syscalls
Instructions such as mov let your program interact with the CPU, while the syscall instruction lets your program interact with the OS via the CPU. Each syscall has its own number, starting from 0. The number goes in rax, and the syscall is invoked in the following manner:
mov rax, 42
syscall # syscall 42
The exit syscall is 60
mov rax, 60
syscall
Exit Codes
Every program exits with an exit code upon termination. We pass parameters to a system call through registers. Syscalls can take multiple parameters, but exit takes only one: the exit code.
The first parameter to a syscall is passed via the rdi register:
mov rdi, 42 # set program exit code
mov rax, 60 # set syscall exit
syscall
Building Executables
To build a binary:
- Write assembly in a file (often with the .s extension)
- Assemble the source into an object file (using the as command)
- Link one or more object files into a final executable binary (using the ld command)
.intel_syntax noprefix # tell the assembler that we are using intel asm syntax
.global _start
_start:
mov rdi, 42
mov rax, 60
syscall
Save the above as asm.s. Now we assemble the code:
as -o asm.o asm.s # creates the assembled object file asm.o
Typically you get a lot of object files after assembling a project, but in this case there is only one. We still need to link it to create the final executable:
ld -o exe asm.o
Moving Between Registers
- rsi is another register that can store data
- We can “move” data between registers. In the example below, we set rsi to 42 and then set rdi to the value in rsi. Despite the name, mov copies rather than moves, so both rsi and rdi hold 42 afterwards.
mov rsi,42
mov rdi,rsi
In this challenge, a secret value is stored in rsi, and to pass the challenge the program must exit with that value. Since exit takes its exit code from rdi, we need to copy the value in rsi into rdi:
.intel_syntax noprefix
.global _start
_start:
mov rdi,rsi
mov rax,60
syscall
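To try the register-to-register copy outside the challenge, here is a minimal sketch. It assumes we load rsi with 42 ourselves, since no challenge harness is setting it for us; the build commands in the comments are the as and ld steps from earlier in these notes.

```asm
# Build and run (file name move.s is an arbitrary choice):
#   as -o move.o move.s
#   ld -o move move.o
#   ./move
.intel_syntax noprefix
.global _start
_start:
    mov rsi, 42     # stand-in for the secret value the challenge would provide
    mov rdi, rsi    # copy rsi into rdi; rsi still holds 42 afterwards
    mov rax, 60     # exit syscall
    syscall         # exits with code 42
```

After the program finishes, running echo $? in the shell prints the exit code, which should be 42 here.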
Software Introspection
Tracing Syscalls
- Use the syscall tracer strace: strace <program>
- strace uses the Linux OS to introspect and record every syscall a program invokes, along with its result
- When we strace our previous program, it reports two system calls. The first is execve, which starts the new program, and the second is exit
- alarm (syscall 37) sets a timer in the OS. When that many seconds have passed, Linux sends the program a SIGALRM signal, which terminates it by default; alarm is often used to kill a program should it freeze
- For this challenge, we strace /challenge/trace-me and figure out what value it passes as the parameter to the alarm system call
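To see what an alarm call looks like under strace without the challenge binary, a small test program can make the call itself. This is only a sketch: the 5-second value is an arbitrary choice, and the program exits straight away, long before the timer would fire.

```asm
.intel_syntax noprefix
.global _start
_start:
    mov rdi, 5      # first parameter: timer length in seconds (arbitrary)
    mov rax, 37     # alarm syscall
    syscall
    mov rdi, 0      # exit code 0
    mov rax, 60     # exit syscall
    syscall
```

Assembled and linked as before, running it under strace should list execve, then an alarm call with 5 as its argument, then exit.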
GDB (GNU Debugger)
- A tool to monitor and introspect a running process. gdb is the most common debugger on Linux
gdb <binary>
(gdb) starti # start the program and stop at its very first instruction
Memory
We previously learnt about putting values into registers, but we can also load values from memory by giving a memory address. Let's say the address 31337 holds the value 42. To load the contents of memory address 31337 into rdi, we write the address in square brackets:
mov rdi, [31337]
For this challenge, a secret value is stored at the address 133700:
.intel_syntax noprefix
.global _start
_start:
mov rdi, [133700]
mov rax, 60
syscall
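Outside the challenge, address 133700 is not mapped, so the program above would crash. A self-contained sketch can reserve its own memory instead. The label slot is a hypothetical name, and [rip + slot] is a way of addressing a label relative to the current instruction, which these notes do not otherwise cover.

```asm
.intel_syntax noprefix
.global _start

.section .data
slot:                                 # hypothetical label for 8 bytes of our own memory
    .quad 0

.section .text
_start:
    mov QWORD PTR [rip + slot], 42    # store 42 into memory at slot
    mov rdi, [rip + slot]             # load it back from memory: rdi = 42
    mov rax, 60                       # exit syscall
    syscall                           # exits with code 42
```

The square brackets do the same job as in mov rdi, [133700]: they mean "the contents of this address" rather than the address itself.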
Dereferencing Pointers
Typically, memory addresses are stored in registers, and we use the values in those registers to point to data in memory. For example:
Address | Contents
133700 | 42
# consider the below
mov rax, 133700
Register | Contents
rax | 133700
# rax now holds the address of the data we want to load, so rax is a pointer in this case. The instruction below loads the value stored at memory address 133700:
mov rdi, [rax]
When rax (or any other register) is used in place of a literal address, we call this dereferencing the pointer. We dereference rax to load the data it points to (which is 42) into rdi.
We don't have to use rax specifically, as these are general purpose registers.
In this challenge, rax contains the address of the secret data we’ve stored in memory. Dereference rax to load the secret data into rdi and use it as the exit code of the program to get the flag!
.intel_syntax noprefix
.global _start
_start:
mov rdi, [rax]
mov rax, 60
syscall
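The same idea can be run without the challenge by making our own pointer. In this sketch the label secret is a hypothetical stand-in for the hidden data, and lea is used to put its address into rax (the challenge does that step for us).

```asm
.intel_syntax noprefix
.global _start

.section .data
secret:                       # hypothetical stand-in for the secret data
    .quad 42

.section .text
_start:
    lea rax, [rip + secret]   # rax = address of secret, so rax is now a pointer
    mov rdi, [rax]            # dereference rax: rdi = 42
    mov rax, 60               # exit syscall
    syscall                   # exits with code 42
```

Note the difference between the two bracketed operands: lea only computes the address, while mov with brackets actually reads memory at that address.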
Dereferencing yourself
Previously, we dereferenced rax to move data into rdi. The choice of rax is arbitrary: any general purpose register can hold the pointer. Hell, even the destination register itself!
For example:
mov QWORD PTR [133700], 42 # store the value 42 at memory address 133700 (the size must be stated when writing a constant to memory)
mov rax, 133700 # rax now has the value 133700
mov rax, [rax] # rax now has the value 42
In this challenge, you’ll explore this concept. Rather than initializing rax, as before, we’ve made rdi the pointer to the secret value! You’ll need to dereference it to load that value into rdi, then exit with that value as the exit code. Good luck!
.intel_syntax noprefix
.global _start
_start:
mov rdi, [rdi]
mov rax, 60
syscall
De-referencing with Offsets
Pointers don't always point directly at the data you need. They may point to a collection of data of which you need only one specific item. For example, say rdi points to a sequence of numbers in memory:
Address | Contents
133700 | 50
133701 | 42
133702 | 99
133703 | 14
Register| Contents
rdi | 133700
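If I want the value 42, I can read from one byte past rdi:
mov rax, [rdi+1]
Each memory address represents one byte of memory. We reach address 133701 by adding 1 to the address 133700. In memory terms this is called an offset, so this is an offset of 1 from the address pointed to by rdi. Strictly speaking, a mov into the 64-bit rax reads the eight bytes starting at 133701; to read just the single byte 42 you would use a one-byte destination such as al (mov al, [rdi+1]).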
As before, we will initialize rdi to point at the secret value, but not directly at it. This time, the secret value will have an offset of 8 bytes from where rdi points, something analogous to this:
   Address  │ Contents
  +────────────────────+
┌▸│ 31337   │ 0        │
│ │ 31337+1 │ 0        │
│ │ 31337+2 │ 0        │
│ │ 31337+3 │ 0        │
│ │ 31337+4 │ 0        │
│ │ 31337+5 │ 0        │
│ │ 31337+6 │ 0        │
│ │ 31337+7 │ 0        │
│ │ 31337+8 │ ???      │
│ +────────────────────+
│
└────────────────────────┐
                         │
   Register │ Contents   │
  +────────────────────+ │
  │ rdi     │ 31337    │─┘
  +────────────────────+
Answer:
.intel_syntax noprefix
.global _start
_start:
mov rdi, [rdi+8]
mov rax, 60
syscall
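For the byte-sized table earlier in this section (50, 42, 99, 14), a standalone sketch might look like the following. The label values is hypothetical, and movzx with BYTE PTR is used so that exactly one byte is read and the rest of the register is zeroed; that instruction is an addition not covered in these notes.

```asm
.intel_syntax noprefix
.global _start

.section .data
values:                             # hypothetical label: four consecutive bytes
    .byte 50, 42, 99, 14

.section .text
_start:
    lea rdi, [rip + values]         # rdi points at the first byte (50)
    movzx edi, BYTE PTR [rdi + 1]   # read one byte at offset 1: rdi = 42
    mov rax, 60                     # exit syscall
    syscall                         # exits with code 42
```

Writing to edi clears the upper half of rdi automatically, so rdi ends up holding just the one byte that was read.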
Stored Addresses
Addresses can themselves be stored in memory. Suppose address 123400 holds the value 133700, and address 133700 holds the value 42:
mov rdi, 123400 # rdi now has the value 123400
mov rdi, [rdi] # rdi now holds whatever is stored at address 123400 (in this case, 133700)
mov rax, [rdi] # rax now holds whatever is stored at address 133700 (which is 42)
For now, let’s practice dereferencing an address stored in memory. I’ll store a secret value at a secret address, then store that secret address at the address 567800. You must read the address, dereference it, get the secret value, and then exit with it as the exit code. You got this!
.intel_syntax noprefix
.global _start
_start:
mov rdi, [567800]
mov rdi, [rdi]
mov rax, 60
syscall
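A self-contained sketch of an address stored in memory follows. The labels secret and pointer are hypothetical, and the .quad secret line asks the linker to fill that slot with the address of secret. It assumes the as and ld build steps from earlier in these notes.

```asm
.intel_syntax noprefix
.global _start

.section .data
secret:
    .quad 42                    # the value we ultimately want
pointer:
    .quad secret                # a memory slot holding the address of secret

.section .text
_start:
    mov rdi, [rip + pointer]    # first read: rdi = address of secret
    mov rdi, [rdi]              # second read: rdi = 42
    mov rax, 60                 # exit syscall
    syscall                     # exits with code 42
```

Each extra level of indirection costs exactly one more mov with brackets, which is why the double and triple dereference solutions simply repeat the same line.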
Double Dereference
Let’s put those last two together. In this challenge, we stored our SECRET_VALUE in memory at the address SECRET_LOCATION_1, then stored SECRET_LOCATION_1 in memory at the address SECRET_LOCATION_2. Then, we put SECRET_LOCATION_2 into rax! The result looks something like this, using 123400 for SECRET_LOCATION_1 and 133700 for SECRET_LOCATION_2 (note: in the real challenge, these values will be different and hidden from you!):
     Address  │ Contents
    +────────────────────+
┌──▸│ 133700  │ 123400   │─┐
│   +────────────────────+ │
│ ┌▸│ 123400  │ 42       │ │
│ │ +────────────────────+ │
│ └────────────────────────┘
└──────────────────────────┐
                           │
     Register │ Contents   │
    +────────────────────+ │
    │ rax     │ 133700   │─┘
    +────────────────────+
Here, you will need to perform two memory reads: one dereferencing rax to read SECRET_LOCATION_1 from the location that rax is pointing to (which is SECRET_LOCATION_2), and the second one dereferencing whatever register now holds SECRET_LOCATION_1 to read SECRET_VALUE into rdi, so you can use it as the exit code!
Answer:
.intel_syntax noprefix
.global _start
_start:
mov rdi, [rax]
mov rdi, [rdi]
mov rax, 60
syscall
Triple Dereference
       Address  │ Contents
      +────────────────────+
┌────▸│ 133700  │ 123400   │───┐
│     +────────────────────+   │
│ ┌──▸│ 123400  │ 100000   │─┐ │
│ │   +────────────────────+ │ │
│ │ ┌▸│ 100000  │ 42       │ │ │
│ │ │ +────────────────────+ │ │
│ │ └────────────────────────┘ │
│ └────────────────────────────┘
└──────────────────────────────┐
                               │
       Register │ Contents     │
      +────────────────────+   │
      │ rdi     │ 133700   │───┘
      +────────────────────+
.intel_syntax noprefix
.global _start
_start:
mov rdi, [rdi] # rdi is now the value stored at 133700 (let's call it x)
mov rdi, [rdi] # rdi is now the value stored at x (let's call it y)
mov rdi, [rdi] # rdi is now the value stored at y
mov rax, 60
syscall
Hello Hackers🙄
Here we introduce the write syscall.
Writing Output
The write syscall is number 1. It takes parameters saying what data to write and where to write it.
To recap, these are the standard file descriptors (FDs):
- FD 0: stdin > channel through which the process takes input
- FD 1: stdout > channel through which the process outputs normal data
- FD 2: stderr > channel through which the process outputs error details
In the write syscall, the FD specifies where the data goes: to write to stdout, set rdi to 1; for stderr, set rdi to 2.
Syscalls are generally computationally expensive, and we don't want to invoke write once per character to output a long sentence. The solution is to pass parameters saying where in memory the data starts and how many characters to write:
write(FD, mem_addr, num_of_chars_to_write)
So how do we specify these parameters?
mov rdi, 1 # FD is 1 in this case
mov rsi, 133700 # rsi by convention takes the second parameter to syscalls. we pass the mem_addr of 133700 to rsi
mov rdx, 10 # rdx by convention takes the third parameter. we pass 10 chars
mov rax, 1 # write syscall is 1
syscall
Similar to before, we wrote a single secret character value into memory at address 1337000. Call write to write that single character (for now! We’ll do multiple-character writes later) onto standard out, and we’ll give you the flag!
.intel_syntax noprefix
.global _start
_start:
mov rdi, 1
mov rsi, 1337000
mov rdx, 1
mov rax, 1
syscall
The program executes and prints “H”, but then crashes because there is no exit syscall!
Chaining Syscalls
Chaining is straightforward. Just do the write and then the exit syscall
.intel_syntax noprefix
.global _start
_start:
mov rdi, 1
mov rsi, 1337000
mov rdx, 1
mov rax, 1
syscall
mov rdi, 42 # set exit code to 42
mov rax, 60
syscall
Writing Strings
.intel_syntax noprefix
.global _start
_start:
mov rdi, 1
mov rsi, 1337000
mov rdx, 14
mov rax, 1
syscall
mov rdi, 42 # set exit code to 42
mov rax, 60
syscall
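Outside the challenge, nothing is stored at address 1337000, so a standalone version has to supply its own string. In this sketch the label message and the text itself are assumptions chosen for illustration; the byte count passed in rdx has to match the string length, here 14 characters plus a newline.

```asm
.intel_syntax noprefix
.global _start

.section .data
message:                        # hypothetical label for our own string
    .ascii "Hello Hackers!\n"   # 14 characters plus a newline = 15 bytes

.section .text
_start:
    mov rdi, 1                  # fd 1: stdout
    lea rsi, [rip + message]    # address of the first character
    mov rdx, 15                 # number of bytes to write
    mov rax, 1                  # write syscall
    syscall

    mov rdi, 42                 # set exit code to 42
    mov rax, 60                 # exit syscall
    syscall
```

Changing rdi from 1 to 2 in the first block would send the same text to stderr instead of stdout.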
Reading Data
read (syscall number 0) is a syscall that “reads” data from a file descriptor such as stdin. It takes the same three parameters as write: the FD, a memory address, and a byte count.
read(0, 1337000, 5)
This reads five bytes from stdin (FD 0) into memory starting at address 1337000. If we type HELLO HACKERS into stdin, the above read call results in the following memory contents:
 Address  │ Contents (in hex)
+────────────────────+
│ 1337000 │ 48 (H)   │
│ 1337001 │ 45 (E)   │
│ 1337002 │ 4c (L)   │
│ 1337003 │ 4c (L)   │
│ 1337004 │ 4f (O)   │
+────────────────────+
In this level, we will combine read with our previous write abilities. Your program should:
- first read 8 bytes from stdin to address 1337000
- then write those 8 bytes from address 1337000 to stdout
- finally, exit with the exit code 42.
.intel_syntax noprefix
.global _start
_start:
# read syscall
mov rdi, 0 # fd 0: stdin
mov rsi, 1337000
mov rdx, 8
mov rax, 0 # syscall number 0, which is read
syscall
# write syscall
mov rdi, 1 # fd 1: stdout
mov rsi, 1337000
mov rdx, 8
mov rax, 1 # syscall number 1, which is write
syscall
# exit syscall
mov rdi, 42
mov rax, 60
syscall
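A standalone variant needs its own buffer, since address 1337000 only exists inside the challenge. The sketch below makes two assumptions beyond these notes: it reserves a hypothetical 8-byte buffer in the .bss section, and it relies on read leaving the number of bytes actually read in rax, which it then passes to write. Error handling is left out.

```asm
.intel_syntax noprefix
.global _start

.section .bss
buffer:                         # hypothetical 8-byte scratch buffer
    .skip 8

.section .text
_start:
    mov rdi, 0                  # fd 0: stdin
    lea rsi, [rip + buffer]     # where to store the bytes
    mov rdx, 8                  # read at most 8 bytes
    mov rax, 0                  # read syscall
    syscall                     # afterwards rax = number of bytes read

    mov rdx, rax                # write back only as many bytes as were read
    mov rdi, 1                  # fd 1: stdout
    lea rsi, [rip + buffer]
    mov rax, 1                  # write syscall
    syscall

    mov rdi, 42                 # exit code 42
    mov rax, 60                 # exit syscall
    syscall
```

Using the returned count instead of a fixed 8 means that short input such as "hi" followed by Enter is echoed back without trailing zero bytes.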
Assembly Crash Course
To interact with any level, you can either run the challenge with an ELF file as an argument (/challenge/run <elf file>) or send raw bytes over stdin to this program.
Building an ELF program
- Write out the assembly the same way as for the exe above
- But build it with gcc instead of as and ld
gcc -nostdlib -o <elf> reg.s
To disassemble the program
objdump -M intel -d <elf file>
set-register
Set rdi to 0x1337
.intel_syntax noprefix
.global _start
_start:
mov rdi, 0x1337
mov rax, 60
syscall
set-multiple registers
.intel_syntax noprefix
.global _start
_start:
mov rax, 0x1337
mov r12, 0xCAFED00D1337BEEF
mov rsp, 0x31337
Add to Register
This challenge sets some values dynamically before each run, so they change every time. You therefore need to compute the result with register operations rather than hard-coding it.
Many instructions exist in x86 that allow you to perform all the normal math operations on registers and memory. Some useful instructions include:
add reg1, reg2 # same as reg1 += reg2
sub reg1, reg2 # same as reg1 -= reg2
imul reg1, reg2 # same as reg1 *= reg2
# the second operand (reg2) can generally also be a constant or a memory location
In this challenge, we want to add 0x331337 to rdi
add rdi, 0x331337
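The three instructions can be chained in a small standalone program. The starting values below are arbitrary, and the result is passed to exit only so that it can be inspected from the shell.

```asm
.intel_syntax noprefix
.global _start
_start:
    mov rdi, 10     # rdi = 10
    add rdi, 5      # rdi = 10 + 5 = 15
    sub rdi, 3      # rdi = 15 - 3 = 12
    mov rsi, 2
    imul rdi, rsi   # rdi = 12 * 2 = 24
    mov rax, 60     # exit syscall
    syscall         # exits with code 24
```

Exit codes only keep the lowest 8 bits of rdi, so this trick works for checking small results but not values above 255.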
Linear-equation-registers
f(x) = mx + b, where:
m = rdi
x = rsi
b = rdx
Place the result into rax.
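mul (unsigned multiply) and imul (signed multiply) differ in which registers they use. mul always multiplies rax by its single operand and puts the result in rdx:rax, while imul has a two-operand form that multiplies any two registers in place. We want imul in this case: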
imul rdi, rsi
add rdi, rdx
mov rax, rdi
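As a worked instance, take m = 3, x = 4 and b = 5 (values chosen only for illustration), which should give 3 * 4 + 5 = 17. This sketch builds the result directly in rax, so the inputs in rdi, rsi and rdx are left untouched, and then exits with it for easy checking.

```asm
.intel_syntax noprefix
.global _start
_start:
    mov rdi, 3      # m (the challenge would set this)
    mov rsi, 4      # x
    mov rdx, 5      # b

    mov rax, rdi    # rax = m
    imul rax, rsi   # rax = m * x = 12
    add rax, rdx    # rax = m * x + b = 17

    mov rdi, rax    # use the result as the exit code so we can inspect it
    mov rax, 60     # exit syscall
    syscall         # exits with code 17
```

Both orderings give the same answer; the version in the notes above simply reuses rdi as scratch space before copying it into rax.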
integer-division
Division in x86 works a little differently from normal math: div takes a single operand, and the dividend and the results live in fixed registers.
div reg2
# when we execute div reg2, the following happens:
rax = rdx:rax / reg2
rdx = remainder
rdx:rax means that rdx holds the upper 64 bits of the 128-bit dividend and rax holds the lower 64 bits. Hence it is important to know what is in rdx and rax before you execute div.
The challenge is to compute the following:
speed = distance / time
distance = rdi # distance is at most a 64-bit value, so rdx must be 0 when dividing
time = rsi
speed = rax
mov rax, rdi # set rax to distance (the lower 64 bits of the dividend)
mov rdx, 0 # clear the upper 64 bits of the dividend
div rsi # divide rdx:rax (the distance) by rsi (the time). The quotient, which is the speed, ends up in rax
To get the modulus instead:
mov rax, rdi
mov rdx, 0 # clear the upper 64 bits of the dividend
div rsi
mov rax, rdx # the remainder is stored in rdx after the div operation
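A worked example with concrete numbers makes the register roles easier to see. The values 17 and 5 are arbitrary; 17 / 5 gives a quotient of 3 and a remainder of 2. The sketch exits with the remainder so it can be checked from the shell.

```asm
.intel_syntax noprefix
.global _start
_start:
    mov rax, 17     # dividend, lower 64 bits
    mov rdx, 0      # dividend, upper 64 bits (must be cleared first)
    mov rsi, 5      # divisor
    div rsi         # rax = 3 (quotient), rdx = 2 (remainder)

    mov rdi, rdx    # exit with the remainder
    mov rax, 60     # exit syscall (this overwrites the quotient in rax)
    syscall         # exits with code 2
```

To inspect the quotient instead, copy rax into rdi before loading 60 into rax.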