Pwn.College - Computing 101

Your First Register

  • Registers are containers for data, and data can be moved between them.
  • In general, a CPU has 10 to 20 general-purpose (GP) registers that code can use freely, plus other registers reserved for special purposes.
  • In the x86_64 (64-bit) architecture, programs can access 16 GP registers.
  • Let's talk about rax, a single x86 register that can store a small amount of data (64 bits on x86_64).
  • To store the value 1337 in rax:
mov rax, 1337  # destination, source

This challenge requires us to write a program that moves the value 60 into rax. The program file has the extension .s (the typical extension for assembly).

mov rax,60

When we run it against /challenge/check, we get a segmentation fault! After the mov, nothing tells the program to stop, so the CPU keeps going past the end of our code and the program crashes.

First Syscalls

Instructions such as mov let your program interact with the CPU, while syscall lets the program interact with the OS via the CPU. Each syscall has its own number, starting from 0. To invoke one, put its number in rax and execute the syscall instruction:

mov rax, 42
syscall  # syscall 42

The exit syscall is number 60:

mov rax, 60
syscall

Exit Codes

Every program exits with an exit code upon termination. We pass parameters to a system call through registers. Syscalls can take multiple parameters, but exit takes only one: the exit code.

The first parameter to a syscall is passed via the rdi register:

mov rdi, 42  # set program exit code
mov rax, 60 # set syscall exit
syscall

Building Executables

To build a binary:

  1. Write the assembly in a file (often with the .s extension)
  2. Assemble the source into an object file (using the as command)
  3. Link one or more object files into a final executable binary (using the ld command)

.intel_syntax noprefix # tell the assembler that we are using Intel asm syntax
.global _start 
_start:
mov rdi, 42
mov rax, 60
syscall

Save the above as asm.s. Now we assemble the code:

as -o asm.o asm.s # creates assembled binary code asm.o

Larger projects typically produce many object files after assembling, but in this case there is only one. We still need to link it to create the final executable:

ld -o exe asm.o

Moving Between Registers

  • rsi is another register that can store data.
  • We can “move” data between registers. Despite the name, mov copies: in the example below, we load 42 into rsi and then copy the value in rsi into rdi, so both rsi and rdi hold 42 afterwards.
mov rsi,42
mov rdi,rsi

In this challenge, a secret value is stored in rsi, and to pass the challenge the program must exit with that value. Since the exit syscall takes its exit code from rdi, we need to copy the value in rsi into rdi:

.intel_syntax noprefix
.global _start
_start:
mov rdi,rsi 
mov rax,60
syscall

Software Introspection

Tracing Syscalls

  • Use the syscall tracer strace.
  • strace uses the Linux OS to introspect and record every syscall a program invokes, along with its result.
  • When we strace our previous program, it reports two system calls: first execve, which starts the new program, and then exit.
  • The alarm syscall (number 37) sets a timer in the OS. When that many seconds pass, Linux sends the program a signal that terminates it by default; alarm is often used to kill a program should it freeze.
  • For this challenge, we strace /challenge/trace-me and figure out what value it passes as the parameter to the alarm system call.
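
To have something small to practice strace on, here is a minimal sketch of a program that calls alarm and then exits. The 5-second timeout is an assumption chosen for illustration; the real /challenge/trace-me passes its own value.

```asm
.intel_syntax noprefix
.global _start
_start:
    mov rdi, 5      # first parameter: seconds until the timer fires (sample value)
    mov rax, 37     # syscall number 37: alarm
    syscall
    mov rdi, 0      # exit code 0
    mov rax, 60     # syscall number 60: exit
    syscall
```

Built with as and ld as before and run under strace, this should list the execve call followed by the alarm call with its parameter and then the exit call. The program exits long before the timer would fire.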

GDB (GNU Debugger)

  • gdb is a tool to monitor and introspect a process, and the most common debugger in the Linux space.
gdb <binary>
(gdb) starti # start the program and stop at its first instruction

Memory

We previously learnt how to set registers to constant values, but we can also load values from memory by giving a memory address in square brackets. Let's say address 31337 holds the value 42. To load the contents of memory address 31337 into rdi, we can write:

mov rdi, [31337]

For this challenge, a secret value is stored at address 133700:

.intel_syntax noprefix
.global _start
_start:
mov rdi, [133700]
mov rax, 60
syscall

Dereferencing Pointers

Typically, memory addresses are stored in registers, and we use the values in those registers to point to data in memory. For example:

Address | Contents
133700  | 42

Consider the following:
mov rax, 133700

Register | Contents
rax      | 133700

rax now holds a value that corresponds to the address of the data we want to load, so rax is a pointer in this case. The instruction below loads the value stored at memory address 133700:

mov rdi, [rax]

When rax (or any other register) is used in place of directly specifying the address, we call this dereferencing the pointer. We dereference rax to load the data it points to (which is 42) into rdi.

We don't have to use rax specifically; any general-purpose register can hold the pointer.
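
As a rough illustration of that last point, here is a self-contained sketch that uses rbx as the pointer instead of rax. The .data section, the label secret, the .quad directive and the lea instruction are not covered in these notes; they are assumptions used here only to give the program some memory of its own to point at.

```asm
.intel_syntax noprefix
.global _start

.data
secret:                         # hypothetical label for 8 bytes of data
    .quad 42

.text
_start:
    lea rbx, [rip + secret]     # rbx = address of secret, so rbx is now the pointer
    mov rdi, [rbx]              # dereference rbx: rdi = 42
    mov rax, 60                 # syscall number 60: exit
    syscall
```

Assembled and linked as before, the program should exit with code 42, the same result as dereferencing through rax.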

In this challenge, rax contains the address of the secret data we’ve stored in memory. Dereference rax to load the secret data into rdi and use it as the exit code of the program to get the flag!

.intel_syntax noprefix
.global _start
_start:
mov rdi, [rax]
mov rax, 60
syscall

Dereferencing Yourself

Previously, we dereferenced rax to move data into rdi. The choice of rax was arbitrary: any general-purpose register can hold the pointer, even the destination register itself!

For example:

mov QWORD PTR [133700], 42 # store the 8-byte value 42 at memory address 133700
mov rax, 133700 # rax now holds the address 133700
mov rax, [rax] # rax now holds the value 42

In this challenge, you’ll explore this concept. Rather than initializing rax, as before, we’ve made rdi the pointer to the secret value! You’ll need to dereference it to load that value into rdi, then exit with that value as the exit code. Good luck!

.intel_syntax noprefix
.global _start
_start:
mov rdi, [rdi]
mov rax, 60
syscall

Dereferencing with Offsets

Pointers don't always point directly at the data you need. They may point to a collection of data of which you need only one specific item. For example, say rdi points at a sequence of numbers in memory:

Address | Contents
133700  | 50
133701  | 42
133702  | 99
133703  | 14

Register| Contents
rdi     | 133700

To load the 42 stored at address 133701, we could do this:

mov rax, [rdi+1] 

Each memory address represents one byte of memory. We reach address 133701 by adding 1 to the address 133700. In memory terms, this is called an offset: here, an offset of 1 from the address that rdi points to.
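
One caveat the table above glosses over: a mov into a 64-bit register reads 8 bytes starting at the given address, so on a table of single bytes it would also pull in the neighbouring values. The sketch below shows one way to read exactly one byte at an offset. The .data section, the label numbers, the .byte directive, lea, and movzx with BYTE PTR are not part of these notes and are assumptions for illustration.

```asm
.intel_syntax noprefix
.global _start

.data
numbers:                            # hypothetical table of single bytes
    .byte 50, 42, 99, 14

.text
_start:
    lea rdi, [rip + numbers]        # rdi points at the first byte (50)
    movzx edi, BYTE PTR [rdi + 1]   # read one byte at offset 1 and zero-extend it: rdi = 42
    mov rax, 60                     # syscall number 60: exit
    syscall
```

The program should exit with code 42. When the values are 8 bytes wide and sit 8 bytes apart, as in the challenge below, a plain mov with the right offset is what you want.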

As before, we will initialize rdi to point at the secret value, but not directly at it. This time, the secret value will have an offset of 8 bytes from where rdi points, something analogous to this

    Address │ Contents
  +────────────────────+
┌▸│ 31337   │ 0        │
│ │ 31337+1 │ 0        │
│ │ 31337+2 │ 0        │
│ │ 31337+3 │ 0        │
│ │ 31337+4 │ 0        │
│ │ 31337+5 │ 0        │
│ │ 31337+6 │ 0        │
│ │ 31337+7 │ 0        │
│ │ 31337+8 │ ???      │
│ +────────────────────+
│
└────────────────────────┐
                         │
   Register │ Contents   │
  +────────────────────+ │
  │ rdi     │ 31337    │─┘
  +────────────────────+

Answer:

.intel_syntax noprefix
.global _start
_start:
mov rdi, [rdi+8]
mov rax, 60
syscall

Stored Addresses

mov rdi, 123400 # rdi now holds the value 123400
mov rdi, [rdi] # rdi now holds whatever is stored at memory address 123400 (in this case, 133700)

mov rax, [rdi] # rax now holds whatever is stored at memory address 133700 (which is 42)

For now, let’s practice dereferencing an address stored in memory. I’ll store a secret value at a secret address, then store that secret address at the address 567800. You must read the address, dereference it, get the secret value, and then exit with it as the exit code. You got this!

.intel_syntax noprefix
.global _start
_start:
mov rdi, [567800]
mov rdi, [rdi]
mov rax, 60
syscall

Double Dereference

Let’s put those last two together. In this challenge, we stored our SECRET_VALUE in memory at the address SECRET_LOCATION_1, then stored SECRET_LOCATION_1 in memory at the address SECRET_LOCATION_2. Then, we put SECRET_LOCATION_2 into rax! The result looks something like this, using 123400 for SECRET_LOCATION_1 and 133700 for SECRET_LOCATION_2 (note: in the real challenge, these values will be different and hidden from you!):

      Address │ Contents
    +────────────────────+
┌──▸│ 133700  │ 123400   │─┐
│   +────────────────────+ │
│ ┌▸│ 123400  │ 42       │ │
│ │ +────────────────────+ │
│ └────────────────────────┘
└──────────────────────────┐
                           │
     Register │ Contents   │
    +────────────────────+ │
    │ rax     │ 133700   │─┘
    +────────────────────+

Here, you will need to perform two memory reads: one dereferencing rax to read SECRET_LOCATION_1 from the location that rax is pointing to (which is SECRET_LOCATION_2), and the second one dereferencing whatever register now holds SECRET_LOCATION_1 to read SECRET_VALUE into rdi, so you can use it as the exit code!

Answer

.intel_syntax noprefix
.global _start
_start:
mov rdi, [rax]
mov rdi, [rdi]
mov rax, 60
syscall
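
To experiment with a double dereference outside the challenge, the following sketch builds the same chain in the program's own memory. The labels secret_value and secret_pointer, the .data section, the .quad directive and lea are assumptions for illustration and do not come from the challenge.

```asm
.intel_syntax noprefix
.global _start

.data
secret_value:                           # plays the role of SECRET_LOCATION_1
    .quad 42
secret_pointer:                         # plays the role of SECRET_LOCATION_2
    .quad secret_value                  # holds the address of secret_value

.text
_start:
    lea rax, [rip + secret_pointer]     # rax = SECRET_LOCATION_2, as the challenge sets it up
    mov rdi, [rax]                      # first read: rdi = address of secret_value
    mov rdi, [rdi]                      # second read: rdi = 42
    mov rax, 60                         # syscall number 60: exit
    syscall
```

Assembled with as and linked with ld as before, it should exit with code 42. A triple dereference only adds one more pointer in .data and one more mov rdi, [rdi].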

Triple Dereference

        Address │ Contents
      +────────────────────+
┌────▸│ 133700  │ 123400   │───┐
│     +────────────────────+   │
│ ┌──▸│ 123400  │ 100000   │─┐ │
│ │   +────────────────────+ │ │
│ │ ┌▸│ 100000  │ 42       │ │ │
│ │ │ +────────────────────+ │ │
│ │ └────────────────────────┘ │
│ └────────────────────────────┘
└──────────────────────────────┐
                               │
       Register │ Contents     │
      +────────────────────+   │
      │ rdi     │ 133700   │───┘
      +────────────────────+

.intel_syntax noprefix
.global _start
_start:
mov rdi, [rdi] # rdi is now the value stored at 133700 (let's call it x)
mov rdi, [rdi] # rdi is now the value stored at x (let's call it y)
mov rdi, [rdi] # rdi is now the value stored at y
mov rax, 60
syscall

Hello Hackers🙄

Here we introduce the write syscall

Writing Output

The write syscall is number 1. It takes parameters specifying what data to write and where to write it.

To recap, these are the standard file descriptors (FDs):

  • FD 0: stdin, the channel through which the process takes input
  • FD 1: stdout, the channel through which the process outputs normal data
  • FD 2: stderr, the channel through which the process outputs error details

The file descriptor is how you tell the write syscall where to write the data. It is the first parameter, so it goes in rdi: set rdi to 1 to write to stdout, or to 2 for stderr.

Syscalls are generally computationally expensive, and we don't want to invoke write once per character to output a long sentence. Instead, write takes parameters for where in memory the data starts and how many characters to write:

write(FD, mem_addr, num_of_chars_to_write)

So how do we specify these parameters?

mov rdi, 1 # FD is 1 in this case
mov rsi, 133700 # rsi by convention takes the second parameter to syscalls. we pass the mem_addr of 133700 to rsi
mov rdx, 10 # rdx by convention takes the third parameter. we pass 10 chars
mov rax, 1 # write syscall is 1
syscall

Similar to before, we wrote a single secret character value into memory at address 1337000. Call write to write that single character value (for now! We’ll do multiple-character writes later) onto standard out, and we’ll give you the flag!

.intel_syntax noprefix
.global _start
_start:
mov rdi, 1
mov rsi, 1337000
mov rdx, 1
mov rax, 1
syscall

The program executes and prints “H” but then crashes because there is no exit syscall!

Chaining Syscalls

Chaining is straightforward: do the write syscall, then the exit syscall.

.intel_syntax noprefix
.global _start
_start:
mov rdi, 1
mov rsi, 1337000
mov rdx, 1
mov rax, 1
syscall
mov rdi, 42 # set exit code to 42
mov rax, 60
syscall

Writing Strings

.intel_syntax noprefix
.global _start
_start:
mov rdi, 1
mov rsi, 1337000
mov rdx, 14
mov rax, 1
syscall
mov rdi, 42 # set exit code to 42
mov rax, 60
syscall
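
The program above relies on the challenge having already placed a 14-character string at address 1337000. As a sketch of the same write outside the challenge, the version below carries its own string. The label message, the .data section, the .ascii directive, lea and the text of the string are assumptions for illustration.

```asm
.intel_syntax noprefix
.global _start

.data
message:                            # hypothetical label for the string to print
    .ascii "Hello Hackers\n"        # 14 bytes, including the newline

.text
_start:
    mov rdi, 1                      # first parameter: FD 1 (stdout)
    lea rsi, [rip + message]        # second parameter: address of the string
    mov rdx, 14                     # third parameter: number of bytes to write
    mov rax, 1                      # syscall number 1: write
    syscall
    mov rdi, 0                      # exit code 0
    mov rax, 60                     # syscall number 60: exit
    syscall
```

If rdx were larger than the string, write would keep going past it and output whatever bytes follow in memory, so the count has to match the data.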

Reading Data

read (syscall number 0) is a syscall that “reads” data from a file descriptor such as stdin. It takes the same three parameters as write: a file descriptor, a memory address and a byte count.

read(0, 1337000, 5)

This reads five bytes from stdin (FD 0) into memory starting at 1337000. If we type HELLO HACKERS into stdin, the above read call would result in the following memory contents:

  Address │ Contents (in hex)
+────────────────────+
│ 1337000 │ 48       │
│ 1337001 │ 45       │
│ 1337002 │ 4c       │
│ 1337003 │ 4c       │
│ 1337004 │ 4f       │
+────────────────────+

In this level, we will combine read with our previous write abilities. Your program should:

  1. first read 8 bytes from stdin to address 1337000
  2. then write those 8 bytes from address 1337000 to stdout
  3. finally, exit with the exit code 42.

.intel_syntax noprefix
.global _start
_start:
# read syscall
mov rdi, 0  # fd 0: stdin
mov rsi, 1337000
mov rdx, 8
mov rax, 0  # syscall number 0: read
syscall
# write syscall
mov rdi, 1  # fd 1: stdout
mov rsi, 1337000
mov rdx, 8
mov rax, 1  # syscall number 1: write
syscall
# exit syscall
mov rdi, 42
mov rax, 60
syscall
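
A syscall hands its result back in rax, and for read that result is the number of bytes actually read, which can be fewer than requested. The sketch below echoes only what was read by passing that count on to write. The .bss section, the label buffer, the .skip directive and lea are assumptions for illustration; the challenge itself uses the fixed address 1337000 and a fixed count of 8.

```asm
.intel_syntax noprefix
.global _start

.bss
buffer:                         # hypothetical 8-byte scratch buffer
    .skip 8

.text
_start:
    mov rdi, 0                  # fd 0: stdin
    lea rsi, [rip + buffer]     # where to store the input
    mov rdx, 8                  # read at most 8 bytes
    mov rax, 0                  # syscall number 0: read
    syscall                     # on return, rax = number of bytes read

    mov rdx, rax                # write exactly as many bytes as were read
    mov rdi, 1                  # fd 1: stdout
    lea rsi, [rip + buffer]     # same buffer
    mov rax, 1                  # syscall number 1: write
    syscall

    mov rdi, 42                 # exit code 42
    mov rax, 60                 # syscall number 60: exit
    syscall
```

The count has to be copied out of rax before the next syscall number is loaded, because rax does double duty as both the syscall number and the return value. This sketch does not handle a failed read, in which case rax holds a negative error code.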

Assembly Crash Course

To interact with any level, you can either run the challenge with an ELF file as an argument (/challenge/run <elf file>) or send raw bytes over stdin to this program.

Building an ELF program

  • Write out the assembly in the same way as for the earlier executables,
  • but build it with gcc instead of as and ld:
gcc -nostdlib -o <elf> reg.s

To disassemble the program

objdump -M intel -d <elf file>

set-register

Set rdi to 0x1337

.intel_syntax noprefix
.global _start
_start:
mov rdi, 0x1337
mov rax, 60
syscall

set-multiple-registers

.intel_syntax noprefix
.global _start
_start:
mov rax, 0x1337
mov r12, 0xCAFED00D1337BEEF
mov rsp, 0x31337

add-to-register

The challenge sets some values dynamically before each run, so they change every time. This means you need to compute the result with operations on registers rather than hard-coding it.

Many x86 instructions let you perform the usual math operations on registers and memory. Some useful instructions include:

add reg1, reg2  # same as reg1 += reg2
sub reg1, reg2  # same as reg1 -= reg2
imul reg1, reg2 # same as reg1 *= reg2

# reg2 (the source operand) can also be a constant or a memory location

In this challenge, we want to add 0x331337 to rdi

add rdi, 0x331337

linear-equation-registers

Compute the following:

    f(x) = mx + b, where:
        m = rdi
        x = rsi
        b = rdx

Place the result into rax.

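mul (unsigned multiply) and imul (signed multiply) differ in which registers they use: mul always multiplies rax by its single operand and writes the result to rdx:rax, while the two-operand form of imul lets us pick the registers. We will use imul in this case.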

imul rdi, rsi
add rdi, rdx
mov rax, rdi
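
To check the arithmetic outside the challenge, here is a sketch that plugs in sample values and reports the result as the exit code. The values 3, 4 and 5 are assumptions for illustration. This variant also computes directly in rax, which leaves rdi, rsi and rdx untouched until the exit.

```asm
.intel_syntax noprefix
.global _start
_start:
    mov rdi, 3      # m (sample value)
    mov rsi, 4      # x (sample value)
    mov rdx, 5      # b (sample value)

    mov rax, rdi    # rax = m
    imul rax, rsi   # rax = m * x = 12
    add rax, rdx    # rax = m * x + b = 17

    mov rdi, rax    # use the result as the exit code
    mov rax, 60     # syscall number 60: exit
    syscall
```

Assembled and linked as before, the program should exit with code 17.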

integer-division

Division in x86 is more involved than in normal math, because the dividend is implicit:

div reg

# when we execute div reg, the following happens:
rax = rdx:rax / reg
rdx = remainder

rdx:rax means that rdx is the upper 64-bits of the 128 bit dividend and rax is the lower 64-bit of the 128 bit dividend. Hence it is important to know what is in rdx and rax before you call div.

The challenge is to compute the following:

speed = distance / time

distance = rdi # distance is at most a 64-bit value, so rdx (the upper half of the dividend) must be 0 when dividing

time = rsi

speed = rax

mov rax, rdi  # set rax to distance (the lower 64 bits of the dividend)
mov rdx, 0    # clear the upper 64 bits of the dividend
div rsi       # divide rdx:rax (the distance) by rsi; the quotient (speed) is stored in rax

To get the modulus:

mov rax, rdi
mov rdx, 0    # clear the upper 64 bits of the dividend
div rsi
mov rax, rdx  # the remainder is stored in rdx after the div operation
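
The following sketch runs a division on sample values so that the quotient and remainder can be checked. The values 100 and 7 are assumptions for illustration. It clears rdx before div, since div treats rdx:rax as one 128-bit dividend and a leftover value in rdx would change the result or crash the program.

```asm
.intel_syntax noprefix
.global _start
_start:
    mov rdi, 100    # distance (sample value)
    mov rsi, 7      # time (sample value)

    mov rax, rdi    # lower 64 bits of the dividend
    mov rdx, 0      # upper 64 bits of the dividend
    div rsi         # rax = 100 / 7 = 14, rdx = 100 % 7 = 2

    mov rdi, rax    # use the quotient as the exit code
    mov rax, 60     # syscall number 60: exit
    syscall
```

The program should exit with code 14. Changing mov rdi, rax to mov rdi, rdx would report the remainder, 2, instead.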