Symbolic execution

Execute programs with symbols instead of values

program state (memory, registers, etc.) is symbolic and bound by constraints
- have set of possible values
- subject to constraints, i.e. mathematical restrictions on allowed values
- symbolic variables/expressions are immutable
perform all arithmetic & boolean operations symbolically
- output symbolic expressions encoding operation performed
fork state when branch depends on symbol
- unconditional branch – direct control flow
- concrete conditional branch – follow condition
- symbolic conditional branch – take both paths
  - fork program state and set IP to either path
  - add appropriate branch condition as path constraint
  - eventually explores all possible paths
solve constraints to get values when needed (‘concretization’)

Problems with symbolic execution:

state explosion: number of possible paths in a program is huge, increases with number of branches
dealing with I/O: do nothing? output?
modelling environment and hardware (and simulated hardware isn’t the same)
constraint solving: not all constraints are linear, NP-complete

Basic optimisations:

Concolic (concrete + symbolic) execution:

repeat:
- dynamic analysis to capture trace for concrete run
- symbolic execution on traced instructions, yielding path constraints
- dynamic analysis to capture trace for concrete run
- symbolic execution on traced instructions, yielding path constraints
- adjust constraints by flipping last encountered symbolic branch
- solve constraints of flipped branch, yielding inputs for new path
advantages: no state explosion so much faster, symbolize only what you need
drawbacks: need good initial input, can only flip branches controlled by symbolic variables, constraint solving is still hard

SMT solvers: Z3, CVC4, STP

SymbEx engines: KLEE, S2E, Angr, Triton

Binary and Malware Analysis