Tracking control flow
Explore execution paths in the program.
Control flow graph (CFG) of each function:
- represent potential flow of control inside function
Call graph
- represent potential flow of control between functions
Basic block: maximal sequence of instructions that execute one-by-one in order
Control flow graph:
- nodes are basic blocks with one entry point and one exit point
- directed edges indicate possible control flow
Call graph
- nodes are functions
- directed edges show potential for one function to invoke another
Identifying a function
- ideally:
- set a of basic blocks with single entry point
- reached using a call instruction
- ends with ret instructions
- in reality, might be reached using jump, might share blocks with other functions, might have multiple entries
Problems
- Pointer-based control transfer: conditionally set a function pointer, then call function pointer
- non-returning calls: some functions won’t return, code following call site may not be valid
- non-contiguous code sections: could have jump tables, data, unparsed code, junk bytes…
- tail calls:
return f(x)
- gaps in binary: might contain undiscovered functions
- shared code and multiple entry representation
You should:
- run the program multiple times, observe targets of indirect jumps and calls
- look for function prologue sequences (
push %rbp
, mov %rsp, %rbp
)
Loops
Loop: subgraph of CFG that is strongly connected (path from any node to any other node), single entry
Two loops are either disjoint, or one is completely nested in the other.
Identifying loops:
- dominance: node a dominates b if all paths to b go through a (strictly if a and b are different, immediate if it’s right before)
Natural loops: single head node
Dominator set D(i) = {i} ∪ (intersection for p ∈ Pred(i) of D(p)), where Pred is set of immediate predecessors
Dominator tree: each node’s children are nodes it immediately dominates
Computing loops:
- compute dominator info, store it in a dominator tree
- identify back edges
- construct natural loops corresponding to back edges
Data-flow analysis
Flow sensitivity
- flow-sensitive analysis computes for each program point whatever you want to analyze
- flow-insensitive pointer analysis computes what memory locations pointers may refer to, at any time in program execution
Context-sensitive analysis considers calling context when analyzing target of function call
Reaching definitions
Which data definitions can reach a point in the program.
- For each basic block determine:
- which definitions the block generates (gen set)
- which it kills (kill set)
- from this local solution build global one. determine:
- which definitions from anywhere can reach the start of a basic block
- in[B] = Union for p ∈ pred[B] of out[p]
- which definitions can still be alive after the basic block
- out[B] = gen[B] ∪ (in[B] - kill[B])
- need to solve iteratively
Definition-Use chains: definition of variable and all uses, reachable from that definition without other intervening definitions
- compute: for each definition D, compute chain of uses that D may reach
Use-definitions chains: use of variable and all definitions of that variable, that can reach that use without other intervening definitions
- compute: for each use U, compute chain of definitions that reach U