Tracking control flow

Explore execution paths in the program.

Control flow graph (CFG) of each function:

Call graph

Basic block: maximal sequence of instructions that execute one-by-one in order

Control flow graph:

Call graph

Identifying a function

ideally:
- set a of basic blocks with single entry point
- reached using a call instruction
- ends with ret instructions
in reality, might be reached using jump, might share blocks with other functions, might have multiple entries

Problems

Pointer-based control transfer: conditionally set a function pointer, then call function pointer
non-returning calls: some functions won’t return, code following call site may not be valid
non-contiguous code sections: could have jump tables, data, unparsed code, junk bytes…
tail calls: return f(x)
gaps in binary: might contain undiscovered functions
shared code and multiple entry representation

You should:

Loop: subgraph of CFG that is strongly connected (path from any node to any other node), single entry

Two loops are either disjoint, or one is completely nested in the other.

Identifying loops:

dominance: node a dominates b if all paths to b go through a (strictly if a and b are different, immediate if it’s right before)

Natural loops: single head node

Dominator set D(i) = {i} ∪ (intersection for p ∈ Pred(i) of D(p)), where Pred is set of immediate predecessors

Dominator tree: each node’s children are nodes it immediately dominates

Computing loops:

Flow sensitivity

flow-sensitive analysis computes for each program point whatever you want to analyze
flow-insensitive pointer analysis computes what memory locations pointers may refer to, at any time in program execution

Context-sensitive analysis considers calling context when analyzing target of function call

Which data definitions can reach a point in the program.

For each basic block determine:
- which definitions the block generates (gen set)
- which it kills (kill set)
from this local solution build global one. determine:
- which definitions from anywhere can reach the start of a basic block
  - in[B] = Union for p ∈ pred[B] of out[p]
- which definitions can still be alive after the basic block
  - out[B] = gen[B] ∪ (in[B] - kill[B])
need to solve iteratively

Definition-Use chains: definition of variable and all uses, reachable from that definition without other intervening definitions

Use-definitions chains: use of variable and all definitions of that variable, that can reach that use without other intervening definitions