Binary and Malware Analysis

Table of Contents

Tracking control flow

Explore execution paths in the program.

Control flow graph (CFG) of each function:

Call graph

Basic block: maximal sequence of instructions that execute one-by-one in order

Control flow graph:

Call graph

Identifying a function

Problems

You should:

Loops

Loop: subgraph of CFG that is strongly connected (path from any node to any other node), single entry

Two loops are either disjoint, or one is completely nested in the other.

Identifying loops:

Natural loops: single head node

Dominator set D(i) = {i} ∪ (intersection for p ∈ Pred(i) of D(p)), where Pred is set of immediate predecessors

Dominator tree: each node’s children are nodes it immediately dominates

Computing loops:

  1. compute dominator info, store it in a dominator tree
  2. identify back edges
  3. construct natural loops corresponding to back edges

Data-flow analysis

Flow sensitivity

Context-sensitive analysis considers calling context when analyzing target of function call

Reaching definitions

Which data definitions can reach a point in the program.

Definition-Use chains: definition of variable and all uses, reachable from that definition without other intervening definitions

Use-definitions chains: use of variable and all definitions of that variable, that can reach that use without other intervening definitions