Binary and Malware Analysis

Table of Contents

Parser identification in embedded systems

Data flow graph (DFG): a graph representing the data dependencies between operations

Create DFG: transform the code to SSA (static single assignment) form, then build the graph from it. The result is a partial ordering of operations.
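
The step from SSA to a DFG can be sketched as follows. This is only an illustration, not the tooling used in the notes: the tuple-based instruction format, the SSA names (a1, b1, ...), and the helper functions are all made up for the example. Because every SSA name is defined exactly once, each definition becomes one node and each operand becomes one edge.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical SSA program: (defined_name, operation, operand_names).
SSA_PROGRAM = [
    ("a1", "const", []),
    ("b1", "const", []),
    ("c1", "add", ["a1", "b1"]),
    ("d1", "mul", ["a1", "a1"]),
    ("e1", "sub", ["c1", "d1"]),
]


def build_dfg(program):
    """Map each SSA name to the set of SSA names it depends on."""
    return {dest: set(operands) for dest, _op, operands in program}


def valid_schedule(dfg):
    """Return one execution order that respects all data dependencies."""
    return list(TopologicalSorter(dfg).static_order())


def is_ordered_before(dfg, first, second):
    """True if `second` depends on `first`, directly or transitively."""
    stack = list(dfg[second])
    seen = set()
    while stack:
        node = stack.pop()
        if node == first:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(dfg[node])
    return False


dfg = build_dfg(SSA_PROGRAM)
order = valid_schedule(dfg)
```

The ordering is only partial: c1 and d1 both depend on a1, but neither depends on the other, so they may run in either order.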

Solution:

  1. Transform binary code to another representation for better reasoning
    • LLVM intermediate representation (IR): already in SSA form, so data flow tracking is possible
    • recursive disassembler
    • data flow normalization: model memory as an array, replace QEMU load/store operations with LLVM load/store instructions, detect stack accesses and convert them to SSA
  2. Use heuristics to select the best candidates for parser functions
    • compute a score for each feature
    • combine the feature scores into a single score per function (weighted sum)
    • heuristics: looped switch statement, data flow analysis on conditional statements, and others
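
The scoring in step 2 can be sketched as a weighted sum per function. The feature names, weights, function names, and feature values below are invented for illustration; the notes do not give the actual weights or the full feature set.

```python
# Hypothetical weights, one per heuristic feature (not taken from the notes).
WEIGHTS = {
    "looped_switch": 0.6,
    "input_dependent_branches": 0.4,
}


def combined_score(feature_scores, weights=WEIGHTS):
    """Weighted sum of the per-feature scores; a missing feature counts as 0."""
    return sum(w * feature_scores.get(name, 0.0) for name, w in weights.items())


def rank_candidates(functions, weights=WEIGHTS):
    """Return function names sorted from most to least parser-like."""
    return sorted(
        functions,
        key=lambda name: combined_score(functions[name], weights),
        reverse=True,
    )


# Hypothetical per-function feature scores in the range [0, 1].
FUNCTIONS = {
    "sub_8000": {"looped_switch": 1.0, "input_dependent_branches": 0.5},
    "sub_8100": {"looped_switch": 0.0, "input_dependent_branches": 0.25},
    "sub_8200": {"looped_switch": 0.0, "input_dependent_branches": 1.0},
}

ranking = rank_candidates(FUNCTIONS)
```

A weighted sum keeps each heuristic independent, so a new feature can be added by giving it a score function and a weight without touching the others.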