Parser identification in embedded systems
Data flow graph (DFG): a graph representing the data dependencies between operations
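- directed graph: nodes are operations, edges carry data from the producing operation to the consuming one
- a node fires when all of its input data are ready: it consumes data from its input ports and produces data on its output ports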
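To make the firing rule concrete, here is a minimal Python sketch of a dataflow interpreter. The graph encoding (`nodes`, `edges`, named ports) and the function `run_dataflow` are hypothetical and chosen only for illustration. It assumes each port holds at most one token at a time.

```python
def run_dataflow(nodes, edges, initial_tokens):
    """Toy dataflow interpreter (hypothetical encoding).

    nodes: {name: (function, [input port names])}
    edges: {name: [(destination node, destination port), ...]}
    initial_tokens: {(node, port): value}
    Returns (node, result) pairs in the order the nodes fired.
    """
    # Each node's input ports; a port "has data" if its key is present.
    ports = {name: {} for name in nodes}
    for (node, port), value in initial_tokens.items():
        ports[node][port] = value

    fired = []
    progress = True
    while progress:
        progress = False
        for name, (fn, inputs) in nodes.items():
            # Firing rule: every input port must hold a token.
            if inputs and all(p in ports[name] for p in inputs):
                # Consume the input tokens.
                args = [ports[name].pop(p) for p in inputs]
                result = fn(*args)
                fired.append((name, result))
                # Produce the result on every outgoing edge.
                for dest, port in edges.get(name, []):
                    ports[dest][port] = result
                progress = True
    return fired


# Example: (a + b) * (a - b) with a = 5, b = 3.
nodes = {
    "add": (lambda x, y: x + y, ["x", "y"]),
    "sub": (lambda x, y: x - y, ["x", "y"]),
    "mul": (lambda x, y: x * y, ["x", "y"]),
}
edges = {"add": [("mul", "x")], "sub": [("mul", "y")]}
initial = {("add", "x"): 5, ("add", "y"): 3, ("sub", "x"): 5, ("sub", "y"): 3}

fired = run_dataflow(nodes, edges, initial)
print(fired)  # [('add', 8), ('sub', 2), ('mul', 16)]
```

`add` and `sub` do not depend on each other, so a real dataflow machine could fire them in either order or in parallel. This sequential loop simply picks one order.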
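Creating a DFG: transform the code into SSA (static single assignment) form, then draw the graph from it by connecting each definition to its uses. The result is only a partial ordering of operations: operations with no data dependence between them are not ordered.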
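As a small illustration of building a DFG from SSA, the sketch below takes straight-line SSA instructions, links each definition to its uses, and groups operations into layers. The instruction format and the names `build_dfg` and `levels` are hypothetical, and control flow (and therefore phi nodes) is ignored.

```python
# Hypothetical straight-line SSA: (destination, operation, operands).
# Each variable is assigned exactly once; "p", "q", "c" are external inputs.
ssa = [
    ("t1", "load", ["p"]),
    ("t2", "load", ["q"]),
    ("t3", "add", ["t1", "t2"]),
    ("t4", "mul", ["t1", "c"]),
    ("t5", "sub", ["t3", "t4"]),
]


def build_dfg(instrs):
    """Return def -> [uses] edges. In SSA every operand has one definition,
    so each edge is just a lookup from operand name to its defining instruction."""
    defined = {dest for dest, _, _ in instrs}
    assert len(defined) == len(instrs), "not SSA: a variable is assigned twice"
    edges = {dest: [] for dest in defined}
    for dest, _, operands in instrs:
        for operand in operands:
            if operand in defined:  # skip external inputs
                edges[operand].append(dest)
    return edges


def levels(edges):
    """Kahn-style layering: operations in the same layer have no data
    dependence on each other, so the DFG only imposes a partial order."""
    indegree = {n: 0 for n in edges}
    for succs in edges.values():
        for s in succs:
            indegree[s] += 1
    layer = sorted(n for n, d in indegree.items() if d == 0)
    result = []
    while layer:
        result.append(layer)
        nxt = []
        for n in layer:
            for s in edges[n]:
                indegree[s] -= 1
                if indegree[s] == 0:
                    nxt.append(s)
        layer = sorted(nxt)
    return result


print(levels(build_dfg(ssa)))  # [['t1', 't2'], ['t3', 't4'], ['t5']]
```

Operations in the same layer (for example `t1` and `t2`) are not ordered relative to each other, which is what "partial ordering" means here.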
Solution:
- Transform the binary code into another representation that is easier to reason about:
  - LLVM intermediate representation (IR): it is in SSA form, so data flow can be tracked
  - recursive disassembler: follows control flow (jumps, calls) from known entry points to find the code
  - data flow normalization: model memory as an array, replace QEMU loads/stores with LLVM loads/stores, detect stack accesses and promote them to SSA values
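The stack part of data flow normalization can be sketched on a single basic block. This uses a hypothetical toy IR, not actual LLVM or QEMU output: memory is one array addressed by `(base, offset)`, and an access is assumed to be a stack access exactly when its base is `sp` with a constant offset (no pointers to the stack escape). Stores to a stack slot only record which SSA value the slot holds, and later loads from that slot are replaced by that value, similar in spirit to LLVM's mem2reg pass. The names `is_stack` and `promote_stack` are illustrative.

```python
# Hypothetical lifted IR for one basic block, with memory as a single array M:
#   ("store", addr, src), ("load", dst, addr), (op, dst, [operands])
block = [
    ("store", ("sp", 8), "%a"),   # M[sp+8] = %a
    ("store", ("gp", 0), "%b"),   # M[gp+0] = %b   (not a stack access)
    ("load", "%c", ("sp", 8)),    # %c = M[sp+8]
    ("add", "%d", ["%c", "%b"]),  # %d = %c + %b
    ("store", ("sp", 8), "%d"),   # M[sp+8] = %d
    ("load", "%e", ("sp", 8)),    # %e = M[sp+8]
    ("mul", "%f", ["%e", "%c"]),  # %f = %e * %c
]


def is_stack(addr):
    # Assumption: stack slots are exactly the sp-relative constant offsets.
    return addr[0] == "sp"


def promote_stack(block):
    """Remove stack loads/stores in one basic block, rewriting uses of
    loaded values to the SSA value that was last stored in that slot."""
    slot_value = {}  # stack offset -> SSA value currently held in the slot
    rename = {}      # value produced by a removed load -> replacement value
    out = []

    def r(v):
        return rename.get(v, v)

    for inst in block:
        kind = inst[0]
        if kind == "store":
            _, addr, src = inst
            if is_stack(addr):
                slot_value[addr[1]] = r(src)  # remember it, emit nothing
            else:
                out.append(("store", addr, r(src)))
        elif kind == "load":
            _, dst, addr = inst
            if is_stack(addr) and addr[1] in slot_value:
                rename[dst] = slot_value[addr[1]]  # load becomes a direct use
            else:
                out.append(inst)  # non-stack load, or slot not written here
        else:
            _, dst, operands = inst
            out.append((kind, dst, [r(v) for v in operands]))
    return out


for inst in promote_stack(block):
    print(inst)
```

Across several basic blocks, a real pass would also need phi nodes where different stored values meet.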
- Use heuristics to select the best candidates for parser functions:
  - compute a score for each feature
  - combine the feature scores into a single score per function (weighted sum)
  - heuristics include a switch statement inside a loop, data flow analysis on conditional statements, and others
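The scoring step can be sketched as a weighted sum followed by a ranking. The feature names, per-function values, and weights below are hypothetical placeholders; in a real tool each feature value would come from one of the heuristics (for example, detecting a looped switch statement).

```python
# Hypothetical per-function feature scores in [0, 1] (illustrative only).
features = {
    "parse_cmd":   {"looped_switch": 1.0, "cond_on_input": 0.8},
    "memcpy":      {"looped_switch": 0.0, "cond_on_input": 0.1},
    "handle_uart": {"looped_switch": 1.0, "cond_on_input": 0.3},
}

# Hypothetical weights; in practice they would need tuning.
weights = {"looped_switch": 0.6, "cond_on_input": 0.4}


def combined_score(feats, weights):
    """Weighted sum of feature scores; a missing feature counts as 0."""
    return sum(w * feats.get(name, 0.0) for name, w in weights.items())


def rank_candidates(features, weights, top_k=2):
    """Return the top_k (function, score) pairs, highest score first."""
    scored = {fn: combined_score(f, weights) for fn, f in features.items()}
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


print(rank_candidates(features, weights))
```

A weighted sum keeps each heuristic independent: a new feature only needs a score function and a weight, and no single heuristic has to be decisive on its own.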