Dynamic taint analysis

“Tracking interesting things”

Idea:

label info with tags (trusted/untrusted, interesting/boring, public/secret)
control how data and labels propagate:
- when copying data, also copy its tag
- clean a tag when you know the associated data is no longer “untrusted”
- policies to check for interesting/unsafe usage of tainted data
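
A minimal sketch of the three ingredients above (propagate on copy, clean on validation, check at a sink). The names Tainted, copy, sanitize and check_sink are hypothetical helpers made up for illustration, not part of any real taint tracker:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tainted:
    """A value paired with a one-bit taint tag (hypothetical helper)."""
    value: object
    tainted: bool = False


def copy(src):
    # Copying data also copies its tag.
    return Tainted(src.value, src.tainted)


def sanitize(src, validator):
    # Clear the tag only once the data is known to be safe.
    if not validator(src.value):
        raise ValueError("validation failed; tag stays set")
    return Tainted(src.value, False)


def check_sink(arg, sink_name):
    # Policy: tainted data must not reach a sensitive sink.
    if arg.tainted:
        raise PermissionError(f"tainted data reached {sink_name}")
    return arg.value


user_input = Tainted("42", tainted=True)   # e.g. read from the network
copied = copy(user_input)                  # still tainted
clean = sanitize(copied, str.isdigit)      # tag cleared after validation
```

Real systems keep the tags in shadow memory or shadow registers rather than in wrapper objects; the wrapper is only there to make the rules visible.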

Access policies:

Preventing leakage of classified data
- Bell-LaPadula: no read up, no write down
Preserve integrity
- Biba: no read down, no write up
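
The two models are mirror images of each other, which a small sketch makes visible. Levels are assumed to be plain integers (higher means more secret for Bell-LaPadula, more trusted for Biba); the function names are hypothetical:

```python
def blp_allows(subject_level, object_level, op):
    """Bell-LaPadula (confidentiality): no read up, no write down."""
    if op == "read":
        return subject_level >= object_level   # may only read at or below own level
    if op == "write":
        return subject_level <= object_level   # may only write at or above own level
    raise ValueError(f"unknown operation: {op}")


def biba_allows(subject_level, object_level, op):
    """Biba (integrity): no read down, no write up."""
    if op == "read":
        return subject_level <= object_level   # may only read at or above own level
    if op == "write":
        return subject_level >= object_level   # may only write at or below own level
    raise ValueError(f"unknown operation: {op}")
```

For example, blp_allows(1, 2, "read") is False (reading up would leak secrets), while biba_allows(1, 2, "read") is True (reading higher-integrity data is harmless).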

Tainting to detect attacks

mark all data received from the network as tainted
check whether tainted values influence control flow
- raise alert when a return instruction is executed with tainted address
- also raise alert on
  - other calls/jumps made with tainted addresses
  - calls, rets, jumps that are made to tainted instructions
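
A toy illustration of the return-address check, assuming a byte-addressed memory with one taint bit per address. ToyMachine and ControlFlowAlert are made-up names, and the return address is squeezed into a single memory slot to keep the sketch short:

```python
class ControlFlowAlert(Exception):
    """Raised when control flow is about to follow a tainted address."""


class ToyMachine:
    def __init__(self):
        self.mem = {}      # address -> value
        self.shadow = {}   # address -> taint bit

    def store(self, addr, value):
        # Value produced by the program itself: untainted.
        self.mem[addr] = value
        self.shadow[addr] = False

    def recv(self, addr, data):
        # Network input: every byte written is tainted.
        for i, byte in enumerate(data):
            self.mem[addr + i] = byte
            self.shadow[addr + i] = True

    def mov(self, dst, src):
        # Direct move: copy the value and its tag.
        self.mem[dst] = self.mem[src]
        self.shadow[dst] = self.shadow.get(src, False)

    def ret(self, stack_slot):
        # Alert if the return address itself is tainted.
        if self.shadow.get(stack_slot, False):
            raise ControlFlowAlert(f"tainted return address at {stack_slot:#x}")
        return self.mem[stack_slot]


m = ToyMachine()
m.store(0x100, 0x4000)          # saved return address
normal_target = m.ret(0x100)    # untainted: allowed
m.recv(0xF8, bytes(9))          # 8-byte buffer at 0xF8 overflows into 0x100
```

After the overflow, m.ret(0x100) raises ControlFlowAlert, because the slot holding the return address now carries the taint of the network data.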

For exploits:

assume the attacker has an arbitrary write primitive
taint all writable data in memory, then observe whether tainted data reaches the arguments of sensitive calls like execve

Questions for tainting:

What to taint

For control of information flow, taint everything.

For attack detection, taint everything from untrusted source, and see if it ends up where it shouldn’t.

For binary analysis, taint any input that may influence the program, like data typed by the user and config files.

For privacy breaches, taint privacy-sensitive data, like passwords and credit card numbers.

For vulnerability detection, taint everything that the attacker can control.

Generally, these rules hold:

Cleaning the taint:

Propagating:

on direct moves
maybe on arithmetic operations
what about implicit flows (a variable whose value is determined by a tainted value, but is never assigned from it directly)
- in attack detection, mostly not propagated, which works OK
- in leakage detection, also not propagated, but this is not fine because malware can launder taint and escape detection
what about pointer tainting – e.g. if you use a tainted value to index a toupper() table, the result should be tainted, but for some other table it might not need to be
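
The two problem cases can be sketched as follows, assuming taint is a single boolean passed alongside each value. All function names are hypothetical, and the lookup table is built here rather than taken from a real toupper() implementation:

```python
def copy_direct(x, x_tainted):
    # Direct move: the tag travels with the value.
    y = x
    return y, x_tainted


def copy_implicit(x, x_tainted):
    # Implicit flow: y ends up equal to x (for x in 0..255), but is only
    # ever assigned from untainted loop counters. A tracker that follows
    # data flow alone reports y as clean, so the taint is laundered.
    y = 0
    for candidate in range(256):
        if candidate == x:       # the tainted value only steers the branch
            y = candidate
    return y, False


def table_lookup(table, index, index_tainted, propagate_via_pointer):
    # Pointer tainting: whether a tainted index taints the loaded value
    # is a policy decision, passed in here as a flag.
    return table[index], index_tainted and propagate_via_pointer


# Stand-in for a toupper() table: ASCII-only upper-casing of each byte.
UPPER = [bytes([i]).upper()[0] for i in range(256)]
```

With the flag off, table_lookup(UPPER, ord("a"), True, False) returns an untainted "A" even though it is fully determined by the tainted input; with the flag on, every table indexed by tainted data spreads taint, which tends to over-taint.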

check that the address used by a ret/jump/call is not tainted (cannot detect attacks that do not divert control flow)
mark secret data as tainted and track it to see whether it leaves the app (but do we propagate on pointers and implicit flows?)
reverse engineer structure of config/input file, mark as tainted, monitor arguments of syscalls for file names and IPs, and monitor arguments of strcmp
taint all writable data in memory, observe whether tainted data makes it to arguments of syscalls like execve
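
The last check can be sketched at byte granularity, assuming memory is modelled as a dict from address to byte and taint as a set of addresses. The helper names and the addresses 0x1000 and 0x2000 are invented for the example:

```python
def load(mem, addr, data):
    # Place raw bytes into the toy memory.
    for i, byte in enumerate(data):
        mem[addr + i] = byte


def cstring_addrs(mem, addr):
    # Addresses occupied by a NUL-terminated string starting at addr.
    addrs = []
    while mem[addr] != 0:
        addrs.append(addr)
        addr += 1
    return addrs


def check_execve(mem, tainted_addrs, path_ptr):
    # Return the path argument and the addresses of its tainted bytes.
    addrs = cstring_addrs(mem, path_ptr)
    path = bytes(mem[a] for a in addrs)
    hits = [a for a in addrs if a in tainted_addrs]
    return path, hits


mem = {}
load(mem, 0x1000, b"/bin/ls\0")            # read-only data: not tainted
load(mem, 0x2000, b"/bin/sh\0")            # writable data: tainted
tainted = set(range(0x2000, 0x2008))

ok_path, ok_hits = check_execve(mem, tainted, 0x1000)
bad_path, bad_hits = check_execve(mem, tainted, 0x2000)
```

An empty hit list means the argument came entirely from memory the attacker could not write; any hit marks the call as reachable by an arbitrary write.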