Dynamic taint analysis
“Tracking interesting things”
Idea:
- label data with tags (trusted/untrusted, interesting/boring, public/secret)
- control how data and tags propagate:
  - when copying data, also copy its tag
  - clean a tag when you know the associated data is no longer “untrusted”
- define policies that check for interesting/unsafe usage of tainted data
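A minimal sketch of the idea above, assuming a single boolean tag per value. `Tagged`, `copy`, `validate_filename` and `check_sink` are hypothetical names chosen for illustration; a real tracker keeps tags in shadow memory rather than next to each value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tagged:
    """A value paired with a single taint tag (hypothetical helper type)."""
    value: str
    tainted: bool


def copy(src: Tagged) -> Tagged:
    # Propagation rule: copying data also copies its tag.
    return Tagged(src.value, src.tainted)


def validate_filename(x: Tagged, allowed: set) -> Tagged:
    # Cleaning rule: once checked against an allowlist, the data is
    # no longer considered "untrusted", so the tag is cleared.
    if x.value not in allowed:
        raise ValueError(f"rejected input: {x.value!r}")
    return Tagged(x.value, tainted=False)


def check_sink(x: Tagged, sink: str) -> None:
    # Policy: tainted data must never reach this sink.
    if x.tainted:
        raise PermissionError(f"tainted data reached {sink}")


name = copy(Tagged("report.txt", tainted=True))   # e.g. read from the network
try:
    check_sink(name, "open()")
except PermissionError as err:
    print(err)                                     # tainted data reached open()
check_sink(validate_filename(name, {"report.txt"}), "open()")  # allowed
```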
Access policies:
- Prevent leakage of classified data (confidentiality)
  - Bell-LaPadula: no read up, no write down
- Preserve integrity
  - Biba: no read down, no write up
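Both access policies reduce to simple level comparisons. This sketch assumes totally ordered integer levels (real policies may also use categories); `blp_allows` and `biba_allows` are hypothetical helpers.

```python
# Levels are plain integers: a higher number means more classified
# (Bell-LaPadula) or more trusted (Biba).

def blp_allows(subject: int, obj: int, op: str) -> bool:
    """Bell-LaPadula (confidentiality): no read up, no write down."""
    if op == "read":
        return subject >= obj   # reading above your own level would leak secrets
    if op == "write":
        return subject <= obj   # writing below your level could leak secrets downward
    raise ValueError(f"unknown operation: {op}")


def biba_allows(subject: int, obj: int, op: str) -> bool:
    """Biba (integrity): no read down, no write up."""
    if op == "read":
        return subject <= obj   # reading less trusted data could corrupt the subject
    if op == "write":
        return subject >= obj   # writing up could corrupt more trusted data
    raise ValueError(f"unknown operation: {op}")


SECRET, PUBLIC = 1, 0
print(blp_allows(SECRET, PUBLIC, "read"))   # True: reading down is fine
print(blp_allows(SECRET, PUBLIC, "write"))  # False: no write down
```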
Tainting to detect attacks
- mark all data received from the network as tainted
- check whether tainted values influence control flow
- raise an alert when a return instruction is executed with a tainted target address
- also raise an alert on:
  - other calls/jumps made with tainted target addresses
  - calls, rets, and jumps to tainted instructions (i.e., executing code that came from the network)
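A sketch of this attack-detection policy, assuming byte-granularity taint for memory and that the caller supplies the taint of the register or stack slot holding the target address. `ControlFlowMonitor`, `TaintAlert` and the hook method names are hypothetical; a real system would run these checks inside an emulator or a binary instrumentation framework.

```python
class TaintAlert(Exception):
    """Raised when a control transfer violates the taint policy."""


class ControlFlowMonitor:
    """Sketch of the policy: network data is tainted, and ret/call/jmp may
    neither use a tainted target nor land on tainted code."""

    def __init__(self) -> None:
        self.tainted_mem: set[int] = set()   # byte addresses holding tainted data

    def on_network_recv(self, addr: int, length: int) -> None:
        # Taint source: every byte received from the network.
        self.tainted_mem.update(range(addr, addr + length))

    def on_control_transfer(self, kind: str, target: int, target_tainted: bool) -> None:
        # kind is "ret", "call" or "jmp"; target_tainted is the taint of the
        # register or stack slot the target address was taken from.
        if target_tainted:
            raise TaintAlert(f"{kind} with tainted target {target:#x}")
        # Simplification: only the first byte of the target instruction is checked.
        if target in self.tainted_mem:
            raise TaintAlert(f"{kind} into tainted code at {target:#x}")


mon = ControlFlowMonitor()
mon.on_network_recv(0x7FFF0000, 64)              # packet lands in a stack buffer
mon.on_control_transfer("ret", 0x401234, False)  # normal return: no alert
try:
    mon.on_control_transfer("ret", 0x7FFF0010, True)  # overwritten return address
except TaintAlert as alert:
    print(alert)   # ret with tainted target 0x7fff0010
```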
For exploits:
- suppose you have an arbitrary write primitive
- taint all data in memory, then observe whether tainted data reaches the arguments of calls like execve (this tells you which locations are worth overwriting)
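To learn which locations matter, one option (assumed here) is to give every writable location its own color, so the colors that reach execve name the addresses worth overwriting. The per-address coloring, the `victim` program and all helper names below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TVal:
    value: int
    colors: frozenset = frozenset()   # memory addresses that influenced this value


def add(a: TVal, b: TVal) -> TVal:
    # Arithmetic propagates the union of both operands' colors.
    return TVal(a.value + b.value, a.colors | b.colors)


def taint_writable(writable: dict) -> dict:
    # Every writable location is tainted with its own address as the color.
    return {addr: TVal(val, frozenset({addr})) for addr, val in writable.items()}


def victim(mem: dict) -> dict:
    # Hypothetical victim: the execve path pointer is base + offset, both read
    # from writable globals; a counter is updated but never reaches execve.
    path_ptr = add(mem[0x6000], mem[0x6008])
    mem[0x6010] = add(mem[0x6010], TVal(1))
    return {"execve": [path_ptr]}           # arguments observed at the sink


mem = taint_writable({0x6000: 0x400000, 0x6008: 0x20, 0x6010: 0})
sinks = victim(mem)
targets = set().union(*(arg.colors for arg in sinks["execve"]))
print(sorted(hex(a) for a in targets))   # ['0x6000', '0x6008']
```

With a single color, the analysis could only say "some memory reaches execve"; per-location colors trade extra shadow state for that precision.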
Questions for tainting:
- what to taint?
- how to propagate taint, and how to clean it?
- how to use taint?
- granularity: track bits, bytes, words, or blocks? With a single color or multiple colors?
- tainting boundaries: only registers, or also memory? What about disk?
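Granularity matters because coarse shadow memory over-taints. A sketch assuming one taint bit per byte versus one per aligned 4-byte word (the class names are hypothetical):

```python
class ByteShadow:
    """One taint bit per byte of memory: precise, but more shadow state."""

    def __init__(self) -> None:
        self.tainted: set[int] = set()

    def taint(self, addr: int, n: int = 1) -> None:
        self.tainted.update(range(addr, addr + n))

    def is_tainted(self, addr: int, n: int = 1) -> bool:
        return any(a in self.tainted for a in range(addr, addr + n))


class WordShadow:
    """One taint bit per aligned 4-byte word: cheaper, but coarser."""

    WORD = 4

    def __init__(self) -> None:
        self.tainted: set[int] = set()        # indices of tainted words

    def taint(self, addr: int, n: int = 1) -> None:
        for a in range(addr, addr + n):
            self.tainted.add(a // self.WORD)

    def is_tainted(self, addr: int, n: int = 1) -> bool:
        return any(a // self.WORD in self.tainted for a in range(addr, addr + n))


byte_sh, word_sh = ByteShadow(), WordShadow()
for sh in (byte_sh, word_sh):
    sh.taint(0x1001)                 # a single tainted input byte
print(byte_sh.is_tainted(0x1002))    # False
print(word_sh.is_tainted(0x1002))    # True: the neighboring byte is over-tainted
```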
What to taint
For control of information flow, taint everything.
For attack detection, taint everything from untrusted source, and see if it ends up where it shouldn’t.
For binary analysis, taint any input you can, such as data typed by the user and config files.
For detecting privacy breaches, taint privacy-sensitive data, such as passwords and credit card numbers.
For vulnerability detection, taint everything that an attacker can control.
Taint propagation
Generally, these rules hold:
- untainted + untainted = untainted
- untainted + tainted = tainted
- tainted + tainted = tainted
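With a single color, these rules amount to a logical OR of the operands' tags. With multiple colors, one common encoding (assumed here) is a bitmask per value, propagated with bitwise OR; the source names are hypothetical.

```python
from itertools import product


def combine(t1: bool, t2: bool) -> bool:
    # Single color: the result is tainted if either operand is tainted.
    return t1 or t2


for a, b in product([False, True], repeat=2):
    print(f"{a!s:5} + {b!s:5} = {combine(a, b)}")

# Multiple colors: one bit per (hypothetical) taint source, combined with OR,
# so the result remembers every source that influenced it.
NETWORK, FILE, USER = 0b001, 0b010, 0b100


def combine_colors(c1: int, c2: int) -> int:
    return c1 | c2


print(bin(combine_colors(NETWORK, FILE)))  # 0b11
```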
Cleaning the taint:
- when storing a constant in a destination
- possibly with MMX or floating-point instructions
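A sketch of cleaning at the register level for a toy instruction subset. Besides storing a constant, it also treats the zeroing idiom `xor r, r` as cleaning, an assumption added here because its result is always 0 regardless of input; MMX and floating-point instructions are not modeled. `ShadowRegs` and its methods are hypothetical.

```python
class ShadowRegs:
    """Per-register taint bits for a toy x86-like instruction subset."""

    def __init__(self, regs=("eax", "ebx", "ecx")) -> None:
        self.taint = {r: False for r in regs}

    def mov_imm(self, dst: str) -> None:
        # Storing a constant cleans the destination.
        self.taint[dst] = False

    def mov_reg(self, dst: str, src: str) -> None:
        # Direct move: the tag travels with the data.
        self.taint[dst] = self.taint[src]

    def xor(self, dst: str, src: str) -> None:
        if dst == src:
            # "xor r, r" always yields 0, a constant, so treat it as cleaning.
            self.taint[dst] = False
        else:
            self.taint[dst] = self.taint[dst] or self.taint[src]


regs = ShadowRegs()
regs.taint["eax"] = True        # eax holds a tainted input byte
regs.mov_reg("ebx", "eax")      # ebx is now tainted too
regs.xor("ebx", "ebx")          # zeroing idiom: ebx is clean again
regs.mov_imm("eax")             # mov eax, 0x41: eax is clean again
print(regs.taint)               # {'eax': False, 'ebx': False, 'ecx': False}
```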
Propagating:
- on direct moves
- maybe on arithmetic operations
- what about implicit flows (a variable whose value is determined by a tainted value through control flow, but which is never assigned from it directly)?
  - in attack detection: mostly not tracked, which works OK
  - in leakage detection: also not tracked, but this is not fine, because malware can launder taint through implicit flows and escape detection
- what about pointer tainting? E.g., if you use a tainted value to index a toupper() table, the result should be tainted, but for some other table it might not need to be
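Both open questions can be illustrated with explicit-flow-only tracking. In this sketch (all names hypothetical), `launder` copies a tainted bit string through branches only, so every output bit ends up untainted; `load_indexed` shows that whether a toupper()-style lookup keeps the taint depends entirely on whether pointer tainting is enabled.

```python
Tainted = tuple[int, bool]   # (value, tainted?) pair


def launder(secret: list[Tainted]) -> list[Tainted]:
    # Implicit flow: each output bit is chosen by a branch on a tainted bit,
    # but only constants are ever assigned, so explicit-flow tracking sees no taint.
    out: list[Tainted] = []
    for bit, _tainted in secret:
        if bit:
            out.append((1, False))
        else:
            out.append((0, False))
    return out


# toupper()-style lookup table for byte values (ASCII only).
TOUPPER = [c - 32 if ord("a") <= c <= ord("z") else c for c in range(256)]


def load_indexed(table: list[int], index: Tainted, pointer_tainting: bool) -> Tainted:
    # The table contents are untainted; only the index may be tainted.
    value, idx_tainted = table[index[0]], index[1]
    return value, (idx_tainted and pointer_tainting)


secret = [(1, True), (0, True), (1, True)]
print(launder(secret))                                    # [(1, False), (0, False), (1, False)]

ch = (ord("h"), True)                                     # tainted input character
print(load_indexed(TOUPPER, ch, pointer_tainting=False))  # (72, False): taint lost
print(load_indexed(TOUPPER, ch, pointer_tainting=True))   # (72, True)
```

Enabling pointer tainting would equally taint results from any other table indexed by tainted data, which is where over-tainting comes from.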
Using the taint
- check that the target address of a ret/jump/call is not tainted (this cannot catch attacks that do not divert control flow, e.g. non-control-data attacks)
- mark secret data as tainted and track it to see whether it leaves the application (but do we propagate on pointers and implicit flows?)
- reverse engineer the structure of a config/input file: mark it as tainted, monitor syscall arguments for file names and IP addresses, and monitor the arguments of strcmp
- taint all writable data in memory, then observe whether tainted data reaches the arguments of syscalls like execve
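A sketch of the strcmp monitoring idea, assuming each input byte is colored with its file offset. The parser and all names are hypothetical; the point is that the log reveals which input bytes form a field and which constant values the program expects there.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TStr:
    text: str
    offsets: tuple   # input-file offset each character came from (its color)


def load_input(data: str) -> TStr:
    # Taint source: every byte of the input file, colored by its offset.
    return TStr(data, tuple(range(len(data))))


def substr(s: TStr, start: int, end: int) -> TStr:
    # Slicing keeps each character's color.
    return TStr(s.text[start:end], s.offsets[start:end])


observed: list = []   # (first offset, last offset, constant compared against)


def hooked_strcmp(a: TStr, b: str) -> int:
    # Monitor: record which tainted input bytes are compared with which constant.
    if a.offsets:
        observed.append((a.offsets[0], a.offsets[-1], b))
    return (a.text > b) - (a.text < b)


def parse(cfg: TStr) -> str:
    # Hypothetical parser: the first word of the line is a keyword.
    key = substr(cfg, 0, cfg.text.index(" "))
    for keyword in ("listen", "server"):
        if hooked_strcmp(key, keyword) == 0:
            return keyword
    return "unknown"


parse(load_input("server 10.0.0.1"))
print(observed)   # [(0, 5, 'listen'), (0, 5, 'server')]
```

Here the log suggests that bytes 0 to 5 are a keyword field that can hold "listen" or "server".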