lossless encoding of characters, the ones that occur more frequently have a shorter encoding
e.g. encode “mississippi”
to generate a tree:
Build a table of values with frequencies:
m | i | s | p |
1 | 4 | 4 | 2 |
Repeat until tree is complete: take two smallest nodes from list, create a new node that is sum of two smallest, add the two smallest as children (right > left), then add the parent node back into the list.
Assign weights to left-right branches — left is 0, right is 1.
Traverse tree to generate a code for each character:
letter | code |
---|---|
a | 1100 |
b | 1101 |
c | 100 |
d | 101 |
e | 111 |
f | 0 |
with a min-heap-based min-priority queue, in O(nlogn)