Systems Architecture

Branch delays

Unconditional branches

the branch target address is not known until the branch reaches the compute stage, so the two instructions fetched while the branch is in decode and compute have to be discarded

this two-cycle delay is the “branch penalty”

to reduce the penalty, the branch target address must be computed earlier in the pipeline, in the decode stage

this reduces the penalty to one cycle
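A rough way to see where the two-cycle and one-cycle figures come from is to count the instructions fetched behind a branch before its target is known. The sketch below assumes a five-stage pipeline (the notes only name fetch, decode and compute; the last two stage names are assumptions), an otherwise ideal one-instruction-per-cycle flow, and hypothetical helper names.

```python
# Assumed stage order; only fetch, decode and compute are named in the notes.
STAGES = ["fetch", "decode", "compute", "memory", "write"]


def branch_penalty(resolve_stage: str) -> int:
    """Instructions fetched after the branch before its target is known."""
    return STAGES.index(resolve_stage)


def total_cycles(n_instructions: int, n_branches: int, resolve_stage: str) -> int:
    """Cycle count for an otherwise ideal pipeline with taken branches."""
    fill = len(STAGES) - 1  # cycles before the first instruction completes
    return fill + n_instructions + n_branches * branch_penalty(resolve_stage)


if __name__ == "__main__":
    # 100 useful instructions, 20 of them taken branches (made-up numbers).
    print(total_cycles(100, 20, "compute"))  # target known in compute stage
    print(total_cycles(100, 20, "decode"))   # target known in decode stage
```

With these made-up numbers, moving the target computation from compute to decode saves one cycle per taken branch.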

this needs hardware modification — PC has to be incremented in every cycle, and a second adder is needed in decode stage to compute branch target address for every instruction
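The two pieces of extra hardware can be sketched as two small functions. This is only an illustration: the 4-byte instruction size, the assumption that the branch offset is relative to the incremented PC, and the function names are not from the notes.

```python
WORD = 4  # assumed instruction size in bytes


def fetch_stage(pc: int) -> tuple[int, int]:
    """Return (address fetched this cycle, incremented PC).

    The PC is incremented in every cycle, whatever the instruction is.
    """
    return pc, pc + WORD


def decode_stage_target(incremented_pc: int, offset: int) -> int:
    """Second adder in the decode stage.

    It runs for every instruction; the result is only used by branches.
    The offset is assumed to be relative to the incremented PC.
    """
    return incremented_pc + offset


if __name__ == "__main__":
    fetched, next_pc = fetch_stage(100)
    print(fetched, next_pc, decode_stage_target(next_pc, 16))
```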

Conditional branches

branch condition must be tested as early as possible

comparator to test condition can be moved to decode stage

it would use values from register file outputs A and B directly
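A minimal sketch of resolving a conditional branch in the decode stage is shown below. The condition names, the function name and the offset convention (relative to the fall-through address) are assumptions made for illustration; a real instruction set defines its own.

```python
import operator

# Hypothetical condition codes for the decode-stage comparator.
COMPARATORS = {
    "eq": operator.eq,
    "ne": operator.ne,
    "lt": operator.lt,
    "ge": operator.ge,
}


def resolve_in_decode(fall_through: int, offset: int, a: int, b: int, cond: str) -> int:
    """Return the address of the next instruction to execute after the branch.

    a and b stand for the register file outputs A and B, compared directly.
    """
    taken = COMPARATORS[cond](a, b)  # comparator moved to the decode stage
    target = fall_through + offset   # decode-stage adder for the target
    return target if taken else fall_through


if __name__ == "__main__":
    print(resolve_in_decode(104, 16, 5, 5, "eq"))  # condition holds
    print(resolve_in_decode(104, 16, 5, 6, "eq"))  # condition fails
```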

Branch delay slot — compiler reorganises instructions

branch delay slot: the location that follows a branch instruction

compiler tries to find an instruction that is always executed, independent of whether or not the program branches

data dependencies must be preserved

if the compiler can find a useful instruction, there’s no branch penalty

otherwise, it fills the slot with a NOP and there’s a penalty of one cycle
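One simple way a compiler might fill the slot is to look backwards from the branch for an instruction that can be moved past it without breaking a data dependency. The sketch below is an assumption-laden illustration, not a real compiler pass: the `Instr` record, the register names and the example assembly text are hypothetical, and the block is assumed to contain only register-to-register instructions followed by one branch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Instr:
    text: str
    dest: Optional[str] = None    # register written, if any
    srcs: tuple[str, ...] = ()    # registers read


NOP = Instr("NOP")


def conflicts(earlier: Instr, later: Instr) -> bool:
    """True if moving `earlier` past `later` would break a data dependency."""
    raw = earlier.dest is not None and earlier.dest in later.srcs
    waw = earlier.dest is not None and earlier.dest == later.dest
    war = later.dest is not None and later.dest in earlier.srcs
    return raw or waw or war


def fill_delay_slot(block: list[Instr]) -> list[Instr]:
    """`block` ends with a branch; return it with the delay slot filled."""
    *body, branch = block
    # Instructions before the branch run whether or not it is taken,
    # so any of them is a candidate if its dependencies allow the move.
    for i in range(len(body) - 1, -1, -1):
        candidate = body[i]
        following = body[i + 1:] + [branch]
        if not any(conflicts(candidate, later) for later in following):
            return body[:i] + body[i + 1:] + [branch, candidate]
    return block + [NOP]  # nothing safe to move: one-cycle penalty


if __name__ == "__main__":
    block = [
        Instr("Add R1, R2, R3", dest="R1", srcs=("R2", "R3")),
        Instr("Sub R4, R5, R6", dest="R4", srcs=("R5", "R6")),
        Instr("Branch_if_[R4]=0 LOOP", srcs=("R4",)),
    ]
    for instr in fill_delay_slot(block):
        print(instr.text)
```

In this example the `Sub` cannot move because the branch reads `R4`, so the independent `Add` is placed in the delay slot instead.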

Branch prediction

Static:

Dynamic: