another cause of pipeline stalls is a delay in a memory access,
for example one caused by a cache miss
instructions:
even if the data for the load is found in the cache, operand forwarding can't be done the same way as for an ALU result: the data read from the cache is not available until it is in register RY at the start of cycle 5
the subtract must be stalled for one cycle to delay its ALU operation until the loaded value is in RY and can be forwarded from there
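
the timing argument above can be checked with a small sketch. it assumes a five-stage pipeline (fetch, decode, compute, memory, write) with one instruction fetched per cycle and full forwarding; the function names and the cycle model are illustrative, not taken from these notes

```python
# Stage offsets from the fetch cycle, assuming a five-stage pipeline:
# fetch (0), decode (1), compute (2), memory (3), write (4).
COMPUTE_OFFSET = 2
MEMORY_OFFSET = 3


def result_ready_cycle(fetch_cycle: int, is_load: bool) -> int:
    """First cycle in which the producer's result can be forwarded to the ALU.

    An ALU result is available once the compute stage has finished;
    load data is available only once the memory stage has finished
    (that is, when it sits in RY).
    """
    producing_offset = MEMORY_OFFSET if is_load else COMPUTE_OFFSET
    return fetch_cycle + producing_offset + 1


def stall_cycles(producer_fetch: int, producer_is_load: bool,
                 consumer_fetch: int) -> int:
    """Cycles the dependent instruction must wait before its ALU operation."""
    wanted_compute_cycle = consumer_fetch + COMPUTE_OFFSET
    ready = result_ready_cycle(producer_fetch, producer_is_load)
    return max(0, ready - wanted_compute_cycle)


# load fetched in cycle 1, dependent subtract fetched in cycle 2
print(stall_cycles(1, True, 2))   # 1
# ALU instruction fetched in cycle 1, dependent instruction fetched in cycle 2
print(stall_cycles(1, False, 2))  # 0
```

with the load fetched in cycle 1, its data is ready for forwarding in cycle 5, while the subtract would normally reach its compute stage in cycle 4, hence the single stall cycle. an ALU producer is ready one cycle earlier, so its dependent needs no stall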
eliminating the one-cycle stall: