
Contents · Superscalar, out-of-order, reorder buffer


Superscalar front-end and width

  • Superscalar = fetch/decode/rename/issue multiple instructions per cycle (width W).
  • Front-end must supply enough micro-ops: I-cache, BTB, predictor, decoders, uop cache.
  • Back-end must have sufficient functional units (FUs) and bandwidth to sustain W IPC.
Cycle:   F D R I X C   (F=fetch, D=decode, R=rename, I=issue, X=execute, C=commit)
Width:   4 4 4 4 4 4   (idealized steady-state)
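The idealized width row above assumes every stage is equally wide. As a minimal sketch of why that matters, the helper below (the name `sustainedWidth` and the stage-width object are illustrative assumptions, not part of the original notes) reports the narrowest stage, which caps steady-state throughput.

```javascript
// Hypothetical helper: steady-state throughput is capped by the narrowest stage.
// stageWidths maps a stage letter to instructions per cycle, e.g. { F: 4, D: 4, ... }.
function sustainedWidth(stageWidths) {
  const entries = Object.entries(stageWidths);
  if (entries.length === 0) throw new Error("no stages given");
  let [bottleneck, width] = entries[0];
  for (const [stage, w] of entries) {
    if (w < width) {
      bottleneck = stage;
      width = w;
    }
  }
  return { bottleneck, width };
}

// Idealized 4-wide machine versus one with a 3-wide decoder (assumed numbers).
const idealMachine = sustainedWidth({ F: 4, D: 4, R: 4, I: 4, X: 4, C: 4 });
const narrowDecode = sustainedWidth({ F: 4, D: 3, R: 4, I: 4, X: 4, C: 4 });
console.log(idealMachine.width, narrowDecode.bottleneck, narrowDecode.width);
```

In this toy model a single 3-wide stage limits the whole pipeline to 3 instructions per cycle, regardless of how wide the other stages are.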

Out-of-order execution: rename → dispatch → issue → execute

  • OoO executes ready instructions early while preserving in-order commit.
  • Pipeline: decode → rename → allocate RS/ROB → dispatch → wakeup/select → execute.
  • Speculation on control and memory dependencies; mis-speculation triggers recovery.
Decode  Rename  RS/ROB alloc  Dispatch  Issue  Execute  Writeback  Commit
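To make "execute early, commit in order" concrete, here is a toy simulation under strong assumptions: unlimited issue width, no structural hazards, and fixed latencies of at least one cycle. The function `simulateOoO`, the instruction names, and the latencies are all hypothetical.

```javascript
// Toy model: each instruction has a latency (>= 1 cycle) and a list of
// producer indices (earlier instructions whose results it needs).
function simulateOoO(program) {
  const done = new Array(program.length).fill(Infinity); // cycle the result is available
  const issued = new Array(program.length).fill(false);
  const issueOrder = [];
  const commitOrder = [];
  let head = 0; // oldest uncommitted instruction
  for (let cycle = 0; head < program.length; cycle++) {
    // Issue: any not-yet-issued instruction whose producers have finished.
    program.forEach((ins, i) => {
      if (!issued[i] && ins.deps.every((d) => done[d] <= cycle)) {
        issued[i] = true;
        done[i] = cycle + ins.latency;
        issueOrder.push(ins.name);
      }
    });
    // Commit: strictly in program order, and only completed instructions.
    while (head < program.length && done[head] <= cycle) {
      commitOrder.push(program[head].name);
      head++;
    }
  }
  return { issueOrder, commitOrder };
}

const demoProgram = [
  { name: "load", latency: 3, deps: [] },
  { name: "add", latency: 1, deps: [0] },    // waits for the load
  { name: "mul", latency: 1, deps: [] },     // independent, can run early
  { name: "sub", latency: 1, deps: [1, 2] },
];
const demoResult = simulateOoO(demoProgram);
console.log(demoResult.issueOrder.join(" "));  // issue order differs from program order
console.log(demoResult.commitOrder.join(" ")); // commit order matches program order
```

The independent `mul` issues ahead of the stalled `add`, yet it still commits after it, which is the property that keeps architectural state precise.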

Reorder Buffer (ROB) and precise exceptions

  • ROB holds in-flight instructions in program order with status/result and destination.
  • Commit walks ROB head in order; exceptions/interrupts handled precisely at commit.
  • Speculative state is contained until commit; recovery via ROB tail reset and rename map restore.
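// Conceptual ROB entry structure
class ROBEntry {
  constructor(id, destPRF, ready = false, exception = null) {
    this.id = id;               // entry identifier, allocated in program order
    this.destPRF = destPRF;     // physical destination register
    this.ready = ready;         // set once the result has been written back
    this.exception = exception; // recorded during execution, handled at commit
  }
}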
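The commit behavior described above can be sketched as a circular buffer. This is a simplified model under assumptions: the class `ReorderBuffer` and its method names are hypothetical, entries are plain objects with the same fields as the conceptual entry, and an exception at the head simply flushes everything younger (rename-map restore is not modeled).

```javascript
class ReorderBuffer {
  constructor(capacity) {
    this.entries = new Array(capacity).fill(null);
    this.head = 0;  // oldest in-flight instruction
    this.tail = 0;  // next free slot
    this.count = 0;
  }
  // Allocate at the tail in program order; returns the slot index, or -1 if full.
  allocate(id, destPRF) {
    if (this.count === this.entries.length) return -1;
    const slot = this.tail;
    this.entries[slot] = { id, destPRF, ready: false, exception: null };
    this.tail = (this.tail + 1) % this.entries.length;
    this.count++;
    return slot;
  }
  // Writeback may happen out of order; it only marks the entry.
  complete(slot, exception = null) {
    this.entries[slot].ready = true;
    this.entries[slot].exception = exception;
  }
  // Retire up to `width` ready entries from the head, stopping at the first
  // unready one. An excepting head squashes itself and everything younger.
  commit(width) {
    const retired = [];
    while (retired.length < width && this.count > 0) {
      const e = this.entries[this.head];
      if (!e.ready) break;
      if (e.exception) {
        this.flush();
        return { retired, exception: { id: e.id, cause: e.exception } };
      }
      retired.push(e.id);
      this.entries[this.head] = null;
      this.head = (this.head + 1) % this.entries.length;
      this.count--;
    }
    return { retired, exception: null };
  }
  flush() {
    this.entries.fill(null);
    this.head = this.tail = this.count = 0;
  }
}
```

Because the exception is only acted on when its entry reaches the head, every older instruction has already retired and no younger one has changed architectural state.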

Register renaming and physical register file

  • Rename map: architectural → physical registers; removes WAR/WAW (false) dependencies.
  • Free list supplies physical registers; the previous mapping is reclaimed at commit using the old mapping recorded in the ROB.
  • Unified or split PRFs for int/FP; checkpointing assists fast branch recovery.
// Simplified rename map update on destination write
function renameDest(map, freeList, archReg) {
  if (freeList.length === 0) return null; // no free physical register: rename must stall
  const newPRF = freeList.pop();
  const oldPRF = map[archReg];
  map[archReg] = newPRF;
  return { newPRF, oldPRF }; // oldPRF is freed when this instruction commits
}
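A short worked example may help show how renaming removes WAR and WAW hazards and when an old physical register returns to the free list. The functions `renameInstr` and `commitInstr`, the three-register map, and the register numbers are illustrative assumptions.

```javascript
// Rename one instruction: sources read the current map first, then the
// destination receives a fresh physical register.
function renameInstr(map, freeList, instr) {
  if (freeList.length === 0) return null; // stall: no free physical register
  const srcs = instr.srcs.map((r) => map[r]); // read before updating the map
  const oldPRF = map[instr.dest];
  const newPRF = freeList.shift();
  map[instr.dest] = newPRF;
  return { srcs, dest: newPRF, oldPRF };
}

// At commit, no in-flight instruction can still read the previous mapping,
// so the old physical register goes back on the free list.
function commitInstr(freeList, renamed) {
  freeList.push(renamed.oldPRF);
}

const renameMap = { r1: 0, r2: 1, r3: 2 }; // assumed initial mapping
const freeRegs = [3, 4, 5];
const op1 = renameInstr(renameMap, freeRegs, { dest: "r1", srcs: ["r2", "r3"] });
const op2 = renameInstr(renameMap, freeRegs, { dest: "r3", srcs: ["r1", "r2"] }); // WAR on r3 vs op1
const op3 = renameInstr(renameMap, freeRegs, { dest: "r1", srcs: ["r2", "r2"] }); // WAW on r1 vs op1
commitInstr(freeRegs, op1); // physical register 0 (the old r1) is reclaimed
console.log(op1, op2, op3, freeRegs);
```

After renaming, `op1` and `op3` write different physical registers, and `op2` writes a register that `op1` never reads, so only the true dependency of `op2` on `op1` remains.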

Reservation stations, wakeup-select

  • RS buffers hold operands (or tags) and opcodes near FUs; when operands arrive, entry becomes ready.
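  • Wakeup: a completing op broadcasts its destination tag on the common data bus (CDB), and waiting entries match it against their source tags. Select: choose among ready ops per FU (e.g. oldest-first), trading fairness against throughput.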
  • Scaling challenge: wakeup-select energy and delay rise with window size and width.
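The wakeup and select steps above can be sketched as follows. This is a minimal software model, assuming an oldest-first select policy and a single shared pool of entries; the class `ReservationStation` and the tag names are hypothetical, and real hardware does the tag match with parallel comparators rather than a loop.

```javascript
class ReservationStation {
  constructor() {
    this.entries = [];
  }
  // srcTags lists only the source tags still outstanding; operands that are
  // already available are simply omitted.
  insert(id, age, srcTags) {
    this.entries.push({ id, age, waiting: new Set(srcTags) });
  }
  // Wakeup: a completing op broadcasts its destination tag; every entry
  // clears the matching source.
  wakeup(tag) {
    for (const e of this.entries) e.waiting.delete(tag);
  }
  // Select: pick up to `ports` ready entries, oldest first, and remove them.
  select(ports) {
    const ready = this.entries
      .filter((e) => e.waiting.size === 0)
      .sort((a, b) => a.age - b.age)
      .slice(0, ports);
    const picked = new Set(ready);
    this.entries = this.entries.filter((e) => !picked.has(e));
    return ready.map((e) => e.id);
  }
}
```

Every broadcast touches every entry and every select examines every ready entry, which hints at why the cost of this logic grows with both window size and issue width.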

Memory ordering, LS queues, fences

  • Load/Store queues track memory ops for forwarding, dependency checks, and ordering.
  • Speculative loads may execute early with memory disambiguation; violations trigger replay.
  • Architectural models: total store order (TSO), release consistency (RC), ARM weak ordering; fences enforce visibility and order.
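Store-to-load forwarding and speculative disambiguation can be illustrated with a small model. It assumes exact-address matching on whole words, a store queue given oldest-first, and a conservative replay check; `executeLoad`, `resolveStore`, and the record shapes are hypothetical names chosen for this sketch.

```javascript
// storeQueue: older stores in program order (oldest first).
// Each store is { addr, data }, where addr === null means "not yet computed".
function executeLoad(storeQueue, memory, loadAddr) {
  let speculative = false;
  // Scan youngest to oldest so the most recent matching store wins.
  for (let i = storeQueue.length - 1; i >= 0; i--) {
    const st = storeQueue[i];
    if (st.addr === null) {
      speculative = true; // an older store might still alias this load
      continue;
    }
    if (st.addr === loadAddr) {
      return { value: st.data, source: "forwarded", speculative };
    }
  }
  return { value: memory.get(loadAddr), source: "memory", speculative };
}

// When a store address resolves, conservatively replay every younger load
// that ran speculatively and read the same address.
function resolveStore(store, addr, youngerLoads) {
  store.addr = addr;
  return youngerLoads
    .filter((ld) => ld.speculative && ld.addr === addr)
    .map((ld) => ld.id);
}

const mem = new Map([[0x100, 7]]);
const sq = [{ addr: 0x100, data: 1 }, { addr: 0x200, data: 2 }, { addr: 0x100, data: 3 }];
console.log(executeLoad(sq, mem, 0x100)); // forwarded from the youngest matching store
```

The replay check is deliberately conservative: it may replay a load that happened to forward from a younger store, trading a little performance for a simpler correctness argument.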

IPC limits, critical paths, and bottlenecks

  • Limits: front-end bandwidth, RS/ROB sizes, FU counts, cache/TLB misses, branch accuracy.
  • Critical paths: predictor lookup, rename, wakeup-select, bypass networks, L1 hit latency.
  • Micro-op and macro-op fusion, the uop cache, and clustering reduce pressure on critical structures.
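A rough first-order estimate shows how branch accuracy alone can limit IPC. The sketch below assumes a simple additive CPI model, CPI = 1/W + f_branch × (1 − accuracy) × penalty, which ignores cache misses, window size, and overlap between stalls. The function name `estimateIPC` and all numbers are illustrative assumptions.

```javascript
// First-order model (an assumption, not a measured result):
//   CPI = 1/width + branchFrac * (1 - accuracy) * penalty
function estimateIPC({ width, branchFrac, accuracy, penalty }) {
  const baseCPI = 1 / width; // ideal steady-state cost per instruction
  const branchCPI = branchFrac * (1 - accuracy) * penalty; // misprediction cost
  return 1 / (baseCPI + branchCPI);
}

// 4-wide core, 20% branches, 95% accuracy, 15-cycle misprediction penalty.
const ipcEstimate = estimateIPC({ width: 4, branchFrac: 0.2, accuracy: 0.95, penalty: 15 });
console.log(ipcEstimate.toFixed(2)); // about 2.5 under these assumptions
```

Under these assumed numbers the misprediction term (0.15 cycles per instruction) is already more than half of the ideal term (0.25), so a 4-wide machine sustains only about 2.5 IPC before any other bottleneck is counted.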

Exercises

  1. Design a rename+ROB scheme for 16 architectural regs and 64 physical regs; show commit/free behavior.
  2. Given predictor accuracy and ROB/RS sizes, estimate sustainable IPC for a mix with 20% branches.
  3. Propose a wakeup-select policy for a 6-wide core and discuss fairness vs throughput trade-offs.
OoO execution with a ROB and register renaming extracts ILP while guaranteeing precise exceptions via in-order commit.