
Speculative execution and side channels


Why speculate? Performance vs. correctness

  • Speculation executes likely paths early to hide control/memory latency, then commits in order.
  • Architectural state remains precise; microarchitectural state (caches, predictors) can be changed and observed.
  • Side channels exploit timing/contention differences induced by speculation.

Mechanics: branch, memory, and value speculation

  • Branch prediction, load speculation (memory disambiguation), and value prediction (largely a research topic) drive early execution.
  • A misprediction or mis-speculation flushes the younger instructions; because they never commit, architectural state is restored.
  • However, caches, TLBs, the BTB/RSB, and execution-port usage keep microarchitectural footprints that are not rolled back.
Front-end: Predict → Fetch → Decode → Rename
Back-end:  Dispatch → Issue → Execute → Writeback → Commit (precise)
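
A toy model makes branch prediction concrete. This is a minimal sketch that assumes a single 2-bit saturating counter with no branch history; real predictors use much larger, history-indexed tables. The names `predictor2b` and `count_mispredicts` are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy 2-bit saturating-counter predictor (one counter, no history).
   States 0-1 predict not-taken; states 2-3 predict taken. */
typedef struct { unsigned state; } predictor2b;

static bool predict(const predictor2b *p) { return p->state >= 2; }

static void update(predictor2b *p, bool taken) {
  if (taken && p->state < 3) p->state++;        /* strengthen toward taken */
  else if (!taken && p->state > 0) p->state--;  /* strengthen toward not-taken */
}

/* Count mispredictions over a sequence of actual branch outcomes. */
size_t count_mispredicts(const bool *outcomes, size_t n) {
  predictor2b p = { .state = 2 };  /* start in "weakly taken" */
  size_t misses = 0;
  for (size_t i = 0; i < n; i++) {
    if (predict(&p) != outcomes[i]) misses++;
    update(&p, outcomes[i]);
  }
  return misses;
}
```

Consider a loop back-edge that is taken nine times and then not taken, repeated three times. The counter mispredicts only the three loop exits. A strictly alternating pattern defeats it and mispredicts half of the branches.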

Speculation window and microarchitectural state

  • The window is bounded by ROB/RS capacity and by how long it takes to reach a resolution point (e.g., branch execution, load address calculation).
  • Transient execution can touch many lines/sets in L1/L2 and many BTB entries.
  • Attackers find or craft gadgets, then steer speculation into them so the victim makes secret-dependent accesses.
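
As a rough worked example, the window can be estimated as the smaller of two numbers: the ROB capacity, and the number of uops the front end delivers before the branch resolves. This is a simplification under stated assumptions. It ignores RS/LSQ limits and front-end stalls, and the core parameters below are hypothetical.

```c
/* Simplified upper bound on transient uops past an unresolved branch:
   the front end supplies at most `width` uops per cycle, and the ROB
   caps how many can be in flight. Ignores RS/LSQ limits and stalls. */
unsigned spec_window(unsigned rob_entries, unsigned width,
                     unsigned resolve_cycles) {
  unsigned long delivered = (unsigned long)width * resolve_cycles;
  return delivered < rob_entries ? (unsigned)delivered : rob_entries;
}

/* Hypothetical 4-wide core with a 224-entry ROB:
   spec_window(224, 4, 5)   -> check resolves from L1 (~5 cycles, assumed)
   spec_window(224, 4, 300) -> check waits on DRAM (~300 cycles, assumed) */
```

With these assumed numbers, a check that resolves from L1 allows at most 20 transient uops. A check that waits on DRAM is capped by the 224-entry ROB. This is why Spectre proofs of concept typically evict the bounds variable from the cache before triggering the gadget.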

Spectre-class attacks (v1/v2/v4, BTB/RSB, STL)

  • v1: Bounds check bypass; mistrain the branch predictor so the victim speculatively reads out of bounds and encodes the value via the cache.
  • v2: Branch target injection (BTB poisoning); mitigated with retpoline or IBPB/IBRS.
  • v4: Speculative store bypass (STL); a load executes ahead of an older store whose address is not yet known and reads the stale value.
  • RSB underflow and return misprediction enable cross-privilege gadgets.
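
The v1 bullet refers to victim code shaped like the sketch below, adapted from the example in the original Spectre paper. The array names, sizes, and `LINE_STRIDE` are illustrative. The code is architecturally correct: it leaks only when a mistrained branch lets the body run transiently with an out-of-bounds `x`.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative Spectre v1 victim pattern (names and sizes are hypothetical). */
#define LINE_STRIDE 4096  /* one page per byte value, so neighbors are unlikely to be prefetched */

size_t array1_size = 16;
uint8_t array1[16] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16};
uint8_t array2[256 * LINE_STRIDE];  /* probe array: one slot per possible byte */

uint8_t victim_function(size_t x) {
  if (x < array1_size)                       /* the branch an attacker mistrains */
    return array2[array1[x] * LINE_STRIDE];  /* load address depends on array1[x] */
  return 0;
}
```

The attack proceeds in steps:

- The attacker calls the function with in-bounds `x` to train the branch.
- It then flushes `array2` and `array1_size` from the cache.
- It passes an out-of-bounds `x` and times a reload of each `array2` page; the fast page reveals the byte.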
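// v1-style index masking to prevent out-of-bounds disclosure
#include <stddef.h>
#include <stdint.h>

// Requires len > 0: an out-of-bounds idx is clamped to 0, so arr[0] is returned.
uint8_t safe_read(const uint8_t *arr, size_t len, size_t idx) {
  size_t mask = (size_t)0 - (size_t)(idx < len);  // all-ones if in-bounds, else 0
  idx &= mask;                                    // idx becomes 0 if out of bounds
  return arr[idx];                                // note: C cannot force branch-free codegen
}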

Meltdown-type faults (Foreshadow/L1TF, MDS)

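  • Meltdown: a faulting load (e.g., a user-mode read of kernel memory) transiently forwards its data to dependent instructions before the fault is raised at retirement; the data is then encoded via the cache.
  • Foreshadow/L1TF: a load through a non-present page-table entry transiently reads L1D contents, exposing SGX enclave or VM secrets; mitigated by flushing L1D on domain switches.
  • MDS: microarchitectural data sampling of stale data from internal buffers (fill buffers, store buffers, load ports), including across hyperthreads.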
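
On Linux, the kernel reports its assessment of these issues in sysfs. The sketch below reads the `meltdown`, `l1tf`, and `mds` entries; the helper names `read_vuln_status` and `print_vuln_statuses` are hypothetical. The files are absent on older kernels and on non-Linux systems, and the helper reports that case as -1.

```c
#include <stdio.h>
#include <string.h>

/* Read one Linux sysfs vulnerability report (e.g. "meltdown") into buf.
   Returns 0 on success, -1 if the file is missing or unreadable. */
int read_vuln_status(const char *name, char *buf, size_t len) {
  char path[128];
  snprintf(path, sizeof path, "/sys/devices/system/cpu/vulnerabilities/%s", name);
  FILE *f = fopen(path, "r");
  if (!f) return -1;
  if (!fgets(buf, (int)len, f)) { fclose(f); return -1; }
  fclose(f);
  buf[strcspn(buf, "\n")] = '\0';  /* strip the trailing newline */
  return 0;
}

/* Print the status of the Meltdown-type issues listed above. */
void print_vuln_statuses(void) {
  const char *names[] = {"meltdown", "l1tf", "mds"};
  char status[256];
  for (size_t i = 0; i < sizeof names / sizeof names[0]; i++) {
    if (read_vuln_status(names[i], status, sizeof status) == 0)
      printf("%-8s %s\n", names[i], status);
    else
      printf("%-8s (not reported)\n", names[i]);
  }
}
```

Typical contents are `Not affected`, `Vulnerable`, or a line beginning with `Mitigation:`.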

Leakage vectors: caches, TLBs, predictors, ports

  • Caches: Flush+Reload, Prime+Probe, Evict+Time; shared, inclusive LLCs let these attacks work across cores.
  • Predictors/BTB/RSB: aliasing enables mistraining across contexts.
  • Ports/buffers: contention timing and sampling (MDS, SMoTherSpectre).
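
Flush+Reload's encode and decode steps can be modeled without real timing. In this toy model, presence bits stand in for the cache. The `HIT_CYCLES`/`MISS_CYCLES` constants are made-up placeholders, not measurements. A real probe uses CLFLUSH and a cycle counter instead, and all names here are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Toy Flush+Reload model: presence bits instead of a real cache.
   Latencies are placeholder values, not measured cycle counts. */
#define NLINES 256
#define HIT_CYCLES 40
#define MISS_CYCLES 200
#define THRESHOLD 100

typedef struct { bool cached[NLINES]; } toy_cache;

static void flush_all(toy_cache *c) { memset(c->cached, 0, sizeof c->cached); }

/* Victim (transiently) touches the probe line selected by a secret byte. */
static void victim_access(toy_cache *c, uint8_t secret) { c->cached[secret] = true; }

/* Timed reload: fast if the line was cached; the access then caches it. */
static unsigned reload(toy_cache *c, unsigned line) {
  unsigned t = c->cached[line] ? HIT_CYCLES : MISS_CYCLES;
  c->cached[line] = true;
  return t;
}

/* Flush, let the victim run, then reload every line; return the first fast one. */
int recover_byte(uint8_t secret) {
  toy_cache c;
  flush_all(&c);
  victim_access(&c, secret);
  int found = -1;
  for (unsigned line = 0; line < NLINES; line++)
    if (reload(&c, line) < THRESHOLD && found < 0) found = (int)line;
  return found;
}
```

`recover_byte(42)` returns 42, because only the line the victim touched reloads below the threshold.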

Mitigations: fences, retpoline, masking, serialization

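  • Fences and serialization: LFENCE on x86 (CSDB or SB on Arm) blocks speculation past a check; SSBD or Arm's SSBB barrier addresses store bypass. SFENCE only orders stores and does not stop speculative loads.
  • Retpoline for indirect branch isolation; hardware controls IBRS/IBPB/STIBP.
  • Index masking and constant-time patterns; site isolation and SameSite cookie policies in browsers.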
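; x86-64 (NASM syntax): fence after a bounds check
; assumes rdi = idx, rsi = len, rcx = array base; result in eax
cmp rdi, rsi                 ; idx vs len (unsigned)
jae .oob                     ; out of bounds -> return 0
lfence                       ; do not issue the load until the check resolves
movzx eax, byte [rcx+rdi]    ; zero-extend so eax matches the .oob path
jmp .done
.oob:
xor eax, eax
.done: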
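
For the index-masking bullet, the comparison in `safe_read` relies on the compiler keeping `idx < len` branch-free. The sketch below instead derives the mask arithmetically, modeled on the generic `array_index_mask_nospec()` helper in the Linux kernel. The names `index_mask_nospec` and `load_nospec` are hypothetical. The sketch assumes `0 < size <= LONG_MAX` and an arithmetic right shift on negative `long` values, which GCC and Clang provide.

```c
#include <limits.h>

/* Branch-free mask: ~0UL when index < size, otherwise 0.
   If index < size, neither index nor (size - 1 - index) has the top bit set,
   so the inverted OR is negative and shifts to all ones; otherwise one of
   them has the top bit set and the result shifts to 0. */
unsigned long index_mask_nospec(unsigned long index, unsigned long size) {
  return (unsigned long)(~(long)(index | (size - 1UL - index)) >>
                         (sizeof(long) * CHAR_BIT - 1));
}

unsigned char load_nospec(const unsigned char *arr, unsigned long len,
                          unsigned long idx) {
  if (idx >= len) return 0;            /* architectural bounds check */
  idx &= index_mask_nospec(idx, len);  /* clamps idx to 0 if the check is mispredicted */
  return arr[idx];
}
```

The kernel version also passes `index` through an optimizer barrier so the compiler cannot prove the mask is all-ones and drop it. This plain C sketch has no such guarantee.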

OS/compiler hardening and ABI surfaces

  • Kernels: KPTI, L1D flush on context switch, scheduler STIBP control, retpoline builds.
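  • Compilers: retpoline thunks (GCC -mindirect-branch=thunk, Clang -mretpoline), speculation barriers, hardened memcpy/memset.
  • ABIs: prctl/sysctl toggles for store bypass and indirect-branch speculation (e.g., Linux PR_SET_SPECULATION_CTRL); per-process mitigation policies.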
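
To see what these ABI toggles look like from user space, the sketch below queries the calling task's store-bypass state. It uses Linux `prctl(PR_GET_SPECULATION_CTRL, PR_SPEC_STORE_BYPASS, ...)`. It assumes Linux headers recent enough to define these constants (the interface was added in Linux 4.17). The returned strings are descriptive labels, not kernel output, and `ssb_status` is a hypothetical name.

```c
#include <errno.h>
#include <sys/prctl.h>

/* Describe this task's Speculative Store Bypass (v4) state. */
const char *ssb_status(void) {
#ifdef PR_GET_SPECULATION_CTRL
  int r = prctl(PR_GET_SPECULATION_CTRL, (unsigned long)PR_SPEC_STORE_BYPASS,
                0UL, 0UL, 0UL);
  if (r < 0)  /* EINVAL: option unknown to this kernel; ENODEV: not supported */
    return (errno == EINVAL || errno == ENODEV) ? "unsupported" : "error";
  if (r == PR_SPEC_NOT_AFFECTED) return "not affected";
  if (r & PR_SPEC_FORCE_DISABLE) return "speculation force-disabled (mitigated)";
  if (r & PR_SPEC_DISABLE) return "speculation disabled (mitigated)";
  if (r & PR_SPEC_ENABLE) return "speculation enabled (not mitigated)";
  return "other state";
#else
  return "unsupported (headers lack PR_GET_SPECULATION_CTRL)";
#endif
}
```

A process can opt into the mitigation with the matching `PR_SET_SPECULATION_CTRL` call if the kernel allows per-task control.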

Performance impact and tuning

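  • Mitigations can reduce IPC and increase latency; measure on your own workload.
  • Prefer targeted hardening of code paths where attackers control indices or branch targets over blanket fences.
  • Use ISA-specific features when available (e.g., Arm's CSV2/CSV3 ID-register fields, which indicate hardware protection against v2-style and Meltdown-style leaks).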
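
A minimal harness for measuring the cost on your own machine is sketched below. It times a bounds-checked gather with and without a barrier after the check, and it assumes POSIX `clock_gettime`. It uses `_mm_lfence()` only on x86-64 and compiles the barrier to nothing elsewhere, so numbers from other architectures show no fence cost. All function names are hypothetical.

```c
#define _POSIX_C_SOURCE 199309L
#include <stddef.h>
#include <stdint.h>
#include <time.h>
#if defined(__x86_64__)
#include <emmintrin.h>
#define SPEC_BARRIER() _mm_lfence()
#else
#define SPEC_BARRIER() ((void)0)  /* placeholder: no barrier on this target */
#endif

/* Baseline: bounds-checked gather, no speculation barrier. */
uint64_t sum_checked(const uint8_t *t, size_t len, const size_t *idx, size_t n) {
  uint64_t s = 0;
  for (size_t i = 0; i < n; i++)
    if (idx[i] < len) s += t[idx[i]];
  return s;
}

/* Same gather with a barrier between the check and the dependent load. */
uint64_t sum_fenced(const uint8_t *t, size_t len, const size_t *idx, size_t n) {
  uint64_t s = 0;
  for (size_t i = 0; i < n; i++)
    if (idx[i] < len) {
      SPEC_BARRIER();  /* block the load until the bounds check resolves */
      s += t[idx[i]];
    }
  return s;
}

typedef uint64_t (*sum_fn)(const uint8_t *, size_t, const size_t *, size_t);

/* Average nanoseconds per element over `reps` runs. *out receives the summed
   results so the compiler cannot discard the work. */
double ns_per_elem(sum_fn fn, const uint8_t *t, size_t len, const size_t *idx,
                   size_t n, int reps, uint64_t *out) {
  struct timespec a, b;
  uint64_t s = 0;
  clock_gettime(CLOCK_MONOTONIC, &a);
  for (int r = 0; r < reps; r++) s += fn(t, len, idx, n);
  clock_gettime(CLOCK_MONOTONIC, &b);
  *out = s;
  double ns = (double)(b.tv_sec - a.tv_sec) * 1e9 + (double)(b.tv_nsec - a.tv_nsec);
  return ns / ((double)n * reps);
}
```

Comparing `ns_per_elem(sum_checked, ...)` with `ns_per_elem(sum_fenced, ...)` on index streams drawn from your real workload estimates the per-load overhead. Results vary with the CPU, compiler flags, and the kernel mitigations in effect.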

Exercises

  1. Implement a Flush+Reload probe and measure cache hit/miss timing on your platform.
  2. Harden a bounds-checked lookup using masking and LFENCE; benchmark the overhead.
  3. List which mitigations (IBRS/IBPB/SSBD) your kernel enables (on Linux, see /sys/devices/system/cpu/vulnerabilities/) and test the toggles.

Speculation boosts ILP, but its microarchitectural side effects must be contained with careful software and system-level hardening.