- Why SIMD? Data-level parallelism
- ISAs overview: SSE, AVX/AVX2/AVX-512, NEON/SVE
- Execution model: lanes, masks, widening/narrowing
- Loads/stores: alignment, gather/scatter
- Core ops: arithmetic, shuffle, permute, horizontal
- Masking and predication
- Throughput, latency, and memory bandwidth
- Intrinsics and auto-vectorization
- Portability and runtime feature detection
- Exercises