
Memory management (allocators, arenas, bump)


Goals and constraints

  • Throughput vs latency, fragmentation (internal/external), memory overhead.
  • Locality (cache/TLB), predictability (RT), and multi-thread scalability.
  • APIs: malloc/free, new/delete, custom pools; alignment guarantees.

Bump/linear allocators

  • Maintain a pointer; allocate by moving the pointer; O(1) time, great locality.
  • No per-object free; reset or pop to checkpoints; ideal for frame/phase alloc.
  • Combine with arenas for lifetime-based reclamation.
ptr = align_up(ptr, A); if (ptr + size > end) fail; out = ptr; ptr += size; // no per-object free: reclaim via reset()
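
The pseudocode above can be fleshed out as follows. This is a minimal sketch, assuming a caller-supplied fixed buffer and power-of-two alignments; the names Bump, bump_alloc, bump_mark, bump_reset_to, and bump_reset are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bump allocator over a caller-supplied buffer. */
typedef struct {
    unsigned char *base; /* start of the backing buffer */
    size_t cap;          /* buffer size in bytes */
    size_t off;          /* offset of the next free byte */
} Bump;

static void bump_init(Bump *b, void *buf, size_t cap) {
    b->base = buf;
    b->cap = cap;
    b->off = 0;
}

/* Returns `size` bytes aligned to `align` (a power of two), or NULL if full. */
static void *bump_alloc(Bump *b, size_t size, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    uintptr_t cur = (uintptr_t)b->base + b->off;
    uintptr_t aligned = (cur + (align - 1)) & ~(uintptr_t)(align - 1);
    size_t start = (size_t)(aligned - (uintptr_t)b->base);
    if (start > b->cap || size > b->cap - start) return NULL;
    b->off = start + size; /* the "bump" */
    return b->base + start;
}

/* Checkpoint: remember the current position. */
static size_t bump_mark(const Bump *b) { return b->off; }

/* Pop back to a checkpoint, releasing everything allocated after it. */
static void bump_reset_to(Bump *b, size_t mark) {
    assert(mark <= b->off);
    b->off = mark;
}

/* Release everything at once, e.g. at the end of a frame. */
static void bump_reset(Bump *b) { b->off = 0; }
```

A frame-style caller would typically call bump_reset once per frame, and use bump_mark with bump_reset_to around temporary scratch work inside the frame.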

Arena/region allocators

  • Group allocations by lifetime; free the whole region at once.
  • Grow with linked chunks; recycle arenas to amortize cost.
  • Great for compilers, games, request-scoped services.
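
One way to picture chunk growth and whole-region release is the sketch below. It assumes chunks come from malloc, every allocation is rounded up to the maximum fundamental alignment, and oversized requests get a dedicated chunk; Arena, Chunk, arena_alloc, arena_reset, and arena_free_all are hypothetical names, not taken from any particular library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical chunk: header followed by maximally aligned payload. */
typedef struct Chunk {
    struct Chunk *next; /* older chunk, or NULL */
    size_t cap, used;   /* payload capacity and bytes handed out */
    max_align_t data[]; /* flexible array member keeps payload aligned */
} Chunk;

typedef struct {
    Chunk *head;       /* newest chunk; allocations come from here */
    size_t chunk_size; /* default payload size for new chunks */
} Arena;

static void *arena_alloc(Arena *a, size_t size) {
    const size_t A = _Alignof(max_align_t);
    if (size > SIZE_MAX - A) return NULL;
    size = (size + A - 1) & ~(A - 1); /* keep every allocation aligned */
    Chunk *c = a->head;
    if (c == NULL || size > c->cap - c->used) {
        /* Grow: link a new chunk in front; oversized requests get their own. */
        size_t cap = size > a->chunk_size ? size : a->chunk_size;
        c = malloc(sizeof(Chunk) + cap);
        if (c == NULL) return NULL;
        c->next = a->head;
        c->cap = cap;
        c->used = 0;
        a->head = c;
    }
    void *p = (unsigned char *)c->data + c->used;
    c->used += size;
    return p;
}

/* Recycle: keep the newest chunk for the next lifetime, free the rest. */
static void arena_reset(Arena *a) {
    if (a->head == NULL) return;
    for (Chunk *c = a->head->next; c != NULL;) {
        Chunk *n = c->next;
        free(c);
        c = n;
    }
    a->head->next = NULL;
    a->head->used = 0;
}

/* Free the whole region at once. */
static void arena_free_all(Arena *a) {
    for (Chunk *c = a->head; c != NULL;) {
        Chunk *n = c->next;
        free(c);
        c = n;
    }
    a->head = NULL;
}
```

A request-scoped service might create one Arena per request, call arena_reset when the response is sent, and call arena_free_all only at shutdown.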

Free-list allocators (first/best-fit), fragmentation

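  • Keep free blocks on one or more lists; first-fit takes the first block that is large enough, best-fit the smallest such block; choose coalescing and splitting policies.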
  • Metadata choices (headers/footers) affect speed and overhead.
  • Mitigate fragmentation with size classes and rebalancing.
alloc(size) → take a fitting free block, split off the remainder if it is too large; on free, coalesce with adjacent free blocks
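
The split-and-coalesce cycle can be sketched with a first-fit allocator over a small static heap. This is an illustrative sketch, not a production design: it assumes a 4096-byte heap, a header in front of every block, and a linear sweep on free instead of footers or an explicit free list. HEAP_BYTES, Header, heap_init, ff_alloc, and ff_free are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HEAP_BYTES 4096 /* assumed heap size for the sketch */

typedef struct Header {
    _Alignas(max_align_t) size_t size; /* payload bytes after this header */
    int is_free;
} Header;

/* Backing store declared as Header[] so every payload is maximally aligned. */
static Header heap_store[HEAP_BYTES / sizeof(Header)];

/* Blocks are laid out back to back; the next header follows the payload. */
static Header *next_block(Header *b) {
    unsigned char *end = (unsigned char *)heap_store + sizeof heap_store;
    unsigned char *p = (unsigned char *)(b + 1) + b->size;
    return p < end ? (Header *)p : NULL;
}

static void heap_init(void) {
    heap_store[0].size = sizeof heap_store - sizeof(Header);
    heap_store[0].is_free = 1;
}

static void *ff_alloc(size_t n) {
    if (n == 0 || n > sizeof heap_store) return NULL;
    n = (n + sizeof(Header) - 1) / sizeof(Header) * sizeof(Header);
    for (Header *b = heap_store; b != NULL; b = next_block(b)) {
        if (!b->is_free || b->size < n) continue; /* first fit */
        if (b->size >= n + 2 * sizeof(Header)) {
            /* Split: the remainder becomes a new free block. */
            Header *rest = (Header *)((unsigned char *)(b + 1) + n);
            rest->size = b->size - n - sizeof(Header);
            rest->is_free = 1;
            b->size = n;
        }
        b->is_free = 0;
        return b + 1;
    }
    return NULL;
}

static void ff_free(void *p) {
    if (p == NULL) return;
    ((Header *)p - 1)->is_free = 1;
    /* Coalesce: one sweep that merges every run of adjacent free blocks. */
    for (Header *b = heap_store; b != NULL;) {
        Header *nx = next_block(b);
        if (b->is_free && nx != NULL && nx->is_free)
            b->size += sizeof(Header) + nx->size; /* absorb nx, retry b */
        else
            b = nx;
    }
}
```

The O(n) sweep keeps the sketch short; storing a footer in each block is the usual way to find the previous neighbor in constant time, at the cost of extra per-block metadata.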

Slab, segregated-fit, buddy systems

  • Slab: pre-initialized objects, per-CPU caches; predictable and fast.
  • Segregated-fit: size-class bins; tcmalloc/jemalloc-style span management.
  • Buddy: power-of-two blocks, fast coalescing; internal fragmentation trade-offs.
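
The arithmetic behind buddy systems is compact enough to show directly. The sketch below assumes a 16-byte minimum block and offsets measured from the start of the heap; MIN_BLOCK, order_for, block_size, buddy_of, merged_offset, and wasted_bytes are hypothetical helpers. The same rounding step doubles as a power-of-two size-class lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MIN_BLOCK 16u /* assumed smallest block size, a power of two */

/* Size of a block of the given order: MIN_BLOCK * 2^order. */
static size_t block_size(unsigned order) { return (size_t)MIN_BLOCK << order; }

/* Smallest order whose block holds n bytes (the size class of n). */
static unsigned order_for(size_t n) {
    assert(n <= SIZE_MAX / 2);
    unsigned k = 0;
    size_t block = MIN_BLOCK;
    while (block < n) {
        block <<= 1;
        k++;
    }
    return k;
}

/* A block and its buddy differ only in the bit equal to the block size,
   so the buddy's offset is found with a single XOR. */
static size_t buddy_of(size_t off, unsigned order) {
    assert(off % block_size(order) == 0);
    return off ^ block_size(order);
}

/* Offset of the merged parent block after coalescing with the buddy. */
static size_t merged_offset(size_t off, unsigned order) {
    return off & ~block_size(order);
}

/* Internal fragmentation: bytes lost to rounding up to a power of two. */
static size_t wasted_bytes(size_t n) { return block_size(order_for(n)) - n; }
```

For example, a 65-byte request lands in a 128-byte block and wastes 63 bytes, which is the internal-fragmentation trade-off noted above.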

Thread-local and lock-free designs

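  • Per-thread caches keep the common path free of locks and reduce contention; surplus blocks are rebalanced to global pools, often in the background.
  • Hazard pointers or epoch-based reclamation defer a free until no thread can still hold a reference, which makes concurrent frees safe.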
  • NUMA-aware placement and large pages for performance.
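
A per-thread cache in front of a shared pool might look like the sketch below. It assumes a single 64-byte size class, C11 _Thread_local storage, and a simple spinlock built on atomic_flag where a real allocator would more likely use a mutex or a lock-free list; BLOCK_SIZE, CACHE_MAX, BATCH, tc_alloc, and tc_free are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

#define BLOCK_SIZE 64 /* assumed single size class */
#define CACHE_MAX 32  /* spill to the global pool above this many blocks */
#define BATCH 16      /* blocks moved per refill or spill */

typedef struct Node { struct Node *next; } Node;

/* Shared pool, protected by a spinlock. */
static Node *global_head;
static atomic_flag global_lock = ATOMIC_FLAG_INIT;

/* Per-thread cache: touched without any synchronization. */
static _Thread_local Node *cache_head;
static _Thread_local size_t cache_len;

static void pool_lock(void) {
    while (atomic_flag_test_and_set_explicit(&global_lock, memory_order_acquire)) {
    }
}
static void pool_unlock(void) {
    atomic_flag_clear_explicit(&global_lock, memory_order_release);
}

static void *tc_alloc(void) {
    if (cache_head == NULL) {
        /* Slow path: refill up to BATCH blocks under the lock. */
        pool_lock();
        while (global_head != NULL && cache_len < BATCH) {
            Node *n = global_head;
            global_head = n->next;
            n->next = cache_head;
            cache_head = n;
            cache_len++;
        }
        pool_unlock();
        if (cache_head == NULL) return malloc(BLOCK_SIZE); /* pool is empty */
    }
    Node *n = cache_head; /* fast path: pop from the thread's own list */
    cache_head = n->next;
    cache_len--;
    return n;
}

static void tc_free(void *p) {
    Node *n = p;
    n->next = cache_head; /* fast path: push onto the thread's own list */
    cache_head = n;
    cache_len++;
    if (cache_len > CACHE_MAX) {
        /* Rebalance: return a batch to the global pool. */
        pool_lock();
        for (int i = 0; i < BATCH; i++) {
            Node *m = cache_head;
            cache_head = m->next;
            cache_len--;
            m->next = global_head;
            global_head = m;
        }
        pool_unlock();
    }
}
```

Moving blocks in batches amortizes the lock over many operations; the batch size and cache limit here are arbitrary and would need tuning against a real workload.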

Interaction with the OS (virtual memory, pages)

  • Map/unmap via syscalls (mmap/VirtualAlloc); manage page alignment and guard pages.
  • Back arenas with page-granularity chunks; return memory to the OS lazily.
  • Handle commit/decommit and overcommit behaviors.
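
On POSIX systems, reserving address space and then committing only the usable part can be sketched with mmap and mprotect. The example assumes a platform that provides MAP_ANONYMOUS (Linux, the BSDs, macOS); on Windows the analogous steps go through VirtualAlloc and are not shown. guarded_map and guarded_unmap are hypothetical helpers.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc in strict modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Maps `bytes` (rounded up to whole pages) with one inaccessible guard
   page on each side. Returns the usable region, or NULL on failure.
   *mapped_out receives the total mapping size needed for unmapping. */
static void *guarded_map(size_t bytes, size_t *mapped_out) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t usable = (bytes + page - 1) / page * page;
    size_t total = usable + 2 * page;

    /* Reserve the whole range with no access rights. */
    unsigned char *base =
        mmap(NULL, total, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) return NULL;

    /* Make only the middle readable and writable; the first and last
       pages stay PROT_NONE, so overruns and underruns fault at once. */
    if (mprotect(base + page, usable, PROT_READ | PROT_WRITE) != 0) {
        munmap(base, total);
        return NULL;
    }
    *mapped_out = total;
    return base + page;
}

static void guarded_unmap(void *p, size_t mapped) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    munmap((unsigned char *)p - page, mapped);
}
```

Guard pages cost address space rather than physical memory, so they are cheap on 64-bit systems; how untouched pages are accounted for depends on the OS overcommit policy.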

Tuning, debugging, and profiling

  • Add guards, red zones, and poisoning to catch overruns and use-after-free.
  • Track allocation sites; sample to bound overhead; integrate with profilers.
  • Measure locality (cachegrind, PMU) and fragmentation over time.
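
Red zones and poisoning can be illustrated with a thin wrapper around malloc. This is a sketch under simple assumptions: the red-zone width equals the header size, the fill bytes 0xFD and 0xDD are arbitrary, and checks run only when the caller asks. DbgHeader, dbg_malloc, dbg_check, and dbg_free are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    _Alignas(max_align_t) size_t size; /* user-visible payload size */
} DbgHeader;

#define REDZONE sizeof(DbgHeader) /* keeps the payload maximally aligned */
#define RED_BYTE 0xFD             /* arbitrary red-zone fill pattern */
#define FREE_BYTE 0xDD            /* arbitrary poison for freed payloads */

/* Layout: [header][front red zone][payload][back red zone] */
static void *dbg_malloc(size_t n) {
    if (n > SIZE_MAX - sizeof(DbgHeader) - 2 * REDZONE) return NULL;
    unsigned char *raw = malloc(sizeof(DbgHeader) + 2 * REDZONE + n);
    if (raw == NULL) return NULL;
    ((DbgHeader *)raw)->size = n;
    unsigned char *user = raw + sizeof(DbgHeader) + REDZONE;
    memset(user - REDZONE, RED_BYTE, REDZONE);
    memset(user + n, RED_BYTE, REDZONE);
    return user;
}

/* Returns 1 if both red zones are intact, 0 if either was overwritten. */
static int dbg_check(const void *p) {
    const unsigned char *user = p;
    const DbgHeader *h = (const DbgHeader *)(user - REDZONE - sizeof(DbgHeader));
    const unsigned char *front = user - REDZONE;
    const unsigned char *back = user + h->size;
    for (size_t i = 0; i < REDZONE; i++)
        if (front[i] != RED_BYTE || back[i] != RED_BYTE) return 0;
    return 1;
}

/* Checks the red zones, poisons the payload, then frees.
   Returns the result of the check so callers can report corruption. */
static int dbg_free(void *p) {
    if (p == NULL) return 1;
    int ok = dbg_check(p);
    unsigned char *user = p;
    DbgHeader *h = (DbgHeader *)(user - REDZONE - sizeof(DbgHeader));
    memset(user, FREE_BYTE, h->size);
    free(h);
    return ok;
}
```

Poisoning only helps catch use-after-free if the block is not reused immediately, so practical tools usually hold freed blocks in a quarantine for a while before returning them to the underlying allocator.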

Exercises

  1. Implement a bump allocator and benchmark against malloc for frame-style workloads.
  2. Build a simple arena with chunk growth and reset; measure fragmentation.
  3. Implement segregated-fit bins with per-thread caches; evaluate scalability.

Choosing the right allocator simplifies code and accelerates systems: match lifetime patterns to allocator design.