,

Contents · Randomized Algorithms (Reservoir Sampling, Hash-based)


Overview and Models

  • Las Vegas: always correct, runtime random (e.g., randomized quicksort pivoting).
  • Monte Carlo: bounded runtime, small error probability (e.g., Bloom filters, primality tests).
  • Tools: pairwise-independent hash families, Chernoff bounds, linearity of expectation.

Reservoir Sampling

// Sample k items uniformly at random from a stream of unknown length
function reservoirSample(streamIterable, k) {
  const R = [];
  let i = 0;
  for (const x of streamIterable) {
    i++;
    if (R.length < k) R.push(x);
    else {
      const j = Math.floor(Math.random() * i); // in [0, i-1]
      if (j < k) R[j] = x;
    }
  }
  return R;
}

// Correctness sketch: For any item t at position i, Pr[it remains] = (k/i) * Π_{j=i+1..n} (1 - k/j * 1/k) = k/n
  • Space: O(k). Time: O(n) single pass.
  • Weighted sampling: use Efraimidis–Spirakis: key = u^{1/w} and take top-k by key.

Hash-based Techniques

// Simple universal hashing family over 32-bit integers using 53-bit JS numbers
// h_{a,b}(x) = ((a*x + b) mod P) mod m, with P large prime, a in [1..P-1], b in [0..P-1]
function makeUniversalHash(m, seedA, seedB, P = 4294967311n) { // 2^32 prime near
  const a = BigInt(seedA)||1n, b = BigInt(seedB)||0n, bigM = BigInt(m);
  return (x) => Number((a * BigInt(x >>> 0) + b) % P % bigM);
}
  • Separate chaining/linear probing: expected O(1) ops with good load α and universal hashing.
  • Two-choice hashing: choose min-load of two bins to reduce max load to O(log log n).

Bloom Filter

// Bloom filter with k hash functions over m bits
class Bloom {
  constructor(m, k, seeds) { this.m=m; this.k=k; this.bits=new Uint8Array(m); this.h=[]; for(let i=0;iint hash (FNV-1a like)
  const s = String(x); let h=2166136261>>>0; for(let i=0;i>>0;
}
  • FP rate: ≈ (1 - e^{-kn/m})^k. Optimal k ≈ (m/n) ln 2.
  • Notes: no false negatives; deletions require counting Bloom filters.

Analysis Patterns

  • Linearity of expectation: E[ΣX_i] = ΣE[X_i] even with dependence.
  • Indicator variables: turn events into 0/1 variables to simplify expectation.
  • Concentration: Chernoff/Hoeffding bounds control tail probabilities.

Exercises

  1. Implement reservoir sampling for k=1 and k>1; validate uniformity empirically.
  2. Build a Bloom filter and measure empirical false positive rates vs. theory.
  3. Implement two-choice hashing and compare max bin load vs. single-choice hashing.
Randomization buys simplicity and speed. Always quantify error and memory trade-offs.