,

Contents · Edit Distance and Alignment


Overview and Applications

  • Levenshtein distance: minimum edits to turn string A into B using insert, delete, substitute.
  • Alignments: show how the edits map characters; used in diff tools, spell-checkers, bioinformatics.
  • Needleman–Wunsch: global alignment maximizing a score with match/mismatch/gap penalties.

Levenshtein Distance

// O(n*m) time, O(n*m) space Levenshtein distance
function levenshtein(a, b) {
  const n = a.length, m = b.length;
  const dp = Array.from({length: n+1}, () => Array(m+1).fill(0));
  for (let i = 0; i <= n; i++) dp[i][0] = i;
  for (let j = 0; j <= m; j++) dp[0][j] = j;
  for (let i = 1; i <= n; i++) {
    for (let j = 1; j <= m; j++) {
      const cost = a[i-1] === b[j-1] ? 0 : 1;
      dp[i][j] = Math.min(
        dp[i-1][j] + 1,      // delete
        dp[i][j-1] + 1,      // insert
        dp[i-1][j-1] + cost  // substitute or match
      );
    }
  }
  return dp[n][m];
}
  • Complexity: O(nm). Can be reduced to O(min(n,m)) space for distance only.
  • Variants: Damerau–Levenshtein adds transposition; different operation costs.

Recovering an Alignment

function levenshteinAlignment(a, b) {
  const n = a.length, m = b.length;
  const dp = Array.from({length: n+1}, () => Array(m+1).fill(0));
  for (let i = 0; i <= n; i++) dp[i][0] = i;
  for (let j = 0; j <= m; j++) dp[0][j] = j;
  for (let i = 1; i <= n; i++) {
    for (let j = 1; j <= m; j++) {
      const cost = a[i-1] === b[j-1] ? 0 : 1;
      dp[i][j] = Math.min(
        dp[i-1][j] + 1,
        dp[i][j-1] + 1,
        dp[i-1][j-1] + cost
      );
    }
  }
  // Backtrack to build alignment strings
  let i = n, j = m; let A = '', B = '';
  while (i > 0 || j > 0) {
    if (i > 0 && dp[i][j] === dp[i-1][j] + 1) { A = a[i-1] + A; B = '-' + B; i--; }
    else if (j > 0 && dp[i][j] === dp[i][j-1] + 1) { A = '-' + A; B = b[j-1] + B; j--; }
    else { A = a[i-1] + A; B = b[j-1] + B; i--; j--; }
  }
  return { distance: dp[n][m], A, B };
}

Needleman–Wunsch (Global Alignment)

// Global alignment with affine-like simple gap (linear gap penalty). For affine gaps, use 3 matrices.
function needlemanWunsch(a, b, match = 1, mismatch = -1, gap = -1) {
  const n = a.length, m = b.length;
  const dp = Array.from({length: n+1}, () => Array(m+1).fill(0));
  for (let i = 1; i <= n; i++) dp[i][0] = dp[i-1][0] + gap;
  for (let j = 1; j <= m; j++) dp[0][j] = dp[0][j-1] + gap;
  for (let i = 1; i <= n; i++) {
    for (let j = 1; j <= m; j++) {
      const scoreDiag = dp[i-1][j-1] + (a[i-1] === b[j-1] ? match : mismatch);
      const scoreUp = dp[i-1][j] + gap;   // delete a[i-1]
      const scoreLeft = dp[i][j-1] + gap; // insert b[j-1]
      dp[i][j] = Math.max(scoreDiag, scoreUp, scoreLeft);
    }
  }
  // Backtrack
  let i = n, j = m; let A = '', B = '';
  while (i > 0 || j > 0) {
    if (i > 0 && j > 0 && dp[i][j] === dp[i-1][j-1] + (a[i-1] === b[j-1] ? match : mismatch)) {
      A = a[i-1] + A; B = b[j-1] + B; i--; j--; 
    } else if (i > 0 && dp[i][j] === dp[i-1][j] + gap) {
      A = a[i-1] + A; B = '-' + B; i--; 
    } else {
      A = '-' + A; B = b[j-1] + B; j--; 
    }
  }
  return { score: dp[n][m], A, B };
}
  • Affine gaps: use three matrices (match, gap in A, gap in B) with open/extend penalties.
  • Smith–Waterman: local alignment variant (max with 0, track best cell).

Space and Speed Optimizations

  • Space O(min(n,m)): keep only two rows when only distance/score is needed.
  • Myers' bitset: fast edit distance for small alphabets and m ≤ word size.
  • Banded DP / Ukkonen: if distance ≤ k, limit computation to a diagonal band.

Exercises

  1. Implement Levenshtein with O(min(n,m)) space and verify distance equality.
  2. Extend Needleman–Wunsch to affine gaps with separate open/extend penalties.
  3. Implement banded DP that early-terminates when cost exceeds k.
Choosing costs matters: set gap/mismatch to reflect your domain (text vs. DNA vs. protein).