,

Contents · Data compression (Huffman, arithmetic coding, LZ family)


Overview

Lossless compression reduces redundancy to represent data with fewer bits. Classic techniques include prefix codes (Huffman), arithmetic coding for near-entropy rates, and dictionary methods (LZ77/LZ78/LZW).


Details

  • Huffman coding: optimal prefix code for known symbol probabilities; average length ≤ H(X) + 1.
  • Arithmetic coding: encodes entire message into an interval; achieves rates close to entropy, supports adaptive models.
  • LZ family: dictionary-based. LZ77 (sliding window), LZ78 (growing dictionary), LZW (variant used in GIF).
  • Modeling: static vs adaptive; context models; trade-off between complexity and compression ratio.
  • Practical formats: DEFLATE (LZ77 + Huffman), PNG, GIF, ZIP; interaction with entropy coding.

Exercises

  1. Build a Huffman code for a small symbol set and compute average code length vs entropy.
  2. Arithmetic-code a short binary sequence with given probabilities; show interval refinement steps.
  3. Demonstrate LZ77 encoding with a fixed window on a short text and reconstruct the original.