Information theory (entropy, mutual information)


Overview

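Information theory quantifies uncertainty and information. Entropy measures the average surprise of a random variable's outcomes. Mutual information measures how much information two variables share, that is, how much observing one reduces uncertainty about the other. These quantities underpin data compression, channel coding, machine learning, and communications.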
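"Average surprise" can be made concrete. The surprise of an outcome x is −log₂ p(x) bits, and entropy is the expected value of that surprise. The following Python sketch assumes base-2 logarithms. The function names `surprise` and `entropy` are illustrative and do not come from any library.

```python
import math

def surprise(p: float) -> float:
    """Self-information -log2 p(x), in bits, of an outcome with probability p."""
    return -math.log2(p)

def entropy(probs: list[float]) -> float:
    """Shannon entropy in bits: the probability-weighted average surprise.

    Outcomes with p = 0 are skipped (the convention 0 log 0 = 0).
    """
    return sum(p * surprise(p) for p in probs if p > 0)

fair_coin = [0.5, 0.5]
biased_coin = [0.9, 0.1]
print(f"fair coin:   {entropy(fair_coin):.3f} bits")    # fair coin:   1.000 bits
print(f"biased coin: {entropy(biased_coin):.3f} bits")  # biased coin: 0.469 bits
```

A fair coin is maximally uncertain at 1 bit. The 0.9/0.1 coin usually lands on the likely side, so its average surprise drops to about 0.469 bits.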


Details

  • Entropy H(X) = −∑ p(x) log p(x) (in bits when the log is base 2); joint entropy H(X,Y) = −∑ p(x,y) log p(x,y); conditional entropy H(Y|X) = −∑ p(x,y) log p(y|x).
  • Mutual information I(X;Y) = H(X) + H(Y) − H(X,Y) = H(X) − H(X|Y) ≥ 0, with equality iff X and Y are independent; it equals the KL divergence D_KL(P_{XY} || P_X P_Y).
  • Chain rule: H(X,Y) = H(X) + H(Y|X); I(X;Y,Z) = I(X;Y) + I(X;Z|Y).
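  • Data processing inequality: if X → Y → Z is a Markov chain (Z depends on X only through Y), then I(X;Z) ≤ I(X;Y); no processing of Y can increase the information it carries about X.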
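  • Source coding: the expected length L of any uniquely decodable binary code satisfies L ≥ H(X) (log base 2); Huffman coding is optimal among symbol-by-symbol prefix codes and achieves H(X) ≤ L < H(X) + 1.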
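  • Channel capacity: C = max_{p(x)} I(X;Y). Example: a binary symmetric channel (BSC) with crossover probability p has C = 1 − H_2(p) bits per channel use, where H_2(p) = −p log₂ p − (1−p) log₂(1−p).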
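The definitions and identities in the list above can be checked numerically on a small joint distribution. This is a minimal Python sketch. It assumes the joint pmf is given as a nested list joint[x][y] and that logarithms are base 2. `H`, `info_quantities`, and the example table are hypothetical illustrations. Conditional entropies are computed directly from their definitions rather than through the chain rule, so the chain-rule assertion is a genuine check.

```python
import math

def H(probs):
    """Shannon entropy in bits of an iterable of probabilities (0 log 0 taken as 0)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def info_quantities(joint):
    """Information measures (bits) for a joint pmf given as joint[x][y]."""
    px = [sum(row) for row in joint]           # marginal p(x): sum over y
    py = [sum(col) for col in zip(*joint)]     # marginal p(y): sum over x
    cells = [(i, j, pxy) for i, row in enumerate(joint)
             for j, pxy in enumerate(row) if pxy > 0]
    return {
        "H(X)": H(px),
        "H(Y)": H(py),
        "H(X,Y)": H(pxy for _, _, pxy in cells),
        # Conditional entropies from the definitions, with p(y|x) = p(x,y) / p(x)
        "H(Y|X)": -sum(pxy * math.log2(pxy / px[i]) for i, _, pxy in cells),
        "H(X|Y)": -sum(pxy * math.log2(pxy / py[j]) for _, j, pxy in cells),
        # KL divergence between the joint and the product of its marginals
        "D_KL": sum(pxy * math.log2(pxy / (px[i] * py[j])) for i, j, pxy in cells),
    }

joint = [[0.25, 0.25],    # row x = 0
         [0.00, 0.50]]    # row x = 1
q = info_quantities(joint)
mi = q["H(X)"] + q["H(Y)"] - q["H(X,Y)"]

# Identities from the list, checked numerically
assert math.isclose(q["H(X,Y)"], q["H(X)"] + q["H(Y|X)"])   # chain rule
assert math.isclose(mi, q["H(X)"] - q["H(X|Y)"])
assert math.isclose(mi, q["D_KL"])
for name, value in q.items():
    print(f"{name} = {value:.3f}")
print(f"I(X;Y) = {mi:.3f}")
```

For this table the script reports H(X) = 1, H(X,Y) = 1.5, and H(Y|X) = 0.5. It also reports I(X;Y) ≈ 0.311 bits, which matches D_KL(P_{XY} || P_X P_Y).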
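The data processing inequality can be illustrated with a Markov chain X → Y → Z built from two binary symmetric channels in series. This sketch rests on a few stated assumptions. X is uniform, and the crossover probabilities 0.1 and 0.2 are arbitrary illustrative values. The helpers `bsc`, `joint_from`, and `compose` are hypothetical names. Because Z is produced from Y alone, p(z|x) = ∑_y p(y|x) p(z|y).

```python
import math

def mutual_information(joint):
    """I(X;Y) in bits from a joint pmf joint[x][y], computed as a KL divergence."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return sum(
        pxy * math.log2(pxy / (px[i] * py[j]))
        for i, row in enumerate(joint)
        for j, pxy in enumerate(row)
        if pxy > 0
    )

def bsc(flip):
    """Transition matrix W[x][y] = p(y|x) of a binary symmetric channel."""
    return [[1 - flip, flip], [flip, 1 - flip]]

def joint_from(px, W):
    """Joint pmf p(x, y) = p(x) * W[x][y]."""
    return [[px[x] * W[x][y] for y in range(len(W[x]))] for x in range(len(px))]

def compose(W1, W2):
    """Channel X -> Z when Z depends on X only through Y: p(z|x) = sum_y p(y|x) p(z|y)."""
    return [
        [sum(W1[x][y] * W2[y][z] for y in range(len(W2))) for z in range(len(W2[0]))]
        for x in range(len(W1))
    ]

px = [0.5, 0.5]        # uniform binary source X (illustrative choice)
W_xy = bsc(0.1)        # first processing stage X -> Y
W_yz = bsc(0.2)        # second processing stage Y -> Z
i_xy = mutual_information(joint_from(px, W_xy))
i_xz = mutual_information(joint_from(px, compose(W_xy, W_yz)))
print(f"I(X;Y) = {i_xy:.3f} bits, I(X;Z) = {i_xz:.3f} bits")
assert i_xz <= i_xy + 1e-12    # data processing inequality
```

Here I(X;Y) ≈ 0.531 bits and I(X;Z) ≈ 0.173 bits. The second noisy stage can only lose information about X.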
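For the source-coding bound, a standard binary Huffman construction repeatedly merges the two least probable subtrees. This is a minimal Python sketch using the standard-library `heapq` module. The symbol probabilities are an illustrative assumption, and `huffman_code` is a hypothetical name. Ties are broken by an insertion counter, so another valid Huffman code may use different codewords while having the same average length.

```python
import heapq
import itertools
import math

def huffman_code(probs):
    """Build a binary Huffman code for a dict {symbol: probability}.

    Returns {symbol: codeword}. A counter breaks probability ties so the heap
    never has to compare the dict payloads.
    """
    if len(probs) == 1:                       # degenerate case: a single symbol
        return {next(iter(probs)): "0"}
    counter = itertools.count()
    heap = [(p, next(counter), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, code1 = heapq.heappop(heap)    # least probable subtree
        p2, _, code2 = heapq.heappop(heap)    # second least probable subtree
        merged = {s: "0" + c for s, c in code1.items()}
        merged.update({s: "1" + c for s, c in code2.items()})
        heapq.heappush(heap, (p1 + p2, next(counter), merged))
    return heap[0][2]

probs = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}   # illustrative distribution
code = huffman_code(probs)
avg_len = sum(probs[s] * len(w) for s, w in code.items())
entropy = -sum(p * math.log2(p) for p in probs.values() if p > 0)
print(code)
print(f"L = {avg_len:.3f} bits/symbol, H(X) = {entropy:.3f} bits")
assert entropy <= avg_len < entropy + 1   # source coding bound
```

For these probabilities the codeword lengths are 1, 2, 3, and 3. That gives L = 1.9 bits/symbol against H(X) ≈ 1.846 bits, which falls inside H(X) ≤ L < H(X) + 1.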
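The BSC capacity formula can be sanity-checked by maximizing I(X;Y) over the input distribution directly. This Python sketch assumes a binary input with P(X=1) = pi scanned on a uniform grid, a brute-force stand-in for the exact maximization. `h2`, `bsc_mutual_information`, and `capacity_by_grid` are hypothetical names. The sketch uses I(X;Y) = H(Y) − H(Y|X), where H(Y|X) = H_2(p) for either input value.

```python
import math

def h2(q):
    """Binary entropy function H_2(q) in bits."""
    if q in (0.0, 1.0):
        return 0.0
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)

def bsc_mutual_information(pi, flip):
    """I(X;Y) for a BSC with crossover probability `flip` and P(X=1) = pi.

    P(Y=1) = pi*(1-flip) + (1-pi)*flip, and H(Y|X) = H_2(flip) for either
    input, so I(X;Y) = H(Y) - H(Y|X) = H_2(P(Y=1)) - H_2(flip).
    """
    py1 = pi * (1 - flip) + (1 - pi) * flip
    return h2(py1) - h2(flip)

def capacity_by_grid(flip, steps=1000):
    """Approximate C = max over p(x) of I(X;Y) by scanning P(X=1) on a grid."""
    return max(bsc_mutual_information(k / steps, flip) for k in range(steps + 1))

flip = 0.2
print(f"grid search: C ~ {capacity_by_grid(flip):.4f} bits/use")
print(f"closed form: 1 - H_2({flip}) = {1 - h2(flip):.4f} bits/use")
```

The maximum is reached at the uniform input pi = 1/2. There P(Y=1) = 1/2 and H(Y) = 1 bit, which is why C = 1 − H_2(p). Changing `flip` lets you check other crossover probabilities, such as the p = 0.1 case in the exercises.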

Exercises

  1. Compute H(X), H(Y), H(X,Y), and I(X;Y) for a small joint distribution.
  2. Design a Huffman code for symbols with given probabilities and compare average length to H(X).
  3. For a BSC with crossover (flip) probability p, derive the capacity C = 1 − H_2(p) and evaluate it for p = 0.1.