The Union-Find data structure (also called Disjoint Set Union or DSU) maintains a collection of disjoint (non-overlapping) sets. It efficiently supports two operations: finding which set an element belongs to, and merging two sets together.
Core Concept:
Elements are partitioned into disjoint sets. Each set has a representative element (root). The data structure efficiently answers: "Are elements x and y in the same set?" and "Merge the sets containing x and y."
Key Operations:
Why It's Important:
Real-World Analogy:
Think of social network friend groups. Each person belongs to one friend group (set). You can check if two people are in the same group (Find), and when groups merge (e.g., through a common friend), they become one larger group (Union).
The simplest implementation uses an array where each element points to its parent. The root of a set points to itself.
Data Structure:
Array parent[n] where parent[i] is the parent of element i.
parent[i] == i, then i is a root (representative)MakeSet(x):
parent[x] = x // x is its own parent (root)
Time: O(1)
Find(x):
function Find(x):
if parent[x] == x:
return x
return Find(parent[x]) // Recursively find root
Time: O(height of tree) - can be O(n) in worst case
Union(x, y):
function Union(x, y):
rootX = Find(x)
rootY = Find(y)
if rootX != rootY:
parent[rootX] = rootY // Make rootY parent of rootX
Time: O(height of tree)
Example:
Elements: {0, 1, 2, 3, 4}
Initial: Each element is its own set
parent = [0, 1, 2, 3, 4]
After Union(0, 1):
parent = [1, 1, 2, 3, 4] // 0 points to 1
After Union(2, 3):
parent = [1, 1, 3, 3, 4] // 2 points to 3
After Union(1, 3):
parent = [1, 3, 3, 3, 4] // 1 points to 3
Now elements {0, 1, 2, 3} are in the same set with root 3.
Problem with Naive Implementation:
Trees can become unbalanced, creating long chains. In the worst case (linear chain), Find and Union take O(n) time. We need optimizations!
Path compression is an optimization that flattens the tree structure during Find operations. When finding the root of an element, we make all nodes along the path point directly to the root.
Key Idea:
After finding the root, update all nodes on the path to point directly to the root. This makes future Find operations faster.
Implementation:
function Find(x):
if parent[x] != x:
parent[x] = Find(parent[x]) // Path compression
return parent[x]
This recursive implementation compresses the path as it unwinds from the recursion.
Iterative Version:
function Find(x):
root = x
// Find root
while parent[root] != root:
root = parent[root]
// Compress path
while parent[x] != root:
next = parent[x]
parent[x] = root
x = next
return root
Example:
Before path compression:
4
|
3
|
2
|
1
|
0
After Find(0) with path compression:
4
/|\\\
0 1 2 3
All nodes now point directly to root 4!
Benefits:
Two-Pass vs One-Pass:
Path Halving (Alternative):
A simpler variant that makes every node point to its grandparent:
function Find(x):
while parent[x] != x:
parent[x] = parent[parent[x]] // Point to grandparent
x = parent[x]
return x
Slightly less aggressive but still very effective and requires only one pass.
Union by rank (or union by size) ensures that we always attach the smaller tree under the root of the larger tree. This keeps trees balanced and prevents them from becoming too tall.
Rank Definition:
Rank is an upper bound on the height of the tree. Initially, all ranks are 0. When uniting two trees of the same rank, the rank of the result increases by 1.
Data Structure:
Add a rank[n] array to track the rank of each root.
Implementation:
function MakeSet(x):
parent[x] = x
rank[x] = 0
function Union(x, y):
rootX = Find(x)
rootY = Find(y)
if rootX == rootY:
return // Already in same set
// Union by rank
if rank[rootX] < rank[rootY]:
parent[rootX] = rootY
else if rank[rootX] > rank[rootY]:
parent[rootY] = rootX
else:
parent[rootY] = rootX
rank[rootX] += 1 // Same rank, increase by 1
Why It Works:
Union by Size (Alternative):
Instead of rank, track the size of each set:
function Union(x, y):
rootX = Find(x)
rootY = Find(y)
if rootX == rootY:
return
// Union by size
if size[rootX] < size[rootY]:
parent[rootX] = rootY
size[rootY] += size[rootX]
else:
parent[rootY] = rootX
size[rootX] += size[rootY]
Both approaches achieve similar performance. Union by rank is slightly simpler since rank doesn't need to be updated in all cases.
Example:
Union by rank prevents this:
Bad: 0 -> 1 -> 2 -> 3 -> 4 (linear chain)
Instead creates this:
Good: 2
/ \
0 3
/ \
1 4
Height is O(log n) instead of O(n)!
Rank Properties:
Without Optimizations:
With Union by Rank Only:
With Path Compression Only:
With Both Optimizations:
where α(n) is the inverse Ackermann function.
Inverse Ackermann Function α(n):
This function grows extremely slowly:
Practical Implications:
Space Complexity:
Comparison Table:
| Implementation | Find | Union | m Operations |
|---|---|---|---|
| Naive | O(n) | O(n) | O(mn) |
| Union by Rank | O(log n) | O(log n) | O(m log n) |
| Path Compression | O(log n)* | O(log n)* | O(m log n)* |
| Both | O(α(n))* | O(α(n))* | O(m α(n)) |
* Amortized time complexity
1. Kruskal's Minimum Spanning Tree
Find the minimum cost tree connecting all vertices in a weighted graph:
2. Cycle Detection in Undirected Graphs
Detect if an undirected graph has a cycle:
3. Connected Components
Find connected components in a graph:
4. Network Connectivity
Determine if network nodes can communicate:
5. Image Processing
Find connected regions in images:
6. Least Common Ancestor (LCA)
Find LCA in trees using offline algorithm:
7. Percolation Theory
Model fluid flow through porous materials:
8. Social Network Analysis
Find friend groups and communities:
9. Equivalent Class Partitioning
Group equivalent elements:
10. Game Development
Terrain generation and pathfinding: