,

Contents · Union-Find / Disjoint Sets


Overview

The Union-Find data structure (also called Disjoint Set Union or DSU) maintains a collection of disjoint (non-overlapping) sets. It efficiently supports two operations: finding which set an element belongs to, and merging two sets together.

Core Concept:

Elements are partitioned into disjoint sets. Each set has a representative element (root). The data structure efficiently answers: "Are elements x and y in the same set?" and "Merge the sets containing x and y."

Key Operations:

  • MakeSet(x): Create a new set containing only element x
  • Find(x): Return the representative (root) of the set containing x
  • Union(x, y): Merge the sets containing x and y

Why It's Important:

  • Essential for Kruskal's minimum spanning tree algorithm
  • Detects cycles in undirected graphs
  • Solves connectivity problems efficiently
  • Near-constant time operations with optimizations
  • Simple to implement yet powerful

Real-World Analogy:

Think of social network friend groups. Each person belongs to one friend group (set). You can check if two people are in the same group (Find), and when groups merge (e.g., through a common friend), they become one larger group (Union).


Basic Operations

The simplest implementation uses an array where each element points to its parent. The root of a set points to itself.

Data Structure:

Array parent[n] where parent[i] is the parent of element i.

  • If parent[i] == i, then i is a root (representative)
  • Otherwise, follow parent pointers to reach the root

MakeSet(x):

parent[x] = x  // x is its own parent (root)

Time: O(1)

Find(x):

function Find(x):
    if parent[x] == x:
        return x
    return Find(parent[x])  // Recursively find root

Time: O(height of tree) - can be O(n) in worst case

Union(x, y):

function Union(x, y):
    rootX = Find(x)
    rootY = Find(y)
    if rootX != rootY:
        parent[rootX] = rootY  // Make rootY parent of rootX

Time: O(height of tree)

Example:

Elements: {0, 1, 2, 3, 4}

Initial: Each element is its own set

parent = [0, 1, 2, 3, 4]

After Union(0, 1):

parent = [1, 1, 2, 3, 4]  // 0 points to 1

After Union(2, 3):

parent = [1, 1, 3, 3, 4]  // 2 points to 3

After Union(1, 3):

parent = [1, 3, 3, 3, 4]  // 1 points to 3

Now elements {0, 1, 2, 3} are in the same set with root 3.

Problem with Naive Implementation:

Trees can become unbalanced, creating long chains. In the worst case (linear chain), Find and Union take O(n) time. We need optimizations!


Path Compression

Path compression is an optimization that flattens the tree structure during Find operations. When finding the root of an element, we make all nodes along the path point directly to the root.

Key Idea:

After finding the root, update all nodes on the path to point directly to the root. This makes future Find operations faster.

Implementation:

function Find(x):
    if parent[x] != x:
        parent[x] = Find(parent[x])  // Path compression
    return parent[x]

This recursive implementation compresses the path as it unwinds from the recursion.

Iterative Version:

function Find(x):
    root = x
    // Find root
    while parent[root] != root:
        root = parent[root]
    
    // Compress path
    while parent[x] != root:
        next = parent[x]
        parent[x] = root
        x = next
    
    return root

Example:

Before path compression:

    4
    |
    3
    |
    2
    |
    1
    |
    0

After Find(0) with path compression:

    4
   /|\\\
  0 1 2 3

All nodes now point directly to root 4!

Benefits:

  • Dramatically reduces tree height
  • Makes subsequent Find operations nearly O(1)
  • Amortizes the cost of Find over multiple operations
  • Simple to implement (just one line change)

Two-Pass vs One-Pass:

  • Two-Pass (iterative): First find root, then compress path
  • One-Pass (recursive): Compress while unwinding recursion
  • Both achieve the same result
  • Recursive is more elegant but uses stack space

Path Halving (Alternative):

A simpler variant that makes every node point to its grandparent:

function Find(x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]  // Point to grandparent
        x = parent[x]
    return x

Slightly less aggressive but still very effective and requires only one pass.


Union by Rank

Union by rank (or union by size) ensures that we always attach the smaller tree under the root of the larger tree. This keeps trees balanced and prevents them from becoming too tall.

Rank Definition:

Rank is an upper bound on the height of the tree. Initially, all ranks are 0. When uniting two trees of the same rank, the rank of the result increases by 1.

Data Structure:

Add a rank[n] array to track the rank of each root.

Implementation:

function MakeSet(x):
    parent[x] = x
    rank[x] = 0

function Union(x, y):
    rootX = Find(x)
    rootY = Find(y)
    
    if rootX == rootY:
        return  // Already in same set
    
    // Union by rank
    if rank[rootX] < rank[rootY]:
        parent[rootX] = rootY
    else if rank[rootX] > rank[rootY]:
        parent[rootY] = rootX
    else:
        parent[rootY] = rootX
        rank[rootX] += 1  // Same rank, increase by 1

Why It Works:

  • Smaller tree has fewer nodes, so less work to update
  • Keeps trees balanced (logarithmic height)
  • Rank increases only when merging equal-rank trees
  • Maximum rank is O(log n)

Union by Size (Alternative):

Instead of rank, track the size of each set:

function Union(x, y):
    rootX = Find(x)
    rootY = Find(y)
    
    if rootX == rootY:
        return
    
    // Union by size
    if size[rootX] < size[rootY]:
        parent[rootX] = rootY
        size[rootY] += size[rootX]
    else:
        parent[rootY] = rootX
        size[rootX] += size[rootY]

Both approaches achieve similar performance. Union by rank is slightly simpler since rank doesn't need to be updated in all cases.

Example:

Union by rank prevents this:

Bad: 0 -> 1 -> 2 -> 3 -> 4  (linear chain)

Instead creates this:

Good:     2
         / \
        0   3
       /     \
      1       4

Height is O(log n) instead of O(n)!

Rank Properties:

  • A tree with rank r has at least 2^r nodes
  • Maximum rank is ⌊log₂ n⌋
  • Rank only increases when merging equal-rank trees
  • With path compression, rank becomes an upper bound (not exact height)

Time Complexity Analysis

Without Optimizations:

  • Find: O(n) worst case (linear chain)
  • Union: O(n) worst case (calls Find)
  • m operations: O(mn) worst case

With Union by Rank Only:

  • Find: O(log n) worst case
  • Union: O(log n) worst case
  • m operations: O(m log n)

With Path Compression Only:

  • Find: O(log n) amortized
  • Union: O(log n) amortized
  • m operations: O(m log n) amortized

With Both Optimizations:

  • Find: O(α(n)) amortized
  • Union: O(α(n)) amortized
  • m operations: O(m α(n))

where α(n) is the inverse Ackermann function.

Inverse Ackermann Function α(n):

This function grows extremely slowly:

  • α(n) ≤ 4 for all practical values of n
  • α(2^65536) = 5
  • Effectively constant for all real-world inputs
  • Grows slower than log n, log log n, log* n

Practical Implications:

  • With both optimizations, operations are nearly O(1)
  • For n ≤ 10^80 (atoms in universe), α(n) ≤ 4
  • In practice, treat as constant time
  • One of the most efficient data structures

Space Complexity:

  • O(n) space for parent array
  • O(n) space for rank/size array
  • Total: O(n) space
  • Very memory efficient

Comparison Table:

Implementation Find Union m Operations
Naive O(n) O(n) O(mn)
Union by Rank O(log n) O(log n) O(m log n)
Path Compression O(log n)* O(log n)* O(m log n)*
Both O(α(n))* O(α(n))* O(m α(n))

* Amortized time complexity


Applications

1. Kruskal's Minimum Spanning Tree

Find the minimum cost tree connecting all vertices in a weighted graph:

  • Sort edges by weight
  • For each edge (u, v), if Find(u) ≠ Find(v), add edge and Union(u, v)
  • Union-Find prevents cycles efficiently
  • Time: O(E log E) for sorting + O(E α(V)) for Union-Find

2. Cycle Detection in Undirected Graphs

Detect if an undirected graph has a cycle:

  • For each edge (u, v), if Find(u) == Find(v), cycle exists
  • Otherwise, Union(u, v)
  • Time: O(E α(V))
  • Much simpler than DFS for this specific problem

3. Connected Components

Find connected components in a graph:

  • Union all edges
  • Elements with same root are in same component
  • Count distinct roots for number of components
  • Dynamic connectivity: handle edge additions efficiently

4. Network Connectivity

Determine if network nodes can communicate:

  • Each node starts as separate set
  • Union nodes when connection established
  • Check connectivity with Find
  • Used in network routing and reliability analysis

5. Image Processing

Find connected regions in images:

  • Each pixel is an element
  • Union adjacent pixels with similar colors
  • Connected components become regions
  • Used in image segmentation and object detection

6. Least Common Ancestor (LCA)

Find LCA in trees using offline algorithm:

  • Tarjan's offline LCA algorithm uses Union-Find
  • Process queries after DFS traversal
  • Efficient for batch LCA queries

7. Percolation Theory

Model fluid flow through porous materials:

  • Grid of sites (open or blocked)
  • Union adjacent open sites
  • System percolates if top and bottom are connected
  • Used in physics and materials science

8. Social Network Analysis

Find friend groups and communities:

  • Union friends
  • Find which group a person belongs to
  • Merge groups when connections form
  • Analyze network structure and clustering

9. Equivalent Class Partitioning

Group equivalent elements:

  • Union elements that are equivalent
  • Transitive closure of equivalence relation
  • Used in compiler optimization and symbolic computation

10. Game Development

Terrain generation and pathfinding:

  • Generate connected regions (caves, islands)
  • Ensure map connectivity
  • Dynamic obstacle handling
  • Efficient region-based queries

Exercises

  1. Basic Implementation: Implement Union-Find with both path compression and union by rank. Test with various operations.
  2. Kruskal's Algorithm: Implement Kruskal's MST algorithm using Union-Find. Compare with Prim's algorithm.
  3. Number of Islands: Given a 2D grid of 1s (land) and 0s (water), count the number of islands using Union-Find.
  4. Friend Circles: Given a friendship matrix, find the number of friend circles (connected components).
  5. Redundant Connection: Given edges that form a tree plus one extra edge, find the redundant edge using cycle detection.
  6. Accounts Merge: Merge accounts that share common emails using Union-Find.
  7. Percolation: Implement a percolation simulation. Determine the percolation threshold.
  8. Dynamic Connectivity: Handle queries: add edge, check if two nodes are connected. Measure performance vs. DFS.
  9. Optimize Union-Find: Implement and compare: path compression, path halving, path splitting, union by size vs. rank.
  10. Offline Queries: Given a graph and queries about connectivity at different times, answer all queries efficiently using Union-Find.