Hash tables are one of the most important data structures in computer science, providing average-case O(1) time complexity for insertion, deletion, and lookup operations. They map keys to values using a hash function that computes an index into an array of buckets.
Core Concept:
A hash table uses a hash function h(k) to map a key k to an index in an array. When multiple keys hash to the same index (a collision), different strategies handle the conflict.
Key Operations:
Collision Resolution: The primary challenge in hash table design is handling collisions when multiple keys hash to the same index. Three major approaches exist:
A good hash function is crucial for hash table performance. It should distribute keys uniformly across the table to minimize collisions.
Properties of Good Hash Functions:
Common Hash Functions:
Division Method:
h(k) = k mod m, where m is table size (preferably prime)
Multiplication Method:
h(k) = ⌊m × (k × A mod 1)⌋, where 0 < A < 1
Universal Hashing:
h(k) = ((a × k + b) mod p) mod m
String Hashing:
For strings, polynomial rolling hash is common:
h(s) = (s[0] × p^(n-1) + s[1] × p^(n-2) + ... + s[n-1]) mod m
Modern Hash Functions:
Separate chaining handles collisions by maintaining a linked list (or other data structure) at each array index. All keys that hash to the same index are stored in that index's list.
Structure:
Operations:
Insert(key, value):
Search(key):
Delete(key):
Load Factor:
α = n/m (n = number of elements, m = table size)
Advantages:
Disadvantages:
Optimizations:
Applications:
Open addressing stores all elements directly in the hash table array. When a collision occurs, it probes for the next available slot using a deterministic sequence.
Core Idea:
Instead of storing colliding elements externally, find another location within the table itself. The probe sequence determines which slots to check.
Probe Sequences:
1. Linear Probing:
h(k, i) = (h(k) + i) mod m
2. Quadratic Probing:
h(k, i) = (h(k) + c₁×i + c₂×i²) mod m
3. Double Hashing:
h(k, i) = (h₁(k) + i × h₂(k)) mod m
Operations:
Insert(key, value):
Search(key):
Delete(key):
Load Factor Constraints:
Advantages:
Disadvantages:
Practical Considerations:
Cuckoo hashing is a scheme that guarantees O(1) worst-case lookup time by using multiple hash functions and allowing elements to "kick out" other elements during insertion.
Core Concept:
Each key can be stored in one of two (or more) possible locations, determined by different hash functions. If both locations are occupied, evict one element and reinsert it in its alternate location.
Structure:
Operations:
Search(key):
Insert(key, value):
Delete(key):
Insertion Example:
Inserting key x when both h₁(x) and h₂(x) are occupied:
This creates a "cuckoo" effect where elements kick each other out like cuckoo birds.
Cycle Detection:
Load Factor:
Variants:
d-ary Cuckoo Hashing:
Buckets:
Advantages:
Disadvantages:
Applications:
Complexity Table:
| Method | Search (Avg) | Search (Worst) | Insert (Avg) | Delete |
|---|---|---|---|---|
| Separate Chaining | O(1 + α) | O(n) | O(1) | O(1 + α) |
| Linear Probing | O(1/(1-α)) | O(n) | O(1/(1-α)) | O(1/(1-α)) |
| Double Hashing | O(1/(1-α)) | O(n) | O(1/(1-α)) | O(1/(1-α)) |
| Cuckoo Hashing | O(1) | O(1) | O(1) amort | O(1) |
Space Comparison:
Cache Performance:
Practical Guidelines:
Use Separate Chaining when:
Use Open Addressing (Linear Probing) when:
Use Cuckoo Hashing when:
Real-World Usage: