,

Contents · Arrays and Dynamic Arrays


Overview

Arrays are the most fundamental data structure in computer science, providing contiguous storage for elements of the same type. They serve as the building block for many higher-level data structures and are essential for understanding memory layout and access patterns.

Key Characteristics:

  • Contiguous memory - Elements stored sequentially
  • Constant-time indexing - O(1) access by index
  • Fixed or dynamic size - Static vs dynamic variants
  • Cache-friendly - Excellent spatial locality

Dynamic arrays (also called resizable arrays, array lists, or vectors) extend static arrays with automatic resizing capability, making them practical for scenarios where the size isn't known in advance.


Static Arrays

Definition

A static array is a fixed-size collection of elements stored in contiguous memory locations. The size is determined at creation and cannot change.

Memory Layout

// Array of 5 integers
int arr[5] = {10, 20, 30, 40, 50};

Memory:
Address:  1000  1004  1008  1012  1016
Value:     10    20    30    40    50
Index:     [0]   [1]   [2]   [3]   [4]

Index Calculation

Element address = base_address + (index × element_size)

This formula enables O(1) random access, the defining feature of arrays.

Operations

  • Access: O(1) - arr[i]
  • Update: O(1) - arr[i] = value
  • Search (unsorted): O(n) - linear scan
  • Search (sorted): O(log n) - binary search
  • Insert: Not supported (fixed size)
  • Delete: Not supported (fixed size)

Limitations

  • Size must be known at compile-time or allocation-time
  • Cannot grow or shrink
  • Wasted space if allocated larger than needed
  • Overflow if more elements needed than allocated

Dynamic Arrays

Concept

Dynamic arrays maintain a backing static array but automatically resize when capacity is exceeded. This combines array efficiency with flexibility.

Core Properties

  • Capacity: Size of underlying static array
  • Size (length): Number of elements currently stored
  • Growth factor: Multiplier when resizing (typically 1.5× or 2×)

Resizing Strategy

Doubling Strategy (Growth Factor = 2):

Initial capacity: 4
After 5 insertions: resize to 8
After 9 insertions: resize to 16
After 17 insertions: resize to 32

Why Doubling Works:

  • Amortized O(1) append operation
  • Balances resize frequency vs memory waste
  • Each resize doubles capacity, so fewer resizes overall

Amortized Analysis

Although a single append may take O(n) time during resize, the average cost per append is O(1) when analyzed over a sequence of operations.

Proof Sketch:

  • Insert n elements into empty dynamic array
  • Resizes occur at sizes: 1→2, 2→4, 4→8, ..., n/2→n
  • Total copies: 1 + 2 + 4 + 8 + ... + n/2 = n - 1 ≈ n
  • Average per insertion: n/n = O(1) amortized

Growth Factor Comparison

Factor Pros Cons
2.0 Simple, fast amortization Memory waste up to 50%
1.5 Better memory reuse More frequent resizes
φ (1.618) Optimal reuse (Fibonacci) Complex to implement

Shrinking

Some implementations also shrink capacity when size drops below 1/4 capacity (to maintain load factor ≥ 0.25). This prevents pathological memory usage but adds complexity.


Implementation Details

Basic Structure (C-style)

typedef struct {
    int* data;      // Pointer to backing array
    size_t size;    // Current number of elements
    size_t capacity; // Current capacity
} DynamicArray;

Core Operations

1. Initialize

DynamicArray* create(size_t initial_capacity) {
    DynamicArray* arr = malloc(sizeof(DynamicArray));
    arr->data = malloc(initial_capacity * sizeof(int));
    arr->size = 0;
    arr->capacity = initial_capacity;
    return arr;
}

2. Append (with resize)

void append(DynamicArray* arr, int value) {
    if (arr->size == arr->capacity) {
        // Resize: allocate 2× capacity
        size_t new_capacity = arr->capacity * 2;
        int* new_data = malloc(new_capacity * sizeof(int));
        
        // Copy old data
        memcpy(new_data, arr->data, arr->size * sizeof(int));
        
        // Free old array, update pointer
        free(arr->data);
        arr->data = new_data;
        arr->capacity = new_capacity;
    }
    arr->data[arr->size++] = value;
}

3. Insert at Index

void insert(DynamicArray* arr, size_t index, int value) {
    if (arr->size == arr->capacity) {
        resize(arr); // Same as append
    }
    
    // Shift elements right
    for (size_t i = arr->size; i > index; i--) {
        arr->data[i] = arr->data[i - 1];
    }
    
    arr->data[index] = value;
    arr->size++;
}

4. Delete at Index

void delete(DynamicArray* arr, size_t index) {
    // Shift elements left
    for (size_t i = index; i < arr->size - 1; i++) {
        arr->data[i] = arr->data[i + 1];
    }
    arr->size--;
    
    // Optional: shrink if size < capacity/4
    if (arr->size < arr->capacity / 4) {
        shrink(arr);
    }
}

Language Implementations

  • C++: std::vector
  • Java: ArrayList
  • Python: list (built-in)
  • C#: List<T>
  • JavaScript: Array (built-in)
  • Rust: Vec<T>
  • Go: slices (built-in)

Time & Space Complexity

Time Complexity Summary

Operation Static Array Dynamic Array (Worst) Dynamic Array (Amortized)
Access by index O(1) O(1) O(1)
Update by index O(1) O(1) O(1)
Append (end) N/A O(n) O(1)
Insert (beginning) N/A O(n) O(n)
Insert (middle) N/A O(n) O(n)
Delete (end) N/A O(1) O(1)
Delete (beginning) N/A O(n) O(n)
Search (unsorted) O(n) O(n) O(n)
Search (sorted) O(log n) O(log n) O(log n)

Space Complexity

  • Static Array: O(n) - exactly n elements
  • Dynamic Array: O(n) - between n and 2n elements (50% waste worst-case)

Cache Performance

Arrays excel at cache utilization due to spatial locality:

  • Sequential access patterns fetch entire cache lines
  • Prefetchers can predict and load ahead
  • Iteration is typically 2-10× faster than pointer-based structures

Cache Line Example:

// Modern CPUs: 64-byte cache lines
// int = 4 bytes → 16 ints per cache line
// Accessing arr[0] likely loads arr[0..15] into cache

Use Cases & Trade-offs

When to Use Arrays

✅ Good For:

  • Frequent random access by index
  • Iteration over all elements
  • Stack-like operations (push/pop at end)
  • When size is known or grows predictably
  • Cache-sensitive applications
  • Implementing other data structures (heap, stack, queue)

❌ Poor For:

  • Frequent insertions/deletions in middle
  • Unpredictable size with strict memory constraints
  • Needing guaranteed O(1) insert anywhere

Comparison with Alternatives

vs Linked Lists

Aspect Array Linked List
Access O(1) O(n)
Insert (front) O(n) O(1)
Memory overhead Low High (pointers)
Cache locality Excellent Poor

Rule of thumb: Prefer arrays unless you need frequent O(1) insertions at arbitrary positions.

Real-World Applications

  • Image processing: Pixel buffers
  • Databases: Row storage in pages
  • Graphics: Vertex/index buffers
  • ML: Dense tensors and matrices
  • Systems: Buffer pools, ring buffers
  • Compilers: Symbol tables, token streams

Advanced Variants

  • Circular arrays: Ring buffers for queues
  • Sparse arrays: Only store non-default values
  • Bit arrays: Compact boolean storage
  • Jagged arrays: Array of arrays with varying lengths
  • Multi-dimensional: Matrices and tensors

Exercises

Implementation Challenges

  1. Implement a dynamic array from scratch with insert, delete, and resize operations. Support both growth and shrinking.
  2. Circular buffer: Implement a fixed-size queue using a circular array with O(1) enqueue and dequeue.
  3. Multi-dimensional array: Implement a 2D matrix using a 1D array with row-major and column-major layouts.

Algorithm Problems

  1. Two Sum: Given an array and target, find two numbers that sum to target. (O(n) solution exists!)
  2. Remove duplicates: Remove duplicates from sorted array in-place, return new length. O(n) time, O(1) space.
  3. Rotate array: Rotate array right by k steps. Can you do it in O(n) time and O(1) extra space?
  4. Find median: Find median of two sorted arrays. O(log(min(n,m))) solution exists.
  5. Maximum subarray: Find contiguous subarray with largest sum (Kadane's algorithm).

Analysis Questions

  1. Why is doubling better than increasing capacity by a constant (e.g., +10) for amortized performance?
  2. Calculate the exact number of element copies for n insertions starting with capacity 1 and growth factor 2.
  3. What is the memory overhead of std::vector<bool> vs std::vector<int> per element?
  4. Design a dynamic array that minimizes memory waste while maintaining O(1) amortized append. What growth factor would you choose?

Advanced

  1. Cache-oblivious algorithms: Analyze why array iteration is faster than linked list traversal on modern CPUs.
  2. SIMD optimization: How can you use SIMD instructions (e.g., AVX) to sum an array 8× faster?
  3. Persistent array: Design a persistent (immutable) dynamic array with O(log n) updates using path copying.