Data Structures
1. Arrays and Linked Lists
1.1 Arrays
An array is a contiguous block of memory storing elements of the same type, accessed by index in time.
Operations:
| Operation | Time |
|---|---|
| Access by index | |
| Search | |
| Insert at end | amortized |
| Insert at front | |
| Delete at index |
Dynamic arrays (e.g., Python list, C++ vector) double capacity when full:
INSERT(arr, x): if arr.size == arr.capacity: new_cap = 2 * arr.capacity copy arr to new array of size new_cap arr.capacity = new_cap arr[arr.size] = x arr.size += 11.2 Linked Lists
A linked list stores elements in nodes, each containing data and a pointer to the next node.
Singly Linked List:
NODE = { data, next }
INSERT_HEAD(list, x): node = NODE(x, list.head) list.head = node
DELETE(list, x): prev = null curr = list.head while curr != null and curr.data != x: prev = curr curr = curr.next if curr != null: if prev == null: list.head = curr.next else: prev.next = curr.nextDoubly Linked List:
NODE = { data, prev, next }| Operation | Array | Singly LL | Doubly LL |
|---|---|---|---|
| Access by index | |||
| Insert at head | |||
| Delete at node | * | ||
| Search |
*Requires pointer to predecessor unless node pointer given.
2. Stacks and Queues
2.1 Stacks
A stack follows LIFO (Last In, First Out) order.
PUSH(stack, x): stack.top += 1 stack[stack.top] = x
POP(stack): if stack.top == -1: error "underflow" x = stack[stack.top] stack.top -= 1 return x
PEEK(stack): return stack[stack.top]All operations are . Applications: expression evaluation, call stacks, DFS, bracket matching.
2.2 Queues
A queue follows FIFO (First In, First Out) order.
ENQUEUE(queue, x): queue.rear = (queue.rear + 1) % queue.capacity queue[queue.rear] = x queue.size += 1
DEQUEUE(queue): if queue.size == 0: error "underflow" x = queue[queue.front] queue.front = (queue.front + 1) % queue.capacity queue.size -= 1 return xImplemented with a circular array. All operations are .
2.3 Priority Queues
A priority queue orders elements by priority. Best implemented with a heap (see Section 5).
| Operation | Unsorted Array | Sorted Array | Heap |
|---|---|---|---|
| Insert | |||
| Extract-Min | |||
| Peek |
2.4 Deques
A deque (double-ended queue) allows insertion and deletion at both ends, all with a circular array.
3. Hash Tables
3.1 Hash Functions
A hash function maps keys from universe to table indices.
Division method: . Choose prime, not near a power of 2.
Multiplication method: , where , ideally .
Universal hashing: Pick randomly from a family such that for any distinct :
3.2 Chaining
Each slot holds a linked list of all elements that hash to that index.
Expected chain length: (load factor).
Expected search time: .
3.3 Open Addressing
All elements stored in the table itself. On collision, probe for the next empty slot.
Probing strategies:
- Linear probing: . Suffers from primary clustering.
- Quadratic probing: . Secondary clustering.
- Double hashing: . Best distribution.
Insert:
HASH_INSERT(T, k): i = 0 while true: j = h(k, i) if T[j] == NIL: T[j] = k return j i += 1 if i == m: error "table full"3.4 Load Factor and Resizing
Load factor: .
- Chaining: can exceed 1. Expected lookup .
- Open addressing: . Expected lookup .
Resizing: When exceeds threshold (e.g., 0.7 for open addressing, 0.75 for chaining), allocate a larger table and rehash all entries.
3.5 Comparison
| Aspect | Chaining | Open Addressing |
|---|---|---|
| Load factor | OK | |
| Memory | Extra for pointers | Compact |
| Cache performance | Poor (pointer chasing) | Good (contiguous) |
| Deletion | Simple | Requires tombstone |
4. Trees
4.1 Binary Search Trees (BST)
BST Property: For every node , all keys in the left subtree and all keys in the right subtree .
BST_INSERT(T, z): y = NIL x = T.root while x != NIL: y = x if z.key < x.key: x = x.left else: x = x.right z.parent = y if y == NIL: T.root = z elif z.key < y.key: y.left = z else: y.right = z
BST_SEARCH(x, k): while x != NIL and k != x.key: if k < x.key: x = x.left else: x = x.right return xWorst case: where for a degenerate tree.
Expected height (random insertions): .
4.2 Tree Traversals
INORDER(x): if x != NIL: INORDER(x.left) visit(x) INORDER(x.right) // produces sorted order for BST
PREORDER(x): if x != NIL: visit(x) PREORDER(x.left) PREORDER(x.right)
POSTORDER(x): if x != NIL: POSTORDER(x.left) POSTORDER(x.right) visit(x)4.3 AVL Trees
An AVL tree is a self-balancing BST where for every node, the heights of left and right subtrees differ by at most 1.
Balance factor: .
Rotations:
LEFT_ROTATE(x): y = x.right T2 = y.left y.left = x x.right = T2 update heights(x, y) return y
RIGHT_ROTATE(y): x = y.left T2 = x.right x.right = y y.left = T2 update heights(y, x) return x| Case | Balance Factor | Fix |
|---|---|---|
| Left-Left | bf(node) = +2, bf(left) = +1 | Right rotate |
| Right-Right | bf(node) = -2, bf(right) = -1 | Left rotate |
| Left-Right | bf(node) = +2, bf(left) = -1 | Left rotate left, right rotate |
| Right-Left | bf(node) = -2, bf(right) = +1 | Right rotate right, left rotate |
Height: , so all operations are .
4.4 Red-Black Trees
A red-black tree satisfies:
- Every node is red or black.
- Root is black.
- Leaves (NIL) are black.
- Red nodes have black children (no two consecutive reds).
- Every path from root to leaf has the same number of black nodes (black-height).
Height: .
RB_INSERT(T, z): BST_INSERT(T, z) z.color = RED RB_INSERT_FIXUP(T, z)
RB_INSERT_FIXUP(T, z): while z.parent.color == RED: if z.parent == z.parent.parent.left: y = z.parent.parent.right // uncle if y.color == RED: // Case 1: recolor z.parent.color = BLACK y.color = BLACK z.parent.parent.color = RED z = z.parent.parent else: if z == z.parent.right: // Case 2: left rotate z = z.parent LEFT_ROTATE(T, z) z.parent.color = BLACK // Case 3: right rotate z.parent.parent.color = RED RIGHT_ROTATE(T, z.parent.parent) else: // symmetric cases ... T.root.color = BLACK4.5 B-Trees
A B-tree of order satisfies:
- Every node (except root) has at least keys.
- Every node has at most keys.
- Every internal node with keys has children.
- All leaves are at the same depth.
Height: .
Search, insert, delete: , but with disk-friendly I/O: disk accesses.
Designed for disk-based storage. Each node in most cases occupies a disk block.
Split operation (insert):
B_TREE_SPLIT_CHILD(T, x, i): y = x.child[i] // full node with 2t-1 keys z = new node z.leaf = y.leaf z.n = t - 1 z.keys[1..t-1] = y.keys[t..2t-1] if not y.leaf: z.children[1..t] = y.children[t+1..2t] y.n = t - 1 shift x.keys and x.children right at position i x.keys[i] = y.keys[t] // median key moves up x.children[i+1] = z x.n += 15. Heaps
5.1 Binary Heaps
A binary heap is a complete binary tree satisfying the heap property:
- Max-heap:
- Min-heap:
Array representation: For index (1-based):
- Parent:
- Left child:
- Right child:
MAX_HEAPIFY(A, i, n): l = 2*i r = 2*i + 1 largest = i if l <= n and A[l] > A[largest]: largest = l if r <= n and A[r] > A[largest]: largest = r if largest != i: swap A[i], A[largest] MAX_HEAPIFY(A, largest, n)
BUILD_MAX_HEAP(A, n): for i = floor(n/2) downto 1: MAX_HEAPIFY(A, i, n)
HEAPSORT(A): BUILD_MAX_HEAP(A, A.length) for i = A.length downto 2: swap A[1], A[i] MAX_HEAPIFY(A, 1, i-1)Priority queue operations:
| Operation | Time |
|---|---|
| Build-Heap | |
| Insert | |
| Extract-Max/Min | |
| Peek | |
| Increase-Key |
5.2 Binomial Heaps
A binomial heap is a collection of binomial trees where each has nodes.
Binomial tree : Two trees linked, one as leftmost child of the other’s root.
| Operation | Time |
|---|---|
| Make-Heap | |
| Insert | amortized |
| Find-Min | |
| Extract-Min | |
| Union |
5.3 Fibonacci Heaps
A Fibonacci heap is a collection of trees (not necessarily binomial) supporting lazy operations.
| Operation | Amortized | Worst Case |
|---|---|---|
| Insert | ||
| Find-Min | ||
| Extract-Min | ||
| Decrease-Key | * | |
| Delete |
* amortized, worst case.
Key insight: Decrease-key is amortized, making Dijkstra .
6. Graphs
6.1 Representations
Adjacency Matrix: if edge exists.
- Space:
- Edge lookup:
- Iterate neighbors:
Adjacency List: Array of linked lists, one per vertex.
- Space:
- Edge lookup: (or worst case)
- Iterate neighbors:
When to use which:
| Factor | Adjacency Matrix | Adjacency List |
|---|---|---|
| Dense graphs | Better | |
| Sparse graphs | Better | |
| Edge queries | ||
| Space |
6.2 Graph Traversals
BFS(G, s): for each v in G.V: v.color = WHITE, v.dist = INF, v.parent = NIL s.color = GRAY, s.dist = 0, s.parent = NIL Q = ENQUEUE({s}) while Q is not empty: u = DEQUEUE(Q) for each v in G.Adj[u]: if v.color == WHITE: v.color = GRAY, v.dist = u.dist + 1, v.parent = u ENQUEUE(Q, v) u.color = BLACK
DFS(G): for each u in G.V: u.color = WHITE, u.parent = NIL time = 0 for each u in G.V: if u.color == WHITE: DFS_VISIT(G, u)
DFS_VISIT(G, u): time += 1; u.d = time; u.color = GRAY for each v in G.Adj[u]: if v.color == WHITE: v.parent = u; DFS_VISIT(G, v) time += 1; u.f = time; u.color = BLACK7. Tries
A trie (prefix tree) stores strings character by character from root to leaf.
TRIE_NODE = { children: MAP<char, TRIE_NODE>, is_end: bool }
TRIE_INSERT(root, word): node = root for each char c in word: if c not in node.children: node.children[c] = new TRIE_NODE node = node.children[c] node.is_end = true
TRIE_SEARCH(root, word): node = root for each char c in word: if c not in node.children: return false node = node.children[c] return node.is_end
TRIE_PREFIX(root, prefix): node = root for each char c in prefix: if c not in node.children: return false node = node.children[c] return true| Operation | Trie | Hash Table | BST/Map |
|---|---|---|---|
| Insert | avg | ||
| Search | avg | ||
| Prefix search | Not supported | Not supported | |
| Space |
Where = string length, = number of strings.
8. Amortized Analysis
8.1 Aggregate Method
Total cost of operations divided by .
Example: Dynamic array. insertions cost at most . Amortized cost per insert: .
8.2 Accounting (Banker’s) Method
Assign each operation an amortized cost. Maintain a credit balance. The credit must never go negative.
Example: Dynamic array. Charge \3$ per insert:
- \1$ pays for the insert itself.
- \2$ is saved as credit for future copying.
When we double and copy elements, the elements already paid \2$2k \geq k$ to pay for copying.
8.3 Potential (Physicist’s) Method
Define a potential function on the data structure. Amortized cost of operation :
Requirements: and for all .
Example: Dynamic array. Let where = elements, = capacity.
- Insert (no resize): , . .
- Insert (resize): , . , but amortized over operations since potential drops: after resize, , so , making .
Amortized cost: per insert.
8.4 Common Amortized Bounds
| Data Structure | Operations | Amortized Cost |
|---|---|---|
| Dynamic array | Append, Pop | |
| Splay tree | Access, Insert, Delete | |
| Fibonacci heap | Insert, Decrease-Key | , |
9. Common Pitfalls
Confusing worst-case and amortized complexity. Amortized does not guarantee per operation. A single operation can be , but averaged over a sequence, each is .
Forgetting to rebalance BSTs after insertion/deletion. A BST without rebalancing can degenerate to a linked list, making all operations . Always use AVL, red-black, or another self-balancing variant.
Choosing the wrong hash table collision resolution. Open addressing is cache-friendly but degrades quickly at high load factors. Chaining handles high load factors better but uses more memory and has poor cache locality.
Using adjacency matrices for sparse graphs. An adjacency matrix uses space regardless of edge count. For sparse graphs (), always use adjacency lists.
Implementing a trie with fixed-size arrays per node. Using a 256-element array per node wastes enormous memory. Use hash maps or arrays sized to the actual alphabet for space efficiency.
Ignoring the potential function’s non-negativity requirement. In amortized analysis, if can be negative, the bound is invalid. Always verify that for all reachable states.
Misunderstanding B-tree parameter . The minimum degree determines the range of keys per node ( to ). Setting too small yields tall trees; setting it too large wastes memory per node. Choose based on disk block size.
Worked Examples
Example 1: Building a Binary Search Tree
Problem: Insert the sequence 5, 3, 7, 1, 4, 6, 8 into an initially empty BST. What is the height and the inorder traversal? Solution: Root=5. 3 goes left, 7 goes right. 1 goes left of 3, 4 goes right of 3. 6 goes left of 7, 8 goes right of 7. Height = 3 (levels 0-3). Inorder traversal: 1, 3, 4, 5, 6, 7, 8 (sorted order). The BST is balanced in this case.
Example 2: Hash Table Collision Resolution
Problem: Insert keys 10, 22, 31, 4, 15, 28, 17 into a hash table of size 7 using chaining with h(k) = k mod 7. Solution: h(10)=3, h(22)=1, h(31)=3, h(4)=4, h(15)=1, h(28)=0, h(17)=3. Slot 0: [28]. Slot 1: [22, 15]. Slot 3: [10, 31, 17]. Slot 4: [4]. Slots 2, 5, 6: empty. Load factor = 7/7 = 1. Average search time with chaining = O(1 + alpha) = O(2).
Summary
- Arrays give random access; linked lists give insert/delete at known positions.
- Stacks (LIFO) and queues (FIFO) are fundamental ADTs with operations.
- Hash tables achieve average-case operations via good hash functions and collision resolution.
- BSTs (AVL, red-black) guarantee operations; B-trees optimize for disk I/O.
- Heaps (binary, binomial, Fibonacci) implement priority queues; Fibonacci heaps give amortized decrease-key.
- Graphs are represented via adjacency lists (sparse) or matrices (dense).
- Tries enable efficient prefix search in time.
- Amortized analysis (aggregate, accounting, potential) bounds total cost of a sequence, not individual operations.
Cross-References
| Topic | Link |
|---|---|
| Algorithm Design | View |
| Graph Algorithms | View |
| Complexity Theory | View |