Algorithms and Data Structures
1. Algorithm Analysis
1.1 Asymptotic Notation
Definition. if there exist constants and such that for all .
Definition. if there exist constants and such that for all .
Definition. if and .
Definition. if .
Definition. if .
Theorem 1.1. if and only if .
Proof. Suppose . Then there exist such that for all Hence for all So . The converse follows by symmetry.
Theorem 1.2. if and only if there exist constants and such that for all .
Proof. By definition, means and . The former gives for some And the latter gives for some . Combining yields the stated inequality.
Theorem 1.3 (Limit Rule). If where Then . If Then . If Then .
Proof. If Then for any There exists such that for all So Establishing . If Take ; then for sufficiently large Giving . The case is symmetric.
Proposition 1.4. Asymptotic notation is transitive: if and Then .
Proof. There exist with for And with for . Then for .
Proposition 1.5. -notation is reflexive: for all .
Proof. Take and . Then for all .
Worked Example: Proving $n^2 + 3n + 1 = O(n^2)$
We must find and such that for all .
Note that for all (since implies and ).
So take and .
To show tightness, note for all So .
Worked Example: Proving $2^n \neq O(n^k)$ for any constant $k$
By the limit rule: for any fixed (this follows from repeated application of L”Hôpital’s rule, or from the fact that grows faster than ). Therefore for all And in particular .
Worked Example: Sum of Harmonic Series and Its Use in Analysis
The -th harmonic number is . We show .
Upper bound: .
Lower bound: .
Therefore . This is used extensively in analysing algorithms like binary search on unbounded domains and the expected depth of elements in a random BST.
Worked Example: Analysing a Nested Loop Algorithm
Consider the algorithm:
for i = 1 to n: for j = i to n: constant-time workThe total number of iterations is .
Now consider:
for i = 1 to n: j = 1 while j < n: j = j * 2 constant-time workThe inner loop runs times, and the outer loop runs times. Total: .
Now consider the triple loop:
for i = 1 to n: for j = 1 to i: for k = 1 to j: constant-time workTotal iterations: .
1.2 Common Complexity Classes
| Class | Name | Example |
|---|---|---|
| Constant | Array access by index | |
| Logarithmic | Binary search | |
| Linear | Linear search | |
| Log-linear | Merge sort, heap sort | |
| Quadratic | Bubble sort, insertion sort | |
| Cubic | Naive matrix multiplication | |
| Exponential | Naive Fibonacci, subset problems | |
| Factorial | Generating all permutations |
Proposition 1.6. These classes form a strict hierarchy:
1.3 Recurrences and the Master Theorem
Many divide-and-conquer algorithms yield recurrences of the form:
Where is the number of subproblems, is the factor by which the input is divided, and is the cost of dividing and combining.
Theorem 1.7 (Master Theorem). Let and be constants, let be a function, and let be defined on the nonnegative integers by the recurrence Where we interpret to mean either or . Let . Then:
- If for some Then .
- If for some Then .
- If for some And if for some constant and all sufficiently large (the regularity condition), then .
Proof (Case 1 sketch). The recursion tree has depth . At level There are nodes, each costing . Since The cost at level is . The total cost is dominated by the leaves (level ), which contribute . The internal levels contribute a geometric series with ratio So the leaf level dominates.
Worked Example: Merge Sort Recurrence
Merge sort divides into 2 subproblems of size and combines in time.
Here , So . We have Which is Case 2 with .
Therefore .
Worked Example: Binary Search Recurrence
Here , So . We have . This matches Case 2 with .
Therefore .
Worked Example: Strassen's Matrix Multiplication
Strassen’s algorithm divides into 7 subproblems of size and combines in time.
Here , So . We have with Which is Case 1.
Therefore .
1.4 Amortised Analysis
Amortised analysis gives the average cost per operation over a sequence of operations, even when Individual operations may be expensive.
Methods:
- Aggregate analysis: Compute total cost of operations, divide by .
- Accounting method: Assign an amortised cost to each operation; the sum of amortised costs must be an upper bound on the total actual cost.
- Potential method: Define a potential function ; the amortised cost of the -th operation is .
Theorem 1.8 (Potential Method). If for all Then the total amortised cost is an upper bound on the total actual cost .
Proof. .
Example (Dynamic Array). A dynamic array doubles in size when full. Insertion is Amortised: most insertions cost ; occasional resizing costs But is paid for by the Surplus from previous insertions.
Worked Example: Dynamic Array Amortised Analysis (Accounting Method)
Assign an amortised cost of 3 per insertion. The actual cost of insertion without resizing is 1. We store the surplus of 2 as “credit” on each element in the array.
When the array of capacity is full and we must resize to capacity :
- The resizing cost is (copy all elements).
- We have credits stored (2 per element).
- We spend credits on the copy, leaving credits.
- The new array has elements, so we need more credits, but we already have .
- Subsequent insertions build up credits again.
Since the total credits are always non-negative, the amortised cost of 3 per insertion covers all actual costs.
Worked Example: Dynamic Array Amortised Analysis (Potential Method)
Define the potential function where is the number of elements and is the capacity. We require So (since ).
Case 1: No resize. .
Case 2: Resize from to . .
In both cases, the amortised cost is .
:::caution Common Pitfall Amortised analysis gives a guarantee over a sequence of operations, not a per-operation worst-case bound. A single operation can still be expensive (e.g., a resize in a dynamic array costs ). Amortised bounds are meaningful only when the sequence length is not bounded by a constant. :::
2. Fundamental Data Structures
2.1 Arrays and Linked Lists
Array. Contiguous memory, access by index, insertion/deletion (shift required).
| Operation | Array | Singly Linked List | Doubly Linked List |
|---|---|---|---|
| Access by index | |||
| Search (by value) | |||
| Insert at head | |||
| Insert at tail | * | ||
| Delete at head | |||
| Delete at tail | * | ||
| Insert at position | * | ||
| Space overhead |
*Given a pointer to the node or its predecessor.
Linked List. Each node stores data and a pointer to the next node. insertion/deletion at Head (given pointer), access by position.
Doubly Linked List. Each node has pointers to both next and previous nodes. insertion And deletion at any position (given pointers to the node and its neighbours).
Worked Example: Reversing a Singly Linked List
To reverse a singly linked list in time and space:
prev = nullcurr = headwhile curr != null: next = curr.next curr.next = prev prev = curr curr = nexthead = prevAt each iteration, we move one node to the front of the reversed list. After iterations, all nodes are reversed.
2.2 Stacks and Queues
Stack. Last-in, first-out (LIFO). Operations: push, pop, peek, all .
Queue. First-in, first-out (FIFO). Operations: enqueue, dequeue, peek, all .
Implementation. Both can be implemented with arrays (with circular buffer) or linked lists.
Worked Example: Implementing a Queue with Two Stacks
A queue can be implemented using two stacks in_stack and out_stack.
enqueue(x): push ontoin_stack. .dequeue(): ifout_stackis empty, pop all elements fromin_stackand push ontoout_stack. Then pop fromout_stack.
The dequeue operation is amortised: each element is moved from in_stack to out_stack at most once across all operations. Over operations, the total number of element moves is at most Giving amortised per dequeue.
2.3 Hash Tables
A hash table maps keys to values using a hash function .
Collision resolution:
- Chaining: Each bucket is a linked list. Average case: where is the load factor.
- Open addressing (linear probing): Insert into the next available slot. Average case: .
Theorem 2.1 (Uniform Hashing). Under the assumption of simple uniform hashing, the expected Length of a chain in a hash table with chaining is .
Proof. Under simple uniform hashing, each of the keys is equally likely to hash to any of the slots. For any given key The expected number of other keys that hash to the same slot as is . Including itself, the expected chain length is .
Theorem 2.2 (Successful Search with Chaining). Under simple uniform hashing, the expected time for a successful search in a hash table with chaining is .
Proof. The expected length of the list containing the searched element is Since other elements are distributed uniformly. On average, the searched element is halfway through its list, giving comparisons (we examine elements before the target plus the target itself). This equals .
Double Hashing. Uses two hash functions: Where is relatively prime to . This avoids the clustering problems of linear probing.
Worked Example: Choosing a Good Hash Function
For integer keys, a common choice is the multiplication method: where (the inverse golden ratio).
For string keys, the polynomial rolling hash: where is a prime (e.g., 31 or 127).
Both avoid clustering better than simple modulo hashing when the input distribution is adversarial.
2.4 Trees
Binary Search Tree (BST). For each node: left subtree values node value, right subtree values node value.
- Search, insert, delete: where is the height.
- For a balanced BST, .
Theorem 2.3. The expected height of a randomly built BST with distinct keys is .
Proof. Let be the height of a BST built from random keys. Let . Then . Using the indicator random variable technique, for some constant . Taking logs gives by Jensen’s inequality.
2.4.1 AVL Trees
Definition. An AVL tree is a BST where for every node, the heights of its left and right subtrees differ by at most 1. The balance factor of a node is the height of its left subtree minus the height of its right subtree; valid balance factors are .
Rotations. After insertion or deletion, the balance factor may become . This is fixed by rotations:
Right rotation (LL case): Balance factor at node Balance factor at left child .
A B/ \ / \B C → D A/ \ / \D E E CLeft rotation (RR case): Mirror of right rotation.
Left-Right rotation (LR case): Balance factor at Balance factor at left child . First left-rotate Then right-rotate .
Right-Left rotation (RL case): Mirror of LR case.
Theorem 2.4. An AVL tree with nodes has height .
Proof. Let be the minimum number of nodes in an AVL tree of height . We have , And for . This is the Fibonacci recurrence, giving . Using where We get So .
Corollary 2.5. All AVL tree operations (search, insert, delete) run in time.
Proof. Each operation visits nodes along a root-to-leaf path, and each rotation is . Since at most rotations are needed (at most 2 per level), the total cost is .
2.4.2 Red-Black Trees
Definition. A red-black tree is a BST satisfying:
- Every node is either red or black.
- The root is black.
- Every leaf (NIL sentinel) is black.
- If a node is red, both its children are black.
- For every node, all simple paths from the node to descendant leaves contain the same number of black nodes (the black-height).
Theorem 2.6. A red-black tree with internal nodes has height .
Proof. Let be the root. By property 5, at least half the nodes on any root-to-leaf path are black (by property 4, no two reds are adjacent), so the black-height . Let be the minimum number of internal nodes in a subtree of height and black-height . Then (proved by induction on Using the fact that each child has black-height at least and the root is black). Since and We get .
Insertion. Insert as in a standard BST (colour the new node red), then fix violations by recolouring and rotating up to times.
Deletion. More complex than insertion. When a black node is removed, we may need to fix the black-height property by recolouring and rotating. The fix-up procedure takes time.
2.4.3 B-Trees
Definition. A B-tree of minimum degree is a rooted tree satisfying:
- Every node has at most keys.
- Every non-root node has at least keys (hence at least children).
- The root has at least 1 key.
- All leaves appear at the same depth.
Theorem 2.7. A B-tree with keys and minimum degree has height .
Proof. The root has at least 1 key and 2 children. At depth 1, each node has at least keys and children. At depth The number of nodes is at least Each with at least keys. So the total number of keys is at least . Setting gives .
Worked Example: Red-Black Tree Insertion Trace
Insert keys 7, 3, 18, 10, 22, 8, 11, 26 into an initially empty red-black tree.
Insert 7: Root is coloured black.
Insert 3: Left child of 7, coloured red. . Valid (root is black, no red-red violation).
Insert 18: Right child of 7, coloured red. . Valid.
Insert 10: Insert as left child of 18, coloured red. Now 18 has red child 10. Uncle of 10 is 3(R) — Case 1 (uncle is red): recolour uncle 3 black, parent 18 black, grandparent 7 red. Grandparent 7 is root, so recolour 7 black.
Result: with under .
Insert 22: Right child of 18, coloured red. Uncle of 22 is 10(R) — Case 1 again: recolour 10 black, 22 black, 18 red. Parent 18 is not root, grandparent is 7(B). Now 7 has right child 18(R). Check 7’s left child: 3(B). No violation.
Insert 8: Left child of 10, coloured red. Uncle of 8 is 22(R) — Case 1: recolour 8 black, 22 black, 10 red. Now 10(R) is left child of 18(B). Check grandparent 7: right child 18(B) with left child 10(R). No red-red violation.
Insert 11: Right child of 10, coloured red. Now 10(R) has both children red — wait, 10 is red and 11 is red: violation. Uncle of 11 is 8(R) — Case 1: recolour 8 black, 11 black, 10 red. Grandparent 18(B): 10(R) is left child, 22(B) is right child. No violation.
Insert 26: Right child of 22, coloured red. Uncle of 26 is 10(R) — Case 1: recolour 10 black, 26 black, 22 red. Now 22(R) is right child of 18(B). Check left child of 18: 10(B). Valid.
2.5 Skip Lists
A skip list is a probabilistic data structure that provides expected-time search, insertion, and deletion. It consists of multiple levels of linked lists.
Structure:
- Level 0 is a standard linked list containing all elements in sorted order.
- Each element is promoted to level 1 with probability ( ).
- Each level- element is promoted to level independently with probability .
- A sentinel head node links to the first element at every level.
Theorem 2.8. The expected number of levels in a skip list with elements is .
Proof. An element reaches level with probability . The expected number of elements at level is . The highest non-empty level is approximately .
Theorem 2.9. Search, insert, and delete in a skip list take expected time.
Proof. The search path at each level moves right to find an element whose successor at level is greater than the target, then drops to level . At each level, the expected number of rightward steps is at most (a geometric distribution argument). With levels, the total expected work is .
Worked Example: Skip List Search
Search for key 25 in a skip list containing keys with the following level structure (probability ):
Level 3: [HEAD] -------------------------> [31] ---------->Level 2: [HEAD] ---------> [12] -------------------------->Level 1: [HEAD] -> [7] -> [12] -> [19] -> [25] -> [31] -> [42]Level 0: [HEAD] -> [7] -> [12] -> [19] -> [25] -> [31] -> [42]Search starts at HEAD, level 3:
- Move right to 31. Drop to level 2.
- At level 2, move right to 12. Move right — next is NIL. Drop to level 1.
- At level 1, move right to 19. Move right to 25. . Found!
The search examined keys 31, 12, 19, 25 — 4 comparisons across 3 levels.
Worked Example: Skip List Insertion
Insert key 15 into the skip list above.
- Search for insertion position: traverse levels 3, 2, 1 to find that 15 goes between 12 and 19.
- Insert 15 at level 0 (always). Randomly promote: suppose coin flip says promote to level 1. Insert at level 1. Flip again: promote to level 2. Insert at level 2. Flip again: don’t promote. Stop.
Level 3: [HEAD] -------------------------> [31] ---------->Level 2: [HEAD] ---------> [12] -> [15] ------------------>Level 1: [HEAD] -> [7] -> [12] -> [15] -> [19] -> [25] -> [31] -> [42]Level 0: [HEAD] -> [7] -> [12] -> [15] -> [19] -> [25] -> [31] -> [42]Insertion takes expected time (search path + insertions at promoted levels).
2.6 Bloom Filters
A Bloom filter is a space-efficient probabilistic data structure for set membership testing. It may produce false positives but never false negatives.
Structure: A bit array of bits (initially all 0) and independent hash functions .
Operations:
- Insert : set bits to 1.
- Query : return true if all are set to 1.
Theorem 2.10. After inserting elements into a Bloom filter of size with hash functions, the probability of a false positive is approximately .
Proof. After inserting elements, each of the bits is set to 0 with probability . The probability that all bits for a query element are set is .
Optimal number of hash functions. The false positive rate is minimised when Giving a minimum rate of approximately .
Worked Example: Bloom Filter Configuration
We want to store URLs with a false positive rate of at most 1%.
From the formula: .
Optimal . At the optimal point, the false positive rate is .
We need So .
Take bits KB. Then So use hash functions.
This requires only 11.5 KB of memory versus approximately 600 KB for storing the URLs as 60-byte strings.
Worked Example: Bloom Filter Operations Trace
A Bloom filter has bits, hash functions: , .
Insert 15: , . Set bit 5. Bit array: 0000010000
Insert 22: , . Set bits 2, 6. Bit array: 0010010100
Insert 37: , . Set bits 1, 7. Bit array: 0110010110
Query 22: (set), (set). Result: possibly in set (true positive).
Query 42: (set), (set). Result: possibly in set (false positive!).
Query 50: (not set). Result: definitely not in set (true negative).
2.7 Heaps
A binary heap is a complete binary tree satisfying the heap property:
- Max-heap: parent children.
- Min-heap: parent children.
Implemented as an array: parent of node is at ; children at and .
Operations: insert Extract-max/min Peek .
Theorem 2.11 (Build-Heap). buildHeap on an array of elements runs in time.
Proof. buildHeap calls heapify on each of the non-leaf nodes. The cost of heapify at a node of height is . The total cost is Since .
Worked Example: Build-Heap Step by Step
Build a max-heap from the array .
The array represents the tree:
4 / \ 10 3 / \ / \ 5 1 8 2Non-leaf nodes (indices down to 0): nodes 3, 10, 4.
Heapify index 2 (value 3): Children are 8 and 2. Swap 3 with 8. Array: .
4 / \ 10 8 / \ / \ 5 1 3 2Heapify index 1 (value 10): Children are 5 and 1. 10 is largest. No swap.
Heapify index 0 (value 4): Children are 10 and 8. Swap with 10. Array: .
10 / \ 4 8 / \ / \ 5 1 3 2Now heapify index 1 (value 4): Children are 5 and 1. Swap with 5. Array: .
10 / \ 5 8 / \ / \ 4 1 3 2This is a valid max-heap. Total operations: 3 swaps = .
2.8 Union-Find (Disjoint Set Union)
The Union-Find data structure maintains a partition of a set into disjoint subsets, supporting:
- MakeSet(): Create a singleton set .
- Find(): Return the representative of the set containing .
- Union(): Merge the sets containing and .
Implementation with path compression and union by rank:
- Each set is represented as a rooted tree.
Findfollows parent pointers and compresses the path (makes every node on the path point directly to the root).Unionattaches the shorter tree under the root of the taller tree (by rank/size).
Theorem 2.12. With both path compression and union by rank, the amortised time per operation is Where is the inverse Ackermann function.
Proof (outline). The inverse Ackermann function grows so slowly that for all practical values of (). The proof uses a potential function argument on the levels of the tree. Each node is assigned a level based on its rank. The key insight is that path compression prevents the number of nodes at high levels from growing. The total number of Find operations that can charge to any level- node is bounded by where is a fast-growing function. Summing over all levels gives total, hence amortised per operation.
Worked Example: Union-Find for Connected Components
Given an undirected graph with vertices and edges, determine the connected components.
for each vertex v: MakeSet(v)for each edge (u, v): if Find(u) != Find(v): Union(u, v)After processing all edges, vertices in the same connected component have the same representative. Total time: .
2.9 Graphs
A graph can be represented by:
- Adjacency matrix: if . Space: . Edge lookup: .
- Adjacency list: For each vertex, store a list of neighbours. Space: . Iterating over neighbours: .
| Operation | Adjacency Matrix | Adjacency List |
|---|---|---|
| Space | ||
| Check edge | ||
| Iterate neighbours | ||
| Add edge | ||
| Remove edge |
:::caution Common Pitfall Choosing the wrong graph representation can make an algorithm asymptotically slower. Use adjacency matrices for dense graphs () and adjacency lists for sparse graphs (). For example, BFS with an adjacency matrix takes but with adjacency lists takes . :::
3. Sorting Algorithms
3.1 Merge Sort
Algorithm. Divide the array in half, recursively sort each half, then merge.
MergeSort(A, l, r): if l < r: m = (l + r) / 2 MergeSort(A, l, m) MergeSort(A, m+1, r) Merge(A, l, m, r)Theorem 3.1. Merge sort runs in time in all cases (best, average, worst).
Proof. The recurrence is . By the Master theorem (case 2): , , So .
Theorem 3.2. Merge sort is stable and requires auxiliary space.
Proof of stability. In MergeWhen elements from the left and right halves are equal, we always take from the left half first. This preserves the relative order of equal elements.
Theorem 3.3 (Correctness of Merge). The Merge procedure produces a sorted array from two sorted subarrays.
Proof. Let and be sorted subarrays. The merge procedure maintains two indices and pointing to the next unmerged element in each subarray. At each step, the minimum of and is appended to the output. By induction on the number of elements placed: if elements have been placed and they are the smallest among the remaining elements, then the -th element placed is the minimum of the remaining elements. Since each subarray is sorted, the smallest remaining element is .
Theorem 3.4 (Optimality of Merge Sort). Merge sort achieves the optimal comparison-based sorting bound.
Proof. By Theorem 3.7, no comparison sort can do better than . Merge sort achieves in all cases, so it is asymptotically optimal among comparison sorts.
Worked Example: Merge Sort Trace
Sort the array :
Split: and
Split: and Split: and → Merge: Split: and → Merge: Merge:
Split: and Split: and → Merge: Merge:
Final merge:
Total comparisons: 13.
3.2 Quicksort
Algorithm. Choose a pivot, partition the array into elements pivot and pivot, recursively Sort each partition.
QuickSort(A, lo, hi): if lo < hi: p = Partition(A, lo, hi) QuickSort(A, lo, p - 1) QuickSort(A, p + 1, hi)Partition (Lomuto): Select the last element as pivot. Iterate through the array, maintaining that elements are pivot and are pivot.
Theorem 3.5. Quicksort runs in expected time and worst-case time.
Proof (expected case). If the pivot is chosen uniformly at random, the expected number of Comparisons is by an indicator random variable argument.
Let be the indicator random variable that and are compared, where are the sorted elements. Since elements are compared only if one is an ancestor of the other in the recursion tree, and the pivot is chosen uniformly at random:
The total number of comparisons is So:
Worst case occurs when the pivot is always the smallest or largest element (e.g., already sorted Array with first-element pivot): .
Theorem 3.6. Quicksort is not stable but uses expected stack space.
Worked Example: Quicksort with Median-of-Three Pivot
Sort using median-of-three (first, middle, last element).
Pivot selection: elements at positions 0, 4, 8 are . Median is .
Partition around :
Recurse on (pivot ): Recurse on (median of is ):
Recurse on (pivot ): Recurse on (pivot ):
Result:
3.3 Heapsort
Algorithm. Build a max-heap in time, then repeatedly extract the maximum.
Theorem 3.7. Heapsort runs in time in all cases, uses auxiliary space, and Is not stable.
Proof. Building the heap takes by Theorem 2.11. Each of the extract-max operations takes For a total of . Space is since the heap is stored in-place.
3.4 Lower Bound on Comparison Sorting
Theorem 3.8. Any comparison-based sorting algorithm requires comparisons in the Worst case.
Proof. A comparison-based sort can be modelled as a decision tree. The tree must have at least Leaves (one for each permutation). A binary tree with leaves has height at least .
More precisely, using Stirling’s approximation: .
3.5 Counting Sort
Counting sort sorts integers in time where is the range of values.
Algorithm:
- Count occurrences: number of elements equal to .
- Compute prefix sums: number of elements .
- Place each element in its correct position (iterating backwards for stability).
Theorem 3.9. Counting sort runs in time and space, and is stable.
Proof. Step 1 takes time. Step 2 takes time. Step 3 takes time. The output array uses space and the count array uses space. Stability follows from iterating backwards in step 3: the last occurrence of each value is placed first, preserving the order of equal elements.
3.6 Radix Sort
Radix sort processes digits from least significant to most significant (LSD radix sort) using a stable sort (e.g., counting sort) as a subroutine.
Theorem 3.10. LSD radix sort sorts integers with digits in base in time.
Proof. We perform passes of counting sort, each taking time. After the -th pass, the array is sorted by the least significant digits. By induction on After passes the array is fully sorted.
Corollary 3.11. For -digit integers where (e.g., 32-bit integers), radix sort runs in time.
Worked Example: Counting Sort Trace
Sort the array using counting sort.
Range of values: So .
Step 1 — Count: (indices 1 through 8).
Step 2 — Prefix sums: .
Step 3 — Place (iterate backwards):
- : Place at position 0. .
- : Place at position 4. .
- : Place at position 3. .
- : Place at position 5. .
- : Place at position 2. .
- : Place at position 1. .
- : Place at position 5. .
Result: .
Worked Example: Radix Sort Trace
Sort the array using LSD radix sort with base 10.
Pass 1 (ones digit): Sort by . After stable counting sort:
Pass 2 (tens digit): Sort by . After stable counting sort:
Pass 3 (hundreds digit): Sort by . After stable counting sort:
Total: 3 passes, each . Total: .
3.7 External Sorting
Problem. Sort data that is too large to fit in main memory.
Algorithm (External Merge Sort):
- Read chunks that fit in memory, sort each in memory, write sorted runs to disk.
- Repeatedly merge runs using a -way merge, reading/writing from disk.
Theorem 3.12. External merge sort with bytes of memory and bytes of data performs passes, where is the merge fan-in.
Proof. The initial pass creates sorted runs of size . Each merge pass combines runs into 1, reducing the number of runs by a factor of . After passes, a single sorted run remains. Each pass reads and writes all bytes, so total I/O is .
:::caution Common Pitfall The lower bound applies only to comparison-based sorting. Non-comparison sorts Like radix sort can achieve time for integers in a bounded range. However, non-comparison sorts sacrifice generality: they depend on the structure of the keys and cannot sort arbitrary objects. :::
3.8 Comparison of Sorting Algorithms
| Algorithm | Best | Average | Worst | Space | Stable |
|---|---|---|---|---|---|
| Merge Sort | Yes | ||||
| Quicksort | No | ||||
| Heapsort | No | ||||
| Counting Sort | Yes | ||||
| Radix Sort | Yes |
4. Graph Algorithms
4.1 Breadth-First Search (BFS)
BFS explores the graph level by level from a source vertex.
BFS(G, s): for each v in V: d[v] = INF pi[v] = NIL d[s] = 0 Q = {s} while Q is not empty: u = Q.dequeue() for each v in G.adj[u]: if d[v] == INF: d[v] = d[u] + 1 pi[v] = u Q.enqueue(v)Theorem 4.1. BFS runs in time and discovers shortest paths (in terms of number of Edges) from the source to all reachable vertices.
Proof. Time: Each vertex is enqueued at most once and dequeued at most once: . Each edge is examined at most twice (once from each endpoint): . Total: .
Correctness: We prove by induction on the length of shortest paths. Let be a vertex discovered via edge . At the time is discovered, equals the shortest-path distance from to (induction hypothesis). Since BFS processes vertices in non-decreasing order of distance, is the shortest distance to . Any path to must go through some vertex at distance at most (the predecessor on the path), and all such vertices have already been processed.
Worked Example: BFS on a Weighted Path Problem
Find the shortest path (fewest edges) from to all other vertices in the graph with edges: .
Starting from :
- Level 0: (distance 0)
- Level 1: (distance 1)
- Level 2: (distance 2)
- Level 3: (distance 3)
Shortest path from to : (or or ), all of length 3.
4.2 Depth-First Search (DFS)
DFS explores as far as possible along each branch before backtracking.
DFS(G): for each v in V: colour[v] = WHITE pi[v] = NIL time = 0 for each v in V: if colour[v] == WHITE: DFS-Visit(G, v)
DFS-Visit(G, u): colour[u] = GREY time += 1 d[u] = time for each v in G.adj[u]: if colour[v] == WHITE: pi[v] = u DFS-Visit(G, v) colour[u] = BLACK time += 1 f[u] = timeTheorem 4.2. DFS runs in time and can be used to detect cycles, find connected Components, compute topological orderings, and identify strongly connected components.
Proof (time). Each vertex is coloured exactly once (from WHITE to GREY) and finished exactly once (from GREY to BLACK): . Each edge is examined at most twice (once in each direction for undirected, once for directed): . Total: .
Theorem 4.3 (Parenthesis Theorem). In any DFS, for any two vertices and Exactly one of the following holds: (1) (interval nesting), (2) (interval nesting), or (3) the intervals and are disjoint.
Proof. The DFS call stack forms a nesting of intervals. When we start visiting from We must finish before finishing Giving nesting. If and are in different DFS trees, their intervals are disjoint.
Theorem 4.4 (White-Path Theorem). is a descendant of in the DFS forest if and only if, at the time is discovered, there is a path from to consisting entirely of white vertices.
Proof. () By induction on the depth of in the DFS tree. If is a child of Then was white when discovered from . If is a deeper descendant, the path goes through intermediate white vertices.
() Suppose there is a white path from to at time . Let be the first vertex on this path discovered after . All vertices before on the path are still white (they can only be discovered after ), so will be discovered from the path. By induction, is a descendant of Hence of .
4.3 Topological Sort
A topological ordering of a DAG is a linear ordering of vertices such that for every directed edge , appears before .
Algorithm: Run DFS on the DAG. Output vertices in reverse order of finishing times.
Theorem 4.5. The reverse post-order of a DFS on a DAG produces a valid topological ordering.
Proof. Suppose there is an edge but appears after in the ordering (i.e., ). Since is an edge, when is being explored (coloured GREY), if is WHITE, then is discovered as a descendant of So Contradiction. If is GREY, we have a back edge, implying a cycle, contradicting that the graph is acyclic. If is BLACK, then Contradicting .
4.4 Strongly Connected Components
Two vertices and are strongly connected if there is a path from to and from to . A strongly connected component (SCC) is a maximal set of strongly connected vertices.
Kosaraju’s Algorithm:
- Run DFS on Recording finishing times.
- Compute (transpose of : reverse all edges).
- Run DFS on in decreasing order of finishing times. Each DFS tree is an SCC.
Theorem 4.6. Kosaraju’s algorithm correctly identifies all SCCs in time.
Proof. Let be the SCC containing the vertex with the highest finishing time in the first DFS. We claim that in No vertex in can reach a vertex outside (otherwise would have a path to and from that vertex, placing it in ). In the second DFS, starting from in We discover exactly the vertices in . Removing and repeating the argument gives the remaining SCCs.
Tarjan’s Algorithm finds SCCs in a single DFS pass using a stack and low-link values, also in time.
4.5 Dijkstra’s Algorithm
Problem. Find shortest paths from a single source in a weighted graph with non-negative edge Weights.
Algorithm. Maintain a priority queue of vertices keyed by their current shortest-path estimate. Repeatedly extract the vertex with minimum distance and relax its edges.
Dijkstra(G, w, s): for each v in V: d[v] = INF pi[v] = NIL d[s] = 0 S = {} // processed vertices Q = V // priority queue keyed by d[] while Q is not empty: u = ExtractMin(Q) S = S ∪ {u} for each v in G.adj[u]: if d[v] > d[u] + w(u, v): d[v] = d[u] + w(u, v) pi[v] = u DecreaseKey(Q, v, d[v])Theorem 4.7. Dijkstra’s algorithm correctly computes shortest paths from the source.
Proof. We prove by induction on that when a vertex is added to , (the true shortest-path distance).
Base case: is the first vertex added, and .
Inductive step: Suppose all vertices in have correct distances. Let be the next vertex extracted from . Suppose for contradiction that . Consider a shortest path from to And let be the first edge on where and . Then (by induction). When was added to The edge was relaxed, so . Since edge weights are non-negative, . But And is in Contradicting that has the minimum -value in .
Theorem 4.8. Dijkstra’s algorithm with a binary heap runs in time. With a Fibonacci heap: .
Proof. Each vertex is extracted from the priority queue at most once: . Each edge is Relaxed at most once, and each relaxation may cause a DecreaseKey: . Total: . With a Fibonacci heap, DecreaseKey is amortised, giving .
Worked Example: Dijkstra's Algorithm
Find shortest paths from in the graph:
- ,
- ,
- ,
Initial: ,
Extract : relax , .
Extract : relax , .
Extract : relax , .
Extract : relax .
Extract : no improvements.
Result: , , , ,
4.6 Bellman-Ford Algorithm
Problem. Find shortest paths from a single source, allowing negative edge weights (but no negative Cycles).
BellmanFord(G, w, s): for each v in V: d[v] = INF pi[v] = NIL d[s] = 0 for i = 1 to |V| - 1: for each edge (u, v) in E: if d[v] > d[u] + w(u, v): d[v] = d[u] + w(u, v) pi[v] = u for each edge (u, v) in E: if d[v] > d[u] + w(u, v): return "negative cycle detected"Theorem 4.9. Bellman-Ford runs in time. It correctly detects negative-weight cycles.
Proof. After iterations of the relaxation loop, is at most the weight of the shortest path from to using at most edges. This is proved by induction on .
Base case (): Only is finite, which is the weight of the empty path.
Inductive step: A shortest path from to using at most edges either uses at most edges (handled by induction) or uses exactly edges, ending with edge where the shortest path to uses at most edges. The relaxation of in iteration handles this case.
After iterations, all shortest paths (which have at most edges) are correct. A -th iteration that updates any distance indicates a path of length with strictly decreasing weight, which contains a cycle of negative weight.
Worked Example: Bellman-Ford with Negative Edge Weights
Find shortest paths from in the graph:
- ,
- , ,
- ,
- ,
Initial: for . .
Iteration 1: Relax all edges.
- : . : .
- : . : . : .
- : . : .
- : . : .
- : . After iter 1: .
Iteration 2: Relax all edges.
- : . : .
- : . : .
- : . No changes: .
Iteration 3—5: No changes. Algorithm terminates.
Negative cycle check: No edge can be relaxed. No negative cycle.
Result: . Shortest path to : Cost .
Worked Example: Bellman-Ford Negative Cycle Detection
Graph: , , . This has a cycle of weight . Not negative.
Now add . Cycle weight: . Negative cycle.
Initial: Rest .
Iteration 1: , , .
Iteration 2: , , .
Iteration 3: , , .
Iteration 4: , , .
Iteration 5: , , .
Check (iteration 6): : Can still relax. Negative cycle detected!
4.7 Floyd-Warshall Algorithm
Problem. Find all-pairs shortest paths.
Algorithm. For : for each pair Check if going through vertex Improves the path.
Derivation. Define as the shortest-path distance from to using only intermediate vertices from . Then:
- (the weight of edge Or if no edge).
- For : The shortest path from to through vertices either does not use vertex (giving ) or uses vertex (giving ).
Theorem 4.10. Floyd-Warshall runs in time and space.
Proof. The triple nested loop (, , ) executes iterations, each doing work. The distance matrix requires space. Note that can overwrite in place because and (paths from to and to using vertices up to cannot be improved by going through again without a negative cycle).
Worked Example: Floyd-Warshall on 4 Vertices
Find all-pairs shortest paths for the graph with vertices and edges: , , , , , , .
Initial distance matrix :
(through vertex 1): . . . . . .
(through vertex 2): . . . . . .
(through vertex 3): . . . . . .
(through vertex 4): . . . . . .
4.8 Minimum Spanning Tree
Kruskal’s Algorithm. Sort edges by weight, add edges to the MST as long as they don’t create a Cycle (using Union-Find). .
Prim’s Algorithm. Start from any vertex, repeatedly add the minimum-weight edge connecting the Tree to a non-tree vertex (using a priority queue). .
Theorem 4.11 (Cut Property). For any cut of the graph, the minimum-weight edge crossing the cut Belongs to some MST.
Proof. Let be a cut and be the minimum-weight crossing edge with , . Let be an MST. If We are done. Otherwise, adding to creates a cycle. This cycle must cross the cut at least once more (it goes from to via some other path). Let be another crossing edge on this cycle. Since is the minimum-weight crossing edge, . Replacing with in gives a spanning tree of weight no greater than Hence an MST containing .
Theorem 4.12 (Cycle Property). For any cycle, the maximum-weight edge on the cycle does not belong To any MST.
Proof. Let be a cycle and be the maximum-weight edge on . Let be an MST. If We are done. Otherwise, removing from disconnects it into two components. The rest of cycle must contain an edge crossing this cut. Since (if We can replace either), replacing with gives a spanning tree of strictly smaller weight, contradicting the optimality of .
Theorem 4.13. Kruskal’s algorithm produces a minimum spanning tree.
Proof. We prove by induction on the number of edges added. At each step, the algorithm adds the minimum-weight edge that does not create a cycle. Consider the cut formed by the two connected components that the new edge connects. By the cut property, this minimum-weight crossing edge belongs to some MST. Since all previously added edges are in every MST (by induction), the edge set maintained by the algorithm is always a subset of some MST.
Theorem 4.14. Prim’s algorithm produces a minimum spanning tree.
Proof. By induction on the size of the tree. At each step, Prim’s adds the minimum-weight edge connecting the current tree to a vertex outside . Consider the cut . By the cut property, the minimum-weight crossing edge belongs to some MST.
Worked Example: Kruskal's Algorithm
Find the MST of the graph with edges (sorted by weight): , , , , , , , .
Vertices: .
Sorted edges: , , , , , , , .
Process each edge:
- : Add. Forest: . Cost: 2.
- : Add. Forest: . Cost: 5.
- : Add. Forest: , . Cost: 9.
- : Add (connects two components). Forest: . Cost: 14.
- : Skip (creates cycle ).
- : Skip (creates cycle).
- : Skip (creates cycle).
- : Skip (creates cycle).
MST: , , , . Total weight: 14.
Worked Example: Prim's Algorithm
Find the MST of the same graph starting from vertex .
Edges from : , , . Minimum: . Tree: . Cost: 4.
Edges crossing cut: , , , . Minimum: . Tree: . Cost: 9.
Edges crossing cut: , , , , . Minimum: . Tree: . Cost: 12.
Edges crossing cut: , , . Minimum: . Tree: . Cost: 14.
MST: , , , . Total weight: 14 (same as Kruskal’s).
5. Dynamic Programming
5.1 Principles
Dynamic programming (DP) solves problems by:
- Overlapping subproblems: The same subproblems are solved repeatedly.
- Optimal substructure: The optimal solution contains optimal solutions to subproblems.
Approaches:
- Top-down (memoisation): Recursive with caching.
- Bottom-up (tabulation): Fill a table iteratively from small subproblems to large.
5.2 Memoisation vs. Tabulation
| Aspect | Memoisation (Top-Down) | Tabulation (Bottom-Up) |
|---|---|---|
| Approach | Recursive with cache | Iterative table fill |
| Order | Natural recursion order | Dependency order |
| Space | stack + table | table only |
| Overhead | Function call overhead | Minimal |
| Subproblems | Computes only needed | Computes all |
| Best for | When not all subproblems needed | When all needed |
When to use which:
- Use memoisation when the subproblem space is sparse (not all subproblems are needed).
- Use tabulation when most subproblems are needed (avoids recursion overhead and stack overflow).
- Both achieve the same asymptotic time complexity.
5.3 Optimal Substructure Proof Technique
To prove that a problem has optimal substructure:
- Show that an optimal solution to the problem includes an optimal solution to a subproblem.
- Proved by contradiction: if the optimal solution contained a suboptimal sub-solution, replacing it with an optimal one would improve the overall solution.
Example (Shortest Path). If is a shortest path from to and is an intermediate vertex on Then the subpath of from to is a shortest path from to .
Proof. If not, there exists a shorter path from to . Then concatenated with the subpath of from to would be shorter than Contradicting that is a shortest path.
:::caution Common Pitfall Not all problems have optimal substructure. For example, the longest simple path problem does not: the longest simple path from to may not contain the longest simple path from to an intermediate vertex Because the subpath might share vertices with the rest of the path, creating a non-simple path. :::
5.4 Common Patterns
1D DP. depends on for . Example: Fibonacci, longest increasing subsequence.
2D DP. depends on for in some set. Example: edit distance, Matrix chain multiplication, longest common subsequence.
Interval DP. depends on where . Example: Optimal BST, matrix chain multiplication.
5.5 Worked Example: 0/1 Knapsack
Problem. Given items with weights and values And a knapsack of capacity Maximise the total value without exceeding the capacity.
Recurrence:
Time: . Space: (can be reduced to with 1D array).
Proof of correctness. For each item Either we don’t include it (value ) or we include it (value ). The optimal choice is the maximum. The base cases are correct.
Worked Example: 0/1 Knapsack
Items: Capacity .
Building the DP table (items as rows, capacities 0-7 as columns):
c: 0 1 2 3 4 5 6 7i=0: 0 0 0 0 0 0 0 0i=1: 0 1 1 1 1 1 1 1i=2: 0 1 1 4 5 5 5 5i=3: 0 1 1 4 5 6 6 9i=4: 0 1 1 4 5 7 8 9Maximum value: (items 2 and 4: , — let me recalculate).
Correct: items 2 and 3 (, ), or items 1, 2, 4 (Not valid). Items 1, 3 (, ), items 2, 4 (). Optimal: items 2 and 3 (, ).
5.6 Worked Example: Edit Distance (Levenshtein Distance)
Problem. Given strings of length and of length Find the minimum number of insertions, deletions, and substitutions to transform into .
Recurrence:
Where the three cases in the minimum are: delete from Insert into Substitute in .
Time: . Space: (can be reduced to ).
Worked Example: Edit Distance
Compute the edit distance between “kitten” and “sitting”.
Building the DP table:
"" s i t t i n g"" 0 1 2 3 4 5 6 7k 1 1 2 3 4 5 6 7i 2 2 1 2 3 4 5 6t 3 3 2 1 2 3 4 5t 4 4 3 2 1 2 3 4e 5 5 4 3 2 2 3 4n 6 6 5 4 3 3 2 3Edit distance: .
Transform: kitten → sitten (substitute k→s) → sittin (substitute e→i) → sitting (insert g).
5.7 Worked Example: Matrix Chain Multiplication
Problem. Given matrices where has dimensions Find the parenthesisation that minimises the total number of scalar multiplications.
Recurrence:
Time: . Space: .
Proof of correctness. The optimal parenthesisation of splits at some position : . The cost is the cost of the left subproduct plus the cost of the right subproduct plus the cost of multiplying the resulting matrices ( scalar multiplications). The optimal minimises this total.
Worked Example: Matrix Chain Multiplication
Matrices: (), (), (). Dimensions: .
.
. Split at : .
. Split at : .
: Try : . Try : .
Minimum: Split at : .
5.8 Worked Example: Longest Common Subsequence
Problem. Given sequences and Find the LCS.
Recurrence:
Time: . Space: (can be reduced to for the length only).
Proof of correctness. If Any LCS of and must include So . If The LCS either Excludes or excludes Giving the max of the two subproblems.
5.9 Worked Example: Coin Change
Problem. Given coin denominations and a target amount Find the minimum number of coins needed.
Recurrence:
Time: . Space: .
Proof of correctness. To make change for amount The last coin used must be some . The remaining amount is And the optimal solution for uses coins. Taking the minimum over all valid gives the optimal solution.
Worked Example: Coin Change
Denominations: . Target: .
Bottom-up computation:
- (use pennies)
- (use a nickel)
- (use a dime)
- …
- (use a quarter)
- (use two quarters)
Working backwards: .
Solution: 2 quarters + 1 dime + 3 pennies = . 6 coins.
5.10 Worked Example: Longest Increasing Subsequence
Problem. Given a sequence Find the length of the longest strictly increasing subsequence (not necessarily contiguous).
Recurrence: With if no such exists.
Time: . Space: .
Worked Example: Longest Increasing Subsequence
Find the LIS of .
(just 10) (10, 22) (just 9) (10, 22, 33) (10, 21) (10, 22, 33, 50) (10, 21, 41) (10, 22, 33, 50, 60) (10, 22, 33, 50, 60, 80)
LIS length: 6.
Patience sorting approach (): Maintain piles. For each card, place on the leftmost pile whose top card is the current card, or start a new pile on the right if no such pile exists. The number of piles equals the LIS length.
6. Advanced Topics
6.1 NP-Completeness
6.1.1 P, NP, and NP-Completeness
P: The class of decision problems solvable in polynomial time by a deterministic Turing machine.
NP: The class of decision problems solvable in polynomial time by a non-deterministic Turing Machine. Equivalently, problems whose “yes” instances have polynomial-time verifiable certificates.
NP-hard: A problem is NP-hard if every problem in NP can be reduced to in polynomial Time.
NP-complete: A problem is NP-complete if it is in NP and NP-hard.
Theorem 6.1. If any NP-complete problem is in P, then P = NP.
6.1.2 Polynomial-Time Reductions
A polynomial-time reduction from problem to problem is a polynomial-time algorithm that Transforms instances of into instances of Preserving the answer.
Lemma 6.1. If and Then .
Lemma 6.2. If and is NP-hard, then is NP-hard.
6.1.3 The Cook-Levin Theorem
Theorem 6.2 (Cook-Levin, 1971). SAT is NP-complete.
Proof sketch. We show that every problem in NP reduces to SAT. Let . There exists a polynomial-time non-deterministic Turing machine that decides Running in time on inputs of length . For an input We construct a Boolean formula that is satisfiable if and only if accepts .
The formula encodes:
- Tableau variables: iff cell of the computation tableau contains symbol . The tableau has rows and columns.
- Initial configuration: Row 0 encodes the initial state of on input .
- Transition constraints: Each window of the tableau corresponds to a valid transition of .
- Accepting configuration: Some cell in the last row contains the accept state.
Each of these constraints can be expressed as a polynomial-size CNF formula. The total formula has size polynomial in and is satisfiable iff accepts .
6.1.4 Classic NP-Complete Problems
SAT. Given a Boolean formula in CNF, is there a satisfying assignment?
3-SAT. SAT restricted to clauses with exactly 3 literals.
Vertex Cover. Given a graph and integer Is there a vertex cover of size ?
Travelling Salesman Problem (decision version). Given a weighted graph and bound Is there a Tour of total weight ?
Subset Sum. Given a set of integers and a target Is there a subset summing to ?
Clique. Given a graph and integer Does contain a clique of size ?
6.1.5 Proof Strategy for NP-Completeness
To prove a problem is NP-complete:
- Show (polynomial-time verifiable certificate).
- Show a known NP-complete problem reduces to : .
Example. 3-SAT Vertex Cover: construct a graph from the 3-SAT formula where each Variable and each clause become vertices, and edges enforce the constraint that a satisfying Assignment corresponds to a vertex cover.
Worked Example: 3-SAT $\leq_p$ Vertex Cover Reduction
Reduce 3-SAT formula to a vertex cover instance.
For each variable Create two vertices and connected by an edge (the “literal edge”).
For each clause Create a triangle of 3 vertices .
For each clause vertex Connect it to the literal vertex corresponding to the -th literal of clause .
Claim: is satisfiable iff the graph has a vertex cover of size where is the number of variables and is the number of clauses.
() If is satisfiable, include in the cover: for each variable, the literal vertex matching the truth assignment (e.g., if , if ). This covers all literal edges ( vertices). For each clause triangle, at least one literal in the clause is true, so the corresponding literal vertex covers one of the three edges from the triangle. Include the other two vertices of the triangle ( vertices total).
() A vertex cover of size must include exactly one endpoint of each literal edge (otherwise the triangle requires 3 vertices). Set each variable according to which literal vertex is in the cover. Each clause triangle has exactly one uncovered vertex, whose edge to a literal vertex is covered by that literal vertex, meaning the clause is satisfied.
The reduction takes polynomial time (number of vertices and edges is polynomial in the formula size).
:::caution Common Pitfall NP-hardness does not mean the problem is unsolvable. It means there is no known polynomial-time Algorithm. Many NP-complete problems have efficient approximation algorithms or can be solved Exactly for practical input sizes using branch-and-bound or SAT solvers.
6.2 Approximation Algorithms
When exact solutions are intractable, we seek approximation algorithms with provable guarantees.
Definition. A -approximation algorithm for a minimisation problem produces a solution of cost at most times the optimal cost. For a maximisation problem, the solution has value at least times the optimal value.
Theorem 6.3. Greedy vertex cover (repeatedly pick an edge, add both endpoints) is a 2-approximation.
Proof. The algorithm selects a set of vertices. Each edge in the matching used by the algorithm contributes 2 vertices to . Let be a maximum matching. Then Since OPT must contain at least one endpoint of every edge in (and is maximum, so the size of any matching). Therefore the approximation ratio is at most 2.
Theorem 6.4 (Metric TSP). The Christofides algorithm is a -approximation for TSP with the triangle inequality.
Proof. The algorithm computes an MST (), finds a minimum-weight perfect matching on the odd-degree vertices of the MST (), and combines them into an Eulerian tour which is shortcut to a Hamiltonian cycle. The total weight is at most .
Theorem 6.5 (Inapproximability). Unless P = NP, TSP (general, without triangle inequality) has no polynomial-time approximation algorithm with any constant ratio.
Proof sketch. If a -approximation existed for TSP, we could use it to solve the Hamiltonian cycle problem (which is NP-complete): given a graph Construct a TSP instance with edge weight 1 for existing edges and weight for non-edges. If the approximation returns a tour of weight Then has a Hamiltonian cycle. Otherwise, the tour weight is at least So the approximation ratio would exceed Contradiction.
Theorem 6.6 (SET COVER). The greedy algorithm for SET COVER is a -approximation, where is the size of the universe.
Proof sketch. At each step, the greedy algorithm picks the set covering the most uncovered elements. Let be the cost of the -th set picked, and let be the number of newly covered elements. Then (otherwise OPT could not cover the remaining elements at lower cost). Summing gives the bound.
Worked Example: Greedy Set Cover
Universe . Sets: , , , , . All sets have equal cost 1.
Greedy:
- Pick (covers 3 elements, tied with ). Covered: .
- Remaining: . covers (2 new), covers (2 new), covers (2 new). Pick . Covered: .
- Remaining: . Pick (or or ). Covered: .
Greedy solution: Size 3. Optimal: or Size 3. Here greedy is optimal, but it is a -approximation.
6.3 Randomised Algorithms
6.3.1 Las Vegas vs. Monte Carlo
Las Vegas algorithms always produce the correct answer but have randomised running time.
Monte Carlo algorithms always run in polynomial time but may produce incorrect answers with some probability.
| Property | Las Vegas | Monte Carlo |
|---|---|---|
| Correctness | Always correct | Bounded error prob |
| Running time | Randomised | Bounded |
| Example | Randomised Quicksort | Miller-Rabin primality |
Theorem 6.6. A Monte Carlo algorithm with error probability can be amplified to error probability by running it times and taking the majority vote (for decision problems with one-sided error) or the most frequent answer (for two-sided error).
Proof. For one-sided error: . For two-sided error with majority vote: by the Chernoff bound, the probability that the majority is wrong decreases exponentially in .
6.3.2 Randomised Select (Quickselect)
Problem. Find the -th smallest element in an unsorted array.
Algorithm (Randomised Select): Like quicksort, but recurse only into the partition containing the -th element.
Theorem 6.7. Randomised select has expected running time .
Proof. The expected number of comparisons satisfies . This can be shown to satisfy by induction.
Worked Example: Randomised Select
Find the 3rd smallest element in .
Pivot = randomly chosen. Suppose pivot = 5 (index 5).
Partition: .
Pivot 5 is at index 4 (0-indexed). We want rank 3 (0-indexed rank 2). So recurse on left: .
Pivot = randomly chosen. Suppose pivot = 3.
Partition: .
Pivot 3 is at index 2. We want rank 2. So return 3.
The 3rd smallest element is 3.
Worked Example: Miller-Rabin Primality Test
Test whether is prime (it is not; A Carmichael number).
Write So , .
Choose random base .
Compute .
, , , .
. .
and .
Now square: .
Square: .
Square: .
We got 1 on the last squaring, but the previous result was . So is a witness that 561 is composite. Output: COMPOSITE.
The error probability of Miller-Rabin is at most per random base, so iterations give error .
6.3.3 Hashing with Universal Hash Functions
Definition. A family of hash functions from to is universal if for any distinct , .
Theorem 6.8. With a universal hash family and chaining, the expected number of collisions for any element is at most .
Proof. For a fixed element Let be the indicator that where are the other elements. Then by universality. By linearity of expectation, the expected number of collisions is .
6.4 Amortised Analysis (Detailed)
6.4.1 Aggregate Analysis
Example: Multi-pop Stack. A stack supports push () and multi-pop() (pop elements, cost where is the stack size). Although a single multi-pop can cost A sequence of push/multi-pop operations costs total: each element is pushed once and popped at most once.
Theorem 6.9. The amortised cost per operation for a multi-pop stack is .
Proof. In a sequence of operations, each element is pushed at most once and popped at most once. The total cost is at most So the amortised cost per operation is .
6.4.2 Accounting Method
Each operation is charged an amortised cost. The difference between the amortised cost and the actual cost is stored as credit. The credit must always be non-negative.
Example: Binary Counter. A -bit binary counter supports increment (flip bits from the right until a 0 is found).
Assign amortised cost of 2 per increment: 1 to pay for flipping a 0 to 1, 1 stored as credit. When a 1 is flipped back to 0, the credit from when it was set pays for the flip.
Theorem 6.10. The amortised cost per increment of a binary counter is .
Proof. Each bit flips from 0 to 1 at most once between consecutive resets to 0. The credit stored on a 1 bit pays for the subsequent flip back to 0. The total credit is always non-negative.
6.4.3 Potential Method
The potential function maps data structure states to non-negative real numbers. The amortised cost of the -th operation is .
Theorem 6.11. If for all Then .
Worked Example: Binary Counter with Potential Method
Define number of 1-bits in the counter after operations.
For increment : let be the number of trailing 1s flipped. The actual cost is (flipping ones and one zero). The number of 1-bits changes by .
The amortised cost per increment is exactly 2, i.e., .
Worked Example: Splay Tree Amortised Analysis
A splay tree is a BST where every access is followed by a splay operation that moves the accessed node to the root using rotations. Three types of rotations are used: zig (single rotation when parent is root), zig-zig (two rotations in the same direction), and zig-zag (two rotations in alternating directions).
Access Lemma. The amortised cost of splaying a node in a splay tree with nodes is .
Proof sketch. Define the potential as Where is the number of nodes in the subtree rooted at (including ). Define the rank .
The amortised cost of a splay step at node with parent and grandparent is Where primes denote ranks after the step.
- Zig: .
- Zig-zig: .
- Zig-zag: .
Summing over all splay steps: .
Corollary. A sequence of splay tree operations takes amortised time.
7. Problem Set
7.1 Analysis (Problems 1—3)
Problem 1. Prove that .
Problem 2. Use the Master Theorem to solve the recurrence . If the Master Theorem does not apply, explain why.
Problem 3. Prove that using Stirling’s approximation: .
7.2 Data Structures (Problems 4—8)
Problem 4. Show the result of inserting the keys into an initially empty AVL tree. Show all rotations.
Problem 5. Prove that deleting a node from a red-black tree with internal nodes takes time.
Problem 6. Design a hash table for strings using chaining. Choose the table size A hash function, and compute the expected number of comparisons for a successful search.
Problem 7. A skip list uses . What is the expected maximum level for elements? What is the expected time for search?
Problem 8. Prove that the union-by-rank heuristic alone (without path compression) gives amortised time per Union-Find operation.
7.3 Sorting (Problems 9—11)
Problem 9. Prove that heapsort is not stable by giving a concrete counterexample.
Problem 10. Given an array of integers in the range Design an sorting algorithm using radix sort. Justify the choice of base and number of passes.
Problem 11. Prove that the best-case number of comparisons for comparison-based sorting is (not just the worst case).
7.4 Graph Algorithms (Problems 12—15)
Problem 12. Run Dijkstra’s algorithm on the following graph from source . Show the state of the priority queue after each extraction. Edge weights: , , , , , , , .
Problem 13. Prove that if a graph has a negative-weight cycle reachable from the source, then Bellman-Ford will detect it.
Problem 14. Use the cut property to prove that the minimum spanning tree is unique when all edge weights are distinct.
Problem 15. Find the strongly connected components of the graph with edges: . Use Kosaraju’s algorithm and show all steps.
7.5 Dynamic Programming (Problems 16—17)
Problem 16. You are given types of coin denominations and a target amount . Find the minimum number of coins needed to make exact change for (or report that it is impossible). Give a recurrence, prove correctness, and state the time and space complexity.
Problem 17. Given a sequence of matrices , , , , Find the optimal parenthesisation using the matrix chain multiplication DP. Show the full DP table.
7.6 Advanced Topics (Problems 18—20)
Problem 18. Prove that CLIQUE is NP-complete by reducing from 3-SAT.
Problem 19. Design a 2-approximation algorithm for the metric TSP using the MST shortcutting technique. Prove the approximation ratio.
Problem 20. A randomised algorithm for MINIMUM CUT works by repeatedly contracting random edges until two vertices remain. Prove that the probability that any specific minimum cut survives is at least And hence that repetitions suffice to find a minimum cut with high probability (Karger’s algorithm).
Common Pitfalls
Forgetting that average-case for quicksort becomes worst-case on already sorted input.
Confusing an algorithm with a program — an algorithm is a step-by-step procedure, not its implementation in code.
Forgetting edge cases in algorithm design (e.g., empty input, single element, already sorted data).
Misunderstanding the difference between a stack (LIFO) and a queue (FIFO) in data structure applications.
Worked Examples
Example 1: Merge Sort Trace
Problem. Trace merge sort on the array [38, 27, 43, 3, 9, 82, 10].
Solution. Recursive splits and merges:
[38, 27, 43, 3, 9, 82, 10]→ [38, 27, 43] and [3, 9, 82, 10]→ [38] [27, 43] and [3, 9] [82, 10]→ [38] [27][43] and [3][9] [82][10]Merge: [27, 38, 43] and [3, 9, 10, 82]Final: [3, 9, 10, 27, 38, 43, 82]Time: at every case. Space: auxiliary.
Example 2: Binary Search on Sorted Array
Problem. Search for 23 in the sorted array [2, 5, 8, 12, 16, 23, 38, 56, 72, 91]. Show each step.
Solution. Low = 0, high = 9.
- Step 1: mid = 4,
A[4]= 16 < 23 → low = 5 - Step 2: mid = 7,
A[7]= 56 > 23 → high = 6 - Step 3: mid = 5,
A[5]= 23 = 23 → found at index 5
3 comparisons. worst case.
Summary
- Big-O notation classifies growth rates: < < < < < .
- Sorting: merge sort and heap sort guarantee ; quick sort is worst case but average.
- Fundamental data structures: arrays ( access), linked lists ( insert/delete), hash tables ( average lookup), BSTs ( balanced).
- Graph representations: adjacency matrix ( space) vs adjacency list ( space).
- Algorithm design paradigms: divide-and-conquer, greedy, dynamic programming — choose based on optimal substructure and overlapping subproblems.
Cross-References
| Topic | Site | Link |
|---|---|---|
| Advanced Algorithms | WyattsNotes | View |
| Advanced Data Structures | WyattsNotes | View |
| Discrete Mathematics | WyattsNotes | View |
| Theory of Computation | WyattsNotes | View |
| Algorithms — MIT 6.006 | MIT OCW | View |
:::