Skip to content

Algorithms and Data Structures

1. Algorithm Analysis

1.1 Asymptotic Notation

Definition. f(n)=O(g(n))f(n) = O(g(n)) if there exist constants c>0c > 0 and n0n_0 such that f(n)cg(n)f(n) \leq c \cdot g(n) for all nn0n \geq n_0.

Definition. f(n)=Ω(g(n))f(n) = \Omega(g(n)) if there exist constants c>0c > 0 and n0n_0 such that f(n)cg(n)f(n) \geq c \cdot g(n) for all nn0n \geq n_0.

Definition. f(n)=Θ(g(n))f(n) = \Theta(g(n)) if f(n)=O(g(n))f(n) = O(g(n)) and f(n)=Ω(g(n))f(n) = \Omega(g(n)).

Definition. f(n)=o(g(n))f(n) = o(g(n)) if limnf(n)/g(n)=0\lim_{n \to \infty} f(n)/g(n) = 0.

Definition. f(n)=ω(g(n))f(n) = \omega(g(n)) if limng(n)/f(n)=0\lim_{n \to \infty} g(n)/f(n) = 0.

Theorem 1.1. f(n)=O(g(n))f(n) = O(g(n)) if and only if g(n)=Ω(f(n))g(n) = \Omega(f(n)).

Proof. Suppose f(n)=O(g(n))f(n) = O(g(n)). Then there exist c,n0c, n_0 such that f(n)cg(n)f(n) \leq c \cdot g(n) for all nn0n \geq n_0Hence g(n)(1/c)f(n)g(n) \geq (1/c) \cdot f(n) for all nn0n \geq n_0So g(n)=Ω(f(n))g(n) = \Omega(f(n)). The converse follows by symmetry. \blacksquare

Theorem 1.2. f(n)=Θ(g(n))f(n) = \Theta(g(n)) if and only if there exist constants c1,c2>0c_1, c_2 > 0 and n0n_0 such that c1g(n)f(n)c2g(n)c_1 \cdot g(n) \leq f(n) \leq c_2 \cdot g(n) for all nn0n \geq n_0.

Proof. By definition, f(n)=Θ(g(n))f(n) = \Theta(g(n)) means f(n)=O(g(n))f(n) = O(g(n)) and f(n)=Ω(g(n))f(n) = \Omega(g(n)). The former gives f(n)c2g(n)f(n) \leq c_2 \cdot g(n) for some c2>0c_2 > 0And the latter gives f(n)c1g(n)f(n) \geq c_1 \cdot g(n) for some c1>0c_1 > 0. Combining yields the stated inequality. \blacksquare

Theorem 1.3 (Limit Rule). If limnf(n)/g(n)=c\lim_{n \to \infty} f(n)/g(n) = c where 0<c<0 < c < \inftyThen f(n)=Θ(g(n))f(n) = \Theta(g(n)). If c=0c = 0Then f(n)=O(g(n))f(n) = O(g(n)). If c=c = \inftyThen g(n)=O(f(n))g(n) = O(f(n)).

Proof. If c=0c = 0Then for any ε>0\varepsilon > 0There exists n0n_0 such that f(n)/g(n)<εf(n)/g(n) < \varepsilon for all nn0n \geq n_0So f(n)εg(n)f(n) \leq \varepsilon \cdot g(n)Establishing f(n)=O(g(n))f(n) = O(g(n)). If 0<c<0 < c < \inftyTake ε=c/2\varepsilon = c/2; then (c/2)g(n)f(n)(3c/2)g(n)(c/2) \cdot g(n) \leq f(n) \leq (3c/2) \cdot g(n) for sufficiently large nnGiving Θ\Theta. The c=c = \infty case is symmetric. \blacksquare

Proposition 1.4. Asymptotic notation is transitive: if f=O(g)f = O(g) and g=O(h)g = O(h)Then f=O(h)f = O(h).

Proof. There exist c1,n1c_1, n_1 with f(n)c1g(n)f(n) \leq c_1 g(n) for nn1n \geq n_1And c2,n2c_2, n_2 with g(n)c2h(n)g(n) \leq c_2 h(n) for nn2n \geq n_2. Then f(n)c1c2h(n)f(n) \leq c_1 c_2 h(n) for nmax(n1,n2)n \geq \max(n_1, n_2). \blacksquare

Proposition 1.5. OO-notation is reflexive: f=O(f)f = O(f) for all ff.

Proof. Take c=1c = 1 and n0=1n_0 = 1. Then f(n)1f(n)f(n) \leq 1 \cdot f(n) for all n1n \geq 1. \blacksquare

Worked Example: Proving $n^2 + 3n + 1 = O(n^2)$

We must find c>0c > 0 and n0n_0 such that n2+3n+1cn2n^2 + 3n + 1 \leq c \cdot n^2 for all nn0n \geq n_0.

Note that n2+3n+1n2+3n2+n2=5n2n^2 + 3n + 1 \leq n^2 + 3n^2 + n^2 = 5n^2 for all n1n \geq 1 (since n1n \geq 1 implies 3n3n23n \leq 3n^2 and 1n21 \leq n^2).

So take c=5c = 5 and n0=1n_0 = 1.

To show tightness, note n2+3n+1n2n^2 + 3n + 1 \geq n^2 for all n0n \geq 0So n2+3n+1=Θ(n2)n^2 + 3n + 1 = \Theta(n^2).

Worked Example: Proving $2^n \neq O(n^k)$ for any constant $k$

By the limit rule: limn2n/nk=\lim_{n \to \infty} 2^n / n^k = \infty for any fixed kk (this follows from repeated application of L”Hôpital’s rule, or from the fact that log(2n)=nlog2\log(2^n) = n \log 2 grows faster than log(nk)=klogn\log(n^k) = k \log n). Therefore 2n=ω(nk)2^n = \omega(n^k) for all kkAnd in particular 2nO(nk)2^n \neq O(n^k).

Worked Example: Sum of Harmonic Series and Its Use in Analysis

The kk-th harmonic number is Hk=i=1k1/iH_k = \sum_{i=1}^{k} 1/i. We show Hk=Θ(logk)H_k = \Theta(\log k).

Upper bound: Hk=i=1k1/i1+1k1xdx=1+lnk=O(logk)H_k = \sum_{i=1}^{k} 1/i \leq 1 + \int_1^k \frac{1}{x}\, dx = 1 + \ln k = O(\log k).

Lower bound: Hk1k+11xdx=ln(k+1)=Ω(logk)H_k \geq \int_1^{k+1} \frac{1}{x}\, dx = \ln(k+1) = \Omega(\log k).

Therefore Hk=Θ(logk)H_k = \Theta(\log k). This is used extensively in analysing algorithms like binary search on unbounded domains and the expected depth of elements in a random BST.

Worked Example: Analysing a Nested Loop Algorithm

Consider the algorithm:

for i = 1 to n:
for j = i to n:
constant-time work

The total number of iterations is i=1n(ni+1)=k=1nk=n(n+1)/2=Θ(n2)\sum_{i=1}^{n}(n - i + 1) = \sum_{k=1}^{n} k = n(n+1)/2 = \Theta(n^2).

Now consider:

for i = 1 to n:
j = 1
while j < n:
j = j * 2
constant-time work

The inner loop runs log2n+1=O(logn)\lfloor \log_2 n \rfloor + 1 = O(\log n) times, and the outer loop runs nn times. Total: O(nlogn)O(n \log n).

Now consider the triple loop:

for i = 1 to n:
for j = 1 to i:
for k = 1 to j:
constant-time work

Total iterations: i=1nj=1ij=i=1ni(i+1)2=Θ(n3)\sum_{i=1}^{n} \sum_{j=1}^{i} j = \sum_{i=1}^{n} \frac{i(i+1)}{2} = \Theta(n^3).

1.2 Common Complexity Classes

ClassNameExample
O(1)O(1)ConstantArray access by index
O(logn)O(\log n)LogarithmicBinary search
O(n)O(n)LinearLinear search
O(nlogn)O(n \log n)Log-linearMerge sort, heap sort
O(n2)O(n^2)QuadraticBubble sort, insertion sort
O(n3)O(n^3)CubicNaive matrix multiplication
O(2n)O(2^n)ExponentialNaive Fibonacci, subset problems
O(n!)O(n!)FactorialGenerating all permutations

Proposition 1.6. These classes form a strict hierarchy: O(1)O(logn)O(n)O(nlogn)O(n2)O(n3)O(2n)O(n!)O(1) \subsetneq O(\log n) \subsetneq O(n) \subsetneq O(n \log n) \subsetneq O(n^2) \subsetneq O(n^3) \subsetneq O(2^n) \subsetneq O(n!)

1.3 Recurrences and the Master Theorem

Many divide-and-conquer algorithms yield recurrences of the form:

T(n)=aT(n/b)+f(n)T(n) = aT(n/b) + f(n)

Where a1a \geq 1 is the number of subproblems, b>1b > 1 is the factor by which the input is divided, and f(n)f(n) is the cost of dividing and combining.

Theorem 1.7 (Master Theorem). Let a1a \geq 1 and b>1b > 1 be constants, let f(n)f(n) be a function, and let T(n)T(n) be defined on the nonnegative integers by the recurrence T(n)=aT(n/b)+f(n)T(n) = aT(n/b) + f(n)Where we interpret n/bn/b to mean either n/b\lfloor n/b \rfloor or n/b\lceil n/b \rceil. Let c=logbac = \log_b a. Then:

  1. If f(n)=O(ncε)f(n) = O(n^{c - \varepsilon}) for some ε>0\varepsilon > 0Then T(n)=Θ(nc)T(n) = \Theta(n^c).
  2. If f(n)=Θ(nclogkn)f(n) = \Theta(n^c \log^k n) for some k0k \geq 0Then T(n)=Θ(nclogk+1n)T(n) = \Theta(n^c \log^{k+1} n).
  3. If f(n)=Ω(nc+ε)f(n) = \Omega(n^{c + \varepsilon}) for some ε>0\varepsilon > 0And if af(n/b)qf(n)a f(n/b) \leq q f(n) for some constant q<1q < 1 and all sufficiently large nn (the regularity condition), then T(n)=Θ(f(n))T(n) = \Theta(f(n)).

Proof (Case 1 sketch). The recursion tree has depth logbn\log_b n. At level iiThere are aia^i nodes, each costing f(n/bi)f(n/b^i). Since f(n)=O(ncε)f(n) = O(n^{c - \varepsilon})The cost at level ii is ai(n/bi)cε=ncε(a/bcε)ia^i \cdot (n/b^i)^{c - \varepsilon} = n^{c - \varepsilon} \cdot (a / b^{c - \varepsilon})^i. The total cost is dominated by the leaves (level logbn\log_b n), which contribute alogbn=nca^{\log_b n} = n^c. The internal levels contribute a geometric series with ratio a/bcε=bε>1a / b^{c - \varepsilon} = b^\varepsilon > 1So the leaf level dominates. \blacksquare

Worked Example: Merge Sort Recurrence

Merge sort divides into 2 subproblems of size n/2n/2 and combines in O(n)O(n) time.

T(n)=2T(n/2)+Θ(n)T(n) = 2T(n/2) + \Theta(n)

Here a=2a = 2, b=2b = 2So c=log22=1c = \log_2 2 = 1. We have f(n)=Θ(n)=Θ(nclog0n)f(n) = \Theta(n) = \Theta(n^c \log^0 n)Which is Case 2 with k=0k = 0.

Therefore T(n)=Θ(n1log1n)=Θ(nlogn)T(n) = \Theta(n^1 \log^1 n) = \Theta(n \log n).

Worked Example: Binary Search Recurrence

T(n)=T(n/2)+O(1)T(n) = T(n/2) + O(1)

Here a=1a = 1, b=2b = 2So c=log21=0c = \log_2 1 = 0. We have f(n)=O(1)=O(n0)f(n) = O(1) = O(n^0). This matches Case 2 with k=0k = 0.

Therefore T(n)=Θ(logn)T(n) = \Theta(\log n).

Worked Example: Strassen's Matrix Multiplication

Strassen’s algorithm divides into 7 subproblems of size n/2n/2 and combines in O(n2)O(n^2) time.

T(n)=7T(n/2)+O(n2)T(n) = 7T(n/2) + O(n^2)

Here a=7a = 7, b=2b = 2So c=log272.807c = \log_2 7 \approx 2.807. We have f(n)=O(n2)=O(ncε)f(n) = O(n^2) = O(n^{c - \varepsilon}) with ε=c20.807\varepsilon = c - 2 \approx 0.807Which is Case 1.

Therefore T(n)=Θ(nlog27)=Θ(n2.807)T(n) = \Theta(n^{\log_2 7}) = \Theta(n^{2.807}).

1.4 Amortised Analysis

Amortised analysis gives the average cost per operation over a sequence of operations, even when Individual operations may be expensive.

Methods:

  1. Aggregate analysis: Compute total cost of nn operations, divide by nn.
  2. Accounting method: Assign an amortised cost to each operation; the sum of amortised costs must be an upper bound on the total actual cost.
  3. Potential method: Define a potential function Φ\Phi; the amortised cost of the ii-th operation is c^i=ci+Φ(Di)Φ(Di1)\hat{c}_i = c_i + \Phi(D_i) - \Phi(D_{i-1}).

Theorem 1.8 (Potential Method). If Φ(Di)Φ(D0)\Phi(D_i) \geq \Phi(D_0) for all i1i \geq 1Then the total amortised cost i=1nc^i\sum_{i=1}^{n} \hat{c}_i is an upper bound on the total actual cost i=1nci\sum_{i=1}^{n} c_i.

Proof. i=1nc^i=i=1n(ci+Φ(Di)Φ(Di1))=i=1nci+Φ(Dn)Φ(D0)i=1nci\sum_{i=1}^{n} \hat{c}_i = \sum_{i=1}^{n} (c_i + \Phi(D_i) - \Phi(D_{i-1})) = \sum_{i=1}^{n} c_i + \Phi(D_n) - \Phi(D_0) \geq \sum_{i=1}^{n} c_i. \blacksquare

Example (Dynamic Array). A dynamic array doubles in size when full. Insertion is O(1)O(1) Amortised: most insertions cost O(1)O(1); occasional resizing costs O(n)O(n)But is paid for by the Surplus from previous O(1)O(1) insertions.

Worked Example: Dynamic Array Amortised Analysis (Accounting Method)

Assign an amortised cost of 3 per insertion. The actual cost of insertion without resizing is 1. We store the surplus of 2 as “credit” on each element in the array.

When the array of capacity kk is full and we must resize to capacity 2k2k:

  • The resizing cost is kk (copy all kk elements).
  • We have 2k2k credits stored (2 per element).
  • We spend kk credits on the copy, leaving kk credits.
  • The new array has kk elements, so we need 2k2k more credits, but we already have kk.
  • Subsequent insertions build up credits again.

Since the total credits are always non-negative, the amortised cost of 3 per insertion covers all actual costs.

Worked Example: Dynamic Array Amortised Analysis (Potential Method)

Define the potential function Φ(D)=2nm\Phi(D) = 2n - m where nn is the number of elements and mm is the capacity. We require mnm \geq nSo Φ0\Phi \geq 0 (since 2nn=n02n - n = n \geq 0).

Case 1: No resize. c^=1+(2(n+1)m)(2nm)=1+2=3\hat{c} = 1 + (2(n+1) - m) - (2n - m) = 1 + 2 = 3.

Case 2: Resize from m=nm = n to m=2nm = 2n. c^=(1+n)+(2(n+1)2n)(2nn)=(n+1)+2n=3\hat{c} = (1 + n) + (2(n+1) - 2n) - (2n - n) = (n + 1) + 2 - n = 3.

In both cases, the amortised cost is O(1)O(1).

:::caution Common Pitfall Amortised analysis gives a guarantee over a sequence of operations, not a per-operation worst-case bound. A single operation can still be expensive (e.g., a resize in a dynamic array costs O(n)O(n)). Amortised bounds are meaningful only when the sequence length is not bounded by a constant. :::

2. Fundamental Data Structures

2.1 Arrays and Linked Lists

Array. Contiguous memory, O(1)O(1) access by index, O(n)O(n) insertion/deletion (shift required).

OperationArraySingly Linked ListDoubly Linked List
Access by indexO(1)O(1)O(n)O(n)O(n)O(n)
Search (by value)O(n)O(n)O(n)O(n)O(n)O(n)
Insert at headO(n)O(n)O(1)O(1)O(1)O(1)
Insert at tailO(1)O(1)O(n)O(n)O(1)O(1)*
Delete at headO(n)O(n)O(1)O(1)O(1)O(1)
Delete at tailO(n)O(n)O(n)O(n)O(1)O(1)*
Insert at positionO(n)O(n)O(n)O(n)O(1)O(1)*
Space overhead00O(n)O(n)O(n)O(n)

*Given a pointer to the node or its predecessor.

Linked List. Each node stores data and a pointer to the next node. O(1)O(1) insertion/deletion at Head (given pointer), O(n)O(n) access by position.

Doubly Linked List. Each node has pointers to both next and previous nodes. O(1)O(1) insertion And deletion at any position (given pointers to the node and its neighbours).

Worked Example: Reversing a Singly Linked List

To reverse a singly linked list in O(n)O(n) time and O(1)O(1) space:

prev = null
curr = head
while curr != null:
next = curr.next
curr.next = prev
prev = curr
curr = next
head = prev

At each iteration, we move one node to the front of the reversed list. After nn iterations, all nn nodes are reversed.

2.2 Stacks and Queues

Stack. Last-in, first-out (LIFO). Operations: push, pop, peek, all O(1)O(1).

Queue. First-in, first-out (FIFO). Operations: enqueue, dequeue, peek, all O(1)O(1).

Implementation. Both can be implemented with arrays (with circular buffer) or linked lists.

Worked Example: Implementing a Queue with Two Stacks

A queue can be implemented using two stacks in_stack and out_stack.

  • enqueue(x): push xx onto in_stack. O(1)O(1).
  • dequeue(): if out_stack is empty, pop all elements from in_stack and push onto out_stack. Then pop from out_stack.

The dequeue operation is O(1)O(1) amortised: each element is moved from in_stack to out_stack at most once across all operations. Over nn operations, the total number of element moves is at most nnGiving O(1)O(1) amortised per dequeue.

2.3 Hash Tables

A hash table maps keys to values using a hash function h:K0,1,,m1h : K \to \\{0, 1, \ldots, m - 1\\}.

Collision resolution:

  • Chaining: Each bucket is a linked list. Average case: O(1+α)O(1 + \alpha) where α=n/m\alpha = n/m is the load factor.
  • Open addressing (linear probing): Insert into the next available slot. Average case: O(1/(1α))O(1/(1-\alpha)).

Theorem 2.1 (Uniform Hashing). Under the assumption of simple uniform hashing, the expected Length of a chain in a hash table with chaining is α=n/m\alpha = n/m.

Proof. Under simple uniform hashing, each of the nn keys is equally likely to hash to any of the mm slots. For any given key kkThe expected number of other keys that hash to the same slot as kk is (n1)/m<α(n - 1)/m < \alpha. Including kk itself, the expected chain length is 1+(n1)/m1+α1 + (n-1)/m \leq 1 + \alpha. \blacksquare

Theorem 2.2 (Successful Search with Chaining). Under simple uniform hashing, the expected time for a successful search in a hash table with chaining is O(1+α/2)O(1 + \alpha/2).

Proof. The expected length of the list containing the searched element is 1+(n1)/m1 + (n - 1)/mSince n1n - 1 other elements are distributed uniformly. On average, the searched element is halfway through its list, giving (1+(n1)/m)/2+1/2(1 + (n-1)/m)/2 + 1/2 comparisons (we examine elements before the target plus the target itself). This equals 1+α/21/(2m)=O(1+α)1 + \alpha/2 - 1/(2m) = O(1 + \alpha). \blacksquare

Double Hashing. Uses two hash functions: h(k,i)=(h1(k)+ih2(k))modmh(k, i) = (h_1(k) + i \cdot h_2(k)) \bmod mWhere h2(k)h_2(k) is relatively prime to mm. This avoids the clustering problems of linear probing.

Worked Example: Choosing a Good Hash Function

For integer keys, a common choice is the multiplication method: h(k)=m(kAmod1)h(k) = \lfloor m \cdot (k \cdot A \bmod 1) \rfloor where A=(51)/20.618A = (\sqrt{5} - 1)/2 \approx 0.618 (the inverse golden ratio).

For string keys, the polynomial rolling hash: h(s)=(i=0k1s[i]pk1i)modmh(s) = \left(\sum_{i=0}^{k-1} s[i] \cdot p^{k-1-i}\right) \bmod m where pp is a prime (e.g., 31 or 127).

Both avoid clustering better than simple modulo hashing when the input distribution is adversarial.

2.4 Trees

Binary Search Tree (BST). For each node: left subtree values <\lt node value, right subtree values >\gt node value.

  • Search, insert, delete: O(h)O(h) where hh is the height.
  • For a balanced BST, h=O(logn)h = O(\log n).

Theorem 2.3. The expected height of a randomly built BST with nn distinct keys is O(logn)O(\log n).

Proof. Let XnX_n be the height of a BST built from nn random keys. Let Yn=2XnY_n = 2^{X_n}. Then E[Yn]14i=0n1(ni)E[Yi]E[Yn1i]/n\mathrm{E}[Y_n] \leq \frac{1}{4} \sum_{i=0}^{n-1} \binom{n}{i} \mathrm{E}[Y_i] \mathrm{E}[Y_{n-1-i}] / n. Using the indicator random variable technique, E[Yn]cn+1n3/2i=0n1i3/2(n1i)3/2cicn1icn3/2\mathrm{E}[Y_n] \leq \frac{c^{n+1}}{n^{3/2}} \sum_{i=0}^{n-1} \frac{i^{3/2}(n-1-i)^{3/2}}{c^i c^{n-1-i}} \leq c \cdot n^{3/2} for some constant cc. Taking logs gives E[Xn]=E[logYn]logE[Yn]=O(logn)\mathrm{E}[X_n] = \mathrm{E}[\log Y_n] \leq \log \mathrm{E}[Y_n] = O(\log n) by Jensen’s inequality. \blacksquare

2.4.1 AVL Trees

Definition. An AVL tree is a BST where for every node, the heights of its left and right subtrees differ by at most 1. The balance factor of a node is the height of its left subtree minus the height of its right subtree; valid balance factors are 1,0,1\\{-1, 0, 1\\}.

Rotations. After insertion or deletion, the balance factor may become ±2\pm 2. This is fixed by rotations:

  • Right rotation (LL case): Balance factor +2+2 at node AABalance factor +1+1 at left child BB.

    A B
    / \ / \
    B C → D A
    / \ / \
    D E E C
  • Left rotation (RR case): Mirror of right rotation.

  • Left-Right rotation (LR case): Balance factor +2+2 at AABalance factor 1-1 at left child BB. First left-rotate BBThen right-rotate AA.

  • Right-Left rotation (RL case): Mirror of LR case.

Theorem 2.4. An AVL tree with nn nodes has height h1.4404log2(n+2)1.3277h \leq 1.4404 \cdot \log_2(n + 2) - 1.3277.

Proof. Let N(h)N(h) be the minimum number of nodes in an AVL tree of height hh. We have N(0)=1N(0) = 1, N(1)=2N(1) = 2And N(h)=1+N(h1)+N(h2)N(h) = 1 + N(h-1) + N(h-2) for h2h \geq 2. This is the Fibonacci recurrence, giving N(h)=Fh+31N(h) = F_{h+3} - 1. Using Fh=ϕhϕ^h5F_h = \frac{\phi^h - \hat{\phi}^h}{\sqrt{5}} where ϕ=1+52\phi = \frac{1+\sqrt{5}}{2}We get N(h)>ϕh/51N(h) > \phi^h / \sqrt{5} - 1So h<logϕ(5(n+1))1.4404log2(n+1)h \lt \log_\phi(\sqrt{5}(n + 1)) \approx 1.4404 \log_2(n + 1). \blacksquare

Corollary 2.5. All AVL tree operations (search, insert, delete) run in O(logn)O(\log n) time.

Proof. Each operation visits O(h)=O(logn)O(h) = O(\log n) nodes along a root-to-leaf path, and each rotation is O(1)O(1). Since at most O(logn)O(\log n) rotations are needed (at most 2 per level), the total cost is O(logn)O(\log n). \blacksquare

2.4.2 Red-Black Trees

Definition. A red-black tree is a BST satisfying:

  1. Every node is either red or black.
  2. The root is black.
  3. Every leaf (NIL sentinel) is black.
  4. If a node is red, both its children are black.
  5. For every node, all simple paths from the node to descendant leaves contain the same number of black nodes (the black-height).

Theorem 2.6. A red-black tree with nn internal nodes has height h2log2(n+1)h \leq 2 \log_2(n + 1).

Proof. Let rr be the root. By property 5, at least half the nodes on any root-to-leaf path are black (by property 4, no two reds are adjacent), so the black-height bh(r)h/2bh(r) \geq h/2. Let S(h,bh)S(h, bh) be the minimum number of internal nodes in a subtree of height hh and black-height bhbh. Then S(h,bh)2bh1S(h, bh) \geq 2^{bh} - 1 (proved by induction on bhbhUsing the fact that each child has black-height at least bh1bh - 1 and the root is black). Since bh(r)h/2bh(r) \geq h/2 and n2bh(r)12h/21n \geq 2^{bh(r)} - 1 \geq 2^{h/2} - 1We get h2log2(n+1)h \leq 2 \log_2(n + 1). \blacksquare

Insertion. Insert as in a standard BST (colour the new node red), then fix violations by recolouring and rotating up to O(logn)O(\log n) times.

Deletion. More complex than insertion. When a black node is removed, we may need to fix the black-height property by recolouring and rotating. The fix-up procedure takes O(logn)O(\log n) time.

2.4.3 B-Trees

Definition. A B-tree of minimum degree t2t \geq 2 is a rooted tree satisfying:

  1. Every node has at most 2t12t - 1 keys.
  2. Every non-root node has at least t1t - 1 keys (hence at least tt children).
  3. The root has at least 1 key.
  4. All leaves appear at the same depth.

Theorem 2.7. A B-tree with nn keys and minimum degree tt has height hlogtn+12h \leq \log_t \frac{n+1}{2}.

Proof. The root has at least 1 key and 2 children. At depth 1, each node has at least t1t - 1 keys and tt children. At depth d1d \geq 1The number of nodes is at least 2td12t^{d-1}Each with at least t1t - 1 keys. So the total number of keys is at least 1+2(t1)i=0h1ti=1+2(t1)th1t1=2th11 + 2(t-1) \sum_{i=0}^{h-1} t^i = 1 + 2(t-1) \cdot \frac{t^h - 1}{t - 1} = 2t^h - 1. Setting n2th1n \geq 2t^h - 1 gives hlogt((n+1)/2)h \leq \log_t((n+1)/2). \blacksquare

Worked Example: Red-Black Tree Insertion Trace

Insert keys 7, 3, 18, 10, 22, 8, 11, 26 into an initially empty red-black tree.

Insert 7: Root is coloured black. T=[7(B)]T = [7(B)]

Insert 3: Left child of 7, coloured red. T=[7(B)3(R)]T = [7(B) \leftarrow 3(R)]. Valid (root is black, no red-red violation).

Insert 18: Right child of 7, coloured red. T=[3(R)7(B)18(R)]T = [3(R) \leftarrow 7(B) \rightarrow 18(R)]. Valid.

Insert 10: Insert as left child of 18, coloured red. Now 18 has red child 10. Uncle of 10 is 3(R) — Case 1 (uncle is red): recolour uncle 3 black, parent 18 black, grandparent 7 red. Grandparent 7 is root, so recolour 7 black.

Result: T=[3(B)7(B)18(B)]T = [3(B) \leftarrow 7(B) \rightarrow 18(B)] with 10(R)10(R) under 1818.

Insert 22: Right child of 18, coloured red. Uncle of 22 is 10(R) — Case 1 again: recolour 10 black, 22 black, 18 red. Parent 18 is not root, grandparent is 7(B). Now 7 has right child 18(R). Check 7’s left child: 3(B). No violation.

Insert 8: Left child of 10, coloured red. Uncle of 8 is 22(R) — Case 1: recolour 8 black, 22 black, 10 red. Now 10(R) is left child of 18(B). Check grandparent 7: right child 18(B) with left child 10(R). No red-red violation.

Insert 11: Right child of 10, coloured red. Now 10(R) has both children red — wait, 10 is red and 11 is red: violation. Uncle of 11 is 8(R) — Case 1: recolour 8 black, 11 black, 10 red. Grandparent 18(B): 10(R) is left child, 22(B) is right child. No violation.

Insert 26: Right child of 22, coloured red. Uncle of 26 is 10(R) — Case 1: recolour 10 black, 26 black, 22 red. Now 22(R) is right child of 18(B). Check left child of 18: 10(B). Valid.

2.5 Skip Lists

A skip list is a probabilistic data structure that provides O(logn)O(\log n) expected-time search, insertion, and deletion. It consists of multiple levels of linked lists.

Structure:

  • Level 0 is a standard linked list containing all elements in sorted order.
  • Each element is promoted to level 1 with probability pp ( p=1/2p = 1/2).
  • Each level-ii element is promoted to level i+1i+1 independently with probability pp.
  • A sentinel head node links to the first element at every level.

Theorem 2.8. The expected number of levels in a skip list with nn elements is O(logn)O(\log n).

Proof. An element reaches level ii with probability pip^i. The expected number of elements at level ii is npinp^i. The highest non-empty level is approximately log1/pn\log_{1/p} n. \blacksquare

Theorem 2.9. Search, insert, and delete in a skip list take O(logn)O(\log n) expected time.

Proof. The search path at each level ii moves right to find an element whose successor at level ii is greater than the target, then drops to level i1i - 1. At each level, the expected number of rightward steps is at most 1/p1/p (a geometric distribution argument). With O(log1/pn)O(\log_{1/p} n) levels, the total expected work is O(logn/p)=O(logn)O(\log n / p) = O(\log n). \blacksquare

Worked Example: Skip List Search

Search for key 25 in a skip list containing keys [3,7,12,19,25,31,42][3, 7, 12, 19, 25, 31, 42] with the following level structure (probability p=1/2p = 1/2):

Level 3: [HEAD] -------------------------> [31] ---------->
Level 2: [HEAD] ---------> [12] -------------------------->
Level 1: [HEAD] -> [7] -> [12] -> [19] -> [25] -> [31] -> [42]
Level 0: [HEAD] -> [7] -> [12] -> [19] -> [25] -> [31] -> [42]

Search starts at HEAD, level 3:

  • Move right to 31. 31>2531 > 25Drop to level 2.
  • At level 2, move right to 12. 122512 \leq 25Move right — next is NIL. Drop to level 1.
  • At level 1, move right to 19. 192519 \leq 25Move right to 25. 25=2525 = 25. Found!

The search examined keys 31, 12, 19, 25 — 4 comparisons across 3 levels.

Worked Example: Skip List Insertion

Insert key 15 into the skip list above.

  1. Search for insertion position: traverse levels 3, 2, 1 to find that 15 goes between 12 and 19.
  2. Insert 15 at level 0 (always). Randomly promote: suppose coin flip says promote to level 1. Insert at level 1. Flip again: promote to level 2. Insert at level 2. Flip again: don’t promote. Stop.
Level 3: [HEAD] -------------------------> [31] ---------->
Level 2: [HEAD] ---------> [12] -> [15] ------------------>
Level 1: [HEAD] -> [7] -> [12] -> [15] -> [19] -> [25] -> [31] -> [42]
Level 0: [HEAD] -> [7] -> [12] -> [15] -> [19] -> [25] -> [31] -> [42]

Insertion takes O(logn)O(\log n) expected time (search path + insertions at promoted levels).

2.6 Bloom Filters

A Bloom filter is a space-efficient probabilistic data structure for set membership testing. It may produce false positives but never false negatives.

Structure: A bit array of mm bits (initially all 0) and kk independent hash functions h1,,hkh_1, \ldots, h_k.

Operations:

  • Insert xx: set bits h1(x),h2(x),,hk(x)h_1(x), h_2(x), \ldots, h_k(x) to 1.
  • Query xx: return true if all h1(x),,hk(x)h_1(x), \ldots, h_k(x) are set to 1.

Theorem 2.10. After inserting nn elements into a Bloom filter of size mm with kk hash functions, the probability of a false positive is approximately (1ekn/m)k\left(1 - e^{-kn/m}\right)^k.

Proof. After inserting nn elements, each of the mm bits is set to 0 with probability (11/m)knekn/m(1 - 1/m)^{kn} \approx e^{-kn/m}. The probability that all kk bits for a query element are set is (1ekn/m)k(1 - e^{-kn/m})^k. \blacksquare

Optimal number of hash functions. The false positive rate is minimised when k=(m/n)ln2k = (m/n) \ln 2Giving a minimum rate of approximately (1/2)k=0.6185m/n(1/2)^k = 0.6185^{m/n}.

Worked Example: Bloom Filter Configuration

We want to store n=10000n = 10000 URLs with a false positive rate of at most 1%.

From the formula: (1ekn/m)k0.01(1 - e^{-kn/m})^k \leq 0.01.

Optimal k=(m/n)ln2k = (m/n) \ln 2. At the optimal point, the false positive rate is (1/2)k0.6185m/n(1/2)^k \approx 0.6185^{m/n}.

We need 0.6185m/n0.010.6185^{m/n} \leq 0.01So m/nln(0.01)/ln(0.6185)9.585m/n \geq \ln(0.01) / \ln(0.6185) \approx 9.585.

Take m=95850m = 95850 bits 11.5\approx 11.5 KB. Then k=(95850/10000)ln26.64k = (95850 / 10000) \ln 2 \approx 6.64So use k=7k = 7 hash functions.

This requires only 11.5 KB of memory versus approximately 600 KB for storing the URLs as 60-byte strings.

Worked Example: Bloom Filter Operations Trace

A Bloom filter has m=10m = 10 bits, k=2k = 2 hash functions: h1(x)=xmod10h_1(x) = x \bmod 10, h2(x)=(x3)mod10h_2(x) = (x \cdot 3) \bmod 10.

Insert 15: h1(15)=5h_1(15) = 5, h2(15)=53mod10=5h_2(15) = 5 \cdot 3 \bmod 10 = 5. Set bit 5. Bit array: 0000010000

Insert 22: h1(22)=2h_1(22) = 2, h2(22)=223mod10=6h_2(22) = 22 \cdot 3 \bmod 10 = 6. Set bits 2, 6. Bit array: 0010010100

Insert 37: h1(37)=7h_1(37) = 7, h2(37)=373mod10=1h_2(37) = 37 \cdot 3 \bmod 10 = 1. Set bits 1, 7. Bit array: 0110010110

Query 22: h1(22)=2h_1(22) = 2 (set), h2(22)=6h_2(22) = 6 (set). Result: possibly in set (true positive).

Query 42: h1(42)=2h_1(42) = 2 (set), h2(42)=423mod10=6h_2(42) = 42 \cdot 3 \bmod 10 = 6 (set). Result: possibly in set (false positive!).

Query 50: h1(50)=0h_1(50) = 0 (not set). Result: definitely not in set (true negative).

2.7 Heaps

A binary heap is a complete binary tree satisfying the heap property:

  • Max-heap: parent \geq children.
  • Min-heap: parent \leq children.

Implemented as an array: parent of node ii is at (i1)/2\lfloor(i-1)/2\rfloor; children at 2i+12i+1 and 2i+22i+2.

Operations: insert O(logn)O(\log n)Extract-max/min O(logn)O(\log n)Peek O(1)O(1).

Theorem 2.11 (Build-Heap). buildHeap on an array of nn elements runs in O(n)O(n) time.

Proof. buildHeap calls heapify on each of the n/2\lfloor n/2 \rfloor non-leaf nodes. The cost of heapify at a node of height hh is O(h)O(h). The total cost is h=0lognn/2h+1O(h)=O(nh=0h/2h)=O(n)\sum_{h=0}^{\lfloor \log n \rfloor} \lceil n / 2^{h+1} \rceil \cdot O(h) = O\left(n \sum_{h=0}^{\infty} h / 2^h\right) = O(n)Since h=0h/2h=2\sum_{h=0}^{\infty} h / 2^h = 2. \blacksquare

Worked Example: Build-Heap Step by Step

Build a max-heap from the array [4,10,3,5,1,8,2][4, 10, 3, 5, 1, 8, 2].

The array represents the tree:

4
/ \
10 3
/ \ / \
5 1 8 2

Non-leaf nodes (indices 7/21=2\lfloor 7/2 \rfloor - 1 = 2 down to 0): nodes 3, 10, 4.

Heapify index 2 (value 3): Children are 8 and 2. Swap 3 with 8. Array: [4,10,8,5,1,3,2][4, 10, 8, 5, 1, 3, 2].

4
/ \
10 8
/ \ / \
5 1 3 2

Heapify index 1 (value 10): Children are 5 and 1. 10 is largest. No swap.

Heapify index 0 (value 4): Children are 10 and 8. Swap with 10. Array: [10,4,8,5,1,3,2][10, 4, 8, 5, 1, 3, 2].

10
/ \
4 8
/ \ / \
5 1 3 2

Now heapify index 1 (value 4): Children are 5 and 1. Swap with 5. Array: [10,5,8,4,1,3,2][10, 5, 8, 4, 1, 3, 2].

10
/ \
5 8
/ \ / \
4 1 3 2

This is a valid max-heap. Total operations: 3 swaps = O(n)O(n).

2.8 Union-Find (Disjoint Set Union)

The Union-Find data structure maintains a partition of a set into disjoint subsets, supporting:

  • MakeSet(xx): Create a singleton set x\\{x\\}.
  • Find(xx): Return the representative of the set containing xx.
  • Union(x,yx, y): Merge the sets containing xx and yy.

Implementation with path compression and union by rank:

  • Each set is represented as a rooted tree.
  • Find follows parent pointers and compresses the path (makes every node on the path point directly to the root).
  • Union attaches the shorter tree under the root of the taller tree (by rank/size).

Theorem 2.12. With both path compression and union by rank, the amortised time per operation is O(α(n))O(\alpha(n))Where α(n)\alpha(n) is the inverse Ackermann function.

Proof (outline). The inverse Ackermann function α(n)\alpha(n) grows so slowly that α(n)4\alpha(n) \leq 4 for all practical values of nn (n22265536n \leq 2^{2^{2^{65536}}}). The proof uses a potential function argument on the levels of the tree. Each node is assigned a level based on its rank. The key insight is that path compression prevents the number of nodes at high levels from growing. The total number of Find operations that can charge to any level-\ell node is bounded by O(nF1())O(n \cdot F^{-1}(\ell)) where FF is a fast-growing function. Summing over all levels gives O(nα(n))O(n \alpha(n)) total, hence O(α(n))O(\alpha(n)) amortised per operation. \blacksquare

Worked Example: Union-Find for Connected Components

Given an undirected graph with nn vertices and mm edges, determine the connected components.

for each vertex v:
MakeSet(v)
for each edge (u, v):
if Find(u) != Find(v):
Union(u, v)

After processing all edges, vertices in the same connected component have the same representative. Total time: O(n+mα(n))O(n+m)O(n + m \alpha(n)) \approx O(n + m).

2.9 Graphs

A graph G=(V,E)G = (V, E) can be represented by:

  • Adjacency matrix: Aij=1A_{ij} = 1 if (i,j)E(i,j) \in E. Space: O(V2)O(V^2). Edge lookup: O(1)O(1).
  • Adjacency list: For each vertex, store a list of neighbours. Space: O(V+E)O(V + E). Iterating over neighbours: O(deg(v))O(\mathrm{deg}(v)).
OperationAdjacency MatrixAdjacency List
SpaceO(V2)O(V^2)O(V+E)O(V + E)
Check edge (u,v)(u,v)O(1)O(1)O(deg(u))O(\mathrm{deg}(u))
Iterate neighboursO(V)O(V)O(deg(v))O(\mathrm{deg}(v))
Add edgeO(1)O(1)O(1)O(1)
Remove edgeO(1)O(1)O(deg(u))O(\mathrm{deg}(u))

:::caution Common Pitfall Choosing the wrong graph representation can make an algorithm asymptotically slower. Use adjacency matrices for dense graphs (EV2E \approx V^2) and adjacency lists for sparse graphs (EV2E \ll V^2). For example, BFS with an adjacency matrix takes O(V2)O(V^2) but with adjacency lists takes O(V+E)O(V + E). :::

3. Sorting Algorithms

3.1 Merge Sort

Algorithm. Divide the array in half, recursively sort each half, then merge.

MergeSort(A, l, r):
if l < r:
m = (l + r) / 2
MergeSort(A, l, m)
MergeSort(A, m+1, r)
Merge(A, l, m, r)

Theorem 3.1. Merge sort runs in O(nlogn)O(n \log n) time in all cases (best, average, worst).

Proof. The recurrence is T(n)=2T(n/2)+O(n)T(n) = 2T(n/2) + O(n). By the Master theorem (case 2): a=2a = 2, b=2b = 2, f(n)=O(n)=O(nlogba)f(n) = O(n) = O(n^{\log_b a})So T(n)=O(nlogn)T(n) = O(n \log n). \blacksquare

Theorem 3.2. Merge sort is stable and requires O(n)O(n) auxiliary space.

Proof of stability. In MergeWhen elements from the left and right halves are equal, we always take from the left half first. This preserves the relative order of equal elements. \blacksquare

Theorem 3.3 (Correctness of Merge). The Merge procedure produces a sorted array from two sorted subarrays.

Proof. Let A[l..m]A[l..m] and A[m+1..r]A[m+1..r] be sorted subarrays. The merge procedure maintains two indices ii and jj pointing to the next unmerged element in each subarray. At each step, the minimum of A[i]A[i] and A[j]A[j] is appended to the output. By induction on the number of elements placed: if kk elements have been placed and they are the kk smallest among the remaining elements, then the (k+1)(k+1)-th element placed is the minimum of the remaining elements. Since each subarray is sorted, the smallest remaining element is min(A[i],A[j])\min(A[i], A[j]). \blacksquare

Theorem 3.4 (Optimality of Merge Sort). Merge sort achieves the optimal O(nlogn)O(n \log n) comparison-based sorting bound.

Proof. By Theorem 3.7, no comparison sort can do better than Ω(nlogn)\Omega(n \log n). Merge sort achieves O(nlogn)O(n \log n) in all cases, so it is asymptotically optimal among comparison sorts. \blacksquare

Worked Example: Merge Sort Trace

Sort the array [38,27,43,3,9,82,10][38, 27, 43, 3, 9, 82, 10]:

Split: [38,27,43,3][38, 27, 43, 3] and [9,82,10][9, 82, 10]

Split: [38,27][38, 27] and [43,3][43, 3] Split: [38][38] and [27][27] → Merge: [27,38][27, 38] Split: [43][43] and [3][3] → Merge: [3,43][3, 43] Merge: [3,27,38,43][3, 27, 38, 43]

Split: [9,82][9, 82] and [10][10] Split: [9][9] and [82][82] → Merge: [9,82][9, 82] Merge: [9,10,82][9, 10, 82]

Final merge: [3,9,10,27,38,43,82][3, 9, 10, 27, 38, 43, 82]

Total comparisons: 13.

3.2 Quicksort

Algorithm. Choose a pivot, partition the array into elements \leq pivot and >> pivot, recursively Sort each partition.

QuickSort(A, lo, hi):
if lo < hi:
p = Partition(A, lo, hi)
QuickSort(A, lo, p - 1)
QuickSort(A, p + 1, hi)

Partition (Lomuto): Select the last element as pivot. Iterate through the array, maintaining that elements A[lo..i]A[\mathrm{lo}..i] are \leq pivot and A[i+1..j1]A[i+1..j-1] are >> pivot.

Theorem 3.5. Quicksort runs in O(nlogn)O(n \log n) expected time and O(n2)O(n^2) worst-case time.

Proof (expected case). If the pivot is chosen uniformly at random, the expected number of Comparisons is O(nlogn)O(n \log n) by an indicator random variable argument.

Let XijX_{ij} be the indicator random variable that ziz_i and zjz_j are compared, where z1,,znz_1, \ldots, z_n are the sorted elements. Since elements are compared only if one is an ancestor of the other in the recursion tree, and the pivot is chosen uniformly at random:

E[Xij]=Pr(zi and zj are compared)=2ji+1\mathrm{E}[X_{ij}] = \Pr(z_i \mathrm{~and~} z_j \mathrm{~are~compared}) = \frac{2}{j - i + 1}

The total number of comparisons is X=i<jXijX = \sum_{i < j} X_{ij}So:

E[X]=i=1n1j=i+1n2ji+1k=1nn2k+1=O(nlogn)\mathrm{E}[X] = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \frac{2}{j - i + 1} \leq \sum_{k=1}^{n} n \cdot \frac{2}{k+1} = O(n \log n)

Worst case occurs when the pivot is always the smallest or largest element (e.g., already sorted Array with first-element pivot): T(n)=T(n1)+O(n)=O(n2)T(n) = T(n-1) + O(n) = O(n^2). \blacksquare

Theorem 3.6. Quicksort is not stable but uses O(logn)O(\log n) expected stack space.

Worked Example: Quicksort with Median-of-Three Pivot

Sort [3,7,8,5,2,1,9,5,4][3, 7, 8, 5, 2, 1, 9, 5, 4] using median-of-three (first, middle, last element).

Pivot selection: elements at positions 0, 4, 8 are 3,2,43, 2, 4. Median is 33.

Partition around 33: [2,1,3,8,5,7,9,5,4][2, 1, 3, 8, 5, 7, 9, 5, 4]

Recurse on [2,1][2, 1] (pivot 22): [1,2][1, 2] Recurse on [8,5,7,9,5,4][8, 5, 7, 9, 5, 4] (median of 8,7,48, 7, 4 is 77): [4,5,5,7,9,8][4, 5, 5, 7, 9, 8]

Recurse on [4,5,5][4, 5, 5] (pivot 55): [4,5,5][4, 5, 5] Recurse on [9,8][9, 8] (pivot 99): [8,9][8, 9]

Result: [1,2,3,4,5,5,7,8,9][1, 2, 3, 4, 5, 5, 7, 8, 9]

3.3 Heapsort

Algorithm. Build a max-heap in O(n)O(n) time, then repeatedly extract the maximum.

Theorem 3.7. Heapsort runs in O(nlogn)O(n \log n) time in all cases, uses O(1)O(1) auxiliary space, and Is not stable.

Proof. Building the heap takes O(n)O(n) by Theorem 2.11. Each of the nn extract-max operations takes O(logn)O(\log n)For a total of O(nlogn)O(n \log n). Space is O(1)O(1) since the heap is stored in-place. \blacksquare

3.4 Lower Bound on Comparison Sorting

Theorem 3.8. Any comparison-based sorting algorithm requires Ω(nlogn)\Omega(n \log n) comparisons in the Worst case.

Proof. A comparison-based sort can be modelled as a decision tree. The tree must have at least n!n! Leaves (one for each permutation). A binary tree with n!n! leaves has height at least log2(n!)log2((n/2)n/2)=(n/2)log2(n/2)=Ω(nlogn)\log_2(n!) \geq \log_2((n/2)^{n/2}) = (n/2)\log_2(n/2) = \Omega(n \log n).

More precisely, using Stirling’s approximation: log2(n!)=nlog2nnlog2e+O(logn)=Ω(nlogn)\log_2(n!) = n \log_2 n - n \log_2 e + O(\log n) = \Omega(n \log n). \blacksquare

3.5 Counting Sort

Counting sort sorts integers in O(n+k)O(n + k) time where kk is the range of values.

Algorithm:

  1. Count occurrences: C[i]=C[i] = number of elements equal to ii.
  2. Compute prefix sums: C[i]=C[i] = number of elements i\leq i.
  3. Place each element in its correct position (iterating backwards for stability).

Theorem 3.9. Counting sort runs in O(n+k)O(n + k) time and O(n+k)O(n + k) space, and is stable.

Proof. Step 1 takes O(n)O(n) time. Step 2 takes O(k)O(k) time. Step 3 takes O(n)O(n) time. The output array uses O(n)O(n) space and the count array uses O(k)O(k) space. Stability follows from iterating backwards in step 3: the last occurrence of each value is placed first, preserving the order of equal elements. \blacksquare

3.6 Radix Sort

Radix sort processes digits from least significant to most significant (LSD radix sort) using a stable sort (e.g., counting sort) as a subroutine.

Theorem 3.10. LSD radix sort sorts nn integers with dd digits in base bb in O(d(n+b))O(d(n + b)) time.

Proof. We perform dd passes of counting sort, each taking O(n+b)O(n + b) time. After the ii-th pass, the array is sorted by the ii least significant digits. By induction on iiAfter dd passes the array is fully sorted. \blacksquare

Corollary 3.11. For dd-digit integers where d=O(1)d = O(1) (e.g., 32-bit integers), radix sort runs in O(n)O(n) time.

Worked Example: Counting Sort Trace

Sort the array [4,2,2,8,3,3,1][4, 2, 2, 8, 3, 3, 1] using counting sort.

Range of values: [1,8][1, 8]So k=8k = 8.

Step 1 — Count: C=[0,1,2,2,1,0,0,0]C = [0, 1, 2, 2, 1, 0, 0, 0] (indices 1 through 8).

Step 2 — Prefix sums: C=[0,1,3,5,6,6,6,6]C = [0, 1, 3, 5, 6, 6, 6, 6].

Step 3 — Place (iterate backwards):

  • A[6]=1A[6] = 1: C[1]=1C[1] = 1Place at position 0. C[1]=0C[1] = 0.
  • A[5]=3A[5] = 3: C[3]=5C[3] = 5Place at position 4. C[3]=4C[3] = 4.
  • A[4]=3A[4] = 3: C[3]=4C[3] = 4Place at position 3. C[3]=3C[3] = 3.
  • A[3]=8A[3] = 8: C[8]=6C[8] = 6Place at position 5. C[8]=5C[8] = 5.
  • A[2]=2A[2] = 2: C[2]=3C[2] = 3Place at position 2. C[2]=2C[2] = 2.
  • A[1]=2A[1] = 2: C[2]=2C[2] = 2Place at position 1. C[2]=1C[2] = 1.
  • A[0]=4A[0] = 4: C[4]=6C[4] = 6Place at position 5. C[4]=5C[4] = 5.

Result: [1,2,2,3,3,4,8][1, 2, 2, 3, 3, 4, 8].

Worked Example: Radix Sort Trace

Sort the array [170,45,75,90,802,24,2,66][170, 45, 75, 90, 802, 24, 2, 66] using LSD radix sort with base 10.

Pass 1 (ones digit): Sort by 0,5,5,0,2,4,2,60, 5, 5, 0, 2, 4, 2, 6. After stable counting sort: [170,90,802,2,24,45,75,66][170, 90, 802, 2, 24, 45, 75, 66]

Pass 2 (tens digit): Sort by 7,9,0,0,2,4,7,67, 9, 0, 0, 2, 4, 7, 6. After stable counting sort: [802,2,24,45,66,170,75,90][802, 2, 24, 45, 66, 170, 75, 90]

Pass 3 (hundreds digit): Sort by 8,0,0,0,0,1,0,08, 0, 0, 0, 0, 1, 0, 0. After stable counting sort: [2,24,45,66,75,90,170,802][2, 24, 45, 66, 75, 90, 170, 802]

Total: 3 passes, each O(n+10)=O(n)O(n + 10) = O(n). Total: O(3n)=O(n)O(3n) = O(n).

3.7 External Sorting

Problem. Sort data that is too large to fit in main memory.

Algorithm (External Merge Sort):

  1. Read chunks that fit in memory, sort each in memory, write sorted runs to disk.
  2. Repeatedly merge runs using a kk-way merge, reading/writing from disk.

Theorem 3.12. External merge sort with MM bytes of memory and NN bytes of data performs O(NMlogk(NM))O\left(\frac{N}{M} \log_{k}\left(\frac{N}{M}\right)\right) passes, where kk is the merge fan-in.

Proof. The initial pass creates N/MN/M sorted runs of size MM. Each merge pass combines kk runs into 1, reducing the number of runs by a factor of kk. After logk(N/M)\log_k(N/M) passes, a single sorted run remains. Each pass reads and writes all NN bytes, so total I/O is O(Nlogk(N/M))O(N \log_k(N/M)). \blacksquare

:::caution Common Pitfall The O(nlogn)O(n \log n) lower bound applies only to comparison-based sorting. Non-comparison sorts Like radix sort can achieve O(n)O(n) time for integers in a bounded range. However, non-comparison sorts sacrifice generality: they depend on the structure of the keys and cannot sort arbitrary objects. :::

3.8 Comparison of Sorting Algorithms

AlgorithmBestAverageWorstSpaceStable
Merge SortO(nlogn)O(n \log n)O(nlogn)O(n \log n)O(nlogn)O(n \log n)O(n)O(n)Yes
QuicksortO(nlogn)O(n \log n)O(nlogn)O(n \log n)O(n2)O(n^2)O(logn)O(\log n)No
HeapsortO(nlogn)O(n \log n)O(nlogn)O(n \log n)O(nlogn)O(n \log n)O(1)O(1)No
Counting SortO(n+k)O(n + k)O(n+k)O(n + k)O(n+k)O(n + k)O(n+k)O(n + k)Yes
Radix SortO(d(n+b))O(d(n+b))O(d(n+b))O(d(n+b))O(d(n+b))O(d(n+b))O(n+b)O(n + b)Yes

4. Graph Algorithms

4.1 Breadth-First Search (BFS)

BFS explores the graph level by level from a source vertex.

BFS(G, s):
for each v in V:
d[v] = INF
pi[v] = NIL
d[s] = 0
Q = {s}
while Q is not empty:
u = Q.dequeue()
for each v in G.adj[u]:
if d[v] == INF:
d[v] = d[u] + 1
pi[v] = u
Q.enqueue(v)

Theorem 4.1. BFS runs in O(V+E)O(V + E) time and discovers shortest paths (in terms of number of Edges) from the source to all reachable vertices.

Proof. Time: Each vertex is enqueued at most once and dequeued at most once: O(V)O(V). Each edge is examined at most twice (once from each endpoint): O(E)O(E). Total: O(V+E)O(V + E).

Correctness: We prove by induction on the length of shortest paths. Let vv be a vertex discovered via edge (u,v)(u, v). At the time vv is discovered, d[u]d[u] equals the shortest-path distance from ss to uu (induction hypothesis). Since BFS processes vertices in non-decreasing order of distance, d[v]=d[u]+1d[v] = d[u] + 1 is the shortest distance to vv. Any path to vv must go through some vertex at distance at most d[u]d[u] (the predecessor on the path), and all such vertices have already been processed. \blacksquare

Worked Example: BFS on a Weighted Path Problem

Find the shortest path (fewest edges) from AA to all other vertices in the graph with edges: (A,B),(A,C),(B,D),(C,D),(C,E),(D,F),(E,F)(A,B), (A,C), (B,D), (C,D), (C,E), (D,F), (E,F).

Starting from AA:

  • Level 0: AA (distance 0)
  • Level 1: B,CB, C (distance 1)
  • Level 2: D,ED, E (distance 2)
  • Level 3: FF (distance 3)

Shortest path from AA to FF: ACEFA \to C \to E \to F (or ABDFA \to B \to D \to F or ACDFA \to C \to D \to F), all of length 3.

4.2 Depth-First Search (DFS)

DFS explores as far as possible along each branch before backtracking.

DFS(G):
for each v in V:
colour[v] = WHITE
pi[v] = NIL
time = 0
for each v in V:
if colour[v] == WHITE:
DFS-Visit(G, v)
DFS-Visit(G, u):
colour[u] = GREY
time += 1
d[u] = time
for each v in G.adj[u]:
if colour[v] == WHITE:
pi[v] = u
DFS-Visit(G, v)
colour[u] = BLACK
time += 1
f[u] = time

Theorem 4.2. DFS runs in O(V+E)O(V + E) time and can be used to detect cycles, find connected Components, compute topological orderings, and identify strongly connected components.

Proof (time). Each vertex is coloured exactly once (from WHITE to GREY) and finished exactly once (from GREY to BLACK): O(V)O(V). Each edge is examined at most twice (once in each direction for undirected, once for directed): O(E)O(E). Total: O(V+E)O(V + E). \blacksquare

Theorem 4.3 (Parenthesis Theorem). In any DFS, for any two vertices uu and vvExactly one of the following holds: (1) d[u]<d[v]<f[v]<f[u]d[u] \lt d[v] \lt f[v] \lt f[u] (interval nesting), (2) d[v]<d[u]<f[u]<f[v]d[v] \lt d[u] \lt f[u] \lt f[v] (interval nesting), or (3) the intervals [d[u],f[u]][d[u], f[u]] and [d[v],f[v]][d[v], f[v]] are disjoint.

Proof. The DFS call stack forms a nesting of intervals. When we start visiting vv from uuWe must finish vv before finishing uuGiving nesting. If uu and vv are in different DFS trees, their intervals are disjoint. \blacksquare

Theorem 4.4 (White-Path Theorem). vv is a descendant of uu in the DFS forest if and only if, at the time d[u]d[u] is discovered, there is a path from uu to vv consisting entirely of white vertices.

Proof. (\Rightarrow) By induction on the depth of vv in the DFS tree. If vv is a child of uuThen vv was white when discovered from uu. If vv is a deeper descendant, the path goes through intermediate white vertices.

(\Leftarrow) Suppose there is a white path from uu to vv at time d[u]d[u]. Let ww be the first vertex on this path discovered after uu. All vertices before ww on the path are still white (they can only be discovered after ww), so ww will be discovered from the path. By induction, vv is a descendant of wwHence of uu. \blacksquare

4.3 Topological Sort

A topological ordering of a DAG is a linear ordering of vertices such that for every directed edge (u,v)(u, v), uu appears before vv.

Algorithm: Run DFS on the DAG. Output vertices in reverse order of finishing times.

Theorem 4.5. The reverse post-order of a DFS on a DAG produces a valid topological ordering.

Proof. Suppose there is an edge (u,v)(u, v) but uu appears after vv in the ordering (i.e., f[u]<f[v]f[u] \lt f[v]). Since (u,v)(u, v) is an edge, when uu is being explored (coloured GREY), if vv is WHITE, then vv is discovered as a descendant of uuSo f[v]<f[u]f[v] \lt f[u]Contradiction. If vv is GREY, we have a back edge, implying a cycle, contradicting that the graph is acyclic. If vv is BLACK, then f[v]<d[u]<f[u]f[v] \lt d[u] \lt f[u]Contradicting f[u]<f[v]f[u] \lt f[v]. \blacksquare

4.4 Strongly Connected Components

Two vertices uu and vv are strongly connected if there is a path from uu to vv and from vv to uu. A strongly connected component (SCC) is a maximal set of strongly connected vertices.

Kosaraju’s Algorithm:

  1. Run DFS on GGRecording finishing times.
  2. Compute GTG^T (transpose of GG: reverse all edges).
  3. Run DFS on GTG^T in decreasing order of finishing times. Each DFS tree is an SCC.

Theorem 4.6. Kosaraju’s algorithm correctly identifies all SCCs in O(V+E)O(V + E) time.

Proof. Let CC be the SCC containing the vertex ss with the highest finishing time in the first DFS. We claim that in GTG^TNo vertex in CC can reach a vertex outside CC (otherwise ss would have a path to and from that vertex, placing it in CC). In the second DFS, starting from ss in GTG^TWe discover exactly the vertices in CC. Removing CC and repeating the argument gives the remaining SCCs. \blacksquare

Tarjan’s Algorithm finds SCCs in a single DFS pass using a stack and low-link values, also in O(V+E)O(V + E) time.

4.5 Dijkstra’s Algorithm

Problem. Find shortest paths from a single source in a weighted graph with non-negative edge Weights.

Algorithm. Maintain a priority queue of vertices keyed by their current shortest-path estimate. Repeatedly extract the vertex with minimum distance and relax its edges.

Dijkstra(G, w, s):
for each v in V:
d[v] = INF
pi[v] = NIL
d[s] = 0
S = {} // processed vertices
Q = V // priority queue keyed by d[]
while Q is not empty:
u = ExtractMin(Q)
S = S ∪ {u}
for each v in G.adj[u]:
if d[v] > d[u] + w(u, v):
d[v] = d[u] + w(u, v)
pi[v] = u
DecreaseKey(Q, v, d[v])

Theorem 4.7. Dijkstra’s algorithm correctly computes shortest paths from the source.

Proof. We prove by induction on S|S| that when a vertex uu is added to SS, d[u]=δ(s,u)d[u] = \delta(s, u) (the true shortest-path distance).

Base case: ss is the first vertex added, and d[s]=0=δ(s,s)d[s] = 0 = \delta(s, s).

Inductive step: Suppose all vertices in SS have correct distances. Let uu be the next vertex extracted from QQ. Suppose for contradiction that d[u]>δ(s,u)d[u] > \delta(s, u). Consider a shortest path PP from ss to uuAnd let (x,y)(x, y) be the first edge on PP where xSx \in S and ySy \notin S. Then δ(s,y)=δ(s,x)+w(x,y)=d[x]+w(x,y)\delta(s, y) = \delta(s, x) + w(x, y) = d[x] + w(x, y) (by induction). When xx was added to SSThe edge (x,y)(x, y) was relaxed, so d[y]d[x]+w(x,y)=δ(s,y)d[y] \leq d[x] + w(x, y) = \delta(s, y). Since edge weights are non-negative, δ(s,y)δ(s,u)\delta(s, y) \leq \delta(s, u). But d[y]δ(s,y)δ(s,u)<d[u]d[y] \leq \delta(s, y) \leq \delta(s, u) \lt d[u]And yy is in QQContradicting that uu has the minimum dd-value in QQ. \blacksquare

Theorem 4.8. Dijkstra’s algorithm with a binary heap runs in O((V+E)logV)O((V + E)\log V) time. With a Fibonacci heap: O(VlogV+E)O(V \log V + E).

Proof. Each vertex is extracted from the priority queue at most once: O(VlogV)O(V \log V). Each edge is Relaxed at most once, and each relaxation may cause a DecreaseKey: O(ElogV)O(E \log V). Total: O((V+E)logV)O((V + E)\log V). With a Fibonacci heap, DecreaseKey is O(1)O(1) amortised, giving O(VlogV+E)O(V \log V + E). \blacksquare

Worked Example: Dijkstra's Algorithm

Find shortest paths from AA in the graph:

  • A4BA \xrightarrow{4} B, A2CA \xrightarrow{2} C
  • B3DB \xrightarrow{3} D, B1EB \xrightarrow{1} E
  • C1BC \xrightarrow{1} B, C5DC \xrightarrow{5} D
  • D2ED \xrightarrow{2} E

Initial: d[A]=0d[A]=0, d[B]=d[C]=d[D]=d[E]=d[B]=d[C]=d[D]=d[E]=\infty

Extract AA: relax B4B \to 4, C2C \to 2. Q=C(2),B(4),D(),E()Q = \\{C(2), B(4), D(\infty), E(\infty)\\}

Extract CC: relax Bmin(4,2+1)=3B \to \min(4, 2+1)=3, D2+5=7D \to 2+5=7. Q=B(3),D(7),E()Q = \\{B(3), D(7), E(\infty)\\}

Extract BB: relax Dmin(7,3+3)=6D \to \min(7, 3+3)=6, E3+1=4E \to 3+1=4. Q=E(4),D(6)Q = \\{E(4), D(6)\\}

Extract EE: relax Dmin(6,4+2)=6D \to \min(6, 4+2)=6. Q=D(6)Q = \\{D(6)\\}

Extract DD: no improvements.

Result: d[A]=0d[A]=0, d[B]=3d[B]=3, d[C]=2d[C]=2, d[D]=6d[D]=6, d[E]=4d[E]=4

4.6 Bellman-Ford Algorithm

Problem. Find shortest paths from a single source, allowing negative edge weights (but no negative Cycles).

BellmanFord(G, w, s):
for each v in V:
d[v] = INF
pi[v] = NIL
d[s] = 0
for i = 1 to |V| - 1:
for each edge (u, v) in E:
if d[v] > d[u] + w(u, v):
d[v] = d[u] + w(u, v)
pi[v] = u
for each edge (u, v) in E:
if d[v] > d[u] + w(u, v):
return "negative cycle detected"

Theorem 4.9. Bellman-Ford runs in O(VE)O(VE) time. It correctly detects negative-weight cycles.

Proof. After ii iterations of the relaxation loop, d[v]d[v] is at most the weight of the shortest path from ss to vv using at most ii edges. This is proved by induction on ii.

Base case (i=0i = 0): Only d[s]=0d[s] = 0 is finite, which is the weight of the empty path.

Inductive step: A shortest path from ss to vv using at most ii edges either uses at most i1i-1 edges (handled by induction) or uses exactly ii edges, ending with edge (u,v)(u, v) where the shortest path to uu uses at most i1i-1 edges. The relaxation of (u,v)(u, v) in iteration ii handles this case.

After V1V - 1 iterations, all shortest paths (which have at most V1V - 1 edges) are correct. A VV-th iteration that updates any distance indicates a path of length VV with strictly decreasing weight, which contains a cycle of negative weight. \blacksquare

Worked Example: Bellman-Ford with Negative Edge Weights

Find shortest paths from AA in the graph:

  • A6BA \xrightarrow{6} B, A7CA \xrightarrow{7} C
  • B5CB \xrightarrow{5} C, B4DB \xrightarrow{-4} D, B8EB \xrightarrow{8} E
  • C3DC \xrightarrow{-3} D, C9EC \xrightarrow{9} E
  • D2ED \xrightarrow{2} E, D7FD \xrightarrow{7} F
  • E5FE \xrightarrow{-5} F

Initial: d=[,,,,,]d = [\infty, \infty, \infty, \infty, \infty, \infty] for [A,B,C,D,E,F][A, B, C, D, E, F]. d[A]=0d[A] = 0.

Iteration 1: Relax all edges.

  • (A,B)(A,B): d[B]=6d[B] = 6. (A,C)(A,C): d[C]=7d[C] = 7.
  • (B,C)(B,C): min(7,6+5)=7\min(7, 6+5)=7. (B,D)(B,D): d[D]=6+(4)=2d[D] = 6+(-4) = 2. (B,E)(B,E): d[E]=6+8=14d[E] = 6+8 = 14.
  • (C,D)(C,D): min(2,7+(3))=2\min(2, 7+(-3)) = 2. (C,E)(C,E): min(14,7+9)=14\min(14, 7+9) = 14.
  • (D,E)(D,E): min(14,2+2)=4\min(14, 2+2) = 4. (D,F)(D,F): d[F]=2+7=9d[F] = 2+7 = 9.
  • (E,F)(E,F): min(9,4+(5))=1\min(9, 4+(-5)) = -1. After iter 1: d=[0,6,7,2,4,1]d = [0, 6, 7, 2, 4, -1].

Iteration 2: Relax all edges.

  • (B,D)(B,D): min(2,6+(4))=2\min(2, 6+(-4)) = 2. (B,E)(B,E): min(4,6+8)=4\min(4, 6+8) = 4.
  • (D,E)(D,E): min(4,2+2)=4\min(4, 2+2) = 4. (D,F)(D,F): min(1,2+7)=1\min(-1, 2+7) = -1.
  • (E,F)(E,F): min(1,4+(5))=1\min(-1, 4+(-5)) = -1. No changes: d=[0,6,7,2,4,1]d = [0, 6, 7, 2, 4, -1].

Iteration 3—5: No changes. Algorithm terminates.

Negative cycle check: No edge can be relaxed. No negative cycle.

Result: d=[0,6,7,2,4,1]d = [0, 6, 7, 2, 4, -1]. Shortest path to FF: ABDEFA \to B \to D \to E \to FCost 1-1.

Worked Example: Bellman-Ford Negative Cycle Detection

Graph: A1BA \xrightarrow{1} B, B3CB \xrightarrow{-3} C, C2AC \xrightarrow{2} A. This has a cycle ABCAA \to B \to C \to A of weight 1+(3)+2=01 + (-3) + 2 = 0. Not negative.

Now add C1AC \xrightarrow{-1} A. Cycle weight: 1+(3)+(1)=31 + (-3) + (-1) = -3. Negative cycle.

Initial: d[A]=0d[A] = 0Rest \infty.

Iteration 1: d[B]=1d[B] = 1, d[C]=2d[C] = -2, d[A]=min(0,2+(1))=3d[A] = \min(0, -2 + (-1)) = -3.

Iteration 2: d[B]=2d[B] = -2, d[C]=5d[C] = -5, d[A]=6d[A] = -6.

Iteration 3: d[B]=5d[B] = -5, d[C]=8d[C] = -8, d[A]=9d[A] = -9.

Iteration 4: d[B]=8d[B] = -8, d[C]=11d[C] = -11, d[A]=12d[A] = -12.

Iteration 5: d[B]=11d[B] = -11, d[C]=14d[C] = -14, d[A]=15d[A] = -15.

Check (iteration 6): (A,B)(A,B): 15+1=14<11-15 + 1 = -14 \lt -11Can still relax. Negative cycle detected!

4.7 Floyd-Warshall Algorithm

Problem. Find all-pairs shortest paths.

Algorithm. For k=1,,Vk = 1, \ldots, V: for each pair (i,j)(i, j)Check if going through vertex kk Improves the path.

dij(k)=min(dij(k1),dik(k1)+dkj(k1))d_{ij}^{(k)} = \min(d_{ij}^{(k-1)}, d_{ik}^{(k-1)} + d_{kj}^{(k-1)})

Derivation. Define dij(k)d_{ij}^{(k)} as the shortest-path distance from ii to jj using only intermediate vertices from 1,2,,k\\{1, 2, \ldots, k\\}. Then:

  • dij(0)=w(i,j)d_{ij}^{(0)} = w(i,j) (the weight of edge (i,j)(i,j)Or \infty if no edge).
  • For k1k \geq 1: The shortest path from ii to jj through vertices 1,,k\\{1, \ldots, k\\} either does not use vertex kk (giving dij(k1)d_{ij}^{(k-1)}) or uses vertex kk (giving dik(k1)+dkj(k1)d_{ik}^{(k-1)} + d_{kj}^{(k-1)}).

Theorem 4.10. Floyd-Warshall runs in O(V3)O(V^3) time and O(V2)O(V^2) space.

Proof. The triple nested loop (kk, ii, jj) executes V3V^3 iterations, each doing O(1)O(1) work. The distance matrix requires V2V^2 space. Note that dij(k)d_{ij}^{(k)} can overwrite dij(k1)d_{ij}^{(k-1)} in place because dik(k)=dik(k1)d_{ik}^{(k)} = d_{ik}^{(k-1)} and dkj(k)=dkj(k1)d_{kj}^{(k)} = d_{kj}^{(k-1)} (paths from ii to kk and kk to jj using vertices up to kk cannot be improved by going through kk again without a negative cycle). \blacksquare

Worked Example: Floyd-Warshall on 4 Vertices

Find all-pairs shortest paths for the graph with vertices 1,2,3,4\\{1, 2, 3, 4\\} and edges: w(1,2)=3w(1,2) = 3, w(1,3)=8w(1,3) = 8, w(1,4)=4w(1,4) = -4 w(2,1)=5w(2,1) = 5, w(2,3)=7w(2,3) = 7, w(2,4)=2w(2,4) = 2 w(3,1)=2w(3,1) = 2, w(3,4)=1w(3,4) = -1 w(4,1)=6w(4,1) = 6, w(4,3)=9w(4,3) = 9.

Initial distance matrix D(0)D^{(0)}: D(0)=(03845072201690)D^{(0)} = \begin{pmatrix} 0 & 3 & 8 & -4 \\ 5 & 0 & 7 & 2 \\ 2 & \infty & 0 & -1 \\ 6 & \infty & 9 & 0 \end{pmatrix}

k=1k = 1 (through vertex 1): D(1)[2][3]=min(7,5+8)=7D^{(1)}[2][3] = \min(7, 5 + 8) = 7. D(1)[2][4]=min(2,5+(4))=1D^{(1)}[2][4] = \min(2, 5 + (-4)) = 1. D(1)[3][2]=min(,2+3)=5D^{(1)}[3][2] = \min(\infty, 2 + 3) = 5. D(1)[3][4]=min(1,2+(4))=2D^{(1)}[3][4] = \min(-1, 2 + (-4)) = -2. D(1)[4][2]=min(,6+3)=9D^{(1)}[4][2] = \min(\infty, 6 + 3) = 9. D(1)[4][3]=min(9,6+8)=9D^{(1)}[4][3] = \min(9, 6 + 8) = 9.

D(1)=(0384507125026990)D^{(1)} = \begin{pmatrix} 0 & 3 & 8 & -4 \\ 5 & 0 & 7 & 1 \\ 2 & 5 & 0 & -2 \\ 6 & 9 & 9 & 0 \end{pmatrix}

k=2k = 2 (through vertex 2): D(2)[1][3]=min(8,3+7)=8D^{(2)}[1][3] = \min(8, 3 + 7) = 8. D(2)[1][4]=min(4,3+1)=4D^{(2)}[1][4] = \min(-4, 3 + 1) = -4. D(2)[3][1]=min(2,5+5)=2D^{(2)}[3][1] = \min(2, 5 + 5) = 2. D(2)[3][4]=min(2,5+1)=2D^{(2)}[3][4] = \min(-2, 5 + 1) = -2. D(2)[4][1]=min(6,9+5)=6D^{(2)}[4][1] = \min(6, 9 + 5) = 6. D(2)[4][3]=min(9,9+7)=9D^{(2)}[4][3] = \min(9, 9 + 7) = 9.

D(2)=(0384507125026990)D^{(2)} = \begin{pmatrix} 0 & 3 & 8 & -4 \\ 5 & 0 & 7 & 1 \\ 2 & 5 & 0 & -2 \\ 6 & 9 & 9 & 0 \end{pmatrix}

k=3k = 3 (through vertex 3): D(3)[1][2]=min(3,8+5)=3D^{(3)}[1][2] = \min(3, 8 + 5) = 3. D(3)[1][4]=min(4,8+(2))=4D^{(3)}[1][4] = \min(-4, 8 + (-2)) = -4. D(3)[2][1]=min(5,7+2)=5D^{(3)}[2][1] = \min(5, 7 + 2) = 5. D(3)[2][4]=min(1,7+(2))=1D^{(3)}[2][4] = \min(1, 7 + (-2)) = 1. D(3)[4][1]=min(6,9+2)=6D^{(3)}[4][1] = \min(6, 9 + 2) = 6. D(3)[4][2]=min(9,9+5)=9D^{(3)}[4][2] = \min(9, 9 + 5) = 9.

D(3)=(0384507125026990)D^{(3)} = \begin{pmatrix} 0 & 3 & 8 & -4 \\ 5 & 0 & 7 & 1 \\ 2 & 5 & 0 & -2 \\ 6 & 9 & 9 & 0 \end{pmatrix}

k=4k = 4 (through vertex 4): D(4)[1][2]=min(3,4+9)=3D^{(4)}[1][2] = \min(3, -4 + 9) = 3. D(4)[1][3]=min(8,4+9)=5D^{(4)}[1][3] = \min(8, -4 + 9) = 5. D(4)[2][1]=min(5,1+6)=5D^{(4)}[2][1] = \min(5, 1 + 6) = 5. D(4)[2][3]=min(7,1+9)=7D^{(4)}[2][3] = \min(7, 1 + 9) = 7. D(4)[3][1]=min(2,2+6)=2D^{(4)}[3][1] = \min(2, -2 + 6) = 2. D(4)[3][2]=min(5,2+9)=5D^{(4)}[3][2] = \min(5, -2 + 9) = 5.

D(4)=(0354507125026990)D^{(4)} = \begin{pmatrix} 0 & 3 & 5 & -4 \\ 5 & 0 & 7 & 1 \\ 2 & 5 & 0 & -2 \\ 6 & 9 & 9 & 0 \end{pmatrix}

4.8 Minimum Spanning Tree

Kruskal’s Algorithm. Sort edges by weight, add edges to the MST as long as they don’t create a Cycle (using Union-Find). O(ElogE)O(E \log E).

Prim’s Algorithm. Start from any vertex, repeatedly add the minimum-weight edge connecting the Tree to a non-tree vertex (using a priority queue). O((V+E)logV)O((V + E)\log V).

Theorem 4.11 (Cut Property). For any cut of the graph, the minimum-weight edge crossing the cut Belongs to some MST.

Proof. Let (S,VS)(S, V \setminus S) be a cut and e=(u,v)e = (u, v) be the minimum-weight crossing edge with uSu \in S, vSv \notin S. Let TT be an MST. If eTe \in TWe are done. Otherwise, adding ee to TT creates a cycle. This cycle must cross the cut at least once more (it goes from uu to vv via some other path). Let ee' be another crossing edge on this cycle. Since ee is the minimum-weight crossing edge, w(e)w(e)w(e) \leq w(e'). Replacing ee' with ee in TT gives a spanning tree of weight no greater than TTHence an MST containing ee. \blacksquare

Theorem 4.12 (Cycle Property). For any cycle, the maximum-weight edge on the cycle does not belong To any MST.

Proof. Let CC be a cycle and ee be the maximum-weight edge on CC. Let TT be an MST. If eTe \notin TWe are done. Otherwise, removing ee from TT disconnects it into two components. The rest of cycle CC must contain an edge eee' \neq e crossing this cut. Since w(e)<w(e)w(e') \lt w(e) (if w(e)=w(e)w(e') = w(e)We can replace either), replacing ee with ee' gives a spanning tree of strictly smaller weight, contradicting the optimality of TT. \blacksquare

Theorem 4.13. Kruskal’s algorithm produces a minimum spanning tree.

Proof. We prove by induction on the number of edges added. At each step, the algorithm adds the minimum-weight edge that does not create a cycle. Consider the cut formed by the two connected components that the new edge connects. By the cut property, this minimum-weight crossing edge belongs to some MST. Since all previously added edges are in every MST (by induction), the edge set maintained by the algorithm is always a subset of some MST. \blacksquare

Theorem 4.14. Prim’s algorithm produces a minimum spanning tree.

Proof. By induction on the size of the tree. At each step, Prim’s adds the minimum-weight edge connecting the current tree TT to a vertex outside TT. Consider the cut (T,VT)(T, V \setminus T). By the cut property, the minimum-weight crossing edge belongs to some MST. \blacksquare

Worked Example: Kruskal's Algorithm

Find the MST of the graph with edges (sorted by weight): (D,E,2)(D, E, 2), (C,E,3)(C, E, 3), (A,B,4)(A, B, 4), (B,C,5)(B, C, 5), (B,E,6)(B, E, 6), (A,E,7)(A, E, 7), (A,D,8)(A, D, 8), (C,D,9)(C, D, 9).

Vertices: A,B,C,D,E\\{A, B, C, D, E\\}.

Sorted edges: (D,E,2)(D,E,2), (C,E,3)(C,E,3), (A,B,4)(A,B,4), (B,C,5)(B,C,5), (B,E,6)(B,E,6), (A,E,7)(A,E,7), (A,D,8)(A,D,8), (C,D,9)(C,D,9).

Process each edge:

  1. (D,E,2)(D, E, 2): Add. Forest: DE\\{D-E\\}. Cost: 2.
  2. (C,E,3)(C, E, 3): Add. Forest: CDE\\{C-D-E\\}. Cost: 5.
  3. (A,B,4)(A, B, 4): Add. Forest: AB\\{A-B\\}, CDE\\{C-D-E\\}. Cost: 9.
  4. (B,C,5)(B, C, 5): Add (connects two components). Forest: ABCDE\\{A-B-C-D-E\\}. Cost: 14.
  5. (B,E,6)(B, E, 6): Skip (creates cycle BCDEBB-C-D-E-B).
  6. (A,E,7)(A, E, 7): Skip (creates cycle).
  7. (A,D,8)(A, D, 8): Skip (creates cycle).
  8. (C,D,9)(C, D, 9): Skip (creates cycle).

MST: (D,E,2)(D,E,2), (C,E,3)(C,E,3), (A,B,4)(A,B,4), (B,C,5)(B,C,5). Total weight: 14.

Worked Example: Prim's Algorithm

Find the MST of the same graph starting from vertex AA.

Edges from AA: (A,B,4)(A,B,4), (A,E,7)(A,E,7), (A,D,8)(A,D,8). Minimum: (A,B,4)(A,B,4). Tree: A,B\\{A, B\\}. Cost: 4.

Edges crossing cut: (B,C,5)(B,C,5), (B,E,6)(B,E,6), (A,E,7)(A,E,7), (A,D,8)(A,D,8). Minimum: (B,C,5)(B,C,5). Tree: A,B,C\\{A, B, C\\}. Cost: 9.

Edges crossing cut: (C,E,3)(C,E,3), (C,D,9)(C,D,9), (B,E,6)(B,E,6), (A,E,7)(A,E,7), (A,D,8)(A,D,8). Minimum: (C,E,3)(C,E,3). Tree: A,B,C,E\\{A, B, C, E\\}. Cost: 12.

Edges crossing cut: (D,E,2)(D,E,2), (C,D,9)(C,D,9), (A,D,8)(A,D,8). Minimum: (D,E,2)(D,E,2). Tree: A,B,C,D,E\\{A, B, C, D, E\\}. Cost: 14.

MST: (A,B,4)(A,B,4), (B,C,5)(B,C,5), (C,E,3)(C,E,3), (D,E,2)(D,E,2). Total weight: 14 (same as Kruskal’s).

5. Dynamic Programming

5.1 Principles

Dynamic programming (DP) solves problems by:

  1. Overlapping subproblems: The same subproblems are solved repeatedly.
  2. Optimal substructure: The optimal solution contains optimal solutions to subproblems.

Approaches:

  • Top-down (memoisation): Recursive with caching.
  • Bottom-up (tabulation): Fill a table iteratively from small subproblems to large.

5.2 Memoisation vs. Tabulation

AspectMemoisation (Top-Down)Tabulation (Bottom-Up)
ApproachRecursive with cacheIterative table fill
OrderNatural recursion orderDependency order
SpaceO(n)O(n) stack + O(n)O(n) tableO(n)O(n) table only
OverheadFunction call overheadMinimal
SubproblemsComputes only neededComputes all
Best forWhen not all subproblems neededWhen all needed

When to use which:

  • Use memoisation when the subproblem space is sparse (not all subproblems are needed).
  • Use tabulation when most subproblems are needed (avoids recursion overhead and stack overflow).
  • Both achieve the same asymptotic time complexity.

5.3 Optimal Substructure Proof Technique

To prove that a problem has optimal substructure:

  1. Show that an optimal solution to the problem includes an optimal solution to a subproblem.
  2. Proved by contradiction: if the optimal solution contained a suboptimal sub-solution, replacing it with an optimal one would improve the overall solution.

Example (Shortest Path). If pp is a shortest path from uu to vv and ww is an intermediate vertex on ppThen the subpath of pp from uu to ww is a shortest path from uu to ww.

Proof. If not, there exists a shorter path pp' from uu to ww. Then pp' concatenated with the subpath of pp from ww to vv would be shorter than ppContradicting that pp is a shortest path. \blacksquare

:::caution Common Pitfall Not all problems have optimal substructure. For example, the longest simple path problem does not: the longest simple path from uu to vv may not contain the longest simple path from uu to an intermediate vertex wwBecause the subpath might share vertices with the rest of the path, creating a non-simple path. :::

5.4 Common Patterns

1D DP. dp[i]dp[i] depends on dp[j]dp[j] for j<ij < i. Example: Fibonacci, longest increasing subsequence.

2D DP. dp[i][j]dp[i][j] depends on dp[i][j]dp[i'][j'] for (i,j)(i', j') in some set. Example: edit distance, Matrix chain multiplication, longest common subsequence.

Interval DP. dp[i][j]dp[i][j] depends on dp[i][j]dp[i'][j'] where iijji \leq i' \leq j' \leq j. Example: Optimal BST, matrix chain multiplication.

5.5 Worked Example: 0/1 Knapsack

Problem. Given nn items with weights w1,,wnw_1, \ldots, w_n and values v1,,vnv_1, \ldots, v_nAnd a knapsack of capacity WWMaximise the total value without exceeding the capacity.

Recurrence:

dp[i][c]={0ifi=0orc=0dp[i1][c]ifwi>cmax(dp[i1][c],dp[i1][cwi]+vi)ifwicdp[i][c] = \begin{cases} 0 & \mathrm{if} i = 0 \mathrm{ or} c = 0 \\ dp[i-1][c] & \mathrm{if} w_i > c \\ \max(dp[i-1][c], dp[i-1][c - w_i] + v_i) & \mathrm{if} w_i \leq c \end{cases}

Time: O(nW)O(nW). Space: O(nW)O(nW) (can be reduced to O(W)O(W) with 1D array).

Proof of correctness. For each item iiEither we don’t include it (value dp[i1][c]dp[i-1][c]) or we include it (value vi+dp[i1][cwi]v_i + dp[i-1][c - w_i]). The optimal choice is the maximum. The base cases are correct. \blacksquare

Worked Example: 0/1 Knapsack

Items: (w=1,v=1),(w=3,v=4),(w=4,v=5),(w=5,v=7)\\{(w=1, v=1), (w=3, v=4), (w=4, v=5), (w=5, v=7)\\}Capacity W=7W = 7.

Building the DP table (items as rows, capacities 0-7 as columns):

c: 0 1 2 3 4 5 6 7
i=0: 0 0 0 0 0 0 0 0
i=1: 0 1 1 1 1 1 1 1
i=2: 0 1 1 4 5 5 5 5
i=3: 0 1 1 4 5 6 6 9
i=4: 0 1 1 4 5 7 8 9

Maximum value: dp[4][7]=9dp[4][7] = 9 (items 2 and 4: w=3+5=7w = 3 + 5 = 7, v=4+7=11v = 4 + 7 = 11 — let me recalculate).

Correct: items 2 and 3 (w=3+4=7w=3+4=7, v=4+5=9v=4+5=9), or items 1, 2, 4 (w=1+3+5=9>7w=1+3+5=9 > 7Not valid). Items 1, 3 (w=1+4=5w=1+4=5, v=1+5=6v=1+5=6), items 2, 4 (w=3+5=8>7w=3+5=8 > 7). Optimal: items 2 and 3 (w=3+4=7w=3+4=7, v=4+5=9v=4+5=9).

5.6 Worked Example: Edit Distance (Levenshtein Distance)

Problem. Given strings ss of length mm and tt of length nnFind the minimum number of insertions, deletions, and substitutions to transform ss into tt.

Recurrence:

dp[i][j]={jifi=0iifj=0dp[i1][j1]ifs[i]=t[j]1+min(dp[i1][j],dp[i][j1],dp[i1][j1])ifs[i]t[j]dp[i][j] = \begin{cases} j & \mathrm{if} i = 0 \\ i & \mathrm{if} j = 0 \\ dp[i-1][j-1] & \mathrm{if} s[i] = t[j] \\ 1 + \min(dp[i-1][j], dp[i][j-1], dp[i-1][j-1]) & \mathrm{if} s[i] \neq t[j] \end{cases}

Where the three cases in the minimum are: delete from ssInsert into ssSubstitute in ss.

Time: O(mn)O(mn). Space: O(mn)O(mn) (can be reduced to O(min(m,n))O(\min(m,n))).

Worked Example: Edit Distance

Compute the edit distance between “kitten” and “sitting”.

Building the DP table:

"" s i t t i n g
"" 0 1 2 3 4 5 6 7
k 1 1 2 3 4 5 6 7
i 2 2 1 2 3 4 5 6
t 3 3 2 1 2 3 4 5
t 4 4 3 2 1 2 3 4
e 5 5 4 3 2 2 3 4
n 6 6 5 4 3 3 2 3

Edit distance: dp[6][7]=3dp[6][7] = 3.

Transform: kitten → sitten (substitute k→s) → sittin (substitute e→i) → sitting (insert g).

5.7 Worked Example: Matrix Chain Multiplication

Problem. Given matrices A1,A2,,AnA_1, A_2, \ldots, A_n where AiA_i has dimensions pi1×pip_{i-1} \times p_iFind the parenthesisation that minimises the total number of scalar multiplications.

Recurrence:

dp[i][j]={0ifi=jminik<j(dp[i][k]+dp[k+1][j]+pi1pkpj)ifi<jdp[i][j] = \begin{cases} 0 & \mathrm{if} i = j \\ \min_{i \leq k < j} (dp[i][k] + dp[k+1][j] + p_{i-1} p_k p_j) & \mathrm{if} i < j \end{cases}

Time: O(n3)O(n^3). Space: O(n2)O(n^2).

Proof of correctness. The optimal parenthesisation of AiAjA_i \cdots A_j splits at some position kk: (AiAk)(Ak+1Aj)(A_i \cdots A_k)(A_{k+1} \cdots A_j). The cost is the cost of the left subproduct plus the cost of the right subproduct plus the cost of multiplying the resulting matrices (pi1pkpjp_{i-1} \cdot p_k \cdot p_j scalar multiplications). The optimal kk minimises this total. \blacksquare

Worked Example: Matrix Chain Multiplication

Matrices: A1A_1 (10×3010 \times 30), A2A_2 (30×530 \times 5), A3A_3 (5×605 \times 60). Dimensions: p=[10,30,5,60]p = [10, 30, 5, 60].

dp[1][1]=dp[2][2]=dp[3][3]=0dp[1][1] = dp[2][2] = dp[3][3] = 0.

dp[1][2]=p0p1p2=10×30×5=1500dp[1][2] = p_0 p_1 p_2 = 10 \times 30 \times 5 = 1500. Split at k=1k=1: (A1A2)(A_1 A_2).

dp[2][3]=p1p2p3=30×5×60=9000dp[2][3] = p_1 p_2 p_3 = 30 \times 5 \times 60 = 9000. Split at k=2k=2: (A2A3)(A_2 A_3).

dp[1][3]dp[1][3]: Try k=1k=1: dp[1][1]+dp[2][3]+10×30×60=0+9000+18000=27000dp[1][1] + dp[2][3] + 10 \times 30 \times 60 = 0 + 9000 + 18000 = 27000. Try k=2k=2: dp[1][2]+dp[3][3]+10×5×60=1500+0+3000=4500dp[1][2] + dp[3][3] + 10 \times 5 \times 60 = 1500 + 0 + 3000 = 4500.

Minimum: dp[1][3]=4500dp[1][3] = 4500Split at k=2k=2: (A1(A2A3))(A_1(A_2 A_3)).

5.8 Worked Example: Longest Common Subsequence

Problem. Given sequences X=(x1,,xm)X = (x_1, \ldots, x_m) and Y=(y1,,yn)Y = (y_1, \ldots, y_n)Find the LCS.

Recurrence:

dp[i][j]={0ifi=0orj=0dp[i1][j1]+1ifxi=yjmax(dp[i1][j],dp[i][j1])ifxiyjdp[i][j] = \begin{cases} 0 & \mathrm{if} i = 0 \mathrm{ or} j = 0 \\ dp[i-1][j-1] + 1 & \mathrm{if} x_i = y_j \\ \max(dp[i-1][j], dp[i][j-1]) & \mathrm{if} x_i \neq y_j \end{cases}

Time: O(mn)O(mn). Space: O(mn)O(mn) (can be reduced to O(min(m,n))O(\min(m,n)) for the length only).

Proof of correctness. If xi=yjx_i = y_jAny LCS of X[1..i]X[1..i] and Y[1..j]Y[1..j] must include xix_i So LCS=1+LCS(X[1..i1],Y[1..j1])\mathrm{LCS} = 1 + \mathrm{LCS}(X[1..i-1], Y[1..j-1]). If xiyjx_i \neq y_jThe LCS either Excludes xix_i or excludes yjy_jGiving the max of the two subproblems. \blacksquare

5.9 Worked Example: Coin Change

Problem. Given coin denominations d1,,dnd_1, \ldots, d_n and a target amount MMFind the minimum number of coins needed.

Recurrence:

dp[c]={0ifc=0mini:dic(dp[cdi]+1)ifc>0dp[c] = \begin{cases} 0 & \mathrm{if} c = 0 \\ \min_{i: d_i \leq c}(dp[c - d_i] + 1) & \mathrm{if} c > 0 \end{cases}

Time: O(nM)O(nM). Space: O(M)O(M).

Proof of correctness. To make change for amount c>0c > 0The last coin used must be some dicd_i \leq c. The remaining amount is cdic - d_iAnd the optimal solution for cc uses 1+dp[cdi]1 + dp[c - d_i] coins. Taking the minimum over all valid did_i gives the optimal solution. \blacksquare

Worked Example: Coin Change

Denominations: 1,5,10,25\\{1, 5, 10, 25\\}. Target: M=63M = 63.

Bottom-up computation:

  • dp[0]=0dp[0] = 0
  • dp[1..4]=dp[c1]+1=cdp[1..4] = dp[c - 1] + 1 = c (use pennies)
  • dp[5]=min(dp[4]+1,dp[0]+1)=1dp[5] = \min(dp[4]+1, dp[0]+1) = 1 (use a nickel)
  • dp[10]=min(dp[9]+1,dp[5]+1,dp[0]+1)=1dp[10] = \min(dp[9]+1, dp[5]+1, dp[0]+1) = 1 (use a dime)
  • dp[25]=1dp[25] = 1 (use a quarter)
  • dp[50]=2dp[50] = 2 (use two quarters)
  • dp[63]=min(dp[62]+1,dp[58]+1,dp[53]+1,dp[38]+1)dp[63] = \min(dp[62]+1, dp[58]+1, dp[53]+1, dp[38]+1)

Working backwards: dp[63]=dp[38]+1=dp[13]+2=dp[3]+3=6dp[63] = dp[38] + 1 = dp[13] + 2 = dp[3] + 3 = 6.

Solution: 2 quarters + 1 dime + 3 pennies = 25+25+10+1+1+1=6325 + 25 + 10 + 1 + 1 + 1 = 63. 6 coins.

5.10 Worked Example: Longest Increasing Subsequence

Problem. Given a sequence a1,,ana_1, \ldots, a_nFind the length of the longest strictly increasing subsequence (not necessarily contiguous).

Recurrence: dp[i]=1+maxdp[j]:j<i and aj<aidp[i] = 1 + \max\\{dp[j] : j \lt i \mathrm{~and~} a_j \lt a_i\\}With dp[i]=1dp[i] = 1 if no such jj exists.

Time: O(n2)O(n^2). Space: O(n)O(n).

Worked Example: Longest Increasing Subsequence

Find the LIS of [10,22,9,33,21,50,41,60,80][10, 22, 9, 33, 21, 50, 41, 60, 80].

dp[0]=1dp[0] = 1 (just 10) dp[1]=dp[0]+1=2dp[1] = dp[0] + 1 = 2 (10, 22) dp[2]=1dp[2] = 1 (just 9) dp[3]=dp[1]+1=3dp[3] = dp[1] + 1 = 3 (10, 22, 33) dp[4]=dp[0]+1=2dp[4] = dp[0] + 1 = 2 (10, 21) dp[5]=dp[3]+1=4dp[5] = dp[3] + 1 = 4 (10, 22, 33, 50) dp[6]=dp[4]+1=3dp[6] = dp[4] + 1 = 3 (10, 21, 41) dp[7]=dp[5]+1=5dp[7] = dp[5] + 1 = 5 (10, 22, 33, 50, 60) dp[8]=dp[7]+1=6dp[8] = dp[7] + 1 = 6 (10, 22, 33, 50, 60, 80)

LIS length: 6.

Patience sorting approach (O(nlogn)O(n \log n)): Maintain piles. For each card, place on the leftmost pile whose top card is \geq the current card, or start a new pile on the right if no such pile exists. The number of piles equals the LIS length.

6. Advanced Topics

6.1 NP-Completeness

6.1.1 P, NP, and NP-Completeness

P: The class of decision problems solvable in polynomial time by a deterministic Turing machine.

NP: The class of decision problems solvable in polynomial time by a non-deterministic Turing Machine. Equivalently, problems whose “yes” instances have polynomial-time verifiable certificates.

NP-hard: A problem AA is NP-hard if every problem in NP can be reduced to AA in polynomial Time.

NP-complete: A problem is NP-complete if it is in NP and NP-hard.

Theorem 6.1. If any NP-complete problem is in P, then P = NP.

6.1.2 Polynomial-Time Reductions

A polynomial-time reduction from problem AA to problem BB is a polynomial-time algorithm that Transforms instances of AA into instances of BBPreserving the answer.

Lemma 6.1. If ApBA \leq_p B and BPB \in PThen APA \in P.

Lemma 6.2. If ApBA \leq_p B and AA is NP-hard, then BB is NP-hard.

6.1.3 The Cook-Levin Theorem

Theorem 6.2 (Cook-Levin, 1971). SAT is NP-complete.

Proof sketch. We show that every problem in NP reduces to SAT. Let LNPL \in \mathrm{NP}. There exists a polynomial-time non-deterministic Turing machine MM that decides LLRunning in time p(n)p(n) on inputs of length nn. For an input xxWe construct a Boolean formula ϕx\phi_x that is satisfiable if and only if MM accepts xx.

The formula encodes:

  1. Tableau variables: T[i,j,σ]=1T[i, j, \sigma] = 1 iff cell (i,j)(i, j) of the computation tableau contains symbol σ\sigma. The tableau has p(n)+1p(n) + 1 rows and p(n)p(n) columns.
  2. Initial configuration: Row 0 encodes the initial state of MM on input xx.
  3. Transition constraints: Each 2×32 \times 3 window of the tableau corresponds to a valid transition of MM.
  4. Accepting configuration: Some cell in the last row contains the accept state.

Each of these constraints can be expressed as a polynomial-size CNF formula. The total formula ϕx\phi_x has size polynomial in nn and is satisfiable iff MM accepts xx. \blacksquare

6.1.4 Classic NP-Complete Problems

SAT. Given a Boolean formula in CNF, is there a satisfying assignment?

3-SAT. SAT restricted to clauses with exactly 3 literals.

Vertex Cover. Given a graph G=(V,E)G = (V, E) and integer kkIs there a vertex cover of size k\leq k?

Travelling Salesman Problem (decision version). Given a weighted graph and bound BBIs there a Tour of total weight B\leq B?

Subset Sum. Given a set of integers and a target TTIs there a subset summing to TT?

Clique. Given a graph GG and integer kkDoes GG contain a clique of size kk?

6.1.5 Proof Strategy for NP-Completeness

To prove a problem BB is NP-complete:

  1. Show BNPB \in \mathrm{NP} (polynomial-time verifiable certificate).
  2. Show a known NP-complete problem AA reduces to BB: ApBA \leq_p B.

Example. 3-SAT p\leq_p Vertex Cover: construct a graph from the 3-SAT formula where each Variable and each clause become vertices, and edges enforce the constraint that a satisfying Assignment corresponds to a vertex cover.

Worked Example: 3-SAT $\leq_p$ Vertex Cover Reduction

Reduce 3-SAT formula ϕ=(x1xˉ2x3)(xˉ1x2x3)(x1x2xˉ3)\phi = (x_1 \vee \bar{x}_2 \vee x_3) \wedge (\bar{x}_1 \vee x_2 \vee x_3) \wedge (x_1 \vee x_2 \vee \bar{x}_3) to a vertex cover instance.

For each variable xix_iCreate two vertices xix_i and xˉi\bar{x}_i connected by an edge (the “literal edge”).

For each clause CjC_jCreate a triangle of 3 vertices cj1,cj2,cj3c_{j1}, c_{j2}, c_{j3}.

For each clause vertex cjkc_{jk}Connect it to the literal vertex corresponding to the kk-th literal of clause jj.

Claim: ϕ\phi is satisfiable iff the graph has a vertex cover of size k+2mk + 2m where kk is the number of variables and mm is the number of clauses.

(\Rightarrow) If ϕ\phi is satisfiable, include in the cover: for each variable, the literal vertex matching the truth assignment (e.g., x1x_1 if x1=truex_1 = \mathrm{true}, xˉ1\bar{x}_1 if x1=falsex_1 = \mathrm{false}). This covers all literal edges (kk vertices). For each clause triangle, at least one literal in the clause is true, so the corresponding literal vertex covers one of the three edges from the triangle. Include the other two vertices of the triangle (2m2m vertices total).

(\Leftarrow) A vertex cover of size k+2mk + 2m must include exactly one endpoint of each literal edge (otherwise the triangle requires 3 vertices). Set each variable according to which literal vertex is in the cover. Each clause triangle has exactly one uncovered vertex, whose edge to a literal vertex is covered by that literal vertex, meaning the clause is satisfied.

The reduction takes polynomial time (number of vertices and edges is polynomial in the formula size).

:::caution Common Pitfall NP-hardness does not mean the problem is unsolvable. It means there is no known polynomial-time Algorithm. Many NP-complete problems have efficient approximation algorithms or can be solved Exactly for practical input sizes using branch-and-bound or SAT solvers.

6.2 Approximation Algorithms

When exact solutions are intractable, we seek approximation algorithms with provable guarantees.

Definition. A ρ\rho-approximation algorithm for a minimisation problem produces a solution of cost at most ρ\rho times the optimal cost. For a maximisation problem, the solution has value at least (1/ρ)(1/\rho) times the optimal value.

Theorem 6.3. Greedy vertex cover (repeatedly pick an edge, add both endpoints) is a 2-approximation.

Proof. The algorithm selects a set CC of vertices. Each edge in the matching used by the algorithm contributes 2 vertices to CC. Let MM^* be a maximum matching. Then C=2M2OPT|C| = 2|M^*| \leq 2 \cdot |\mathrm{OPT}|Since OPT must contain at least one endpoint of every edge in MM^* (and MM^* is maximum, so M|M^*| \geq the size of any matching). Therefore the approximation ratio is at most 2. \blacksquare

Theorem 6.4 (Metric TSP). The Christofides algorithm is a 3/23/2-approximation for TSP with the triangle inequality.

Proof. The algorithm computes an MST (OPT\leq \mathrm{OPT}), finds a minimum-weight perfect matching MM on the odd-degree vertices of the MST (MOPT/2\lvert M \rvert \leq \mathrm{OPT}/2), and combines them into an Eulerian tour which is shortcut to a Hamiltonian cycle. The total weight is at most MST+MOPT+OPT/2=32OPT\mathrm{MST} + \lvert M \rvert \leq \mathrm{OPT} + \mathrm{OPT}/2 = \frac{3}{2}\mathrm{OPT}. \blacksquare

Theorem 6.5 (Inapproximability). Unless P = NP, TSP (general, without triangle inequality) has no polynomial-time approximation algorithm with any constant ratio.

Proof sketch. If a cc-approximation existed for TSP, we could use it to solve the Hamiltonian cycle problem (which is NP-complete): given a graph GGConstruct a TSP instance with edge weight 1 for existing edges and weight cn+1cn + 1 for non-edges. If the approximation returns a tour of weight nnThen GG has a Hamiltonian cycle. Otherwise, the tour weight is at least n1+cn+1>cnn - 1 + cn + 1 \gt cnSo the approximation ratio would exceed ccContradiction. \blacksquare

Theorem 6.6 (SET COVER). The greedy algorithm for SET COVER is a (lnn+O(1))(\ln n + O(1))-approximation, where nn is the size of the universe.

Proof sketch. At each step, the greedy algorithm picks the set covering the most uncovered elements. Let cic_i be the cost of the ii-th set picked, and let nin_i be the number of newly covered elements. Then ci/niOPT/(nj<inj)c_i / n_i \leq \mathrm{OPT} / (n - \sum_{j \lt i} n_j) (otherwise OPT could not cover the remaining elements at lower cost). Summing gives the lnn+O(1)\ln n + O(1) bound. \blacksquare

Worked Example: Greedy Set Cover

Universe U=1,2,3,4,5,6U = \\{1, 2, 3, 4, 5, 6\\}. Sets: S1=1,2,3S_1 = \\{1, 2, 3\\}, S2=2,4S_2 = \\{2, 4\\}, S3=3,5,6S_3 = \\{3, 5, 6\\}, S4=4,5S_4 = \\{4, 5\\}, S5=1,4,6S_5 = \\{1, 4, 6\\}. All sets have equal cost 1.

Greedy:

  1. Pick S1S_1 (covers 3 elements, tied with S3S_3). Covered: 1,2,3\\{1, 2, 3\\}.
  2. Remaining: 4,5,6\\{4, 5, 6\\}. S3S_3 covers 5,6\\{5, 6\\} (2 new), S4S_4 covers 4,5\\{4, 5\\} (2 new), S5S_5 covers 4,6\\{4, 6\\} (2 new). Pick S3S_3. Covered: 1,2,3,5,6\\{1, 2, 3, 5, 6\\}.
  3. Remaining: 4\\{4\\}. Pick S2S_2 (or S4S_4 or S5S_5). Covered: 1,2,3,4,5,6\\{1, 2, 3, 4, 5, 6\\}.

Greedy solution: S1,S3,S2\\{S_1, S_3, S_2\\}Size 3. Optimal: S1,S5,S4\\{S_1, S_5, S_4\\} or S3,S5,S2\\{S_3, S_5, S_2\\}Size 3. Here greedy is optimal, but it is a lnn\ln n-approximation.

6.3 Randomised Algorithms

6.3.1 Las Vegas vs. Monte Carlo

Las Vegas algorithms always produce the correct answer but have randomised running time.

Monte Carlo algorithms always run in polynomial time but may produce incorrect answers with some probability.

PropertyLas VegasMonte Carlo
CorrectnessAlways correctBounded error prob
Running timeRandomisedBounded
ExampleRandomised QuicksortMiller-Rabin primality

Theorem 6.6. A Monte Carlo algorithm with error probability ϵ\epsilon can be amplified to error probability ϵk\epsilon^k by running it kk times and taking the majority vote (for decision problems with one-sided error) or the most frequent answer (for two-sided error).

Proof. For one-sided error: Pr[all k runs fail]=ϵk\Pr[\mathrm{all}~ k \mathrm{~runs~fail}] = \epsilon^k. For two-sided error with majority vote: by the Chernoff bound, the probability that the majority is wrong decreases exponentially in kk. \blacksquare

6.3.2 Randomised Select (Quickselect)

Problem. Find the kk-th smallest element in an unsorted array.

Algorithm (Randomised Select): Like quicksort, but recurse only into the partition containing the kk-th element.

Theorem 6.7. Randomised select has expected running time O(n)O(n).

Proof. The expected number of comparisons satisfies T(n)n+1ni=1n(T(max(i1,ni)))T(n) \leq n + \frac{1}{n}\sum_{i=1}^{n}(T(\max(i-1, n-i))). This can be shown to satisfy T(n)=O(n)T(n) = O(n) by induction. \blacksquare

Worked Example: Randomised Select

Find the 3rd smallest element in [7,2,1,6,8,5,3,4][7, 2, 1, 6, 8, 5, 3, 4].

Pivot = randomly chosen. Suppose pivot = 5 (index 5).

Partition: [7,2,1,6,8,5,3,4][2,1,3,4,5,6,8,7][7, 2, 1, 6, 8, 5, 3, 4] \to [2, 1, 3, 4, 5, 6, 8, 7].

Pivot 5 is at index 4 (0-indexed). We want rank 3 (0-indexed rank 2). 4>24 > 2So recurse on left: [2,1,3,4][2, 1, 3, 4].

Pivot = randomly chosen. Suppose pivot = 3.

Partition: [2,1,3,4][2,1,3,4][2, 1, 3, 4] \to [2, 1, 3, 4].

Pivot 3 is at index 2. We want rank 2. 2=22 = 2So return 3.

The 3rd smallest element is 3.

Worked Example: Miller-Rabin Primality Test

Test whether n=561n = 561 is prime (it is not; 561=3×11×17561 = 3 \times 11 \times 17A Carmichael number).

Write n1=560=24×35n - 1 = 560 = 2^4 \times 35So s=4s = 4, d=35d = 35.

Choose random base a=2a = 2.

Compute admodn=235mod561a^d \bmod n = 2^{35} \bmod 561.

25=322^5 = 32, 210=1024mod561=4632^{10} = 1024 \bmod 561 = 463, 220=4632mod561=672^{20} = 463^2 \bmod 561 = 67, 235=22021025mod561=6746332mod5612^{35} = 2^{20} \cdot 2^{10} \cdot 2^5 \bmod 561 = 67 \cdot 463 \cdot 32 \bmod 561.

67×463=31021mod561=3102155×561=3102130855=16667 \times 463 = 31021 \bmod 561 = 31021 - 55 \times 561 = 31021 - 30855 = 166. 166×32=5312mod561=53129×561=53125049=263166 \times 32 = 5312 \bmod 561 = 5312 - 9 \times 561 = 5312 - 5049 = 263.

admodn=2631a^d \bmod n = 263 \neq 1 and n1=560\neq n - 1 = 560.

Now square: 2632mod561=69169mod561=69169123×561=6916969003=166560263^2 \bmod 561 = 69169 \bmod 561 = 69169 - 123 \times 561 = 69169 - 69003 = 166 \neq 560.

Square: 1662mod561=27556mod561=2755649×561=2755627489=67560166^2 \bmod 561 = 27556 \bmod 561 = 27556 - 49 \times 561 = 27556 - 27489 = 67 \neq 560.

Square: 672mod561=4489mod561=44898×561=44894488=167^2 \bmod 561 = 4489 \bmod 561 = 4489 - 8 \times 561 = 4489 - 4488 = 1.

We got 1 on the last squaring, but the previous result was 671,56067 \neq 1, 560. So a=2a = 2 is a witness that 561 is composite. Output: COMPOSITE.

The error probability of Miller-Rabin is at most 1/41/4 per random base, so kk iterations give error 4k\leq 4^{-k}.

6.3.3 Hashing with Universal Hash Functions

Definition. A family H\mathcal{H} of hash functions from UU to 0,,m1\\{0, \ldots, m - 1\\} is universal if for any distinct x,yUx, y \in U, PrhH[h(x)=h(y)]1/m\Pr_{h \in \mathcal{H}}[h(x) = h(y)] \leq 1/m.

Theorem 6.8. With a universal hash family and chaining, the expected number of collisions for any element is at most n/mn/m.

Proof. For a fixed element xxLet XiyX_{iy} be the indicator that h(x)=h(yi)h(x) = h(y_i) where y1,,yny_1, \ldots, y_n are the other n1n-1 elements. Then E[Xiy]=Pr[h(x)=h(yi)]1/m\mathrm{E}[X_{iy}] = \Pr[h(x) = h(y_i)] \leq 1/m by universality. By linearity of expectation, the expected number of collisions is iE[Xiy](n1)/m\sum_i \mathrm{E}[X_{iy}] \leq (n-1)/m. \blacksquare

6.4 Amortised Analysis (Detailed)

6.4.1 Aggregate Analysis

Example: Multi-pop Stack. A stack supports push (O(1)O(1)) and multi-pop(kk) (pop kk elements, cost min(k,s)\min(k, s) where ss is the stack size). Although a single multi-pop can cost O(n)O(n)A sequence of nn push/multi-pop operations costs O(n)O(n) total: each element is pushed once and popped at most once.

Theorem 6.9. The amortised cost per operation for a multi-pop stack is O(1)O(1).

Proof. In a sequence of nn operations, each element is pushed at most once and popped at most once. The total cost is at most 2n=O(n)2n = O(n)So the amortised cost per operation is O(n)/n=O(1)O(n)/n = O(1). \blacksquare

6.4.2 Accounting Method

Each operation is charged an amortised cost. The difference between the amortised cost and the actual cost is stored as credit. The credit must always be non-negative.

Example: Binary Counter. A kk-bit binary counter supports increment (flip bits from the right until a 0 is found).

Assign amortised cost of 2 per increment: 1 to pay for flipping a 0 to 1, 1 stored as credit. When a 1 is flipped back to 0, the credit from when it was set pays for the flip.

Theorem 6.10. The amortised cost per increment of a binary counter is O(1)O(1).

Proof. Each bit flips from 0 to 1 at most once between consecutive resets to 0. The credit stored on a 1 bit pays for the subsequent flip back to 0. The total credit is always non-negative. \blacksquare

6.4.3 Potential Method

The potential function Φ\Phi maps data structure states to non-negative real numbers. The amortised cost of the ii-th operation is c^i=ci+Φ(Di)Φ(Di1)\hat{c}_i = c_i + \Phi(D_i) - \Phi(D_{i-1}).

Theorem 6.11. If Φ(Di)0\Phi(D_i) \geq 0 for all iiThen i=1nc^ii=1nci\sum_{i=1}^{n} \hat{c}_i \geq \sum_{i=1}^{n} c_i.

Worked Example: Binary Counter with Potential Method

Define Φ(Di)=\Phi(D_i) = number of 1-bits in the counter after ii operations.

For increment ii: let tit_i be the number of trailing 1s flipped. The actual cost is ti+1t_i + 1 (flipping tit_i ones and one zero). The number of 1-bits changes by 1ti1 - t_i.

c^i=(ti+1)+Φ(Di)Φ(Di1)=(ti+1)+(1ti)=2\hat{c}_i = (t_i + 1) + \Phi(D_i) - \Phi(D_{i-1}) = (t_i + 1) + (1 - t_i) = 2

The amortised cost per increment is exactly 2, i.e., O(1)O(1).

Worked Example: Splay Tree Amortised Analysis

A splay tree is a BST where every access is followed by a splay operation that moves the accessed node to the root using rotations. Three types of rotations are used: zig (single rotation when parent is root), zig-zig (two rotations in the same direction), and zig-zag (two rotations in alternating directions).

Access Lemma. The amortised cost of splaying a node xx in a splay tree with nn nodes is O(logn)O(\log n).

Proof sketch. Define the potential as Φ(T)=xTlogsize(x)\Phi(T) = \sum_{x \in T} \log \mathrm{size}(x)Where size(x)\mathrm{size}(x) is the number of nodes in the subtree rooted at xx (including xx). Define the rank r(x)=logsize(x)r(x) = \log \mathrm{size}(x).

The amortised cost of a splay step at node xx with parent pp and grandparent gg is c^=1+r(x)r(x)\hat{c} = 1 + r'(x) - r(x)Where primes denote ranks after the step.

  • Zig: c^=1+r(x)r(x)1+3(r(x)r(x))\hat{c} = 1 + r'(x) - r(x) \leq 1 + 3(r'(x) - r(x)).
  • Zig-zig: c^=2+r(x)r(x)3(r(x)r(x))\hat{c} = 2 + r'(x) - r(x) \leq 3(r'(x) - r(x)).
  • Zig-zag: c^=2+r(x)r(x)3(r(x)r(x))\hat{c} = 2 + r'(x) - r(x) \leq 3(r'(x) - r(x)).

Summing over all splay steps: c^total1+3(rfinal(x)rinitial(x))1+3logn=O(logn)\hat{c}_{\mathrm{total} \leq 1 + 3(r_{\mathrm{final}(x) - r_{\mathrm{initial}(x)) \leq 1 + 3 \log n = O(\log n)}}}. \blacksquare

Corollary. A sequence of mm splay tree operations takes O(mlogn)O(m \log n) amortised time.

7. Problem Set

7.1 Analysis (Problems 1—3)

Problem 1. Prove that n3/1000100n2100n+3=Θ(n3)n^3 / 1000 - 100n^2 - 100n + 3 = \Theta(n^3).

Problem 2. Use the Master Theorem to solve the recurrence T(n)=3T(n/4)+nlognT(n) = 3T(n/4) + n \log n. If the Master Theorem does not apply, explain why.

Problem 3. Prove that log(n!)=Θ(nlogn)\log(n!) = \Theta(n \log n) using Stirling’s approximation: n!2πn(n/e)nn! \approx \sqrt{2\pi n}(n/e)^n.

7.2 Data Structures (Problems 4—8)

Problem 4. Show the result of inserting the keys 15,5,20,10,25,3,7,3015, 5, 20, 10, 25, 3, 7, 30 into an initially empty AVL tree. Show all rotations.

Problem 5. Prove that deleting a node from a red-black tree with nn internal nodes takes O(logn)O(\log n) time.

Problem 6. Design a hash table for n=1000n = 1000 strings using chaining. Choose the table size mmA hash function, and compute the expected number of comparisons for a successful search.

Problem 7. A skip list uses p=1/4p = 1/4. What is the expected maximum level for n=10000n = 10000 elements? What is the expected time for search?

Problem 8. Prove that the union-by-rank heuristic alone (without path compression) gives O(logn)O(\log n) amortised time per Union-Find operation.

7.3 Sorting (Problems 9—11)

Problem 9. Prove that heapsort is not stable by giving a concrete counterexample.

Problem 10. Given an array of nn integers in the range [0,n21][0, n^2 - 1]Design an O(n)O(n) sorting algorithm using radix sort. Justify the choice of base and number of passes.

Problem 11. Prove that the best-case number of comparisons for comparison-based sorting is Ω(nlogn)\Omega(n \log n) (not just the worst case).

7.4 Graph Algorithms (Problems 12—15)

Problem 12. Run Dijkstra’s algorithm on the following graph from source AA. Show the state of the priority queue after each extraction. Edge weights: A10BA \xrightarrow{10} B, A3CA \xrightarrow{3} C, C4BC \xrightarrow{4} B, C8DC \xrightarrow{8} D, C2EC \xrightarrow{2} E, B7DB \xrightarrow{7} D, E5DE \xrightarrow{5} D, D6BD \xrightarrow{6} B.

Problem 13. Prove that if a graph has a negative-weight cycle reachable from the source, then Bellman-Ford will detect it.

Problem 14. Use the cut property to prove that the minimum spanning tree is unique when all edge weights are distinct.

Problem 15. Find the strongly connected components of the graph with edges: (A,B),(B,C),(C,A),(B,D),(D,E),(E,F),(F,D),(F,G)(A, B), (B, C), (C, A), (B, D), (D, E), (E, F), (F, D), (F, G). Use Kosaraju’s algorithm and show all steps.

7.5 Dynamic Programming (Problems 16—17)

Problem 16. You are given nn types of coin denominations d1,d2,,dnd_1, d_2, \ldots, d_n and a target amount MM. Find the minimum number of coins needed to make exact change for MM (or report that it is impossible). Give a recurrence, prove correctness, and state the time and space complexity.

Problem 17. Given a sequence of matrices A1(2×10)A_1 (2 \times 10), A2(10×50)A_2 (10 \times 50), A3(50×20)A_3 (50 \times 20), A4(20×5)A_4 (20 \times 5), A5(5×80)A_5 (5 \times 80)Find the optimal parenthesisation using the matrix chain multiplication DP. Show the full DP table.

7.6 Advanced Topics (Problems 18—20)

Problem 18. Prove that CLIQUE is NP-complete by reducing from 3-SAT.

Problem 19. Design a 2-approximation algorithm for the metric TSP using the MST shortcutting technique. Prove the approximation ratio.

Problem 20. A randomised algorithm for MINIMUM CUT works by repeatedly contracting random edges until two vertices remain. Prove that the probability that any specific minimum cut survives is at least 2/(n(n1))2 / (n(n-1))And hence that O(n2logn)O(n^2 \log n) repetitions suffice to find a minimum cut with high probability (Karger’s algorithm).

Common Pitfalls

  1. Forgetting that O(nlogn)O(n \log n) average-case for quicksort becomes O(n2)O(n^2) worst-case on already sorted input.

  2. Confusing an algorithm with a program — an algorithm is a step-by-step procedure, not its implementation in code.

  3. Forgetting edge cases in algorithm design (e.g., empty input, single element, already sorted data).

  4. Misunderstanding the difference between a stack (LIFO) and a queue (FIFO) in data structure applications.

Worked Examples

Example 1: Merge Sort Trace

Problem. Trace merge sort on the array [38, 27, 43, 3, 9, 82, 10].

Solution. Recursive splits and merges:

[38, 27, 43, 3, 9, 82, 10]
→ [38, 27, 43] and [3, 9, 82, 10]
→ [38] [27, 43] and [3, 9] [82, 10]
→ [38] [27][43] and [3][9] [82][10]
Merge: [27, 38, 43] and [3, 9, 10, 82]
Final: [3, 9, 10, 27, 38, 43, 82]

Time: O(nlogn)O(n \log n) at every case. Space: O(n)O(n) auxiliary.

\blacksquare

Example 2: Binary Search on Sorted Array

Problem. Search for 23 in the sorted array [2, 5, 8, 12, 16, 23, 38, 56, 72, 91]. Show each step.

Solution. Low = 0, high = 9.

  • Step 1: mid = 4, A[4] = 16 < 23 → low = 5
  • Step 2: mid = 7, A[7] = 56 > 23 → high = 6
  • Step 3: mid = 5, A[5] = 23 = 23 → found at index 5

3 comparisons. O(logn)O(\log n) worst case.

\blacksquare

Summary

  • Big-O notation classifies growth rates: O(1)O(1) < O(logn)O(\log n) < O(n)O(n) < O(nlogn)O(n\log n) < O(n2)O(n^2) < O(2n)O(2^n).
  • Sorting: merge sort and heap sort guarantee O(nlogn)O(n\log n); quick sort is O(n2)O(n^2) worst case but O(nlogn)O(n\log n) average.
  • Fundamental data structures: arrays (O(1)O(1) access), linked lists (O(1)O(1) insert/delete), hash tables (O(1)O(1) average lookup), BSTs (O(logn)O(\log n) balanced).
  • Graph representations: adjacency matrix (O(V2)O(V^2) space) vs adjacency list (O(V+E)O(V+E) space).
  • Algorithm design paradigms: divide-and-conquer, greedy, dynamic programming — choose based on optimal substructure and overlapping subproblems.

Cross-References

TopicSiteLink
Advanced AlgorithmsWyattsNotesView
Advanced Data StructuresWyattsNotesView
Discrete MathematicsWyattsNotesView
Theory of ComputationWyattsNotesView
Algorithms — MIT 6.006MIT OCWView

:::