Advanced Topics
6.1 NP-Completeness
6.1.1 P, NP, and NP-Completeness
P: The class of decision problems solvable in polynomial time by a deterministic Turing machine.
NP: The class of decision problems solvable in polynomial time by a non-deterministic Turing Machine. Equivalently, problems whose “yes” instances have polynomial-time verifiable certificates.
NP-hard: A problem is NP-hard if every problem in NP can be reduced to in polynomial Time.
NP-complete: A problem is NP-complete if it is in NP and NP-hard.
Theorem 6.1. If any NP-complete problem is in P, then P = NP.
6.1.2 Polynomial-Time Reductions
A polynomial-time reduction from problem to problem is a polynomial-time algorithm that Transforms instances of into instances of Preserving the answer.
Lemma 6.1. If and Then .
Lemma 6.2. If and is NP-hard, then is NP-hard.
6.1.3 The Cook-Levin Theorem
Theorem 6.2 (Cook-Levin, 1971). SAT is NP-complete.
Proof sketch. We show that every problem in NP reduces to SAT. Let . There exists a polynomial-time non-deterministic Turing machine that decides Running in time on inputs of length . For an input We construct a Boolean formula that is satisfiable if and only if accepts .
The formula encodes:
- Tableau variables: iff cell of the computation tableau contains symbol . The tableau has rows and columns.
- Initial configuration: Row 0 encodes the initial state of on input .
- Transition constraints: Each window of the tableau corresponds to a valid transition of .
- Accepting configuration: Some cell in the last row contains the accept state.
Each of these constraints can be expressed as a polynomial-size CNF formula. The total formula has size polynomial in and is satisfiable iff accepts .
6.1.4 Classic NP-Complete Problems
SAT. Given a Boolean formula in CNF, is there a satisfying assignment?
3-SAT. SAT restricted to clauses with exactly 3 literals.
Vertex Cover. Given a graph and integer Is there a vertex cover of size ?
Travelling Salesman Problem (decision version). Given a weighted graph and bound Is there a Tour of total weight ?
Subset Sum. Given a set of integers and a target Is there a subset summing to ?
Clique. Given a graph and integer Does contain a clique of size ?
6.1.5 Proof Strategy for NP-Completeness
To prove a problem is NP-complete:
- Show (polynomial-time verifiable certificate).
- Show a known NP-complete problem reduces to : .
Example. 3-SAT Vertex Cover: construct a graph from the 3-SAT formula where each Variable and each clause become vertices, and edges enforce the constraint that a satisfying Assignment corresponds to a vertex cover.
Worked Example: 3-SAT $\leq_p$ Vertex Cover Reduction
Reduce 3-SAT formula to a vertex cover instance.
For each variable Create two vertices and connected by an edge (the “literal edge”).
For each clause Create a triangle of 3 vertices .
For each clause vertex Connect it to the literal vertex corresponding to the -th literal of clause .
Claim: is satisfiable iff the graph has a vertex cover of size where is the number of variables and is the number of clauses.
() If is satisfiable, include in the cover: for each variable, the literal vertex matching the truth assignment (e.g., if , if ). This covers all literal edges ( vertices). For each clause triangle, at least one literal in the clause is true, so the corresponding literal vertex covers one of the three edges from the triangle. Include the other two vertices of the triangle ( vertices total).
() A vertex cover of size must include exactly one endpoint of each literal edge (otherwise the triangle requires 3 vertices). Set each variable according to which literal vertex is in the cover. Each clause triangle has exactly one uncovered vertex, whose edge to a literal vertex is covered by that literal vertex, meaning the clause is satisfied.
The reduction takes polynomial time (number of vertices and edges is polynomial in the formula size).
:::caution Common Pitfall NP-hardness does not mean the problem is unsolvable. It means there is no known polynomial-time Algorithm. Many NP-complete problems have efficient approximation algorithms or can be solved Exactly for practical input sizes using branch-and-bound or SAT solvers.
6.2 Approximation Algorithms
When exact solutions are intractable, we seek approximation algorithms with provable guarantees.
Definition. A -approximation algorithm for a minimisation problem produces a solution of cost at most times the optimal cost. For a maximisation problem, the solution has value at least times the optimal value.
Theorem 6.3. Greedy vertex cover (repeatedly pick an edge, add both endpoints) is a 2-approximation.
Proof. The algorithm selects a set of vertices. Each edge in the matching used by the algorithm contributes 2 vertices to . Let be a maximum matching. Then Since OPT must contain at least one endpoint of every edge in (and is maximum, so the size of any matching). Therefore the approximation ratio is at most 2.
Theorem 6.4 (Metric TSP). The Christofides algorithm is a -approximation for TSP with the triangle inequality.
Proof. The algorithm computes an MST (), finds a minimum-weight perfect matching on the odd-degree vertices of the MST (), and combines them into an Eulerian tour which is shortcut to a Hamiltonian cycle. The total weight is at most .
Theorem 6.5 (Inapproximability). Unless P = NP, TSP (general, without triangle inequality) has no polynomial-time approximation algorithm with any constant ratio.
Proof sketch. If a -approximation existed for TSP, we could use it to solve the Hamiltonian cycle problem (which is NP-complete): given a graph Construct a TSP instance with edge weight 1 for existing edges and weight for non-edges. If the approximation returns a tour of weight Then has a Hamiltonian cycle. Otherwise, the tour weight is at least So the approximation ratio would exceed Contradiction.
Theorem 6.6 (SET COVER). The greedy algorithm for SET COVER is a -approximation, where is the size of the universe.
Proof sketch. At each step, the greedy algorithm picks the set covering the most uncovered elements. Let be the cost of the -th set picked, and let be the number of newly covered elements. Then (otherwise OPT could not cover the remaining elements at lower cost). Summing gives the bound.
Worked Example: Greedy Set Cover
Universe . Sets: , , , , . All sets have equal cost 1.
Greedy:
- Pick (covers 3 elements, tied with ). Covered: .
- Remaining: . covers (2 new), covers (2 new), covers (2 new). Pick . Covered: .
- Remaining: . Pick (or or ). Covered: .
Greedy solution: Size 3. Optimal: or Size 3. Here greedy is optimal, but it is a -approximation.
6.3 Randomised Algorithms
6.3.1 Las Vegas vs. Monte Carlo
Las Vegas algorithms always produce the correct answer but have randomised running time.
Monte Carlo algorithms always run in polynomial time but may produce incorrect answers with some probability.
| Property | Las Vegas | Monte Carlo |
|---|---|---|
| Correctness | Always correct | Bounded error prob |
| Running time | Randomised | Bounded |
| Example | Randomised Quicksort | Miller-Rabin primality |
Theorem 6.6. A Monte Carlo algorithm with error probability can be amplified to error probability by running it times and taking the majority vote (for decision problems with one-sided error) or the most frequent answer (for two-sided error).
Proof. For one-sided error: . For two-sided error with majority vote: by the Chernoff bound, the probability that the majority is wrong decreases exponentially in .
6.3.2 Randomised Select (Quickselect)
Problem. Find the -th smallest element in an unsorted array.
Algorithm (Randomised Select): Like quicksort, but recurse only into the partition containing the -th element.
Theorem 6.7. Randomised select has expected running time .
Proof. The expected number of comparisons satisfies . This can be shown to satisfy by induction.
Worked Example: Randomised Select
Find the 3rd smallest element in .
Pivot = randomly chosen. Suppose pivot = 5 (index 5).
Partition: .
Pivot 5 is at index 4 (0-indexed). We want rank 3 (0-indexed rank 2). So recurse on left: .
Pivot = randomly chosen. Suppose pivot = 3.
Partition: .
Pivot 3 is at index 2. We want rank 2. So return 3.
The 3rd smallest element is 3.
Worked Example: Miller-Rabin Primality Test
Test whether is prime (it is not; A Carmichael number).
Write So , .
Choose random base .
Compute .
, , , .
. .
and .
Now square: .
Square: .
Square: .
We got 1 on the last squaring, but the previous result was . So is a witness that 561 is composite. Output: COMPOSITE.
The error probability of Miller-Rabin is at most per random base, so iterations give error .
6.3.3 Hashing with Universal Hash Functions
Definition. A family of hash functions from to is universal if for any distinct , .
Theorem 6.8. With a universal hash family and chaining, the expected number of collisions for any element is at most .
Proof. For a fixed element Let be the indicator that where are the other elements. Then by universality. By linearity of expectation, the expected number of collisions is .
6.4 Amortised Analysis (Detailed)
6.4.1 Aggregate Analysis
Example: Multi-pop Stack. A stack supports push () and multi-pop() (pop elements, cost where is the stack size). Although a single multi-pop can cost A sequence of push/multi-pop operations costs total: each element is pushed once and popped at most once.
Theorem 6.9. The amortised cost per operation for a multi-pop stack is .
Proof. In a sequence of operations, each element is pushed at most once and popped at most once. The total cost is at most So the amortised cost per operation is .
6.4.2 Accounting Method
Each operation is charged an amortised cost. The difference between the amortised cost and the actual cost is stored as credit. The credit must always be non-negative.
Example: Binary Counter. A -bit binary counter supports increment (flip bits from the right until a 0 is found).
Assign amortised cost of 2 per increment: 1 to pay for flipping a 0 to 1, 1 stored as credit. When a 1 is flipped back to 0, the credit from when it was set pays for the flip.
Theorem 6.10. The amortised cost per increment of a binary counter is .
Proof. Each bit flips from 0 to 1 at most once between consecutive resets to 0. The credit stored on a 1 bit pays for the subsequent flip back to 0. The total credit is always non-negative.
6.4.3 Potential Method
The potential function maps data structure states to non-negative real numbers. The amortised cost of the -th operation is .
Theorem 6.11. If for all Then .
Worked Example: Binary Counter with Potential Method
Define number of 1-bits in the counter after operations.
For increment : let be the number of trailing 1s flipped. The actual cost is (flipping ones and one zero). The number of 1-bits changes by .
The amortised cost per increment is exactly 2, i.e., .
Worked Example: Splay Tree Amortised Analysis
A splay tree is a BST where every access is followed by a splay operation that moves the accessed node to the root using rotations. Three types of rotations are used: zig (single rotation when parent is root), zig-zig (two rotations in the same direction), and zig-zag (two rotations in alternating directions).
Access Lemma. The amortised cost of splaying a node in a splay tree with nodes is .
Proof sketch. Define the potential as Where is the number of nodes in the subtree rooted at (including ). Define the rank .
The amortised cost of a splay step at node with parent and grandparent is Where primes denote ranks after the step.
- Zig: .
- Zig-zig: .
- Zig-zag: .
Summing over all splay steps: .
Corollary. A sequence of splay tree operations takes amortised time.
:::