Skip to content

Query Optimisation

7.1 Query Processing Pipeline

SQLparseASTrewriteLogicalplanoptimisePhysicalplanexecuteResult\mathrm{SQL} \xrightarrow{\mathrm{parse} \mathrm{AST} \xrightarrow{\mathrm{rewrite} \mathrm{Logical} plan \xrightarrow{\mathrm{optimise} \mathrm{Physical} plan \xrightarrow{\mathrm{execute} \mathrm{Result}}}}}

7.2 Cost-Based Optimisation

The optimiser estimates the cost of alternative execution plans and chooses the cheapest.

Cost model. Cost = I/O cost (disk page accesses) + CPU cost. For disk-bound queries, I/O Dominates.

Catalog …/4-statistics-and-probability/2_statistics: Table cardinality (nn), attribute value cardinality, number of distinct Values, histogram of value distribution, index information.

Selectivity estimation. For a predicate σA=v(R)\sigma_{A = v}(R)The selectivity is approximately 1/V(A,R)1 / V(A, R) where V(A,R)V(A, R) is the number of distinct values of AA in RR.

Predicate typeSelectivity estimate
A=vA = v1/V(A,R)1 / V(A, R)
A>vA \gt v(max(A)v)/(max(A)min(A))(\max(A) - v) / (\max(A) - \min(A))
A1=v1A2=v2A_1 = v_1 \land A_2 = v_21/V(A1)×1/V(A2)1 / V(A_1) \times 1 / V(A_2)
A1=v1A2=v2A_1 = v_1 \lor A_2 = v_21/V(A1)+1/V(A2)1/(V(A1)×V(A2))1/V(A_1) + 1/V(A_2) - 1/(V(A_1) \times V(A_2))

7.3 Join Algorithms

Nested-loop join. For each tuple in RRScan all of SS.

Cost=nRnSpageaccesses(worstcase)\mathrm{Cost} = n_R \cdot n_S \mathrm{ page} accesses (worst case)

If one relation fits in memory, buffer it and scan the other: cost = nR+nSn_R + n_S.

Block nested-loop join. Use BB buffer pages. Load blocks of RR into B2B - 2 buffers, scan SS With the remaining buffer.

Cost=nR+nR/(B2)nS\mathrm{Cost} = n_R + \lceil n_R / (B - 2) \rceil \cdot n_S

Sort-merge join. Sort both relations on the join attribute, then merge.

Cost=2nRlogB1(nR)+2nSlogB1(nS)+nR+nS\mathrm{Cost} = 2 \cdot n_R \cdot \log_{B-1}(n_R) + 2 \cdot n_S \cdot \log_{B-1}(n_S) + n_R + n_S

Efficient for large relations, especially when both are already sorted.

Hash join. Build a hash table on the smaller relation (build phase), then probe with the larger (probe phase).

Cost=3(nR+nS)(ifbuildrelationfitsinmemory)\mathrm{Cost} = 3 \cdot (n_R + n_S) \mathrm{ (if build relation fits in memory)}

Best for equi-joins when one relation fits in memory.

Index nested-loop join. For each tuple in RRUse an index on SS to find matching tuples.

Cost=nR(indexlookupcost)\mathrm{Cost} = n_R \cdot (\mathrm{index} lookup cost)

Efficient if SS has an index on the join attribute and nRn_R is small.

7.4 Query Plan Selection

The optimiser explores the space of equivalent logical plans and physical implementations. For kk Joins, the number of join orderings is O(k!)O(k!) (left-deep trees) or O(3k)O(3^k) (bushy trees). Practical optimisers use dynamic programming with pruning.

Heuristic transformations:

  • Push selections down (reduce intermediate result sizes).
  • Push projections down (reduce column widths).
  • Convert cross products to joins when possible.
  • Reorder joins based on estimated cardinalities.