Query Optimisation
7.1 Query Processing Pipeline
7.2 Cost-Based Optimisation
The optimiser estimates the cost of alternative execution plans and chooses the cheapest.
Cost model. Cost = I/O cost (disk page accesses) + CPU cost. For disk-bound queries, I/O Dominates.
Catalog …/4-statistics-and-probability/2_statistics: Table cardinality (), attribute value cardinality, number of distinct Values, histogram of value distribution, index information.
Selectivity estimation. For a predicate The selectivity is approximately where is the number of distinct values of in .
| Predicate type | Selectivity estimate |
|---|---|
7.3 Join Algorithms
Nested-loop join. For each tuple in Scan all of .
If one relation fits in memory, buffer it and scan the other: cost = .
Block nested-loop join. Use buffer pages. Load blocks of into buffers, scan With the remaining buffer.
Sort-merge join. Sort both relations on the join attribute, then merge.
Efficient for large relations, especially when both are already sorted.
Hash join. Build a hash table on the smaller relation (build phase), then probe with the larger (probe phase).
Best for equi-joins when one relation fits in memory.
Index nested-loop join. For each tuple in Use an index on to find matching tuples.
Efficient if has an index on the join attribute and is small.
7.4 Query Plan Selection
The optimiser explores the space of equivalent logical plans and physical implementations. For Joins, the number of join orderings is (left-deep trees) or (bushy trees). Practical optimisers use dynamic programming with pruning.
Heuristic transformations:
- Push selections down (reduce intermediate result sizes).
- Push projections down (reduce column widths).
- Convert cross products to joins when possible.
- Reorder joins based on estimated cardinalities.