Skip to content

Data Structures (Advanced)

1. Balanced Search Trees

1.1 Red-Black Trees

A red-black tree is a self-balancing BST satisfying five invariants:

  1. Every node is either red or black.
  2. The root is black.
  3. Every leaf (NIL sentinel) is black.
  4. If a node is red, both its children are black (no two consecutive reds on any path).
  5. For every node, all simple paths from the node to descendant leaves contain the same number of black nodes. This number is the black-height of the node.

Theorem 1.1. A red-black tree with nn internal nodes has height h2log2(n+1)h \leq 2 \log_2(n + 1).

Proof. By invariant 4, on any path from root to leaf, at most half the nodes (rounded up) can be red. So the length of any root-to-leaf path is at most twice the black-height bhbh of the root. We now bound bhbh in terms of nn.

Consider the subtree rooted at any node xx. If this subtree has height hxh_xThen it contains at least 2bh(x)12^{bh(x)} - 1 internal nodes (…/1-number-and-algebra/3_proof-and-logic by induction on hxh_x: if xx is a leaf, it has 0=2010 = 2^0 - 1 internal nodes; otherwise, each child has black-height at least bh(x)1bh(x) - 1 if it is red, or bh(x)bh(x) if it is black, so each child has at least 2bh(x)112^{bh(x)-1} - 1 internal nodes, giving at least 2(2bh(x)11)+1=2bh(x)12(2^{bh(x)-1} - 1) + 1 = 2^{bh(x)} - 1 for xx).

Therefore n2bh(root)1n \geq 2^{bh(\mathrm{root})} - 1Giving bh(root)log2(n+1)bh(\mathrm{root}) \leq \log_2(n+1). Since h2bh(root)h \leq 2 \cdot bh(\mathrm{root})We have h2log2(n+1)h \leq 2\log_2(n+1). \blacksquare

Corollary. Search, insert, and delete in a red-black tree take O(logn)O(\log n) time.

1.1.1 Insertion

Insertion follows the standard BST insert, then colours the new node red. This may violate invariant 4 (two consecutive reds). Fix-up uses at most three rotations and recolouring.

Cases after insertion (let zz be the newly inserted red node, pp its parent, uu its uncle, and gg its grandparent):

CaseUncle colourConfigurationFix
1RedAnyRecolour pp and uu black, gg red; push problem up to gg
2Blackzz and pp same direction (left-left or right-right)Single rotation at gg; recolour pp black, gg red
3Blackzz and pp opposite directions (left-right or right-left)Double rotation: first at ppThen at gg; recolour zz black, gg red

Theorem 1.2. Insertion into a red-black tree with nn nodes takes O(logn)O(\log n) time and performs at most 2 rotations.

Proof. The BST insert takes O(logn)O(\log n) time. The fix-up loop ascends the tree. Each iteration either terminates (cases 2 and 3 perform one or two rotations and terminate) or moves the problem two levels up (case 1 recolours and continues with gg). Since the tree height is O(logn)O(\log n)The loop runs O(logn)O(\log n) times, but only cases 2 and 3 involve rotations, and at most one of these is reached. \blacksquare

Worked Example: Red-Black Tree Insertion

Insert keys 7, 3, 18, 10, 22, 8, 11, 26, 2, 6 into an initially empty red-black tree.

Insert 7: Tree is empty. 7 becomes the root, coloured black.

7(B)

Insert 3: BST insert left of 7, colour red. Parent 7 is black — no violation.

7(B)
/
3(R)

Insert 18: BST insert right of 7, colour red. Parent 7 is black — no violation.

7(B)
/ \
3(R) 18(R)

Insert 10: BST insert left of 18, colour red. Parent 18 is red — violation. Uncle 3 is red (case 1). Recolour: 3 and 18 become black, 7 becomes red. Now 7 is the root, so colour it black.

7(B)
/ \
3(B) 18(B)
/
10(R)

Insert 22: BST insert right of 18, colour red. Parent 18 is black — no violation.

7(B)
/ \
3(B) 18(B)
/ \
10(R) 22(R)

Insert 8: BST insert left of 10, colour red. Parent 10 is red — violation. Uncle 22 is red (case 1). Recolour: 10 and 22 black, 18 red. Now 18”s parent 7 is black — no further violation.

7(B)
/ \
3(B) 18(R)
/ \
10(B) 22(B)
/
8(R)

Insert 11: BST insert right of 10, colour red. Parent 10 is black — no violation.

7(B)
/ \
3(B) 18(R)
/ \
10(B) 22(B)
/ \
8(R) 11(R)

Insert 26: BST insert right of 22, colour red. Parent 22 is black — no violation.

Insert 2: BST insert left of 3, colour red. Parent 3 is black — no violation.

Insert 6: BST insert right of 3, colour red. Parent 3 is red — violation. Uncle 18 is red (case 1). Recolour: 3 and 18 black, 7 red. Root must be black.

Final tree:

7(B)
/ \
3(B) 18(B)
/ \ / \
2(R) 6(R) 10(R) 22(B)
/ \ \
8(B) 11(B) 26(R)

Verify: every root-to-leaf path has exactly 2 black internal nodes (plus the black NIL). No two consecutive reds. Root is black.

1.1.2 Deletion

Deletion is more complex than insertion. The key difficulty arises when removing a black node, which changes the black-height of paths.

Algorithm outline:

  1. Perform standard BST deletion. If the removed node yy is black, let xx be the node that replaces yy in its position. The “extra black” is pushed onto xx.
  2. If xx is red-black (red with an extra black), colour it black.
  3. Otherwise, fix the “double-black” violation at xx using the fix-up procedure.

Fix-up cases (let ww be the sibling of xx):

CaseSibling wwww‘s childrenFix
1RedAnyLeft-rotate at parent, recolour; reduces to cases 2—4
2BlackBoth blackRecolour ww red; push double-black up to parent
3BlackLeft child red, right child blackRight-rotate at wwRecolour children; reduces to case 4
4BlackRight child redLeft-rotate at parent, recolour ww and parent’s parent; terminate

Theorem 1.3. Deletion from a red-black tree with nn nodes takes O(logn)O(\log n) time and performs at most 3 rotations.

Proof. BST deletion is O(logn)O(\log n). The fix-up loop either terminates (cases 1, 3, and 4 involve rotations and terminate or move to case 4 which terminates) or moves the problem one level up (case 2). At most O(logn)O(\log n) iterations, but at most 3 rotations total (case 1 does one, then case 3 does one, then case 4 does one). \blacksquare

Worked Example: Red-Black Tree Deletion

Starting from the tree built in the previous example, delete key 18.

7(B)
/ \
3(B) 18(B)
/ \ / \
2(R) 6(R) 10(R) 22(B)
/ \ \
8(B) 11(B) 26(R)

The node 18 has two children, so we find its successor (22) and replace 18 with 22, then delete the original 22 node. The successor 22 has only a right child (26). Since 22 is black, removing it creates a double-black at 26.

Double-black at 26 (x = 26). Sibling ww is NIL (black). This is case 2 with NIL sibling.

Actually, let me reconsider. When we replace 18 with 22, the structure becomes:

7(B)
/ \
3(B) 22(B)
/ \ /
2(R) 6(R) 10(R)
/ \
8(B) 11(B)
\
26(R)

Wait — we need to handle this more carefully. Let us replace 18’s key with 22’s key and remove the original 22 node. Node 22 had one right child (26, red). We replace 22 with 26 and colour 26 black (since 22 was black). No double-black violation arises because 26 was red and is recoloured to black.

7(B)
/ \
3(B) 22(B)
/ \ /
2(R) 6(R) 10(R)
/ \
8(B) 11(B)

Node 26 is now detached. The tree is still a valid red-black tree. Deletion complete with 0 rotations.

:::caution Common Pitfall The most common error in red-black tree deletion is forgetting to handle the case where the node to be deleted has two children. In this case, one must find the successor (or predecessor), copy its key/value to the node being deleted, and then delete the successor node from its original position. The successor always has at most one child, simplifying the actual removal. :::

1.2 B-Trees and B+ Trees

1.2.1 B-Trees

A B-tree of minimum degree tt (where t2t \geq 2) is a rooted tree satisfying:

  1. Every node other than the root has at least t1t - 1 keys. The root has at least 1 key.
  2. Every node has at most 2t12t - 1 keys.
  3. Every internal node with kk keys has exactly k+1k + 1 children.
  4. All leaves appear at the same depth.

Theorem 1.4. A B-tree of height hh with minimum degree t2t \geq 2 has at least 2th12t^{h-1} leaves (for h1h \geq 1) and stores at least n2th11n \geq 2t^{h-1} - 1 keys.

Proof. The root has at least 2 children (since it has at least 1 key). Every internal node at depth 1\geq 1 has at least tt children. Therefore the number of nodes at depth dd is at least 2td12t^{d-1} for d1d \geq 1. The number of leaves (at depth hh) is at least 2th12t^{h-1}. Each leaf has at least t1t - 1 keys, so n2th1(t1)2th11n \geq 2t^{h-1}(t-1) \geq 2t^{h-1} - 1 (for t2t \geq 2). \blacksquare

Corollary. The height of a B-tree storing nn keys is hlogtn+12=O(logtn)h \leq \log_t \frac{n+1}{2} = O(\log_t n).

For example, with t=1001t = 1001 and n=109n = 10^9, hlog1001(5×108)2.8h \leq \log_{1001}(5 \times 10^8) \approx 2.8So at most 3 disk accesses.

1.2.2 B-Tree Operations

Search. Starting at the root, scan the keys in the current node for the target. If found, return. Otherwise, determine which child pointer to follow and recurse. Takes O(logtn)O(\log_t n) node accesses.

Insertion. Insert the key into the appropriate leaf. If the leaf overflows (has 2t2t keys), split it: promote the median key to the parent, and create two new nodes each with t1t - 1 keys. If the parent overflows, split it recursively. If the root splits, create a new root with one key and two children.

Split cost. Splitting a node costs O(t)O(t) time (copying keys and children) but occurs infrequently. The amortised cost of insertion is O(logtn+t)O(\log_t n + t).

Worked Example: B-Tree Insertion ($t = 2$)

A B-tree with minimum degree t=2t = 2 allows 1—3 keys per node (and 2—4 children per internal node).

Insert keys: 10, 20, 5, 6, 12, 30, 7, 17.

Insert 10: Root = [10]. (1 key, valid)

Insert 20: Root = [10, 20]. (2 keys, valid)

Insert 5: Root = [5, 10, 20]. (3 keys, valid)

Insert 6: Root = [5, 6, 10, 20]. (4 keys = 2t2tOverflow!) Split: promote 6, create [5] and [10, 20].

[6]
/ \
[5] [10, 20]

Insert 12: Insert into [10, 20]: [10, 12, 20]. (3 keys, valid)

Insert 30: Insert into [10, 12, 20]: [10, 12, 20, 30]. (4 keys, overflow!) Split: promote 20, create [10, 12] and [30].

[6, 20]
/ | \
[5] [10,12] [30]

Insert 7: Insert into [5]: [5, 7]. (2 keys, valid)

Insert 17: Insert into [10, 12]: [10, 12, 17]. (3 keys, valid)

Final B-tree:

[6, 20]
/ | \
[5,7] [10,12,17] [30]

Deletion. Deletion from a B-tree is more complex. The key cases are:

  1. Key in a leaf with excess keys (>t1> t - 1): remove.
  2. Key in a leaf with minimum keys (t1t - 1): try to borrow from an immediate sibling (if the sibling has >t1> t - 1 keys, rotate through the parent). If no sibling can lend, merge with a sibling (this reduces the parent’s key count by 1, which may cascade upward).
  3. Key in an internal node: replace with the predecessor (or successor), then delete the predecessor from its leaf position (which reduces to cases 1 or 2).

Theorem 1.5. Deletion from a B-tree takes O(tlogtn)O(t \log_t n) time in the worst case.

Worked Example: B-Tree Deletion ($t = 2$)

Starting from the B-tree:

[6, 20]
/ | \
[5,7] [10,12,17] [30]

Delete key 12. Key 12 is in an internal node. Find predecessor: 10 (in the same node, left of 12). Replace 12 with 10, then delete 10 from its position.

Wait — actually in this B-tree, 12 is in an internal node [10, 12, 17] which is also a leaf (it has no children). So case 1 applies: [10, 12, 17] has 3 keys > t1=1t - 1 = 1So remove 12.

[6, 20]
/ | \
[5,7] [10,17] [30]

Now delete key 20. Key 20 is in the root [6, 20]. Replace with successor: 30 (from [30]). Delete 30 from its leaf: [30] has 1 key = t1t - 1So we need to handle carefully.

Actually, we replace 20 with 30, and the leaf [30] becomes []. Since [30] is now empty and its sibling [10, 17] has 2 keys > t1t - 1We can borrow. But [10, 17] is not an immediate sibling of [30] in terms of being a child of the same parent.

Let me reconsider. The parent of [30] is the root [6, 20]. [30]‘s siblings are [5, 7] and [10, 17]. The right sibling does not exist; the left sibling is [10, 17].

Borrow from [10, 17]: move 17 up to replace 20 in the root, move 20 down to [30].

[6, 17]
/ | \
[5,7] [10] [20, 30]

Valid B-tree: all nodes have 1—3 keys, all leaves at the same depth.

1.2.3 B+ Trees

A B+ tree is a variant of the B-tree where:

  1. All data records (key-value pairs) are stored in leaf nodes only.
  2. Internal nodes store only keys for routing.
  3. Leaf nodes are linked together in a linked list (in sorted order).

Advantages over B-trees:

  • Range queries are efficient: find the start of the range, then follow leaf pointers.
  • Internal nodes have higher fan-out (they store only keys, not values), reducing tree height.
  • All leaves are at the same depth, ensuring uniform access time.

Theorem 1.6. A B+ tree of order dd (maximum dd keys per internal node) with nn records has height hlogd(n)+1h \leq \lceil \log_d(n) \rceil + 1.

1.2.4 B+ Tree vs B-Tree Comparison

PropertyB-TreeB+ Tree
Data storageAll nodesLeaf nodes only
Internal node contentKeys + dataKeys only
Leaf linkingNoYes (linked list)
Range queryMultiple tree traversalsSingle traversal + leaf scan
Fan-outLowerHigher
HeightHigherLower
Use caseGeneral-purposeDatabases, file systems

:::caution Common Pitfall Students often confuse the minimum degree tt of a B-tree with its order. A B-tree of order mm has maximum mm children per internal node, which means t=m/2t = \lceil m/2 \rceil. A B-tree of minimum degree tt has maximum 2t12t - 1 keys per node. Always verify which convention the question or textbook uses. :::

1.3 Tries and Prefix Trees

A trie (from “retrieval”) is a tree data structure for storing strings. Each node represents a prefix, and edges are labelled with characters.

Definition. A trie for a set of strings over alphabet Σ\Sigma is a rooted tree where:

  1. Each edge is labelled with a character from Σ\Sigma.
  2. Each node represents the string formed by the concatenation of edge labels from the root to that node.
  3. A node is marked as a “terminal” node if the string it represents is in the stored set.

Operations:

OperationTime ComplexityNotes
InsertO(L)O(L)LL = length of string
SearchO(L)O(L)Exact match
Prefix searchO(L+k)O(L + k)kk = number of matches
DeleteO(L)O(L)May need to prune empty nodes

Space. A trie storing nn strings of total length NN uses O(N)O(N) nodes. Each node has up to Σ|\Sigma| children pointers.

Theorem 1.7. A trie storing nn distinct strings over alphabet Σ\Sigma has at most nLmaxn \cdot L_{\max} nodes, where LmaxL_{\max} is the maximum string length.

1.3.1 Compressed Tries (Radix Trees)

A compressed trie (or radix tree or Patricia trie) eliminates chains of single-child nodes by compressing the edge labels. Each edge stores a substring rather than a single character.

Space savings. If the strings share long common prefixes, compression significantly reduces the number of nodes.

Worked Example: Trie Construction

Insert the strings: “apple”, “app”, “application”, “apt”, “bat”, “bar”.

Standard trie (only showing non-null paths):

(root)
├── a
│ ├── p
│ │ ├── p [terminal: "app"]
│ │ │ ├── l
│ │ │ │ └── e [terminal: "apple"]
│ │ │ └── l
│ │ │ └── i
│ │ │ └── c
│ │ │ └── a
│ │ │ └── t
│ │ │ └── i
│ │ │ └── o
│ │ │ └── n [terminal: "application"]
│ │ └── t [terminal: "apt"]
│ └── ...
├── b
│ ├── a
│ │ ├── t [terminal: "bat"]
│ │ └── r [terminal: "bar"]
│ └── ...

Compressed trie:

(root)
├── "app" [terminal]
│ ├── "le" [terminal: "apple"]
│ └── "lication" [terminal: "application"]
├── "apt" [terminal]
├── "bat" [terminal]
└── "bar" [terminal]

The compressed trie has only 7 nodes instead of 21+ nodes in the standard trie.

1.3.2 Ternary Search Trees

A ternary search tree (TST) is a hybrid between a trie and a BST. Each node stores a single character and has three children: left (smaller character), middle (equal character, continue), and right (larger character).

Properties:

  • Space: O(N)O(N) where NN is the total length of all strings.
  • Search: O(L+logΣ)O(L + \log |\Sigma|) per string (BST search at each character position).
  • More space-efficient than a standard trie for sparse datasets.

1.4 Splay Trees

A splay tree is a self-adjusting BST with no explicit balance information. After every access (search, insert, or delete), the accessed node is moved to the root using a sequence of rotations called splaying.

Splay operations:

  1. Zig (node is child of root): single rotation.
  2. Zig-zig (node and parent are both left or both right children): rotate parent first, then node.
  3. Zig-zag (node is left child of right child or vice versa): rotate node, then rotate node again.

Theorem 1.8 (Static Optimality). The total time for a sequence of accesses in a splay tree is O(i(1+logWwi))O\left(\sum_{i} \left(1 + \log \frac{W}{w_i}\right)\right) where W=iwiW = \sum_i w_i is the total access count and wiw_i is the number of accesses of the ii-th element.

This means splay trees are asymptotically as fast as a static optimal binary search tree for any access sequence.

Theorem 1.9 (Static Finger). The total access time is O(nlogn+nlogminjkeyikeyj)O\left(n \log n + n \log \min_{j} |\mathrm{key_i} - \mathrm{key_j}|\right) where jj is a fixed “finger” position.

Theorem 1.10 (Working Set). The access time for element ii is O(log(ti))O(\log(t_i)) where tit_i is the number of distinct elements accessed since the last access to element ii.

Worked Example: Splay Tree Operations

Build a splay tree by inserting 1, 2, 3, 4, 5 in order, then access 1.

Insert 1: Root = 1.

Insert 2: Insert 2 as right child of 1. Splay 2: zig at 1.

2
/
1

Insert 3: Insert 3 as right child of 2. Splay 3: zig-zig (3 is right child of 2, 2 is right child of root).

First rotate 2:

2
/
1

Becomes

2
\
3

Then rotate 2 again (now parent of root… Wait, let me redo).

Initial:

2
/ \
1 3

Splay 3: zig (3 is right child of root 2). Rotate 2 left:

3
/
2

Wait, this should be:

3
/
2
\
1

Hmm, let me be more careful.

Starting state after inserting 1 and splaying 2:

2
/
1

Insert 3: 3 > 2, so insert as right child.

2
/ \
1 3

Splay 3: parent of 3 is 2, and 2 is the root. This is a zig case (node is child of root). Rotate 2 left around 3.

Result:

3
/
2
/
1

Insert 4: Insert as right child of 3. Splay 4: zig (4 is right child of root 3).

4
/
3
/
2
/
1

Insert 5: Insert as right child of 4. Splay 5: zig.

5
/
4
/
3
/
2
/
1

Access 1: Search for 1 (traverse left 4 times). Splay 1: this is a zig-zig-zig-zig (all left children). Apply zig-zig repeatedly.

Zig-zig on 1-2-3: rotate 2 left, then rotate 1 left.

2
/ \
1 3
\
4
\
5

Wait, zig-zig means: node and parent are both left children. Rotate parent first, then rotate node.

Step 1: 1 is left of 2, 2 is left of 3. Rotate 2 (zig on 2): 3 becomes parent of 2, 2 goes right.

3
/
2
/ \
1 4
\
5

Step 2: Now 2 is left of 3, 3 is left of 4. Rotate 3 (zig on 3): 4 becomes parent of 3.

4
/
3
/
2
/ \
1 5

Step 3: 2 is left of 3, 3 is left of 4. Rotate 3.

4
/
3
/
2
/ \
1 5

Hmm, I need to be more systematic. Let me use the standard zig-zig rule.

Zig-zig (left-left): let the chain be g -> p -> x (all left children). First rotate p to the right of g (making p the new root of this subtree, with g as right child). Then rotate x to the right of p.

Starting:

4
/
3
/
2
/
1
\
5 (was under 4, now let's ignore 5 for simplicity)

Actually, 5 is the right child of 4. Let me include it properly.

4
/ \
3 5
/
2
/
1

Zig-zig on chain 4-3-2: rotate 3 (right of 4), then rotate 2 (right of 3).

Rotate 3 right: 3 becomes root, 4 goes to 3’s right. 4’s left was 3, so 4’s left becomes 3’s right (which was 2). So:

3
/ \
2 4
/ \
1 5

Now zig-zig on chain 3-2-1: rotate 2 (right of 3), then rotate 1 (right of 2).

Rotate 2 right: 2 becomes root, 3 goes to 2’s right. 3’s left becomes 2’s right (which is NIL… Wait, 2’s right subtree was the subtree between 2 and 3, which is empty). Actually, 2’s right child in the tree above is NIL (2 only has left child 1). So after rotating 2 right of 3:

2
/ \
1 3
\
4
/ \
5

Wait, 3’s right was 4, and 4 has left NIL and right 5. So:

2
/ \
1 3
\
4
\
5

Now 1 is left of 2, and 2 is the root. This is a zig case. Rotate 1 right of 2:

1
\
2
\
3
\
4
\
5

The tree is now right-skewed again! This is because zig-zig followed by zig on a completely skewed tree reverses the direction of skew. The amortised analysis guarantees good performance over a sequence, but individual operations can be O(n)O(n).

2. Advanced Priority Queues

2.1 Binomial Heaps

A binomial heap is a collection of binomial trees satisfying the heap property. It supports merge in O(logn)O(\log n) and insert, extract-min, and decrease-key in O(logn)O(\log n).

Binomial tree BkB_k. Defined recursively: B0B_0 is a single node. BkB_k is formed by linking two Bk1B_{k-1} trees, making one the leftmost child of the other’s root.

Properties of BkB_k:

PropertyValue
Number of nodes2k2^k
Heightkk
Number of children of rootkk
Minimum number of nodes at depth dd(kd)\binom{k}{d}

Theorem 2.1. A binomial heap with nn nodes contains at most log2n+1\lfloor \log_2 n \rfloor + 1 binomial trees.

Proof. The binary representation of nn determines which binomial trees are present. If n=i2bin = \sum_{i} 2^{b_i} (binary decomposition), then the heap contains exactly the trees Bb1,Bb2,B_{b_1}, B_{b_2}, \ldots. The number of terms is at most log2n+1\lfloor \log_2 n \rfloor + 1. \blacksquare

Operations:

  • Merge: Analogous to binary addition. Combine trees of the same degree by linking. O(logn)O(\log n).
  • Insert: Create a single-node heap and merge. O(logn)O(\log n).
  • Extract-min: Remove the root with minimum key, merge its children into a new heap, merge with the remaining heap. O(logn)O(\log n).
  • Decrease-key: Bubble the decreased key up to the root. O(logn)O(\log n).
Worked Example: Binomial Heap Operations

Create two binomial heaps and merge them.

Heap H1H_1: insert 3, 7, 1.

  • Insert 3: H1={B0:3}H_1 = \{B_0: 3\}
  • Insert 7: H1={B0:3}{B0:7}H_1 = \{B_0: 3\} \cup \{B_0: 7\}. Link: H1={B1:root3,child7}H_1 = \{B_1: \text{root} 3, \text{child} 7\}
  • Insert 1: H1={B1:(3,7)}{B0:1}H_1 = \{B_1: (3,7)\} \cup \{B_0: 1\}. No linking (different degrees). H1={B0:1,B1:(3,7)}H_1 = \{B_0: 1, B_1: (3,7)\}

Heap H2H_2: insert 5, 2.

  • Insert 5: H2={B0:5}H_2 = \{B_0: 5\}
  • Insert 2: H2={B0:5}{B0:2}H_2 = \{B_0: 5\} \cup \{B_0: 2\}. Link: H2={B1:root2,child5}H_2 = \{B_1: \text{root} 2, \text{child} 5\}

Merge H1H_1 and H2H_2: {B0:1,B1:(3,7)}{B1:(2,5)}\{B_0: 1, B_1: (3,7)\} \cup \{B_1: (2,5)\}.

Both have B1B_1 trees. Link them (root 2 < root 3, so 3 becomes child of 2): B2B_2: root 2, children: 3 (with child 7), 5.

Final merged heap: {B0:1,B2:root2,children[3,5],3schild7}\{B_0: 1, B_2: \text{root} 2, \text{children} [3, 5], \text{3}'s child 7\}

Minimum element: 1 (root of B0B_0).

Extract-min: remove B0:1B_0: 1. Remaining: {B2:2,}\{B_2: 2, \ldots\}. Minimum is now 2.

2.2 Fibonacci Heaps

A Fibonacci heap is a collection of min-heap-ordered trees supporting:

OperationAmortised CostNote
InsertO(1)O(1)Add tree to root list
Find-minO(1)O(1)Pointer to min root
Extract-minO(logn)O(\log n)Consolidate trees
Decrease-keyO(1)O(1)Cut and cascade
DeleteO(logn)O(\log n)Decrease-key to -\inftyThen extract-min
UnionO(1)O(1)Concatenate root lists

Key difference from binomial heaps: Fibonacci heaps allow trees to have any structure (not just binomial trees). Trees are only restructured during extract-min (consolidation). Decrease-key may cut a subtree from its parent and add it to the root list (“cascading cut”).

Lazy merging. Two heaps are merged by concatenating their root lists in O(1)O(1) time.

Consolidation. During extract-min, all roots are combined by degree using an array. Trees of the same degree are linked.

Theorem 2.2. The amortised cost of extract-min is O(D(n))O(D(n)) where D(n)D(n) is the maximum degree of any node in the Fibonacci heap.

Theorem 2.3. D(n)=O(logϕn)D(n) = O(\log_\phi n) where ϕ=(1+5)/2\phi = (1 + \sqrt{5})/2 is the golden ratio.

Proof (outline). Define the potential Φ=t(H)+2m(H)\Phi = t(H) + 2m(H) where t(H)t(H) is the number of trees and m(H)m(H) is the number of marked nodes. Show that each operation’s amortised cost is bounded. For decrease-key, the actual cost is O(c)O(c) where cc is the number of cascading cuts. The change in potential is at most c+22m(H)(termscancel)c + 2 - 2m'(H) \cdot (\text{terms} cancel)Giving O(1)O(1) amortised. \blacksquare

:::caution Common Pitfall Fibonacci heaps have excellent amortised bounds but poor constant factors in practice due to the overhead of maintaining the root list, marking nodes, and consolidation. For this reason, binary heaps (or pairing heaps) are often preferred in practice despite worse theoretical amortised bounds for decrease-key. :::

2.3 Pairing Heaps

A pairing heap is a simplified alternative to Fibonacci heaps. It is a single min-heap-ordered multi-way tree.

Operations:

  • Find-min: O(1)O(1) (return root).
  • Insert: O(1)O(1) (add as leftmost child of root).
  • Merge: O(1)O(1) (make the larger root a child of the smaller root).
  • Delete-min: Two-pass merging of all children: first pass pairs adjacent children, second pass merges the resulting trees left to right.
  • Decrease-key: O(1)O(1) (cut the node and merge with root).

Theorem 2.4. Insert, find-min, merge, and decrease-key take O(1)O(1) amortised time. Delete-min takes O(logn)O(\log n) amortised time.

Proof sketch. The potential function is Φ=logn\Phi = \log n per subtree (where nn is the subtree size). The key insight is that the two-pass merge of delete-min reduces the potential by at least logn\log nAbsorbing the O(n)O(n) work of merging. \blacksquare

Note. Whether pairing heaps achieve O(logn)O(\log n) amortised delete-min was an open problem for many years. It was resolved by Iacono and Ozturk (2018) who proved O(logn)O(\log n) for a variant called the “original pairing heap” and gave a Ω(loglogn)\Omega(\log \log n) lower bound for a simpler variant.

3. Advanced Graph Representations

3.1 Incidence Matrix

The incidence matrix MM of an undirected graph G=(V,E)G = (V, E) with nn vertices and mm edges is an n×mn \times m matrix where:

Mv,e={1ifvertexv isincidenttoedgee0otherwiseM_{v,e} = \begin{cases} 1 & \text{if} vertex v \text{ is} incident to edge e \\ 0 & \text{otherwise} \end{cases}

For directed graphs, Mv,e=1M_{v,e} = 1 if vv is the tail of ee, Mv,e=1M_{v,e} = -1 if vv is the head of eeAnd 00 otherwise.

Properties:

  • Space: O(nm)O(nm).
  • The rank of MM over R\mathbb{R} is ncn - c where cc is the number of connected components.
  • The number of spanning trees of GG equals any cofactor of MMTMM^T (Kirchhoff’s matrix tree theorem).

3.2 Implicit Graphs

Many graphs are not stored explicitly but defined by a rule or function. Examples:

  • State space graphs: Each vertex is a configuration; edges are valid transitions. Example: the 15-puzzle has 16!/2101316!/2 \approx 10^{13} states.
  • Geometric graphs: Vertices are points in Rd\mathbb{R}^d; edges connect nearby points. Example: Delaunay triangulation.
  • Social networks: Vertices are users; edges are friendships.

Implicit graph traversal requires a successor function that generates neighbours on-the-fly, rather than storing them.

3.3 Compressed Graph Representations

For massive graphs that do not fit in memory:

  • CSR (Compressed Sparse Row): Two arrays: offsets (size V+1|V|+1) and neighbours (size 2E2|E|). offsets[i] to offsets[i+1]-1 gives the range of neighbours for vertex ii. Space: O(V+E)O(V + E)Cache-friendly.
  • WebGraph framework: Compresses web graphs using gap encoding, reference chains, and interval encoding. Achieves 2—3 bits per edge for typical web graphs.
Worked Example: CSR Construction

Graph: vertices 0, 1, 2, 3 with edges: 0-1, 0-2, 1-2, 2-0, 2-3, 3-3.

Neighbour lists (sorted):

  • Vertex 0: [1, 2]
  • Vertex 1: [2]
  • Vertex 2: [0, 3]
  • Vertex 3: [3]

Offsets array: [0, 2, 3, 5, 6] Neighbours array: [1, 2, 2, 0, 3, 3]

To find neighbours of vertex 2: neighbours[offsets[2]..offsets[3]-1] = neighbours[3..4] = [0, 3].

Total space: (V+1)+2E=5+12=17(|V|+1) + 2|E| = 5 + 12 = 17 integers.

4. Disjoint Set Union — Advanced Analysis

4.1 The Inverse Ackermann Function

The inverse Ackermann function α(n)\alpha(n) is defined in terms of a rapidly growing function Ak(j)A_k(j):

Ak(j)={2jifk=00ifj=0 andk1Ak1(Ak(j1))ifj1 andk1A_k(j) = \begin{cases} 2j & \text{if} k = 0 \\ 0 & \text{if} j = 0 \text{ and} k \geq 1 \\ A_{k-1}(A_k(j-1)) & \text{if} j \geq 1 \text{ and} k \geq 1 \end{cases}

α(n)=min{k:Ak(1)n}\alpha(n) = \min\{k : A_k(1) \geq n\}

Key values: α(1)=0\alpha(1) = 0, α(2)=1\alpha(2) = 1, α(4)=2\alpha(4) = 2, α(16)=3\alpha(16) = 3, α(265536)=4\alpha(2^{65536}) = 4.

For all practical purposes, α(n)4\alpha(n) \leq 4.

4.2 Detailed Analysis of Union-Find

Theorem 4.1. A sequence of mm MakeSet, Union, and Find operations on nn elements, using union by rank with path compression, takes O(mα(n))O(m \alpha(n)) time.

Proof (high-level outline).

Define the “level” of a node based on its rank. The key idea is to partition the nodes into groups and bound the total charges.

Let Ak(j)A_k(j) be as defined above. Node xx has level \ell if rank(x)[A(log2n),A+1(log2n))\text{rank}(x) \in [A_\ell(\lfloor \log_2 n \rfloor), A_{\ell+1}(\lfloor \log_2 n \rfloor)).

For a Find operation along a path x1,x2,,xkx_1, x_2, \ldots, x_kPath compression makes all nodes point to the root. We charge the cost of the Find as follows:

  • For each node xix_i that is not the root, if xix_i‘s parent changes, charge O(1)O(1) to xix_i.
  • We need to show each node is charged O(α(n))O(\alpha(n)) times total.

The …/1-number-and-algebra/3_proof-and-logic proceeds by showing that a node at level \ell can be charged at most a constant number of times before it moves to a higher level or its parent’s level is too high for further charges. The total charges sum to O(nα(n)+m)O(n \alpha(n) + m). \blacksquare

4.3 Union-Find with Partial Persistence

A persistent data structure preserves all previous versions. Partial persistence allows queries on any previous version but modifications only on the latest version.

Persistent Union-Find (Bender et al., 2002): Supports mm operations on nn elements with O(loglogn)O(\log \log n) amortised time per operation and O(mα(n))O(m \alpha(n)) total space.

5. Interval Trees and Segment Trees

5.1 Interval Trees

An interval tree stores a set of intervals [li,ri][l_i, r_i] and supports:

  • Insert an interval: O(logn)O(\log n).
  • Delete an interval: O(logn)O(\log n).
  • Query: find all intervals overlapping a given point qq (or interval). O(logn+k)O(\log n + k) where kk is the number of reported intervals.

Structure. An augmented BST where:

  • In-order traversal of keys gives the intervals sorted by their left endpoint (or by midpoint).
  • Each node stores a key xmidx_{\mathrm{mid}} (the median endpoint) and a max-endpoint for the subtree.

Query algorithm: Starting at the root, compare qq with xmidx_{\mathrm{mid}}:

  1. If q<xmidq < x_{\mathrm{mid}}: report all intervals in the left subtree that overlap qq (check max-endpoint), then recurse into the left subtree. Also check if any interval stored at the current node overlaps qq.
  2. If qxmidq \geq x_{\mathrm{mid}}: similar for the right subtree.

Theorem 5.1. Query in an interval tree takes O(logn+k)O(\log n + k) time where kk is the number of reported intervals.

5.2 Segment Trees

A segment tree stores an array A[1..n]A[1..n] and supports:

  • Point update: set A[i]=vA[i] = v. O(logn)O(\log n).
  • Range query: compute combine(A[l],A[l+1],,A[r])\mathrm{combine}(A[l], A[l+1], \ldots, A[r]) for any associative operation (sum, min, max, gcd). O(logn)O(\log n).

Structure. A binary tree where:

  • The root represents the range [1,n][1, n].
  • Each internal node represents a sub-range, split in half between its children.
  • Leaves represent individual elements.

Space. O(n)O(n) (at most 4n4n nodes in the array representation).

Theorem 5.2. A segment tree on nn elements uses O(n)O(n) space and supports point update and range query in O(logn)O(\log n) time each.

Proof. The segment tree is a complete binary tree of height log2n\lceil \log_2 n \rceil. The number of nodes is at most 22log2n+14n2 \cdot 2^{\lceil \log_2 n \rceil + 1} \leq 4n. Each update or query visits at most 2log2n2 \log_2 n nodes (one per level, on each side). \blacksquare

Worked Example: Segment Tree Range Sum

Array A=[3,1,4,1,5,9,2,6]A = [3, 1, 4, 1, 5, 9, 2, 6], n=8n = 8.

Build the segment tree for range sum queries:

[0,7]: sum=31
/ \
[0,3]: sum=9 [4,7]: sum=22
/ \ / \
[0,1]: sum=4 [2,3]:5 [4,5]:14 [6,7]:8
/ \ / \ / \ / \
[0]:3 [1]:1 [2]:4 [3]:1 [4]:5 [5]:9 [6]:2 [7]:6

Query: sum of A[2..5]A[2..5].

  1. Start at root [0,7]. Need [2,5]. This is contained in [0,7] but not equal. Split: go left [0,3] and right [4,7].
  2. [0,3] vs [2,5]: overlap is [2,3]. Split [0,3]: go right to [2,3].
  3. [2,3] vs [2,5]: [2,3] is contained in [2,5]. Add sum = 5.
  4. [4,7] vs [2,5]: overlap is [4,5]. Split [4,7]: go left to [4,5].
  5. [4,5] vs [2,5]: [4,5] is contained in [2,5]. Add sum = 14.

Total: 5+14=195 + 14 = 19. Verify: A[2]+A[3]+A[4]+A[5]=4+1+5+9=19A[2] + A[3] + A[4] + A[5] = 4 + 1 + 5 + 9 = 19. Correct.

Update: set A[3]=10A[3] = 10.

  1. Start at root [0,7]. Go left to [0,3], then right to [2,3], then right to [3].
  2. Set leaf [3] = 10.
  3. Update [2,3]: 4+10=144 + 10 = 14.
  4. Update [0,3]: 4+14=184 + 14 = 18.
  5. Update [0,7]: 18+22=4018 + 22 = 40.

All updates take O(logn)=O(3)O(\log n) = O(3) steps.

5.3 Fenwick Trees (Binary Indexed Trees)

A Fenwick tree (BIT) is a space-efficient alternative to the segment tree for prefix sum queries and point updates.

Structure. An array BIT[1..n] where BIT[i] stores the sum of a specific range ending at index ii. The range is determined by the lowest set bit of ii: if lsb(i)=2k\mathrm{lsb}(i) = 2^kThen BIT[i] stores the sum of A[i2k+1..i]A[i - 2^k + 1..i].

Operations:

  • Prefix sum j=1iA[j]\sum_{j=1}^{i} A[j]: Traverse BIT by removing lowest set bits. O(logn)O(\log n).
  • Point update A[i]+=δA[i] += \delta: Traverse BIT by adding lowest set bits. O(logn)O(\log n).
  • Range sum A[l..r]A[l..r]: prefix(r)prefix(l1)\mathrm{prefix}(r) - \mathrm{prefix}(l-1). O(logn)O(\log n).

Advantages over segment trees: Simpler to implement, lower constant factor, O(n)O(n) space (exactly n+1n+1).

Limitations: Only supports prefix queries (not arbitrary range combinations unless the operation has an inverse, like sum or XOR).

Theorem 5.3. A Fenwick tree on nn elements supports prefix query and point update in O(logn)O(\log n) time and uses O(n)O(n) space.

Worked Example: Fenwick Tree Construction

Array A=[3,1,4,1,5,9,2,6]A = [3, 1, 4, 1, 5, 9, 2, 6], n=8n = 8.

Binary representations: 1=001, 2=010, 3=011, 4=100, 5=101, 6=110, 7=111, 8=1000.

BIT[i] stores sum of A[i2k+1..i]A[i - 2^k + 1..i] where k=lsb(i)k = \mathrm{lsb}(i).

  • BIT[1] = A[1]A[1] = 3 (lsb=1, range [1,1])
  • BIT[2] = A[1]+A[2]A[1] + A[2] = 4 (lsb=2, range [1,2])
  • BIT[3] = A[3]A[3] = 4 (lsb=1, range [3,3])
  • BIT[4] = A[1]+A[2]+A[3]+A[4]A[1] + A[2] + A[3] + A[4] = 9 (lsb=4, range [1,4])
  • BIT[5] = A[5]A[5] = 5 (lsb=1, range [5,5])
  • BIT[6] = A[5]+A[6]A[5] + A[6] = 14 (lsb=2, range [5,6])
  • BIT[7] = A[7]A[7] = 2 (lsb=1, range [7,7])
  • BIT[8] = A[1]++A[8]A[1] + \cdots + A[8] = 31 (lsb=8, range [1,8])

Query prefix sum up to 6: BIT[6] + BIT[4] = 14 + 9 = 23. Verify: 3+1+4+1+5+9=233 + 1 + 4 + 1 + 5 + 9 = 23. Correct.

Query prefix sum up to 5: BIT[5] + BIT[4] = 5 + 9 = 14. Verify: 3+1+4+1+5=143 + 1 + 4 + 1 + 5 = 14. Correct.

Range sum A[3..6]A[3..6]: prefix(6) - prefix(2) = 23 - 4 = 19. Verify: 4+1+5+9=194 + 1 + 5 + 9 = 19. Correct.

Update A[3]+=5A[3] += 5: Update BIT[3] += 5 (BIT[3] = 9). Then BIT[4] += 5 (BIT[4] = 14). Then BIT[8] += 5 (BIT[8] = 36).

6. String Data Structures

6.1 Suffix Arrays

A suffix array SA\mathrm{SA} of a string SS of length nn is a permutation of {0,1,,n1}\{0, 1, \ldots, n-1\} such that S[SA[0]..]<S[SA[1]..]<<S[SA[n1]..]S[\mathrm{SA}[0]..] < S[\mathrm{SA}[1]..] < \cdots < S[\mathrm{SA}[n-1]..] (lexicographic order of suffixes).

Construction: The most efficient algorithm (SA-IS) constructs the suffix array in O(n)O(n) time. A simpler approach uses radix sort on prefixes of length 1,2,4,8,1, 2, 4, 8, \ldots in O(nlog2n)O(n \log^2 n) time.

Theorem 6.1. The suffix array of a string of length nn can be constructed in O(n)O(n) time (SA-IS algorithm) or O(nlogn)O(n \log n) time (using the doubling algorithm with radix sort).

6.2 Suffix Trees

A suffix tree for a string SS of length nn is a compressed trie of all nn suffixes of SS (plus a unique terminator \text{\}$).

Theorem 6.2 (Ukkonen). A suffix tree can be constructed in O(n)O(n) time online.

Theorem 6.3. A suffix tree for a string of length nn has at most 2n2n nodes and at most 2n12n - 1 edges.

Applications:

ProblemSuffix tree solutionTime
Exact pattern matchingTraverse from rootO(m+k)O(m + k)
Longest repeated substringFind deepest internal nodeO(n)O(n)
Longest common substringGeneralised suffix treeO(n+m)O(n + m)
Longest palindromeSuffix tree of SS + reverse(SS)O(n)O(n)

6.3 LCP Arrays

The LCP (Longest Common Prefix) array stores \mathrm{LCP[i] = the length of the longest common prefix of suffixes \mathrm{SA[i] and \mathrm{SA[i-1].

Theorem 6.4 (Kasai). The LCP array can be computed from the suffix array in O(n)O(n) time.

Kasai’s algorithm. Uses the inverse suffix array \mathrm{SA^{-1}[\mathrm{SA[i]] = i and processes suffixes in text order. When computing \mathrm{LCP[\mathrm{SA^{-1}[j]]The result is at least \mathrm{LCP[\mathrm{SA^{-1}[j-1]] - 1.

Worked Example: Suffix Array and LCP Array

String S = \text{banana\},,N = 7$.

All suffixes: 0: banana1:anana1: anana 2: nana3:ana3: ana 4: na5:a5: a 6: $

Sorted suffixes: 6: 5:a5: a 3: ana1:anana1: anana 0: banana4:na4: na 2: nana$

Suffix array: SA=[6,5,3,1,0,4,2]\mathrm{SA} = [6, 5, 3, 1, 0, 4, 2]

LCP array (LCP with previous suffix): LCP[0]=0\mathrm{LCP}[0] = 0 (undefined for first) LCP[1]=0\mathrm{LCP}[1] = 0 (LCP("","a", "a") = 0) LCP[2]=1\mathrm{LCP}[2] = 1 (LCP(“a","ana", "ana”) = 1) LCP[3]=3\mathrm{LCP}[3] = 3 (LCP(“ana","anana", "anana”) = 3) LCP[4]=0\mathrm{LCP}[4] = 0 (LCP(“anana","banana", "banana”) = 0) LCP[5]=0\mathrm{LCP}[5] = 0 (LCP(“banana","na", "na”) = 0) LCP[6]=2\mathrm{LCP}[6] = 2 (LCP(“na","nana", "nana”) = 2)

LCP array: [0,0,1,3,0,0,2][0, 0, 1, 3, 0, 0, 2]

Longest repeated substring: The maximum LCP value is 3, between “ana"and"anana" and "anana”, giving “ana”.

7. Amortised Analysis — Detailed Framework

7.1 The Three Methods Compared

MethodKey ideaWhen to use
AggregateTotal cost / number of operationsSimple cases where total cost is easy to compute
AccountingAssign amortised cost, accumulate creditWhen you can identify a “credit” scheme
PotentialDefine potential function Φ\PhiMost general; works when credit is hard to localise

Equivalence. All three methods give the same amortised bounds when applied correctly. The accounting method is a special case of the potential method where Φ\Phi is the total credit.

7.2 Choosing a Potential Function

A good potential function satisfies:

  1. Φ(D0)=0\Phi(D_0) = 0 (initial potential is 0).
  2. Φ(Di)0\Phi(D_i) \geq 0 for all ii (potential is never negative).
  3. Φ\Phi increases before expensive operations and decreases during them.

Common potential functions:

Data structurePotential function Φ\Phi
Dynamic arrayΦ=2numsize\Phi = 2 \cdot \mathrm{num} - \mathrm{size} (credit per element)
Binary counterΦ=\Phi = number of 1-bits
StackΦ=\Phi = number of elements
Splay treeΦ=xlog(size(x))\Phi = \sum_x \log(\mathrm{size}(x))
Union-FindΦ\Phi based on node levels

7.3 Limitations of Amortised Analysis

:::caution Common Pitfall Amortised bounds do not guarantee worst-case performance for individual operations. In real-time systems, an O(n)O(n) operation (even if amortised O(1)O(1)) may violate timing constraints. For real-time applications, use data structures with worst-case bounds (e.g., balanced BSTs instead of splay trees, or dynamic arrays with geometric resizing only when safe). :::

Theorem 7.1. There exist sequences of operations where any data structure supporting dynamic array operations must pay Ω(logn)\Omega(\log n) per operation in the worst case (cell-probe model lower bound).

8. Balanced BST Variants

8.1 Treaps (Tree + Heap)

A treap is a BST where each node has a key (BST property) and a priority (min-heap or max-heap property). The priorities are assigned randomly at insertion time.

Property: Expected height is O(logn)O(\log n) (by the same argument as random BSTs).

Rotations. If the heap property is violated, perform rotations to restore it:

  • If a node’s priority is less than its parent’s (min-heap treap), rotate the node up.
  • This is equivalent to inserting the node and then rotating it up to its correct heap position.

Split and merge operations:

  • Split(T,k)(T, k): Split treap TT into two treaps LL and RR where all keys in LL are k\leq k and all keys in RR are >k> k. O(logn)O(\log n).
  • Merge(L,R)(L, R): Merge two treaps where all keys in LL are less than all keys in RR. O(logn)O(\log n).

Theorem 8.1. Split and merge in a treap take O(logn)O(\log n) expected time.

Proof. Split traverses a root-to-leaf path, performing rotations. The expected depth of any node is O(logn)O(\log n)So the expected path length is O(logn)O(\log n). Merge follows a single path from the root of one treap downward. \blacksquare

Worked Example: Treap Insertion

Insert keys 5, 3, 7, 1, 4 with random priorities (min-heap):

Insert 5 (priority 0.7): Root = (5, 0.7).

(5, 0.7)

Insert 3 (priority 0.2): 3 goes left of 5. Priority 0.2 < 0.7, so rotate 3 up.

(3, 0.2)
\
(5, 0.7)

Insert 7 (priority 0.9): 7 goes right of 5. Priority 0.9 > 0.7, no rotation needed.

(3, 0.2)
\
(5, 0.7)
\
(7, 0.9)

Insert 1 (priority 0.5): 1 goes left of 3. Priority 0.5 > 0.2, no rotation needed.

(3, 0.2)
/ \
(1, 0.5) (5, 0.7)
\
(7, 0.9)

Insert 4 (priority 0.1): 4 goes left of 5 (since 4 < 5). Priority 0.1 < 0.7, rotate 4 up past 5.

(3, 0.2)
/ \
(1, 0.5) (4, 0.1)
/ \
(5, 0.7)
\
(7, 0.9)

Now check: 4’s priority (0.1) < parent 3’s priority (0.2). Rotate 4 up past 3.

(4, 0.1)
/ \
(3, 0.2) (5, 0.7)
/ \
(1, 0.5) (7, 0.9)

Verify: BST property holds (1 < 3 < 4 < 5 < 7). Min-heap property holds (0.1 < 0.2, 0.1 < 0.7; 0.2 > 0.5… Wait, 0.5 > 0.2, so the min-heap property is violated between 3 and 1).

Hmm, priority 0.5 > 0.2 (parent). This violates the min-heap. So 1 should rotate up past 3.

(4, 0.1)
/ \
(1, 0.5) (5, 0.7)
\ \
(3, 0.2) (7, 0.9)

Now 1’s priority 0.5 > 4’s priority 0.1. No violation (min-heap: parent has lower priority). But BST: 3 > 1 and is right child, OK. 5 > 4 and is right child, OK.

This is a valid treap.

8.2 Weight-Balanced Trees

A weight-balanced tree (also known as BB[α\alpha] tree or Adelson-Velsky-Landis tree) maintains the balance condition:

12αTLT12\frac{1}{2 - \alpha} \leq \frac{|T_L|}{|T|} \leq \frac{1}{2}

For some fixed α(1/4,12/2)\alpha \in (1/4, 1 - \sqrt{2}/2)Where TL|T_L| is the size of the left subtree and T|T| is the total size.

Theorem 8.2. A weight-balanced tree with nn nodes has height O(logn)O(\log n).

Proof. At each level, the subtree size decreases by a factor of at least α\alpha. After hh levels, the minimum size is αhn1\alpha^h n \geq 1Giving hlog1/αn=O(logn)h \leq \log_{1/\alpha} n = O(\log n). \blacksquare

8.3 scapegoat Trees

A scapegoat tree is a BST where no rebalancing is done during insertion (only during deletion if needed). When an insertion causes the height to exceed log3/2n\log_{3/2} nThe algorithm finds a “scapegoat” ancestor whose subtree is unbalanced and rebuilds it.

Height bound. A scapegoat tree with nn nodes has height at most log3/2n\log_{3/2} n (the “scapegoat bound”).

Theorem 8.3. Insertion into a scapegoat tree takes O(logn)O(\log n) amortised time.

Proof. The key insight is that the number of insertions between rebuilds is at least proportional to the size of the rebuilt subtree. Each rebuild of a subtree of size ss costs O(s)O(s). Since each element is charged O(1)O(1) per rebuild, and each element participates in at most O(logn)O(\log n) rebuilds, the amortised cost is O(logn)O(\log n). \blacksquare

9. Graph Decomposition Structures

9.1 Heavy-Light Decomposition

Heavy-light decomposition (HLD) partitions a tree into paths, enabling efficient path queries (sum, max, min) and updates on trees.

Definitions. For each node uuThe child vv with the largest subtree is the heavy child. All other children are light children.

Property. Every root-to-leaf path has at most O(logn)O(\log n) light edges.

Proof. When traversing a light edge from uu to its parent ppThe subtree size at least doubles: |\text{subtree(p)| \geq 2 \cdot |\text{subtree(u)|. Since the tree has nn nodes, there can be at most log2n\log_2 n light edges on any root-to-leaf path. \blacksquare

Algorithm:

  1. Root the tree and identify heavy/light children.
  2. Decompose into heavy paths (maximal chains of heavy edges).
  3. Each heavy path is stored in a segment tree (or Fenwick tree) for efficient queries.
  4. A path query between two nodes is decomposed into O(logn)O(\log n) heavy path segments.

Theorem 9.1. HLD supports path queries and updates on a tree in O(log2n)O(\log^2 n) time.

Proof. A path between two nodes is decomposed into O(logn)O(\log n) heavy path segments (due to O(logn)O(\log n) light edges). Each segment query on the segment tree takes O(logn)O(\log n). Total: O(log2n)O(\log^2 n). \blacksquare

Worked Example: Heavy-Light Decomposition

Tree:

1
/|\
2 3 4
/| |
5 6 7
|
8

Subtree sizes: 1:8, 2:5, 3:1, 4:2, 5:2, 6:1, 7:1, 8:1.

Heavy children (largest subtree):

  • Node 1: children have sizes 5, 1, 2. Heavy child: 2.
  • Node 2: children have sizes 2, 1. Heavy child: 5.
  • Node 4: child has size 1. Heavy child: 7.
  • Nodes 3, 5, 6, 7, 8: no children (leaves).

Heavy paths:

  • Path 1: 1 — 2 — 5 — 8
  • Path 2: 3
  • Path 3: 4 — 7
  • Path 4: 6

Path query from 6 to 8:

  1. LCA(6, 8) = 1.
  2. From 6 to 1: 6 — 2 (light edge). Then 2 — 1 (light edge, since 2 is heavy child of 1, going up). Actually: 6 is a light child of 2. So we jump from 6 to the top of 6’s heavy path (just 6), then from 2’s heavy path. Segments: [6], then [5, 2, 1] (reversed: 1, 2, 5) then [8] (but 8 is under 5).

Let me be more precise. From 6 upward:

  • 6 is in heavy path {6}. Top = 6, parent = 2. Move to 2.
  • 2 is in heavy path {1, 2, 5, 8}. Top = 1. Segment: [2, 5, 8]… Wait, the segment from 2 to the top of the heavy path (1) is just [2] (the part from 2 up to 1).

Actually the segment tree stores each heavy path. To go from 6 to 8:

  • 6 to LCA = 1: segments on the path from 6 to 1.
  • 8 to LCA = 1: segments on the path from 8 to 1.

This example is simplified. In practice, HLD requires careful implementation of the decompose``queryAnd update functions.

9.2 Centroid Decomposition

Centroid decomposition recursively decomposes a tree by finding the centroid (node whose removal leaves no subtree with more than n/2n/2 nodes) and recursing on each resulting component.

Theorem 9.2. Every tree has a centroid.

Proof. Start at any node and move toward the heaviest subtree. The subtree size strictly decreases (since we always move away from the heaviest child). When we reach a node where no subtree exceeds n/2n/2That node is the centroid. \blacksquare

Theorem 9.3. Centroid decomposition has depth O(logn)O(\log n).

Proof. At each level, every component has at most n/2n/2 nodes. After kk levels, the largest component has at most n/2kn/2^k nodes. When n/2k<1n/2^k < 1The decomposition is complete, giving depth O(logn)O(\log n). \blacksquare

Applications:

  • Finding the shortest path between all pairs of nodes passing through each centroid.
  • Distance queries on trees.
  • Dynamic connectivity on trees.

10. Persistent and Functional Data Structures

10.1 Persistent Data Structures

A persistent data structure preserves previous versions. Fully persistent structures allow queries and updates on any version. Partially persistent structures allow queries on any version but updates only on the latest version.

10.2 Persistent Segment Trees

A persistent segment tree creates O(logn)O(\log n) new nodes per update (only the path from the root to the modified leaf is copied).

Space. After mm updates, the persistent segment tree has O(n+mlogn)O(n + m \log n) nodes.

Theorem 10.1. A persistent segment tree supports point update and range query on any version in O(logn)O(\log n) time and O(n+mlogn)O(n + m \log n) space after mm updates.

Worked Example: Persistent Segment Tree

Array: A=[3,1,4,1]A = [3, 1, 4, 1].

Version 0: Build segment tree for [3, 1, 4, 1].

[0,3]: sum=9
/ \
[0,1]: sum=4 [2,3]: sum=5
/ \ / \
[0]:3 [1]:1 [2]:4 [3]:1

Version 1: Update A[2]=10A[2] = 10. Create new nodes on path root -> [0,1]… Wait, we need [2,3] -> [2].

New path: root -> [2,3] -> [2]. Create new versions of these nodes.

Version 0: Version 1:
[0,3]:9 [0,3]:15
/ \ / \
[0,1]:4 [2,3]:5 [0,1]:4 [2,3]:11
/ \ / \ / \ / \
[0]:3 [1]:1 [2]:4 [3]:1 [0]:3 [1]:1 [2]:10 [3]:1

Shared nodes: [0]:3, [1]:1, [3]:1, [0,1]:4. New nodes: root, [2,3], [2]. Total new nodes per update: O(logn)=O(2)O(\log n) = O(2) (for n=4n = 4).

Query version 0, range [0,2]: 3+1+4=83 + 1 + 4 = 8. Query version 1, range [0,2]: 3+1+10=143 + 1 + 10 = 14.

10.3 Functional Queues (Batched)

A functional queue supports enqueue and dequeue in O(1)O(1) amortised time using two stacks (a technique due to Okasaki).

Structure: Two lists: front (for dequeue) and rear (for enqueue).

  • enqueue: Add to rear. O(1)O(1).
  • dequeue: If front is empty, reverse rear and set front = reversed rearClear rear. Then pop from front. O(1)O(1) amortised.

Theorem 10.2. The batched functional queue supports enqueue and dequeue in O(1)O(1) amortised time per operation.

Proof. Each element is pushed onto rear once (O(1)O(1)), reversed (when front is empty, O(k)O(k) for kk elements), and popped from front once (O(1)O(1)). The reversal cost is amortised over the kk dequeue operations that follow. \blacksquare

11. Problem Set

8.1 Balanced Trees (Problems 1—4)

Problem 1. Prove that the number of rotations performed during insertion into a red-black tree is at most 2. Give a concrete sequence of insertions that achieves exactly 2 rotations.

Problem 2. Show the result of inserting keys 1, 2, 3, 4, 5, 6, 7, 8 into an initially empty B-tree with minimum degree t=3t = 3. Show every split.

Problem 3. Prove that in a B+ tree, the number of leaf nodes is at least n/d\lceil n / d \rceil where nn is the number of records and dd is the maximum number of keys per leaf.

Problem 4. Given the following strings, construct a compressed trie: “abracadabra”, “abracad”, “ab”, “abra”, “cad”.

8.2 Priority Queues (Problems 5—7)

Problem 5. Show the state of a binomial heap after inserting 4, 8, 2, 15, 7, 3, 1 (in that order). Show the tree structure after each insert.

Problem 6. Explain why Fibonacci heaps are not widely used in practice despite their superior amortised bounds for decrease-key. Discuss at least three practical disadvantages.

Problem 7. Perform delete-min on a pairing heap with root 3 and children (in left-to-right order): 5, 1, 8, 4. Show the two-pass merge process step by step.

8.3 Advanced Structures (Problems 8—11)

Problem 8. Given an array A=[7,2,5,1,8,3,6,4]A = [7, 2, 5, 1, 8, 3, 6, 4]Build a segment tree for range minimum queries. Then query min(A[2..5])\min(A[2..5])Update A[3]=0A[3] = 0And query min(A[1..6])\min(A[1..6]).

Problem 9. Build a Fenwick tree for the array A=[5,3,7,1,4,6,2]A = [5, 3, 7, 1, 4, 6, 2]. Compute the prefix sum up to index 5, the range sum A[2..6]A[2..6]And show the effect of adding 10 to A[4]A[4].

Problem 10. Compute the suffix array and LCP array for the string “mississippi$”. Use the result to find the longest repeated substring.

Problem 11. Prove that the LCP array satisfies \sum_{i=1}^{n} \mathrm{LCP{}[i] = O(n^2) in the worst case, and construct a string that achieves this bound (up to constants).

8.4 Analysis (Problems 12—15)

Problem 12. Using the potential method, prove that the amortised cost of insert into a dynamic array that doubles when full and halves when less than 1/4 full is O(1)O(1). Define the potential function and verify both conditions.

Problem 13. A deque (double-ended queue) supports push-front, push-back, pop-front, pop-back, all in O(1)O(1) amortised time. Design such a deque using three stacks and prove the amortised bounds using the potential method.

Problem 14. Prove that a Fibonacci heap with nn nodes has at most O(logϕn)O(\log_\phi n) children of the minimum node after consolidation.

Problem 15. Show that the interval tree query algorithm is correct: prove that it visits at most O(logn)O(\log n) nodes plus the kk nodes that overlap the query point.

Solution to Problem 1

Two rotations during insertion. Consider inserting into an empty red-black tree: 1, 2, 3, 4, 5.

After inserting 1, 2, 3: The tree is a chain 1(B) - 2(R) - 3(R). Inserting 3 requires fixing: parent 2 is red, uncle NIL (black). Case 3 (right-right): double rotation at 1. Result: 2(B) with children 1(R) and 3(R).

Now insert 4: Insert as right child of 3, colour red. Parent 3 is red, uncle 1 is red. Case 1: recolour 1, 3 black, 2 red. But 2 is the root, so colour it black. Result: 2(B), 1(B), 3(B), 4(R).

Insert 5: Insert as right child of 4, colour red. Parent 4 is red, uncle NIL (black). Case 3 (right-right): double rotation at 3. Left-rotate 4, then left-rotate 3. This performs exactly 2 rotations.

If you get this wrong, revise: Section 1.1.1.

Solution to Problem 8

Array: A=[7,2,5,1,8,3,6,4]A = [7, 2, 5, 1, 8, 3, 6, 4] (1-indexed).

Segment tree for range minimum:

[1,8]: min=1
/ \
[1,4]: min=1 [5,8]: min=3
/ \ / \
[1,2]:min=2 [3,4]:min=1 [5,6]:min=3 [7,8]:min=4
/ \ / \ / \ / \
[1]:7 [2]:2 [3]:5 [4]:1 [5]:8 [6]:3 [7]:6 [8]:4

Query min(A[2..5])\min(A[2..5]):

  • Root [1,8]: need [2,5]. Split.
  • Left [1,4]: need [2,4]. Split.
  • [1,2]: need [2,2]. Split. -> [2]:2.
  • [3,4]: contained in [2,4]. -> min=1.
  • min so far: min(2, 1) = 1.
  • Right [5,8]: need [5,5]. Split.
  • [5,6]: need [5,5]. Split. -> [5]:8.
  • min so far: min(1, 8) = 1.

Result: min(A[2..5])=1\min(A[2..5]) = 1. Verify: min(2,5,1,8)=1\min(2, 5, 1, 8) = 1. Correct.

Update A[3]=0A[3] = 0: Update leaf [3] to 0. Then [3,4]: min(0,1)=0\min(0, 1) = 0. Then [1,4]: min(2,0)=0\min(2, 0) = 0. Then [1,8]: min(0,3)=0\min(0, 3) = 0.

Query min(A[1..6])\min(A[1..6]):

  • Root [1,8]: need [1,6]. Split.
  • Left [1,4]: contained in [1,6]. -> min=0.
  • Right [5,8]: need [5,6]. Split. -> [5,6]: min=3.
  • min(0, 3) = 0.

Result: min(A[1..6])=0\min(A[1..6]) = 0.

If you get this wrong, revise: Section 5.2.

Solution to Problem 10

String: “mississippi"(" (N = 12$).

All suffixes (with starting index): 0: mississippi1:ississippi1: ississippi 2: ssissippi3:sissippi3: sissippi 4: issippi5:ssippi5: ssippi 6: sippi7:ippi7: ippi 8: ppi9:pi9: pi 10: i11:11:

Sorted suffixes: 11: 10:i10: i 7: ippi4:issippi4: issippi 1: ississippi9:pi9: pi 8: ppi6:sippi6: sippi 3: sissippi5:ssippi5: ssippi 2: ssissippi0:mississippi0: mississippi

SA = [11, 10, 7, 4, 1, 9, 8, 6, 3, 5, 2, 0]

LCP array: LCP[0] = 0 LCP[1] = 0 (LCP("","i", "i")) LCP[2] = 1 (LCP(“i","ippi", "ippi”)) LCP[3] = 1 (LCP(“ippi","issippi", "issippi”)) LCP[4] = 4 (LCP(“issippi","ississippi", "ississippi”)) LCP[5] = 0 (LCP(“ississippi","pi", "pi”)) LCP[6] = 0 (LCP(“pi","ppi", "ppi”)) LCP[7] = 0 (LCP(“ppi","sippi", "sippi”)) LCP[8] = 2 (LCP(“sippi","sissippi", "sissippi”)) LCP[9] = 2 (LCP(“sissippi","ssippi", "ssippi”)) LCP[10] = 4 (LCP(“ssippi","ssissippi", "ssissippi”)) LCP[11] = 1 (LCP(“ssissippi","mississippi", "mississippi”))

Maximum LCP = 4 (at index 4, between “issippi"and"ississippi" and "ississippi”, giving “issi”; and at index 10, between “ssippi"and"ssissippi" and "ssissippi”, giving “ssip”).

Longest repeated substring: “issi” (length 4) or “ssip” (length 4). Both are valid answers.

If you get this wrong, revise: Section 6.3.

Common Pitfalls

  • Confusing average and worst-case complexity. Quicksort: average O(nlogn)O(n \log n), worst case O(n2)O(n^2). Fix: Always state which case; worst case is the guaranteed upper bound.
  • Wrong BST deletion. Deleting a node with two children requires finding the in-order successor (or predecessor), not removing the node. Fix: Replace the node with its in-order successor, then delete the successor from its original position.
  • Confusing amortised and worst-case analysis. Amortised: average cost per operation over a sequence. Worst-case: maximum cost of a single operation. Fix: Dynamic array: O(1)O(1) amortised append, O(n)O(n) worst case (when resizing).

Worked Examples

Problem. In a sorted array of 1000 elements, how many comparisons does binary search need in the worst case?

Solution. Binary search eliminates half the remaining elements each step. Worst case: log21000=10\lceil \log_2 1000 \rceil = 10 comparisons.

\blacksquare

Example 2: Hash table collision

Problem. A hash table of size 7 uses linear probing. Insert keys 10, 22, 31, 4, 15, 28, 17 with hash function h(k)=kmod7h(k) = k \bmod 7.

Solution. h(10)=3h(10)=3: slot 3. h(22)=1h(22)=1: slot 1. h(31)=3h(31)=3: collision, slot 4. h(4)=4h(4)=4: collision, slot 5. h(15)=1h(15)=1: collision, slot 2. h(28)=0h(28)=0: slot 0. h(17)=3h(17)=3: collision, slots 4, 5, 6.

\blacksquare

Summary

  • Sorting: merge sort O(nlogn)O(n \log n) guaranteed; quicksort O(nlogn)O(n \log n) average; heap sort O(nlogn)O(n \log n) in-place.
  • Searching: linear O(n)O(n), binary O(logn)O(\log n), hash O(1)O(1) average.
  • Data structures: arrays, linked lists, stacks, queues, trees, hash tables, heaps, graphs.
  • Amortised analysis: dynamic arrays O(1)O(1) amortised append; splay trees O(logn)O(\log n) amortised.

Cross-References

TopicSiteLink
[Data Structures]A-LevelView
[Data Structures]IBView
[Data Structures]UniversityView