Problem Set
Problems 1—3: Introduction and Data Models
Explain the three levels of the ANSI-SPARC architecture with an example. Show how logical data independence allows a new column to be added to the conceptual schema without modifying existing external views.
Compare the relational and graph data models. Give two concrete scenarios where a graph database would be preferable to a relational database, explaining why.
A university uses a relational database for student records and a document store for course materials. Discuss the benefits and challenges of this polyglot persistence approach.
Problems 4—7: Relational Model and Algebra
Given relation
R(A, B, C, D, E)with FDs : (a) Find all candidate keys. (b) Compute the attribute closure . (c) Compute and .Given
R(A, B, C)with tuples andS(B, C)with tuples : (a) Compute . (b) Compute . (c) Compute .Express the following query in relational algebra: “Find the names of students who have enrolled in at least two courses taught by the same instructor.”
Prove that the cross product is commutative: . Prove that the cross product is associative: .
Problems 8—10: SQL
Write a recursive SQL query that computes the total cost of all parts (direct and indirect) required to assemble product
P100Given aBOM(parent_part, child_part, quantity)table.Write a SQL query that, for each department, returns the student with the highest GPA, their GPA, and the difference between their GPA and the department average. Use window functions only (no subqueries).
Consider the Python code:
query = f"DELETE FROM Student WHERE dept = "{dept}' AND gpa < {min_gpa}"(a) Identify the SQL injection vulnerability. (b) Show an input that exploits it. (c) Rewrite using parameterised queries.
Problems 11—14: Normalisation
Given
R(A, B, C, D, E)with FDs : (a) Find all candidate keys. (b) Determine the highest normal form of . Justify your answer. (c) If is not in BCNF, decompose it into BCNF relations. Verify each decomposition is lossless.Given
R(A, B, C, D)with FDs : (a) Find all candidate keys. (b) Decompose into BCNF. Is the decomposition dependency-preserving? (c) Show that is actually already in BCNF.Prove Theorem 4.5: every relation in 4NF is in BCNF.
Given `R(A, B, C, D)A \twoheadrightarrow BA \twoheadrightarrow CA \to DR$ in 4NF? If not, decompose into 4NF.
Problems 15—16: Indexing
Insert the keys 8, 5, 1, 7, 3, 12, 9, 6 into a B+ tree of order 3 (maximum 2 keys per node). Show the tree after each operation that causes a split. How many leaf-level and internal-level splits occur in total?
A table has 100000 tuples stored in 5000 pages. The buffer pool has 101 pages. Compare the estimated I/O cost of answering
SELECT * FROM T WHERE A = 42using: (a) A full table scan. (b) A B+ tree index on of height 3. (c) A static hash index on with 500 buckets and average chain length 0.4. State any assumptions you make.
Problems 17—18: Transaction Management
For each schedule below, determine if it is conflict-serialisable by drawing the precedence graph. If serialisable, give the equivalent serial order. (a) (b) (c)
Explain the difference between strict 2PL and rigorous 2PL. Give a schedule that is allowed by basic 2PL but not by strict 2PL. Explain why strict 2PL is preferred in practice.
Problems 19—20: Distributed Databases
Explain the two-phase commit protocol. Describe what happens in each of the following failure scenarios: (a) A participant crashes after voting
YESbut before receiving the coordinator’s decision. (b) The coordinator crashes after sendingCOMMITto some but not all participants. (c) The coordinator crashes after phase 1 (all votes received) but before sending any decision.A distributed key-value store uses replicas with quorum settings and . (a) Show that any read is guaranteed to see the latest write. (b) What is the maximum number of node failures the system can tolerate while still serving both reads and writes? (c) How does the system behave during a network partition that isolates 3 nodes from the remaining 4?