Normalisation
4.1 Functional Dependencies
A functional dependency holds on relation if for any two tuples in implies .
Armstrong”s axioms:
- Reflexivity: If Then .
- Augmentation: If Then .
- Transitivity: If and Then .
Derived rules:
- Union: If and Then .
- Decomposition: If Then and .
- Pseudotransitivity: If and Then .
Attribute closure. is the set of all attributes functionally determined by . Computed By starting with and repeatedly applying Armstrong’s axioms until no new attributes can be added.
Theorem 4.1. holds if and only if .
4.2 Normal Forms
First Normal Form (1NF). All attribute values are atomic (indivisible). No repeating groups or Nested relations.
Second Normal Form (2NF). In 1NF, and no non-prime attribute is partially dependent on any Candidate key. Equivalently: every non-prime attribute is fully functionally dependent on every Candidate key.
A partial dependency is where is a proper subset of a candidate key and is Non-prime.
Third Normal Form (3NF). In 2NF, and for every non-trivial FD in Either is a Superkey or is a prime attribute.
A prime attribute is an attribute that belongs to some candidate key.
Boyce-Codd Normal Form (BCNF). For every non-trivial FD in , is a superkey.
Theorem 4.2. Every relation in BCNF is in 3NF, but the converse does not hold.
Proof. BCNF requires to be a superkey for every non-trivial FD . 3NF allows to be A superkey or to be prime. Since the BCNF condition is stricter, every BCNF relation is in 3NF. For the converse, consider with FDs and . Candidate keys: , . does not violate 3NF ( is prime) but violates BCNF ( is not a Superkey).
4.3 Decomposition
A decomposition of into must satisfy:
- Lossless-join: . No information is lost.
- Dependency preservation: Every FD in the original schema is implied by the FDs in the decomposed schemas.
Theorem 4.3 (Lossless-join test). A decomposition of into is lossless if and Only if or .
Proof. Let be an instance of and let , . We must Show under the given condition. Since and are projections of Every tuple in agrees with some tuple of on every attribute. It suffices to show That no spurious tuple is produced. Suppose where and . Since and agree on And by the condition (WLOG), there exists with . Since and , we have . Similarly . Thus corresponds to a real tuple So no spurious tuple exists.
Dependency preservation. Let be the set of FDs for . The decomposition Preserves dependencies if the closure of (where is the restriction Of to ) equals . In practice, we check that every FD in a minimal cover of can be Tested within a single .
Dependency preservation is important for efficient constraint checking: when updating The DBMS can verify relevant FDs locally without joining all decomposed relations.
4.4 Normalisation Examples
Example 1. Enrolment(StudentID, CourseID, StudentName, Dept, Grade) with FDs: , .
- Candidate key: .
- 1NF: satisfied (atomic values).
- 2NF violation: is a partial dependency (StudentID is a proper subset of the key).
- Decompose:
Student(StudentID, StudentName, Dept)andEnrolment(StudentID, CourseID, Grade). Both are in 3NF and BCNF.
Example 2 (3NF but not BCNF). CourseOffering(Course, Instructor, Room, Time) with FDs: .
- Candidate key: .
- 3NF: — Instructor is not a superkey, but Room is not prime. Wait — Room is not prime (not in any candidate key). So this actually violates 3NF too.
Let us correct: CourseOffering(Course, Instructor, Textbook) with FDs: .
- Candidate key: (since Course determines everything transitively).
- 3NF: . Instructor is not a superkey. Textbook is not prime. Violates 3NF.
Better example. Class(Course, Instructor, Student) with FDs: .
- Candidate keys: , .
- 3NF check for : Instructor is not a superkey. But Course is prime (in candidate key ). So 3NF is satisfied.
- BCNF check: violates BCNF (Instructor is not a superkey).
Decompose: Teaches(Instructor, Course) and Attends(Instructor, Student). This is lossless ( is common and holds in Teaches). But the dependency is Not preserved.
Theorem 4.4. Not every relation can be decomposed into BCNF while preserving dependencies. 3NF Is the strongest normal form guaranteeing dependency-preserving, lossless-join decomposition.
Worked Example 4.1: BCNF Decomposition Algorithm
Relation: with FDs .
Step 1. Find candidate keys. Compute . . So is a candidate key. Check : . So is Also a candidate key. Prime attributes: .
Step 2. Check BCNF. : is not a superkey (since ). Violates BCNF. Decompose on : and .
Step 3. is in BCNF (FD with as superkey).
For with FDs restricted to : involves which is not in So it is dropped. holds. Also, involves not in So it is dropped. Check for implied FDs: in implies (augmentation), so in (since ). But is not a superkey of … Wait, let us recheck.
Actually, from and the fact that is not in We cannot directly use In . We should compute the projection of onto : . So the only non-trivial FD in is . Is a superkey of ? (missing ). So is not a superkey of .
Hmm, we need to be more careful. Let us check if is determined by in the original . in Which does not include . So is not determined.
This means has no non-trivial FDs that hold (other than keys determining all Attributes). Check: candidate keys of must be superkeys. Since holds but does Not determine We need as a key: (all of ). So in , is a candidate key (since it determines all attributes of : and in , so ). is in BCNF since the only non-trivial FD is and we need to check if is a superkey of . Since , is not a superkey.
But wait — there are no other non-trivial FDs in . The only one is Which violates BCNF. Decompose: and .
Hmm, alone is a relation with no non-trivial FDs, so it is in BCNF. With : is a superkey (it determines all three attributes). BCNF.
Final decomposition: , , .
Lossless check: , (from ), so is lossless. … This is problematic. shares no attributes with The others.
The issue is that is a “dangling” attribute. This is correct — appears only in the key Of the original relation but is not functionally determined by anything except the full key. The Decomposition is technically correct but by itself cannot be joined losslessly with The others.
This illustrates a limitation of BCNF decomposition: it may produce a relation with no common Attributes, making lossless join impossible to verify with the pairwise test. In practice, the 3NF Synthesis algorithm avoids this issue.
:::caution Common Pitfall Do not confuse partial dependency (2NF violation) with transitive dependency (3NF violation). A Partial dependency involves a proper subset of a candidate key determining a non-prime attribute. A transitive dependency involves a non-key attribute determining another non-prime attribute.
4.5 Multivalued Dependencies and 4NF
A multivalued dependency (MVD) holds on relation if for any two Tuples with There exists a tuple such that:
: determines independently of .
Trivial MVDs: where or .
Fourth Normal Form (4NF). is in 4NF if for every non-trivial MVD That holds on , is a superkey.
Theorem 4.5. Every relation in 4NF is in BCNF.
Proof. Every FD implies the MVD . In 4NF, every non-trivial MVD requires to be a superkey. Therefore, every non-trivial FD Also requires to be a superkey, which is the BCNF condition.
4NF decomposition. Given with MVD where is not a superkey, Decompose into and . The decomposition is lossless-join.
Worked Example 4.2: Decomposition to 4NF
Relation: CourseInstructor(Course, Instructor, Textbook) where each course can have multiple Instructors and multiple textbooks, independently.
MVDs: .
Sample data:
| Course | Instructor | Textbook |
|---|---|---|
| CS101 | Smith | DB Systems |
| CS101 | Lee | DB Systems |
| CS101 | Smith | Algorithm Design |
| CS101 | Lee | Algorithm Design |
The redundancy is clear: each instructor-textbook pair is repeated for each course.
4NF check: is non-trivial, and is not a superkey. Violates 4NF.
Decompose:
CI(Course, Instructor)with MVDCT(Course, Textbook)with MVD
Both are in 4NF (the determining attribute Course is a candidate key in each).
:::