Skip to content

Normalisation

4.1 Functional Dependencies

A functional dependency XYX \to Y holds on relation RR if for any two tuples t1,t2t_1, t_2 in RR t1[X]=t2[X]t_1[X] = t_2[X] implies t1[Y]=t2[Y]t_1[Y] = t_2[Y].

Armstrong”s axioms:

  1. Reflexivity: If YXY \subseteq XThen XYX \to Y.
  2. Augmentation: If XYX \to YThen XZYZXZ \to YZ.
  3. Transitivity: If XYX \to Y and YZY \to ZThen XZX \to Z.

Derived rules:

  1. Union: If XYX \to Y and XZX \to ZThen XYZX \to YZ.
  2. Decomposition: If XYZX \to YZThen XYX \to Y and XZX \to Z.
  3. Pseudotransitivity: If XYX \to Y and YWZYW \to ZThen XWZXW \to Z.

Attribute closure. X+X^+ is the set of all attributes functionally determined by XX. Computed By starting with XX and repeatedly applying Armstrong’s axioms until no new attributes can be added.

Theorem 4.1. XYX \to Y holds if and only if YX+Y \subseteq X^+.

4.2 Normal Forms

First Normal Form (1NF). All attribute values are atomic (indivisible). No repeating groups or Nested relations.

Second Normal Form (2NF). In 1NF, and no non-prime attribute is partially dependent on any Candidate key. Equivalently: every non-prime attribute is fully functionally dependent on every Candidate key.

A partial dependency is ABA \to B where AA is a proper subset of a candidate key and BB is Non-prime.

Third Normal Form (3NF). In 2NF, and for every non-trivial FD XAX \to A in RREither XX is a Superkey or AA is a prime attribute.

A prime attribute is an attribute that belongs to some candidate key.

Boyce-Codd Normal Form (BCNF). For every non-trivial FD XAX \to A in RR, XX is a superkey.

Theorem 4.2. Every relation in BCNF is in 3NF, but the converse does not hold.

Proof. BCNF requires XX to be a superkey for every non-trivial FD XAX \to A. 3NF allows XX to be A superkey or AA to be prime. Since the BCNF condition is stricter, every BCNF relation is in 3NF. For the converse, consider R(A,B,C)R(A, B, C) with FDs ABCAB \to C and CBC \to B. Candidate keys: AB\\{AB\\}, AC\\{AC\\}. CBC \to B does not violate 3NF (BB is prime) but violates BCNF (CC is not a Superkey). \blacksquare

4.3 Decomposition

A decomposition of RR into R1,R2,,RkR_1, R_2, \ldots, R_k must satisfy:

  1. Lossless-join: R=R1R2RkR = R_1 \bowtie R_2 \bowtie \cdots \bowtie R_k. No information is lost.
  2. Dependency preservation: Every FD in the original schema is implied by the FDs in the decomposed schemas.

Theorem 4.3 (Lossless-join test). A decomposition of RR into R1,R2R_1, R_2 is lossless if and Only if R1R2R1R_1 \cap R_2 \to R_1 or R1R2R2R_1 \cap R_2 \to R_2.

Proof. Let rr be an instance of RR and let r1=πR1(r)r_1 = \pi_{R_1}(r), r2=πR2(r)r_2 = \pi_{R_2}(r). We must Show r=r1r2r = r_1 \bowtie r_2 under the given condition. Since r1r_1 and r2r_2 are projections of rr Every tuple in r1r2r_1 \bowtie r_2 agrees with some tuple of rr on every attribute. It suffices to show That no spurious tuple is produced. Suppose (t1,t2)r1r2(t_1, t_2) \in r_1 \bowtie r_2 where t1r1t_1 \in r_1 and t2r2t_2 \in r_2. Since t1t_1 and t2t_2 agree on R1R2R_1 \cap R_2And by the condition R1R2R1R_1 \cap R_2 \to R_1 (WLOG), there exists TrT' \in r with T[R1R2]=t1[R1R2]=t2[R1R2]T'[R_1 \cap R_2] = t_1[R_1 \cap R_2] = t_2[R_1 \cap R_2]. Since T[R1R2]=t1[R1R2]T'[R_1 \cap R_2] = t_1[R_1 \cap R_2] and R1R2R1R_1 \cap R_2 \to R_1, we have t[R1]=t1[R1]t'[R_1] = t_1[R_1]. Similarly t[R2]=t2[R2]t'[R_2] = t_2[R_2]. Thus (t1,t2)(t_1, t_2) corresponds to a real tuple trt' \in rSo no spurious tuple exists. \blacksquare

Dependency preservation. Let FF be the set of FDs for RR. The decomposition R1,,RkR_1, \ldots, R_k Preserves dependencies if the closure of i=1kFi+\bigcup_{i=1}^{k} F_i^+ (where FiF_i is the restriction Of FF to RiR_i) equals F+F^+. In practice, we check that every FD in a minimal cover of FF can be Tested within a single RiR_i.

Dependency preservation is important for efficient constraint checking: when updating RiR_iThe DBMS can verify relevant FDs locally without joining all decomposed relations.

4.4 Normalisation Examples

Example 1. Enrolment(StudentID, CourseID, StudentName, Dept, Grade) with FDs: StudentIDStudentName\mathrm{StudentID} \to \mathrm{StudentName}, StudentIDDept\mathrm{StudentID} \to \mathrm{Dept} StudentID,CourseIDGrade\\{\mathrm{StudentID}, \mathrm{CourseID}\\} \to \mathrm{Grade}.

  • Candidate key: StudentID,CourseID\\{\mathrm{StudentID}, \mathrm{CourseID}\\}.
  • 1NF: satisfied (atomic values).
  • 2NF violation: StudentIDStudentName\mathrm{StudentID} \to \mathrm{StudentName} is a partial dependency (StudentID is a proper subset of the key).
  • Decompose: Student(StudentID, StudentName, Dept) and Enrolment(StudentID, CourseID, Grade). Both are in 3NF and BCNF.

Example 2 (3NF but not BCNF). CourseOffering(Course, Instructor, Room, Time) with FDs: Course,TimeInstructor,Room\\{\mathrm{Course}, \mathrm{Time}\\} \to \\{\mathrm{Instructor}, \mathrm{Room}\\} InstructorRoom\mathrm{Instructor} \to \mathrm{Room}.

  • Candidate key: Course,Time\\{\mathrm{Course}, \mathrm{Time}\\}.
  • 3NF: InstructorRoom\mathrm{Instructor} \to \mathrm{Room} — Instructor is not a superkey, but Room is not prime. Wait — Room is not prime (not in any candidate key). So this actually violates 3NF too.

Let us correct: CourseOffering(Course, Instructor, Textbook) with FDs: CourseInstructor\mathrm{Course} \to \mathrm{Instructor} InstructorTextbook\mathrm{Instructor} \to \mathrm{Textbook}.

  • Candidate key: Course\\{\mathrm{Course}\\} (since Course determines everything transitively).
  • 3NF: InstructorTextbook\mathrm{Instructor} \to \mathrm{Textbook}. Instructor is not a superkey. Textbook is not prime. Violates 3NF.

Better example. Class(Course, Instructor, Student) with FDs: Course,StudentInstructor\\{\mathrm{Course}, \mathrm{Student}\\} \to \mathrm{Instructor} InstructorCourse\mathrm{Instructor} \to \mathrm{Course}.

  • Candidate keys: Course,Student\\{\mathrm{Course}, \mathrm{Student}\\}, Instructor,Student\\{\mathrm{Instructor}, \mathrm{Student}\\}.
  • 3NF check for InstructorCourse\mathrm{Instructor} \to \mathrm{Course}: Instructor is not a superkey. But Course is prime (in candidate key Course,Student\\{\mathrm{Course}, \mathrm{Student}\\}). So 3NF is satisfied.
  • BCNF check: InstructorCourse\mathrm{Instructor} \to \mathrm{Course} violates BCNF (Instructor is not a superkey).

Decompose: Teaches(Instructor, Course) and Attends(Instructor, Student). This is lossless (Instructor\mathrm{Instructor} is common and InstructorCourse\mathrm{Instructor} \to \mathrm{Course} holds in Teaches). But the dependency Course,StudentInstructor\\{\mathrm{Course}, \mathrm{Student}\\} \to \mathrm{Instructor} is Not preserved.

Theorem 4.4. Not every relation can be decomposed into BCNF while preserving dependencies. 3NF Is the strongest normal form guaranteeing dependency-preserving, lossless-join decomposition.

Worked Example 4.1: BCNF Decomposition Algorithm

Relation: R(A,B,C,D,E)R(A, B, C, D, E) with FDs F=AB,BCE,EDAF = \\{A \to B, BC \to E, ED \to A\\}.

Step 1. Find candidate keys. Compute A+=A,BA^+ = \\{A, B\\}. BC+=B,C,E,D,A=A,B,C,D,EBC^+ = \\{B, C, E, D, A\\} = \\{A, B, C, D, E\\}. So BCBC is a candidate key. Check EDED: ED+=E,D,A,B,C=A,B,C,D,EED^+ = \\{E, D, A, B, C\\} = \\{A, B, C, D, E\\}. So EDED is Also a candidate key. Prime attributes: B,C,D,EB, C, D, E.

Step 2. Check BCNF. ABA \to B: AA is not a superkey (since A+RA^+ \neq R). Violates BCNF. Decompose on ABA \to B: R1=A,BR_1 = \\{A, B\\} and R2=A,C,D,ER_2 = \\{A, C, D, E\\}.

Step 3. R1=A,BR_1 = \\{A, B\\} is in BCNF (FD ABA \to B with AA as superkey).

For R2=A,C,D,ER_2 = \\{A, C, D, E\\} with FDs restricted to R2R_2: BCEBC \to E involves BB which is not in R2R_2 So it is dropped. EDAED \to A holds. Also, ABA \to B involves BB not in R2R_2So it is dropped. Check for implied FDs: ABA \to B in RR implies ACBCAC \to BC (augmentation), so ACEAC \to E in RR (since BCEBC \to E). But ACAC is not a superkey of R2R_2… Wait, let us recheck.

Actually, from BCEBC \to E and the fact that BB is not in R2R_2We cannot directly use BCEBC \to E In R2R_2. We should compute the projection of FF onto R2R_2: F2=EDA,AF_2 = \\{ED \to A, A \to \varnothing\\}. So the only non-trivial FD in R2R_2 is EDAED \to A. Is EDED a superkey of R2R_2? ED+=E,D,AA,C,D,EED^+ = \\{E, D, A\\} \neq \\{A, C, D, E\\} (missing CC). So EDED is not a superkey of R2R_2.

Hmm, we need to be more careful. Let us check if CC is determined by EDED in the original RR. ED+ED^+ in R=E,D,A,BR = \\{E, D, A, B\\}Which does not include CC. So CC is not determined.

This means R2=A,C,D,ER_2 = \\{A, C, D, E\\} has no non-trivial FDs that hold (other than keys determining all Attributes). Check: candidate keys of R2R_2 must be superkeys. Since EDAED \to A holds but EDED does Not determine CCWe need EDCEDC as a key: EDC+=E,D,C,A,B=REDC^+ = \\{E, D, C, A, B\\} = R (all of RR). So in R2R_2, EDCEDC is a candidate key (since it determines all attributes of R2R_2: EDCAEDC \to A and AA \to \varnothing in R2R_2, so EDC+=A,C,D,EEDC^+ = \\{A, C, D, E\\}). R2R_2 is in BCNF since the only non-trivial FD is EDAED \to A and we need to check if EDED is a superkey of R2R_2. Since ED+R2=A,D,ER2ED^+ \cap R_2 = \\{A, D, E\\} \neq R_2, EDED is not a superkey.

But wait — there are no other non-trivial FDs in R2R_2. The only one is EDAED \to AWhich violates BCNF. Decompose: R2a=E,D,AR_{2a} = \\{E, D, A\\} and R2b=A,C,D,EE,D,A=CR_{2b} = \\{A, C, D, E\\} \setminus \\{E, D, A\\} = \\{C\\}.

Hmm, C\\{C\\} alone is a relation with no non-trivial FDs, so it is in BCNF. R2a=E,D,AR_{2a} = \\{E, D, A\\} With EDAED \to A: EDED is a superkey (it determines all three attributes). BCNF.

Final decomposition: R1=A,BR_1 = \\{A, B\\}, R2a=A,D,ER_{2a} = \\{A, D, E\\}, R2b=CR_{2b} = \\{C\\}.

Lossless check: R1R2a=AR_1 \cap R_{2a} = \\{A\\}, ABA \to B (from FF), so R1R2aR_1 \bowtie R_{2a} is lossless. R2aR2b=R_{2a} \cap R_{2b} = \varnothing… This is problematic. R2b=CR_{2b} = \\{C\\} shares no attributes with The others.

The issue is that CC is a “dangling” attribute. This is correct — CC appears only in the key BCBC Of the original relation but is not functionally determined by anything except the full key. The Decomposition is technically correct but R2b=CR_{2b} = \\{C\\} by itself cannot be joined losslessly with The others.

This illustrates a limitation of BCNF decomposition: it may produce a relation with no common Attributes, making lossless join impossible to verify with the pairwise test. In practice, the 3NF Synthesis algorithm avoids this issue.

:::caution Common Pitfall Do not confuse partial dependency (2NF violation) with transitive dependency (3NF violation). A Partial dependency involves a proper subset of a candidate key determining a non-prime attribute. A transitive dependency involves a non-key attribute determining another non-prime attribute.

4.5 Multivalued Dependencies and 4NF

A multivalued dependency (MVD) XYX \twoheadrightarrow Y holds on relation RR if for any two Tuples t1,t2Rt_1, t_2 \in R with t1[X]=t2[X]t_1[X] = t_2[X]There exists a tuple t3Rt_3 \in R such that:

  • t3[X]=t1[X]t_3[X] = t_1[X]
  • t3[Y]=t1[Y]t_3[Y] = t_1[Y]
  • t3[RXY]=t2[RXY]t_3[R - X - Y] = t_2[R - X - Y]

: XX determines YY independently of RXYR - X - Y.

Trivial MVDs: XYX \twoheadrightarrow Y where YXY \subseteq X or XY=RX \cup Y = R.

Fourth Normal Form (4NF). RR is in 4NF if for every non-trivial MVD XYX \twoheadrightarrow Y That holds on RR, XX is a superkey.

Theorem 4.5. Every relation in 4NF is in BCNF.

Proof. Every FD XYX \to Y implies the MVD XYX \twoheadrightarrow Y. In 4NF, every non-trivial MVD XYX \twoheadrightarrow Y requires XX to be a superkey. Therefore, every non-trivial FD XYX \to Y Also requires XX to be a superkey, which is the BCNF condition. \blacksquare

4NF decomposition. Given RR with MVD XYX \twoheadrightarrow Y where XX is not a superkey, Decompose into R1=XYR_1 = X \cup Y and R2=RYR_2 = R - Y. The decomposition is lossless-join.

Worked Example 4.2: Decomposition to 4NF

Relation: CourseInstructor(Course, Instructor, Textbook) where each course can have multiple Instructors and multiple textbooks, independently.

MVDs: CourseInstructor\mathrm{Course} \twoheadrightarrow \mathrm{Instructor} CourseTextbook\mathrm{Course} \twoheadrightarrow \mathrm{Textbook}.

Sample data:

CourseInstructorTextbook
CS101SmithDB Systems
CS101LeeDB Systems
CS101SmithAlgorithm Design
CS101LeeAlgorithm Design

The redundancy is clear: each instructor-textbook pair is repeated for each course.

4NF check: CourseInstructor\mathrm{Course} \twoheadrightarrow \mathrm{Instructor} is non-trivial, and Course\mathrm{Course} is not a superkey. Violates 4NF.

Decompose:

  • CI(Course, Instructor) with MVD CourseInstructor\mathrm{Course} \twoheadrightarrow \mathrm{Instructor}
  • CT(Course, Textbook) with MVD CourseTextbook\mathrm{Course} \twoheadrightarrow \mathrm{Textbook}

Both are in 4NF (the determining attribute Course is a candidate key in each).

:::