Skip to content

Relational Model

2.1 Relations, Tuples, Attributes

A relation RR over attributes A1,,AnA_1, \ldots, A_n is a set of tuples (a1,,an)(a_1, \ldots, a_n) where Each aia_i is drawn from the domain of AiA_i. A relation is a subset of D1×D2××DnD_1 \times D_2 \times \cdots \times D_n.

Properties of relations:

  • Each relation has a unique name.
  • Each attribute has a unique name within a relation.
  • Tuples are unordered (a relation is a set).
  • All tuples are distinct (no duplicates in a mathematical relation, though SQL tables allow them).
  • Attributes are unordered.

2.2 Keys

Let RR be a relation schema.

  • Superkey: A set of attributes KRK \subseteq R such that no two distinct tuples agree on all attributes in KK.
  • Candidate key: A minimal superkey (no proper subset is also a superkey).
  • Primary key: One of the candidate keys, chosen by the designer.
  • Foreign key: An attribute (or set) FKFK in relation R1R_1 that references the primary key PKPK of relation R2R_2. Enforces referential integrity: every value of FKFK must appear as a value of PKPK in R2R_2Or be NULL.

Example. Student(ID, Name, Email). ID is a candidate key (unique). If Email is also unique, It is another candidate key. One is chosen as the primary key.

Theorem 2.1. Every relation has at least one candidate key (possibly the entire set of attributes).

2.3 Relational Algebra

Relational algebra provides a formal query language based on operations on relations.

Selection σθ(R)\sigma_{\theta}(R): Return tuples from RR satisfying condition θ\theta.

σdept="CS(Student)\sigma_{\mathrm{dept} = \mathrm{"CS'}(\mathrm{Student})}

Projection πA1,,Ak(R)\pi_{A_1, \ldots, A_k}(R): Return a relation containing only attributes A1,,AkA_1, \ldots, A_k.

πname,gpa(Student)\pi_{\mathrm{name}, \mathrm{gpa}(\mathrm{Student})}

Union RSR \cup S: All tuples in RR or SS (both must be union-compatible: same arity and Attribute domains).

Difference RSR - S: Tuples in RR but not in SS.

Cartesian product R×SR \times S: All pairs (r,s)(r, s) where rRr \in R and sSs \in S.

Rename ρx(R)\rho_{x}(R): Rename relation RR to xx.

Natural join RSR \bowtie S: Combine tuples from RR and SS that agree on all common attributes.

RS=πRS(σR.common=S.common(R×S))R \bowtie S = \pi_{R \cup S}(\sigma_{R.\mathrm{common} = S.\mathrm{common}(R \times S))}

Theta join RθSR \bowtie_{\theta} S: σθ(R×S)\sigma_{\theta}(R \times S).

Equi-join RR.A=S.BSR \bowtie_{R.A = S.B} S: A theta join where θ\theta is an equality on specific Attributes. Keeps both join columns.

Left outer join RleftSR \bowtie_{\mathrm{left} S}: All tuples from RRMatched with SS where possible; NULL-padded otherwise.

Right outer join RrightSR \bowtie_{\mathrm{right} S}: All tuples from SSMatched with RR.

Full outer join RfullSR \bowtie_{\mathrm{full} S}: All tuples from both RR and SS.

Division R÷SR \div S: Tuples tt in πRS(R)\pi_{R-S}(R) such that for every tuple sSs \in S, (t,s)R(t, s) \in R.

R÷S=πRS(R)πRS((πRS(R)×S)R)R \div S = \pi_{R-S}(R) - \pi_{R-S}\Bigl(\bigl(\pi_{R-S}(R) \times S\bigr) - R\Bigr)

Example. Find students who have taken all courses:

πsid,cid(Takes)÷πcid(Course)\pi_{\mathrm{sid}, \mathrm{cid}(\mathrm{Takes}) \div \pi_{\mathrm{cid}(\mathrm{Course})}}

Worked Example 2.1: Complex Relational Algebra Query

Query: Find the names of students who have enrolled in every course offered by the CS department.

Schema: Student(sid, name, dept)``Course(cid, title, dept)``Takes(sid, cid, semester).

Step 1. Get CS course IDs:

CCS=πcid(σdept=CS(Course))C_{\mathrm{CS} = \pi_{\mathrm{cid}\bigl(\sigma_{\mathrm{dept} = \mathrm{'CS'}(\mathrm{Course})\bigr)}}}

Step 2. Get student-course pairs from enrolments:

T=πsid,cid(Takes)T = \pi_{\mathrm{sid}, \mathrm{cid}(\mathrm{Takes})}

Step 3. Students who have taken all CS courses (division):

Sall=T÷CCSS_{\mathrm{all} = T \div C_{\mathrm{CS}}}

Step 4. Get names:

πname(SallStudent)\pi_{\mathrm{name}(S_{\mathrm{all} \bowtie \mathrm{Student})}}

Combined:

πname((πsid,cid(Takes)÷πcid(σdept=CS(Course)))Student)\pi_{\mathrm{name}\Bigl(\bigl(\pi_{\mathrm{sid}, \mathrm{cid}(\mathrm{Takes}) \div \pi_{\mathrm{cid}(\sigma_{\mathrm{dept}=\mathrm{'CS'}(\mathrm{Course}))\bigr) \bowtie \mathrm{Student}\Bigr)}}}}

Worked Example 2.2: Relational Algebra with Outer Joins

Query: Find all students and the number of CS courses they have taken, including students who have Taken no CS courses (count should be 0).

Step 1. Filter enrolments to CS courses:

ECS=πsid,cid(Takesσdept=CS(Course))E_{\mathrm{CS} = \pi_{\mathrm{sid}, \mathrm{cid}\bigl(\mathrm{Takes} \bowtie \sigma_{\mathrm{dept}=\mathrm{'CS'}(\mathrm{Course})\bigr)}}}

Step 2. Left outer join with Student to include those with no CS courses:

Result=StudentleftECS\mathrm{Result} = \mathrm{Student} \bowtie_{\mathrm{left} E_{\mathrm{CS}}}

Note: aggregation over outer join results handles NULL values (they are excluded from COUNT).

In SQL, this translates directly to:

SELECT S.sid, S.name, COUNT(E.cid) AS num_cs_courses
FROM Student S
LEFT JOIN Takes T ON S.sid = T.sid
LEFT JOIN Course C ON T.cid = C.cid AND C.dept = 'CS'
GROUP BY S.sid, S.name;

2.4 Relational Calculus

Tuple relational calculus. A query has the form tP(t)\\{t \mid P(t)\\} where tt is a tuple variable And PP is a well-formed formula. The formula is built from:

  • Atoms: tRt \in R (tuple tt is in relation RR), t[A]ops[A]t[A] \mathbin{\mathrm{op} s[A]} (comparison), t[A]opct[A] \mathbin{\mathrm{op} c} (comparison with constant), where op=,,<,>,,\mathrm{op} \in \\{=, \neq, \lt, \gt, \le, \ge\\}.
  • Logical connectives: \land (and), \lor (or), ¬\lnot (not).
  • Quantifiers: t\exists t (there exists), t\forall t (for all).

{tsTakes(t[name]=s[name]s[grade]=A)}\{t \mid \exists s \in \mathrm{Takes}(t[\mathrm{name}] = s[\mathrm{name}] \land s[\mathrm{grade}] = \mathrm{'A'})\}

Domain relational calculus. Variables range over individual attribute domains (not entire tuples). A query has the form x1,,xkP(x1,,xk)\\{\langle x_1, \ldots, x_k \rangle \mid P(x_1, \ldots, x_k)\\}.

{ns,g  (Takes(s,CS101,g)Student(s,n,)g=A)}\{ \langle n \rangle \mid \exists s, g \;(\mathrm{Takes}(s, \mathrm{'CS101'}, g) \land \mathrm{Student}(s, n, \ldots) \land g = \mathrm{'A'})\}

Safety. A calculus expression is safe if it yields a finite relation. The expression t¬(tR)\\{t \mid \lnot(t \in R)\\} is unsafe (it includes every tuple not in RRAn infinite set). We Restrict variables to domains of the relations appearing in the query.

Theorem 2.2 (Codd). Relational algebra and relational calculus (both tuple and domain) are Equally expressive: every query expressible in one is expressible in the other.

Worked Example 2.3: Tuple Relational Calculus

Query: Find students who have taken no CS course.

Using tuple relational calculus:

{ttStudent¬sTakes(s[sid]=t[sid]cCourse(c[cid]=s[cid]c[dept]=CS)}\{t \mid t \in \mathrm{Student} \land \lnot \exists s \in \mathrm{Takes}\bigl(s[\mathrm{sid}] = t[\mathrm{sid}] \land \exists c \in \mathrm{Course}(c[\mathrm{cid}] = s[\mathrm{cid}] \land c[\mathrm{dept}] = \mathrm{'CS'}\bigr)\}

Translation to relational algebra:

πname(Student)πname(StudentTakesσdept=CS(Course))\pi_{\mathrm{name}(\mathrm{Student}) - \pi_{\mathrm{name}(\mathrm{Student} \bowtie \mathrm{Takes} \bowtie \sigma_{\mathrm{dept}=\mathrm{'CS'}(\mathrm{Course}))}}}

2.5 Equivalence Rules for Relational Algebra

The following rules allow the optimiser to transform queries without changing their meaning. Each rule States that two expressions produce the same relation for any input relations.

Rule 1 (Cascade of selections). σθ1θ2(R)σθ1(σθ2(R))\sigma_{\theta_1 \land \theta_2}(R) \equiv \sigma_{\theta_1}(\sigma_{\theta_2}(R)).

Proof. A tuple satisfies θ1θ2\theta_1 \land \theta_2 if and only if it satisfies both θ1\theta_1 and θ2\theta_2. Applying σθ2\sigma_{\theta_2} first removes tuples failing θ2\theta_2; then σθ1\sigma_{\theta_1} removes tuples failing θ1\theta_1 among the remainder. The result is exactly the Tuples satisfying both conditions. \blacksquare

Rule 2 (Commutativity of selections). σθ1(σθ2(R))σθ2(σθ1(R))\sigma_{\theta_1}(\sigma_{\theta_2}(R)) \equiv \sigma_{\theta_2}(\sigma_{\theta_1}(R)).

Proof. Immediate from the commutativity of logical conjunction (θ1θ2θ2θ1\theta_1 \land \theta_2 \equiv \theta_2 \land \theta_1). \blacksquare

Rule 3 (Selection pushdown through cross product). If θ\theta involves only attributes of RR Then σθ(R×S)σθ(R)×S\sigma_{\theta}(R \times S) \equiv \sigma_{\theta}(R) \times S.

Proof. For each pair (r,s)(r, s) with rRr \in R and sSs \in SThe condition θ\theta depends only on rr. Filtering (r,s)(r, s) by θ\theta on R×SR \times S is equivalent to first filtering RR by θ\theta And then forming the cross product, since ss does not affect the result of θ\theta. \blacksquare

Rule 4 (Selection pushdown through join). If θ\theta involves only attributes of RRThen σθ(RS)σθ(R)S\sigma_{\theta}(R \bowtie S) \equiv \sigma_{\theta}(R) \bowtie S.

Proof. The join RSR \bowtie S combines matching pairs from RR and SS. Applying σθ\sigma_{\theta} After the join filters these pairs by θ\theta on RR‘s attributes. Filtering RR first removes Non-matching RR-tuples before the join, yielding the same final set of pairs. \blacksquare

Rule 5 (Commutativity of joins). RSSRR \bowtie S \equiv S \bowtie R.

Proof. The natural join combines tuples agreeing on common attributes. This relation is symmetric In RR and SS. \blacksquare

Rule 6 (Associativity of joins). (RS)TR(ST)(R \bowtie S) \bowtie T \equiv R \bowtie (S \bowtie T).

Proof. Both expressions produce tuples from R×S×TR \times S \times T that agree on all attributes Shared by any pair of the three relations. \blacksquare

Rule 7 (Projection pushdown). If L1L2L_1 \subseteq L_2 and L2L_2 includes all attributes used in Join conditions, then πL1(RS)πL1(πL2(R)πL2(S))\pi_{L_1}(R \bowtie S) \equiv \pi_{L_1}(\pi_{L_2}(R) \bowtie \pi_{L_2}(S)).

Proof. Projecting after the join retains only attributes in L1L_1. Since L2L_2 includes all Attributes needed for the join, performing the join on projected versions of RR and SS yields the Same join result, from which L1L_1 is then projected. \blacksquare

These rules form the foundation of heuristic query optimisation (see Section 7).