Mathematical proof techniques: direct proof, proof by contradiction, mathematical induction.
Single-variable calculus: limits, continuity, differentiation (for the matrix calculus later in the course).
Students who have completed an introductory proof-writing course (e.g. a Discrete Mathematics module) will have the necessary background. The definitions and proofs below are self-contained; the prerequisites are listed only to indicate the level of mathematical maturity expected.
1. Vectors and Vector Spaces
1.1 Definition of a Vector Space
A vector space over a field F ( R or C) is a set V equipped With two operations:
Vector addition: +:V×V→V
Scalar multiplication: ⋅:F×V→V
Satisfying the following axioms for all u,v,w∈V and all α,β∈F:
Commutativity: u+v=v+u
Associativity of addition: (u+v)+w=u+(v+w)
Additive identity: There exists 0∈V such that v+0=v
Additive inverse: For each vThere exists −v such that v+(−v)=0
Compatibility of scalar multiplication: α(βv)=(αβ)v
Identity element of scalar multiplication: 1⋅v=v
Distributivity over vector addition: α(u+v)=αu+αv
Distributivity over scalar addition: (α+β)v=αv+βv
Intuition. The abstract definition captures the algebraic structure shared by diverse objects: Geometric arrows, polynomials, functions, matrices. The axioms encode exactly what we need for Linear combinations to behave reasonably.
1.2 Examples
Example 1.Rn with component-wise addition and scalar multiplication is a vector space Over R.
Example 2. The set Pn of all polynomials of degree at most n with real coefficients, With the usual polynomial addition and scalar multiplication, is a vector space over R. Its dimension is n+1With standard basis {1,x,x2,…,xn}.
Example 3. The set C[a,b] of all continuous real-valued functions on [a,b]With point-wise Addition and scalar multiplication, is a vector space over R. This space is Infinite-dimensional.
Example 4. The set Mm×n(R) of all m×n real matrices is a Vector space over R.
Example 5 (Function spaces). The set F(R,R) of all functions f:R→R is a vector space over R under point-wise addition (f+g)(x)=f(x)+g(x) and scalar multiplication (αf)(x)=α⋅f(x). The spaces Ck(R) of k-times continuously differentiable functions and L2[a,b] of square-integrable functions are important subspaces of F(R,R).
Example 6 (Sequence spaces). The set ℓ2 of all real sequences (a1,a2,a3,…) With ∑n=1∞an2<∞ is a vector space over R. This is the Infinite-dimensional analogue of Rn and is fundamental in functional analysis.
1.3 Subspaces
A subspaceW of a vector space V is a subset W⊆V that is itself a vector space Under the same operations.
Theorem 1.1 (Subspace Criterion). A non-empty subset W⊆V is a subspace if and only If for all u,v∈W and all α∈F:
u+v∈W (closed under addition)
αu∈W (closed under scalar multiplication)
Proof. If W is a subspace, closure is immediate from the definition. Conversely, if W is Non-empty and closed under both operations, pick u∈W. Then −u=(−1)u∈W By closure under scalar multiplication, and u+(−u)=0∈W by closure Under addition. The remaining axioms are inherited from V. ■
Proposition 1.2 (Closure under Linear Combinations). If W is a subspace of VThen W is Closed under all finite linear combinations: for all v1,…,vk∈W and All α1,…,αk∈F
α1v1+α2v2+⋯+αkvk∈W
Proof. We proceed by induction on k. For k=1, α1v1∈W by closure under Scalar multiplication. Assume the result holds for k−1 vectors. Then
α1v1+⋯+αkvk=(α1v1+⋯+αk−1vk−1)+αkvk
By the inductive hypothesis, α1v1+⋯+αk−1vk−1∈WAnd αkvk∈W by closure under scalar multiplication. Their sum is in W by Closure under addition. ■
Example 7. The set of all solutions to the homogeneous equation Ax=0 forms a Subspace of RnCalled the null space of A.
1.4 Worked Example: Verifying Subspace Criteria
Problem. Determine whether each of the following subsets of R3 is a subspace.
(a) W1={(x,y,z)∈R3:x+2y−z=0}
(b) W2={(x,y,z)∈R3:x2+y2=1}
(c) W3={(x,y,z)∈R3:x=0andy=z}
Solution
(a) Let u=(x1,y1,z1) and v=(x2,y2,z2) be in W1So x1+2y1−z1=0 and x2+2y2−z2=0. Then
So αu∈W1. Since W1 is non-empty (e.g., 0∈W1), it is a subspace.
(b)W2 is not a subspace. For instance, (1,0,0)∈W2 since 12+02=1But 2⋅(1,0,0)=(2,0,0)∈/W2 since 22+02=4=1. So W2 is not closed Under scalar multiplication.
(c) Let u=(0,a,a) and v=(0,b,b) be in W3. Then u+v=(0,a+b,a+b)∈W3 and αu=(0,αa,αa)∈W3. Since (0,0,0)∈W3It is a non-empty subspace.
■
1.5 Worked Example: Sum and Intersection of Subspaces
Problem. Let U={(x,y,z)∈R3:z=0} (the xy-plane) and W={(x,y,z)∈R3:x=0} (the yz-plane). Find U+W and U∩W And verify the dimension formula.
Solution
U has basis {(1,0,0),(0,1,0)} and dim(U)=2. W has basis {(0,1,0),(0,0,1)} and dim(W)=2.
U∩W={(x,y,z):z=0andx=0}={(0,y,0):y∈R} Which has basis {(0,1,0)} and dimension 1.
U+W=span{(1,0,0),(0,1,0),(0,1,0),(0,0,1)}=span{(1,0,0),(0,1,0),(0,0,1)}=R3 So dim(U+W)=3.
The empty set is not a vector space. The subspace criterion requires the subset to be non-empty. The trivial subspace {0} is the smallest subspace of any vector space.
Non-homogeneous conditions do not define subspaces. The set of solutions to Ax=b with b=0 is not a subspace (it is an affine subspace, or coset of the null space).
Closure must hold for all scalars. A set that is closed under addition and multiplication by positive scalars is not necessarily a subspace; it must also be closed under multiplication by −1.
2. Linear Independence, Span, Basis, and Dimension
2.1 Linear Independence
A set of vectors {v1,v2,…,vk}⊆V is linearly Independent if the equation
α1v1+α2v2+⋯+αkvk=0
Implies α1=α2=⋯=αk=0. Otherwise the set is linearly dependent.
Proposition 2.1 (Equivalent formulations). The following are equivalent for vectors v1,…,vk∈V:
{v1,…,vk} is linearly independent.
No vj can be written as a linear combination of the remaining vectors.
If ∑i=1kαivi=∑i=1kβiviThen αi=βi for all i.
Proof. (1 ⇒ 2): If vj=∑i=jαiviThen ∑i=jαivi−vj=0 gives a non-trivial linear Dependence, contradicting (1).
(2 ⇒ 3): If ∑(αi−βi)vi=0Then by linear Independence (which follows from (2)), αi=βi for all i.
(3 ⇒ 1): If ∑αivi=0=∑0⋅vi Then by (3), αi=0 for all i. ■
2.2 Span
The span of a set S⊆VDenoted span(S)Is the set of all finite linear Combinations of elements of S:
span(S)={∑i=1kαivi:k∈N,αi∈F,vi∈S}
Proposition 2.2.span(S) is always a subspace of V. In fact, span(S) is The smallest subspace containing S: if W is any subspace with S⊆WThen span(S)⊆W.
Proof.span(S) is non-empty since 0=0⋅v for any v∈S. Closure under addition and scalar multiplication follows directly from the Definition of linear combinations. For minimality, any subspace W containing S must contain all Finite linear combinations of elements of S by Proposition 1.2, so span(S)⊆W. ■
2.3 Basis and Dimension
A set B⊆V is a basis for V if:
B is linearly independent, and
span(B)=V.
Theorem 2.1. Every vector space has a basis. All bases of a finite-dimensional vector space have The same number of elements.
The dimension of VDenoted dim(V)Is the cardinality of any basis for V.
2.4 Steinitz Exchange Lemma
Lemma 2.3 (Steinitz Exchange Lemma). Let {u1,…,uk} be a linearly Independent set in VAnd let {w1,…,wm} be a spanning set for V. Then k≤mAnd after relabelling the wjThe set
{u1,…,uk,wk+1,…,wm}
Also spans V.
Proof. We proceed by induction on k. For k=0 there is nothing to prove.
Assume the result holds for k−1. Since {u1,…,uk} is linearly Independent, uk=0 and uk∈span{w1,…,wm} Since the wj span V. Therefore uk=∑j=1mαjwj for some αj∈FAnd not all αj are zero.
After relabelling, assume α1=0. Then w1=α1−1(uk−∑j=2mαjwj) So w1∈span{uk,w2,…,wm}. It follows that
span{w1,…,wm}=span{uk,w2,…,wm}=V
Now {u1,…,uk−1} is linearly independent and {uk,w2,…,wm} spans V. By the inductive hypothesis, k−1≤m−1 (so k≤m) and after relabelling, {u1,…,uk−1,wk,…,wm} spans V. Since uk is already in this span, the full set {u1,…,uk,wk+1,…,wm} also spans V. ■
Theorem 2.4 (Dimension is Well-Defined). If V is finite-dimensional, then any two bases of V have the same number of elements.
Proof. Let B1 and B2 be two bases with ∣B1∣=k and ∣B2∣=m. Applying the Steinitz exchange lemma with B1 as the Independent set and B2 as the spanning set gives k≤m. Swapping roles gives m≤k. Hence k=m. ■
2.5 Dimension Formula
Theorem 2.5 (Dimension Formula). If U and W are subspaces of a finite-dimensional vector Space VThen
dim(U+W)=dim(U)+dim(W)−dim(U∩W)
2.6 Rank-Nullity Theorem
Theorem 2.6 (Rank-Nullity Theorem). Let A∈Mm×n(F). Then
rank(A)+nullity(A)=n
Where rank(A)=dim(col(A)) and nullity(A)=dim(null(A)).
Proof. Let {v1,…,vk} be a basis for null(A)Where k=nullity(A). Extend this to a basis {v1,…,vk,vk+1,…,vn} for Fn.
We claim that {Avk+1,…,Avn} is a basis for col(A).
Spanning: For any y∈col(A)There exists x∈Fn With y=Ax. Writing x=∑i=1nαivi
y=A(∑i=1nαivi)=∑i=1nαiAvi=∑i=k+1nαiAvi
Since Avi=0 for i≤k.
Linear independence: If ∑i=k+1nαiAvi=0Then A(∑i=k+1nαivi)=0So ∑i=k+1nαivi∈null(A). Since {v1,…,vk} Is a basis for the null space, ∑i=k+1nαivi=∑i=1kβivi For some βiGiving ∑i=1n(−βi)vi+∑i=k+1nαivi=0. By linear independence of the full basis, αi=0 for all i≥k+1.
Therefore rank(A)=n−k=n−nullity(A). ■
2.7 Worked Examples
Problem. Find a basis for and the dimension of the subspace W=span{(1,2,−1,0),(3,1,0,2),(−1,3,−2,−2)} of R4.
Solution
Form the matrix whose rows are the given vectors and row-reduce:
This has pivots in columns 1, 3, and 4. The free variable is x2. Setting x2=t and Back-substituting: x4=0, x3=0, x1=−2t. The null space is {t(−2,1,0,0):t∈R}With basis {(−2,1,0,0)} and dimension 1. ■
Problem. Determine whether the vectors v1=(1,2,3), v2=(4,5,6), v3=(7,8,9) form a basis For R3.
Solution
Form the matrix A=[v1∣v2∣v3] and compute Its determinant:
det(A)=1(45−48)−2(36−42)+3(32−35)=−3+12−9=0
Since det(A)=0The columns are linearly dependent, so {v1,v2,v3} Is not a basis. In fact, v3−2v2+v1=0.
■
:::tip To check if n vectors in Rn form a basis, compute the determinant of the matrix whose Columns are those vectors. If det=0They form a basis; if det=0They do not. :::
Problem. Let V=P3(R) (polynomials of degree at most 3). Find the dimension Of the subspace W={p∈P3:p(1)=p(−1)=0}.
If you get this wrong, revise: Section 2.7 (Worked Examples).
2.8 Common Pitfalls
Linear independence of infinitely many vectors. The definition only directly applies to finite subsets. A set S is linearly independent if every finite subset of S is linearly independent.
Dimension and spanning. A set of n vectors in Rn that spans Rn must be linearly independent (and hence a basis). Similarly, n linearly independent vectors in Rn must span Rn.
The empty set spans {0}. The span of the empty set is the trivial subspace, and the empty set is a basis for {0}. The dimension of the zero space is 0.
3. Matrices
3.1 Matrix Operations
An m×n matrix A over F is a rectangular array of mn elements from FArranged in m rows and n columns. The set of all such matrices is denoted Mm×n(F).
Addition. For A,B∈Mm×n(F), (A+B)ij=Aij+Bij.
Scalar multiplication. For α∈F and A∈Mm×n(F)(αA)ij=αAij.
Matrix multiplication. For A∈Mm×n(F) and B∈Mn×p(F) The product AB∈Mm×p(F) is defined by
(AB)ij=∑k=1nAikBkj
Proposition 3.1. Matrix multiplication is associative but not commutative .
Proof. Associativity: (AB)C has (i,j)-entry ∑l(∑kAikBkl)Clj=∑kAik(∑lBklClj)=(A(BC))ij By interchanging the order of summation (both sums are finite). For non-commutativity, A=(0010) and B=(0100) give AB=(1000)=(0001)=BA. ■
3.2 Transpose
The transpose of A∈Mm×n(F)Denoted ATIs the n×m matrix With (AT)ij=Aji.
Properties of transpose:
(A+B)T=AT+BT
(αA)T=αAT
(AB)T=BTAT
(AT)T=A
3.3 Inverse Matrices
A square matrix A∈Mn×n(F) is invertible if there exists a matrix A−1∈Mn×n(F) such that
AA−1=A−1A=In
Theorem 3.1. The following are equivalent for A∈Mn×n(F):
A is invertible.
det(A)=0.
The columns of A are linearly independent.
The rows of A are linearly independent.
rank(A)=n.
The equation Ax=b has a unique solution for every b.
The only solution to Ax=0 is x=0.
3.4 Determinants
The determinant is a function det:Mn×n(F)→F defined recursively by Laplace expansion along the first row:
det(A)=∑j=1n(−1)1+ja1jM1j
Where M1j is the (1,j)-minor (the determinant of the (n−1)×(n−1) matrix obtained by Deleting row 1 and column j).
The (i,j)-cofactor is Cij=(−1)i+jMijSo det(A)=∑j=1naijCij for any fixed row i.
3.5 Properties of Determinants
Proposition 3.2 (Effect of Row Operations). Let A∈Mn×n(F).
Swapping two rows of A multiplies the determinant by −1.
Multiplying a row of A by α∈F multiplies the determinant by α.
Adding a multiple of one row to another leaves the determinant unchanged.
Proof. (1) This follows from the antisymmetry of the Leibniz formula det(A)=∑σ∈Snsgn(σ)∏i=1nai,σ(i). Swapping two rows Changes the sign of every permutation, hence the sign of the sum.
(2) Multiplying row i by α multiplies every term in the Leibniz expansion by α Hence det is multiplied by α.
(3) Adding α times row j to row i (i=j): by multilinearity in row i
det(newA)=det(A)+α⋅det(matrixwithrowsiandjequal)
A matrix with two equal rows has determinant 0 (by antisymmetry: swapping them leaves the matrix Unchanged but multiplies det by −1So det=−detHence det=0). Therefore det(newA)=det(A). ■
Theorem 3.3 (Multiplicativity). For A,B∈Mn×n(F)
det(AB)=det(A)det(B)
Proof (via elementary matrices). Every matrix B can be written as a product of elementary matrices Times an upper triangular matrix: B=E1E2⋯EkU. For an elementary matrix E:
If E swaps rows, det(E)=−1 and det(AE)=−det(A)=det(A)det(E).
If E multiplies a row by α, det(E)=α and det(AE)=αdet(A)=det(A)det(E).
If E adds a multiple of one row to another, det(E)=1 and det(AE)=det(A)=det(A)det(E).
Thus det(AE)=det(A)det(E) for every elementary matrix. By induction,
Corollary 3.4.det(AT)=det(A)And for invertible A, det(A−1)=1/det(A).
Proof.AA−1=ISo det(A)det(A−1)=det(I)=1Giving det(A−1)=1/det(A). For the transpose, use the Leibniz formula or observe that row Operations and column operations have the same effects on the determinant. ■
3.6 Adjugate and Inverse Formula
Definition. The adjugate (or adjoint) of A∈Mn×n(F) is
adj(A)=(Cji)i,j=1n
Where Cij is the (i,j)-cofactor of A. That is, adj(A) is the transpose of the Cofactor matrix.
Theorem 3.5. For any A∈Mn×n(F)
A⋅adj(A)=adj(A)⋅A=det(A)⋅In
In particular, if det(A)=0Then A−1=det(A)1adj(A).
Proof. The (i,j)-entry of A⋅adj(A) is ∑k=1naikCjk. When i=jThis is ∑k=1naikCik=det(A) (cofactor expansion along row i). When i=jThis is the cofactor expansion of a matrix obtained from A by replacing row j With row iWhich has two equal rows and hence determinant 0. ■
3.7 Worked Examples
Problem. Compute det(A) where
A=105216340
Solution
Expanding along the first column:
det(A)=1⋅det(1640)−0+5⋅det(2134)
=1⋅(0−24)+5⋅(8−3)=−24+25=1
■
Problem. Compute det(A) by row reduction where
A=2461123135811351
Solution
Apply row operations and track their effect on the determinant:
:::caution Common Pitfall The determinant is only defined for square matrices. There is no meaningful determinant for an m×n matrix with m=n. Do not confuse det(AB)=det(A)det(B) with a Non-existent formula for non-square matrices. :::
3.8 Worked Example: Determinant via Row Reduction (Efficient Method)
Problem. Compute det(A) where
A=1111123413610141020
Solution
This matrix has Pascal-like entries. We use row operations:
All operations were type 3 (adding a multiple of one row to another), so the determinant is Unchanged. The upper triangular matrix has diagonal entries 1,1,1,1So det(A)=1. ■
Proposition 3.6 (Determinant of a Triangular Matrix). If A is upper or lower triangular, Then det(A)=∏i=1naii.
Proof. By repeated cofactor expansion along the first column (for upper triangular), or induction. At each step, all terms involving off-diagonal entries vanish due to the zero structure, leaving Only the product of diagonal entries. ■
3.9 Common Pitfalls
det(A+B)=det(A)+det(B) . For example, with A=B=I2det(A+B)=det(2I2)=4But det(A)+det(B)=2.
The adjugate formula is theoretically important but computationally inefficient. For large matrices, use Gaussian elimination or LU decomposition to compute inverses.
A matrix with det(A)=0 has no inverse. Do not attempt to divide by zero.
4. Systems of Linear Equations
4.1 Gaussian Elimination
A system of m linear equations in n unknowns can be written as Ax=bWhere A∈Mm×n(R), x∈RnAnd b∈Rm.
Gaussian elimination transforms the augmented matrix [A∣b] into row echelon Form (REF) using elementary row operations:
Swap two rows (Ri↔Rj).
Multiply a row by a non-zero scalar (Ri→αRi).
Add a multiple of one row to another (Ri→Ri+αRj).
A matrix is in REF if:
All zero rows are at the bottom.
The leading entry (pivot) of each non-zero row is strictly to the right of the pivot above.
All entries below a pivot are zero.
Reduced row echelon form (RREF) additionally requires:
Each pivot is 1.
Each pivot is the only non-zero entry in its column.
Theorem 4.1. Every matrix has a unique RREF. Two matrices are row-equivalent if and only if they Have the same RREF.
4.2 Existence and Uniqueness
Theorem 4.2 (Rouché—Capelli). The system Ax=b is consistent (has at least one Solution) if and only if
rank(A)=rank([A∣b])
If consistent, the solution set has dim(null(A)) free parameters, where dim(null(A))=n−rank(A).
Proof. Let the RREF of [A∣b] have r=rank(A) pivots in the coefficient Columns. The system is inconsistent if and only if the last non-zero row is [0⋯0∣1] Which occurs precisely when the augmented column contains a pivot, i.e., when rank([A∣b])>r.
If consistent, the r pivot variables are determined by the n−r free variables, yielding n−rank(A) degrees of freedom. ■
4.3 LU Decomposition
An LU decomposition of A∈Mn×n(R) writes A=LU where L is Lower triangular with 1s on the diagonal, and U is upper triangular.
Theorem 4.3. If Gaussian elimination can be performed on A without row exchanges, then A Admits an LU decomposition.
Algorithm. Store the multipliers mij (used to eliminate entry aij) in the lower Triangular portion. The resulting upper triangular matrix is UAnd the multipliers form L.
To solve Ax=bFirst solve Ly=b (forward substitution), Then Ux=y (back substitution).
4.4 Gaussian Elimination with Partial Pivoting
When a pivot is zero (or very small), we swap rows to bring the largest available entry in the Current column into the pivot position. This is partial pivoting, and it improves numerical Stability.
Problem. Solve the system using Gaussian elimination with partial pivoting:
x1+2x2+x33x1+x2−x32x1+3x2+4x3=5=2=11
Solution
Augmented matrix:
[A∣b]=1322131−145211
Step 1. Column 1: largest entry is 3 in row 2. Swap R1↔R2:
312123−1142511
R2→R2−31R1, R3→R3−32R1:
30015/37/3−14/314/3213/329/3
Step 2. Column 2: largest entry below pivot is 7/3 in row 3. Swap R2↔R3:
30017/35/3−114/34/3229/313/3
R3→R3−75R2:
30017/30−114/3−2/7229/3−6/7
Back substitution. From row 3: −72x3=−76So x3=3.
From row 2: 37x2+314(3)=329So 37x2=329−14=−313 Giving x2=−713.
From row 1: 3x1+(−713)−3=2So 3x1=2+3+713=748 Giving x1=716.
Solution:x1=716, x2=−713, x3=3. ■
4.5 Least Squares Solutions
When Ax=b is overdetermined (more equations than unknowns) and inconsistent, We seek x that minimises ∥Ax−b∥2.
Theorem 4.4 (Normal Equations). The least squares solution x^ satisfies
ATAx^=ATb
If A has full column rank, then ATA is invertible and x^=(ATA)−1ATb.
Proof. The error vector e=Ax−b is minimised when e⊥col(A) I.e., when ATe=0. This gives AT(Ax−b)=0 Or ATAx=ATb. If A has full column rank, then ker(A)={0} So ker(ATA)=ker(A)={0}Meaning ATA is invertible. ■
Problem. Find the least squares line y=ax+b fitting the data points (1,1), (2,1), (3,3).
Solution
The system is 123111(ab)=113 I.e., Ax=b with A=123111x=(a,b)T, b=(1,1,3)T.
Now the system is consistent. Pivots in columns 1 and 3; free variables are x2 and x4. From row 2: x3=x4. From row 1: x1=1−2x2+x3−3x4=1−2x2−2x4.
Solution set: {(1−2s−2t,s,t,t):s,t∈R}.
In parametric form: x1x2x3x4=1000+s−2100+t−2011.
The solution space is a 2-dimensional affine subspace (a plane) in R4. ■
4.7 Common Pitfalls
Partial pivoting is not optional for numerical work. Without it, Gaussian elimination can produce catastrophically wrong results due to rounding errors.
The normal equations can be ill-conditioned. For better numerical stability, use QR decomposition to solve least squares problems instead of forming ATA.
Not every system has a solution. Always check consistency via the Rouché—Capelli theorem before attempting to solve.
5. Eigenvalues and Eigenvectors
5.1 Definitions
Let A∈Mn×n(F). A scalar λ∈F is an eigenvalue of A if there Exists a non-zero vector v∈Fn such that
Av=λv
The vector v is called an eigenvector corresponding to λ.
The eigenspace corresponding to λ is Eλ=ker(A−λI). Its dimension is The geometric multiplicity of λ.
5.2 Characteristic Polynomial
Theorem 5.1.λ is an eigenvalue of A if and only if det(A−λI)=0.
The polynomial p(λ)=det(A−λI) is called the characteristic polynomial of A. Its roots (in the algebraic closure of F) are the eigenvalues of A.
If p(λ)=(lambda−lambda1)m1(lambda−lambda2)m2⋯(lambda−lambdak)mk With lambda1,…,lambdak distinct, then mi is the algebraic multiplicity of lambdai.
Proposition 5.2. For each eigenvalue λ, 1≤dim(Eλ)≤mλ (geometric multiplicity does not exceed algebraic multiplicity).
5.3 Diagonalisation
Definition.A is diagonalisable if there exists an invertible matrix P and a diagonal Matrix D such that
A=PDP−1
Theorem 5.3.A∈Mn×n(F) is diagonalisable (over F) if and only if A has n linearly independent Eigenvectors (over F). Equivalently, the sum of the geometric multiplicities equals n.
Corollary 5.4. If A has n distinct eigenvalues, then A is diagonalisable.
Proof. Eigenvectors corresponding to distinct eigenvalues are linearly independent. With n distinct Eigenvalues, we obtain n linearly independent eigenvectors, which form a basis of Fn. ■
5.4 Cayley—Hamilton Theorem
Theorem 5.5 (Cayley—Hamilton). Every square matrix satisfies its own characteristic polynomial: If p(λ)=det(λI−A)Then p(A)=0 (the zero matrix).
Proof sketch. Let p(λ)=λn+cn−1λn−1+⋯+c1λ+c0. By the adjugate formula (Theorem 3.5), (λI−A)⋅adj(λI−A)=p(λ)⋅I. Each entry of adj(λI−A) is a polynomial in λ of degree at most n−1 So we can write adj(λI−A)=Bn−1λn−1+⋯+B1λ+B0 for Matrices Bi. Multiplying out and comparing coefficients of λk:
Bn−1=I,Bn−2−ABn−1=cn−1I,…,−AB0=c0I
Multiplying the k-th equation on the left by Ak and summing over k:
But the left side telescopes to zero, so p(A)=0. ■
5.5 Jordan Normal Form
When a matrix is not diagonalisable, the Jordan normal form provides the next-best canonical Representation.
Theorem 5.6. Let A∈Mn×n(C). Then A is similar to a block-diagonal Matrix
J=J1⋱Jk
Where each Jordan block has the form
Ji=λi1λi⋱⋱1λi
The Jordan form is unique up to permutation of the blocks.
Intuition. Each Jordan block corresponds to one eigenvalue. The size of the block equals the Number of steps in the chain v,(A−λI)v,(A−λI)2v,… Of generalised eigenvectors. A diagonalisable matrix has all Jordan blocks of size 1×1.
Problem. Find the Jordan normal form of
A=(3013)
Solution
The characteristic polynomial is det(A−λI)=(3−λ)2So λ=3 is the Only eigenvalue with algebraic multiplicity 2.
A−3I=(0010)Which has rank 1, so the geometric Multiplicity is dim(ker(A−3I))=2−1=1.
Since the geometric multiplicity (1) is less than the algebraic multiplicity (2), A is not Diagonalisable. The Jordan form has one block of size 2:
J=(3013)
(In this case, A is already in Jordan form.) ■
5.6 Spectral Theorem for Real Symmetric Matrices
Theorem 5.7 (Spectral Theorem). If A∈Mn×n(R) is symmetric (A=AT), then:
All eigenvalues of A are real.
A has n linearly independent orthonormal eigenvectors.
A is orthogonally diagonalisable: A=QDQT where Q is orthogonal (QTQ=I).
Proof. We prove (1) and then (2) and (3) by induction on n.
(1) Let λ∈C be an eigenvalue with eigenvector v∈Cnv=0. Then
vTAv=vT(λv)=λvTv
Since A=AT and A has real entries, A=A=ATSo
vTAv=(Av)Tv=(Av)Tv=(λv)Tv=λvTv
Therefore (λ−λ)vTv=0. Since vTv>0 We have λ=λSo λ∈R.
(2) and (3) By induction. For n=1 the result is trivial. Assume it holds for (n−1)×(n−1) Symmetric matrices. Since all eigenvalues are real, A has a real eigenvalue λ1 with real Eigenvector v1. Normalise: q1=v1/∥v1∥.
Let W=q1⊥={w∈Rn:q1Tw=0}. For any w∈W:
q1T(Aw)=(Aq1)Tw=(λ1q1)Tw=λ1⋅0=0
So Aw∈W. Therefore A restricts to a symmetric linear map A∣W:W→W on an (n−1)-dimensional space. By the inductive hypothesis, W has an orthonormal basis {q2,…,qn} of eigenvectors of A∣W.
Then {q1,q2,…,qn} is an orthonormal eigenbasis for Rn And A=QDQT with Q=[q1∣⋯∣qn]. ■
5.7 Worked Example: Full Diagonalisation
Problem. Find the eigenvalues, eigenvectors, and diagonalise
Problem. Use the Cayley—Hamilton theorem to compute A10 for the same matrix A above.
Solution
The characteristic polynomial is p(λ)=λ2−7λ+10So by Cayley—Hamilton, A2=7A−10I.
To find A10Divide λ10 by p(λ):
λ10=q(λ)(λ2−7λ+10)+r(λ)
Where r(λ)=aλ+b has degree less than 2. Then A10=r(A)=aA+bI.
To find a and bEvaluate at the eigenvalues:
λ10λ=5=510=9765625=5a+b
λ10λ=2=210=1024=2a+b
Subtracting: 3a=9765625−1024=9764601So a=3254867.
b=1024−2⋅3254867=1024−6509734=−6508710.
Therefore A10=3254867⋅A−6508710⋅I. ■
:::caution Common Pitfall Not every matrix is diagonalisable. For example, A=(1011) has Eigenvalue λ=1 with algebraic multiplicity 2 but geometric multiplicity 1. It has only one Linearly independent eigenvector and is not diagonalisable. :::
5.8 Worked Example: Spectral Decomposition of a Symmetric Matrix
Problem. Orthogonally diagonalise the symmetric matrix
For λ1=1: A−I=111111111→100100100. Eigenspace: {(s,t,−s−t):s,t∈R}. An orthonormal basis: q1=21(1,−1,0), q2=61(1,1,−2).
For λ2=2: A−2I=011101110→100010110. Eigenspace: {(−t,−t,t):t∈R}. Normalised: q3=31(−1,−1,1).
For λ3=3: A−3I=−1111−1111−1→100−100−100. Eigenspace: {(s+t,s,t):s,t∈R}. Normalised: q4=31(1,1,1).
A=QDQT where Q=613−3011−2−2−22222 and D=diag(1,2,3).
■
5.9 Common Pitfalls
Algebraic multiplicity ≥ geometric multiplicity, not the reverse. A matrix is diagonalisable if and only if equality holds for every eigenvalue.
The characteristic polynomial depends on the choice of eigenvalue variable convention.det(A−λI) and det(λI−A) differ by a factor of (−1)n but have the same roots.
Similarity preserves eigenvalues but not eigenvectors. If A=PBP−1 and Bv=λv then A(Pv)=λ(Pv); the eigenvectors transform by P.
6. Linear Transformations
6.1 Definition
A linear transformation (or linear map) T:V→W between vector spaces V and W over F Is a function satisfying:
T(u+v)=T(u)+T(v) for all u,v∈V
T(αv)=αT(v) for all α∈F, v∈V
Equivalently, T(αu+βv)=αT(u)+βT(v) for all α,β∈F and u,v∈V.
The set of all linear transformations from V to W is denoted L(V,W).
Proposition 6.1. For any linear transformation T:
T(0)=0.
T(−v)=−T(v).
T(∑i=1kαivi)=∑i=1kαiT(vi).
Proof.T(0)=T(0⋅0)=0⋅T(0)=0. T(−v)=T((−1)v)=(−1)T(v)=−T(v). Property (3) follows by Induction. ■
6.2 Matrix Representation
If V and W are finite-dimensional with bases BV={v1,…,vn} and BW={w1,…,wm}Then every T∈L(V,W) is Uniquely represented by a matrix [T]BVBW∈Mm×n(F) Where the j-th column is the coordinate vector of T(vj) with respect to BW.
6.3 Kernel and Image
The kernel (null space) and image (range) of T are:
ker(T)={v∈V:T(v)=0}im(T)={T(v):v∈V}
Proposition 6.2.ker(T) is a subspace of V and im(T) is a subspace of W.
6.4 Rank-Nullity Theorem for Linear Maps
Theorem 6.3 (Rank-Nullity). For T∈L(V,W) with V finite-dimensional:
dim(ker(T))+dim(im(T))=dim(V)
Proof. Let {u1,…,uk} be a basis for ker(T)Where k=dim(ker(T)). Extend to a basis {u1,…,uk,uk+1,…,un} of V Where n=dim(V).
We claim {T(uk+1),…,T(un)} is a basis for im(T).
Spanning: For any w∈im(T)Write w=T(v) for some v=∑i=1nαiui∈V. Then
Linear independence: If ∑i=k+1nαiT(ui)=0Then T(∑i=k+1nαiui)=0So ∑i=k+1nαiui∈ker(T). Thus ∑i=k+1nαiui=∑j=1kβjuj For some βj. By linear independence of the full basis, all coefficients are zero.
A linear transformation T:V→W is an isomorphism if it is bijective. We write V≅W.
Theorem 6.4.T is an isomorphism if and only if ker(T)={0} and im(T)=W.
Corollary 6.5. If dim(V)=dim(W)<∞Then T is injective if and only if T is surjective.
Proof. If T is injective, ker(T)={0}So dim(im(T))=dim(V)=dim(W) Hence im(T)=W (a subspace of full dimension equals the whole space). Conversely, If T is surjective, dim(im(T))=dim(W)=dim(V)So dim(ker(T))=0Giving ker(T)={0}. ■
6.6 Change of Basis
If P is the change-of-basis matrix from basis B to basis B′Then for a Linear transformation T with matrix representations [T]B and [T]B′:
[T]B′=P−1[T]BP
This is the similarity transformation. Similar matrices represent the same linear transformation In different bases and share the same eigenvalues, determinant, and trace.
6.7 Worked Example: Matrix of a Transformation with Change of Basis
Problem. Let T:R2→R2 be defined by T(x,y)=(2x+y,x+2y). (a) Find [T]E where E is the standard basis. (b) Find [T]B where B={(1,1),(1,−1)}. (c) Verify that [T]B=P−1[T]EP.
Solution
(a)T(1,0)=(2,1) and T(0,1)=(1,2)So
[T]E=(2112)
(b) Compute T on the basis B:
T(1,1)=(3,3)=3(1,1)+0(1,−1)So coordinates are (30)B.
T(1,−1)=(1,−1)=0(1,1)+1(1,−1)So coordinates are (01)B.
[T]B=(3001)
(c) The change-of-basis matrix from E to B is
P=(111−1),P−1=(1/21/21/2−1/2)
P−1[T]EP=(1/21/21/2−1/2)(2112)(111−1)
=(3/21/23/2−1/2)(111−1)=(3001)=[T]B. ■
6.8 Dual Spaces
Definition. The dual space of a vector space V over FDenoted V∗Is the space of all Linear functionals f:V→F. Elements of V∗ are called covectors.
Proposition 6.6. If dim(V)=n<∞Then dim(V∗)=n.
Proof. Let {e1,…,en} be a basis for V. Define the dual basis{φ1,…,φn}⊆V∗ by φi(ej)=δij (the Kronecker Delta). Each φi is a well-defined linear functional since it is defined on a basis and Extended linearly. These are linearly independent: if ∑ciφi=0Then applying to ej gives cj=0. They span V∗: for any f∈V∗, f=∑i=1nf(ei)φi. ■
Definition. The double dual of V is V∗∗=(V∗)∗.
Theorem 6.7. If V is finite-dimensional, the map Φ:V→V∗∗ defined by Φ(v)(f)=f(v) is a natural isomorphism.
Intuition. The double dual “recovers” the original space. A vector v can be Identified with the functional on V∗ that evaluates each covector at v.
Example. For V=R3 with standard basis, the dual basis {φ1,φ2,φ3} Is given by φi(x1,x2,x3)=xi. The functional f(x1,x2,x3)=3x1−2x2+x3 Corresponds to the covector 3φ1−2φ2+φ3 in V∗.
Remark. In infinite dimensions, V and V∗∗ need not be isomorphic. The double dual Isomorphism is a special feature of finite-dimensional spaces.
6.9 Annihilators
Definition. For a subset S⊆VThe annihilator of S is
S0={f∈V∗:f(s)=0foralls∈S}
Proposition 6.8.S0 is a subspace of V∗And if W is a subspace of V with dim(V)=nThen dim(W0)=n−dim(W).
Proof.S0 is the intersection of the kernels ker(s) as s ranges over SWhere each s Is viewed as an element of V∗∗ via Φ. Each ker(s) is a subspace of V∗And any Intersection of subspaces is a subspace.
For the dimension: let dim(W)=k and extend a basis {w1,…,wk} Of W to a basis {w1,…,wk,wk+1,…,wn} of V. Let {φ1,…,φn} be the dual basis. Then f∈W0 iff f(wi)=0 For i=1,…,k. Writing f=∑cjφjWe need ci=0 for i=1,…,k. So f=∑j=k+1ncjφjGiving dim(W0)=n−k. ■
7. Inner Product Spaces
7.1 Definition of an Inner Product
An inner product on a vector space V over F (where F=R or C) is a Function ⟨⋅,⋅⟩:V×V→F satisfying:
Conjugate symmetry: ⟨u,v⟩=⟨v,u⟩
Linearity in the first argument: ⟨αu+βw,v⟩=α⟨u,v⟩+β⟨w,v⟩
Positive definiteness: ⟨v,v⟩≥0 with equality iff v=0
A vector space equipped with an inner product is called an inner product space.
Example. The standard inner product on Rn is ⟨x,y⟩=∑i=1nxiyi. On Cn⟨x,y⟩=∑i=1nxiyi.
Example. On C[a,b]The L2 inner product is ⟨f,g⟩=∫abf(x)g(x)dx.
7.2 Norms
Every inner product induces a norm:
∥v∥=⟨v,v⟩
Theorem 7.1 (Cauchy—Schwarz Inequality). For all u,v∈V
∣⟨u,v⟩∣≤∥u∥∥v∥
With equality if and only if u and v are linearly dependent.
Proof. If v=0Both sides are 0 and the result holds. Assume v=0. For any t∈R (or C), positive definiteness gives
0≤⟨u−tv,u−tv⟩=⟨u,u⟩−t⟨v,u⟩−t⟨u,v⟩+∣t∣2⟨v,v⟩
Set t=⟨v,v⟩⟨u,v⟩ (the value that minimises the right side):
0≤∥u∥2−∥v∥2∣⟨u,v⟩∣2
Rearranging: ∣⟨u,v⟩∣2≤∥u∥2∥v∥2. Taking square roots gives the result. Equality holds iff u−tv=0 I.e., u and v are linearly dependent. ■
Theorem 7.2 (Triangle Inequality).
∥u+v∥≤∥u∥+∥v∥
Proof.
∥u+v∥2=⟨u+v,u+v⟩=∥u∥2+2Re⟨u,v⟩+∥v∥2
By Cauchy—Schwarz, Re⟨u,v⟩≤∣⟨u,v⟩∣≤∥u∥∥v∥So
∥u+v∥2≤∥u∥2+2∥u∥∥v∥+∥v∥2=(∥u∥+∥v∥)2
Taking square roots gives the result. ■
7.3 Orthogonality
Two vectors u,v are orthogonal if ⟨u,v⟩=0. We write u⊥v.
An orthonormal set{e1,…,ek} satisfies ⟨ei,ej⟩=δij.
Theorem 7.3 (Pythagorean Theorem). If u⊥vThen
∥u+v∥2=∥u∥2+∥v∥2
Proof.∥u+v∥2=∥u∥2+2⟨u,v⟩+∥v∥2=∥u∥2+∥v∥2. ■
Proposition 7.4. Every orthonormal set is linearly independent.
Proof. If ∑i=1kαiei=0Then taking the inner product with ej: αj=⟨∑αiei,ej⟩=⟨0,ej⟩=0 for each j. ■
7.4 Gram—Schmidt Process
The Gram—Schmidt process converts a linearly independent set {v1,…,vn} into an orthonormal set {e1,…,en}:
u1=v1,e1=∥u1∥u1
uk=vk−∑i=1k−1⟨vk,ei⟩ei,ek=∥uk∥uk
Proposition 7.5. At each step, span{e1,…,ek}=span{v1,…,vk}.
Proof. By construction, uk is vk minus its projection onto span{e1,…,ek−1}=span{v1,…,vk−1}. So uk∈span{v1,…,vk} and vk=uk+∑i=1k−1⟨vk,ei⟩ei∈span{u1,…,uk}. Since each ei is a scalar multiple of uiThe spans coincide. ■
7.5 Orthogonal Projection
The orthogonal projection of v onto a subspace W with orthonormal basis {e1,…,ek} is
projW(v)=∑i=1k⟨v,ei⟩ei
Theorem 7.6 (Best Approximation). Among all vectors in WThe orthogonal projection projW(v) minimises the distance to v:
∥v−projW(v)∥≤∥v−w∥forallw∈W
Proof. For any w∈WWrite v−w=(v−projW(v))+(projW(v)−w). The first term is orthogonal to W (hence to the second term, which lies in W), so by the Pythagorean theorem:
A fundamental application of orthogonal projection is fitting functions to data. Given a subspace W of an inner product space V and a target v∈VThe best approximation in W Is the orthogonal projection projW(v).
7.7 Worked Example: Gram—Schmidt
Problem. Apply the Gram—Schmidt process to v1=(1,1,0)v2=(1,0,1), v3=(0,1,1) in R3 with the standard inner Product.
The orthonormal basis is {21(1,1,0),61(1,−1,2),31(−1,1,1)}. ■
:::caution Common Pitfall The Gram—Schmidt process requires a linearly independent starting set. If the input vectors are Linearly dependent, one of the uk will be the zero vector, and the process will fail (attempting to divide by zero in the normalisation step). :::
7.8 Worked Example: Orthogonal Projection onto a Plane
Problem. Find the orthogonal projection of v=(3,−1,2) onto the plane W spanned by (1,0,1) and (0,1,1) in R3 with the standard inner product. Also find the distance from v to W.
Solution
First, apply Gram—Schmidt to obtain an orthonormal basis for W.
The residual is v−projW(v)=(0,0,0)So the distance is 0. This means v∈W itself. Indeed, v=3(1,0,1)−(0,1,1)∈span{(1,0,1),(0,1,1)}. ■
7.9 Worked Example: L2 Least Squares Approximation
Problem. Find the constant function c (i.e., the best approximation by a degree-0 polynomial) That minimises ∫01(ex−c)2dx.
Solution
We want the orthogonal projection of f(x)=ex onto the subspace W=span{1} in the L2[0,1] inner product space. The orthonormal basis for W is e1=1 (since ∥1∥2=∫011dx=1).
projW(f)=⟨f,1⟩⋅1=(∫01exdx)⋅1=(e−1)⋅1
So the best constant approximation is c=e−1≈1.718.
Verification: The error is ex−(e−1). Expanding ex as a Taylor series around x=1/2: The constant term is e1/2≈1.649But our answer e−1≈1.718 is the L2-optimal constant, not the Taylor approximation. The two optimisation criteria differ. ■
7.10 Common Pitfalls
The Cauchy—Schwarz inequality is not the triangle inequality. Cauchy—Schwarz bounds the inner product by the product of norms; the triangle inequality bounds the norm of a sum by the sum of norms. They are related (the triangle inequality follows from Cauchy—Schwarz) but distinct.
Gram—Schmidt is numerically unstable. For floating-point computation, modified Gram—Schmidt or Householder reflections are preferred.
Orthogonal projection decomposes v uniquely.v=projW(v)+v⊥ where v⊥∈W⊥. This decomposition is unique and is called the orthogonal decomposition.
8. Singular Value Decomposition
8.1 Existence of the SVD
Theorem 8.1 (Singular Value Decomposition). Every matrix A∈Mm×n(R) can be factored as
A=UΣVT
Where U∈Mm×m(R) is orthogonal, V∈Mn×n(R) is orthogonal, and Σ∈Mm×n(R) is diagonal with non-negative entries σ1≥σ2≥⋯≥σr≥0 (where r=rank(A)).
The σi are called the singular values of A. The columns of U are the left singular vectors, and the columns of V are the right singular vectors.
Proof. The matrix ATA is an n×n symmetric positive semi-definite matrix, so by the spectral theorem it has n real non-negative eigenvalues and an orthonormal eigenbasis. Let σ12,σ22,…,σn2 be these eigenvalues (some may be zero) with corresponding orthonormal eigenvectors v1,…,vn forming the columns of V.
For each i with σi>0Define ui=Avi/σi. We verify that these form an orthonormal set:
Proposition 8.2. The singular values of A are the square roots of the eigenvalues of ATA (equivalently, of AAT). The non-zero singular values of A and AT are identical.
Proof. From the construction above, ATAvi=σi2vi. For AAT:
AATui=σiA(ATA)vi=σiσi2Avi=σi2ui
So ui is an eigenvector of AAT with eigenvalue σi2. The non-zero eigenvalues of ATA and AAT coincide (since if ATAv=λv with λ=0Then AAT(Av)=A(ATAv)=λ(Av) and Av=0). ■
Proposition 8.3. If A is symmetric with eigenvalues λ1,…,λnThen the singular values of A are ∣λ1∣,…,∣λn∣.
Proof.ATA=A2Whose eigenvalues are λi2. The singular values are λi2=∣λi∣. ■
8.3 Geometric Interpretation
The SVD decomposes the linear transformation T:Rn→Rm into three steps:
VT rotates (or reflects) Rn into a coordinate system aligned with the right singular vectors.
Σ scales each axis by the corresponding singular value (and possibly drops dimensions where σi=0).
U rotates (or reflects) the result into the coordinate system of the left singular vectors.
Unit circle image. Under AThe unit circle in R2 is mapped to an ellipse with semi-axes σ1 and σ2 aligned with the columns of U.
8.4 Low-Rank Approximation
Theorem 8.4 (Eckart—Young—Mirsky). Let A=UΣVT be the SVD with singular values σ1≥σ2≥⋯≥σr>0. For any k<rThe best rank-k approximation to A (in both the Frobenius and spectral norms) is
Ak=∑i=1kσiuiviT=UkΣkVkT
And the approximation error is
∥A−Ak∥F=σk+12+⋯+σr2,∥A−Ak∥2=σk+1
Proof (Frobenius norm). Any rank-k matrix B can be written in terms of an orthonormal basis of its column space. Let W∈Mn×k(R) have orthonormal columns spanning the column space of B. Then B=CWT for some CAnd:
∥A−B∥F2=∥A(I−WWT)∥F2+∥(A−C)WT∥F2≥∥A(I−WWT)∥F2
The minimum over W is achieved when W spans the subspace spanned by v1,…,vk (the top k right singular vectors), giving:
∥A−Ak∥F2=∑i=k+1rσi2
The spectral norm result follows because ∥A−Ak∥2=σk+1 is the largest singular value of A−Ak. ■
8.5 Pseudoinverse
Definition. The Moore—Penrose pseudoinverse of A=UΣVT is
A+=VΣ+UT
Where Σ+ is obtained from Σ by transposing and inverting each non-zero singular value:
(Σ+)ii={1/σi0ifσi>0ifσi=0
Theorem 8.5. The pseudoinverse satisfies the four Moore—Penrose conditions:
AA+A=A
A+AA+=A+
(AA+)T=AA+
(A+A)T=A+A
Proof. Direct computation using A=UΣVT and A+=VΣ+UT:
AA+A=UΣVTVΣ+UTUΣVT=UΣΣ+ΣVT=UΣVT=A
Since ΣΣ+Σ=Σ (the non-zero singular values are preserved, zeros remain zero). The remaining conditions follow similarly. ■
Theorem 8.6 (Minimum-Norm Least Squares). If Ax=b is inconsistent, then x∗=A+b is the least-squares solution of minimum norm.
Proof. The least-squares solutions to Ax≈b are x=A+b+(I−A+A)z for arbitrary z. Since (I−A+A)z∈ker(A) and A+b∈im(AT)These two components are orthogonal. The minimum-norm solution is obtained when z=0Giving x∗=A+b. ■
8.6 Condition Number
Definition. The condition number of A (with respect to the spectral norm) is
κ(A)=∥A∥2⋅∥A+∥2=σrσ1
Where σ1 is the largest and σr is the smallest non-zero singular value.
Theorem 8.7 (Sensitivity of Linear Systems). If Ax=b and A(x+δx)=b+δbThen
∥x∥∥δx∥≤κ(A)⋅∥b∥∥δb∥
Proof. From Aδx=δb: ∥δx∥=∥A−1δb∥≤∥A−1∥∥δb∥=σr−1∥δb∥. From Ax=b: ∥b∥=∥Ax∥≥σ1−1∥x∥… Wait, ∥Ax∥≥σr∥x∥ and ∥b∥≤σ1∥x∥. Combining: ∥δx∥/∥x∥≤(σ1/σr)(∥δb∥/∥b∥). ■
A matrix with large condition number is ill-conditioned: small perturbations in b can cause large changes in x.
Setting to zero and solving: λ3−5λ2+6λ+1=0. Testing λ=3: 27−45+18+1=1=0. Testing λ=4: 64−80+24+1=9=0. By numerical methods or the rational root theorem (no rational roots), the eigenvalues are approximately λ1≈5.25, λ2≈1.31, λ3≈−0.56.
Wait, ATA should be positive semi-definite, so all eigenvalues should be non-negative. Let me recompute.
ATA=211110102.
tr(ATA)=5So λ1+λ2+λ3=5.
det(ATA)=2(2)−1(2)−1(−1)=4−2+1=3.
∑ of principal 2×2 minors =(2−1)+(4−1)+(2−0)=1+3+2=6.
By the trigonometric method for cubics or numerical approximation, λ1≈3.35, λ2≈1.35, λ3≈0.30.
The best rank-1 approximation uses only σ1=λ1≈1.83 and its corresponding singular vectors, yielding A1=σ1u1v1T with error ∥A−A1∥F=λ2+λ3≈1.65≈1.28. ■
8.9 Worked Example: Condition Number and Numerical Stability
Problem. Compute the condition number of A=(1111.0001) and discuss its implications.
This means relative errors in b can be amplified by a factor of up to 20000 in the solution x. The matrix is nearly singular (the two rows are almost linearly dependent), and solving Ax=b in floating-point arithmetic will lose approximately log10(20000)≈4.3 digits of precision. ■
8.10 Common Pitfalls
Singular values are always non-negative. Unlike eigenvalues, which can be negative or complex, singular values are the square roots of eigenvalues of ATAHence always real and non-negative.
The SVD is not unique. If A has repeated singular values, the corresponding singular vectors can be any orthonormal basis of the eigenspace. The signs of singular vectors can also be flipped in pairs.
The pseudoinverse equals the inverse only for square, full-rank matrices. When A is not full rank, A+A=I; instead, A+A is the orthogonal projection onto im(AT).
The SVD and eigendecomposition are different decompositions. The SVD always exists for any matrix, but the eigendecomposition requires the matrix to be square. Even for symmetric matrices, the singular values are ∣λi∣Not λi.
9. Problem Set
Problem 1. Let V=R3 and W={(x,y,z)∈R3:x−y+z=0}. Show that W is a subspace of V and find its dimension.
Solution
W is non-empty since 0=(0,0,0)∈W. If (x1,y1,z1),(x2,y2,z2)∈WThen (x1−y1+z1)+(x2−y2+z2)=0+0=0So their sum is in W. Similarly, α(x−y+z)=0 for any scalar α. Hence W is a subspace.
W is defined by one linear equation, so dim(W)=3−1=2. A basis is {(1,1,0),(−1,0,1)}.
If you get this wrong, revise: Section 1.3 (Subspace Criterion).
Problem 2. Is the set S={(x,y)∈R2:xy=0} a subspace of R2?
Solution
No. (1,0)∈S and (0,1)∈SBut (1,0)+(0,1)=(1,1)∈/S since 1⋅1=0. S is not closed under addition.
If you get this wrong, revise: Section 1.3 (Subspace Criterion).
Problem 3. Determine whether the set {1−x,1+x,x2} is linearly independent in P2(R).
Solution
Suppose a(1−x)+b(1+x)+cx2=0 as a polynomial. Then (a+b)+(−a+b)x+cx2=0So a+b=0, −a+b=0, c=0. From the first two equations: 2a=0So a=0Then b=0. Since a=b=c=0The set is linearly independent.
If you get this wrong, revise: Section 2.1 (Linear Independence).
Pivots are in columns 1 and 3. A basis for col(A) is {(1,2,3),(1,0,1)} (the pivot columns of the original A). dim(col(A))=2.
If you get this wrong, revise: Section 2.7 (Worked Examples).
Problem 5. Let U=span{(1,0,1),(0,1,1)} and W=span{(1,1,0)} in R3. Verify the dimension formula dim(U+W)=dim(U)+dim(W)−dim(U∩W).
Solution
dim(U)=2 (the two spanning vectors are linearly independent), dim(W)=1. Since dim(U)+dim(W)=3=dim(R3)We have U+W=R3So dim(U+W)=3. By the dimension formula: dim(U∩W)=2+1−3=0So U∩W={0}.
We can verify directly: if a(1,0,1)+b(0,1,1)=c(1,1,0)Then a=c, b=c, a+b=0 Giving c=0So only the zero vector is in the intersection.
If you get this wrong, revise: Section 2.5 (Dimension Formula).
Problem 6. Compute det(A) using cofactor expansion where
A=2010010312003021
Solution
Expand along the second column (which has the most zeros):
If you get this wrong, revise: Section 4.5 (Least Squares Solutions).
Problem 13. Find the eigenvalues and a basis for each eigenspace of
A=200120012
Is A diagonalisable?
Solution
det(A−λI)=(2−λ)3So λ=2 with algebraic multiplicity 3.
A−2I=000100010Which has rank 2. The null space is spanned by (1,0,0)T. So the geometric multiplicity is 1.
Since the geometric multiplicity (1) does not equal the algebraic multiplicity (3), A is not Diagonalisable. Its Jordan form is J=A itself (a single 3×3 Jordan block).
If you get this wrong, revise: Section 5.3 (Diagonalisation) and Section 5.5 (Jordan Normal Form).
If you get this wrong, revise: Section 5.4 (Cayley—Hamilton Theorem).
Problem 16. Let T:R3→R2 be defined by T(x,y,z)=(x+y,y+z). Find the matrix of T with respect to the standard bases, and verify the rank-nullity theorem.
If you get this wrong, revise: Section 5.4 (Cayley—Hamilton Theorem).
Problem 20. Let T:P2(R)→P2(R) be defined by T(p)=p′ (the derivative). Find the matrix of T with respect to the basis B={1,x,x2}And determine ker(T) and im(T).
Solution
T(1)=0=0⋅1+0⋅x+0⋅x2So coordinates are 000.
T(x)=1=1⋅1+0⋅x+0⋅x2So coordinates are 100.
T(x2)=2x=0⋅1+2⋅x+0⋅x2So coordinates are 020.
[T]B=000100020
ker(T)={p:p′=0}=span{1}So dim(ker(T))=1.
im(T)={p′:p∈P2}=span{1,x}So dim(im(T))=2.
Verify: dim(ker(T))+dim(im(T))=1+2=3=dim(P2). ■
If you get this wrong, revise: Section 6.2 (Matrix Representation) and Section 6.4 (Rank-Nullity).
Common Pitfalls
Forgetting to check that solutions satisfy the original equation (especially with squaring both sides or dividing by variables).
Assuming matrix multiplication is commutative — AB=BA.
Worked Examples
Example 1: Eigenvalue Decomposition
Problem. Find the eigenvalues and eigenvectors of A=(4213).
Solution. Characteristic polynomial:
det(A−λI)=(4−λ)(3−λ)−2=λ2−7λ+10=(λ−5)(λ−2)=0
Eigenvalues: λ1=5, λ2=2.
For λ1=5: (A−5I)v=0: (−121−2)v=0. Eigenvector: v1=(11).
For λ2=2: (2211)v=0. Eigenvector: v2=(1−2).
■
Example 2: Gram-Schmidt Orthogonalisation
Problem. Apply Gram-Schmidt to v1=(1,1,0), v2=(1,0,1), v3=(0,1,1).