Skip to content

Linear Algebra

Prerequisites

Before studying linear algebra, the reader should be familiar with:

  • Basic set theory: sets, subsets, functions, injectivity, surjectivity.
  • Complex numbers: arithmetic, polar form, Euler”s formula.
  • Mathematical proof techniques: direct proof, proof by contradiction, mathematical induction.
  • Single-variable calculus: limits, continuity, differentiation (for the matrix calculus later in the course).

Students who have completed an introductory proof-writing course (e.g. a Discrete Mathematics module) will have the necessary background. The definitions and proofs below are self-contained; the prerequisites are listed only to indicate the level of mathematical maturity expected.

1. Vectors and Vector Spaces

1.1 Definition of a Vector Space

A vector space over a field FF ( R\mathbb{R} or C\mathbb{C}) is a set VV equipped With two operations:

  1. Vector addition: +:V×VV+ : V \times V \to V
  2. Scalar multiplication: :F×VV\cdot : F \times V \to V

Satisfying the following axioms for all u,v,wV\mathbf{u}, \mathbf{v}, \mathbf{w} \in V and all α,βF\alpha, \beta \in F:

  1. Commutativity: u+v=v+u\mathbf{u} + \mathbf{v} = \mathbf{v} + \mathbf{u}
  2. Associativity of addition: (u+v)+w=u+(v+w)(\mathbf{u} + \mathbf{v}) + \mathbf{w} = \mathbf{u} + (\mathbf{v} + \mathbf{w})
  3. Additive identity: There exists 0V\mathbf{0} \in V such that v+0=v\mathbf{v} + \mathbf{0} = \mathbf{v}
  4. Additive inverse: For each v\mathbf{v}There exists v-\mathbf{v} such that v+(v)=0\mathbf{v} + (-\mathbf{v}) = \mathbf{0}
  5. Compatibility of scalar multiplication: α(βv)=(αβ)v\alpha(\beta \mathbf{v}) = (\alpha\beta)\mathbf{v}
  6. Identity element of scalar multiplication: 1v=v1 \cdot \mathbf{v} = \mathbf{v}
  7. Distributivity over vector addition: α(u+v)=αu+αv\alpha(\mathbf{u} + \mathbf{v}) = \alpha\mathbf{u} + \alpha\mathbf{v}
  8. Distributivity over scalar addition: (α+β)v=αv+βv(\alpha + \beta)\mathbf{v} = \alpha\mathbf{v} + \beta\mathbf{v}

Intuition. The abstract definition captures the algebraic structure shared by diverse objects: Geometric arrows, polynomials, functions, matrices. The axioms encode exactly what we need for Linear combinations to behave reasonably.

1.2 Examples

Example 1. Rn\mathbb{R}^n with component-wise addition and scalar multiplication is a vector space Over R\mathbb{R}.

Example 2. The set Pn\mathcal{P}_n of all polynomials of degree at most nn with real coefficients, With the usual polynomial addition and scalar multiplication, is a vector space over R\mathbb{R}. Its dimension is n+1n + 1With standard basis {1,x,x2,,xn}\{1, x, x^2, \ldots, x^n\}.

Example 3. The set C[a,b]C[a,b] of all continuous real-valued functions on [a,b][a,b]With point-wise Addition and scalar multiplication, is a vector space over R\mathbb{R}. This space is Infinite-dimensional.

Example 4. The set Mm×n(R)\mathcal{M}_{m \times n}(\mathbb{R}) of all m×nm \times n real matrices is a Vector space over R\mathbb{R}.

Example 5 (Function spaces). The set F(R,R)\mathcal{F}(\mathbb{R}, \mathbb{R}) of all functions f:RRf : \mathbb{R} \to \mathbb{R} is a vector space over R\mathbb{R} under point-wise addition (f+g)(x)=f(x)+g(x)(f + g)(x) = f(x) + g(x) and scalar multiplication (αf)(x)=αf(x)(\alpha f)(x) = \alpha \cdot f(x). The spaces Ck(R)C^k(\mathbb{R}) of kk-times continuously differentiable functions and L2[a,b]L^2[a,b] of square-integrable functions are important subspaces of F(R,R)\mathcal{F}(\mathbb{R}, \mathbb{R}).

Example 6 (Sequence spaces). The set 2\ell^2 of all real sequences (a1,a2,a3,)(a_1, a_2, a_3, \ldots) With n=1an2<\sum_{n=1}^{\infty} a_n^2 \lt \infty is a vector space over R\mathbb{R}. This is the Infinite-dimensional analogue of Rn\mathbb{R}^n and is fundamental in functional analysis.

1.3 Subspaces

A subspace WW of a vector space VV is a subset WVW \subseteq V that is itself a vector space Under the same operations.

Theorem 1.1 (Subspace Criterion). A non-empty subset WVW \subseteq V is a subspace if and only If for all u,vW\mathbf{u}, \mathbf{v} \in W and all αF\alpha \in F:

  1. u+vW\mathbf{u} + \mathbf{v} \in W (closed under addition)
  2. αuW\alpha \mathbf{u} \in W (closed under scalar multiplication)

Proof. If WW is a subspace, closure is immediate from the definition. Conversely, if WW is Non-empty and closed under both operations, pick uW\mathbf{u} \in W. Then u=(1)uW-\mathbf{u} = (-1)\mathbf{u} \in W By closure under scalar multiplication, and u+(u)=0W\mathbf{u} + (-\mathbf{u}) = \mathbf{0} \in W by closure Under addition. The remaining axioms are inherited from VV. \blacksquare

Proposition 1.2 (Closure under Linear Combinations). If WW is a subspace of VVThen WW is Closed under all finite linear combinations: for all v1,,vkW\mathbf{v}_1, \ldots, \mathbf{v}_k \in W and All α1,,αkF\alpha_1, \ldots, \alpha_k \in F

α1v1+α2v2++αkvkW\alpha_1 \mathbf{v}_1 + \alpha_2 \mathbf{v}_2 + \cdots + \alpha_k \mathbf{v}_k \in W

Proof. We proceed by induction on kk. For k=1k = 1, α1v1W\alpha_1 \mathbf{v}_1 \in W by closure under Scalar multiplication. Assume the result holds for k1k - 1 vectors. Then

α1v1++αkvk=(α1v1++αk1vk1)+αkvk\alpha_1 \mathbf{v}_1 + \cdots + \alpha_k \mathbf{v}_k = (\alpha_1 \mathbf{v}_1 + \cdots + \alpha_{k-1} \mathbf{v}_{k-1}) + \alpha_k \mathbf{v}_k

By the inductive hypothesis, α1v1++αk1vk1W\alpha_1 \mathbf{v}_1 + \cdots + \alpha_{k-1} \mathbf{v}_{k-1} \in WAnd αkvkW\alpha_k \mathbf{v}_k \in W by closure under scalar multiplication. Their sum is in WW by Closure under addition. \blacksquare

Example 7. The set of all solutions to the homogeneous equation Ax=0A\mathbf{x} = \mathbf{0} forms a Subspace of Rn\mathbb{R}^nCalled the null space of AA.

1.4 Worked Example: Verifying Subspace Criteria

Problem. Determine whether each of the following subsets of R3\mathbb{R}^3 is a subspace.

(a) W1={(x,y,z)R3:x+2yz=0}W_1 = \{(x, y, z) \in \mathbb{R}^3 : x + 2y - z = 0\}

(b) W2={(x,y,z)R3:x2+y2=1}W_2 = \{(x, y, z) \in \mathbb{R}^3 : x^2 + y^2 = 1\}

(c) W3={(x,y,z)R3:x=0 and y=z}W_3 = \{(x, y, z) \in \mathbb{R}^3 : x = 0 \mathrm{~and~} y = z\}

Solution

(a) Let u=(x1,y1,z1)\mathbf{u} = (x_1, y_1, z_1) and v=(x2,y2,z2)\mathbf{v} = (x_2, y_2, z_2) be in W1W_1So x1+2y1z1=0x_1 + 2y_1 - z_1 = 0 and x2+2y2z2=0x_2 + 2y_2 - z_2 = 0. Then

(x1+x2)+2(y1+y2)(z1+z2)=(x1+2y1z1)+(x2+2y2z2)=0+0=0(x_1 + x_2) + 2(y_1 + y_2) - (z_1 + z_2) = (x_1 + 2y_1 - z_1) + (x_2 + 2y_2 - z_2) = 0 + 0 = 0

So u+vW1\mathbf{u} + \mathbf{v} \in W_1. For αR\alpha \in \mathbb{R}

(αx1)+2(αy1)(αz1)=α(x1+2y1z1)=α0=0(\alpha x_1) + 2(\alpha y_1) - (\alpha z_1) = \alpha(x_1 + 2y_1 - z_1) = \alpha \cdot 0 = 0

So αuW1\alpha \mathbf{u} \in W_1. Since W1W_1 is non-empty (e.g., 0W1\mathbf{0} \in W_1), it is a subspace.

(b) W2W_2 is not a subspace. For instance, (1,0,0)W2(1, 0, 0) \in W_2 since 12+02=11^2 + 0^2 = 1But 2(1,0,0)=(2,0,0)W22 \cdot (1, 0, 0) = (2, 0, 0) \notin W_2 since 22+02=412^2 + 0^2 = 4 \neq 1. So W2W_2 is not closed Under scalar multiplication.

(c) Let u=(0,a,a)\mathbf{u} = (0, a, a) and v=(0,b,b)\mathbf{v} = (0, b, b) be in W3W_3. Then u+v=(0,a+b,a+b)W3\mathbf{u} + \mathbf{v} = (0, a + b, a + b) \in W_3 and αu=(0,αa,αa)W3\alpha \mathbf{u} = (0, \alpha a, \alpha a) \in W_3. Since (0,0,0)W3(0, 0, 0) \in W_3It is a non-empty subspace.

\blacksquare

1.5 Worked Example: Sum and Intersection of Subspaces

Problem. Let U={(x,y,z)R3:z=0}U = \{(x, y, z) \in \mathbb{R}^3 : z = 0\} (the xyxy-plane) and W={(x,y,z)R3:x=0}W = \{(x, y, z) \in \mathbb{R}^3 : x = 0\} (the yzyz-plane). Find U+WU + W and UWU \cap W And verify the dimension formula.

Solution

UU has basis {(1,0,0),(0,1,0)}\{(1, 0, 0), (0, 1, 0)\} and dim(U)=2\dim(U) = 2. WW has basis {(0,1,0),(0,0,1)}\{(0, 1, 0), (0, 0, 1)\} and dim(W)=2\dim(W) = 2.

UW={(x,y,z):z=0 and x=0}={(0,y,0):yR}U \cap W = \{(x, y, z) : z = 0 \mathrm{~and~} x = 0\} = \{(0, y, 0) : y \in \mathbb{R}\} Which has basis {(0,1,0)}\{(0, 1, 0)\} and dimension 1.

U+W=span{(1,0,0),(0,1,0),(0,1,0),(0,0,1)}=span{(1,0,0),(0,1,0),(0,0,1)}=R3U + W = \mathrm{span}\{(1,0,0), (0,1,0), (0,1,0), (0,0,1)\} = \mathrm{span}\{(1,0,0), (0,1,0), (0,0,1)\} = \mathbb{R}^3 So dim(U+W)=3\dim(U + W) = 3.

Verify: dim(U+W)=dim(U)+dim(W)dim(UW)=2+21=3\dim(U + W) = \dim(U) + \dim(W) - \dim(U \cap W) = 2 + 2 - 1 = 3. \checkmark \blacksquare

1.6 Common Pitfalls

  • The empty set is not a vector space. The subspace criterion requires the subset to be non-empty. The trivial subspace {0}\{\mathbf{0}\} is the smallest subspace of any vector space.
  • Non-homogeneous conditions do not define subspaces. The set of solutions to Ax=bA\mathbf{x} = \mathbf{b} with b0\mathbf{b} \neq \mathbf{0} is not a subspace (it is an affine subspace, or coset of the null space).
  • Closure must hold for all scalars. A set that is closed under addition and multiplication by positive scalars is not necessarily a subspace; it must also be closed under multiplication by 1-1.

2. Linear Independence, Span, Basis, and Dimension

2.1 Linear Independence

A set of vectors {v1,v2,,vk}V\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\} \subseteq V is linearly Independent if the equation

α1v1+α2v2++αkvk=0\alpha_1 \mathbf{v}_1 + \alpha_2 \mathbf{v}_2 + \cdots + \alpha_k \mathbf{v}_k = \mathbf{0}

Implies α1=α2==αk=0\alpha_1 = \alpha_2 = \cdots = \alpha_k = 0. Otherwise the set is linearly dependent.

Proposition 2.1 (Equivalent formulations). The following are equivalent for vectors v1,,vkV\mathbf{v}_1, \ldots, \mathbf{v}_k \in V:

  1. {v1,,vk}\{\mathbf{v}_1, \ldots, \mathbf{v}_k\} is linearly independent.
  2. No vj\mathbf{v}_j can be written as a linear combination of the remaining vectors.
  3. If i=1kαivi=i=1kβivi\sum_{i=1}^k \alpha_i \mathbf{v}_i = \sum_{i=1}^k \beta_i \mathbf{v}_iThen αi=βi\alpha_i = \beta_i for all ii.

Proof. (1 \Rightarrow 2): If vj=ijαivi\mathbf{v}_j = \sum_{i \neq j} \alpha_i \mathbf{v}_iThen ijαivivj=0\sum_{i \neq j} \alpha_i \mathbf{v}_i - \mathbf{v}_j = \mathbf{0} gives a non-trivial linear Dependence, contradicting (1).

(2 \Rightarrow 3): If (αiβi)vi=0\sum (\alpha_i - \beta_i)\mathbf{v}_i = \mathbf{0}Then by linear Independence (which follows from (2)), αi=βi\alpha_i = \beta_i for all ii.

(3 \Rightarrow 1): If αivi=0=0vi\sum \alpha_i \mathbf{v}_i = \mathbf{0} = \sum 0 \cdot \mathbf{v}_i Then by (3), αi=0\alpha_i = 0 for all ii. \blacksquare

2.2 Span

The span of a set SVS \subseteq VDenoted span(S)\mathrm{span}(S)Is the set of all finite linear Combinations of elements of SS:

span(S)={i=1kαivi:kN,αiF,viS}\mathrm{span}(S) = \left\{ \sum_{i=1}^k \alpha_i \mathbf{v}_i : k \in \mathbb{N},\, \alpha_i \in F,\, \mathbf{v}_i \in S \right\}

Proposition 2.2. span(S)\mathrm{span}(S) is always a subspace of VV. In fact, span(S)\mathrm{span}(S) is The smallest subspace containing SS: if WW is any subspace with SWS \subseteq WThen span(S)W\mathrm{span}(S) \subseteq W.

Proof. span(S)\mathrm{span}(S) is non-empty since 0=0v\mathbf{0} = 0 \cdot \mathbf{v} for any vS\mathbf{v} \in S. Closure under addition and scalar multiplication follows directly from the Definition of linear combinations. For minimality, any subspace WW containing SS must contain all Finite linear combinations of elements of SS by Proposition 1.2, so span(S)W\mathrm{span}(S) \subseteq W. \blacksquare

2.3 Basis and Dimension

A set BVB \subseteq V is a basis for VV if:

  1. BB is linearly independent, and
  2. span(B)=V\mathrm{span}(B) = V.

Theorem 2.1. Every vector space has a basis. All bases of a finite-dimensional vector space have The same number of elements.

The dimension of VVDenoted dim(V)\dim(V)Is the cardinality of any basis for VV.

2.4 Steinitz Exchange Lemma

Lemma 2.3 (Steinitz Exchange Lemma). Let {u1,,uk}\{\mathbf{u}_1, \ldots, \mathbf{u}_k\} be a linearly Independent set in VVAnd let {w1,,wm}\{\mathbf{w}_1, \ldots, \mathbf{w}_m\} be a spanning set for VV. Then kmk \leq mAnd after relabelling the wj\mathbf{w}_jThe set

{u1,,uk,wk+1,,wm}\{\mathbf{u}_1, \ldots, \mathbf{u}_k, \mathbf{w}_{k+1}, \ldots, \mathbf{w}_m\}

Also spans VV.

Proof. We proceed by induction on kk. For k=0k = 0 there is nothing to prove.

Assume the result holds for k1k - 1. Since {u1,,uk}\{\mathbf{u}_1, \ldots, \mathbf{u}_k\} is linearly Independent, uk0\mathbf{u}_k \neq \mathbf{0} and ukspan{w1,,wm}\mathbf{u}_k \in \mathrm{span}\{\mathbf{w}_1, \ldots, \mathbf{w}_m\} Since the wj\mathbf{w}_j span VV. Therefore uk=j=1mαjwj\mathbf{u}_k = \sum_{j=1}^m \alpha_j \mathbf{w}_j for some αjF\alpha_j \in FAnd not all αj\alpha_j are zero.

After relabelling, assume α10\alpha_1 \neq 0. Then w1=α11(ukj=2mαjwj)\mathbf{w}_1 = \alpha_1^{-1}(\mathbf{u}_k - \sum_{j=2}^m \alpha_j \mathbf{w}_j) So w1span{uk,w2,,wm}\mathbf{w}_1 \in \mathrm{span}\{\mathbf{u}_k, \mathbf{w}_2, \ldots, \mathbf{w}_m\}. It follows that

span{w1,,wm}=span{uk,w2,,wm}=V\mathrm{span}\{\mathbf{w}_1, \ldots, \mathbf{w}_m\} = \mathrm{span}\{\mathbf{u}_k, \mathbf{w}_2, \ldots, \mathbf{w}_m\} = V

Now {u1,,uk1}\{\mathbf{u}_1, \ldots, \mathbf{u}_{k-1}\} is linearly independent and {uk,w2,,wm}\{\mathbf{u}_k, \mathbf{w}_2, \ldots, \mathbf{w}_m\} spans VV. By the inductive hypothesis, k1m1k - 1 \leq m - 1 (so kmk \leq m) and after relabelling, {u1,,uk1,wk,,wm}\{\mathbf{u}_1, \ldots, \mathbf{u}_{k-1}, \mathbf{w}_k, \ldots, \mathbf{w}_m\} spans VV. Since uk\mathbf{u}_k is already in this span, the full set {u1,,uk,wk+1,,wm}\{\mathbf{u}_1, \ldots, \mathbf{u}_k, \mathbf{w}_{k+1}, \ldots, \mathbf{w}_m\} also spans VV. \blacksquare

Theorem 2.4 (Dimension is Well-Defined). If VV is finite-dimensional, then any two bases of VV have the same number of elements.

Proof. Let B1\mathcal{B}_1 and B2\mathcal{B}_2 be two bases with B1=k\lvert\mathcal{B}_1\rvert = k and B2=m\lvert\mathcal{B}_2\rvert = m. Applying the Steinitz exchange lemma with B1\mathcal{B}_1 as the Independent set and B2\mathcal{B}_2 as the spanning set gives kmk \leq m. Swapping roles gives mkm \leq k. Hence k=mk = m. \blacksquare

2.5 Dimension Formula

Theorem 2.5 (Dimension Formula). If UU and WW are subspaces of a finite-dimensional vector Space VVThen

dim(U+W)=dim(U)+dim(W)dim(UW)\dim(U + W) = \dim(U) + \dim(W) - \dim(U \cap W)

2.6 Rank-Nullity Theorem

Theorem 2.6 (Rank-Nullity Theorem). Let AMm×n(F)A \in \mathcal{M}_{m \times n}(F). Then

rank(A)+nullity(A)=n\mathrm{rank}(A) + \mathrm{nullity}(A) = n

Where rank(A)=dim(col(A))\mathrm{rank}(A) = \dim(\mathrm{col}(A)) and nullity(A)=dim(null(A))\mathrm{nullity}(A) = \dim(\mathrm{null}(A)).

Proof. Let {v1,,vk}\{\mathbf{v}_1, \ldots, \mathbf{v}_k\} be a basis for null(A)\mathrm{null}(A)Where k=nullity(A)k = \mathrm{nullity}(A). Extend this to a basis {v1,,vk,vk+1,,vn}\{\mathbf{v}_1, \ldots, \mathbf{v}_k, \mathbf{v}_{k+1}, \ldots, \mathbf{v}_n\} for FnF^n.

We claim that {Avk+1,,Avn}\{A\mathbf{v}_{k+1}, \ldots, A\mathbf{v}_n\} is a basis for col(A)\mathrm{col}(A).

Spanning: For any ycol(A)\mathbf{y} \in \mathrm{col}(A)There exists xFn\mathbf{x} \in F^n With y=Ax\mathbf{y} = A\mathbf{x}. Writing x=i=1nαivi\mathbf{x} = \sum_{i=1}^n \alpha_i \mathbf{v}_i

y=A(i=1nαivi)=i=1nαiAvi=i=k+1nαiAvi\mathbf{y} = A\left(\sum_{i=1}^n \alpha_i \mathbf{v}_i\right) = \sum_{i=1}^n \alpha_i A\mathbf{v}_i = \sum_{i=k+1}^n \alpha_i A\mathbf{v}_i

Since Avi=0A\mathbf{v}_i = \mathbf{0} for iki \leq k.

Linear independence: If i=k+1nαiAvi=0\sum_{i=k+1}^n \alpha_i A\mathbf{v}_i = \mathbf{0}Then A(i=k+1nαivi)=0A\left(\sum_{i=k+1}^n \alpha_i \mathbf{v}_i\right) = \mathbf{0}So i=k+1nαivinull(A)\sum_{i=k+1}^n \alpha_i \mathbf{v}_i \in \mathrm{null}(A). Since {v1,,vk}\{\mathbf{v}_1, \ldots, \mathbf{v}_k\} Is a basis for the null space, i=k+1nαivi=i=1kβivi\sum_{i=k+1}^n \alpha_i \mathbf{v}_i = \sum_{i=1}^k \beta_i \mathbf{v}_i For some βi\beta_iGiving i=1n(βi)vi+i=k+1nαivi=0\sum_{i=1}^n (-\beta_i)\mathbf{v}_i + \sum_{i=k+1}^n \alpha_i \mathbf{v}_i = \mathbf{0}. By linear independence of the full basis, αi=0\alpha_i = 0 for all ik+1i \geq k + 1.

Therefore rank(A)=nk=nnullity(A)\mathrm{rank}(A) = n - k = n - \mathrm{nullity}(A). \blacksquare

2.7 Worked Examples

Problem. Find a basis for and the dimension of the subspace W=span{(1,2,1,0),(3,1,0,2),(1,3,2,2)}W = \mathrm{span}\{(1, 2, -1, 0), (3, 1, 0, 2), (-1, 3, -2, -2)\} of R4\mathbb{R}^4.

Solution

Form the matrix whose rows are the given vectors and row-reduce:

(121031021322)R23R1(121005321322)\begin{pmatrix} 1 & 2 & -1 & 0 \\ 3 & 1 & 0 & 2 \\ -1 & 3 & -2 & -2 \end{pmatrix} \xrightarrow{R_2 - 3R_1} \begin{pmatrix} 1 & 2 & -1 & 0 \\ 0 & -5 & 3 & 2 \\ -1 & 3 & -2 & -2 \end{pmatrix}

R3+R1(121005320532)R3+R2(121005320000)\xrightarrow{R_3 + R_1} \begin{pmatrix} 1 & 2 & -1 & 0 \\ 0 & -5 & 3 & 2 \\ 0 & 5 & -3 & -2 \end{pmatrix} \xrightarrow{R_3 + R_2} \begin{pmatrix} 1 & 2 & -1 & 0 \\ 0 & -5 & 3 & 2 \\ 0 & 0 & 0 & 0 \end{pmatrix}

The row echelon form has two non-zero rows, so dim(W)=2\dim(W) = 2. A basis is given by the non-zero Rows: {(1,2,1,0),(0,5,3,2)}\{(1, 2, -1, 0), (0, -5, 3, 2)\}. \blacksquare

Problem. Find a basis for the null space of

A=(121124010013)A = \begin{pmatrix} 1 & 2 & 1 & -1 \\ 2 & 4 & 0 & 1 \\ 0 & 0 & 1 & 3 \end{pmatrix}

Solution

Row-reduce AA:

(121124010013)R22R1(121100230013)R3+R2/2(121100230009/2)\begin{pmatrix} 1 & 2 & 1 & -1 \\ 2 & 4 & 0 & 1 \\ 0 & 0 & 1 & 3 \end{pmatrix} \xrightarrow{R_2 - 2R_1} \begin{pmatrix} 1 & 2 & 1 & -1 \\ 0 & 0 & -2 & 3 \\ 0 & 0 & 1 & 3 \end{pmatrix} \xrightarrow{R_3 + R_2/2} \begin{pmatrix} 1 & 2 & 1 & -1 \\ 0 & 0 & -2 & 3 \\ 0 & 0 & 0 & 9/2 \end{pmatrix}

This has pivots in columns 1, 3, and 4. The free variable is x2x_2. Setting x2=tx_2 = t and Back-substituting: x4=0x_4 = 0, x3=0x_3 = 0, x1=2tx_1 = -2t. The null space is {t(2,1,0,0):tR}\{t(-2, 1, 0, 0) : t \in \mathbb{R}\}With basis {(2,1,0,0)}\{(-2, 1, 0, 0)\} and dimension 1. \blacksquare

Problem. Determine whether the vectors v1=(1,2,3)\mathbf{v}_1 = (1, 2, 3), v2=(4,5,6)\mathbf{v}_2 = (4, 5, 6), v3=(7,8,9)\mathbf{v}_3 = (7, 8, 9) form a basis For R3\mathbb{R}^3.

Solution

Form the matrix A=[v1v2v3]A = [\mathbf{v}_1 \mid \mathbf{v}_2 \mid \mathbf{v}_3] and compute Its determinant:

det(A)=1(4548)2(3642)+3(3235)=3+129=0\det(A) = 1(45 - 48) - 2(36 - 42) + 3(32 - 35) = -3 + 12 - 9 = 0

Since det(A)=0\det(A) = 0The columns are linearly dependent, so {v1,v2,v3}\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\} Is not a basis. In fact, v32v2+v1=0\mathbf{v}_3 - 2\mathbf{v}_2 + \mathbf{v}_1 = \mathbf{0}.

\blacksquare

:::tip To check if nn vectors in Rn\mathbb{R}^n form a basis, compute the determinant of the matrix whose Columns are those vectors. If det0\det \neq 0They form a basis; if det=0\det = 0They do not. :::

Problem. Let V=P3(R)V = \mathcal{P}_3(\mathbb{R}) (polynomials of degree at most 3). Find the dimension Of the subspace W={pP3:p(1)=p(1)=0}W = \{p \in \mathcal{P}_3 : p(1) = p(-1) = 0\}.

Solution

Write p(x)=ax3+bx2+cx+dp(x) = ax^3 + bx^2 + cx + d. The conditions are:

p(1)=a+b+c+d=0p(1) = a + b + c + d = 0 and p(1)=a+bc+d=0p(-1) = -a + b - c + d = 0.

Adding: 2b+2d=02b + 2d = 0So d=bd = -b. Subtracting: 2a+2c=02a + 2c = 0So c=ac = -a.

Therefore p(x)=ax3+bx2axb=a(x3x)+b(x21)p(x) = ax^3 + bx^2 - ax - b = a(x^3 - x) + b(x^2 - 1).

A basis for WW is {x3x,x21}\{x^3 - x, x^2 - 1\}And dim(W)=2\dim(W) = 2.

If you get this wrong, revise: Section 2.7 (Worked Examples).

2.8 Common Pitfalls

  • Linear independence of infinitely many vectors. The definition only directly applies to finite subsets. A set SS is linearly independent if every finite subset of SS is linearly independent.
  • Dimension and spanning. A set of nn vectors in Rn\mathbb{R}^n that spans Rn\mathbb{R}^n must be linearly independent (and hence a basis). Similarly, nn linearly independent vectors in Rn\mathbb{R}^n must span Rn\mathbb{R}^n.
  • The empty set spans {0}\{\mathbf{0}\}. The span of the empty set is the trivial subspace, and the empty set is a basis for {0}\{\mathbf{0}\}. The dimension of the zero space is 0.

3. Matrices

3.1 Matrix Operations

An m×nm \times n matrix AA over FF is a rectangular array of mnmn elements from FFArranged in mm rows and nn columns. The set of all such matrices is denoted Mm×n(F)\mathcal{M}_{m \times n}(F).

Addition. For A,BMm×n(F)A, B \in \mathcal{M}_{m \times n}(F), (A+B)ij=Aij+Bij(A + B)_{ij} = A_{ij} + B_{ij}.

Scalar multiplication. For αF\alpha \in F and AMm×n(F)A \in \mathcal{M}_{m \times n}(F) (αA)ij=αAij(\alpha A)_{ij} = \alpha A_{ij}.

Matrix multiplication. For AMm×n(F)A \in \mathcal{M}_{m \times n}(F) and BMn×p(F)B \in \mathcal{M}_{n \times p}(F) The product ABMm×p(F)AB \in \mathcal{M}_{m \times p}(F) is defined by

(AB)ij=k=1nAikBkj(AB)_{ij} = \sum_{k=1}^n A_{ik} B_{kj}

Proposition 3.1. Matrix multiplication is associative but not commutative .

Proof. Associativity: (AB)C(AB)C has (i,j)(i,j)-entry l(kAikBkl)Clj=kAik(lBklClj)=(A(BC))ij\sum_l (\sum_k A_{ik} B_{kl}) C_{lj} = \sum_k A_{ik} (\sum_l B_{kl} C_{lj}) = (A(BC))_{ij} By interchanging the order of summation (both sums are finite). For non-commutativity, A=(0100)A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} and B=(0010)B = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix} give AB=(1000)(0001)=BAAB = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \neq \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} = BA. \blacksquare

3.2 Transpose

The transpose of AMm×n(F)A \in \mathcal{M}_{m \times n}(F)Denoted ATA^TIs the n×mn \times m matrix With (AT)ij=Aji(A^T)_{ij} = A_{ji}.

Properties of transpose:

  1. (A+B)T=AT+BT(A + B)^T = A^T + B^T
  2. (αA)T=αAT(\alpha A)^T = \alpha A^T
  3. (AB)T=BTAT(AB)^T = B^T A^T
  4. (AT)T=A(A^T)^T = A

3.3 Inverse Matrices

A square matrix AMn×n(F)A \in \mathcal{M}_{n \times n}(F) is invertible if there exists a matrix A1Mn×n(F)A^{-1} \in \mathcal{M}_{n \times n}(F) such that

AA1=A1A=InAA^{-1} = A^{-1}A = I_n

Theorem 3.1. The following are equivalent for AMn×n(F)A \in \mathcal{M}_{n \times n}(F):

  1. AA is invertible.
  2. det(A)0\det(A) \neq 0.
  3. The columns of AA are linearly independent.
  4. The rows of AA are linearly independent.
  5. rank(A)=n\mathrm{rank}(A) = n.
  6. The equation Ax=bA\mathbf{x} = \mathbf{b} has a unique solution for every b\mathbf{b}.
  7. The only solution to Ax=0A\mathbf{x} = \mathbf{0} is x=0\mathbf{x} = \mathbf{0}.

3.4 Determinants

The determinant is a function det:Mn×n(F)F\det : \mathcal{M}_{n \times n}(F) \to F defined recursively by Laplace expansion along the first row:

det(A)=j=1n(1)1+ja1jM1j\det(A) = \sum_{j=1}^n (-1)^{1+j} a_{1j} M_{1j}

Where M1jM_{1j} is the (1,j)(1,j)-minor (the determinant of the (n1)×(n1)(n-1) \times (n-1) matrix obtained by Deleting row 1 and column jj).

The (i,j)(i,j)-cofactor is Cij=(1)i+jMijC_{ij} = (-1)^{i+j} M_{ij}So det(A)=j=1naijCij\det(A) = \sum_{j=1}^n a_{ij} C_{ij} for any fixed row ii.

3.5 Properties of Determinants

Proposition 3.2 (Effect of Row Operations). Let AMn×n(F)A \in \mathcal{M}_{n \times n}(F).

  1. Swapping two rows of AA multiplies the determinant by 1-1.
  2. Multiplying a row of AA by αF\alpha \in F multiplies the determinant by α\alpha.
  3. Adding a multiple of one row to another leaves the determinant unchanged.

Proof. (1) This follows from the antisymmetry of the Leibniz formula det(A)=σSnsgn(σ)i=1nai,σ(i)\det(A) = \sum_{\sigma \in S_n} \mathrm{sgn}(\sigma) \prod_{i=1}^n a_{i,\sigma(i)}. Swapping two rows Changes the sign of every permutation, hence the sign of the sum.

(2) Multiplying row ii by α\alpha multiplies every term in the Leibniz expansion by α\alpha Hence det\det is multiplied by α\alpha.

(3) Adding α\alpha times row jj to row ii (iji \neq j): by multilinearity in row ii

det(new A)=det(A)+αdet(matrix with rows i and j equal)\det(\mathrm{new}~A) = \det(A) + \alpha \cdot \det(\mathrm{matrix~with~rows~i~and~j~equal})

A matrix with two equal rows has determinant 0 (by antisymmetry: swapping them leaves the matrix Unchanged but multiplies det\det by 1-1So det=det\det = -\detHence det=0\det = 0). Therefore det(new A)=det(A)\det(\mathrm{new}~A) = \det(A). \blacksquare

Theorem 3.3 (Multiplicativity). For A,BMn×n(F)A, B \in \mathcal{M}_{n \times n}(F)

det(AB)=det(A)det(B)\det(AB) = \det(A)\det(B)

Proof (via elementary matrices). Every matrix BB can be written as a product of elementary matrices Times an upper triangular matrix: B=E1E2EkUB = E_1 E_2 \cdots E_k U. For an elementary matrix EE:

  • If EE swaps rows, det(E)=1\det(E) = -1 and det(AE)=det(A)=det(A)det(E)\det(AE) = -\det(A) = \det(A)\det(E).
  • If EE multiplies a row by α\alpha, det(E)=α\det(E) = \alpha and det(AE)=αdet(A)=det(A)det(E)\det(AE) = \alpha\det(A) = \det(A)\det(E).
  • If EE adds a multiple of one row to another, det(E)=1\det(E) = 1 and det(AE)=det(A)=det(A)det(E)\det(AE) = \det(A) = \det(A)\det(E).

Thus det(AE)=det(A)det(E)\det(AE) = \det(A)\det(E) for every elementary matrix. By induction,

det(AB)=det(AE1EkU)=det(A)det(E1)det(Ek)det(U)=det(A)det(B)\det(AB) = \det(A \cdot E_1 \cdots E_k U) = \det(A) \cdot \det(E_1) \cdots \det(E_k) \cdot \det(U) = \det(A) \cdot \det(B)

Since det(B)=det(E1)det(Ek)det(U)\det(B) = \det(E_1)\cdots\det(E_k)\det(U). \blacksquare

Corollary 3.4. det(AT)=det(A)\det(A^T) = \det(A)And for invertible AA, det(A1)=1/det(A)\det(A^{-1}) = 1/\det(A).

Proof. AA1=IAA^{-1} = ISo det(A)det(A1)=det(I)=1\det(A)\det(A^{-1}) = \det(I) = 1Giving det(A1)=1/det(A)\det(A^{-1}) = 1/\det(A). For the transpose, use the Leibniz formula or observe that row Operations and column operations have the same effects on the determinant. \blacksquare

3.6 Adjugate and Inverse Formula

Definition. The adjugate (or adjoint) of AMn×n(F)A \in \mathcal{M}_{n \times n}(F) is

adj(A)=(Cji)i,j=1n\mathrm{adj}(A) = (C_{ji})_{i,j=1}^n

Where CijC_{ij} is the (i,j)(i,j)-cofactor of AA. That is, adj(A)\mathrm{adj}(A) is the transpose of the Cofactor matrix.

Theorem 3.5. For any AMn×n(F)A \in \mathcal{M}_{n \times n}(F)

Aadj(A)=adj(A)A=det(A)InA \cdot \mathrm{adj}(A) = \mathrm{adj}(A) \cdot A = \det(A) \cdot I_n

In particular, if det(A)0\det(A) \neq 0Then A1=1det(A)adj(A)A^{-1} = \frac{1}{\det(A)} \mathrm{adj}(A).

Proof. The (i,j)(i,j)-entry of Aadj(A)A \cdot \mathrm{adj}(A) is k=1naikCjk\sum_{k=1}^n a_{ik} C_{jk}. When i=ji = jThis is k=1naikCik=det(A)\sum_{k=1}^n a_{ik} C_{ik} = \det(A) (cofactor expansion along row ii). When iji \neq jThis is the cofactor expansion of a matrix obtained from AA by replacing row jj With row iiWhich has two equal rows and hence determinant 0. \blacksquare

3.7 Worked Examples

Problem. Compute det(A)\det(A) where

A=(123014560)A = \begin{pmatrix} 1 & 2 & 3 \\ 0 & 1 & 4 \\ 5 & 6 & 0 \end{pmatrix}

Solution

Expanding along the first column:

det(A)=1det(1460)0+5det(2314)\det(A) = 1 \cdot \det\begin{pmatrix} 1 & 4 \\ 6 & 0 \end{pmatrix} - 0 + 5 \cdot \det\begin{pmatrix} 2 & 3 \\ 1 & 4 \end{pmatrix}

=1(024)+5(83)=24+25=1= 1 \cdot (0 - 24) + 5 \cdot (8 - 3) = -24 + 25 = 1

\blacksquare

Problem. Compute det(A)\det(A) by row reduction where

A=(2131425363851111)A = \begin{pmatrix} 2 & 1 & 3 & 1 \\ 4 & 2 & 5 & 3 \\ 6 & 3 & 8 & 5 \\ 1 & 1 & 1 & 1 \end{pmatrix}

Solution

Apply row operations and track their effect on the determinant:

(2131425363851111)R22R1,R33R1(21310011001201/21/21/2)\begin{pmatrix} 2 & 1 & 3 & 1 \\ 4 & 2 & 5 & 3 \\ 6 & 3 & 8 & 5 \\ 1 & 1 & 1 & 1 \end{pmatrix} \xrightarrow{R_2 - 2R_1, R_3 - 3R_1} \begin{pmatrix} 2 & 1 & 3 & 1 \\ 0 & 0 & -1 & 1 \\ 0 & 0 & -1 & 2 \\ 0 & 1/2 & -1/2 & 1/2 \end{pmatrix}

The determinant is unchanged (only type 3 operations). Now swap R2R_2 and R4R_4 (multiplies det\det by 1-1):

R2R4(213101/21/21/200120011)\xrightarrow{R_2 \leftrightarrow R_4} \begin{pmatrix} 2 & 1 & 3 & 1 \\ 0 & 1/2 & -1/2 & 1/2 \\ 0 & 0 & -1 & 2 \\ 0 & 0 & -1 & 1 \end{pmatrix}

Now R4R4R3R_4 \to R_4 - R_3 (determinant unchanged):

(213101/21/21/200120001)\begin{pmatrix} 2 & 1 & 3 & 1 \\ 0 & 1/2 & -1/2 & 1/2 \\ 0 & 0 & -1 & 2 \\ 0 & 0 & 0 & -1 \end{pmatrix}

The determinant is the product of diagonal entries, times 1-1 for the row swap:

det(A)=(1)212(1)(1)=1\det(A) = (-1) \cdot 2 \cdot \frac{1}{2} \cdot (-1) \cdot (-1) = -1

\blacksquare

Problem. Find A1A^{-1} using the adjugate formula, where

A=(1234)A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}

Solution

det(A)=1423=20\det(A) = 1 \cdot 4 - 2 \cdot 3 = -2 \neq 0So AA is invertible.

Cofactors: C11=4C_{11} = 4, C12=3C_{12} = -3, C21=2C_{21} = -2, C22=1C_{22} = 1.

adj(A)=(4231)\mathrm{adj}(A) = \begin{pmatrix} 4 & -2 \\ -3 & 1 \end{pmatrix}

A1=12(4231)=(213/21/2)A^{-1} = \frac{1}{-2}\begin{pmatrix} 4 & -2 \\ -3 & 1 \end{pmatrix} = \begin{pmatrix} -2 & 1 \\ 3/2 & -1/2 \end{pmatrix}

Verify: AA1=(1234)(213/21/2)=(2+3116+632)=I2AA^{-1} = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}\begin{pmatrix} -2 & 1 \\ 3/2 & -1/2 \end{pmatrix} = \begin{pmatrix} -2 + 3 & 1 - 1 \\ -6 + 6 & 3 - 2 \end{pmatrix} = I_2. \blacksquare

:::caution Common Pitfall The determinant is only defined for square matrices. There is no meaningful determinant for an m×nm \times n matrix with mnm \neq n. Do not confuse det(AB)=det(A)det(B)\det(AB) = \det(A)\det(B) with a Non-existent formula for non-square matrices. :::

3.8 Worked Example: Determinant via Row Reduction (Efficient Method)

Problem. Compute det(A)\det(A) where

A=(1111123413610141020)A = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 2 & 3 & 4 \\ 1 & 3 & 6 & 10 \\ 1 & 4 & 10 & 20 \end{pmatrix}

Solution

This matrix has Pascal-like entries. We use row operations:

(1111123413610141020)RiRi1(11110123013601410)RiRi1(1111012300130014)\begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 2 & 3 & 4 \\ 1 & 3 & 6 & 10 \\ 1 & 4 & 10 & 20 \end{pmatrix} \xrightarrow{R_i - R_{i-1}} \begin{pmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 2 & 3 \\ 0 & 1 & 3 & 6 \\ 0 & 1 & 4 & 10 \end{pmatrix} \xrightarrow{R_i - R_{i-1}} \begin{pmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 2 & 3 \\ 0 & 0 & 1 & 3 \\ 0 & 0 & 1 & 4 \end{pmatrix}

R4R3(1111012300130001)\xrightarrow{R_4 - R_3} \begin{pmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 2 & 3 \\ 0 & 0 & 1 & 3 \\ 0 & 0 & 0 & 1 \end{pmatrix}

All operations were type 3 (adding a multiple of one row to another), so the determinant is Unchanged. The upper triangular matrix has diagonal entries 1,1,1,11, 1, 1, 1So det(A)=1\det(A) = 1. \blacksquare

Proposition 3.6 (Determinant of a Triangular Matrix). If AA is upper or lower triangular, Then det(A)=i=1naii\det(A) = \prod_{i=1}^n a_{ii}.

Proof. By repeated cofactor expansion along the first column (for upper triangular), or induction. At each step, all terms involving off-diagonal entries vanish due to the zero structure, leaving Only the product of diagonal entries. \blacksquare

3.9 Common Pitfalls

  • det(A+B)det(A)+det(B)\det(A + B) \neq \det(A) + \det(B) . For example, with A=B=I2A = B = I_2 det(A+B)=det(2I2)=4\det(A + B) = \det(2I_2) = 4But det(A)+det(B)=2\det(A) + \det(B) = 2.
  • The adjugate formula is theoretically important but computationally inefficient. For large matrices, use Gaussian elimination or LU decomposition to compute inverses.
  • A matrix with det(A)=0\det(A) = 0 has no inverse. Do not attempt to divide by zero.

4. Systems of Linear Equations

4.1 Gaussian Elimination

A system of mm linear equations in nn unknowns can be written as Ax=bA\mathbf{x} = \mathbf{b}Where AMm×n(R)A \in \mathcal{M}_{m \times n}(\mathbb{R}), xRn\mathbf{x} \in \mathbb{R}^nAnd bRm\mathbf{b} \in \mathbb{R}^m.

Gaussian elimination transforms the augmented matrix [Ab][A \mid \mathbf{b}] into row echelon Form (REF) using elementary row operations:

  1. Swap two rows (RiRjR_i \leftrightarrow R_j).
  2. Multiply a row by a non-zero scalar (RiαRiR_i \to \alpha R_i).
  3. Add a multiple of one row to another (RiRi+αRjR_i \to R_i + \alpha R_j).

A matrix is in REF if:

  • All zero rows are at the bottom.
  • The leading entry (pivot) of each non-zero row is strictly to the right of the pivot above.
  • All entries below a pivot are zero.

Reduced row echelon form (RREF) additionally requires:

  • Each pivot is 1.
  • Each pivot is the only non-zero entry in its column.

Theorem 4.1. Every matrix has a unique RREF. Two matrices are row-equivalent if and only if they Have the same RREF.

4.2 Existence and Uniqueness

Theorem 4.2 (Rouché—Capelli). The system Ax=bA\mathbf{x} = \mathbf{b} is consistent (has at least one Solution) if and only if

rank(A)=rank([Ab])\mathrm{rank}(A) = \mathrm{rank}([A \mid \mathbf{b}])

If consistent, the solution set has dim(null(A))\dim(\mathrm{null}(A)) free parameters, where dim(null(A))=nrank(A)\dim(\mathrm{null}(A)) = n - \mathrm{rank}(A).

Proof. Let the RREF of [Ab][A \mid \mathbf{b}] have r=rank(A)r = \mathrm{rank}(A) pivots in the coefficient Columns. The system is inconsistent if and only if the last non-zero row is [001][0 \cdots 0 \mid 1] Which occurs precisely when the augmented column contains a pivot, i.e., when rank([Ab])>r\mathrm{rank}([A \mid \mathbf{b}]) \gt r.

If consistent, the rr pivot variables are determined by the nrn - r free variables, yielding nrank(A)n - \mathrm{rank}(A) degrees of freedom. \blacksquare

4.3 LU Decomposition

An LU decomposition of AMn×n(R)A \in \mathcal{M}_{n \times n}(\mathbb{R}) writes A=LUA = LU where LL is Lower triangular with 1s on the diagonal, and UU is upper triangular.

Theorem 4.3. If Gaussian elimination can be performed on AA without row exchanges, then AA Admits an LU decomposition.

Algorithm. Store the multipliers mijm_{ij} (used to eliminate entry aija_{ij}) in the lower Triangular portion. The resulting upper triangular matrix is UUAnd the multipliers form LL.

Worked Example. Find the LU decomposition of

A=(211433879)A = \begin{pmatrix} 2 & 1 & 1 \\ 4 & 3 & 3 \\ 8 & 7 & 9 \end{pmatrix}

Solution

Step 1: Eliminate below a11a_{11}. m21=4/2=2m_{21} = 4/2 = 2, m31=8/2=4m_{31} = 8/2 = 4.

(211011035)\begin{pmatrix} 2 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 3 & 5 \end{pmatrix}

Step 2: Eliminate below a22a_{22}. m32=3/1=3m_{32} = 3/1 = 3.

U=(211011002)U = \begin{pmatrix} 2 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 2 \end{pmatrix}

L=(100210431)L = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 4 & 3 & 1 \end{pmatrix}

Verify: LU=(211433879)=ALU = \begin{pmatrix} 2 & 1 & 1 \\ 4 & 3 & 3 \\ 8 & 7 & 9 \end{pmatrix} = A. \blacksquare

To solve Ax=bA\mathbf{x} = \mathbf{b}First solve Ly=bL\mathbf{y} = \mathbf{b} (forward substitution), Then Ux=yU\mathbf{x} = \mathbf{y} (back substitution).

4.4 Gaussian Elimination with Partial Pivoting

When a pivot is zero (or very small), we swap rows to bring the largest available entry in the Current column into the pivot position. This is partial pivoting, and it improves numerical Stability.

Problem. Solve the system using Gaussian elimination with partial pivoting:

x1+2x2+x3=53x1+x2x3=22x1+3x2+4x3=11\begin{aligned} x_1 + 2x_2 + x_3 &= 5 \\ 3x_1 + x_2 - x_3 &= 2 \\ 2x_1 + 3x_2 + 4x_3 &= 11 \end{aligned}

Solution

Augmented matrix:

[Ab]=(1215311223411)[A \mid \mathbf{b}] = \begin{pmatrix} 1 & 2 & 1 & 5 \\ 3 & 1 & -1 & 2 \\ 2 & 3 & 4 & 11 \end{pmatrix}

Step 1. Column 1: largest entry is 3 in row 2. Swap R1R2R_1 \leftrightarrow R_2:

(3112121523411)\begin{pmatrix} 3 & 1 & -1 & 2 \\ 1 & 2 & 1 & 5 \\ 2 & 3 & 4 & 11 \end{pmatrix}

R2R213R1R_2 \to R_2 - \frac{1}{3}R_1, R3R323R1R_3 \to R_3 - \frac{2}{3}R_1:

(311205/34/313/307/314/329/3)\begin{pmatrix} 3 & 1 & -1 & 2 \\ 0 & 5/3 & 4/3 & 13/3 \\ 0 & 7/3 & 14/3 & 29/3 \end{pmatrix}

Step 2. Column 2: largest entry below pivot is 7/37/3 in row 3. Swap R2R3R_2 \leftrightarrow R_3:

(311207/314/329/305/34/313/3)\begin{pmatrix} 3 & 1 & -1 & 2 \\ 0 & 7/3 & 14/3 & 29/3 \\ 0 & 5/3 & 4/3 & 13/3 \end{pmatrix}

R3R357R2R_3 \to R_3 - \frac{5}{7}R_2:

(311207/314/329/3002/76/7)\begin{pmatrix} 3 & 1 & -1 & 2 \\ 0 & 7/3 & 14/3 & 29/3 \\ 0 & 0 & -2/7 & -6/7 \end{pmatrix}

Back substitution. From row 3: 27x3=67-\frac{2}{7}x_3 = -\frac{6}{7}So x3=3x_3 = 3.

From row 2: 73x2+143(3)=293\frac{7}{3}x_2 + \frac{14}{3}(3) = \frac{29}{3}So 73x2=29314=133\frac{7}{3}x_2 = \frac{29}{3} - 14 = -\frac{13}{3} Giving x2=137x_2 = -\frac{13}{7}.

From row 1: 3x1+(137)3=23x_1 + (-\frac{13}{7}) - 3 = 2So 3x1=2+3+137=4873x_1 = 2 + 3 + \frac{13}{7} = \frac{48}{7} Giving x1=167x_1 = \frac{16}{7}.

Solution: x1=167x_1 = \frac{16}{7}, x2=137x_2 = -\frac{13}{7}, x3=3x_3 = 3. \blacksquare

4.5 Least Squares Solutions

When Ax=bA\mathbf{x} = \mathbf{b} is overdetermined (more equations than unknowns) and inconsistent, We seek x\mathbf{x} that minimises Axb2\lVert A\mathbf{x} - \mathbf{b} \rVert^2.

Theorem 4.4 (Normal Equations). The least squares solution x^\hat{\mathbf{x}} satisfies

ATAx^=ATbA^T A \hat{\mathbf{x}} = A^T \mathbf{b}

If AA has full column rank, then ATAA^T A is invertible and x^=(ATA)1ATb\hat{\mathbf{x}} = (A^T A)^{-1} A^T \mathbf{b}.

Proof. The error vector e=Axb\mathbf{e} = A\mathbf{x} - \mathbf{b} is minimised when ecol(A)\mathbf{e} \perp \mathrm{col}(A) I.e., when ATe=0A^T \mathbf{e} = \mathbf{0}. This gives AT(Axb)=0A^T(A\mathbf{x} - \mathbf{b}) = \mathbf{0} Or ATAx=ATbA^T A \mathbf{x} = A^T \mathbf{b}. If AA has full column rank, then ker(A)={0}\ker(A) = \{\mathbf{0}\} So ker(ATA)=ker(A)={0}\ker(A^T A) = \ker(A) = \{\mathbf{0}\}Meaning ATAA^T A is invertible. \blacksquare

Problem. Find the least squares line y=ax+by = ax + b fitting the data points (1,1)(1, 1), (2,1)(2, 1), (3,3)(3, 3).

Solution

The system is (112131)(ab)=(113)\begin{pmatrix} 1 & 1 \\ 2 & 1 \\ 3 & 1 \end{pmatrix}\begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ 3 \end{pmatrix} I.e., Ax=bA\mathbf{x} = \mathbf{b} with A=(112131)A = \begin{pmatrix} 1 & 1 \\ 2 & 1 \\ 3 & 1 \end{pmatrix} x=(a,b)T\mathbf{x} = (a, b)^T, b=(1,1,3)T\mathbf{b} = (1, 1, 3)^T.

Compute ATA=(123111)(112131)=(14663)A^T A = \begin{pmatrix} 1 & 2 & 3 \\ 1 & 1 & 1 \end{pmatrix}\begin{pmatrix} 1 & 1 \\ 2 & 1 \\ 3 & 1 \end{pmatrix} = \begin{pmatrix} 14 & 6 \\ 6 & 3 \end{pmatrix}.

ATb=(123111)(113)=(125)A^T \mathbf{b} = \begin{pmatrix} 1 & 2 & 3 \\ 1 & 1 & 1 \end{pmatrix}\begin{pmatrix} 1 \\ 1 \\ 3 \end{pmatrix} = \begin{pmatrix} 12 \\ 5 \end{pmatrix}.

Solve (14663)(ab)=(125)\begin{pmatrix} 14 & 6 \\ 6 & 3 \end{pmatrix}\begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} 12 \\ 5 \end{pmatrix}:

det(ATA)=4236=6\det(A^T A) = 42 - 36 = 6.

(ATA)1=16(36614)(A^T A)^{-1} = \frac{1}{6}\begin{pmatrix} 3 & -6 \\ -6 & 14 \end{pmatrix}.

x^=16(36614)(125)=16(363072+70)=16(62)=(11/3)\hat{\mathbf{x}} = \frac{1}{6}\begin{pmatrix} 3 & -6 \\ -6 & 14 \end{pmatrix}\begin{pmatrix} 12 \\ 5 \end{pmatrix} = \frac{1}{6}\begin{pmatrix} 36 - 30 \\ -72 + 70 \end{pmatrix} = \frac{1}{6}\begin{pmatrix} 6 \\ -2 \end{pmatrix} = \begin{pmatrix} 1 \\ -1/3 \end{pmatrix}.

The least squares line is y=x1/3y = x - 1/3. \blacksquare

4.6 Worked Example: System with Infinitely Many Solutions

Problem. Solve the system:

x1+2x2x3+3x4=12x1+4x2x3+5x4=2x1+2x2+x3+x4=0\begin{aligned} x_1 + 2x_2 - x_3 + 3x_4 &= 1 \\ 2x_1 + 4x_2 - x_3 + 5x_4 &= 2 \\ x_1 + 2x_2 + x_3 + x_4 &= 0 \end{aligned}

Solution

(121312415212110)R22R1,R3R1(121310011000221)\begin{pmatrix} 1 & 2 & -1 & 3 & 1 \\ 2 & 4 & -1 & 5 & 2 \\ 1 & 2 & 1 & 1 & 0 \end{pmatrix} \xrightarrow{R_2 - 2R_1, R_3 - R_1} \begin{pmatrix} 1 & 2 & -1 & 3 & 1 \\ 0 & 0 & 1 & -1 & 0 \\ 0 & 0 & 2 & -2 & -1 \end{pmatrix}

R32R2(121310011000001)\xrightarrow{R_3 - 2R_2} \begin{pmatrix} 1 & 2 & -1 & 3 & 1 \\ 0 & 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 0 & -1 \end{pmatrix}

The last row reads 0=10 = -1So the system is inconsistent (no solution).

Revised problem: Change the last equation to x1+2x2+x3+x4=2x_1 + 2x_2 + x_3 + x_4 = 2:

(121312415212112)R22R1,R3R1(121310011000221)\begin{pmatrix} 1 & 2 & -1 & 3 & 1 \\ 2 & 4 & -1 & 5 & 2 \\ 1 & 2 & 1 & 1 & 2 \end{pmatrix} \xrightarrow{R_2 - 2R_1, R_3 - R_1} \begin{pmatrix} 1 & 2 & -1 & 3 & 1 \\ 0 & 0 & 1 & -1 & 0 \\ 0 & 0 & 2 & -2 & 1 \end{pmatrix}

R32R2(121310011000001)\xrightarrow{R_3 - 2R_2} \begin{pmatrix} 1 & 2 & -1 & 3 & 1 \\ 0 & 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}

Still inconsistent! The RREF reveals 0=10 = 1 in the last row.

Revised again: Change the last equation to x1+2x2+x3+x4=1x_1 + 2x_2 + x_3 + x_4 = 1:

(121312415212111)R22R1,R3R1(121310011000220)\begin{pmatrix} 1 & 2 & -1 & 3 & 1 \\ 2 & 4 & -1 & 5 & 2 \\ 1 & 2 & 1 & 1 & 1 \end{pmatrix} \xrightarrow{R_2 - 2R_1, R_3 - R_1} \begin{pmatrix} 1 & 2 & -1 & 3 & 1 \\ 0 & 0 & 1 & -1 & 0 \\ 0 & 0 & 2 & -2 & 0 \end{pmatrix}

R32R2(121310011000000)\xrightarrow{R_3 - 2R_2} \begin{pmatrix} 1 & 2 & -1 & 3 & 1 \\ 0 & 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}

Now the system is consistent. Pivots in columns 1 and 3; free variables are x2x_2 and x4x_4. From row 2: x3=x4x_3 = x_4. From row 1: x1=12x2+x33x4=12x22x4x_1 = 1 - 2x_2 + x_3 - 3x_4 = 1 - 2x_2 - 2x_4.

Solution set: {(12s2t,s,t,t):s,tR}\{(1 - 2s - 2t, s, t, t) : s, t \in \mathbb{R}\}.

In parametric form: (x1x2x3x4)=(1000)+s(2100)+t(2011)\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix} + s\begin{pmatrix} -2 \\ 1 \\ 0 \\ 0 \end{pmatrix} + t\begin{pmatrix} -2 \\ 0 \\ 1 \\ 1 \end{pmatrix}.

The solution space is a 2-dimensional affine subspace (a plane) in R4\mathbb{R}^4. \blacksquare

4.7 Common Pitfalls

  • Partial pivoting is not optional for numerical work. Without it, Gaussian elimination can produce catastrophically wrong results due to rounding errors.
  • The normal equations can be ill-conditioned. For better numerical stability, use QR decomposition to solve least squares problems instead of forming ATAA^T A.
  • Not every system has a solution. Always check consistency via the Rouché—Capelli theorem before attempting to solve.

5. Eigenvalues and Eigenvectors

5.1 Definitions

Let AMn×n(F)A \in \mathcal{M}_{n \times n}(F). A scalar λF\lambda \in F is an eigenvalue of AA if there Exists a non-zero vector vFn\mathbf{v} \in F^n such that

Av=λvA\mathbf{v} = \lambda \mathbf{v}

The vector v\mathbf{v} is called an eigenvector corresponding to λ\lambda.

The eigenspace corresponding to λ\lambda is Eλ=ker(AλI)E_\lambda = \ker(A - \lambda I). Its dimension is The geometric multiplicity of λ\lambda.

5.2 Characteristic Polynomial

Theorem 5.1. λ\lambda is an eigenvalue of AA if and only if det(AλI)=0\det(A - \lambda I) = 0.

The polynomial p(λ)=det(AλI)p(\lambda) = \det(A - \lambda I) is called the characteristic polynomial of AA. Its roots (in the algebraic closure of FF) are the eigenvalues of AA.

If p(λ)=(lambdalambda1)m1(lambdalambda2)m2(lambdalambdak)mkp(\lambda) = (\\lambda - \\lambda_1)^{m_1}(\\lambda - \\lambda_2)^{m_2}\cdots(\\lambda - \\lambda_k)^{m_k} With lambda1,,lambdak\\lambda_1, \ldots, \\lambda_k distinct, then mim_i is the algebraic multiplicity of lambdai\\lambda_i.

Proposition 5.2. For each eigenvalue λ\lambda, 1dim(Eλ)mλ1 \leq \mathrm{dim}(E_\lambda) \leq m_\lambda (geometric multiplicity does not exceed algebraic multiplicity).

5.3 Diagonalisation

Definition. AA is diagonalisable if there exists an invertible matrix PP and a diagonal Matrix DD such that

A=PDP1A = PDP^{-1}

Theorem 5.3. AMn×n(F)A \in \mathcal{M}_{n \times n}(F) is diagonalisable (over FF) if and only if AA has nn linearly independent Eigenvectors (over FF). Equivalently, the sum of the geometric multiplicities equals nn.

Corollary 5.4. If AA has nn distinct eigenvalues, then AA is diagonalisable.

Proof. Eigenvectors corresponding to distinct eigenvalues are linearly independent. With nn distinct Eigenvalues, we obtain nn linearly independent eigenvectors, which form a basis of FnF^n. \blacksquare

5.4 Cayley—Hamilton Theorem

Theorem 5.5 (Cayley—Hamilton). Every square matrix satisfies its own characteristic polynomial: If p(λ)=det(λIA)p(\lambda) = \det(\lambda I - A)Then p(A)=0p(A) = 0 (the zero matrix).

Proof sketch. Let p(λ)=λn+cn1λn1++c1λ+c0p(\lambda) = \lambda^n + c_{n-1}\lambda^{n-1} + \cdots + c_1\lambda + c_0. By the adjugate formula (Theorem 3.5), (λIA)adj(λIA)=p(λ)I(\lambda I - A) \cdot \mathrm{adj}(\lambda I - A) = p(\lambda) \cdot I. Each entry of adj(λIA)\mathrm{adj}(\lambda I - A) is a polynomial in λ\lambda of degree at most n1n - 1 So we can write adj(λIA)=Bn1λn1++B1λ+B0\mathrm{adj}(\lambda I - A) = B_{n-1}\lambda^{n-1} + \cdots + B_1\lambda + B_0 for Matrices BiB_i. Multiplying out and comparing coefficients of λk\lambda^k:

Bn1=I,Bn2ABn1=cn1I,,AB0=c0IB_{n-1} = I, \quad B_{n-2} - AB_{n-1} = c_{n-1}I, \quad \ldots, \quad -AB_0 = c_0 I

Multiplying the kk-th equation on the left by AkA^k and summing over kk:

AnBn1+An1(Bn2ABn1)++A0(AB0)=An+cn1An1++c0I=p(A)A^n B_{n-1} + A^{n-1}(B_{n-2} - AB_{n-1}) + \cdots + A^0(-AB_0) = A^n + c_{n-1}A^{n-1} + \cdots + c_0 I = p(A)

But the left side telescopes to zero, so p(A)=0p(A) = 0. \blacksquare

5.5 Jordan Normal Form

When a matrix is not diagonalisable, the Jordan normal form provides the next-best canonical Representation.

Theorem 5.6. Let AMn×n(C)A \in \mathcal{M}_{n \times n}(\mathbb{C}). Then AA is similar to a block-diagonal Matrix

J=(J1Jk)J = \begin{pmatrix} J_1 & & \\ & \ddots & \\ & & J_k \end{pmatrix}

Where each Jordan block has the form

Ji=(λi1λi1λi)J_i = \begin{pmatrix} \lambda_i & 1 & & \\ & \lambda_i & \ddots & \\ & & \ddots & 1 \\ & & & \lambda_i \end{pmatrix}

The Jordan form is unique up to permutation of the blocks.

Intuition. Each Jordan block corresponds to one eigenvalue. The size of the block equals the Number of steps in the chain v,(AλI)v,(AλI)2v,\mathbf{v}, (A - \lambda I)\mathbf{v}, (A - \lambda I)^2\mathbf{v}, \ldots Of generalised eigenvectors. A diagonalisable matrix has all Jordan blocks of size 1×11 \times 1.

Problem. Find the Jordan normal form of

A=(3103)A = \begin{pmatrix} 3 & 1 \\ 0 & 3 \end{pmatrix}

Solution

The characteristic polynomial is det(AλI)=(3λ)2\det(A - \lambda I) = (3 - \lambda)^2So λ=3\lambda = 3 is the Only eigenvalue with algebraic multiplicity 2.

A3I=(0100)A - 3I = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}Which has rank 1, so the geometric Multiplicity is dim(ker(A3I))=21=1\dim(\ker(A - 3I)) = 2 - 1 = 1.

Since the geometric multiplicity (1) is less than the algebraic multiplicity (2), AA is not Diagonalisable. The Jordan form has one block of size 2:

J=(3103)J = \begin{pmatrix} 3 & 1 \\ 0 & 3 \end{pmatrix}

(In this case, AA is already in Jordan form.) \blacksquare

5.6 Spectral Theorem for Real Symmetric Matrices

Theorem 5.7 (Spectral Theorem). If AMn×n(R)A \in \mathcal{M}_{n \times n}(\mathbb{R}) is symmetric (A=ATA = A^T), then:

  1. All eigenvalues of AA are real.
  2. AA has nn linearly independent orthonormal eigenvectors.
  3. AA is orthogonally diagonalisable: A=QDQTA = QDQ^T where QQ is orthogonal (QTQ=IQ^TQ = I).

Proof. We prove (1) and then (2) and (3) by induction on nn.

(1) Let λC\lambda \in \mathbb{C} be an eigenvalue with eigenvector vCn\mathbf{v} \in \mathbb{C}^n v0\mathbf{v} \neq \mathbf{0}. Then

vTAv=vT(λv)=λvTv\overline{\mathbf{v}}^T A \mathbf{v} = \overline{\mathbf{v}}^T (\lambda \mathbf{v}) = \lambda \overline{\mathbf{v}}^T \mathbf{v}

Since A=ATA = A^T and AA has real entries, A=A=AT\overline{A} = A = A^TSo

vTAv=(Av)Tv=(Av)Tv=(λv)Tv=λvTv\overline{\mathbf{v}}^T A \mathbf{v} = (A\overline{\mathbf{v}})^T \mathbf{v} = (\overline{A\mathbf{v}})^T \mathbf{v} = (\overline{\lambda}\,\overline{\mathbf{v}})^T \mathbf{v} = \overline{\lambda}\,\overline{\mathbf{v}}^T \mathbf{v}

Therefore (λλ)vTv=0(\lambda - \overline{\lambda})\overline{\mathbf{v}}^T\mathbf{v} = 0. Since vTv>0\overline{\mathbf{v}}^T\mathbf{v} \gt 0 We have λ=λ\lambda = \overline{\lambda}So λR\lambda \in \mathbb{R}.

(2) and (3) By induction. For n=1n = 1 the result is trivial. Assume it holds for (n1)×(n1)(n-1) \times (n-1) Symmetric matrices. Since all eigenvalues are real, AA has a real eigenvalue λ1\lambda_1 with real Eigenvector v1\mathbf{v}_1. Normalise: q1=v1/v1\mathbf{q}_1 = \mathbf{v}_1 / \lVert \mathbf{v}_1 \rVert.

Let W=q1={wRn:q1Tw=0}W = \mathbf{q}_1^\perp = \{\mathbf{w} \in \mathbb{R}^n : \mathbf{q}_1^T \mathbf{w} = 0\}. For any wW\mathbf{w} \in W:

q1T(Aw)=(Aq1)Tw=(λ1q1)Tw=λ10=0\mathbf{q}_1^T (A\mathbf{w}) = (A\mathbf{q}_1)^T \mathbf{w} = (\lambda_1 \mathbf{q}_1)^T \mathbf{w} = \lambda_1 \cdot 0 = 0

So AwWA\mathbf{w} \in W. Therefore AA restricts to a symmetric linear map AW:WWA|_W : W \to W on an (n1)(n-1)-dimensional space. By the inductive hypothesis, WW has an orthonormal basis {q2,,qn}\{\mathbf{q}_2, \ldots, \mathbf{q}_n\} of eigenvectors of AWA|_W.

Then {q1,q2,,qn}\{\mathbf{q}_1, \mathbf{q}_2, \ldots, \mathbf{q}_n\} is an orthonormal eigenbasis for Rn\mathbb{R}^n And A=QDQTA = QDQ^T with Q=[q1qn]Q = [\mathbf{q}_1 \mid \cdots \mid \mathbf{q}_n]. \blacksquare

5.7 Worked Example: Full Diagonalisation

Problem. Find the eigenvalues, eigenvectors, and diagonalise

A=(4123)A = \begin{pmatrix} 4 & 1 \\ 2 & 3 \end{pmatrix}

Solution

The characteristic polynomial is

det(AλI)=det(4λ123λ)=(4λ)(3λ)2\det(A - \lambda I) = \det\begin{pmatrix} 4 - \lambda & 1 \\ 2 & 3 - \lambda \end{pmatrix} = (4 - \lambda)(3 - \lambda) - 2

=λ27λ+10=(λ5)(λ2)= \lambda^2 - 7\lambda + 10 = (\lambda - 5)(\lambda - 2)

So the eigenvalues are λ1=5\lambda_1 = 5 and λ2=2\lambda_2 = 2.

For λ1=5\lambda_1 = 5: Solve (A5I)v=0(A - 5I)\mathbf{v} = \mathbf{0}.

(1122)v=0    v1+v2=0    v=t(11)\begin{pmatrix} -1 & 1 \\ 2 & -2 \end{pmatrix}\mathbf{v} = \mathbf{0} \implies -v_1 + v_2 = 0 \implies \mathbf{v} = t\begin{pmatrix} 1 \\ 1 \end{pmatrix}

For λ2=2\lambda_2 = 2: Solve (A2I)v=0(A - 2I)\mathbf{v} = \mathbf{0}.

(2121)v=0    2v1+v2=0    v=t(12)\begin{pmatrix} 2 & 1 \\ 2 & 1 \end{pmatrix}\mathbf{v} = \mathbf{0} \implies 2v_1 + v_2 = 0 \implies \mathbf{v} = t\begin{pmatrix} 1 \\ -2 \end{pmatrix}

Therefore A=PDP1A = PDP^{-1} with

P=(1112),D=(5002)P = \begin{pmatrix} 1 & 1 \\ 1 & -2 \end{pmatrix}, \quad D = \begin{pmatrix} 5 & 0 \\ 0 & 2 \end{pmatrix}

P1=13(2111)=(2/31/31/31/3)P^{-1} = \frac{1}{-3}\begin{pmatrix} -2 & -1 \\ -1 & 1 \end{pmatrix} = \begin{pmatrix} 2/3 & 1/3 \\ 1/3 & -1/3 \end{pmatrix}

Verification: PDP1=(1112)(5002)(2/31/31/31/3)PDP^{-1} = \begin{pmatrix} 1 & 1 \\ 1 & -2 \end{pmatrix}\begin{pmatrix} 5 & 0 \\ 0 & 2 \end{pmatrix}\begin{pmatrix} 2/3 & 1/3 \\ 1/3 & -1/3 \end{pmatrix}

=(5254)(2/31/31/31/3)=(10/3+2/35/32/310/34/35/3+4/3)=(4123)=A= \begin{pmatrix} 5 & 2 \\ 5 & -4 \end{pmatrix}\begin{pmatrix} 2/3 & 1/3 \\ 1/3 & -1/3 \end{pmatrix} = \begin{pmatrix} 10/3 + 2/3 & 5/3 - 2/3 \\ 10/3 - 4/3 & 5/3 + 4/3 \end{pmatrix} = \begin{pmatrix} 4 & 1 \\ 2 & 3 \end{pmatrix} = A. \blacksquare

Problem. Use the Cayley—Hamilton theorem to compute A10A^{10} for the same matrix AA above.

Solution

The characteristic polynomial is p(λ)=λ27λ+10p(\lambda) = \lambda^2 - 7\lambda + 10So by Cayley—Hamilton, A2=7A10IA^2 = 7A - 10I.

To find A10A^{10}Divide λ10\lambda^{10} by p(λ)p(\lambda):

λ10=q(λ)(λ27λ+10)+r(λ)\lambda^{10} = q(\lambda)(\lambda^2 - 7\lambda + 10) + r(\lambda)

Where r(λ)=aλ+br(\lambda) = a\lambda + b has degree less than 2. Then A10=r(A)=aA+bIA^{10} = r(A) = aA + bI.

To find aa and bbEvaluate at the eigenvalues:

λ10λ=5=510=9765625=5a+b\lambda^{10}\big|_{\lambda=5} = 5^{10} = 9765625 = 5a + b

λ10λ=2=210=1024=2a+b\lambda^{10}\big|_{\lambda=2} = 2^{10} = 1024 = 2a + b

Subtracting: 3a=97656251024=97646013a = 9765625 - 1024 = 9764601So a=3254867a = 3254867.

b=102423254867=10246509734=6508710b = 1024 - 2 \cdot 3254867 = 1024 - 6509734 = -6508710.

Therefore A10=3254867A6508710IA^{10} = 3254867 \cdot A - 6508710 \cdot I. \blacksquare

:::caution Common Pitfall Not every matrix is diagonalisable. For example, A=(1101)A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} has Eigenvalue λ=1\lambda = 1 with algebraic multiplicity 2 but geometric multiplicity 1. It has only one Linearly independent eigenvector and is not diagonalisable. :::

5.8 Worked Example: Spectral Decomposition of a Symmetric Matrix

Problem. Orthogonally diagonalise the symmetric matrix

A=(211121112)A = \begin{pmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix}

Solution

det(AλI)=det(2λ1112λ1112λ)\det(A - \lambda I) = \det\begin{pmatrix} 2-\lambda & 1 & 1 \\ 1 & 2-\lambda & 1 \\ 1 & 1 & 2-\lambda \end{pmatrix}

=(2λ)[(2λ)21]1[(2λ)1]+1[1(2λ)]= (2-\lambda)[(2-\lambda)^2 - 1] - 1[(2-\lambda) - 1] + 1[1 - (2-\lambda)] =(2λ)(λ24λ+3)(1λ)+(λ1)= (2-\lambda)(\lambda^2 - 4\lambda + 3) - (1-\lambda) + (\lambda - 1) =(2λ)(λ1)(λ3)= (2-\lambda)(\lambda-1)(\lambda-3)

Eigenvalues: λ1=1\lambda_1 = 1, λ2=2\lambda_2 = 2, λ3=3\lambda_3 = 3.

For λ1=1\lambda_1 = 1: AI=(111111111)(111000000)A - I = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix} \to \begin{pmatrix} 1 & 1 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}. Eigenspace: {(s,t,st):s,tR}\{(s, t, -s-t) : s, t \in \mathbb{R}\}. An orthonormal basis: q1=12(1,1,0)\mathbf{q}_1 = \frac{1}{\sqrt{2}}(1, -1, 0), q2=16(1,1,2)\mathbf{q}_2 = \frac{1}{\sqrt{6}}(1, 1, -2).

For λ2=2\lambda_2 = 2: A2I=(011101110)(101011000)A - 2I = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix} \to \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{pmatrix}. Eigenspace: {(t,t,t):tR}\{(-t, -t, t) : t \in \mathbb{R}\}. Normalised: q3=13(1,1,1)\mathbf{q}_3 = \frac{1}{\sqrt{3}}(-1, -1, 1).

For λ3=3\lambda_3 = 3: A3I=(111111111)(111000000)A - 3I = \begin{pmatrix} -1 & 1 & 1 \\ 1 & -1 & 1 \\ 1 & 1 & -1 \end{pmatrix} \to \begin{pmatrix} 1 & -1 & -1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}. Eigenspace: {(s+t,s,t):s,tR}\{(s+t, s, t) : s, t \in \mathbb{R}\}. Normalised: q4=13(1,1,1)\mathbf{q}_4 = \frac{1}{\sqrt{3}}(1, 1, 1).

A=QDQTA = QDQ^T where Q=16(312231220222)Q = \frac{1}{\sqrt{6}}\begin{pmatrix} \sqrt{3} & 1 & -\sqrt{2} & \sqrt{2} \\ -\sqrt{3} & 1 & -\sqrt{2} & \sqrt{2} \\ 0 & -2 & \sqrt{2} & \sqrt{2} \end{pmatrix} and D=diag(1,2,3)D = \mathrm{diag}(1, 2, 3).

\blacksquare

5.9 Common Pitfalls

  • Algebraic multiplicity \geq geometric multiplicity, not the reverse. A matrix is diagonalisable if and only if equality holds for every eigenvalue.
  • The characteristic polynomial depends on the choice of eigenvalue variable convention. det(AλI)\det(A - \lambda I) and det(λIA)\det(\lambda I - A) differ by a factor of (1)n(-1)^n but have the same roots.
  • Similarity preserves eigenvalues but not eigenvectors. If A=PBP1A = PBP^{-1} and Bv=λvB\mathbf{v} = \lambda\mathbf{v} then A(Pv)=λ(Pv)A(P\mathbf{v}) = \lambda(P\mathbf{v}); the eigenvectors transform by PP.

6. Linear Transformations

6.1 Definition

A linear transformation (or linear map) T:VWT : V \to W between vector spaces VV and WW over FF Is a function satisfying:

  1. T(u+v)=T(u)+T(v)T(\mathbf{u} + \mathbf{v}) = T(\mathbf{u}) + T(\mathbf{v}) for all u,vV\mathbf{u}, \mathbf{v} \in V
  2. T(αv)=αT(v)T(\alpha \mathbf{v}) = \alpha T(\mathbf{v}) for all αF\alpha \in F, vV\mathbf{v} \in V

Equivalently, T(αu+βv)=αT(u)+βT(v)T(\alpha\mathbf{u} + \beta\mathbf{v}) = \alpha T(\mathbf{u}) + \beta T(\mathbf{v}) for all α,βF\alpha, \beta \in F and u,vV\mathbf{u}, \mathbf{v} \in V.

The set of all linear transformations from VV to WW is denoted L(V,W)\mathcal{L}(V, W).

Proposition 6.1. For any linear transformation TT:

  1. T(0)=0T(\mathbf{0}) = \mathbf{0}.
  2. T(v)=T(v)T(-\mathbf{v}) = -T(\mathbf{v}).
  3. T(i=1kαivi)=i=1kαiT(vi)T\left(\sum_{i=1}^k \alpha_i \mathbf{v}_i\right) = \sum_{i=1}^k \alpha_i T(\mathbf{v}_i).

Proof. T(0)=T(00)=0T(0)=0T(\mathbf{0}) = T(0 \cdot \mathbf{0}) = 0 \cdot T(\mathbf{0}) = \mathbf{0}. T(v)=T((1)v)=(1)T(v)=T(v)T(-\mathbf{v}) = T((-1)\mathbf{v}) = (-1)T(\mathbf{v}) = -T(\mathbf{v}). Property (3) follows by Induction. \blacksquare

6.2 Matrix Representation

If VV and WW are finite-dimensional with bases BV={v1,,vn}\mathcal{B}_V = \{\mathbf{v}_1, \ldots, \mathbf{v}_n\} and BW={w1,,wm}\mathcal{B}_W = \{\mathbf{w}_1, \ldots, \mathbf{w}_m\}Then every TL(V,W)T \in \mathcal{L}(V, W) is Uniquely represented by a matrix [T]BVBWMm×n(F)[T]_{\mathcal{B}_V}^{\mathcal{B}_W} \in \mathcal{M}_{m \times n}(F) Where the jj-th column is the coordinate vector of T(vj)T(\mathbf{v}_j) with respect to BW\mathcal{B}_W.

6.3 Kernel and Image

The kernel (null space) and image (range) of TT are:

ker(T)={vV:T(v)=0}\ker(T) = \{\mathbf{v} \in V : T(\mathbf{v}) = \mathbf{0}\} im(T)={T(v):vV}\mathrm{im}(T) = \{T(\mathbf{v}) : \mathbf{v} \in V\}

Proposition 6.2. ker(T)\ker(T) is a subspace of VV and im(T)\mathrm{im}(T) is a subspace of WW.

6.4 Rank-Nullity Theorem for Linear Maps

Theorem 6.3 (Rank-Nullity). For TL(V,W)T \in \mathcal{L}(V, W) with VV finite-dimensional:

dim(ker(T))+dim(im(T))=dim(V)\dim(\ker(T)) + \dim(\mathrm{im}(T)) = \dim(V)

Proof. Let {u1,,uk}\{\mathbf{u}_1, \ldots, \mathbf{u}_k\} be a basis for ker(T)\ker(T)Where k=dim(ker(T))k = \dim(\ker(T)). Extend to a basis {u1,,uk,uk+1,,un}\{\mathbf{u}_1, \ldots, \mathbf{u}_k, \mathbf{u}_{k+1}, \ldots, \mathbf{u}_n\} of VV Where n=dim(V)n = \dim(V).

We claim {T(uk+1),,T(un)}\{T(\mathbf{u}_{k+1}), \ldots, T(\mathbf{u}_n)\} is a basis for im(T)\mathrm{im}(T).

Spanning: For any wim(T)\mathbf{w} \in \mathrm{im}(T)Write w=T(v)\mathbf{w} = T(\mathbf{v}) for some v=i=1nαiuiV\mathbf{v} = \sum_{i=1}^n \alpha_i \mathbf{u}_i \in V. Then

w=T(i=1nαiui)=i=1nαiT(ui)=i=k+1nαiT(ui)\mathbf{w} = T\left(\sum_{i=1}^n \alpha_i \mathbf{u}_i\right) = \sum_{i=1}^n \alpha_i T(\mathbf{u}_i) = \sum_{i=k+1}^n \alpha_i T(\mathbf{u}_i)

Since T(ui)=0T(\mathbf{u}_i) = \mathbf{0} for iki \leq k.

Linear independence: If i=k+1nαiT(ui)=0\sum_{i=k+1}^n \alpha_i T(\mathbf{u}_i) = \mathbf{0}Then T(i=k+1nαiui)=0T\left(\sum_{i=k+1}^n \alpha_i \mathbf{u}_i\right) = \mathbf{0}So i=k+1nαiuiker(T)\sum_{i=k+1}^n \alpha_i \mathbf{u}_i \in \ker(T). Thus i=k+1nαiui=j=1kβjuj\sum_{i=k+1}^n \alpha_i \mathbf{u}_i = \sum_{j=1}^k \beta_j \mathbf{u}_j For some βj\beta_j. By linear independence of the full basis, all coefficients are zero.

Therefore dim(im(T))=nk\dim(\mathrm{im}(T)) = n - kGiving dim(ker(T))+dim(im(T))=n\dim(\ker(T)) + \dim(\mathrm{im}(T)) = n. \blacksquare

6.5 Isomorphisms

A linear transformation T:VWT : V \to W is an isomorphism if it is bijective. We write VWV \cong W.

Theorem 6.4. TT is an isomorphism if and only if ker(T)={0}\ker(T) = \{\mathbf{0}\} and im(T)=W\mathrm{im}(T) = W.

Corollary 6.5. If dim(V)=dim(W)<\dim(V) = \dim(W) \lt \inftyThen TT is injective if and only if TT is surjective.

Proof. If TT is injective, ker(T)={0}\ker(T) = \{\mathbf{0}\}So dim(im(T))=dim(V)=dim(W)\dim(\mathrm{im}(T)) = \dim(V) = \dim(W) Hence im(T)=W\mathrm{im}(T) = W (a subspace of full dimension equals the whole space). Conversely, If TT is surjective, dim(im(T))=dim(W)=dim(V)\dim(\mathrm{im}(T)) = \dim(W) = \dim(V)So dim(ker(T))=0\dim(\ker(T)) = 0Giving ker(T)={0}\ker(T) = \{\mathbf{0}\}. \blacksquare

6.6 Change of Basis

If PP is the change-of-basis matrix from basis B\mathcal{B} to basis B\mathcal{B}'Then for a Linear transformation TT with matrix representations [T]B[T]_{\mathcal{B}} and [T]B[T]_{\mathcal{B}'}:

[T]B=P1[T]BP[T]_{\mathcal{B}'} = P^{-1}[T]_{\mathcal{B}} P

This is the similarity transformation. Similar matrices represent the same linear transformation In different bases and share the same eigenvalues, determinant, and trace.

6.7 Worked Example: Matrix of a Transformation with Change of Basis

Problem. Let T:R2R2T : \mathbb{R}^2 \to \mathbb{R}^2 be defined by T(x,y)=(2x+y,x+2y)T(x, y) = (2x + y, x + 2y). (a) Find [T]E[T]_{\mathcal{E}} where E\mathcal{E} is the standard basis. (b) Find [T]B[T]_{\mathcal{B}} where B={(1,1),(1,1)}\mathcal{B} = \{(1, 1), (1, -1)\}. (c) Verify that [T]B=P1[T]EP[T]_{\mathcal{B}} = P^{-1}[T]_{\mathcal{E}} P.

Solution

(a) T(1,0)=(2,1)T(1, 0) = (2, 1) and T(0,1)=(1,2)T(0, 1) = (1, 2)So

[T]E=(2112)[T]_{\mathcal{E}} = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}

(b) Compute TT on the basis B\mathcal{B}:

T(1,1)=(3,3)=3(1,1)+0(1,1)T(1, 1) = (3, 3) = 3(1, 1) + 0(1, -1)So coordinates are (30)B\begin{pmatrix} 3 \\ 0 \end{pmatrix}_{\mathcal{B}}.

T(1,1)=(1,1)=0(1,1)+1(1,1)T(1, -1) = (1, -1) = 0(1, 1) + 1(1, -1)So coordinates are (01)B\begin{pmatrix} 0 \\ 1 \end{pmatrix}_{\mathcal{B}}.

[T]B=(3001)[T]_{\mathcal{B}} = \begin{pmatrix} 3 & 0 \\ 0 & 1 \end{pmatrix}

(c) The change-of-basis matrix from E\mathcal{E} to B\mathcal{B} is

P=(1111),P1=(1/21/21/21/2)P = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}, \quad P^{-1} = \begin{pmatrix} 1/2 & 1/2 \\ 1/2 & -1/2 \end{pmatrix}

P1[T]EP=(1/21/21/21/2)(2112)(1111)P^{-1}[T]_{\mathcal{E}} P = \begin{pmatrix} 1/2 & 1/2 \\ 1/2 & -1/2 \end{pmatrix}\begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}

=(3/23/21/21/2)(1111)=(3001)=[T]B= \begin{pmatrix} 3/2 & 3/2 \\ 1/2 & -1/2 \end{pmatrix}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} = \begin{pmatrix} 3 & 0 \\ 0 & 1 \end{pmatrix} = [T]_{\mathcal{B}}. \blacksquare

6.8 Dual Spaces

Definition. The dual space of a vector space VV over FFDenoted VV^*Is the space of all Linear functionals f:VFf : V \to F. Elements of VV^* are called covectors.

Proposition 6.6. If dim(V)=n<\dim(V) = n \lt \inftyThen dim(V)=n\dim(V^*) = n.

Proof. Let {e1,,en}\{\mathbf{e}_1, \ldots, \mathbf{e}_n\} be a basis for VV. Define the dual basis {φ1,,φn}V\{\varphi_1, \ldots, \varphi_n\} \subseteq V^* by φi(ej)=δij\varphi_i(\mathbf{e}_j) = \delta_{ij} (the Kronecker Delta). Each φi\varphi_i is a well-defined linear functional since it is defined on a basis and Extended linearly. These are linearly independent: if ciφi=0\sum c_i \varphi_i = 0Then applying to ej\mathbf{e}_j gives cj=0c_j = 0. They span VV^*: for any fVf \in V^*, f=i=1nf(ei)φif = \sum_{i=1}^n f(\mathbf{e}_i)\varphi_i. \blacksquare

Definition. The double dual of VV is V=(V)V^{**} = (V^*)^*.

Theorem 6.7. If VV is finite-dimensional, the map Φ:VV\Phi : V \to V^{**} defined by Φ(v)(f)=f(v)\Phi(\mathbf{v})(f) = f(\mathbf{v}) is a natural isomorphism.

Intuition. The double dual “recovers” the original space. A vector v\mathbf{v} can be Identified with the functional on VV^* that evaluates each covector at v\mathbf{v}.

Example. For V=R3V = \mathbb{R}^3 with standard basis, the dual basis {φ1,φ2,φ3}\{\varphi_1, \varphi_2, \varphi_3\} Is given by φi(x1,x2,x3)=xi\varphi_i(x_1, x_2, x_3) = x_i. The functional f(x1,x2,x3)=3x12x2+x3f(x_1, x_2, x_3) = 3x_1 - 2x_2 + x_3 Corresponds to the covector 3φ12φ2+φ33\varphi_1 - 2\varphi_2 + \varphi_3 in VV^*.

Remark. In infinite dimensions, VV and VV^{**} need not be isomorphic. The double dual Isomorphism is a special feature of finite-dimensional spaces.

6.9 Annihilators

Definition. For a subset SVS \subseteq VThe annihilator of SS is

S0={fV:f(s)=0 for all sS}S^0 = \{f \in V^* : f(s) = 0 \mathrm{~for~all~} s \in S\}

Proposition 6.8. S0S^0 is a subspace of VV^*And if WW is a subspace of VV with dim(V)=n\dim(V) = nThen dim(W0)=ndim(W)\dim(W^0) = n - \dim(W).

Proof. S0S^0 is the intersection of the kernels ker(s)\ker(s) as ss ranges over SSWhere each ss Is viewed as an element of VV^{**} via Φ\Phi. Each ker(s)\ker(s) is a subspace of VV^*And any Intersection of subspaces is a subspace.

For the dimension: let dim(W)=k\dim(W) = k and extend a basis {w1,,wk}\{\mathbf{w}_1, \ldots, \mathbf{w}_k\} Of WW to a basis {w1,,wk,wk+1,,wn}\{\mathbf{w}_1, \ldots, \mathbf{w}_k, \mathbf{w}_{k+1}, \ldots, \mathbf{w}_n\} of VV. Let {φ1,,φn}\{\varphi_1, \ldots, \varphi_n\} be the dual basis. Then fW0f \in W^0 iff f(wi)=0f(\mathbf{w}_i) = 0 For i=1,,ki = 1, \ldots, k. Writing f=cjφjf = \sum c_j \varphi_jWe need ci=0c_i = 0 for i=1,,ki = 1, \ldots, k. So f=j=k+1ncjφjf = \sum_{j=k+1}^n c_j \varphi_jGiving dim(W0)=nk\dim(W^0) = n - k. \blacksquare


7. Inner Product Spaces

7.1 Definition of an Inner Product

An inner product on a vector space VV over FF (where F=RF = \mathbb{R} or C\mathbb{C}) is a Function ,:V×VF\langle \cdot, \cdot \rangle : V \times V \to F satisfying:

  1. Conjugate symmetry: u,v=v,u\langle \mathbf{u}, \mathbf{v} \rangle = \overline{\langle \mathbf{v}, \mathbf{u} \rangle}
  2. Linearity in the first argument: αu+βw,v=αu,v+βw,v\langle \alpha\mathbf{u} + \beta\mathbf{w}, \mathbf{v} \rangle = \alpha\langle \mathbf{u}, \mathbf{v} \rangle + \beta\langle \mathbf{w}, \mathbf{v} \rangle
  3. Positive definiteness: v,v0\langle \mathbf{v}, \mathbf{v} \rangle \geq 0 with equality iff v=0\mathbf{v} = \mathbf{0}

A vector space equipped with an inner product is called an inner product space.

Example. The standard inner product on Rn\mathbb{R}^n is x,y=i=1nxiyi\langle \mathbf{x}, \mathbf{y} \rangle = \sum_{i=1}^n x_i y_i. On Cn\mathbb{C}^n x,y=i=1nxiyi\langle \mathbf{x}, \mathbf{y} \rangle = \sum_{i=1}^n x_i \overline{y_i}.

Example. On C[a,b]C[a,b]The L2L^2 inner product is f,g=abf(x)g(x)dx\langle f, g \rangle = \int_a^b f(x)g(x)\,dx.

7.2 Norms

Every inner product induces a norm:

v=v,v\lVert \mathbf{v} \rVert = \sqrt{\langle \mathbf{v}, \mathbf{v} \rangle}

Theorem 7.1 (Cauchy—Schwarz Inequality). For all u,vV\mathbf{u}, \mathbf{v} \in V

u,vuv\lvert\langle \mathbf{u}, \mathbf{v} \rangle\rvert \leq \lVert \mathbf{u} \rVert \, \lVert \mathbf{v} \rVert

With equality if and only if u\mathbf{u} and v\mathbf{v} are linearly dependent.

Proof. If v=0\mathbf{v} = \mathbf{0}Both sides are 0 and the result holds. Assume v0\mathbf{v} \neq \mathbf{0}. For any tRt \in \mathbb{R} (or C\mathbb{C}), positive definiteness gives

0utv,utv=u,utv,utu,v+t2v,v0 \leq \langle \mathbf{u} - t\mathbf{v}, \mathbf{u} - t\mathbf{v} \rangle = \langle \mathbf{u}, \mathbf{u} \rangle - t\langle \mathbf{v}, \mathbf{u} \rangle - \overline{t}\langle \mathbf{u}, \mathbf{v} \rangle + \lvert t \rvert^2 \langle \mathbf{v}, \mathbf{v} \rangle

Set t=u,vv,vt = \frac{\langle \mathbf{u}, \mathbf{v} \rangle}{\langle \mathbf{v}, \mathbf{v} \rangle} (the value that minimises the right side):

0u2u,v2v20 \leq \lVert \mathbf{u} \rVert^2 - \frac{\lvert\langle \mathbf{u}, \mathbf{v} \rangle\rvert^2}{\lVert \mathbf{v} \rVert^2}

Rearranging: u,v2u2v2\lvert\langle \mathbf{u}, \mathbf{v} \rangle\rvert^2 \leq \lVert \mathbf{u} \rVert^2 \lVert \mathbf{v} \rVert^2. Taking square roots gives the result. Equality holds iff utv=0\mathbf{u} - t\mathbf{v} = \mathbf{0} I.e., u\mathbf{u} and v\mathbf{v} are linearly dependent. \blacksquare

Theorem 7.2 (Triangle Inequality).

u+vu+v\lVert \mathbf{u} + \mathbf{v} \rVert \leq \lVert \mathbf{u} \rVert + \lVert \mathbf{v} \rVert

Proof.

u+v2=u+v,u+v=u2+2Reu,v+v2\lVert \mathbf{u} + \mathbf{v} \rVert^2 = \langle \mathbf{u} + \mathbf{v}, \mathbf{u} + \mathbf{v} \rangle = \lVert \mathbf{u} \rVert^2 + 2\,\mathrm{Re}\langle \mathbf{u}, \mathbf{v} \rangle + \lVert \mathbf{v} \rVert^2

By Cauchy—Schwarz, Reu,vu,vuv\mathrm{Re}\langle \mathbf{u}, \mathbf{v} \rangle \leq \lvert\langle \mathbf{u}, \mathbf{v} \rangle\rvert \leq \lVert \mathbf{u} \rVert \lVert \mathbf{v} \rVertSo

u+v2u2+2uv+v2=(u+v)2\lVert \mathbf{u} + \mathbf{v} \rVert^2 \leq \lVert \mathbf{u} \rVert^2 + 2\lVert \mathbf{u} \rVert \lVert \mathbf{v} \rVert + \lVert \mathbf{v} \rVert^2 = (\lVert \mathbf{u} \rVert + \lVert \mathbf{v} \rVert)^2

Taking square roots gives the result. \blacksquare

7.3 Orthogonality

Two vectors u,v\mathbf{u}, \mathbf{v} are orthogonal if u,v=0\langle \mathbf{u}, \mathbf{v} \rangle = 0. We write uv\mathbf{u} \perp \mathbf{v}.

An orthonormal set {e1,,ek}\{e_1, \ldots, e_k\} satisfies ei,ej=δij\langle e_i, e_j \rangle = \delta_{ij}.

Theorem 7.3 (Pythagorean Theorem). If uv\mathbf{u} \perp \mathbf{v}Then

u+v2=u2+v2\lVert \mathbf{u} + \mathbf{v} \rVert^2 = \lVert \mathbf{u} \rVert^2 + \lVert \mathbf{v} \rVert^2

Proof. u+v2=u2+2u,v+v2=u2+v2\lVert \mathbf{u} + \mathbf{v} \rVert^2 = \lVert \mathbf{u} \rVert^2 + 2\langle \mathbf{u}, \mathbf{v} \rangle + \lVert \mathbf{v} \rVert^2 = \lVert \mathbf{u} \rVert^2 + \lVert \mathbf{v} \rVert^2. \blacksquare

Proposition 7.4. Every orthonormal set is linearly independent.

Proof. If i=1kαiei=0\sum_{i=1}^k \alpha_i e_i = \mathbf{0}Then taking the inner product with eje_j: αj=αiei,ej=0,ej=0\alpha_j = \langle \sum \alpha_i e_i, e_j \rangle = \langle \mathbf{0}, e_j \rangle = 0 for each jj. \blacksquare

7.4 Gram—Schmidt Process

The Gram—Schmidt process converts a linearly independent set {v1,,vn}\{\mathbf{v}_1, \ldots, \mathbf{v}_n\} into an orthonormal set {e1,,en}\{e_1, \ldots, e_n\}:

u1=v1,e1=u1u1\mathbf{u}_1 = \mathbf{v}_1, \quad e_1 = \frac{\mathbf{u}_1}{\lVert \mathbf{u}_1 \rVert}

uk=vki=1k1vk,eiei,ek=ukuk\mathbf{u}_k = \mathbf{v}_k - \sum_{i=1}^{k-1} \langle \mathbf{v}_k, e_i \rangle e_i, \quad e_k = \frac{\mathbf{u}_k}{\lVert \mathbf{u}_k \rVert}

Proposition 7.5. At each step, span{e1,,ek}=span{v1,,vk}\mathrm{span}\{e_1, \ldots, e_k\} = \mathrm{span}\{\mathbf{v}_1, \ldots, \mathbf{v}_k\}.

Proof. By construction, uk\mathbf{u}_k is vk\mathbf{v}_k minus its projection onto span{e1,,ek1}=span{v1,,vk1}\mathrm{span}\{e_1, \ldots, e_{k-1}\} = \mathrm{span}\{\mathbf{v}_1, \ldots, \mathbf{v}_{k-1}\}. So ukspan{v1,,vk}\mathbf{u}_k \in \mathrm{span}\{\mathbf{v}_1, \ldots, \mathbf{v}_k\} and vk=uk+i=1k1vk,eieispan{u1,,uk}\mathbf{v}_k = \mathbf{u}_k + \sum_{i=1}^{k-1}\langle \mathbf{v}_k, e_i \rangle e_i \in \mathrm{span}\{\mathbf{u}_1, \ldots, \mathbf{u}_k\}. Since each eie_i is a scalar multiple of ui\mathbf{u}_iThe spans coincide. \blacksquare

7.5 Orthogonal Projection

The orthogonal projection of v\mathbf{v} onto a subspace WW with orthonormal basis {e1,,ek}\{e_1, \ldots, e_k\} is

projW(v)=i=1kv,eiei\mathrm{proj_W}(\mathbf{v}) = \sum_{i=1}^k \langle \mathbf{v}, e_i \rangle e_i

Theorem 7.6 (Best Approximation). Among all vectors in WWThe orthogonal projection projW(v)\mathrm{proj_W}(\mathbf{v}) minimises the distance to v\mathbf{v}:

vprojW(v)vwfor all wW\lVert \mathbf{v} - \mathrm{proj_W}(\mathbf{v}) \rVert \leq \lVert \mathbf{v} - \mathbf{w} \rVert \quad \mathrm{for}~all~ \mathbf{w} \in W

Proof. For any wW\mathbf{w} \in WWrite vw=(vprojW(v))+(projW(v)w)\mathbf{v} - \mathbf{w} = (\mathbf{v} - \mathrm{proj_W}(\mathbf{v})) + (\mathrm{proj_W}(\mathbf{v}) - \mathbf{w}). The first term is orthogonal to WW (hence to the second term, which lies in WW), so by the Pythagorean theorem:

vw2=vprojW(v)2+projW(v)w2vprojW(v)2\lVert \mathbf{v} - \mathbf{w} \rVert^2 = \lVert \mathbf{v} - \mathrm{proj_W}(\mathbf{v}) \rVert^2 + \lVert \mathrm{proj_W}(\mathbf{v}) - \mathbf{w} \rVert^2 \geq \lVert \mathbf{v} - \mathrm{proj_W}(\mathbf{v}) \rVert^2

With equality iff w=projW(v)\mathbf{w} = \mathrm{proj_W}(\mathbf{v}). \blacksquare

7.6 Least Squares Approximation

A fundamental application of orthogonal projection is fitting functions to data. Given a subspace WW of an inner product space VV and a target vV\mathbf{v} \in VThe best approximation in WW Is the orthogonal projection projW(v)\mathrm{proj_W}(\mathbf{v}).

7.7 Worked Example: Gram—Schmidt

Problem. Apply the Gram—Schmidt process to v1=(1,1,0)\mathbf{v}_1 = (1, 1, 0) v2=(1,0,1)\mathbf{v}_2 = (1, 0, 1), v3=(0,1,1)\mathbf{v}_3 = (0, 1, 1) in R3\mathbb{R}^3 with the standard inner Product.

Solution

u1=v1=(1,1,0)\mathbf{u}_1 = \mathbf{v}_1 = (1, 1, 0), u1=2\lVert \mathbf{u}_1 \rVert = \sqrt{2}, e1=12(1,1,0)e_1 = \frac{1}{\sqrt{2}}(1, 1, 0).

u2=v2v2,e1e1=(1,0,1)1212(1,1,0)=(1,0,1)12(1,1,0)=(12,12,1)\mathbf{u}_2 = \mathbf{v}_2 - \langle \mathbf{v}_2, e_1 \rangle e_1 = (1, 0, 1) - \frac{1}{\sqrt{2}} \cdot \frac{1}{\sqrt{2}}(1, 1, 0) = (1, 0, 1) - \frac{1}{2}(1, 1, 0) = (\frac{1}{2}, -\frac{1}{2}, 1)

u2=1/4+1/4+1=3/2\lVert \mathbf{u}_2 \rVert = \sqrt{1/4 + 1/4 + 1} = \sqrt{3/2}, e2=13/2(12,12,1)=16(1,1,2)e_2 = \frac{1}{\sqrt{3/2}}(\frac{1}{2}, -\frac{1}{2}, 1) = \frac{1}{\sqrt{6}}(1, -1, 2).

u3=v3v3,e1e1v3,e2e2\mathbf{u}_3 = \mathbf{v}_3 - \langle \mathbf{v}_3, e_1 \rangle e_1 - \langle \mathbf{v}_3, e_2 \rangle e_2

v3,e1=12(0+1+0)=12\langle \mathbf{v}_3, e_1 \rangle = \frac{1}{\sqrt{2}}(0 + 1 + 0) = \frac{1}{\sqrt{2}}

v3,e2=16(01+2)=16\langle \mathbf{v}_3, e_2 \rangle = \frac{1}{\sqrt{6}}(0 - 1 + 2) = \frac{1}{\sqrt{6}}

u3=(0,1,1)1212(1,1,0)1616(1,1,2)=(0,1,1)12(1,1,0)16(1,1,2)\mathbf{u}_3 = (0, 1, 1) - \frac{1}{\sqrt{2}} \cdot \frac{1}{\sqrt{2}}(1, 1, 0) - \frac{1}{\sqrt{6}} \cdot \frac{1}{\sqrt{6}}(1, -1, 2) = (0, 1, 1) - \frac{1}{2}(1, 1, 0) - \frac{1}{6}(1, -1, 2)

=(1216,112+16,113)=(23,23,23)= (-\frac{1}{2} - \frac{1}{6}, 1 - \frac{1}{2} + \frac{1}{6}, 1 - \frac{1}{3}) = (-\frac{2}{3}, \frac{2}{3}, \frac{2}{3})

u3=4/9+4/9+4/9=4/3=2/3\lVert \mathbf{u}_3 \rVert = \sqrt{4/9 + 4/9 + 4/9} = \sqrt{4/3} = 2/\sqrt{3}, e3=32(23,23,23)=13(1,1,1)e_3 = \frac{\sqrt{3}}{2}(-\frac{2}{3}, \frac{2}{3}, \frac{2}{3}) = \frac{1}{\sqrt{3}}(-1, 1, 1).

Verification: e1,e2=112(11+0)=0\langle e_1, e_2 \rangle = \frac{1}{\sqrt{12}}(1 - 1 + 0) = 0. \checkmark e1,e3=16(1+1+0)=0\langle e_1, e_3 \rangle = \frac{1}{\sqrt{6}}(-1 + 1 + 0) = 0. \checkmark e2,e3=118(11+2)=0\langle e_2, e_3 \rangle = \frac{1}{\sqrt{18}}(-1 - 1 + 2) = 0. \checkmark

The orthonormal basis is {12(1,1,0), 16(1,1,2), 13(1,1,1)}\left\{\frac{1}{\sqrt{2}}(1,1,0),\ \frac{1}{\sqrt{6}}(1,-1,2),\ \frac{1}{\sqrt{3}}(-1,1,1)\right\}. \blacksquare

:::caution Common Pitfall The Gram—Schmidt process requires a linearly independent starting set. If the input vectors are Linearly dependent, one of the uk\mathbf{u}_k will be the zero vector, and the process will fail (attempting to divide by zero in the normalisation step). :::

7.8 Worked Example: Orthogonal Projection onto a Plane

Problem. Find the orthogonal projection of v=(3,1,2)\mathbf{v} = (3, -1, 2) onto the plane WW spanned by (1,0,1)(1, 0, 1) and (0,1,1)(0, 1, 1) in R3\mathbb{R}^3 with the standard inner product. Also find the distance from v\mathbf{v} to WW.

Solution

First, apply Gram—Schmidt to obtain an orthonormal basis for WW.

u1=(1,0,1)\mathbf{u}_1 = (1, 0, 1), u1=2\lVert \mathbf{u}_1 \rVert = \sqrt{2}, e1=12(1,0,1)e_1 = \frac{1}{\sqrt{2}}(1, 0, 1).

u2=(0,1,1)(0,1,1),e1e1=(0,1,1)1212(1,0,1)=(0,1,1)12(1,0,1)=(12,1,12)\mathbf{u}_2 = (0, 1, 1) - \langle (0,1,1), e_1 \rangle e_1 = (0, 1, 1) - \frac{1}{\sqrt{2}} \cdot \frac{1}{\sqrt{2}}(1, 0, 1) = (0, 1, 1) - \frac{1}{2}(1, 0, 1) = (-\frac{1}{2}, 1, \frac{1}{2}).

u2=1/4+1+1/4=3/2\lVert \mathbf{u}_2 \rVert = \sqrt{1/4 + 1 + 1/4} = \sqrt{3/2}, e2=16(1,2,1)e_2 = \frac{1}{\sqrt{6}}(-1, 2, 1).

Now compute the projection:

v,e1=12(3+0+2)=52\langle \mathbf{v}, e_1 \rangle = \frac{1}{\sqrt{2}}(3 + 0 + 2) = \frac{5}{\sqrt{2}}

v,e2=16(32+2)=36\langle \mathbf{v}, e_2 \rangle = \frac{1}{\sqrt{6}}(-3 - 2 + 2) = \frac{-3}{\sqrt{6}}

projW(v)=5212(1,0,1)+3616(1,2,1)\mathrm{proj_W}(\mathbf{v}) = \frac{5}{\sqrt{2}} \cdot \frac{1}{\sqrt{2}}(1, 0, 1) + \frac{-3}{\sqrt{6}} \cdot \frac{1}{\sqrt{6}}(-1, 2, 1)

=52(1,0,1)+36(1,2,1)=(52,0,52)+(12,1,12)=(3,1,2)= \frac{5}{2}(1, 0, 1) + \frac{-3}{6}(-1, 2, 1) = (\frac{5}{2}, 0, \frac{5}{2}) + (\frac{1}{2}, -1, -\frac{1}{2}) = (3, -1, 2)

The residual is vprojW(v)=(0,0,0)\mathbf{v} - \mathrm{proj_W}(\mathbf{v}) = (0, 0, 0)So the distance is 0. This means vW\mathbf{v} \in W itself. Indeed, v=3(1,0,1)(0,1,1)span{(1,0,1),(0,1,1)}\mathbf{v} = 3(1, 0, 1) - (0, 1, 1) \in \mathrm{span}\{(1,0,1), (0,1,1)\}. \blacksquare

7.9 Worked Example: L2L^2 Least Squares Approximation

Problem. Find the constant function cc (i.e., the best approximation by a degree-0 polynomial) That minimises 01(exc)2dx\int_0^1 (e^x - c)^2\,dx.

Solution

We want the orthogonal projection of f(x)=exf(x) = e^x onto the subspace W=span{1}W = \mathrm{span}\{1\} in the L2[0,1]L^2[0,1] inner product space. The orthonormal basis for WW is e1=1e_1 = 1 (since 12=011dx=1\lVert 1 \rVert^2 = \int_0^1 1\,dx = 1).

projW(f)=f,11=(01exdx)1=(e1)1\mathrm{proj_W}(f) = \langle f, 1 \rangle \cdot 1 = \left(\int_0^1 e^x\,dx\right) \cdot 1 = (e - 1) \cdot 1

So the best constant approximation is c=e11.718c = e - 1 \approx 1.718.

Verification: The error is ex(e1)e^x - (e-1). Expanding exe^x as a Taylor series around x=1/2x = 1/2: The constant term is e1/21.649e^{1/2} \approx 1.649But our answer e11.718e - 1 \approx 1.718 is the L2L^2-optimal constant, not the Taylor approximation. The two optimisation criteria differ. \blacksquare

7.10 Common Pitfalls

  • The Cauchy—Schwarz inequality is not the triangle inequality. Cauchy—Schwarz bounds the inner product by the product of norms; the triangle inequality bounds the norm of a sum by the sum of norms. They are related (the triangle inequality follows from Cauchy—Schwarz) but distinct.
  • Gram—Schmidt is numerically unstable. For floating-point computation, modified Gram—Schmidt or Householder reflections are preferred.
  • Orthogonal projection decomposes v\mathbf{v} uniquely. v=projW(v)+v\mathbf{v} = \mathrm{proj_W}(\mathbf{v}) + \mathbf{v}^\perp where vW\mathbf{v}^\perp \in W^\perp. This decomposition is unique and is called the orthogonal decomposition.

8. Singular Value Decomposition

8.1 Existence of the SVD

Theorem 8.1 (Singular Value Decomposition). Every matrix AMm×n(R)A \in \mathcal{M}_{m \times n}(\mathbb{R}) can be factored as

A=UΣVTA = U \Sigma V^T

Where UMm×m(R)U \in \mathcal{M}_{m \times m}(\mathbb{R}) is orthogonal, VMn×n(R)V \in \mathcal{M}_{n \times n}(\mathbb{R}) is orthogonal, and ΣMm×n(R)\Sigma \in \mathcal{M}_{m \times n}(\mathbb{R}) is diagonal with non-negative entries σ1σ2σr0\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r \geq 0 (where r=rank(A)r = \mathrm{rank}(A)).

The σi\sigma_i are called the singular values of AA. The columns of UU are the left singular vectors, and the columns of VV are the right singular vectors.

Proof. The matrix ATAA^T A is an n×nn \times n symmetric positive semi-definite matrix, so by the spectral theorem it has nn real non-negative eigenvalues and an orthonormal eigenbasis. Let σ12,σ22,,σn2\sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2 be these eigenvalues (some may be zero) with corresponding orthonormal eigenvectors v1,,vn\mathbf{v}_1, \ldots, \mathbf{v}_n forming the columns of VV.

For each ii with σi>0\sigma_i > 0Define ui=Avi/σi\mathbf{u}_i = A\mathbf{v}_i / \sigma_i. We verify that these form an orthonormal set:

uiTuj=viTATAvjσiσj=σj2viTvjσiσj=σj2σiσjδij=δij\mathbf{u}_i^T \mathbf{u}_j = \frac{\mathbf{v}_i^T A^T A \mathbf{v}_j}{\sigma_i \sigma_j} = \frac{\sigma_j^2 \mathbf{v}_i^T \mathbf{v}_j}{\sigma_i \sigma_j} = \frac{\sigma_j^2}{\sigma_i \sigma_j} \delta_{ij} = \delta_{ij}

Extend {u1,,ur}\{\mathbf{u}_1, \ldots, \mathbf{u}_r\} to an orthonormal basis of Rm\mathbb{R}^m to form UU. Then for any vector xRn\mathbf{x} \in \mathbb{R}^n:

Ax=A(i=1n(viTx)vi)=i=1n(viTx)Avi=i=1rσi(viTx)ui=UΣVTxA\mathbf{x} = A\left(\sum_{i=1}^{n} (\mathbf{v}_i^T \mathbf{x})\mathbf{v}_i\right) = \sum_{i=1}^{n} (\mathbf{v}_i^T \mathbf{x}) A\mathbf{v}_i = \sum_{i=1}^{r} \sigma_i (\mathbf{v}_i^T \mathbf{x}) \mathbf{u}_i = U \Sigma V^T \mathbf{x}

Since this holds for all x\mathbf{x}We have A=UΣVTA = U \Sigma V^T. \blacksquare

8.2 Relationship to Eigenvalues

Proposition 8.2. The singular values of AA are the square roots of the eigenvalues of ATAA^T A (equivalently, of AATAA^T). The non-zero singular values of AA and ATA^T are identical.

Proof. From the construction above, ATAvi=σi2viA^T A \mathbf{v}_i = \sigma_i^2 \mathbf{v}_i. For AATAA^T:

AATui=A(ATA)viσi=σi2Aviσi=σi2uiAA^T \mathbf{u}_i = \frac{A(A^T A)\mathbf{v}_i}{\sigma_i} = \frac{\sigma_i^2 A\mathbf{v}_i}{\sigma_i} = \sigma_i^2 \mathbf{u}_i

So ui\mathbf{u}_i is an eigenvector of AATAA^T with eigenvalue σi2\sigma_i^2. The non-zero eigenvalues of ATAA^T A and AATAA^T coincide (since if ATAv=λvA^T A \mathbf{v} = \lambda \mathbf{v} with λ0\lambda \neq 0Then AAT(Av)=A(ATAv)=λ(Av)AA^T(A\mathbf{v}) = A(A^T A \mathbf{v}) = \lambda(A\mathbf{v}) and Av0A\mathbf{v} \neq \mathbf{0}). \blacksquare

Proposition 8.3. If AA is symmetric with eigenvalues λ1,,λn\lambda_1, \ldots, \lambda_nThen the singular values of AA are λ1,,λn|\lambda_1|, \ldots, |\lambda_n|.

Proof. ATA=A2A^T A = A^2Whose eigenvalues are λi2\lambda_i^2. The singular values are λi2=λi\sqrt{\lambda_i^2} = |\lambda_i|. \blacksquare

8.3 Geometric Interpretation

The SVD decomposes the linear transformation T:RnRmT : \mathbb{R}^n \to \mathbb{R}^m into three steps:

  1. VTV^T rotates (or reflects) Rn\mathbb{R}^n into a coordinate system aligned with the right singular vectors.
  2. Σ\Sigma scales each axis by the corresponding singular value (and possibly drops dimensions where σi=0\sigma_i = 0).
  3. UU rotates (or reflects) the result into the coordinate system of the left singular vectors.

Unit circle image. Under AAThe unit circle in R2\mathbb{R}^2 is mapped to an ellipse with semi-axes σ1\sigma_1 and σ2\sigma_2 aligned with the columns of UU.

8.4 Low-Rank Approximation

Theorem 8.4 (Eckart—Young—Mirsky). Let A=UΣVTA = U \Sigma V^T be the SVD with singular values σ1σ2σr>0\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r > 0. For any k<rk < rThe best rank-kk approximation to AA (in both the Frobenius and spectral norms) is

Ak=i=1kσiuiviT=UkΣkVkTA_k = \sum_{i=1}^{k} \sigma_i \mathbf{u}_i \mathbf{v}_i^T = U_k \Sigma_k V_k^T

And the approximation error is

AAkF=σk+12++σr2,AAk2=σk+1\lVert A - A_k \rVert_F = \sqrt{\sigma_{k+1}^2 + \cdots + \sigma_r^2}, \qquad \lVert A - A_k \rVert_2 = \sigma_{k+1}

Proof (Frobenius norm). Any rank-kk matrix BB can be written in terms of an orthonormal basis of its column space. Let WMn×k(R)W \in \mathcal{M}_{n \times k}(\mathbb{R}) have orthonormal columns spanning the column space of BB. Then B=CWTB = CW^T for some CCAnd:

ABF2=A(IWWT)F2+(AC)WTF2A(IWWT)F2\lVert A - B \rVert_F^2 = \lVert A(I - WW^T) \rVert_F^2 + \lVert (A - C)W^T \rVert_F^2 \geq \lVert A(I - WW^T) \rVert_F^2

The minimum over WW is achieved when WW spans the subspace spanned by v1,,vk\mathbf{v}_1, \ldots, \mathbf{v}_k (the top kk right singular vectors), giving:

AAkF2=i=k+1rσi2\lVert A - A_k \rVert_F^2 = \sum_{i=k+1}^{r} \sigma_i^2

The spectral norm result follows because AAk2=σk+1\lVert A - A_k \rVert_2 = \sigma_{k+1} is the largest singular value of AAkA - A_k. \blacksquare

8.5 Pseudoinverse

Definition. The Moore—Penrose pseudoinverse of A=UΣVTA = U \Sigma V^T is

A+=VΣ+UTA^+ = V \Sigma^+ U^T

Where Σ+\Sigma^+ is obtained from Σ\Sigma by transposing and inverting each non-zero singular value:

(Σ+)ii={1/σiifσi>00ifσi=0(\Sigma^+)_{ii} = \begin{cases} 1/\sigma_i & \text{if} \sigma_i > 0 \\ 0 & \text{if} \sigma_i = 0 \end{cases}

Theorem 8.5. The pseudoinverse satisfies the four Moore—Penrose conditions:

  1. AA+A=AAA^+A = A
  2. A+AA+=A+A^+AA^+ = A^+
  3. (AA+)T=AA+(AA^+)^T = AA^+
  4. (A+A)T=A+A(A^+A)^T = A^+A

Proof. Direct computation using A=UΣVTA = U \Sigma V^T and A+=VΣ+UTA^+ = V \Sigma^+ U^T:

AA+A=UΣVTVΣ+UTUΣVT=UΣΣ+ΣVT=UΣVT=AAA^+A = U \Sigma V^T V \Sigma^+ U^T U \Sigma V^T = U \Sigma \Sigma^+ \Sigma V^T = U \Sigma V^T = A

Since ΣΣ+Σ=Σ\Sigma \Sigma^+ \Sigma = \Sigma (the non-zero singular values are preserved, zeros remain zero). The remaining conditions follow similarly. \blacksquare

Theorem 8.6 (Minimum-Norm Least Squares). If Ax=bA\mathbf{x} = \mathbf{b} is inconsistent, then x=A+b\mathbf{x}^* = A^+\mathbf{b} is the least-squares solution of minimum norm.

Proof. The least-squares solutions to AxbA\mathbf{x} \approx \mathbf{b} are x=A+b+(IA+A)z\mathbf{x} = A^+\mathbf{b} + (I - A^+A)\mathbf{z} for arbitrary z\mathbf{z}. Since (IA+A)zker(A)(I - A^+A)\mathbf{z} \in \ker(A) and A+bim(AT)A^+\mathbf{b} \in \mathrm{im}(A^T)These two components are orthogonal. The minimum-norm solution is obtained when z=0\mathbf{z} = \mathbf{0}Giving x=A+b\mathbf{x}^* = A^+\mathbf{b}. \blacksquare

8.6 Condition Number

Definition. The condition number of AA (with respect to the spectral norm) is

κ(A)=A2A+2=σ1σr\kappa(A) = \lVert A \rVert_2 \cdot \lVert A^+ \rVert_2 = \frac{\sigma_1}{\sigma_r}

Where σ1\sigma_1 is the largest and σr\sigma_r is the smallest non-zero singular value.

Theorem 8.7 (Sensitivity of Linear Systems). If Ax=bA\mathbf{x} = \mathbf{b} and A(x+δx)=b+δbA(\mathbf{x} + \delta\mathbf{x}) = \mathbf{b} + \delta\mathbf{b}Then

δxxκ(A)δbb\frac{\lVert \delta\mathbf{x} \rVert}{\lVert \mathbf{x} \rVert} \leq \kappa(A) \cdot \frac{\lVert \delta\mathbf{b} \rVert}{\lVert \mathbf{b} \rVert}

Proof. From Aδx=δbA\delta\mathbf{x} = \delta\mathbf{b}: δx=A1δbA1δb=σr1δb\lVert \delta\mathbf{x} \rVert = \lVert A^{-1}\delta\mathbf{b} \rVert \leq \lVert A^{-1} \rVert \lVert \delta\mathbf{b} \rVert = \sigma_r^{-1} \lVert \delta\mathbf{b} \rVert. From Ax=bA\mathbf{x} = \mathbf{b}: b=Axσ11x\lVert \mathbf{b} \rVert = \lVert A\mathbf{x} \rVert \geq \sigma_1^{-1} \lVert \mathbf{x} \rVert… Wait, Axσrx\lVert A\mathbf{x} \rVert \geq \sigma_r \lVert \mathbf{x} \rVert and bσ1x\lVert \mathbf{b} \rVert \leq \sigma_1 \lVert \mathbf{x} \rVert. Combining: δx/x(σ1/σr)(δb/b)\lVert \delta\mathbf{x} \rVert / \lVert \mathbf{x} \rVert \leq (\sigma_1 / \sigma_r)(\lVert \delta\mathbf{b} \rVert / \lVert \mathbf{b} \rVert). \blacksquare

A matrix with large condition number is ill-conditioned: small perturbations in b\mathbf{b} can cause large changes in x\mathbf{x}.

8.7 Worked Example: Computing the SVD

Problem. Find the SVD of A=(322322)A = \begin{pmatrix} 3 & 2 \\ 2 & 3 \\ 2 & -2 \end{pmatrix}.

Solution

Step 1: Compute ATA=(322232)(322322)=(174417)A^T A = \begin{pmatrix} 3 & 2 & 2 \\ 2 & 3 & -2 \end{pmatrix}\begin{pmatrix} 3 & 2 \\ 2 & 3 \\ 2 & -2 \end{pmatrix} = \begin{pmatrix} 17 & 4 \\ 4 & 17 \end{pmatrix}.

Step 2: Eigenvalues of ATAA^T A: det(17λ4417λ)=(17λ)216=λ234λ+273=(λ21)(λ13)\det\begin{pmatrix} 17 - \lambda & 4 \\ 4 & 17 - \lambda \end{pmatrix} = (17 - \lambda)^2 - 16 = \lambda^2 - 34\lambda + 273 = (\lambda - 21)(\lambda - 13).

So σ12=21\sigma_1^2 = 21 and σ22=13\sigma_2^2 = 13Giving σ1=21\sigma_1 = \sqrt{21}, σ2=13\sigma_2 = \sqrt{13}.

Step 3: Eigenvectors of ATAA^T A. For λ=21\lambda = 21: (1721)v1+4v2=0(17 - 21)v_1 + 4v_2 = 0So v1=v2v_1 = v_2. Normalised: v1=12(1,1)T\mathbf{v}_1 = \frac{1}{\sqrt{2}}(1, 1)^T.

For λ=13\lambda = 13: 4v1+(1713)v2=04v_1 + (17 - 13)v_2 = 0So v1=v2v_1 = -v_2. Normalised: v2=12(1,1)T\mathbf{v}_2 = \frac{1}{\sqrt{2}}(1, -1)^T.

V=12(1111)V = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}.

Step 4: Compute ui=Avi/σi\mathbf{u}_i = A\mathbf{v}_i / \sigma_i.

u1=12112(322322)(11)=142(550)=12(110)\mathbf{u}_1 = \frac{1}{\sqrt{21}} \cdot \frac{1}{\sqrt{2}}\begin{pmatrix} 3 & 2 \\ 2 & 3 \\ 2 & -2 \end{pmatrix}\begin{pmatrix} 1 \\ 1 \end{pmatrix} = \frac{1}{\sqrt{42}}\begin{pmatrix} 5 \\ 5 \\ 0 \end{pmatrix} = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}.

u2=11312(322322)(11)=126(114)\mathbf{u}_2 = \frac{1}{\sqrt{13}} \cdot \frac{1}{\sqrt{2}}\begin{pmatrix} 3 & 2 \\ 2 & 3 \\ 2 & -2 \end{pmatrix}\begin{pmatrix} 1 \\ -1 \end{pmatrix} = \frac{1}{\sqrt{26}}\begin{pmatrix} 1 \\ -1 \\ 4 \end{pmatrix}.

Since AA is 3×23 \times 2We need a third left singular vector u3\mathbf{u}_3 orthogonal to u1\mathbf{u}_1 and u2\mathbf{u}_2. Compute u3=u1×u2=152(4,4,2)=126(2,2,1)\mathbf{u}_3 = \mathbf{u}_1 \times \mathbf{u}_2 = \frac{1}{\sqrt{52}}(4, -4, -2) = \frac{1}{\sqrt{26}}(2, -2, -1).

U=(1/21/262/261/21/262/2604/261/26)U = \begin{pmatrix} 1/\sqrt{2} & 1/\sqrt{26} & 2/\sqrt{26} \\ 1/\sqrt{2} & -1/\sqrt{26} & -2/\sqrt{26} \\ 0 & 4/\sqrt{26} & -1/\sqrt{26} \end{pmatrix}

Σ=(21001300)\Sigma = \begin{pmatrix} \sqrt{21} & 0 \\ 0 & \sqrt{13} \\ 0 & 0 \end{pmatrix}

A=UΣVT=(1/21/262/261/21/262/2604/261/26)(21001300)12(1111)A = U \Sigma V^T = \begin{pmatrix} 1/\sqrt{2} & 1/\sqrt{26} & 2/\sqrt{26} \\ 1/\sqrt{2} & -1/\sqrt{26} & -2/\sqrt{26} \\ 0 & 4/\sqrt{26} & -1/\sqrt{26} \end{pmatrix}\begin{pmatrix} \sqrt{21} & 0 \\ 0 & \sqrt{13} \\ 0 & 0 \end{pmatrix}\frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}

Verification: UU and VV are orthogonal, Σ\Sigma has the correct singular values on the diagonal, and A=UΣVTA = U\Sigma V^T recovers the original matrix. \blacksquare

8.8 Worked Example: Low-Rank Approximation

Problem. Let A=(110001101)A = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 1 \end{pmatrix}. Find the best rank-1 approximation.

Solution

ATA=(101100011)(110001101)=(211110102)A^T A = \begin{pmatrix} 1 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 1 \end{pmatrix}\begin{pmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 2 & 1 & 1 \\ 1 & 1 & 0 \\ 1 & 0 & 2 \end{pmatrix}.

det(ATAλI)=det(2λ1111λ0102λ)=(2λ)[(1λ)(2λ)](2λ)(1λ)\det(A^T A - \lambda I) = \det\begin{pmatrix} 2-\lambda & 1 & 1 \\ 1 & 1-\lambda & 0 \\ 1 & 0 & 2-\lambda \end{pmatrix} = (2-\lambda)[(1-\lambda)(2-\lambda)] - (2-\lambda) - (1-\lambda)

=(2λ)(λ23λ+21)(3λ)=(2λ)(λ23λ+1)(3λ)= (2-\lambda)(\lambda^2 - 3\lambda + 2 - 1) - (3 - \lambda) = (2-\lambda)(\lambda^2 - 3\lambda + 1) - (3 - \lambda)

=2λ26λ+2λ3+3λ2λ3+λ=λ3+5λ26λ1= 2\lambda^2 - 6\lambda + 2 - \lambda^3 + 3\lambda^2 - \lambda - 3 + \lambda = -\lambda^3 + 5\lambda^2 - 6\lambda - 1

Setting to zero and solving: λ35λ2+6λ+1=0\lambda^3 - 5\lambda^2 + 6\lambda + 1 = 0. Testing λ=3\lambda = 3: 2745+18+1=1027 - 45 + 18 + 1 = 1 \neq 0. Testing λ=4\lambda = 4: 6480+24+1=9064 - 80 + 24 + 1 = 9 \neq 0. By numerical methods or the rational root theorem (no rational roots), the eigenvalues are approximately λ15.25\lambda_1 \approx 5.25, λ21.31\lambda_2 \approx 1.31, λ30.56\lambda_3 \approx -0.56.

Wait, ATAA^T A should be positive semi-definite, so all eigenvalues should be non-negative. Let me recompute.

ATA=(211110102)A^T A = \begin{pmatrix} 2 & 1 & 1 \\ 1 & 1 & 0 \\ 1 & 0 & 2 \end{pmatrix}.

tr(ATA)=5\mathrm{tr}(A^T A) = 5So λ1+λ2+λ3=5\lambda_1 + \lambda_2 + \lambda_3 = 5.

det(ATA)=2(2)1(2)1(1)=42+1=3\det(A^T A) = 2(2) - 1(2) - 1(-1) = 4 - 2 + 1 = 3.

\sum of principal 2×22 \times 2 minors =(21)+(41)+(20)=1+3+2=6= (2-1) + (4-1) + (2-0) = 1 + 3 + 2 = 6.

Characteristic polynomial: λ35λ2+6λ3=0\lambda^3 - 5\lambda^2 + 6\lambda - 3 = 0.

Testing λ=1\lambda = 1: 15+63=101 - 5 + 6 - 3 = -1 \neq 0. Testing λ=3\lambda = 3: 2745+183=3027 - 45 + 18 - 3 = -3 \neq 0.

By the trigonometric method for cubics or numerical approximation, λ13.35\lambda_1 \approx 3.35, λ21.35\lambda_2 \approx 1.35, λ30.30\lambda_3 \approx 0.30.

The best rank-1 approximation uses only σ1=λ11.83\sigma_1 = \sqrt{\lambda_1} \approx 1.83 and its corresponding singular vectors, yielding A1=σ1u1v1TA_1 = \sigma_1 \mathbf{u}_1 \mathbf{v}_1^T with error AA1F=λ2+λ31.651.28\lVert A - A_1 \rVert_F = \sqrt{\lambda_2 + \lambda_3} \approx \sqrt{1.65} \approx 1.28. \blacksquare

8.9 Worked Example: Condition Number and Numerical Stability

Problem. Compute the condition number of A=(1111.0001)A = \begin{pmatrix} 1 & 1 \\ 1 & 1.0001 \end{pmatrix} and discuss its implications.

Solution

ATA=(22.00012.00012.00020001)A^T A = \begin{pmatrix} 2 & 2.0001 \\ 2.0001 & 2.00020001 \end{pmatrix}.

det(ATAλI)=(2λ)(2.00020001λ)2.00012\det(A^T A - \lambda I) = (2 - \lambda)(2.00020001 - \lambda) - 2.0001^2

=λ24.00020001λ+4.000400024.00040001=λ24.00020001λ+0.00000001= \lambda^2 - 4.00020001\lambda + 4.00040002 - 4.00040001 = \lambda^2 - 4.00020001\lambda + 0.00000001

λ=4.00020001±16.001600160.0000000424.00020001±4.000199992\lambda = \frac{4.00020001 \pm \sqrt{16.00160016 - 0.00000004}}{2} \approx \frac{4.00020001 \pm 4.00019999}{2}

λ14.00020000,λ20.00000001\lambda_1 \approx 4.00020000, \quad \lambda_2 \approx 0.00000001

σ12.00005,σ20.0001\sigma_1 \approx 2.00005, \quad \sigma_2 \approx 0.0001

κ(A)=σ1/σ220000\kappa(A) = \sigma_1 / \sigma_2 \approx 20000.

This means relative errors in b\mathbf{b} can be amplified by a factor of up to 2000020000 in the solution x\mathbf{x}. The matrix is nearly singular (the two rows are almost linearly dependent), and solving Ax=bA\mathbf{x} = \mathbf{b} in floating-point arithmetic will lose approximately log10(20000)4.3\log_{10}(20000) \approx 4.3 digits of precision. \blacksquare

8.10 Common Pitfalls

  • Singular values are always non-negative. Unlike eigenvalues, which can be negative or complex, singular values are the square roots of eigenvalues of ATAA^T AHence always real and non-negative.
  • The SVD is not unique. If AA has repeated singular values, the corresponding singular vectors can be any orthonormal basis of the eigenspace. The signs of singular vectors can also be flipped in pairs.
  • The pseudoinverse equals the inverse only for square, full-rank matrices. When AA is not full rank, A+AIA^+A \neq I; instead, A+AA^+A is the orthogonal projection onto im(AT)\mathrm{im}(A^T).
  • The SVD and eigendecomposition are different decompositions. The SVD always exists for any matrix, but the eigendecomposition requires the matrix to be square. Even for symmetric matrices, the singular values are λi|\lambda_i|Not λi\lambda_i.

9. Problem Set

Problem 1. Let V=R3V = \mathbb{R}^3 and W={(x,y,z)R3:xy+z=0}W = \{(x, y, z) \in \mathbb{R}^3 : x - y + z = 0\}. Show that WW is a subspace of VV and find its dimension.

Solution

WW is non-empty since 0=(0,0,0)W\mathbf{0} = (0, 0, 0) \in W. If (x1,y1,z1),(x2,y2,z2)W(x_1, y_1, z_1), (x_2, y_2, z_2) \in WThen (x1y1+z1)+(x2y2+z2)=0+0=0(x_1 - y_1 + z_1) + (x_2 - y_2 + z_2) = 0 + 0 = 0So their sum is in WW. Similarly, α(xy+z)=0\alpha(x - y + z) = 0 for any scalar α\alpha. Hence WW is a subspace.

WW is defined by one linear equation, so dim(W)=31=2\dim(W) = 3 - 1 = 2. A basis is {(1,1,0),(1,0,1)}\{(1, 1, 0), (-1, 0, 1)\}.

If you get this wrong, revise: Section 1.3 (Subspace Criterion).

Problem 2. Is the set S={(x,y)R2:xy=0}S = \{(x, y) \in \mathbb{R}^2 : xy = 0\} a subspace of R2\mathbb{R}^2?

Solution

No. (1,0)S(1, 0) \in S and (0,1)S(0, 1) \in SBut (1,0)+(0,1)=(1,1)S(1, 0) + (0, 1) = (1, 1) \notin S since 1101 \cdot 1 \neq 0. SS is not closed under addition.

If you get this wrong, revise: Section 1.3 (Subspace Criterion).

Problem 3. Determine whether the set {1x,1+x,x2}\{1 - x, 1 + x, x^2\} is linearly independent in P2(R)\mathcal{P}_2(\mathbb{R}).

Solution

Suppose a(1x)+b(1+x)+cx2=0a(1 - x) + b(1 + x) + cx^2 = 0 as a polynomial. Then (a+b)+(a+b)x+cx2=0(a + b) + (-a + b)x + cx^2 = 0So a+b=0a + b = 0, a+b=0-a + b = 0, c=0c = 0. From the first two equations: 2a=02a = 0So a=0a = 0Then b=0b = 0. Since a=b=c=0a = b = c = 0The set is linearly independent.

If you get this wrong, revise: Section 2.1 (Linear Independence).

Problem 4. Find a basis for the column space of

A=(1214240636110)A = \begin{pmatrix} 1 & 2 & 1 & 4 \\ 2 & 4 & 0 & 6 \\ 3 & 6 & 1 & 10 \end{pmatrix}

Solution

Row-reduce AA:

(1214240636110)R22R1,R33R1(121400220022)R3R2(121400220000)\begin{pmatrix} 1 & 2 & 1 & 4 \\ 2 & 4 & 0 & 6 \\ 3 & 6 & 1 & 10 \end{pmatrix} \xrightarrow{R_2 - 2R_1, R_3 - 3R_1} \begin{pmatrix} 1 & 2 & 1 & 4 \\ 0 & 0 & -2 & -2 \\ 0 & 0 & -2 & -2 \end{pmatrix} \xrightarrow{R_3 - R_2} \begin{pmatrix} 1 & 2 & 1 & 4 \\ 0 & 0 & -2 & -2 \\ 0 & 0 & 0 & 0 \end{pmatrix}

Pivots are in columns 1 and 3. A basis for col(A)\mathrm{col}(A) is {(1,2,3),(1,0,1)}\{(1, 2, 3), (1, 0, 1)\} (the pivot columns of the original AA). dim(col(A))=2\dim(\mathrm{col}(A)) = 2.

If you get this wrong, revise: Section 2.7 (Worked Examples).

Problem 5. Let U=span{(1,0,1),(0,1,1)}U = \mathrm{span}\{(1, 0, 1), (0, 1, 1)\} and W=span{(1,1,0)}W = \mathrm{span}\{(1, 1, 0)\} in R3\mathbb{R}^3. Verify the dimension formula dim(U+W)=dim(U)+dim(W)dim(UW)\dim(U + W) = \dim(U) + \dim(W) - \dim(U \cap W).

Solution

dim(U)=2\dim(U) = 2 (the two spanning vectors are linearly independent), dim(W)=1\dim(W) = 1. Since dim(U)+dim(W)=3=dim(R3)\dim(U) + \dim(W) = 3 = \dim(\mathbb{R}^3)We have U+W=R3U + W = \mathbb{R}^3So dim(U+W)=3\dim(U + W) = 3. By the dimension formula: dim(UW)=2+13=0\dim(U \cap W) = 2 + 1 - 3 = 0So UW={0}U \cap W = \{\mathbf{0}\}.

We can verify directly: if a(1,0,1)+b(0,1,1)=c(1,1,0)a(1,0,1) + b(0,1,1) = c(1,1,0)Then a=ca = c, b=cb = c, a+b=0a + b = 0 Giving c=0c = 0So only the zero vector is in the intersection.

If you get this wrong, revise: Section 2.5 (Dimension Formula).

Problem 6. Compute det(A)\det(A) using cofactor expansion where

A=(2013012010020301)A = \begin{pmatrix} 2 & 0 & 1 & 3 \\ 0 & 1 & 2 & 0 \\ 1 & 0 & 0 & 2 \\ 0 & 3 & 0 & 1 \end{pmatrix}

Solution

Expand along the second column (which has the most zeros):

det(A)=1det(213102001)+(1)4+23det(201012100)\det(A) = -1 \cdot \det\begin{pmatrix} 2 & 1 & 3 \\ 1 & 0 & 2 \\ 0 & 0 & 1 \end{pmatrix} + (-1)^{4+2} \cdot 3 \cdot \det\begin{pmatrix} 2 & 0 & 1 \\ 0 & 1 & 2 \\ 1 & 0 & 0 \end{pmatrix}

For the first 3×33 \times 3: expand along row 3: 1(2011)=11 \cdot (2 \cdot 0 - 1 \cdot 1) = -1.

For the second 3×33 \times 3: expand along row 3: 1(0211)=11 \cdot (0 \cdot 2 - 1 \cdot 1) = -1.

det(A)=(1)+3(1)=13=2\det(A) = -(-1) + 3(-1) = 1 - 3 = -2.

If you get this wrong, revise: Section 3.4 (Determinants).

Problem 7. Show that if AA is skew-symmetric (AT=AA^T = -A) and nn is odd, then det(A)=0\det(A) = 0.

Solution

det(A)=det(AT)=det(A)=(1)ndet(A)=det(A)\det(A) = \det(A^T) = \det(-A) = (-1)^n \det(A) = -\det(A) (since nn is odd). Therefore det(A)=det(A)\det(A) = -\det(A)So 2det(A)=02\det(A) = 0Giving det(A)=0\det(A) = 0.

If you get this wrong, revise: Section 3.5 (Properties of Determinants).

Problem 8. Use the adjugate formula to find the inverse of

A=(201110013)A = \begin{pmatrix} 2 & 0 & 1 \\ 1 & 1 & 0 \\ 0 & 1 & 3 \end{pmatrix}

Solution

det(A)=2(30)0+1(10)=6+1=7\det(A) = 2(3 - 0) - 0 + 1(1 - 0) = 6 + 1 = 7.

Cofactors: C11=+3C_{11} = +3, C12=3C_{12} = -3, C13=+1C_{13} = +1 C21=+1C_{21} = +1, C22=+6C_{22} = +6, C23=2C_{23} = -2 C31=1C_{31} = -1, C32=+1C_{32} = +1, C33=+2C_{33} = +2

adj(A)=(311361122)\mathrm{adj}(A) = \begin{pmatrix} 3 & 1 & -1 \\ -3 & 6 & 1 \\ 1 & -2 & 2 \end{pmatrix}

A1=17(311361122)A^{-1} = \frac{1}{7}\begin{pmatrix} 3 & 1 & -1 \\ -3 & 6 & 1 \\ 1 & -2 & 2 \end{pmatrix}

If you get this wrong, revise: Section 3.6 (Adjugate and Inverse Formula).

Problem 9. Solve the system by Gaussian elimination:

x+2yz=32x+5y+z=8x+y+4z=2\begin{aligned} x + 2y - z &= 3 \\ 2x + 5y + z &= 8 \\ -x + y + 4z &= 2 \end{aligned}

Solution

(121325181142)R22R1,R3+R1(121301320335)\begin{pmatrix} 1 & 2 & -1 & 3 \\ 2 & 5 & 1 & 8 \\ -1 & 1 & 4 & 2 \end{pmatrix} \xrightarrow{R_2 - 2R_1, R_3 + R_1} \begin{pmatrix} 1 & 2 & -1 & 3 \\ 0 & 1 & 3 & 2 \\ 0 & 3 & 3 & 5 \end{pmatrix}

R33R2(121301320061)\xrightarrow{R_3 - 3R_2} \begin{pmatrix} 1 & 2 & -1 & 3 \\ 0 & 1 & 3 & 2 \\ 0 & 0 & -6 & -1 \end{pmatrix}

From row 3: 6z=1-6z = -1So z=1/6z = 1/6. From row 2: y+3(1/6)=2y + 3(1/6) = 2So y=3/2y = 3/2. From row 1: x+2(3/2)1/6=3x + 2(3/2) - 1/6 = 3So x=33+1/6=1/6x = 3 - 3 + 1/6 = 1/6.

Solution: x=1/6x = 1/6, y=3/2y = 3/2, z=1/6z = 1/6.

If you get this wrong, revise: Section 4.1 (Gaussian Elimination).

Problem 10. Determine whether the following system is consistent using the Rouché—Capelli theorem:

x+y+z=12x+2y+2z=3xy+z=0\begin{aligned} x + y + z &= 1 \\ 2x + 2y + 2z &= 3 \\ x - y + z &= 0 \end{aligned}

Solution

[Ab]=(111122231110)R22R1,R3R1(111100010201)[A \mid \mathbf{b}] = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 2 & 2 & 2 & 3 \\ 1 & -1 & 1 & 0 \end{pmatrix} \xrightarrow{R_2 - 2R_1, R_3 - R_1} \begin{pmatrix} 1 & 1 & 1 & 1 \\ 0 & 0 & 0 & 1 \\ 0 & -2 & 0 & -1 \end{pmatrix}

rank(A)=2\mathrm{rank}(A) = 2 but rank([Ab])=3\mathrm{rank}([A \mid \mathbf{b}]) = 3 (the row [0 0 0 1][0\ 0\ 0\ 1] is Non-zero). Since rank(A)rank([Ab])\mathrm{rank}(A) \neq \mathrm{rank}([A \mid \mathbf{b}])The system is inconsistent.

If you get this wrong, revise: Section 4.2 (Rouché—Capelli Theorem).

Problem 11. Find the LU decomposition of

A=(121250103)A = \begin{pmatrix} 1 & 2 & -1 \\ 2 & 5 & 0 \\ -1 & 0 & 3 \end{pmatrix}

Solution

m21=2/1=2m_{21} = 2/1 = 2, m31=1/1=1m_{31} = -1/1 = -1:

(121012022)\begin{pmatrix} 1 & 2 & -1 \\ 0 & 1 & 2 \\ 0 & 2 & 2 \end{pmatrix}

m32=2/1=2m_{32} = 2/1 = 2:

U=(121012002),L=(100210121)U = \begin{pmatrix} 1 & 2 & -1 \\ 0 & 1 & 2 \\ 0 & 0 & -2 \end{pmatrix}, \quad L = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ -1 & 2 & 1 \end{pmatrix}

Verify: LU=(100210121)(121012002)=(121250103)=ALU = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ -1 & 2 & 1 \end{pmatrix}\begin{pmatrix} 1 & 2 & -1 \\ 0 & 1 & 2 \\ 0 & 0 & -2 \end{pmatrix} = \begin{pmatrix} 1 & 2 & -1 \\ 2 & 5 & 0 \\ -1 & 0 & 3 \end{pmatrix} = A. \blacksquare

If you get this wrong, revise: Section 4.3 (LU Decomposition).

Problem 12. Find the least squares solution to the system Ax=bA\mathbf{x} = \mathbf{b} where

A=(101112),b=(011)A = \begin{pmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \end{pmatrix}, \quad \mathbf{b} = \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix}

Solution

ATA=(3335)A^T A = \begin{pmatrix} 3 & 3 \\ 3 & 5 \end{pmatrix}, ATb=(23)A^T \mathbf{b} = \begin{pmatrix} 2 \\ 3 \end{pmatrix}.

det(ATA)=159=6\det(A^T A) = 15 - 9 = 6, (ATA)1=16(5333)(A^T A)^{-1} = \frac{1}{6}\begin{pmatrix} 5 & -3 \\ -3 & 3 \end{pmatrix}

x^=16(5333)(23)=16(1096+9)=16(13)=(1/61/2)\hat{\mathbf{x}} = \frac{1}{6}\begin{pmatrix} 5 & -3 \\ -3 & 3 \end{pmatrix}\begin{pmatrix} 2 \\ 3 \end{pmatrix} = \frac{1}{6}\begin{pmatrix} 10 - 9 \\ -6 + 9 \end{pmatrix} = \frac{1}{6}\begin{pmatrix} 1 \\ 3 \end{pmatrix} = \begin{pmatrix} 1/6 \\ 1/2 \end{pmatrix}

The least squares solution is a=1/6a = 1/6, b=1/2b = 1/2.

If you get this wrong, revise: Section 4.5 (Least Squares Solutions).

Problem 13. Find the eigenvalues and a basis for each eigenspace of

A=(210021002)A = \begin{pmatrix} 2 & 1 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 2 \end{pmatrix}

Is AA diagonalisable?

Solution

det(AλI)=(2λ)3\det(A - \lambda I) = (2 - \lambda)^3So λ=2\lambda = 2 with algebraic multiplicity 3.

A2I=(010001000)A - 2I = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}Which has rank 2. The null space is spanned by (1,0,0)T(1, 0, 0)^T. So the geometric multiplicity is 1.

Since the geometric multiplicity (1) does not equal the algebraic multiplicity (3), AA is not Diagonalisable. Its Jordan form is J=AJ = A itself (a single 3×33 \times 3 Jordan block).

If you get this wrong, revise: Section 5.3 (Diagonalisation) and Section 5.5 (Jordan Normal Form).

Problem 14. Diagonalise the matrix

A=(200031013)A = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 3 & -1 \\ 0 & -1 & 3 \end{pmatrix}

Solution

det(AλI)=(2λ)[(3λ)21]=(2λ)(λ26λ+8)=(2λ)(λ2)(λ4)\det(A - \lambda I) = (2 - \lambda)[(3-\lambda)^2 - 1] = (2-\lambda)(\lambda^2 - 6\lambda + 8) = (2-\lambda)(\lambda-2)(\lambda-4).

Eigenvalues: λ1=2\lambda_1 = 2 (algebraic multiplicity 2), λ2=4\lambda_2 = 4 (algebraic multiplicity 1).

For λ1=2\lambda_1 = 2: A2I=(000011011)(011000000)A - 2I = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 1 & -1 \\ 0 & -1 & 1 \end{pmatrix} \to \begin{pmatrix} 0 & 1 & -1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}. Eigenspace basis: {(1,0,0),(0,1,1)}\{(1, 0, 0), (0, 1, 1)\}. Geometric multiplicity = 2.

For λ2=4\lambda_2 = 4: A4I=(200011011)(100011000)A - 4I = \begin{pmatrix} -2 & 0 & 0 \\ 0 & -1 & -1 \\ 0 & -1 & -1 \end{pmatrix} \to \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{pmatrix}. Eigenspace basis: {(0,1,1)}\{(0, -1, 1)\}. Geometric multiplicity = 1.

Since 2+1=3=n2 + 1 = 3 = n, AA is diagonalisable:

P=(100011011),D=(200020004)P = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & -1 \\ 0 & 1 & 1 \end{pmatrix}, \quad D = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 4 \end{pmatrix}

If you get this wrong, revise: Section 5.3 (Diagonalisation).

Problem 15. Use the Cayley—Hamilton theorem to express A3A^3 as a linear combination of A2A^2, AA And IIWhere A=(1213)A = \begin{pmatrix} 1 & 2 \\ -1 & 3 \end{pmatrix}.

Solution

det(AλI)=(1λ)(3λ)+2=λ24λ+5\det(A - \lambda I) = (1 - \lambda)(3 - \lambda) + 2 = \lambda^2 - 4\lambda + 5.

By Cayley—Hamilton: A24A+5I=0A^2 - 4A + 5I = 0So A2=4A5IA^2 = 4A - 5I.

A3=AA2=A(4A5I)=4A25A=4(4A5I)5A=16A20I5A=11A20IA^3 = A \cdot A^2 = A(4A - 5I) = 4A^2 - 5A = 4(4A - 5I) - 5A = 16A - 20I - 5A = 11A - 20I.

A3=11(1213)20(1001)=(9221113)A^3 = 11\begin{pmatrix} 1 & 2 \\ -1 & 3 \end{pmatrix} - 20\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} -9 & 22 \\ -11 & 13 \end{pmatrix}.

If you get this wrong, revise: Section 5.4 (Cayley—Hamilton Theorem).

Problem 16. Let T:R3R2T : \mathbb{R}^3 \to \mathbb{R}^2 be defined by T(x,y,z)=(x+y,y+z)T(x, y, z) = (x + y, y + z). Find the matrix of TT with respect to the standard bases, and verify the rank-nullity theorem.

Solution

T(1,0,0)=(1,0)T(1, 0, 0) = (1, 0), T(0,1,0)=(1,1)T(0, 1, 0) = (1, 1), T(0,0,1)=(0,1)T(0, 0, 1) = (0, 1).

[T]E=(110011)[T]_{\mathcal{E}} = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \end{pmatrix}.

ker(T)={(x,y,z):x+y=0,y+z=0}={(t,t,t):tR}\ker(T) = \{(x, y, z) : x + y = 0, y + z = 0\} = \{(t, -t, t) : t \in \mathbb{R}\}So dim(ker(T))=1\dim(\ker(T)) = 1.

im(T)=span{(1,0),(1,1)}=R2\mathrm{im}(T) = \mathrm{span}\{(1, 0), (1, 1)\} = \mathbb{R}^2So dim(im(T))=2\dim(\mathrm{im}(T)) = 2.

Verify: dim(ker(T))+dim(im(T))=1+2=3=dim(R3)\dim(\ker(T)) + \dim(\mathrm{im}(T)) = 1 + 2 = 3 = \dim(\mathbb{R}^3). \blacksquare

If you get this wrong, revise: Section 6.4 (Rank-Nullity for Linear Maps).

Problem 17. Let V=R3V = \mathbb{R}^3 with the standard inner product. Find the orthogonal projection Of v=(1,2,3)\mathbf{v} = (1, 2, 3) onto the plane WW defined by x+y+z=0x + y + z = 0.

Solution

A basis for WW: {(1,1,0),(1,0,1)}\{(1, -1, 0), (1, 0, -1)\}. Apply Gram—Schmidt:

e1=12(1,1,0)e_1 = \frac{1}{\sqrt{2}}(1, -1, 0).

u2=(1,0,1)(1,0,1),e1e1=(1,0,1)1212(1,1,0)=(12,12,1)\mathbf{u}_2 = (1, 0, -1) - \langle (1, 0, -1), e_1 \rangle e_1 = (1, 0, -1) - \frac{1}{\sqrt{2}} \cdot \frac{1}{\sqrt{2}}(1, -1, 0) = (\frac{1}{2}, \frac{1}{2}, -1).

u2=1/4+1/4+1=3/2\lVert \mathbf{u}_2 \rVert = \sqrt{1/4 + 1/4 + 1} = \sqrt{3/2}So e2=16(1,1,2)e_2 = \frac{1}{\sqrt{6}}(1, 1, -2).

projW(v)=(1,2,3),e1e1+(1,2,3),e2e2\mathrm{proj_W}(\mathbf{v}) = \langle (1,2,3), e_1 \rangle e_1 + \langle (1,2,3), e_2 \rangle e_2

(1,2,3),e1=12(12)=12\langle (1,2,3), e_1 \rangle = \frac{1}{\sqrt{2}}(1 - 2) = \frac{-1}{\sqrt{2}}

(1,2,3),e2=16(1+26)=36\langle (1,2,3), e_2 \rangle = \frac{1}{\sqrt{6}}(1 + 2 - 6) = \frac{-3}{\sqrt{6}}

projW(v)=1212(1,1,0)+3616(1,1,2)\mathrm{proj_W}(\mathbf{v}) = \frac{-1}{\sqrt{2}} \cdot \frac{1}{\sqrt{2}}(1, -1, 0) + \frac{-3}{\sqrt{6}} \cdot \frac{1}{\sqrt{6}}(1, 1, -2)

=12(1,1,0)12(1,1,2)=(1,0,1)= -\frac{1}{2}(1, -1, 0) - \frac{1}{2}(1, 1, -2) = (-1, 0, 1).

The orthogonal projection is (1,0,1)(-1, 0, 1). \blacksquare

If you get this wrong, revise: Section 7.5 (Orthogonal Projection).

Problem 18. Prove the Cauchy—Schwarz inequality for Rn\mathbb{R}^n directly: for any nonzero x,yRn\mathbf{x}, \mathbf{y} \in \mathbb{R}^nShow that xyxy\lvert\mathbf{x} \cdot \mathbf{y}\rvert \leq \lVert \mathbf{x} \rVert \lVert \mathbf{y} \rVert And determine when equality holds.

Solution

Consider the function f(t)=x+ty2=x2+2t(xy)+t2y2f(t) = \lVert \mathbf{x} + t\mathbf{y} \rVert^2 = \lVert \mathbf{x} \rVert^2 + 2t(\mathbf{x} \cdot \mathbf{y}) + t^2 \lVert \mathbf{y} \rVert^2.

Since f(t)0f(t) \geq 0 for all tRt \in \mathbb{R}This quadratic in tt has at most one real root, So its discriminant satisfies Δ0\Delta \leq 0:

4(xy)24x2y204(\mathbf{x} \cdot \mathbf{y})^2 - 4\lVert \mathbf{x} \rVert^2 \lVert \mathbf{y} \rVert^2 \leq 0

Therefore (xy)2x2y2(\mathbf{x} \cdot \mathbf{y})^2 \leq \lVert \mathbf{x} \rVert^2 \lVert \mathbf{y} \rVert^2And taking square roots gives the result.

Equality holds iff Δ=0\Delta = 0Which means f(t)f(t) has a double root, i.e., there exists t0t_0 such that x+t0y=0\mathbf{x} + t_0 \mathbf{y} = \mathbf{0}Meaning x\mathbf{x} and y\mathbf{y} are linearly dependent.

If you get this wrong, revise: Section 7.2 (Cauchy—Schwarz Inequality).

Problem 19. Let A=(100021012)A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 1 \\ 0 & 1 & 2 \end{pmatrix}. Verify the Cayley—Hamilton Theorem by explicitly computing p(A)p(A).

Solution

p(λ)=det(AλI)=(1λ)[(2λ)21]=(1λ)(λ24λ+3)=(1λ)(λ1)(λ3)p(\lambda) = \det(A - \lambda I) = (1 - \lambda)[(2-\lambda)^2 - 1] = (1-\lambda)(\lambda^2 - 4\lambda + 3) = (1-\lambda)(\lambda-1)(\lambda-3).

So p(λ)=(λ1)2(λ3)=(λ35λ2+7λ3)p(\lambda) = -(\lambda-1)^2(\lambda-3) = -(\lambda^3 - 5\lambda^2 + 7\lambda - 3).

p(A)=(A35A2+7A3I)p(A) = -(A^3 - 5A^2 + 7A - 3I).

A2=(100021012)2=(100054045)A^2 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 1 \\ 0 & 1 & 2 \end{pmatrix}^2 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 5 & 4 \\ 0 & 4 & 5 \end{pmatrix}

A3=AA2=(100021012)(100054045)=(1000141301314)A^3 = A \cdot A^2 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 1 \\ 0 & 1 & 2 \end{pmatrix}\begin{pmatrix} 1 & 0 & 0 \\ 0 & 5 & 4 \\ 0 & 4 & 5 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 14 & 13 \\ 0 & 13 & 14 \end{pmatrix}

p(A)=(1000141301314)+5(100054045)7(100021012)+3(100010001)p(A) = -\begin{pmatrix} 1 & 0 & 0 \\ 0 & 14 & 13 \\ 0 & 13 & 14 \end{pmatrix} + 5\begin{pmatrix} 1 & 0 & 0 \\ 0 & 5 & 4 \\ 0 & 4 & 5 \end{pmatrix} - 7\begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 1 \\ 0 & 1 & 2 \end{pmatrix} + 3\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}

=(1+57+300014+2514+313+207+0013+207+014+2514+3)=(000000000)= \begin{pmatrix} -1+5-7+3 & 0 & 0 \\ 0 & -14+25-14+3 & -13+20-7+0 \\ 0 & -13+20-7+0 & -14+25-14+3 \end{pmatrix} = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}

So p(A)=0p(A) = 0Confirming Cayley—Hamilton. \blacksquare

If you get this wrong, revise: Section 5.4 (Cayley—Hamilton Theorem).

Problem 20. Let T:P2(R)P2(R)T : \mathcal{P}_2(\mathbb{R}) \to \mathcal{P}_2(\mathbb{R}) be defined by T(p)=pT(p) = p' (the derivative). Find the matrix of TT with respect to the basis B={1,x,x2}\mathcal{B} = \{1, x, x^2\}And determine ker(T)\ker(T) and im(T)\mathrm{im}(T).

Solution

T(1)=0=01+0x+0x2T(1) = 0 = 0 \cdot 1 + 0 \cdot x + 0 \cdot x^2So coordinates are (000)\begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}.

T(x)=1=11+0x+0x2T(x) = 1 = 1 \cdot 1 + 0 \cdot x + 0 \cdot x^2So coordinates are (100)\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}.

T(x2)=2x=01+2x+0x2T(x^2) = 2x = 0 \cdot 1 + 2 \cdot x + 0 \cdot x^2So coordinates are (020)\begin{pmatrix} 0 \\ 2 \\ 0 \end{pmatrix}.

[T]B=(010002000)[T]_{\mathcal{B}} = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 2 \\ 0 & 0 & 0 \end{pmatrix}

ker(T)={p:p=0}=span{1}\ker(T) = \{p : p' = 0\} = \mathrm{span}\{1\}So dim(ker(T))=1\dim(\ker(T)) = 1.

im(T)={p:pP2}=span{1,x}\mathrm{im}(T) = \{p' : p \in \mathcal{P}_2\} = \mathrm{span}\{1, x\}So dim(im(T))=2\dim(\mathrm{im}(T)) = 2.

Verify: dim(ker(T))+dim(im(T))=1+2=3=dim(P2)\dim(\ker(T)) + \dim(\mathrm{im}(T)) = 1 + 2 = 3 = \dim(\mathcal{P}_2). \blacksquare

If you get this wrong, revise: Section 6.2 (Matrix Representation) and Section 6.4 (Rank-Nullity).

Common Pitfalls

  1. Forgetting to check that solutions satisfy the original equation (especially with squaring both sides or dividing by variables).

  2. Assuming matrix multiplication is commutative — ABBAAB \neq BA.

Worked Examples

Example 1: Eigenvalue Decomposition

Problem. Find the eigenvalues and eigenvectors of A=(4123)A = \begin{pmatrix} 4 & 1 \\ 2 & 3 \end{pmatrix}.

Solution. Characteristic polynomial:

det(AλI)=(4λ)(3λ)2=λ27λ+10=(λ5)(λ2)=0\det(A - \lambda I) = (4-\lambda)(3-\lambda) - 2 = \lambda^2 - 7\lambda + 10 = (\lambda - 5)(\lambda - 2) = 0

Eigenvalues: λ1=5\lambda_1 = 5, λ2=2\lambda_2 = 2.

For λ1=5\lambda_1 = 5: (A5I)v=0(A-5I)\mathbf{v} = 0: (1122)v=0\begin{pmatrix} -1 & 1 \\ 2 & -2 \end{pmatrix}\mathbf{v} = 0. Eigenvector: v1=(11)\mathbf{v}_1 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}.

For λ2=2\lambda_2 = 2: (2121)v=0\begin{pmatrix} 2 & 1 \\ 2 & 1 \end{pmatrix}\mathbf{v} = 0. Eigenvector: v2=(12)\mathbf{v}_2 = \begin{pmatrix} 1 \\ -2 \end{pmatrix}.

\blacksquare

Example 2: Gram-Schmidt Orthogonalisation

Problem. Apply Gram-Schmidt to v1=(1,1,0)\mathbf{v}_1 = (1,1,0), v2=(1,0,1)\mathbf{v}_2 = (1,0,1), v3=(0,1,1)\mathbf{v}_3 = (0,1,1).

Solution.

u1=v1=(1,1,0)\mathbf{u}_1 = \mathbf{v}_1 = (1,1,0).

u2=v2v2u1u1u1u1=(1,0,1)12(1,1,0)=(12,12,1)\mathbf{u}_2 = \mathbf{v}_2 - \frac{\mathbf{v}_2 \cdot \mathbf{u}_1}{\mathbf{u}_1 \cdot \mathbf{u}_1}\mathbf{u}_1 = (1,0,1) - \frac{1}{2}(1,1,0) = (\frac{1}{2}, -\frac{1}{2}, 1).

u3=v3v3u1u1u1u1v3u2u2u2u2=(23,23,23)\mathbf{u}_3 = \mathbf{v}_3 - \frac{\mathbf{v}_3 \cdot \mathbf{u}_1}{\mathbf{u}_1 \cdot \mathbf{u}_1}\mathbf{u}_1 - \frac{\mathbf{v}_3 \cdot \mathbf{u}_2}{\mathbf{u}_2 \cdot \mathbf{u}_2}\mathbf{u}_2 = (-\frac{2}{3}, \frac{2}{3}, \frac{2}{3}).

Orthogonal set: {(1,1,0),(12,12,1),(23,23,23)}\{(1,1,0),\,(\frac{1}{2},-\frac{1}{2},1),\,(-\frac{2}{3},\frac{2}{3},\frac{2}{3})\}.

\blacksquare

Summary

  • Vector spaces: span, linear independence, basis, dimension; subspaces (null space, column space, row space).
  • Linear maps: rank-nullity theorem (dimV=rank(T)+nullity(T)\dim V = \text{rank}(T) + \text{nullity}(T)); matrix representations change with basis.
  • Determinants: det(AB)=det(A)det(B)\det(AB) = \det(A)\det(B); invertible iff det0\det \neq 0; geometric interpretation as volume scaling factor.
  • Eigenvalues and eigenvectors: characteristic polynomial, diagonalisation when eigenvectors form a basis; spectral theorem for symmetric matrices.
  • Inner products: orthogonality, Gram-Schmidt process, orthogonal projections, least-squares solutions.

Cross-References

TopicSiteLink
Abstract AlgebraWyattsNotesView
Multivariable CalculusWyattsNotesView
Real AnalysisWyattsNotesView
Linear Algebra — MIT 18.06MIT OCWView