
Linear Algebra

1. Vectors and Vector Spaces

1.1 Definition of a Vector Space

A vector space over a field $F$ (typically $\mathbb{R}$ or $\mathbb{C}$) is a set $V$ equipped with two operations:

  1. Vector addition: $+ : V \times V \to V$
  2. Scalar multiplication: $\cdot : F \times V \to V$

satisfying the following axioms for all $\mathbf{u}, \mathbf{v}, \mathbf{w} \in V$ and all $\alpha, \beta \in F$:

  1. Commutativity: $\mathbf{u} + \mathbf{v} = \mathbf{v} + \mathbf{u}$
  2. Associativity of addition: $(\mathbf{u} + \mathbf{v}) + \mathbf{w} = \mathbf{u} + (\mathbf{v} + \mathbf{w})$
  3. Additive identity: there exists $\mathbf{0} \in V$ such that $\mathbf{v} + \mathbf{0} = \mathbf{v}$
  4. Additive inverse: for each $\mathbf{v}$, there exists $-\mathbf{v}$ such that $\mathbf{v} + (-\mathbf{v}) = \mathbf{0}$
  5. Compatibility of scalar multiplication: $\alpha(\beta \mathbf{v}) = (\alpha\beta)\mathbf{v}$
  6. Identity element of scalar multiplication: $1 \cdot \mathbf{v} = \mathbf{v}$
  7. Distributivity over vector addition: $\alpha(\mathbf{u} + \mathbf{v}) = \alpha\mathbf{u} + \alpha\mathbf{v}$
  8. Distributivity over scalar addition: $(\alpha + \beta)\mathbf{v} = \alpha\mathbf{v} + \beta\mathbf{v}$

1.2 Examples

Example 1. $\mathbb{R}^n$ with component-wise addition and scalar multiplication is a vector space over $\mathbb{R}$.

Example 2. The set $\mathcal{P}_n$ of all polynomials of degree at most $n$ with real coefficients, with the usual polynomial addition and scalar multiplication, is a vector space over $\mathbb{R}$.

Example 3. The set $C[a,b]$ of all continuous real-valued functions on $[a,b]$, with point-wise addition and scalar multiplication, is a vector space over $\mathbb{R}$.

Example 4. The set $\mathcal{M}_{m \times n}(\mathbb{R})$ of all $m \times n$ real matrices is a vector space over $\mathbb{R}$.

1.3 Subspaces

A subspace $W$ of a vector space $V$ is a subset $W \subseteq V$ that is itself a vector space under the same operations.

Theorem 1.1 (Subspace Criterion). A non-empty subset $W \subseteq V$ is a subspace if and only if for all $\mathbf{u}, \mathbf{v} \in W$ and all $\alpha \in F$:

  1. $\mathbf{u} + \mathbf{v} \in W$ (closed under addition)
  2. $\alpha \mathbf{u} \in W$ (closed under scalar multiplication)

Proof. If $W$ is a subspace, closure is immediate from the definition. Conversely, if $W$ is non-empty and closed under both operations, pick $\mathbf{u} \in W$. Then $-\mathbf{u} = (-1)\mathbf{u} \in W$ by closure under scalar multiplication, and $\mathbf{u} + (-\mathbf{u}) = \mathbf{0} \in W$ by closure under addition. The remaining axioms are inherited from $V$. $\blacksquare$

Example 5. The set of all solutions to the homogeneous equation $A\mathbf{x} = \mathbf{0}$ forms a subspace of $\mathbb{R}^n$, called the null space of $A$.
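The closure properties behind Example 5 can be checked mechanically. Here is a small pure-Python sketch; the matrix `A` and the two solutions are illustrative choices, not taken from the text:

```python
def matvec(A, x):
    """Multiply matrix A (a list of rows) by vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

# Hypothetical example matrix; both x1 and x2 solve A x = 0.
A = [[1, 2, -1], [2, 4, -2]]
x1 = [1, 0, 1]
x2 = [2, -1, 0]

# Closure under addition: the sum of two solutions is again a solution.
x_sum = [a + b for a, b in zip(x1, x2)]
print(matvec(A, x1), matvec(A, x2), matvec(A, x_sum))
# All three products are the zero vector, as the subspace criterion requires.
```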

2. Linear Independence, Span, Basis, and Dimension

2.1 Linear Independence

A set of vectors $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\} \subseteq V$ is linearly independent if the equation

$$\alpha_1 \mathbf{v}_1 + \alpha_2 \mathbf{v}_2 + \cdots + \alpha_k \mathbf{v}_k = \mathbf{0}$$

implies $\alpha_1 = \alpha_2 = \cdots = \alpha_k = 0$. Otherwise the set is linearly dependent.

2.2 Span

The span of a set $S \subseteq V$, denoted $\mathrm{span}(S)$, is the set of all finite linear combinations of elements of $S$:

$$\mathrm{span}(S) = \left\{ \sum_{i=1}^k \alpha_i \mathbf{v}_i : k \in \mathbb{N},\ \alpha_i \in F,\ \mathbf{v}_i \in S \right\}$$

Proposition 2.1. $\mathrm{span}(S)$ is always a subspace of $V$.

2.3 Basis and Dimension

A set $B \subseteq V$ is a basis for $V$ if:

  1. $B$ is linearly independent, and
  2. $\mathrm{span}(B) = V$.

Theorem 2.1. Every vector space has a basis. All bases of a finite-dimensional vector space have the same number of elements.

The dimension of $V$, denoted $\dim(V)$, is the cardinality of any basis for $V$.

Theorem 2.2 (Dimension Formula). If $U$ and $W$ are subspaces of a finite-dimensional vector space $V$, then

$$\dim(U + W) = \dim(U) + \dim(W) - \dim(U \cap W)$$

2.4 Worked Example

Problem. Determine whether the vectors $\mathbf{v}_1 = (1, 2, 3)$, $\mathbf{v}_2 = (4, 5, 6)$, $\mathbf{v}_3 = (7, 8, 9)$ form a basis for $\mathbb{R}^3$.

Solution. Form the matrix $A = [\mathbf{v}_1 \mid \mathbf{v}_2 \mid \mathbf{v}_3]$ and compute its determinant, expanding along the first column:

$$\det(A) = 1(45 - 48) - 2(36 - 42) + 3(32 - 35) = -3 + 12 - 9 = 0$$

Since $\det(A) = 0$, the columns are linearly dependent, so $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is not a basis. In fact, $\mathbf{v}_3 - 2\mathbf{v}_2 + \mathbf{v}_1 = \mathbf{0}$.

tip

To check whether $n$ vectors in $\mathbb{R}^n$ form a basis, compute the determinant of the matrix whose columns are those vectors. If $\det \neq 0$, they form a basis; if $\det = 0$, they do not.
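The tip above is easy to automate for $3 \times 3$ matrices. A minimal sketch (the cofactor formula is standard; the helper name `det3` is ours):

```python
def det3(A):
    """Determinant of a 3x3 matrix (list of rows) via cofactor expansion
    along the first row."""
    a, b, c = A[0]
    d, e, f = A[1]
    g, h, i = A[2]
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

# Columns are v1, v2, v3 from the worked example.
A = [[1, 4, 7],
     [2, 5, 8],
     [3, 6, 9]]
print(det3(A))   # 0 -> the vectors do not form a basis

# For contrast, the standard basis vectors give determinant 1.
I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(det3(I3))  # 1 -> a basis
```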

3. Matrices

3.1 Matrix Operations

An $m \times n$ matrix $A$ over $F$ is a rectangular array of $mn$ elements from $F$, arranged in $m$ rows and $n$ columns. The set of all such matrices is denoted $\mathcal{M}_{m \times n}(F)$.

Addition. For $A, B \in \mathcal{M}_{m \times n}(F)$, $(A + B)_{ij} = A_{ij} + B_{ij}$.

Scalar multiplication. For $\alpha \in F$ and $A \in \mathcal{M}_{m \times n}(F)$, $(\alpha A)_{ij} = \alpha A_{ij}$.

Matrix multiplication. For $A \in \mathcal{M}_{m \times n}(F)$ and $B \in \mathcal{M}_{n \times p}(F)$, the product $AB \in \mathcal{M}_{m \times p}(F)$ is defined by

$$(AB)_{ij} = \sum_{k=1}^n A_{ik} B_{kj}$$

Proposition 3.1. Matrix multiplication is associative but not commutative in general.
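The defining sum and the failure of commutativity can both be seen in a few lines of Python (the example matrices are arbitrary illustrations):

```python
def matmul(A, B):
    """(AB)_ij = sum_k A_ik * B_kj for A (m x n) and B (n x p)."""
    m, n, p = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]   # swaps columns on the right, rows on the left
print(matmul(A, B))    # [[2, 1], [4, 3]]
print(matmul(B, A))    # [[3, 4], [1, 2]]  -- AB != BA in general
```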

3.2 Transpose

The transpose of $A \in \mathcal{M}_{m \times n}(F)$, denoted $A^T$, is the $n \times m$ matrix with $(A^T)_{ij} = A_{ji}$.

Properties of transpose:

  1. $(A + B)^T = A^T + B^T$
  2. $(\alpha A)^T = \alpha A^T$
  3. $(AB)^T = B^T A^T$
  4. $(A^T)^T = A$
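Property 3 is the one most often misremembered (the order of the factors reverses). A quick numerical spot-check on arbitrary example matrices:

```python
def transpose(A):
    """(A^T)_ij = A_ji."""
    return [list(row) for row in zip(*A)]

def matmul(A, B):
    """Matrix product via row-by-column dot products."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

# Illustrative 2x3 and 3x2 matrices.
A = [[1, 2, 3], [4, 5, 6]]
B = [[7, 8], [9, 10], [11, 12]]

# Property 3: the transpose of a product reverses the order.
print(transpose(matmul(A, B)) == matmul(transpose(B), transpose(A)))  # True
```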

3.3 Inverse Matrices

A square matrix $A \in \mathcal{M}_{n \times n}(F)$ is invertible if there exists a matrix $A^{-1} \in \mathcal{M}_{n \times n}(F)$ such that

$$AA^{-1} = A^{-1}A = I_n$$

Theorem 3.1. The following are equivalent for $A \in \mathcal{M}_{n \times n}(F)$:

  1. $A$ is invertible.
  2. $\det(A) \neq 0$.
  3. The columns of $A$ are linearly independent.
  4. The rows of $A$ are linearly independent.
  5. $\mathrm{rank}(A) = n$.
  6. The equation $A\mathbf{x} = \mathbf{b}$ has a unique solution for every $\mathbf{b}$.
  7. The only solution to $A\mathbf{x} = \mathbf{0}$ is $\mathbf{x} = \mathbf{0}$.

3.4 Determinants

The determinant is a function $\det : \mathcal{M}_{n \times n}(F) \to F$ defined recursively by Laplace expansion along the first row:

$$\det(A) = \sum_{j=1}^n (-1)^{1+j} a_{1j} M_{1j}$$

where $M_{1j}$ is the $(1,j)$-minor (the determinant of the $(n-1) \times (n-1)$ matrix obtained by deleting row 1 and column $j$).
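The recursive definition translates directly into code. This is a faithful but deliberately naive sketch; its running time grows factorially, so it is for illustration only:

```python
def det(A):
    """Determinant by Laplace expansion along the first row.

    Exponential time -- fine for small matrices, but elimination-based
    methods (e.g. LU decomposition) should be used in practice.
    """
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        # Minor M_1j: delete row 0 and column j.
        minor = [row[:j] + row[j + 1:] for row in A[1:]]
        total += (-1) ** j * A[0][j] * det(minor)
    return total

print(det([[1, 2, 3], [0, 1, 4], [5, 6, 0]]))  # 1
```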

Properties of determinants:

  1. $\det(AB) = \det(A)\det(B)$
  2. $\det(A^T) = \det(A)$
  3. $\det(A^{-1}) = 1 / \det(A)$
  4. Swapping two rows changes the sign of the determinant.
  5. Multiplying a row by $\alpha$ multiplies the determinant by $\alpha$.
  6. Adding a multiple of one row to another leaves the determinant unchanged.

Worked Example. Compute $\det(A)$ where

$$A = \begin{pmatrix} 1 & 2 & 3 \\ 0 & 1 & 4 \\ 5 & 6 & 0 \end{pmatrix}$$

Solution. Expanding along the first column:

$$\det(A) = 1 \cdot \det\begin{pmatrix} 1 & 4 \\ 6 & 0 \end{pmatrix} - 0 + 5 \cdot \det\begin{pmatrix} 2 & 3 \\ 1 & 4 \end{pmatrix}$$

$$= 1 \cdot (0 - 24) + 5 \cdot (8 - 3) = -24 + 25 = 1$$

Common Pitfall

The determinant is only defined for square matrices. There is no meaningful determinant for an $m \times n$ matrix with $m \neq n$. Do not confuse $\det(AB) = \det(A)\det(B)$ with a non-existent formula for non-square matrices.

4. Systems of Linear Equations

4.1 Gaussian Elimination

A system of $m$ linear equations in $n$ unknowns can be written as $A\mathbf{x} = \mathbf{b}$, where $A \in \mathcal{M}_{m \times n}(\mathbb{R})$, $\mathbf{x} \in \mathbb{R}^n$, and $\mathbf{b} \in \mathbb{R}^m$.

Gaussian elimination transforms the augmented matrix $[A \mid \mathbf{b}]$ into row echelon form (REF) using elementary row operations:

  1. Swap two rows ($R_i \leftrightarrow R_j$).
  2. Multiply a row by a non-zero scalar ($R_i \to \alpha R_i$).
  3. Add a multiple of one row to another ($R_i \to R_i + \alpha R_j$).

A matrix is in REF if:

  • All zero rows are at the bottom.
  • The leading entry (pivot) of each non-zero row is strictly to the right of the pivot of the row above.
  • All entries below a pivot are zero.

Reduced row echelon form (RREF) additionally requires:

  • Each pivot is 1.
  • Each pivot is the only non-zero entry in its column.

Theorem 4.1. Every matrix has a unique RREF. Two matrices are row-equivalent if and only if they have the same RREF.
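The three row operations and the RREF conditions can be combined into a short routine. This is an illustrative sketch with simple partial pivoting, not a numerically hardened solver:

```python
def rref(M):
    """Reduce a matrix (list of rows of floats) to reduced row echelon
    form using only the three elementary row operations."""
    M = [row[:] for row in M]                 # work on a copy
    rows, cols = len(M), len(M[0])
    pr = 0                                    # current pivot row
    for col in range(cols):
        if pr == rows:
            break
        # Pick the largest entry in this column (partial pivoting).
        best = max(range(pr, rows), key=lambda r: abs(M[r][col]))
        if abs(M[best][col]) < 1e-12:
            continue                          # no pivot in this column
        M[pr], M[best] = M[best], M[pr]       # op 1: swap rows
        p = M[pr][col]
        M[pr] = [x / p for x in M[pr]]        # op 2: scale pivot to 1
        for r in range(rows):
            if r != pr:
                f = M[r][col]                 # op 3: clear the column
                M[r] = [a - f * b for a, b in zip(M[r], M[pr])]
        pr += 1
    return M

# Augmented matrix [A | b] for the system x + 2y = 5, 3x + 4y = 11.
R = rref([[1.0, 2.0, 5.0], [3.0, 4.0, 11.0]])
print(R)  # [[1.0, 0.0, 1.0], [0.0, 1.0, 2.0]]  ->  x = 1, y = 2
```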

4.2 Existence and Uniqueness

Theorem 4.2 (Rouché-Capelli). The system $A\mathbf{x} = \mathbf{b}$ is consistent (has at least one solution) if and only if

$$\mathrm{rank}(A) = \mathrm{rank}([A \mid \mathbf{b}])$$

If consistent, the solution set has $\dim(\mathrm{null}(A))$ free parameters, where $\dim(\mathrm{null}(A)) = n - \mathrm{rank}(A)$.

4.3 LU Decomposition

An LU decomposition of $A \in \mathcal{M}_{n \times n}(\mathbb{R})$ writes $A = LU$, where $L$ is lower triangular with 1s on the diagonal and $U$ is upper triangular.

Theorem 4.3. If Gaussian elimination can be performed on $A$ without row exchanges, then $A$ admits an LU decomposition.

Algorithm. Store the multipliers $m_{ij}$ (used to eliminate entry $a_{ij}$) in the lower triangular portion. The resulting upper triangular matrix is $U$, and the multipliers form $L$.

Worked Example. Find the LU decomposition of

$$A = \begin{pmatrix} 2 & 1 & 1 \\ 4 & 3 & 3 \\ 8 & 7 & 9 \end{pmatrix}$$

Solution.

Step 1: Eliminate below $a_{11}$. $m_{21} = 4/2 = 2$, $m_{31} = 8/2 = 4$.

$$\begin{pmatrix} 2 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 3 & 5 \end{pmatrix}$$

Step 2: Eliminate below $a_{22}$. $m_{32} = 3/1 = 3$.

$$U = \begin{pmatrix} 2 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 2 \end{pmatrix}$$

$$L = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 4 & 3 & 1 \end{pmatrix}$$

Verify: $LU = \begin{pmatrix} 2 & 1 & 1 \\ 4 & 3 & 3 \\ 8 & 7 & 9 \end{pmatrix} = A$. $\blacksquare$

To solve $A\mathbf{x} = \mathbf{b}$, first solve $L\mathbf{y} = \mathbf{b}$ (forward substitution), then $U\mathbf{x} = \mathbf{y}$ (back substitution).
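The algorithm and the two-stage solve can be sketched directly from the description above (no pivoting, so it assumes the leading pivots are non-zero, as in the worked example):

```python
def lu(A):
    """Doolittle LU decomposition without row exchanges."""
    n = len(A)
    U = [row[:] for row in A]
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(k + 1, n):
            m = U[i][k] / U[k][k]                # multiplier m_ik
            L[i][k] = m                          # store it in L
            U[i] = [a - m * b for a, b in zip(U[i], U[k])]
    return L, U

def solve_lu(L, U, b):
    """Solve Ax = b given A = LU: forward then back substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):                           # L y = b
        y[i] = b[i] - sum(L[i][j] * y[j] for j in range(i))
    x = [0.0] * n
    for i in reversed(range(n)):                 # U x = y
        x[i] = (y[i] - sum(U[i][j] * x[j] for j in range(i + 1, n))) / U[i][i]
    return x

A = [[2.0, 1.0, 1.0], [4.0, 3.0, 3.0], [8.0, 7.0, 9.0]]
L, U = lu(A)
print(L)  # [[1.0, 0.0, 0.0], [2.0, 1.0, 0.0], [4.0, 3.0, 1.0]]
print(U)  # [[2.0, 1.0, 1.0], [0.0, 1.0, 1.0], [0.0, 0.0, 2.0]]
print(solve_lu(L, U, [4.0, 10.0, 24.0]))  # [1.0, 1.0, 1.0]
```

The factors match $L$ and $U$ from the worked example, and the right-hand side $\mathbf{b} = (4, 10, 24)$ was chosen so that the solution is $\mathbf{x} = (1, 1, 1)$.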

5. Eigenvalues and Eigenvectors

5.1 Definitions

Let $A \in \mathcal{M}_{n \times n}(F)$. A scalar $\lambda \in F$ is an eigenvalue of $A$ if there exists a non-zero vector $\mathbf{v} \in F^n$ such that

$$A\mathbf{v} = \lambda \mathbf{v}$$

The vector $\mathbf{v}$ is called an eigenvector corresponding to $\lambda$.

5.2 Characteristic Polynomial

Theorem 5.1. $\lambda$ is an eigenvalue of $A$ if and only if $\det(A - \lambda I) = 0$.

The polynomial $p(\lambda) = \det(A - \lambda I)$ is called the characteristic polynomial of $A$. Its roots (in the algebraic closure of $F$) are the eigenvalues of $A$.

5.3 Diagonalisation

Definition. $A$ is diagonalisable if there exist an invertible matrix $P$ and a diagonal matrix $D$ such that

$$A = PDP^{-1}$$

Theorem 5.2. $A$ is diagonalisable (over $F$) if and only if $A$ has $n$ linearly independent eigenvectors (over $F$).

Corollary 5.3. If $A$ has $n$ distinct eigenvalues, then $A$ is diagonalisable.

5.4 Spectral Theorem for Real Symmetric Matrices

Theorem 5.4 (Spectral Theorem). If $A \in \mathcal{M}_{n \times n}(\mathbb{R})$ is symmetric ($A = A^T$), then:

  1. All eigenvalues of $A$ are real.
  2. $A$ has $n$ orthonormal eigenvectors.
  3. $A$ is orthogonally diagonalisable: $A = QDQ^T$ where $Q$ is orthogonal ($Q^TQ = I$).

5.5 Worked Example

Problem. Find the eigenvalues and eigenvectors of

$$A = \begin{pmatrix} 4 & 1 \\ 2 & 3 \end{pmatrix}$$

Solution. The characteristic polynomial is

$$\det(A - \lambda I) = \det\begin{pmatrix} 4 - \lambda & 1 \\ 2 & 3 - \lambda \end{pmatrix} = (4 - \lambda)(3 - \lambda) - 2$$

$$= \lambda^2 - 7\lambda + 10 = (\lambda - 5)(\lambda - 2)$$

So the eigenvalues are $\lambda_1 = 5$ and $\lambda_2 = 2$.

For $\lambda_1 = 5$: solve $(A - 5I)\mathbf{v} = \mathbf{0}$.

$$\begin{pmatrix} -1 & 1 \\ 2 & -2 \end{pmatrix}\mathbf{v} = \mathbf{0} \implies -v_1 + v_2 = 0 \implies \mathbf{v} = t\begin{pmatrix} 1 \\ 1 \end{pmatrix}$$

For $\lambda_2 = 2$: solve $(A - 2I)\mathbf{v} = \mathbf{0}$.

$$\begin{pmatrix} 2 & 1 \\ 2 & 1 \end{pmatrix}\mathbf{v} = \mathbf{0} \implies 2v_1 + v_2 = 0 \implies \mathbf{v} = t\begin{pmatrix} 1 \\ -2 \end{pmatrix}$$

Therefore $A = PDP^{-1}$ with

$$P = \begin{pmatrix} 1 & 1 \\ 1 & -2 \end{pmatrix}, \quad D = \begin{pmatrix} 5 & 0 \\ 0 & 2 \end{pmatrix}$$
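For a $2 \times 2$ matrix the characteristic polynomial is just a quadratic, $\lambda^2 - \mathrm{tr}(A)\lambda + \det(A)$, so the eigenvalues come from the quadratic formula. A sketch that reproduces the worked example (it assumes a real spectrum):

```python
import math

def eig2(A):
    """Eigenvalues of a 2x2 real matrix with real eigenvalues,
    from lambda^2 - tr(A) lambda + det(A) = 0."""
    (a, b), (c, d) = A
    tr, det = a + d, a * d - b * c
    disc = math.sqrt(tr * tr - 4 * det)   # assumes a non-negative discriminant
    return (tr + disc) / 2, (tr - disc) / 2

A = [(4, 1), (2, 3)]
l1, l2 = eig2(A)
print(l1, l2)  # 5.0 2.0

# Check A v = lambda v for the eigenvector (1, 1) found above.
v = (1, 1)
Av = (4 * v[0] + 1 * v[1], 2 * v[0] + 3 * v[1])
print(Av == (l1 * v[0], l1 * v[1]))  # True
```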

Common Pitfall

Not every matrix is diagonalisable. For example, $A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$ has eigenvalue $\lambda = 1$ with algebraic multiplicity 2 but geometric multiplicity 1. It has only one linearly independent eigenvector and is not diagonalisable.

6. Linear Transformations

6.1 Definition

A linear transformation (or linear map) $T : V \to W$ between vector spaces $V$ and $W$ over $F$ is a function satisfying:

  1. $T(\mathbf{u} + \mathbf{v}) = T(\mathbf{u}) + T(\mathbf{v})$ for all $\mathbf{u}, \mathbf{v} \in V$
  2. $T(\alpha \mathbf{v}) = \alpha T(\mathbf{v})$ for all $\alpha \in F$, $\mathbf{v} \in V$

The set of all linear transformations from $V$ to $W$ is denoted $\mathcal{L}(V, W)$.

6.2 Matrix Representation

If $V$ and $W$ are finite-dimensional with bases $\mathcal{B}_V = \{\mathbf{v}_1, \ldots, \mathbf{v}_n\}$ and $\mathcal{B}_W = \{\mathbf{w}_1, \ldots, \mathbf{w}_m\}$, then every $T \in \mathcal{L}(V, W)$ is uniquely represented by a matrix $[T]_{\mathcal{B}_V}^{\mathcal{B}_W} \in \mathcal{M}_{m \times n}(F)$ whose $j$-th column is the coordinate vector of $T(\mathbf{v}_j)$ with respect to $\mathcal{B}_W$.
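As a concrete illustration (our own example, not from the text): differentiation $D : \mathcal{P}_3 \to \mathcal{P}_2$ is linear, and in the monomial bases its matrix has $j$-th column equal to the coordinates of $D(x^j) = jx^{j-1}$:

```python
def diff_matrix(n):
    """Matrix of D : P_n -> P_(n-1) in the monomial bases
    {1, x, ..., x^n} and {1, x, ..., x^(n-1)}.
    Column j holds the coordinates of D(x^j) = j x^(j-1)."""
    return [[(j if j == i + 1 else 0) for j in range(n + 1)]
            for i in range(n)]

def apply(M, coords):
    """Apply a representing matrix to a coordinate vector."""
    return [sum(m * c for m, c in zip(row, coords)) for row in M]

D = diff_matrix(3)
print(D)  # [[0, 1, 0, 0], [0, 0, 2, 0], [0, 0, 0, 3]]

# p(x) = 1 + 2x + 3x^2 + 4x^3  ->  p'(x) = 2 + 6x + 12x^2
print(apply(D, [1, 2, 3, 4]))  # [2, 6, 12]
```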

6.3 Kernel and Image

The kernel (null space) and image (range) of $T$ are:

$$\ker(T) = \{\mathbf{v} \in V : T(\mathbf{v}) = \mathbf{0}\}, \qquad \mathrm{im}(T) = \{T(\mathbf{v}) : \mathbf{v} \in V\}$$

Theorem 6.1 (Rank-Nullity Theorem). For $T \in \mathcal{L}(V, W)$ with $V$ finite-dimensional:

$$\dim(\ker(T)) + \dim(\mathrm{im}(T)) = \dim(V)$$

6.4 Isomorphisms

A linear transformation $T : V \to W$ is an isomorphism if it is bijective. We write $V \cong W$.

Theorem 6.2. $T$ is an isomorphism if and only if $\ker(T) = \{\mathbf{0}\}$ and $\mathrm{im}(T) = W$.

Corollary 6.3. If $\dim(V) = \dim(W) < \infty$, then $T$ is injective if and only if $T$ is surjective.

6.5 Change of Basis

If $P$ is the change-of-basis matrix from basis $\mathcal{B}$ to basis $\mathcal{B}'$, then for a linear transformation $T$ with matrix representations $[T]_{\mathcal{B}}$ and $[T]_{\mathcal{B}'}$:

$$[T]_{\mathcal{B}'} = P^{-1}[T]_{\mathcal{B}} P$$

This is a similarity transformation. Similar matrices represent the same linear transformation in different bases and share the same eigenvalues, determinant, and trace.

7. Inner Product Spaces

7.1 Definition of an Inner Product

An inner product on a vector space $V$ over $F$ (where $F = \mathbb{R}$ or $\mathbb{C}$) is a function $\langle \cdot, \cdot \rangle : V \times V \to F$ satisfying:

  1. Conjugate symmetry: $\langle \mathbf{u}, \mathbf{v} \rangle = \overline{\langle \mathbf{v}, \mathbf{u} \rangle}$
  2. Linearity in the first argument: $\langle \alpha\mathbf{u} + \beta\mathbf{w}, \mathbf{v} \rangle = \alpha\langle \mathbf{u}, \mathbf{v} \rangle + \beta\langle \mathbf{w}, \mathbf{v} \rangle$
  3. Positive definiteness: $\langle \mathbf{v}, \mathbf{v} \rangle \geq 0$, with equality iff $\mathbf{v} = \mathbf{0}$

A vector space equipped with an inner product is called an inner product space.

7.2 Norms

Every inner product induces a norm:

$$\lVert \mathbf{v} \rVert = \sqrt{\langle \mathbf{v}, \mathbf{v} \rangle}$$

Theorem 7.1 (Cauchy-Schwarz Inequality).

$$|\langle \mathbf{u}, \mathbf{v} \rangle| \leq \lVert \mathbf{u} \rVert \, \lVert \mathbf{v} \rVert$$

with equality if and only if $\mathbf{u}$ and $\mathbf{v}$ are linearly dependent.

Theorem 7.2 (Triangle Inequality).

$$\lVert \mathbf{u} + \mathbf{v} \rVert \leq \lVert \mathbf{u} \rVert + \lVert \mathbf{v} \rVert$$

7.3 Orthogonality

Two vectors $\mathbf{u}, \mathbf{v}$ are orthogonal if $\langle \mathbf{u}, \mathbf{v} \rangle = 0$.

An orthonormal set $\{e_1, \ldots, e_k\}$ satisfies $\langle e_i, e_j \rangle = \delta_{ij}$.

Theorem 7.3 (Pythagorean Theorem). If $\mathbf{u} \perp \mathbf{v}$, then

$$\lVert \mathbf{u} + \mathbf{v} \rVert^2 = \lVert \mathbf{u} \rVert^2 + \lVert \mathbf{v} \rVert^2$$

7.4 Gram-Schmidt Process

The Gram-Schmidt process converts a linearly independent set $\{\mathbf{v}_1, \ldots, \mathbf{v}_n\}$ into an orthonormal set $\{e_1, \ldots, e_n\}$:

$$\mathbf{u}_1 = \mathbf{v}_1, \quad e_1 = \frac{\mathbf{u}_1}{\lVert \mathbf{u}_1 \rVert}$$

$$\mathbf{u}_k = \mathbf{v}_k - \sum_{i=1}^{k-1} \langle \mathbf{v}_k, e_i \rangle e_i, \quad e_k = \frac{\mathbf{u}_k}{\lVert \mathbf{u}_k \rVert}$$
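The formulas above translate into a short routine for vectors in $\mathbb{R}^n$ with the standard inner product. This sketch also guards against the degenerate case where some $\mathbf{u}_k$ vanishes (the tolerance `tol` is our own choice):

```python
import math

def gram_schmidt(vectors, tol=1e-12):
    """Orthonormalise a list of vectors (tuples of floats) in R^n.
    Raises ValueError if the input is linearly dependent, i.e. some
    u_k comes out as the zero vector."""
    basis = []
    for v in vectors:
        u = list(v)
        for e in basis:
            c = sum(a * b for a, b in zip(v, e))    # <v_k, e_i>
            u = [a - c * b for a, b in zip(u, e)]   # subtract projection
        norm = math.sqrt(sum(a * a for a in u))
        if norm < tol:
            raise ValueError("input vectors are linearly dependent")
        basis.append([a / norm for a in u])         # e_k = u_k / ||u_k||
    return basis

e1, e2 = gram_schmidt([(1, 1, 1), (1, 0, 1)])
print([round(x, 3) for x in e2])  # [0.408, -0.816, 0.408], i.e. (1,-2,1)/sqrt(6)
```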

7.5 Orthogonal Projection

The orthogonal projection of $\mathbf{v}$ onto a subspace $W$ with orthonormal basis $\{e_1, \ldots, e_k\}$ is

$$\mathrm{proj}_W(\mathbf{v}) = \sum_{i=1}^k \langle \mathbf{v}, e_i \rangle e_i$$

Theorem 7.4 (Best Approximation). Among all vectors in $W$, the orthogonal projection $\mathrm{proj}_W(\mathbf{v})$ minimises the distance to $\mathbf{v}$:

$$\lVert \mathbf{v} - \mathrm{proj}_W(\mathbf{v}) \rVert \leq \lVert \mathbf{v} - \mathbf{w} \rVert \quad \text{for all } \mathbf{w} \in W$$

7.6 Worked Example

Problem. Apply the Gram-Schmidt process to $\mathbf{v}_1 = (1, 1, 1)$, $\mathbf{v}_2 = (1, 0, 1)$, $\mathbf{v}_3 = (0, 1, 0)$ in $\mathbb{R}^3$ with the standard inner product.

Solution.

$$e_1 = \frac{(1,1,1)}{\lVert (1,1,1) \rVert} = \frac{1}{\sqrt{3}}(1, 1, 1)$$

$$\mathbf{u}_2 = (1,0,1) - \langle (1,0,1), e_1 \rangle e_1 = (1,0,1) - \frac{2}{\sqrt{3}} \cdot \frac{1}{\sqrt{3}}(1,1,1) = (1,0,1) - \frac{2}{3}(1,1,1) = \left(\tfrac{1}{3}, -\tfrac{2}{3}, \tfrac{1}{3}\right)$$

$$e_2 = \frac{\left(\tfrac{1}{3}, -\tfrac{2}{3}, \tfrac{1}{3}\right)}{\sqrt{6}/3} = \frac{1}{\sqrt{6}}(1, -2, 1)$$

For the third vector, $\langle (0,1,0), e_1 \rangle = \frac{1}{\sqrt{3}}$ and $\langle (0,1,0), e_2 \rangle = -\frac{2}{\sqrt{6}}$, so the second subtraction becomes an addition:

$$\mathbf{u}_3 = (0,1,0) - \langle (0,1,0), e_1 \rangle e_1 - \langle (0,1,0), e_2 \rangle e_2 = (0,1,0) - \frac{1}{3}(1,1,1) + \frac{1}{3}(1,-2,1) = (0, 0, 0)$$

Since $\mathbf{u}_3 = \mathbf{0}$, the vector $\mathbf{v}_3$ is a linear combination of $\mathbf{v}_1$ and $\mathbf{v}_2$; indeed, $\mathbf{v}_3 = \mathbf{v}_1 - \mathbf{v}_2$. The original set was linearly dependent, so Gram-Schmidt must be started from a linearly independent set. $\blacksquare$

Common Pitfall

The Gram-Schmidt process requires a linearly independent starting set. If the input vectors are linearly dependent, one of the $\mathbf{u}_k$ will be the zero vector, and the process fails (the normalisation step attempts to divide by zero).