1. Vectors and Vector Spaces
1.1 Definition of a Vector Space
A vector space over a field $F$ (typically $\mathbb{R}$ or $\mathbb{C}$) is a set $V$ equipped with two operations:

Vector addition: $+ : V \times V \to V$
Scalar multiplication: $\cdot : F \times V \to V$

satisfying the following axioms for all $\mathbf{u}, \mathbf{v}, \mathbf{w} \in V$ and all $\alpha, \beta \in F$:

Commutativity: $\mathbf{u} + \mathbf{v} = \mathbf{v} + \mathbf{u}$
Associativity of addition: $(\mathbf{u} + \mathbf{v}) + \mathbf{w} = \mathbf{u} + (\mathbf{v} + \mathbf{w})$
Additive identity: there exists $\mathbf{0} \in V$ such that $\mathbf{v} + \mathbf{0} = \mathbf{v}$
Additive inverse: for each $\mathbf{v}$ there exists $-\mathbf{v}$ such that $\mathbf{v} + (-\mathbf{v}) = \mathbf{0}$
Compatibility of scalar multiplication: $\alpha(\beta \mathbf{v}) = (\alpha\beta)\mathbf{v}$
Identity element of scalar multiplication: $1 \cdot \mathbf{v} = \mathbf{v}$
Distributivity over vector addition: $\alpha(\mathbf{u} + \mathbf{v}) = \alpha\mathbf{u} + \alpha\mathbf{v}$
Distributivity over scalar addition: $(\alpha + \beta)\mathbf{v} = \alpha\mathbf{v} + \beta\mathbf{v}$
1.2 Examples
Example 1. $\mathbb{R}^n$ with component-wise addition and scalar multiplication is a vector space over $\mathbb{R}$.
Example 2. The set $\mathcal{P}_n$ of all polynomials of degree at most $n$ with real coefficients, with the usual polynomial addition and scalar multiplication, is a vector space over $\mathbb{R}$.
Example 3. The set $C[a,b]$ of all continuous real-valued functions on $[a,b]$, with point-wise addition and scalar multiplication, is a vector space over $\mathbb{R}$.
Example 4. The set $\mathcal{M}_{m \times n}(\mathbb{R})$ of all $m \times n$ real matrices is a vector space over $\mathbb{R}$.
1.3 Subspaces
A subspace $W$ of a vector space $V$ is a subset $W \subseteq V$ that is itself a vector space under the same operations.
Theorem 1.1 (Subspace Criterion). A non-empty subset $W \subseteq V$ is a subspace if and only if for all $\mathbf{u}, \mathbf{v} \in W$ and all $\alpha \in F$:
$\mathbf{u} + \mathbf{v} \in W$ (closed under addition)
$\alpha \mathbf{u} \in W$ (closed under scalar multiplication)
Proof. If $W$ is a subspace, closure is immediate from the definition. Conversely, if $W$ is non-empty and closed under both operations, pick $\mathbf{u} \in W$. Then $-\mathbf{u} = (-1)\mathbf{u} \in W$ by closure under scalar multiplication, and $\mathbf{u} + (-\mathbf{u}) = \mathbf{0} \in W$ by closure under addition. The remaining axioms are inherited from $V$. $\blacksquare$
Example 5. The set of all solutions to the homogeneous equation $A\mathbf{x} = \mathbf{0}$ forms a subspace of $\mathbb{R}^n$, called the null space of $A$.
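A null space can be computed numerically; the sketch below assumes NumPy and SciPy are available, and the singular matrix chosen here is an arbitrary illustration, not one from the text:

```python
import numpy as np
from scipy.linalg import null_space

# A singular 3x3 matrix: its null space is non-trivial.
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0]])

N = null_space(A)             # columns form an orthonormal basis of null(A)
print(N.shape[1])             # 1: the null space is one-dimensional
print(np.allclose(A @ N, 0))  # True: every basis vector satisfies Ax = 0
```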
2. Linear Independence, Span, Basis, and Dimension
2.1 Linear Independence
A set of vectors $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\} \subseteq V$ is linearly independent if the equation
$$\alpha_1 \mathbf{v}_1 + \alpha_2 \mathbf{v}_2 + \cdots + \alpha_k \mathbf{v}_k = \mathbf{0}$$
implies $\alpha_1 = \alpha_2 = \cdots = \alpha_k = 0$. Otherwise the set is linearly dependent.
2.2 Span
The span of a set $S \subseteq V$, denoted $\mathrm{span}(S)$, is the set of all finite linear combinations of elements of $S$:
$$\mathrm{span}(S) = \left\{ \sum_{i=1}^k \alpha_i \mathbf{v}_i : k \in \mathbb{N},\ \alpha_i \in F,\ \mathbf{v}_i \in S \right\}$$
Proposition 2.1. $\mathrm{span}(S)$ is always a subspace of $V$.
2.3 Basis and Dimension
A set $B \subseteq V$ is a basis for $V$ if:
$B$ is linearly independent, and
$\mathrm{span}(B) = V$.
Theorem 2.1. Every vector space has a basis. All bases of a finite-dimensional vector space have the same number of elements.
The dimension of $V$, denoted $\dim(V)$, is the cardinality of any basis for $V$.
Theorem 2.2 (Dimension Formula). If $U$ and $W$ are subspaces of a finite-dimensional vector space $V$, then
$$\dim(U + W) = \dim(U) + \dim(W) - \dim(U \cap W)$$
2.4 Worked Example
Problem. Determine whether the vectors $\mathbf{v}_1 = (1, 2, 3)$, $\mathbf{v}_2 = (4, 5, 6)$, $\mathbf{v}_3 = (7, 8, 9)$ form a basis for $\mathbb{R}^3$.
Solution. Form the matrix $A = [\mathbf{v}_1 \mid \mathbf{v}_2 \mid \mathbf{v}_3]$ and compute its determinant (expanding along the first column):
$$\det(A) = 1(45 - 48) - 2(36 - 42) + 3(32 - 35) = -3 + 12 - 9 = 0$$
Since $\det(A) = 0$, the columns are linearly dependent, so $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is not a basis. In fact, $\mathbf{v}_3 - 2\mathbf{v}_2 + \mathbf{v}_1 = \mathbf{0}$.
To check whether $n$ vectors in $\mathbb{R}^n$ form a basis, compute the determinant of the matrix whose columns are those vectors. If $\det \neq 0$, they form a basis; if $\det = 0$, they do not.
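This determinant test runs directly in NumPy; a minimal sketch (the modified matrix `B` is an illustrative variant, not from the text):

```python
import numpy as np

# Columns are the candidate basis vectors v1, v2, v3.
A = np.array([[1.0, 4.0, 7.0],
              [2.0, 5.0, 8.0],
              [3.0, 6.0, 9.0]])

d = np.linalg.det(A)
print(np.isclose(d, 0.0))   # True: the vectors do not form a basis

# Replacing the last entry of v3 with 10 makes the columns independent.
B = A.copy()
B[2, 2] = 10.0
print(np.isclose(np.linalg.det(B), 0.0))  # False: these do form a basis
```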
3. Matrices
3.1 Matrix Operations
An $m \times n$ matrix $A$ over $F$ is a rectangular array of $mn$ elements from $F$, arranged in $m$ rows and $n$ columns. The set of all such matrices is denoted $\mathcal{M}_{m \times n}(F)$.
Addition. For $A, B \in \mathcal{M}_{m \times n}(F)$, $(A + B)_{ij} = A_{ij} + B_{ij}$.
Scalar multiplication. For $\alpha \in F$ and $A \in \mathcal{M}_{m \times n}(F)$, $(\alpha A)_{ij} = \alpha A_{ij}$.
Matrix multiplication. For $A \in \mathcal{M}_{m \times n}(F)$ and $B \in \mathcal{M}_{n \times p}(F)$, the product $AB \in \mathcal{M}_{m \times p}(F)$ is defined by
$$(AB)_{ij} = \sum_{k=1}^n A_{ik} B_{kj}$$
Proposition 3.1. Matrix multiplication is associative but not commutative in general.
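Proposition 3.1 is easy to check numerically; a small NumPy sketch with arbitrarily chosen example matrices:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])
C = np.array([[2, 0],
              [1, 1]])

# Associativity holds...
print(np.array_equal((A @ B) @ C, A @ (B @ C)))  # True

# ...but commutativity generally fails.
print(np.array_equal(A @ B, B @ A))              # False
```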
3.2 Transpose
The transpose of $A \in \mathcal{M}_{m \times n}(F)$, denoted $A^T$, is the $n \times m$ matrix with $(A^T)_{ij} = A_{ji}$.
Properties of transpose:
$(A + B)^T = A^T + B^T$
$(\alpha A)^T = \alpha A^T$
$(AB)^T = B^T A^T$
$(A^T)^T = A$
3.3 Inverse Matrices
A square matrix $A \in \mathcal{M}_{n \times n}(F)$ is invertible if there exists a matrix $A^{-1} \in \mathcal{M}_{n \times n}(F)$ such that
$$AA^{-1} = A^{-1}A = I_n$$
Theorem 3.1. The following are equivalent for $A \in \mathcal{M}_{n \times n}(F)$:
$A$ is invertible.
$\det(A) \neq 0$.
The columns of $A$ are linearly independent.
The rows of $A$ are linearly independent.
$\mathrm{rank}(A) = n$.
The equation $A\mathbf{x} = \mathbf{b}$ has a unique solution for every $\mathbf{b}$.
The only solution to $A\mathbf{x} = \mathbf{0}$ is $\mathbf{x} = \mathbf{0}$.
3.4 Determinants
The determinant is a function $\det : \mathcal{M}_{n \times n}(F) \to F$ defined recursively by Laplace expansion along the first row:
$$\det(A) = \sum_{j=1}^n (-1)^{1+j} a_{1j} M_{1j}$$
where $M_{1j}$ is the $(1,j)$-minor (the determinant of the $(n-1) \times (n-1)$ matrix obtained by deleting row 1 and column $j$).
Properties of determinants:
$\det(AB) = \det(A)\det(B)$
$\det(A^T) = \det(A)$
$\det(A^{-1}) = 1/\det(A)$
Swapping two rows changes the sign of the determinant.
Multiplying a row by $\alpha$ multiplies the determinant by $\alpha$.
Adding a multiple of one row to another leaves the determinant unchanged.
Worked Example. Compute $\det(A)$ where
$$A = \begin{pmatrix} 1 & 2 & 3 \\ 0 & 1 & 4 \\ 5 & 6 & 0 \end{pmatrix}$$
Solution. Expanding along the first column:
$$\det(A) = 1 \cdot \det\begin{pmatrix} 1 & 4 \\ 6 & 0 \end{pmatrix} - 0 + 5 \cdot \det\begin{pmatrix} 2 & 3 \\ 1 & 4 \end{pmatrix} = 1 \cdot (0 - 24) + 5 \cdot (8 - 3) = -24 + 25 = 1$$
The determinant is defined only for square matrices; there is no meaningful determinant for an $m \times n$ matrix with $m \neq n$, and identities such as $\det(AB) = \det(A)\det(B)$ apply only to square $A$ and $B$.
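A quick NumPy check of the worked example and two of the listed properties (the second matrix `B` is an arbitrary choice for illustration):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [0.0, 1.0, 4.0],
              [5.0, 6.0, 0.0]])

print(round(np.linalg.det(A)))  # 1, matching the hand computation

# Spot-check det(AB) = det(A)det(B) and det(A^T) = det(A).
B = np.array([[2.0, 0.0, 1.0],
              [1.0, 1.0, 0.0],
              [0.0, 3.0, 1.0]])
print(np.isclose(np.linalg.det(A @ B),
                 np.linalg.det(A) * np.linalg.det(B)))   # True
print(np.isclose(np.linalg.det(A.T), np.linalg.det(A)))  # True
```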
4. Systems of Linear Equations
4.1 Gaussian Elimination
A system of $m$ linear equations in $n$ unknowns can be written as $A\mathbf{x} = \mathbf{b}$, where $A \in \mathcal{M}_{m \times n}(\mathbb{R})$, $\mathbf{x} \in \mathbb{R}^n$, and $\mathbf{b} \in \mathbb{R}^m$.
Gaussian elimination transforms the augmented matrix $[A \mid \mathbf{b}]$ into row echelon form (REF) using elementary row operations:
Swap two rows ($R_i \leftrightarrow R_j$).
Multiply a row by a non-zero scalar ($R_i \to \alpha R_i$).
Add a multiple of one row to another ($R_i \to R_i + \alpha R_j$).
A matrix is in REF if:
All zero rows are at the bottom.
The leading entry (pivot) of each non-zero row is strictly to the right of the pivot in the row above.
All entries below a pivot are zero.
Reduced row echelon form (RREF) additionally requires:
Each pivot is 1.
Each pivot is the only non-zero entry in its column.
Theorem 4.1. Every matrix has a unique RREF. Two matrices are row-equivalent if and only if they have the same RREF.
4.2 Existence and Uniqueness
Theorem 4.2 (Rouché-Capelli). The system $A\mathbf{x} = \mathbf{b}$ is consistent (has at least one solution) if and only if
$$\mathrm{rank}(A) = \mathrm{rank}([A \mid \mathbf{b}])$$
If consistent, the solution set has $\dim(\mathrm{null}(A))$ free parameters, where $\dim(\mathrm{null}(A)) = n - \mathrm{rank}(A)$.
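The Rouché-Capelli criterion translates directly into a rank comparison; a minimal NumPy sketch (the matrix and right-hand sides are illustrative choices):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])               # rank 1
b_good = np.array([[1.0], [2.0], [3.0]])  # lies in the column space of A
b_bad  = np.array([[1.0], [2.0], [4.0]])  # does not

def consistent(A, b):
    # Rouché-Capelli: compare rank(A) with rank of the augmented matrix.
    return np.linalg.matrix_rank(A) == np.linalg.matrix_rank(np.hstack([A, b]))

print(consistent(A, b_good))  # True
print(consistent(A, b_bad))   # False
```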
4.3 LU Decomposition
An LU decomposition of $A \in \mathcal{M}_{n \times n}(\mathbb{R})$ writes $A = LU$, where $L$ is lower triangular with 1s on the diagonal and $U$ is upper triangular.
Theorem 4.3. If Gaussian elimination can be performed on $A$ without row exchanges, then $A$ admits an LU decomposition.
Algorithm. Store the multipliers $m_{ij}$ (used to eliminate entry $a_{ij}$) in the lower triangular portion. The resulting upper triangular matrix is $U$, and the multipliers form $L$.
Worked Example. Find the LU decomposition of
$$A = \begin{pmatrix} 2 & 1 & 1 \\ 4 & 3 & 3 \\ 8 & 7 & 9 \end{pmatrix}$$
Solution.
Step 1: Eliminate below $a_{11}$. $m_{21} = 4/2 = 2$, $m_{31} = 8/2 = 4$.
$$\begin{pmatrix} 2 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 3 & 5 \end{pmatrix}$$
Step 2: Eliminate below $a_{22}$. $m_{32} = 3/1 = 3$.
$$U = \begin{pmatrix} 2 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 2 \end{pmatrix}, \quad L = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 4 & 3 & 1 \end{pmatrix}$$
Verify: $LU = \begin{pmatrix} 2 & 1 & 1 \\ 4 & 3 & 3 \\ 8 & 7 & 9 \end{pmatrix} = A$. $\blacksquare$
To solve $A\mathbf{x} = \mathbf{b}$, first solve $L\mathbf{y} = \mathbf{b}$ (forward substitution), then $U\mathbf{x} = \mathbf{y}$ (back substitution).
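The multiplier-storing algorithm can be sketched in a few lines of NumPy. This is a no-pivoting Doolittle variant that assumes no zero pivots arise (as Theorem 4.3 requires), not a production implementation:

```python
import numpy as np

def lu_no_pivot(A):
    """Doolittle LU factorisation without row exchanges (assumes non-zero pivots)."""
    n = A.shape[0]
    L = np.eye(n)
    U = A.astype(float).copy()
    for j in range(n - 1):
        for i in range(j + 1, n):
            L[i, j] = U[i, j] / U[j, j]   # multiplier m_ij, stored in L
            U[i, :] -= L[i, j] * U[j, :]  # eliminate the entry below the pivot
    return L, U

A = np.array([[2.0, 1.0, 1.0],
              [4.0, 3.0, 3.0],
              [8.0, 7.0, 9.0]])
L, U = lu_no_pivot(A)
print(L)                      # matches the worked example
print(U)
print(np.allclose(L @ U, A))  # True
```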
5. Eigenvalues and Eigenvectors
5.1 Definitions
Let $A \in \mathcal{M}_{n \times n}(F)$. A scalar $\lambda \in F$ is an eigenvalue of $A$ if there exists a non-zero vector $\mathbf{v} \in F^n$ such that
$$A\mathbf{v} = \lambda \mathbf{v}$$
The vector $\mathbf{v}$ is called an eigenvector corresponding to $\lambda$.
5.2 Characteristic Polynomial
Theorem 5.1. $\lambda$ is an eigenvalue of $A$ if and only if $\det(A - \lambda I) = 0$.
The polynomial $p(\lambda) = \det(A - \lambda I)$ is called the characteristic polynomial of $A$. Its roots (in the algebraic closure of $F$) are the eigenvalues of $A$.
5.3 Diagonalisation
Definition. $A$ is diagonalisable if there exists an invertible matrix $P$ and a diagonal matrix $D$ such that
$$A = PDP^{-1}$$
Theorem 5.2. $A$ is diagonalisable (over $F$) if and only if $A$ has $n$ linearly independent eigenvectors (over $F$).
Corollary 5.3. If $A$ has $n$ distinct eigenvalues, then $A$ is diagonalisable.
5.4 Spectral Theorem for Real Symmetric Matrices
Theorem 5.4 (Spectral Theorem). If $A \in \mathcal{M}_{n \times n}(\mathbb{R})$ is symmetric ($A = A^T$), then:
All eigenvalues of $A$ are real.
$A$ has $n$ orthonormal eigenvectors (orthonormality already implies linear independence).
$A$ is orthogonally diagonalisable: $A = QDQ^T$, where $Q$ is orthogonal ($Q^TQ = I$).
5.5 Worked Example
Problem. Find the eigenvalues and eigenvectors of
$$A = \begin{pmatrix} 4 & 1 \\ 2 & 3 \end{pmatrix}$$
Solution. The characteristic polynomial is
$$\det(A - \lambda I) = \det\begin{pmatrix} 4 - \lambda & 1 \\ 2 & 3 - \lambda \end{pmatrix} = (4 - \lambda)(3 - \lambda) - 2 = \lambda^2 - 7\lambda + 10 = (\lambda - 5)(\lambda - 2)$$
So the eigenvalues are $\lambda_1 = 5$ and $\lambda_2 = 2$.
For $\lambda_1 = 5$: solve $(A - 5I)\mathbf{v} = \mathbf{0}$.
$$\begin{pmatrix} -1 & 1 \\ 2 & -2 \end{pmatrix}\mathbf{v} = \mathbf{0} \implies -v_1 + v_2 = 0 \implies \mathbf{v} = t\begin{pmatrix} 1 \\ 1 \end{pmatrix}$$
For $\lambda_2 = 2$: solve $(A - 2I)\mathbf{v} = \mathbf{0}$.
$$\begin{pmatrix} 2 & 1 \\ 2 & 1 \end{pmatrix}\mathbf{v} = \mathbf{0} \implies 2v_1 + v_2 = 0 \implies \mathbf{v} = t\begin{pmatrix} 1 \\ -2 \end{pmatrix}$$
Therefore $A = PDP^{-1}$ with
$$P = \begin{pmatrix} 1 & 1 \\ 1 & -2 \end{pmatrix}, \quad D = \begin{pmatrix} 5 & 0 \\ 0 & 2 \end{pmatrix}$$
Not every matrix is diagonalisable. For example, $A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$ has eigenvalue $\lambda = 1$ with algebraic multiplicity 2 but geometric multiplicity 1. It has only one linearly independent eigenvector and is not diagonalisable.
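The worked example can be verified with NumPy's eigensolver; a short sketch:

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

eigvals, eigvecs = np.linalg.eig(A)
print(np.allclose(np.sort(eigvals), [2.0, 5.0]))  # True: eigenvalues 2 and 5

# P D P^{-1} reconstructs A (columns of eigvecs are the eigenvectors).
P, D = eigvecs, np.diag(eigvals)
print(np.allclose(P @ D @ np.linalg.inv(P), A))   # True
```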
6. Linear Transformations
6.1 Definition
A linear transformation (or linear map) $T : V \to W$ between vector spaces $V$ and $W$ over $F$ is a function satisfying:
$T(\mathbf{u} + \mathbf{v}) = T(\mathbf{u}) + T(\mathbf{v})$ for all $\mathbf{u}, \mathbf{v} \in V$
$T(\alpha \mathbf{v}) = \alpha T(\mathbf{v})$ for all $\alpha \in F$, $\mathbf{v} \in V$
The set of all linear transformations from $V$ to $W$ is denoted $\mathcal{L}(V, W)$.
6.2 Matrix Representation
If $V$ and $W$ are finite-dimensional with bases $\mathcal{B}_V = \{\mathbf{v}_1, \ldots, \mathbf{v}_n\}$ and $\mathcal{B}_W = \{\mathbf{w}_1, \ldots, \mathbf{w}_m\}$, then every $T \in \mathcal{L}(V, W)$ is uniquely represented by a matrix $[T]_{\mathcal{B}_V}^{\mathcal{B}_W} \in \mathcal{M}_{m \times n}(F)$ whose $j$-th column is the coordinate vector of $T(\mathbf{v}_j)$ with respect to $\mathcal{B}_W$.
6.3 Kernel and Image
The kernel (null space) and image (range) of $T$ are:
$$\ker(T) = \{\mathbf{v} \in V : T(\mathbf{v}) = \mathbf{0}\}, \quad \mathrm{im}(T) = \{T(\mathbf{v}) : \mathbf{v} \in V\}$$
Theorem 6.1 (Rank-Nullity Theorem). For $T \in \mathcal{L}(V, W)$ with $V$ finite-dimensional:
$$\dim(\ker(T)) + \dim(\mathrm{im}(T)) = \dim(V)$$
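Rank-nullity is easy to verify numerically; a sketch assuming NumPy and SciPy are available (the $3 \times 4$ matrix below is an illustrative example, so here $n = 4$):

```python
import numpy as np
from scipy.linalg import null_space

# T : R^4 -> R^3 represented by a 3x4 matrix.
A = np.array([[1.0, 2.0, 0.0, 1.0],
              [0.0, 1.0, 1.0, 0.0],
              [1.0, 3.0, 1.0, 1.0]])  # row 3 = row 1 + row 2, so rank < 3

rank = np.linalg.matrix_rank(A)      # dim(im T)
nullity = null_space(A).shape[1]     # dim(ker T)
print(rank, nullity, A.shape[1])     # 2 2 4
print(rank + nullity == A.shape[1])  # True: rank + nullity = dim(V)
```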
6.4 Isomorphisms
A linear transformation $T : V \to W$ is an isomorphism if it is bijective. If such a $T$ exists, we write $V \cong W$.
Theorem 6.2. $T$ is an isomorphism if and only if $\ker(T) = \{\mathbf{0}\}$ and $\mathrm{im}(T) = W$.
Corollary 6.3. If $\dim(V) = \dim(W) < \infty$, then $T$ is injective if and only if $T$ is surjective.
6.5 Change of Basis
If $P$ is the change-of-basis matrix from basis $\mathcal{B}$ to basis $\mathcal{B}'$, then for a linear transformation $T$ with matrix representations $[T]_{\mathcal{B}}$ and $[T]_{\mathcal{B}'}$:
$$[T]_{\mathcal{B}'} = P^{-1}[T]_{\mathcal{B}} P$$
This is the similarity transformation. Similar matrices represent the same linear transformation in different bases and share the same eigenvalues, determinant, and trace.
7. Inner Product Spaces
7.1 Definition of an Inner Product
An inner product on a vector space $V$ over $F$ (where $F = \mathbb{R}$ or $\mathbb{C}$) is a function $\langle \cdot, \cdot \rangle : V \times V \to F$ satisfying:
Conjugate symmetry: $\langle \mathbf{u}, \mathbf{v} \rangle = \overline{\langle \mathbf{v}, \mathbf{u} \rangle}$
Linearity in the first argument: $\langle \alpha\mathbf{u} + \beta\mathbf{w}, \mathbf{v} \rangle = \alpha\langle \mathbf{u}, \mathbf{v} \rangle + \beta\langle \mathbf{w}, \mathbf{v} \rangle$
Positive definiteness: $\langle \mathbf{v}, \mathbf{v} \rangle \geq 0$, with equality if and only if $\mathbf{v} = \mathbf{0}$
A vector space equipped with an inner product is called an inner product space.
7.2 Norms
Every inner product induces a norm:
$$\lVert \mathbf{v} \rVert = \sqrt{\langle \mathbf{v}, \mathbf{v} \rangle}$$
Theorem 7.1 (Cauchy-Schwarz Inequality).
$$|\langle \mathbf{u}, \mathbf{v} \rangle| \leq \lVert \mathbf{u} \rVert \, \lVert \mathbf{v} \rVert$$
with equality if and only if $\mathbf{u}$ and $\mathbf{v}$ are linearly dependent.
Theorem 7.2 (Triangle Inequality).
$$\lVert \mathbf{u} + \mathbf{v} \rVert \leq \lVert \mathbf{u} \rVert + \lVert \mathbf{v} \rVert$$
7.3 Orthogonality
Two vectors $\mathbf{u}, \mathbf{v}$ are orthogonal if $\langle \mathbf{u}, \mathbf{v} \rangle = 0$. An orthonormal set $\{e_1, \ldots, e_k\}$ satisfies $\langle e_i, e_j \rangle = \delta_{ij}$.
Theorem 7.3 (Pythagorean Theorem). If $\mathbf{u} \perp \mathbf{v}$, then
$$\lVert \mathbf{u} + \mathbf{v} \rVert^2 = \lVert \mathbf{u} \rVert^2 + \lVert \mathbf{v} \rVert^2$$
7.4 Gram-Schmidt Process
The Gram-Schmidt process converts a linearly independent set $\{\mathbf{v}_1, \ldots, \mathbf{v}_n\}$ into an orthonormal set $\{e_1, \ldots, e_n\}$:
$$\mathbf{u}_1 = \mathbf{v}_1, \quad e_1 = \frac{\mathbf{u}_1}{\lVert \mathbf{u}_1 \rVert}$$
$$\mathbf{u}_k = \mathbf{v}_k - \sum_{i=1}^{k-1} \langle \mathbf{v}_k, e_i \rangle e_i, \quad e_k = \frac{\mathbf{u}_k}{\lVert \mathbf{u}_k \rVert}$$
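The two formulas translate directly into code; a minimal NumPy sketch of classical Gram-Schmidt (which can be numerically unstable for nearly dependent inputs, a limitation worth noting):

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalise a linearly independent list of vectors (classical G-S)."""
    basis = []
    for v in vectors:
        u = np.array(v, dtype=float)
        for e in basis:
            u -= np.dot(v, e) * e      # subtract the projection <v_k, e_i> e_i
        norm = np.linalg.norm(u)
        if np.isclose(norm, 0.0):
            raise ValueError("input vectors are linearly dependent")
        basis.append(u / norm)         # normalise u_k to get e_k
    return np.array(basis)

E = gram_schmidt([(1, 1, 0), (1, 0, 1)])
print(np.allclose(E @ E.T, np.eye(2)))  # True: the rows are orthonormal
```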
7.5 Orthogonal Projection
The orthogonal projection of $\mathbf{v}$ onto a subspace $W$ with orthonormal basis $\{e_1, \ldots, e_k\}$ is
$$\mathrm{proj}_W(\mathbf{v}) = \sum_{i=1}^k \langle \mathbf{v}, e_i \rangle e_i$$
Theorem 7.4 (Best Approximation). Among all vectors in $W$, the orthogonal projection $\mathrm{proj}_W(\mathbf{v})$ minimises the distance to $\mathbf{v}$:
$$\lVert \mathbf{v} - \mathrm{proj}_W(\mathbf{v}) \rVert \leq \lVert \mathbf{v} - \mathbf{w} \rVert \quad \text{for all } \mathbf{w} \in W$$
7.6 Worked Example
Problem. Apply the Gram-Schmidt process to $\mathbf{v}_1 = (1, 1, 1)$, $\mathbf{v}_2 = (1, 0, 1)$, $\mathbf{v}_3 = (0, 1, 0)$ in $\mathbb{R}^3$ with the standard inner product.
Solution.
$$e_1 = \frac{(1,1,1)}{\sqrt{3}} = \frac{1}{\sqrt{3}}(1, 1, 1)$$
$$\mathbf{u}_2 = (1,0,1) - \langle (1,0,1), e_1 \rangle e_1 = (1,0,1) - \frac{2}{\sqrt{3}} \cdot \frac{1}{\sqrt{3}}(1,1,1) = (1,0,1) - \frac{2}{3}(1,1,1) = \left(\tfrac{1}{3}, -\tfrac{2}{3}, \tfrac{1}{3}\right)$$
$$e_2 = \frac{(\frac{1}{3}, -\frac{2}{3}, \frac{1}{3})}{\sqrt{6}/3} = \frac{1}{\sqrt{6}}(1, -2, 1)$$
$$\mathbf{u}_3 = (0,1,0) - \langle (0,1,0), e_1 \rangle e_1 - \langle (0,1,0), e_2 \rangle e_2 = (0,1,0) - \frac{1}{3}(1,1,1) + \frac{2}{6}(1,-2,1) = (0,1,0) - \left(\tfrac{1}{3}, \tfrac{1}{3}, \tfrac{1}{3}\right) + \left(\tfrac{1}{3}, -\tfrac{2}{3}, \tfrac{1}{3}\right) = (0, 0, 0)$$
Since $\mathbf{u}_3 = \mathbf{0}$, the vector $\mathbf{v}_3$ is a linear combination of $\mathbf{v}_1$ and $\mathbf{v}_2$; indeed, $\mathbf{v}_3 = \mathbf{v}_1 - \mathbf{v}_2$. The original set was linearly dependent, so the process cannot continue. We should have started with a linearly independent set. $\blacksquare$
The Gram-Schmidt process requires a linearly independent starting set. If the input vectors are
linearly dependent, one of the u k \mathbf{u}_k u k will be the zero vector, and the process will fail
(attempting to divide by zero in the normalisation step).
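Such a dependent input set can also be detected numerically before orthonormalising. A NumPy sketch using the matrix rank and the QR factorisation (QR being the numerically stable relative of Gram-Schmidt; a zero on the diagonal of $R$ flags a dependent column):

```python
import numpy as np

# Columns are v1 = (1,1,1), v2 = (1,0,1), v3 = (0,1,0) from the worked example.
V = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0]])

print(np.linalg.matrix_rank(V))  # 2: only two of the three columns are independent

Q, R = np.linalg.qr(V)
print(np.isclose(R[2, 2], 0.0))  # True: the third column adds nothing new
```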