Theorem 8.1 (Singular Value Decomposition). Every matrix A∈Mm×n(R) can be factored as
A=UΣVT
Where U∈Mm×m(R) is orthogonal, V∈Mn×n(R) is orthogonal, and Σ∈Mm×n(R) is diagonal with non-negative entries σ1≥σ2≥⋯≥σr≥0 (where r=rank(A)).
The σi are called the singular values of A. The columns of U are the left singular vectors, and the columns of V are the right singular vectors.
Proof. The matrix ATA is an n×n symmetric positive semi-definite matrix, so by the spectral theorem it has n real non-negative eigenvalues and an orthonormal eigenbasis. Let σ12,σ22,…,σn2 be these eigenvalues (some may be zero) with corresponding orthonormal eigenvectors v1,…,vn forming the columns of V.
For each i with σi>0Define ui=Avi/σi. We verify that these form an orthonormal set:
Proposition 8.2. The singular values of A are the square roots of the eigenvalues of ATA (equivalently, of AAT). The non-zero singular values of A and AT are identical.
Proof. From the construction above, ATAvi=σi2vi. For AAT:
AATui=σiA(ATA)vi=σiσi2Avi=σi2ui
So ui is an eigenvector of AAT with eigenvalue σi2. The non-zero eigenvalues of ATA and AAT coincide (since if ATAv=λv with λ=0Then AAT(Av)=A(ATAv)=λ(Av) and Av=0). ■
Proposition 8.3. If A is symmetric with eigenvalues λ1,…,λnThen the singular values of A are ∣λ1∣,…,∣λn∣.
Proof.ATA=A2Whose eigenvalues are λi2. The singular values are λi2=∣λi∣. ■
8.3 Geometric Interpretation
The SVD decomposes the linear transformation T:Rn→Rm into three steps:
VT rotates (or reflects) Rn into a coordinate system aligned with the right singular vectors.
Σ scales each axis by the corresponding singular value (and possibly drops dimensions where σi=0).
U rotates (or reflects) the result into the coordinate system of the left singular vectors.
Unit circle image. Under AThe unit circle in R2 is mapped to an ellipse with semi-axes σ1 and σ2 aligned with the columns of U.
8.4 Low-Rank Approximation
Theorem 8.4 (Eckart—Young—Mirsky). Let A=UΣVT be the SVD with singular values σ1≥σ2≥⋯≥σr>0. For any k<rThe best rank-k approximation to A (in both the Frobenius and spectral norms) is
Ak=∑i=1kσiuiviT=UkΣkVkT
And the approximation error is
∥A−Ak∥F=σk+12+⋯+σr2,∥A−Ak∥2=σk+1
Proof (Frobenius norm). Any rank-k matrix B can be written in terms of an orthonormal basis of its column space. Let W∈Mn×k(R) have orthonormal columns spanning the column space of B. Then B=CWT for some CAnd:
∥A−B∥F2=∥A(I−WWT)∥F2+∥(A−C)WT∥F2≥∥A(I−WWT)∥F2
The minimum over W is achieved when W spans the subspace spanned by v1,…,vk (the top k right singular vectors), giving:
∥A−Ak∥F2=∑i=k+1rσi2
The spectral norm result follows because ∥A−Ak∥2=σk+1 is the largest singular value of A−Ak. ■
8.5 Pseudoinverse
Definition. The Moore—Penrose pseudoinverse of A=UΣVT is
A+=VΣ+UT
Where Σ+ is obtained from Σ by transposing and inverting each non-zero singular value:
(Σ+)ii={1/σi0ifσi>0ifσi=0
Theorem 8.5. The pseudoinverse satisfies the four Moore—Penrose conditions:
AA+A=A
A+AA+=A+
(AA+)T=AA+
(A+A)T=A+A
Proof. Direct computation using A=UΣVT and A+=VΣ+UT:
AA+A=UΣVTVΣ+UTUΣVT=UΣΣ+ΣVT=UΣVT=A
Since ΣΣ+Σ=Σ (the non-zero singular values are preserved, zeros remain zero). The remaining conditions follow similarly. ■
Theorem 8.6 (Minimum-Norm Least Squares). If Ax=b is inconsistent, then x∗=A+b is the least-squares solution of minimum norm.
Proof. The least-squares solutions to Ax≈b are x=A+b+(I−A+A)z for arbitrary z. Since (I−A+A)z∈ker(A) and A+b∈im(AT)These two components are orthogonal. The minimum-norm solution is obtained when z=0Giving x∗=A+b. ■
8.6 Condition Number
Definition. The condition number of A (with respect to the spectral norm) is
κ(A)=∥A∥2⋅∥A+∥2=σrσ1
Where σ1 is the largest and σr is the smallest non-zero singular value.
Theorem 8.7 (Sensitivity of Linear Systems). If Ax=b and A(x+δx)=b+δbThen
∥x∥∥δx∥≤κ(A)⋅∥b∥∥δb∥
Proof. From Aδx=δb: ∥δx∥=∥A−1δb∥≤∥A−1∥∥δb∥=σr−1∥δb∥. From Ax=b: ∥b∥=∥Ax∥≥σ1−1∥x∥… Wait, ∥Ax∥≥σr∥x∥ and ∥b∥≤σ1∥x∥. Combining: ∥δx∥/∥x∥≤(σ1/σr)(∥δb∥/∥b∥). ■
A matrix with large condition number is ill-conditioned: small perturbations in b can cause large changes in x.
Setting to zero and solving: λ3−5λ2+6λ+1=0. Testing λ=3: 27−45+18+1=1=0. Testing λ=4: 64−80+24+1=9=0. By numerical methods or the rational root theorem (no rational roots), the eigenvalues are approximately λ1≈5.25, λ2≈1.31, λ3≈−0.56.
Wait, ATA should be positive semi-definite, so all eigenvalues should be non-negative. Let me recompute.
ATA=211110102.
tr(ATA)=5So λ1+λ2+λ3=5.
det(ATA)=2(2)−1(2)−1(−1)=4−2+1=3.
∑ of principal 2×2 minors =(2−1)+(4−1)+(2−0)=1+3+2=6.
By the trigonometric method for cubics or numerical approximation, λ1≈3.35, λ2≈1.35, λ3≈0.30.
The best rank-1 approximation uses only σ1=λ1≈1.83 and its corresponding singular vectors, yielding A1=σ1u1v1T with error ∥A−A1∥F=λ2+λ3≈1.65≈1.28. ■
8.9 Worked Example: Condition Number and Numerical Stability
Problem. Compute the condition number of A=(1111.0001) and discuss its implications.
This means relative errors in b can be amplified by a factor of up to 20000 in the solution x. The matrix is nearly singular (the two rows are almost linearly dependent), and solving Ax=b in floating-point arithmetic will lose approximately log10(20000)≈4.3 digits of precision. ■
8.10 Common Pitfalls
Singular values are always non-negative. Unlike eigenvalues, which can be negative or complex, singular values are the square roots of eigenvalues of ATAHence always real and non-negative.
The SVD is not unique. If A has repeated singular values, the corresponding singular vectors can be any orthonormal basis of the eigenspace. The signs of singular vectors can also be flipped in pairs.
The pseudoinverse equals the inverse only for square, full-rank matrices. When A is not full rank, A+A=I; instead, A+A is the orthogonal projection onto im(AT).
The SVD and eigendecomposition are different decompositions. The SVD always exists for any matrix, but the eigendecomposition requires the matrix to be square. Even for symmetric matrices, the singular values are ∣λi∣Not λi.