Skip to content

Singular Value Decomposition

8.1 Existence of the SVD

Theorem 8.1 (Singular Value Decomposition). Every matrix AMm×n(R)A \in \mathcal{M}_{m \times n}(\mathbb{R}) can be factored as

A=UΣVTA = U \Sigma V^T

Where UMm×m(R)U \in \mathcal{M}_{m \times m}(\mathbb{R}) is orthogonal, VMn×n(R)V \in \mathcal{M}_{n \times n}(\mathbb{R}) is orthogonal, and ΣMm×n(R)\Sigma \in \mathcal{M}_{m \times n}(\mathbb{R}) is diagonal with non-negative entries σ1σ2σr0\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r \geq 0 (where r=rank(A)r = \mathrm{rank}(A)).

The σi\sigma_i are called the singular values of AA. The columns of UU are the left singular vectors, and the columns of VV are the right singular vectors.

Proof. The matrix ATAA^T A is an n×nn \times n symmetric positive semi-definite matrix, so by the spectral theorem it has nn real non-negative eigenvalues and an orthonormal eigenbasis. Let σ12,σ22,,σn2\sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2 be these eigenvalues (some may be zero) with corresponding orthonormal eigenvectors v1,,vn\mathbf{v}_1, \ldots, \mathbf{v}_n forming the columns of VV.

For each ii with σi>0\sigma_i > 0Define ui=Avi/σi\mathbf{u}_i = A\mathbf{v}_i / \sigma_i. We verify that these form an orthonormal set:

uiTuj=viTATAvjσiσj=σj2viTvjσiσj=σj2σiσjδij=δij\mathbf{u}_i^T \mathbf{u}_j = \frac{\mathbf{v}_i^T A^T A \mathbf{v}_j}{\sigma_i \sigma_j} = \frac{\sigma_j^2 \mathbf{v}_i^T \mathbf{v}_j}{\sigma_i \sigma_j} = \frac{\sigma_j^2}{\sigma_i \sigma_j} \delta_{ij} = \delta_{ij}

Extend {u1,,ur}\{\mathbf{u}_1, \ldots, \mathbf{u}_r\} to an orthonormal basis of Rm\mathbb{R}^m to form UU. Then for any vector xRn\mathbf{x} \in \mathbb{R}^n:

Ax=A(i=1n(viTx)vi)=i=1n(viTx)Avi=i=1rσi(viTx)ui=UΣVTxA\mathbf{x} = A\left(\sum_{i=1}^{n} (\mathbf{v}_i^T \mathbf{x})\mathbf{v}_i\right) = \sum_{i=1}^{n} (\mathbf{v}_i^T \mathbf{x}) A\mathbf{v}_i = \sum_{i=1}^{r} \sigma_i (\mathbf{v}_i^T \mathbf{x}) \mathbf{u}_i = U \Sigma V^T \mathbf{x}

Since this holds for all x\mathbf{x}We have A=UΣVTA = U \Sigma V^T. \blacksquare

8.2 Relationship to Eigenvalues

Proposition 8.2. The singular values of AA are the square roots of the eigenvalues of ATAA^T A (equivalently, of AATAA^T). The non-zero singular values of AA and ATA^T are identical.

Proof. From the construction above, ATAvi=σi2viA^T A \mathbf{v}_i = \sigma_i^2 \mathbf{v}_i. For AATAA^T:

AATui=A(ATA)viσi=σi2Aviσi=σi2uiAA^T \mathbf{u}_i = \frac{A(A^T A)\mathbf{v}_i}{\sigma_i} = \frac{\sigma_i^2 A\mathbf{v}_i}{\sigma_i} = \sigma_i^2 \mathbf{u}_i

So ui\mathbf{u}_i is an eigenvector of AATAA^T with eigenvalue σi2\sigma_i^2. The non-zero eigenvalues of ATAA^T A and AATAA^T coincide (since if ATAv=λvA^T A \mathbf{v} = \lambda \mathbf{v} with λ0\lambda \neq 0Then AAT(Av)=A(ATAv)=λ(Av)AA^T(A\mathbf{v}) = A(A^T A \mathbf{v}) = \lambda(A\mathbf{v}) and Av0A\mathbf{v} \neq \mathbf{0}). \blacksquare

Proposition 8.3. If AA is symmetric with eigenvalues λ1,,λn\lambda_1, \ldots, \lambda_nThen the singular values of AA are λ1,,λn|\lambda_1|, \ldots, |\lambda_n|.

Proof. ATA=A2A^T A = A^2Whose eigenvalues are λi2\lambda_i^2. The singular values are λi2=λi\sqrt{\lambda_i^2} = |\lambda_i|. \blacksquare

8.3 Geometric Interpretation

The SVD decomposes the linear transformation T:RnRmT : \mathbb{R}^n \to \mathbb{R}^m into three steps:

  1. VTV^T rotates (or reflects) Rn\mathbb{R}^n into a coordinate system aligned with the right singular vectors.
  2. Σ\Sigma scales each axis by the corresponding singular value (and possibly drops dimensions where σi=0\sigma_i = 0).
  3. UU rotates (or reflects) the result into the coordinate system of the left singular vectors.

Unit circle image. Under AAThe unit circle in R2\mathbb{R}^2 is mapped to an ellipse with semi-axes σ1\sigma_1 and σ2\sigma_2 aligned with the columns of UU.

8.4 Low-Rank Approximation

Theorem 8.4 (Eckart—Young—Mirsky). Let A=UΣVTA = U \Sigma V^T be the SVD with singular values σ1σ2σr>0\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r > 0. For any k<rk < rThe best rank-kk approximation to AA (in both the Frobenius and spectral norms) is

Ak=i=1kσiuiviT=UkΣkVkTA_k = \sum_{i=1}^{k} \sigma_i \mathbf{u}_i \mathbf{v}_i^T = U_k \Sigma_k V_k^T

And the approximation error is

AAkF=σk+12++σr2,AAk2=σk+1\lVert A - A_k \rVert_F = \sqrt{\sigma_{k+1}^2 + \cdots + \sigma_r^2}, \qquad \lVert A - A_k \rVert_2 = \sigma_{k+1}

Proof (Frobenius norm). Any rank-kk matrix BB can be written in terms of an orthonormal basis of its column space. Let WMn×k(R)W \in \mathcal{M}_{n \times k}(\mathbb{R}) have orthonormal columns spanning the column space of BB. Then B=CWTB = CW^T for some CCAnd:

ABF2=A(IWWT)F2+(AC)WTF2A(IWWT)F2\lVert A - B \rVert_F^2 = \lVert A(I - WW^T) \rVert_F^2 + \lVert (A - C)W^T \rVert_F^2 \geq \lVert A(I - WW^T) \rVert_F^2

The minimum over WW is achieved when WW spans the subspace spanned by v1,,vk\mathbf{v}_1, \ldots, \mathbf{v}_k (the top kk right singular vectors), giving:

AAkF2=i=k+1rσi2\lVert A - A_k \rVert_F^2 = \sum_{i=k+1}^{r} \sigma_i^2

The spectral norm result follows because AAk2=σk+1\lVert A - A_k \rVert_2 = \sigma_{k+1} is the largest singular value of AAkA - A_k. \blacksquare

8.5 Pseudoinverse

Definition. The Moore—Penrose pseudoinverse of A=UΣVTA = U \Sigma V^T is

A+=VΣ+UTA^+ = V \Sigma^+ U^T

Where Σ+\Sigma^+ is obtained from Σ\Sigma by transposing and inverting each non-zero singular value:

(Σ+)ii={1/σiifσi>00ifσi=0(\Sigma^+)_{ii} = \begin{cases} 1/\sigma_i & \text{if} \sigma_i > 0 \\ 0 & \text{if} \sigma_i = 0 \end{cases}

Theorem 8.5. The pseudoinverse satisfies the four Moore—Penrose conditions:

  1. AA+A=AAA^+A = A
  2. A+AA+=A+A^+AA^+ = A^+
  3. (AA+)T=AA+(AA^+)^T = AA^+
  4. (A+A)T=A+A(A^+A)^T = A^+A

Proof. Direct computation using A=UΣVTA = U \Sigma V^T and A+=VΣ+UTA^+ = V \Sigma^+ U^T:

AA+A=UΣVTVΣ+UTUΣVT=UΣΣ+ΣVT=UΣVT=AAA^+A = U \Sigma V^T V \Sigma^+ U^T U \Sigma V^T = U \Sigma \Sigma^+ \Sigma V^T = U \Sigma V^T = A

Since ΣΣ+Σ=Σ\Sigma \Sigma^+ \Sigma = \Sigma (the non-zero singular values are preserved, zeros remain zero). The remaining conditions follow similarly. \blacksquare

Theorem 8.6 (Minimum-Norm Least Squares). If Ax=bA\mathbf{x} = \mathbf{b} is inconsistent, then x=A+b\mathbf{x}^* = A^+\mathbf{b} is the least-squares solution of minimum norm.

Proof. The least-squares solutions to AxbA\mathbf{x} \approx \mathbf{b} are x=A+b+(IA+A)z\mathbf{x} = A^+\mathbf{b} + (I - A^+A)\mathbf{z} for arbitrary z\mathbf{z}. Since (IA+A)zker(A)(I - A^+A)\mathbf{z} \in \ker(A) and A+bim(AT)A^+\mathbf{b} \in \mathrm{im}(A^T)These two components are orthogonal. The minimum-norm solution is obtained when z=0\mathbf{z} = \mathbf{0}Giving x=A+b\mathbf{x}^* = A^+\mathbf{b}. \blacksquare

8.6 Condition Number

Definition. The condition number of AA (with respect to the spectral norm) is

κ(A)=A2A+2=σ1σr\kappa(A) = \lVert A \rVert_2 \cdot \lVert A^+ \rVert_2 = \frac{\sigma_1}{\sigma_r}

Where σ1\sigma_1 is the largest and σr\sigma_r is the smallest non-zero singular value.

Theorem 8.7 (Sensitivity of Linear Systems). If Ax=bA\mathbf{x} = \mathbf{b} and A(x+δx)=b+δbA(\mathbf{x} + \delta\mathbf{x}) = \mathbf{b} + \delta\mathbf{b}Then

δxxκ(A)δbb\frac{\lVert \delta\mathbf{x} \rVert}{\lVert \mathbf{x} \rVert} \leq \kappa(A) \cdot \frac{\lVert \delta\mathbf{b} \rVert}{\lVert \mathbf{b} \rVert}

Proof. From Aδx=δbA\delta\mathbf{x} = \delta\mathbf{b}: δx=A1δbA1δb=σr1δb\lVert \delta\mathbf{x} \rVert = \lVert A^{-1}\delta\mathbf{b} \rVert \leq \lVert A^{-1} \rVert \lVert \delta\mathbf{b} \rVert = \sigma_r^{-1} \lVert \delta\mathbf{b} \rVert. From Ax=bA\mathbf{x} = \mathbf{b}: b=Axσ11x\lVert \mathbf{b} \rVert = \lVert A\mathbf{x} \rVert \geq \sigma_1^{-1} \lVert \mathbf{x} \rVert… Wait, Axσrx\lVert A\mathbf{x} \rVert \geq \sigma_r \lVert \mathbf{x} \rVert and bσ1x\lVert \mathbf{b} \rVert \leq \sigma_1 \lVert \mathbf{x} \rVert. Combining: δx/x(σ1/σr)(δb/b)\lVert \delta\mathbf{x} \rVert / \lVert \mathbf{x} \rVert \leq (\sigma_1 / \sigma_r)(\lVert \delta\mathbf{b} \rVert / \lVert \mathbf{b} \rVert). \blacksquare

A matrix with large condition number is ill-conditioned: small perturbations in b\mathbf{b} can cause large changes in x\mathbf{x}.

8.7 Worked Example: Computing the SVD

Problem. Find the SVD of A=(322322)A = \begin{pmatrix} 3 & 2 \\ 2 & 3 \\ 2 & -2 \end{pmatrix}.

Solution

Step 1: Compute ATA=(322232)(322322)=(174417)A^T A = \begin{pmatrix} 3 & 2 & 2 \\ 2 & 3 & -2 \end{pmatrix}\begin{pmatrix} 3 & 2 \\ 2 & 3 \\ 2 & -2 \end{pmatrix} = \begin{pmatrix} 17 & 4 \\ 4 & 17 \end{pmatrix}.

Step 2: Eigenvalues of ATAA^T A: det(17λ4417λ)=(17λ)216=λ234λ+273=(λ21)(λ13)\det\begin{pmatrix} 17 - \lambda & 4 \\ 4 & 17 - \lambda \end{pmatrix} = (17 - \lambda)^2 - 16 = \lambda^2 - 34\lambda + 273 = (\lambda - 21)(\lambda - 13).

So σ12=21\sigma_1^2 = 21 and σ22=13\sigma_2^2 = 13Giving σ1=21\sigma_1 = \sqrt{21}, σ2=13\sigma_2 = \sqrt{13}.

Step 3: Eigenvectors of ATAA^T A. For λ=21\lambda = 21: (1721)v1+4v2=0(17 - 21)v_1 + 4v_2 = 0So v1=v2v_1 = v_2. Normalised: v1=12(1,1)T\mathbf{v}_1 = \frac{1}{\sqrt{2}}(1, 1)^T.

For λ=13\lambda = 13: 4v1+(1713)v2=04v_1 + (17 - 13)v_2 = 0So v1=v2v_1 = -v_2. Normalised: v2=12(1,1)T\mathbf{v}_2 = \frac{1}{\sqrt{2}}(1, -1)^T.

V=12(1111)V = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}.

Step 4: Compute ui=Avi/σi\mathbf{u}_i = A\mathbf{v}_i / \sigma_i.

u1=12112(322322)(11)=142(550)=12(110)\mathbf{u}_1 = \frac{1}{\sqrt{21}} \cdot \frac{1}{\sqrt{2}}\begin{pmatrix} 3 & 2 \\ 2 & 3 \\ 2 & -2 \end{pmatrix}\begin{pmatrix} 1 \\ 1 \end{pmatrix} = \frac{1}{\sqrt{42}}\begin{pmatrix} 5 \\ 5 \\ 0 \end{pmatrix} = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}.

u2=11312(322322)(11)=126(114)\mathbf{u}_2 = \frac{1}{\sqrt{13}} \cdot \frac{1}{\sqrt{2}}\begin{pmatrix} 3 & 2 \\ 2 & 3 \\ 2 & -2 \end{pmatrix}\begin{pmatrix} 1 \\ -1 \end{pmatrix} = \frac{1}{\sqrt{26}}\begin{pmatrix} 1 \\ -1 \\ 4 \end{pmatrix}.

Since AA is 3×23 \times 2We need a third left singular vector u3\mathbf{u}_3 orthogonal to u1\mathbf{u}_1 and u2\mathbf{u}_2. Compute u3=u1×u2=152(4,4,2)=126(2,2,1)\mathbf{u}_3 = \mathbf{u}_1 \times \mathbf{u}_2 = \frac{1}{\sqrt{52}}(4, -4, -2) = \frac{1}{\sqrt{26}}(2, -2, -1).

U=(1/21/262/261/21/262/2604/261/26)U = \begin{pmatrix} 1/\sqrt{2} & 1/\sqrt{26} & 2/\sqrt{26} \\ 1/\sqrt{2} & -1/\sqrt{26} & -2/\sqrt{26} \\ 0 & 4/\sqrt{26} & -1/\sqrt{26} \end{pmatrix}

Σ=(21001300)\Sigma = \begin{pmatrix} \sqrt{21} & 0 \\ 0 & \sqrt{13} \\ 0 & 0 \end{pmatrix}

A=UΣVT=(1/21/262/261/21/262/2604/261/26)(21001300)12(1111)A = U \Sigma V^T = \begin{pmatrix} 1/\sqrt{2} & 1/\sqrt{26} & 2/\sqrt{26} \\ 1/\sqrt{2} & -1/\sqrt{26} & -2/\sqrt{26} \\ 0 & 4/\sqrt{26} & -1/\sqrt{26} \end{pmatrix}\begin{pmatrix} \sqrt{21} & 0 \\ 0 & \sqrt{13} \\ 0 & 0 \end{pmatrix}\frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}

Verification: UU and VV are orthogonal, Σ\Sigma has the correct singular values on the diagonal, and A=UΣVTA = U\Sigma V^T recovers the original matrix. \blacksquare

8.8 Worked Example: Low-Rank Approximation

Problem. Let A=(110001101)A = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 1 \end{pmatrix}. Find the best rank-1 approximation.

Solution

ATA=(101100011)(110001101)=(211110102)A^T A = \begin{pmatrix} 1 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 1 \end{pmatrix}\begin{pmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 2 & 1 & 1 \\ 1 & 1 & 0 \\ 1 & 0 & 2 \end{pmatrix}.

det(ATAλI)=det(2λ1111λ0102λ)=(2λ)[(1λ)(2λ)](2λ)(1λ)\det(A^T A - \lambda I) = \det\begin{pmatrix} 2-\lambda & 1 & 1 \\ 1 & 1-\lambda & 0 \\ 1 & 0 & 2-\lambda \end{pmatrix} = (2-\lambda)[(1-\lambda)(2-\lambda)] - (2-\lambda) - (1-\lambda)

=(2λ)(λ23λ+21)(3λ)=(2λ)(λ23λ+1)(3λ)= (2-\lambda)(\lambda^2 - 3\lambda + 2 - 1) - (3 - \lambda) = (2-\lambda)(\lambda^2 - 3\lambda + 1) - (3 - \lambda)

=2λ26λ+2λ3+3λ2λ3+λ=λ3+5λ26λ1= 2\lambda^2 - 6\lambda + 2 - \lambda^3 + 3\lambda^2 - \lambda - 3 + \lambda = -\lambda^3 + 5\lambda^2 - 6\lambda - 1

Setting to zero and solving: λ35λ2+6λ+1=0\lambda^3 - 5\lambda^2 + 6\lambda + 1 = 0. Testing λ=3\lambda = 3: 2745+18+1=1027 - 45 + 18 + 1 = 1 \neq 0. Testing λ=4\lambda = 4: 6480+24+1=9064 - 80 + 24 + 1 = 9 \neq 0. By numerical methods or the rational root theorem (no rational roots), the eigenvalues are approximately λ15.25\lambda_1 \approx 5.25, λ21.31\lambda_2 \approx 1.31, λ30.56\lambda_3 \approx -0.56.

Wait, ATAA^T A should be positive semi-definite, so all eigenvalues should be non-negative. Let me recompute.

ATA=(211110102)A^T A = \begin{pmatrix} 2 & 1 & 1 \\ 1 & 1 & 0 \\ 1 & 0 & 2 \end{pmatrix}.

tr(ATA)=5\mathrm{tr}(A^T A) = 5So λ1+λ2+λ3=5\lambda_1 + \lambda_2 + \lambda_3 = 5.

det(ATA)=2(2)1(2)1(1)=42+1=3\det(A^T A) = 2(2) - 1(2) - 1(-1) = 4 - 2 + 1 = 3.

\sum of principal 2×22 \times 2 minors =(21)+(41)+(20)=1+3+2=6= (2-1) + (4-1) + (2-0) = 1 + 3 + 2 = 6.

Characteristic polynomial: λ35λ2+6λ3=0\lambda^3 - 5\lambda^2 + 6\lambda - 3 = 0.

Testing λ=1\lambda = 1: 15+63=101 - 5 + 6 - 3 = -1 \neq 0. Testing λ=3\lambda = 3: 2745+183=3027 - 45 + 18 - 3 = -3 \neq 0.

By the trigonometric method for cubics or numerical approximation, λ13.35\lambda_1 \approx 3.35, λ21.35\lambda_2 \approx 1.35, λ30.30\lambda_3 \approx 0.30.

The best rank-1 approximation uses only σ1=λ11.83\sigma_1 = \sqrt{\lambda_1} \approx 1.83 and its corresponding singular vectors, yielding A1=σ1u1v1TA_1 = \sigma_1 \mathbf{u}_1 \mathbf{v}_1^T with error AA1F=λ2+λ31.651.28\lVert A - A_1 \rVert_F = \sqrt{\lambda_2 + \lambda_3} \approx \sqrt{1.65} \approx 1.28. \blacksquare

8.9 Worked Example: Condition Number and Numerical Stability

Problem. Compute the condition number of A=(1111.0001)A = \begin{pmatrix} 1 & 1 \\ 1 & 1.0001 \end{pmatrix} and discuss its implications.

Solution

ATA=(22.00012.00012.00020001)A^T A = \begin{pmatrix} 2 & 2.0001 \\ 2.0001 & 2.00020001 \end{pmatrix}.

det(ATAλI)=(2λ)(2.00020001λ)2.00012\det(A^T A - \lambda I) = (2 - \lambda)(2.00020001 - \lambda) - 2.0001^2

=λ24.00020001λ+4.000400024.00040001=λ24.00020001λ+0.00000001= \lambda^2 - 4.00020001\lambda + 4.00040002 - 4.00040001 = \lambda^2 - 4.00020001\lambda + 0.00000001

λ=4.00020001±16.001600160.0000000424.00020001±4.000199992\lambda = \frac{4.00020001 \pm \sqrt{16.00160016 - 0.00000004}}{2} \approx \frac{4.00020001 \pm 4.00019999}{2}

λ14.00020000,λ20.00000001\lambda_1 \approx 4.00020000, \quad \lambda_2 \approx 0.00000001

σ12.00005,σ20.0001\sigma_1 \approx 2.00005, \quad \sigma_2 \approx 0.0001

κ(A)=σ1/σ220000\kappa(A) = \sigma_1 / \sigma_2 \approx 20000.

This means relative errors in b\mathbf{b} can be amplified by a factor of up to 2000020000 in the solution x\mathbf{x}. The matrix is nearly singular (the two rows are almost linearly dependent), and solving Ax=bA\mathbf{x} = \mathbf{b} in floating-point arithmetic will lose approximately log10(20000)4.3\log_{10}(20000) \approx 4.3 digits of precision. \blacksquare

8.10 Common Pitfalls

  • Singular values are always non-negative. Unlike eigenvalues, which can be negative or complex, singular values are the square roots of eigenvalues of ATAA^T AHence always real and non-negative.
  • The SVD is not unique. If AA has repeated singular values, the corresponding singular vectors can be any orthonormal basis of the eigenspace. The signs of singular vectors can also be flipped in pairs.
  • The pseudoinverse equals the inverse only for square, full-rank matrices. When AA is not full rank, A+AIA^+A \neq I; instead, A+AA^+A is the orthogonal projection onto im(AT)\mathrm{im}(A^T).
  • The SVD and eigendecomposition are different decompositions. The SVD always exists for any matrix, but the eigendecomposition requires the matrix to be square. Even for symmetric matrices, the singular values are λi|\lambda_i|Not λi\lambda_i.