
Inner Product Spaces

7.1 Definition of an Inner Product

An inner product on a vector space $V$ over $F$ (where $F = \mathbb{R}$ or $\mathbb{C}$) is a function $\langle \cdot, \cdot \rangle : V \times V \to F$ satisfying, for all $\mathbf{u}, \mathbf{v}, \mathbf{w} \in V$ and $\alpha, \beta \in F$:

  1. Conjugate symmetry: $\langle \mathbf{u}, \mathbf{v} \rangle = \overline{\langle \mathbf{v}, \mathbf{u} \rangle}$
  2. Linearity in the first argument: $\langle \alpha\mathbf{u} + \beta\mathbf{w}, \mathbf{v} \rangle = \alpha\langle \mathbf{u}, \mathbf{v} \rangle + \beta\langle \mathbf{w}, \mathbf{v} \rangle$
  3. Positive definiteness: $\langle \mathbf{v}, \mathbf{v} \rangle \geq 0$, with equality iff $\mathbf{v} = \mathbf{0}$

A vector space equipped with an inner product is called an inner product space.

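Example. The standard inner product on $\mathbb{R}^n$ is $\langle \mathbf{x}, \mathbf{y} \rangle = \sum_{i=1}^n x_i y_i$. On $\mathbb{C}^n$, it is $\langle \mathbf{x}, \mathbf{y} \rangle = \sum_{i=1}^n x_i \overline{y_i}$. Conjugating the second factor keeps the product linear in the first argument and makes $\langle \mathbf{x}, \mathbf{x} \rangle = \sum_{i=1}^n \lvert x_i \rvert^2 \geq 0$.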
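Here is a minimal Python sketch of the standard inner product on $\mathbb{C}^n$. It assumes vectors are plain lists of Python numbers, and the name `inner` is our own. It conjugates the second argument, matching the convention above. The calls below exercise conjugate symmetry and conjugate-linearity in the second argument. For real inputs it reduces to the $\mathbb{R}^n$ formula, because `conjugate()` leaves real numbers unchanged.

```python
def inner(x, y):
    """Standard inner product on C^n: the sum of x_i * conj(y_i).

    Linear in the first argument, conjugate-linear in the second.
    Works for lists of int, float or complex (all provide .conjugate()).
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(a * b.conjugate() for a, b in zip(x, y))


x = [1 + 2j, 3j]
y = [2 - 1j, 1]

print(inner(x, y))                    # 8j
print(inner(y, x))                    # -8j, the conjugate of inner(x, y)
print(inner(x, x))                    # (14+0j): real and nonnegative
print(inner(x, [2j * b for b in y]))  # (16+0j), which is conj(2j) * 8j
```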

Example. On the space $C[a,b]$ of continuous real-valued functions on $[a,b]$, the $L^2$ inner product is $\langle f, g \rangle = \int_a^b f(x)g(x)\,dx$.
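For functions, the integral usually has to be approximated numerically. The sketch below uses a hypothetical helper `l2_inner`. It assumes real-valued callables and applies composite Simpson's rule with an even number `n` of subintervals. Treat it as a numerical stand-in for the exact integral, not as a definition of it.

```python
import math


def l2_inner(f, g, a, b, n=1000):
    """Approximate <f, g> = integral of f(x) g(x) over [a, b] for real-valued f, g.

    Uses composite Simpson's rule with n subintervals; n must be even.
    """
    if n <= 0 or n % 2:
        raise ValueError("n must be a positive even integer")
    h = (b - a) / n
    total = f(a) * g(a) + f(b) * g(b)
    for i in range(1, n):
        x = a + i * h
        weight = 4 if i % 2 else 2  # Simpson weights 1, 4, 2, 4, ..., 2, 4, 1
        total += weight * f(x) * g(x)
    return total * h / 3


# <1, x> on [0, 1] is 1/2; sin and cos are orthogonal on [0, pi].
print(l2_inner(lambda x: 1.0, lambda x: x, 0.0, 1.0))
print(l2_inner(math.sin, math.cos, 0.0, math.pi))
```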

7.2 Norms

Every inner product induces a norm:

$$\lVert \mathbf{v} \rVert = \sqrt{\langle \mathbf{v}, \mathbf{v} \rangle}$$

Theorem 7.1 (Cauchy–Schwarz Inequality). For all $\mathbf{u}, \mathbf{v} \in V$,

$$\lvert\langle \mathbf{u}, \mathbf{v} \rangle\rvert \leq \lVert \mathbf{u} \rVert \, \lVert \mathbf{v} \rVert$$

with equality if and only if $\mathbf{u}$ and $\mathbf{v}$ are linearly dependent.

Proof. If $\mathbf{v} = \mathbf{0}$, both sides are $0$ and the result holds. Assume $\mathbf{v} \neq \mathbf{0}$. For any scalar $t \in F$, positive definiteness gives

$$0 \leq \langle \mathbf{u} - t\mathbf{v}, \mathbf{u} - t\mathbf{v} \rangle = \langle \mathbf{u}, \mathbf{u} \rangle - t\langle \mathbf{v}, \mathbf{u} \rangle - \overline{t}\langle \mathbf{u}, \mathbf{v} \rangle + \lvert t \rvert^2 \langle \mathbf{v}, \mathbf{v} \rangle$$

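Set $t = \frac{\langle \mathbf{u}, \mathbf{v} \rangle}{\langle \mathbf{v}, \mathbf{v} \rangle}$ (the value that minimises the right side). Then $t\langle \mathbf{v}, \mathbf{u} \rangle = \overline{t}\langle \mathbf{u}, \mathbf{v} \rangle = \lvert t \rvert^2 \langle \mathbf{v}, \mathbf{v} \rangle = \lvert\langle \mathbf{u}, \mathbf{v} \rangle\rvert^2 / \lVert \mathbf{v} \rVert^2$, so

$$0 \leq \lVert \mathbf{u} \rVert^2 - \frac{\lvert\langle \mathbf{u}, \mathbf{v} \rangle\rvert^2}{\lVert \mathbf{v} \rVert^2}$$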

Rearranging gives $\lvert\langle \mathbf{u}, \mathbf{v} \rangle\rvert^2 \leq \lVert \mathbf{u} \rVert^2 \lVert \mathbf{v} \rVert^2$, and taking square roots gives the result. Equality holds iff $\mathbf{u} - t\mathbf{v} = \mathbf{0}$, i.e., iff $\mathbf{u}$ is a scalar multiple of $\mathbf{v}$. Together with the case $\mathbf{v} = \mathbf{0}$, this is exactly the condition that $\mathbf{u}$ and $\mathbf{v}$ are linearly dependent. $\blacksquare$
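The proof can be checked with exact rational arithmetic. This sketch assumes real vectors stored as lists, and the helper names are our own. It computes the minimising $t = \langle \mathbf{u}, \mathbf{v} \rangle / \langle \mathbf{v}, \mathbf{v} \rangle$ and confirms that $\lVert \mathbf{u} - t\mathbf{v} \rVert^2 = \lVert \mathbf{u} \rVert^2 - \langle \mathbf{u}, \mathbf{v} \rangle^2 / \lVert \mathbf{v} \rVert^2$. Comparing squared quantities avoids square roots entirely.

```python
from fractions import Fraction


def dot(x, y):
    """Standard real inner product."""
    return sum(a * b for a, b in zip(x, y))


def cauchy_schwarz_gap(u, v):
    """Return (t, ||u - t v||^2) for the minimising t = <u, v> / <v, v>.

    The second value equals ||u||^2 - <u, v>^2 / ||v||^2, the nonnegative
    quantity in the proof; it is zero iff u is a scalar multiple of v.
    """
    u = [Fraction(a) for a in u]
    v = [Fraction(b) for b in v]
    if dot(v, v) == 0:
        raise ValueError("v must be nonzero")
    t = dot(u, v) / dot(v, v)
    residual = [a - t * b for a, b in zip(u, v)]
    return t, dot(residual, residual)


u, v = [1, 2, 3], [4, 5, 6]
t, gap = cauchy_schwarz_gap(u, v)
print(t, gap)                                       # 32/77 54/77
print(dot(u, v) ** 2 <= dot(u, u) * dot(v, v))      # True: 1024 <= 14 * 77
print(cauchy_schwarz_gap([1, 2, 3], [2, 4, 6])[1])  # 0, the vectors are dependent
```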

Theorem 7.2 (Triangle Inequality).

$$\lVert \mathbf{u} + \mathbf{v} \rVert \leq \lVert \mathbf{u} \rVert + \lVert \mathbf{v} \rVert$$

Proof.

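$$\lVert \mathbf{u} + \mathbf{v} \rVert^2 = \langle \mathbf{u} + \mathbf{v}, \mathbf{u} + \mathbf{v} \rangle = \lVert \mathbf{u} \rVert^2 + \langle \mathbf{u}, \mathbf{v} \rangle + \langle \mathbf{v}, \mathbf{u} \rangle + \lVert \mathbf{v} \rVert^2 = \lVert \mathbf{u} \rVert^2 + 2\,\mathrm{Re}\langle \mathbf{u}, \mathbf{v} \rangle + \lVert \mathbf{v} \rVert^2$$

where the last step uses conjugate symmetry: $\langle \mathbf{v}, \mathbf{u} \rangle = \overline{\langle \mathbf{u}, \mathbf{v} \rangle}$.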

By Cauchy–Schwarz, $\mathrm{Re}\langle \mathbf{u}, \mathbf{v} \rangle \leq \lvert\langle \mathbf{u}, \mathbf{v} \rangle\rvert \leq \lVert \mathbf{u} \rVert \lVert \mathbf{v} \rVert$, so

$$\lVert \mathbf{u} + \mathbf{v} \rVert^2 \leq \lVert \mathbf{u} \rVert^2 + 2\lVert \mathbf{u} \rVert \lVert \mathbf{v} \rVert + \lVert \mathbf{v} \rVert^2 = (\lVert \mathbf{u} \rVert + \lVert \mathbf{v} \rVert)^2$$

Taking square roots gives the result. $\blacksquare$
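The following is a quick numerical check of the expansion in the proof. It assumes complex vectors stored as Python lists, and the helper names are our own. It shows why the real part appears: the cross terms are $\langle \mathbf{u}, \mathbf{v} \rangle + \langle \mathbf{v}, \mathbf{u} \rangle = 2\,\mathrm{Re}\langle \mathbf{u}, \mathbf{v} \rangle$.

```python
import math


def inner(x, y):
    """Standard inner product on C^n (conjugate on the second argument)."""
    return sum(a * b.conjugate() for a, b in zip(x, y))


def norm(x):
    # <x, x> is real and nonnegative; .real drops the zero imaginary part
    return math.sqrt(inner(x, x).real)


u = [1 + 1j, 2]
v = [1j, 1 - 1j]
s = [a + b for a, b in zip(u, v)]  # u + v

lhs = inner(s, s).real  # ||u + v||^2
rhs = inner(u, u).real + 2 * inner(u, v).real + inner(v, v).real
print(lhs, rhs)                      # 15.0 15.0
print(norm(s) <= norm(u) + norm(v))  # True
```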

7.3 Orthogonality

Two vectors $\mathbf{u}, \mathbf{v}$ are orthogonal if $\langle \mathbf{u}, \mathbf{v} \rangle = 0$. We write $\mathbf{u} \perp \mathbf{v}$.

An orthonormal set $\{e_1, \ldots, e_k\}$ satisfies $\langle e_i, e_j \rangle = \delta_{ij}$, where $\delta_{ij}$ is the Kronecker delta ($1$ if $i = j$ and $0$ otherwise).

Theorem 7.3 (Pythagorean Theorem). If $\mathbf{u} \perp \mathbf{v}$, then

$$\lVert \mathbf{u} + \mathbf{v} \rVert^2 = \lVert \mathbf{u} \rVert^2 + \lVert \mathbf{v} \rVert^2$$

Proof. $\lVert \mathbf{u} + \mathbf{v} \rVert^2 = \lVert \mathbf{u} \rVert^2 + 2\,\mathrm{Re}\langle \mathbf{u}, \mathbf{v} \rangle + \lVert \mathbf{v} \rVert^2 = \lVert \mathbf{u} \rVert^2 + \lVert \mathbf{v} \rVert^2$, since $\langle \mathbf{u}, \mathbf{v} \rangle = 0$. $\blacksquare$

Proposition 7.4. Every orthonormal set is linearly independent.

Proof. Suppose $\sum_{i=1}^k \alpha_i e_i = \mathbf{0}$. Take the inner product with $e_j$ and use $\langle e_i, e_j \rangle = \delta_{ij}$:

$$0 = \langle \mathbf{0}, e_j \rangle = \left\langle \sum_{i=1}^k \alpha_i e_i, e_j \right\rangle = \sum_{i=1}^k \alpha_i \langle e_i, e_j \rangle = \alpha_j$$

This holds for each $j$. $\blacksquare$
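The key step of the proof is that pairing with $e_j$ isolates the coefficient $\alpha_j$. This is also how coordinates are computed in an orthonormal basis. The small exact sketch below uses the illustrative orthonormal basis $e_1 = (3/5, 4/5)$, $e_2 = (-4/5, 3/5)$ of $\mathbb{R}^2$; this basis and the helper names are our own choices.

```python
from fractions import Fraction


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def is_orthonormal(vectors):
    """Check <e_i, e_j> == delta_ij for every pair (exact for Fractions)."""
    return all(
        dot(ei, ej) == (1 if i == j else 0)
        for i, ei in enumerate(vectors)
        for j, ej in enumerate(vectors)
    )


def coordinates(x, basis):
    """Coefficients alpha_j = <x, e_j> of x in an orthonormal basis."""
    return [dot(x, e) for e in basis]


e1 = [Fraction(3, 5), Fraction(4, 5)]
e2 = [Fraction(-4, 5), Fraction(3, 5)]
basis = [e1, e2]

x = [2, 1]
alpha = coordinates(x, basis)
# Rebuild x as sum_j alpha_j e_j, component by component.
rebuilt = [sum(a * e[k] for a, e in zip(alpha, basis)) for k in range(2)]
print(is_orthonormal(basis), alpha, rebuilt)
```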

7.4 Gram–Schmidt Process

The Gram–Schmidt process converts a linearly independent set $\{\mathbf{v}_1, \ldots, \mathbf{v}_n\}$ into an orthonormal set $\{e_1, \ldots, e_n\}$:

$$\mathbf{u}_1 = \mathbf{v}_1, \quad e_1 = \frac{\mathbf{u}_1}{\lVert \mathbf{u}_1 \rVert}$$

$$\mathbf{u}_k = \mathbf{v}_k - \sum_{i=1}^{k-1} \langle \mathbf{v}_k, e_i \rangle e_i, \quad e_k = \frac{\mathbf{u}_k}{\lVert \mathbf{u}_k \rVert}$$

Proposition 7.5. At each step, $\mathrm{span}\{e_1, \ldots, e_k\} = \mathrm{span}\{\mathbf{v}_1, \ldots, \mathbf{v}_k\}$.

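Proof. By induction on $k$. For $k = 1$, $e_1$ is a nonzero multiple of $\mathbf{v}_1$. For $k > 1$, assume $\mathrm{span}\{e_1, \ldots, e_{k-1}\} = \mathrm{span}\{\mathbf{v}_1, \ldots, \mathbf{v}_{k-1}\}$. By construction, $\mathbf{u}_k$ is $\mathbf{v}_k$ minus its projection onto this subspace, so $\mathbf{u}_k \in \mathrm{span}\{\mathbf{v}_1, \ldots, \mathbf{v}_k\}$. Moreover, $\mathbf{u}_k \neq \mathbf{0}$: otherwise $\mathbf{v}_k$ would lie in $\mathrm{span}\{\mathbf{v}_1, \ldots, \mathbf{v}_{k-1}\}$, contradicting linear independence. Hence $e_k = \mathbf{u}_k / \lVert \mathbf{u}_k \rVert$ is well defined and lies in $\mathrm{span}\{\mathbf{v}_1, \ldots, \mathbf{v}_k\}$. Conversely, $\mathbf{v}_k = \mathbf{u}_k + \sum_{i=1}^{k-1}\langle \mathbf{v}_k, e_i \rangle e_i \in \mathrm{span}\{e_1, \ldots, e_k\}$. Combined with the induction hypothesis, this shows the two spans coincide. $\blacksquare$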
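The recursion above translates directly into code. The sketch below implements classical Gram–Schmidt for real vectors in floating point; the names `gram_schmidt` and `tol` are hypothetical. The tolerance test is our own addition: it makes a numerically dependent input raise an error instead of dividing by a zero norm. It uses an absolute threshold, chosen arbitrarily.

```python
import math


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def gram_schmidt(vectors, tol=1e-12):
    """Classical Gram-Schmidt: u_k = v_k - sum_i <v_k, e_i> e_i, e_k = u_k / ||u_k||."""
    basis = []
    for v in vectors:
        # Classical form: every coefficient <v_k, e_i> uses the original v_k.
        coeffs = [dot(v, e) for e in basis]
        u = [float(c) for c in v]
        for c, e in zip(coeffs, basis):
            u = [ui - c * ei for ui, ei in zip(u, e)]
        norm_u = math.sqrt(dot(u, u))
        if norm_u <= tol:
            raise ValueError("input vectors are (numerically) linearly dependent")
        basis.append([ui / norm_u for ui in u])
    return basis


# The vectors of the worked example in 7.7; expect approximately
# (1,1,0)/sqrt(2), (1,-1,2)/sqrt(6), (-1,1,1)/sqrt(3).
E = gram_schmidt([[1, 1, 0], [1, 0, 1], [0, 1, 1]])
for e in E:
    print([round(c, 4) for c in e])
```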

7.5 Orthogonal Projection

The orthogonal projection of $\mathbf{v}$ onto a subspace $W$ with orthonormal basis $\{e_1, \ldots, e_k\}$ is

$$\mathrm{proj}_W(\mathbf{v}) = \sum_{i=1}^k \langle \mathbf{v}, e_i \rangle e_i$$

Theorem 7.6 (Best Approximation). Among all vectors in $W$, the orthogonal projection $\mathrm{proj}_W(\mathbf{v})$ minimises the distance to $\mathbf{v}$:

$$\lVert \mathbf{v} - \mathrm{proj}_W(\mathbf{v}) \rVert \leq \lVert \mathbf{v} - \mathbf{w} \rVert \quad \text{for all } \mathbf{w} \in W$$

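Proof. For any $\mathbf{w} \in W$, write $\mathbf{v} - \mathbf{w} = (\mathbf{v} - \mathrm{proj}_W(\mathbf{v})) + (\mathrm{proj}_W(\mathbf{v}) - \mathbf{w})$. The first term is orthogonal to each $e_j$, since $\langle \mathbf{v} - \mathrm{proj}_W(\mathbf{v}), e_j \rangle = \langle \mathbf{v}, e_j \rangle - \sum_{i=1}^k \langle \mathbf{v}, e_i \rangle \langle e_i, e_j \rangle = 0$. Hence it is orthogonal to all of $W$, and in particular to the second term, which lies in $W$. By the Pythagorean theorem: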

$$\lVert \mathbf{v} - \mathbf{w} \rVert^2 = \lVert \mathbf{v} - \mathrm{proj}_W(\mathbf{v}) \rVert^2 + \lVert \mathrm{proj}_W(\mathbf{v}) - \mathbf{w} \rVert^2 \geq \lVert \mathbf{v} - \mathrm{proj}_W(\mathbf{v}) \rVert^2$$

with equality iff $\mathbf{w} = \mathrm{proj}_W(\mathbf{v})$. $\blacksquare$
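To see Theorem 7.6 concretely, the sketch below projects $\mathbf{v} = (1, 2, 3)$ onto the plane $W$ with orthonormal basis $e_1 = (3/5, 4/5, 0)$, $e_2 = (0, 0, 1)$. This basis is an illustrative choice, not from the text, and the helper names are ours. Using exact fractions, it checks that the residual is orthogonal to $W$ and that no vector on a small grid in $W$ is closer to $\mathbf{v}$.

```python
from fractions import Fraction
from itertools import product


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def project(v, onb):
    """proj_W(v) = sum_i <v, e_i> e_i for an orthonormal basis onb of W."""
    coeffs = [dot(v, e) for e in onb]
    return [sum(c * e[k] for c, e in zip(coeffs, onb)) for k in range(len(v))]


def sq_dist(x, y):
    d = [a - b for a, b in zip(x, y)]
    return dot(d, d)


e1 = [Fraction(3, 5), Fraction(4, 5), Fraction(0)]
e2 = [Fraction(0), Fraction(0), Fraction(1)]
v = [1, 2, 3]

p = project(v, [e1, e2])
r = [a - b for a, b in zip(v, p)]  # residual v - proj_W(v)
print(p, dot(r, e1), dot(r, e2), sq_dist(v, p))

# Every w = a*e1 + b*e2 on a small integer grid is at least as far from v.
best = sq_dist(v, p)
grid = [[a * x + b * y for x, y in zip(e1, e2)] for a, b in product(range(-3, 4), repeat=2)]
print(all(sq_dist(v, w) >= best for w in grid))  # True
```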

7.6 Least Squares Approximation

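A fundamental application of orthogonal projection is fitting functions to data. Let $W$ be a finite-dimensional subspace of an inner product space $V$, and let $\mathbf{v} \in V$ be a target. By Theorem 7.6, the best approximation to $\mathbf{v}$ in $W$ is the orthogonal projection $\mathrm{proj}_W(\mathbf{v})$. This is the element that minimises $\lVert \mathbf{v} - \mathbf{w} \rVert$, and hence minimises the sum (or integral) of squared errors.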
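As an illustration in $\mathbb{R}^n$, fitting a line $y \approx a + bx$ to data points amounts to projecting the data vector $\mathbf{y}$ onto $W = \mathrm{span}\{\mathbf{1}, \mathbf{x}\}$. The sketch below uses a hypothetical function `fit_line` and made-up sample data. It first orthogonalises $\{\mathbf{1}, \mathbf{x}\}$ with one Gram–Schmidt step, leaving the vectors unnormalised so the arithmetic stays exact. It then reads off the projection coefficients.

```python
from fractions import Fraction


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def fit_line(xs, ys):
    """Least-squares line y = a + b*x via orthogonal projection onto span{1, x}.

    Returns (a, b, fitted), where fitted = proj_W(y).
    """
    n = len(xs)
    mean_x = Fraction(sum(xs), n)
    mean_y = Fraction(sum(ys), n)
    # Gram-Schmidt step: u2 = x - (<x, 1> / <1, 1>) 1 = x - mean(x) 1, orthogonal to 1.
    u2 = [Fraction(x) - mean_x for x in xs]
    if dot(u2, u2) == 0:
        raise ValueError("need at least two distinct x values")
    # Projection coefficients: <y, 1> / <1, 1> = mean(y), and <y, u2> / <u2, u2>.
    b = dot(ys, u2) / dot(u2, u2)
    a = mean_y - b * mean_x
    fitted = [mean_y + b * ui for ui in u2]
    return a, b, fitted


xs = [0, 1, 2, 3]
ys = [1, 3, 2, 5]  # illustrative data
a, b, fitted = fit_line(xs, ys)
residual = [y - f for y, f in zip(ys, fitted)]
print(a, b)                              # 11/10 11/10
print(sum(residual), dot(residual, xs))  # 0 0: the residual is orthogonal to W
```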

7.7 Worked Example: Gram–Schmidt

Problem. Apply the Gram–Schmidt process to $\mathbf{v}_1 = (1, 1, 0)$, $\mathbf{v}_2 = (1, 0, 1)$, $\mathbf{v}_3 = (0, 1, 1)$ in $\mathbb{R}^3$ with the standard inner product.

Solution

$\mathbf{u}_1 = \mathbf{v}_1 = (1, 1, 0)$, $\lVert \mathbf{u}_1 \rVert = \sqrt{2}$, $e_1 = \frac{1}{\sqrt{2}}(1, 1, 0)$.

$$\mathbf{u}_2 = \mathbf{v}_2 - \langle \mathbf{v}_2, e_1 \rangle e_1 = (1, 0, 1) - \frac{1}{\sqrt{2}} \cdot \frac{1}{\sqrt{2}}(1, 1, 0) = (1, 0, 1) - \frac{1}{2}(1, 1, 0) = \left(\frac{1}{2}, -\frac{1}{2}, 1\right)$$

$\lVert \mathbf{u}_2 \rVert = \sqrt{1/4 + 1/4 + 1} = \sqrt{3/2}$, $e_2 = \frac{1}{\sqrt{3/2}}\left(\frac{1}{2}, -\frac{1}{2}, 1\right) = \frac{1}{\sqrt{6}}(1, -1, 2)$.

$$\mathbf{u}_3 = \mathbf{v}_3 - \langle \mathbf{v}_3, e_1 \rangle e_1 - \langle \mathbf{v}_3, e_2 \rangle e_2$$

$$\langle \mathbf{v}_3, e_1 \rangle = \frac{1}{\sqrt{2}}(0 + 1 + 0) = \frac{1}{\sqrt{2}}$$

$$\langle \mathbf{v}_3, e_2 \rangle = \frac{1}{\sqrt{6}}(0 - 1 + 2) = \frac{1}{\sqrt{6}}$$

$$\mathbf{u}_3 = (0, 1, 1) - \frac{1}{\sqrt{2}} \cdot \frac{1}{\sqrt{2}}(1, 1, 0) - \frac{1}{\sqrt{6}} \cdot \frac{1}{\sqrt{6}}(1, -1, 2) = (0, 1, 1) - \frac{1}{2}(1, 1, 0) - \frac{1}{6}(1, -1, 2)$$

$$= \left(-\frac{1}{2} - \frac{1}{6},\ 1 - \frac{1}{2} + \frac{1}{6},\ 1 - \frac{1}{3}\right) = \left(-\frac{2}{3}, \frac{2}{3}, \frac{2}{3}\right)$$

$\lVert \mathbf{u}_3 \rVert = \sqrt{4/9 + 4/9 + 4/9} = \sqrt{4/3} = 2/\sqrt{3}$, $e_3 = \frac{\sqrt{3}}{2}\left(-\frac{2}{3}, \frac{2}{3}, \frac{2}{3}\right) = \frac{1}{\sqrt{3}}(-1, 1, 1)$.

Verification: $\langle e_1, e_2 \rangle = \frac{1}{\sqrt{12}}(1 - 1 + 0) = 0$ $\checkmark$; $\langle e_1, e_3 \rangle = \frac{1}{\sqrt{6}}(-1 + 1 + 0) = 0$ $\checkmark$; $\langle e_2, e_3 \rangle = \frac{1}{\sqrt{18}}(-1 - 1 + 2) = 0$ $\checkmark$.

The orthonormal basis is $\left\{\frac{1}{\sqrt{2}}(1,1,0),\ \frac{1}{\sqrt{6}}(1,-1,2),\ \frac{1}{\sqrt{3}}(-1,1,1)\right\}$. $\blacksquare$
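The hand computation can be double-checked without square roots. Since $\langle \mathbf{v}_k, e_i \rangle e_i = \frac{\langle \mathbf{v}_k, \mathbf{u}_i \rangle}{\langle \mathbf{u}_i, \mathbf{u}_i \rangle}\mathbf{u}_i$, the orthogonal vectors $\mathbf{u}_k$ can be computed in exact rational arithmetic and normalised only at the end. The sketch below does this; the function name `orthogonalise` is our own.

```python
from fractions import Fraction


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def orthogonalise(vectors):
    """Unnormalised Gram-Schmidt: u_k = v_k - sum_i (<v_k, u_i> / <u_i, u_i>) u_i."""
    us = []
    for v in vectors:
        v = [Fraction(c) for c in v]
        u = v[:]
        for w in us:
            coeff = dot(v, w) / dot(w, w)  # same as <v_k, e_i> scaled back to u_i
            u = [a - coeff * b for a, b in zip(u, w)]
        us.append(u)
    return us


us = orthogonalise([[1, 1, 0], [1, 0, 1], [0, 1, 1]])
for u in us:
    print([str(c) for c in u], "squared norm:", dot(u, u))
# ['1', '1', '0'] squared norm: 2
# ['1/2', '-1/2', '1'] squared norm: 3/2
# ['-2/3', '2/3', '2/3'] squared norm: 4/3
```

The squared norms $2$, $3/2$, $4/3$ match $\lVert \mathbf{u}_1 \rVert = \sqrt{2}$, $\lVert \mathbf{u}_2 \rVert = \sqrt{3/2}$ and $\lVert \mathbf{u}_3 \rVert = 2/\sqrt{3}$ above.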

:::caution Common Pitfall
The Gram–Schmidt process requires a linearly independent starting set. If the input vectors are linearly dependent, one of the $\mathbf{u}_k$ will be the zero vector, and the process will fail (attempting to divide by zero in the normalisation step).

7.8 Worked Example: Orthogonal Projection onto a Plane

Problem. Find the orthogonal projection of $\mathbf{v} = (3, -1, 2)$ onto the plane $W$ spanned by $(1, 0, 1)$ and $(0, 1, 1)$ in $\mathbb{R}^3$ with the standard inner product. Also find the distance from $\mathbf{v}$ to $W$.

Solution

First, apply Gram–Schmidt to obtain an orthonormal basis for $W$.

$\mathbf{u}_1 = (1, 0, 1)$, $\lVert \mathbf{u}_1 \rVert = \sqrt{2}$, $e_1 = \frac{1}{\sqrt{2}}(1, 0, 1)$.

$$\mathbf{u}_2 = (0, 1, 1) - \langle (0,1,1), e_1 \rangle e_1 = (0, 1, 1) - \frac{1}{\sqrt{2}} \cdot \frac{1}{\sqrt{2}}(1, 0, 1) = (0, 1, 1) - \frac{1}{2}(1, 0, 1) = \left(-\frac{1}{2}, 1, \frac{1}{2}\right)$$

$\lVert \mathbf{u}_2 \rVert = \sqrt{1/4 + 1 + 1/4} = \sqrt{3/2}$, $e_2 = \frac{1}{\sqrt{6}}(-1, 2, 1)$.

Now compute the projection:

$$\langle \mathbf{v}, e_1 \rangle = \frac{1}{\sqrt{2}}(3 + 0 + 2) = \frac{5}{\sqrt{2}}$$

$$\langle \mathbf{v}, e_2 \rangle = \frac{1}{\sqrt{6}}(-3 - 2 + 2) = -\frac{3}{\sqrt{6}}$$

$$\mathrm{proj}_W(\mathbf{v}) = \frac{5}{\sqrt{2}} \cdot \frac{1}{\sqrt{2}}(1, 0, 1) - \frac{3}{\sqrt{6}} \cdot \frac{1}{\sqrt{6}}(-1, 2, 1)$$

$$= \frac{5}{2}(1, 0, 1) - \frac{1}{2}(-1, 2, 1) = \left(\frac{5}{2}, 0, \frac{5}{2}\right) + \left(\frac{1}{2}, -1, -\frac{1}{2}\right) = (3, -1, 2)$$

The residual is $\mathbf{v} - \mathrm{proj}_W(\mathbf{v}) = (0, 0, 0)$, so the distance is $0$. This means $\mathbf{v} \in W$ itself. Indeed, $\mathbf{v} = 3(1, 0, 1) - (0, 1, 1) \in \mathrm{span}\{(1,0,1), (0,1,1)\}$. $\blacksquare$
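The same projection can be computed exactly from the orthogonal, unnormalised vectors $\mathbf{u}_1 = (1, 0, 1)$ and $\mathbf{u}_2 = (-\tfrac{1}{2}, 1, \tfrac{1}{2})$ found above. Each term satisfies $\langle \mathbf{v}, e_i \rangle e_i = \frac{\langle \mathbf{v}, \mathbf{u}_i \rangle}{\langle \mathbf{u}_i, \mathbf{u}_i \rangle}\mathbf{u}_i$, so no normalisation is needed. The sketch below (helper names ours) reproduces $\mathrm{proj}_W(\mathbf{v}) = (3, -1, 2)$. For contrast, it also projects the illustrative vector $(1, 0, 0)$, which is not in $W$.

```python
from fractions import Fraction


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def project_orthogonal(v, us):
    """proj_W(v) for an orthogonal (not necessarily normalised) basis us of W."""
    p = [Fraction(0)] * len(v)
    for u in us:
        coeff = Fraction(dot(v, u)) / dot(u, u)
        p = [pi + coeff * ui for pi, ui in zip(p, u)]
    return p


u1 = [Fraction(1), Fraction(0), Fraction(1)]
u2 = [Fraction(-1, 2), Fraction(1), Fraction(1, 2)]

for v in ([3, -1, 2], [1, 0, 0]):
    p = project_orthogonal(v, [u1, u2])
    r = [a - b for a, b in zip(v, p)]  # residual, orthogonal to W
    print(v, "->", [str(c) for c in p], "distance^2 =", dot(r, r))
# [3, -1, 2] -> ['3', '-1', '2'] distance^2 = 0
# [1, 0, 0] -> ['2/3', '-1/3', '1/3'] distance^2 = 1/3
```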

7.9 Worked Example: $L^2$ Least Squares Approximation

Problem. Find the constant function $c$ (i.e., the best approximation by a degree-0 polynomial) that minimises $\int_0^1 (e^x - c)^2\,dx$.

Solution

We want the orthogonal projection of $f(x) = e^x$ onto the subspace $W = \mathrm{span}\{1\}$ in the $L^2[0,1]$ inner product space. An orthonormal basis for $W$ is $e_1 = 1$ (since $\lVert 1 \rVert^2 = \int_0^1 1\,dx = 1$).

$$\mathrm{proj}_W(f) = \langle f, 1 \rangle \cdot 1 = \left(\int_0^1 e^x\,dx\right) \cdot 1 = (e - 1) \cdot 1$$

So the best constant approximation is $c = e - 1 \approx 1.718$.

Verification: The residual $e^x - (e - 1)$ should be orthogonal to $W$, and it is: $\int_0^1 \left(e^x - (e - 1)\right) dx = (e - 1) - (e - 1) = 0$. Note that $c$ differs from the constant term of the Taylor series of $e^x$ around $x = 1/2$, namely $e^{1/2} \approx 1.649$. The $L^2$-optimal constant minimises the squared error integrated over all of $[0,1]$, whereas the Taylor approximation is accurate only near $x = 1/2$. The two optimisation criteria differ. $\blacksquare$
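Here is a numerical sanity check of this example. It assumes composite Simpson's rule as the quadrature, and the helper names are ours. Setting the derivative of $\int_0^1 (e^x - c)^2\,dx$ with respect to $c$ to zero gives $-2\int_0^1 (e^x - c)\,dx = 0$, i.e. $c = e - 1$. The minimum value is then $\int_0^1 e^{2x}\,dx - (e - 1)^2 = \frac{e^2 - 1}{2} - (e - 1)^2 \approx 0.242$. The code compares the squared error at $c = e - 1$ with nearby constants and with the Taylor value $e^{1/2}$.

```python
import math


def simpson(h_func, a, b, n=1000):
    """Composite Simpson's rule for the integral of h_func over [a, b] (n even)."""
    if n % 2:
        raise ValueError("n must be even")
    step = (b - a) / n
    total = h_func(a) + h_func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * h_func(a + i * step)
    return total * step / 3


def sq_error(c):
    """Approximate integral over [0, 1] of (e^x - c)^2."""
    return simpson(lambda x: (math.exp(x) - c) ** 2, 0.0, 1.0)


c_star = math.e - 1
exact_min = (math.e ** 2 - 1) / 2 - (math.e - 1) ** 2
print(sq_error(c_star), exact_min)  # both approximately 0.2420
for c in (c_star - 0.1, c_star + 0.1, math.exp(0.5)):
    print(round(c, 4), sq_error(c) > sq_error(c_star))  # True each time
```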

7.10 Common Pitfalls

  • The Cauchy–Schwarz inequality is not the triangle inequality. Cauchy–Schwarz bounds the absolute value of the inner product by the product of norms; the triangle inequality bounds the norm of a sum by the sum of norms. They are related (the triangle inequality follows from Cauchy–Schwarz) but distinct.
  • Classical Gram–Schmidt is numerically unstable. In floating-point arithmetic, the computed vectors can lose orthogonality when the inputs are nearly linearly dependent. For floating-point computation, modified Gram–Schmidt or Householder reflections are preferred.
  • Orthogonal projection decomposes $\mathbf{v}$ uniquely: $\mathbf{v} = \mathrm{proj}_W(\mathbf{v}) + \mathbf{v}^\perp$, where $\mathbf{v}^\perp = \mathbf{v} - \mathrm{proj}_W(\mathbf{v}) \in W^\perp$. This decomposition is unique and is called the orthogonal decomposition.
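The difference between the two Gram–Schmidt variants shows up on nearly dependent input. This sketch uses a Läuchli-style test set, an illustrative choice of ours: the vectors $(1, \varepsilon, 0, 0)$, $(1, 0, \varepsilon, 0)$, $(1, 0, 0, \varepsilon)$ with $\varepsilon = 10^{-8}$. With this $\varepsilon$, $1 + \varepsilon^2$ rounds to $1$ in double precision. Classical Gram–Schmidt computes every coefficient from the original $\mathbf{v}_k$. Modified Gram–Schmidt subtracts one projection at a time and computes each coefficient from the partially reduced vector. The two variants agree in exact arithmetic and differ only in how rounding errors accumulate. The function names are hypothetical.

```python
import math


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def classical_gram_schmidt(vectors):
    basis = []
    for v in vectors:
        coeffs = [dot(v, e) for e in basis]  # all coefficients from the original v
        u = list(v)
        for c, e in zip(coeffs, basis):
            u = [ui - c * ei for ui, ei in zip(u, e)]
        n = math.sqrt(dot(u, u))
        basis.append([ui / n for ui in u])
    return basis


def modified_gram_schmidt(vectors):
    basis = []
    for v in vectors:
        w = list(v)
        for e in basis:
            c = dot(w, e)  # coefficient from the partially reduced vector
            w = [wi - c * ei for wi, ei in zip(w, e)]
        n = math.sqrt(dot(w, w))
        basis.append([wi / n for wi in w])
    return basis


eps = 1e-8
vs = [[1.0, eps, 0.0, 0.0], [1.0, 0.0, eps, 0.0], [1.0, 0.0, 0.0, eps]]
Q = classical_gram_schmidt(vs)
M = modified_gram_schmidt(vs)
print(dot(Q[1], Q[2]))  # about 0.5: orthogonality is lost
print(dot(M[1], M[2]))  # rounding-level, essentially 0
```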

:::