4.6. Least-squares Fitting to Data: Appendix on The Geometrical Approach#

References:

Introduction#

We have seen that one common and important approach to approximating data

\[(x_i, y_i),\; 1 \leq i \leq N\]

by a polynomial \(y = p(x)= c_0 + \cdots c_n x^n\) of degree at most \(n\) is to minimize the “average” of the errors

\[e_i = y_i - f(x_i),\]

in the sense of the root-mean-square error \(E_{RMS} = \displaystyle \sqrt{\sum_{i=1}^N e_i^2}\). Equivalently, we will avoid the square root and just minimize the sum of the squares of the errors:

\[E(c_0, c_1, \dots, c_n) = \sum_{i=1}^N e_i^2\]

Linear least squares: minimizing RMS error using calculus#

One way to derive the needed formulas is by seeking the critical point og th abev function via teh \(n+1\) equations

\[ \frac{\partial E}{\partial c_i} =0, \quad 0 \leq i \leq n \]

Fortunately these gives a systems of linear equations, and it has a unique solution, thus giving the desired global minimum.

However, there is another “geometrical” approach, that is also relevant as an introduction to strategies also used for other minimization problems, for example with application to the numerical solutions of boundary value problems for differential equations.

Linear least squares: minimizing RMS error by minimizing “Euclidean” distance with geometry#

For approximation by a polynomial \(y = p(x)= c_0 + \cdots c_n x^n\), we can think of the data \(y_i, 1 \leq i \leq N\) as giving a point in \(N\)-dimensional space \(\mathbb(R)^N\), and the approximations as giving another point with coordinates \(\tilde{y}_i := p(x_i)\).

Then the least squares problem is to minimize the Euclidean distance \(\| y - \tilde{y} \|_2\).

One way to think of this is that we attempt unsuccessfully to solve the collocation equations \(p(x_i) = y_i\) as an over-determined sytem of \(N\) equations in \(n+1\) unknowns \(A c = y\), where

\[\begin{split} A = \left[\begin{array}{ccccc} 1 & x_1 & x_1^2 & \dots & x_1^n \\ 1 & x_2 & x_2^2 & \dots & x_2^n \\ \vdots & \vdots & \vdots && \vdots \\ 1 & x_i & x_i^2 & \dots & x_i^n \\ \vdots & \vdots & \vdots && \vdots \\ 1 & x_N & x_N^2 & \dots & x_N^n \end{array}\right] \end{split}\]

so that \(Ac\) evaluates the polynomial at all the \(x_i\) values.

Now we introduce a key geometrical idea: the possible values of \(\tilde{y} = Ac\) lie in an \((n+1)\)-dimensional sub-space or “hyperplane” within \(\mathbb{R}^N\), and the point in this hyper-plane closest to \(y \in \mathbb{R}^N\) is the perpendular projection of the latter point onto this hyper-plane: that is, the error vector \(e = y - \tilde{y}\) is perpendicular to every vector in the subspace of vectors \(Ac'\). Thus, \(e \perp A c'\) for every \(c' \in \mathbb{R}^{n+1}\).

Writing this in terms of inner products,

\[ (y - \tilde{y}, A c') = 0 \quad \text{for every } c' \in \mathbb{R}^{n+1}. \]

Recall that \((x, Ay) = (A^Tx, y)\) where \(A^T\) is the transpose of \(A\): the mirror image with \(a^T_{i,j} = a_{j, i}\).

Using this gives

\[ (A^T (y - \tilde{y}), c') = 0 \quad \text{for every } c' \in \mathbb(R)^{n+1}. \]

and so the vector at left must be zero: \(A^T (y - \tilde{y})= 0\).

Inserting \(\tilde{y} = A c\) gives \(A^T y = A^T A c\), so

\[M c = A^T y\]

where \(M := A^T A\)

Since here \(A\) is \(N \times (n+1)\), \(A^T\) is \((n+1) \times N\), and the product \(M\) is an \((n+1) \times (n+1)\) square matrix.

Further calculation shows that in fact

\[\begin{split} M = \left[\begin{array}{cccc} m_0 & m_1 & \dots & m_n \\ m_1 & m_2 & \dots & m_{n+1} \\ \vdots & \vdots & \ddots & \vdots \\ m_n & m_{n+1} & \dots & m_{2n} \end{array}\right], \quad m_k = \sum_{i=1}^N x_i^k \end{split}\]

and the right-hand side is

\[ A^T y = p = \left[ p_0, p_1, \dots, p_n \right]^T, \quad p_k = \sum_{i=1}^N x_i^k y_i \]

so these equations are the same ones \(M c = p\) given by the previous calculus derivation.