Least-squares Fitting to Data: Appendix on The Geometrical Approach
Contents
4.6. Least-squares Fitting to Data: Appendix on The Geometrical Approach#
References:
Chapter 4 Least Squares of [Sauer, 2019], sections 1 and 2.
Section 8.1 Discrete Least Squares Approximation of [Burden et al., 2016].
Introduction#
We have seen that one common and important approach to approximating data
by a polynomial \(y = p(x)= c_0 + \cdots c_n x^n\) of degree at most \(n\) is to minimize the “average” of the errors
in the sense of the root-mean-square error \(E_{RMS} = \displaystyle \sqrt{\sum_{i=1}^N e_i^2}\). Equivalently, we will avoid the square root and just minimize the sum of the squares of the errors:
Linear least squares: minimizing RMS error using calculus#
One way to derive the needed formulas is by seeking the critical point og th abev function via teh \(n+1\) equations
Fortunately these gives a systems of linear equations, and it has a unique solution, thus giving the desired global minimum.
However, there is another “geometrical” approach, that is also relevant as an introduction to strategies also used for other minimization problems, for example with application to the numerical solutions of boundary value problems for differential equations.
Linear least squares: minimizing RMS error by minimizing “Euclidean” distance with geometry#
For approximation by a polynomial \(y = p(x)= c_0 + \cdots c_n x^n\), we can think of the data \(y_i, 1 \leq i \leq N\) as giving a point in \(N\)-dimensional space \(\mathbb(R)^N\), and the approximations as giving another point with coordinates \(\tilde{y}_i := p(x_i)\).
Then the least squares problem is to minimize the Euclidean distance \(\| y - \tilde{y} \|_2\).
One way to think of this is that we attempt unsuccessfully to solve the collocation equations \(p(x_i) = y_i\) as an over-determined sytem of \(N\) equations in \(n+1\) unknowns \(A c = y\), where
so that \(Ac\) evaluates the polynomial at all the \(x_i\) values.
Now we introduce a key geometrical idea: the possible values of \(\tilde{y} = Ac\) lie in an \((n+1)\)-dimensional sub-space or “hyperplane” within \(\mathbb{R}^N\), and the point in this hyper-plane closest to \(y \in \mathbb{R}^N\) is the perpendular projection of the latter point onto this hyper-plane: that is, the error vector \(e = y - \tilde{y}\) is perpendicular to every vector in the subspace of vectors \(Ac'\). Thus, \(e \perp A c'\) for every \(c' \in \mathbb{R}^{n+1}\).
Writing this in terms of inner products,
Recall that \((x, Ay) = (A^Tx, y)\) where \(A^T\) is the transpose of \(A\): the mirror image with \(a^T_{i,j} = a_{j, i}\).
Using this gives
and so the vector at left must be zero: \(A^T (y - \tilde{y})= 0\).
Inserting \(\tilde{y} = A c\) gives \(A^T y = A^T A c\), so
where \(M := A^T A\)
Since here \(A\) is \(N \times (n+1)\), \(A^T\) is \((n+1) \times N\), and the product \(M\) is an \((n+1) \times (n+1)\) square matrix.
Further calculation shows that in fact
and the right-hand side is
so these equations are the same ones \(M c = p\) given by the previous calculus derivation.