13. Polynomial Collocation (Interpolation/Extrapolation) and Approximation

Last Revised on Wednesday February 17.

13.1. Introduction

Numerical methods for dealing with functions require describing them, at least approximately, using a finite list of numbers, and the most basic approach is to approximate by a polynomial. (Other important choices are rational functions and “trigonometric polynomials”: sums of multiples of sines and cosines.) Such polynomials can then be used to approximate derivatives and integrals.

The simplest idea for approximating \(f(x)\) on domain \([a, b]\) is to start with a finite collection of node values \(x_i \in [a, b]\), \(0 \leq i \leq n\) and then seek a polynomial \(p\) which collocates with \(f\) at those values: \(p(x_i) = f(x_i)\) for \(0 \leq i \leq n\). Actually, we can put the function aside for now, and simply seek a polynomial that passes through a list of points \((x_i, y_i)\); later we will achieve collocation with \(f\) by choosing \(y_i = f(x_i)\).

In fact there are infinitely many such polynomials: given one, add to it any polynomial with zeros at all of the \(n+1\) nodes. So to make the problem well-posed, we seek the collocating polynomial of lowest degree.

Theorem Given \(n+1\) distinct values \(x_i\), \(0 \leq i \leq n\), and corresponding \(y\)-values \(y_i\), there is a unique polynomial \(P\) of degree at most \(n\) with \(P(x_i) = y_i\) for \(0 \leq i \leq n\).

(Note: although the degree is typically \(n\), it can be less; as an extreme example, if all \(y_i\) are equal to \(c\), then \(P(x)\) is that constant \(c\).)

Historically there are several methods for finding \(P_n\) and proving its uniqueness, in particular the divided difference method introduced by Newton and the Lagrange polynomial method. However for our purposes, and for most modern needs, a different method is easiest, and it also introduces a strategy that will be of repeated use later in this course: the Method of Undetermined Coefficients or MUC.

In general, this method starts by assuming that the function wanted is a sum of unknown multiples of a collection of known functions. Here, \(P(x) = c_n x^n + c_{n-1} x^{n-1} + \cdots+ c_1 x + c_0 = \sum_{j=0}^n c_j x^j\).
(Note: any of the \(c_i\) could be zero, including \(c_n\), in which case the degree is less than \(n\).)
The unknown factors (\(c_0 \cdots c_n\)) are the undetermined coefficients.

Next one states the problem as a system of equations for these undetermined coefficients, and solves them.
Here, we have \(n+1\) conditions to be met:

\[P(x_i) = \sum_{j=0}^n c_j x_i^j = y_i, \quad 0 \leq i \leq n\]

This is a system of \(n+1\) simultaneous linear equations in \(n+1\) unknowns, so the question of existence and uniqueness is exactly the question of whether the corresponding matrix is nonsingular, and that in turn is equivalent to the case of all \(y_i = 0\) having only the trivial solution with all \(c_j = 0\).

Back in terms of polynomials, this is the claim that the only polynomial of degree at most \(n\) with zeros at all of \(x_0 \dots x_n\) is the zero polynomial. And this is true, because any non-trivial polynomial with those \(n+1\) distinct roots has degree at least \(n+1\), so the only polynomial of degree at most \(n\) fitting this data is \(P(x) \equiv 0\). The theorem is proven.

The proof of this theorem is completely constructive; it gives the only numerical method we need, which is the one implemented in Numpy through the pair of functions numpy.polyfit and numpy.polyval that we will use in examples. (Aside: here as in many places, Numpy mimics the names and functionality of corresponding Matlab tools.)

Briefly, the algorithm is this (indexing from 0!):

  • Create the \((n+1) \times (n+1)\) matrix \(V\) with elements

\[v_{i,j} = x_i^j,\; 0 \leq i \leq n, \, 0 \leq j \leq n\]

and the \(n+1\)-element column vector \(y\) with elements \(y_i\) as above.

  • Solve \(V c = y\) for the vector of coefficients \(c_j\) as above.

I use the name \(V\) because this is called the Vandermonde Matrix.
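
For concreteness, here is a quick sketch of this algorithm in Python, using numpy.vander to build \(V\) and numpy.linalg.solve for the linear system. (The three data points here are made up purely for illustration.)

from numpy import array, vander
from numpy.linalg import solve

x_i = array([1., 2., 3.])           # three distinct nodes, so n = 2
y_i = array([2., 3., 6.])
V = vander(x_i, increasing=True)    # V[i, j] = x_i**j, matching the formula above
c = solve(V, y_i)                   # the coefficients c_0, c_1, ..., c_n
print(c)                            # approximately [3. -2. 1.], so P(x) = 3 - 2x + x^2

One caution: numpy.polyfit, used below, returns the coefficients in the opposite order, highest degree first, which is the order that numpy.polyval expects.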

from numpy import array, linspace, polyfit, polyval, zeros_like, empty_like, exp
from matplotlib.pyplot import figure, plot, title, grid, legend

13.2. Python example with polyfit and polyval from Numpy

As usual, I concoct a first example with known correct answer, by using a polynomial as \(f\).

def f(x):
    return 3*x**4 - 5*x**3 -2*x**2 + 7*x + 4
n = 4
x_nodes = array([1., 2., 7., 5., 4.])  # They do not need to be in order
print(f"The x nodes 'x_i' are {x_nodes}")
y_nodes = empty_like(x_nodes)
for i in range(len(x_nodes)):
    y_nodes[i] = f(x_nodes[i])
print(f"The y values at the nodes are {y_nodes}")
c = polyfit(x_nodes, y_nodes, n)  # Note: n must be specified; we will see why soon
print(f"The coefficients of P are {c}")
P_values = polyval(c, x_nodes)
print(f"The values of P(x_i) are {P_values}")
The x nodes 'x_i' are [1. 2. 7. 5. 4.]
The y values at the nodes are [   7.   18. 5443. 1239.  448.]
The coefficients of P are [ 3. -5. -2.  7.  4.]
The values of P(x_i) are [   7.   18. 5443. 1239.  448.]

13.2.1. Python presentation aside: pretty-print the result

print("P(x) = ", end="")
for j in range(n-1):  # count up to j=n-2 only, omitting the x and constant terms
    print(f"{c[j]:.4}x^{n-j} + ", end="")
print(f"{c[-2]:.4}x + {c[-1]:.4}")
P(x) = 3.0x^4 + -5.0x^3 + -2.0x^2 + 7.0x + 4.0

13.3. Example 2: \(f(x)\) not a polynomial of degree \(\leq n\)

# Reduce the degree of $P$ to at most 3:
n = 3
# Truncate the data to 4 values:
x_nodes = array([0., 2., 3., 4.])
y_nodes = empty_like(x_nodes)
for i in range(len(x_nodes)):
    y_nodes[i] = f(x_nodes[i])
print(f"n is now {n}, the nodes are now {x_nodes}, with f(x_i) values {y_nodes}")
c = polyfit(x_nodes, y_nodes, n)
print(f"The coefficients of P are now {c}")
print("P(x) = ", end="")
for j in range(n-1):  # count up to j=n-2 only, omitting the x and constant terms
    print(f"{c[j]:.4}x^{n-j} + ", end="")
print(f"{c[-2]:.4}x + {c[-1]:.4}")
P_values = polyval(c, x_nodes)
print(f"The values of P(x_i) are now {P_values}")
n is now 3, the nodes are now [0. 2. 3. 4.], with f(x_i) values [  4.  18. 115. 448.]
The coefficients of P are now [ 22. -80.  79.   4.]
P(x) = 22.0x^3 + -80.0x^2 + 79.0x + 4.0
The values of P(x_i) are now [  4.  18. 115. 448.]

There are several ways to assess the accuracy of this fit; we start graphically, and later consider the maximum and root-mean-square (RMS) errors.

x = linspace(0, 4)  # With a default 50 points, for graphing
figure(figsize=[12,8])
plot(x, f(x), label="y=f(x)")
plot(x_nodes, y_nodes, "*", label="nodes")
P_n_x = polyval(c, x)
plot(x, P_n_x, label="y = P_n(x)")
legend()
grid(True)
[Figure: y = f(x), the nodes, and y = P_3(x) on [0, 4]]
P_error = f(x) - P_n_x
figure(figsize=[12,8])
title("Error in P_n(x)")
plot(x, P_error)
plot(x_nodes, zeros_like(x_nodes), "*")
grid(True)
[Figure: the error f(x) - P_3(x) on [0, 4]]
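
To preview those quantitative measures, here is a quick sketch of how both could be computed from the array P_error in the cell just above (the names max_error and rms_error are just for illustration):

from numpy import sqrt, mean

max_error = abs(P_error).max()        # the largest absolute error on this fine grid
rms_error = sqrt(mean(P_error**2))    # the root-mean-square error on the same grid
print(f"Maximum error is about {max_error:.4}; RMS error is about {rms_error:.4}")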

13.4. Example 3: \(f(x)\) not a polynomial at all

def g(x): return exp(x)
a_g = -1
b_g = 1
n = 3
x_g_nodes = linspace(a_g, b_g, n+1)
y_g_nodes = empty_like(x_g_nodes)
for i in range(len(x_g_nodes)):
    y_g_nodes[i] = g(x_g_nodes[i])
print(f"{n=}")
print(f"node x values {x_g_nodes}")
print(f"node y values {y_g_nodes}")
c_g = polyfit(x_g_nodes, y_g_nodes, n)
print("P(x) = ", end="")

for j in range(n-1):  # count up to j=n-2 only, omitting the x and constant terms
    print(f"{c_g[j]:.4}x^{n-j} + ", end="")
print(f"{c_g[-2]:.4}x + {c_g[-1]:.4}")

P_values = polyval(c_g, x_g_nodes)
print(f"The values of P(x_i) are {P_values}")
n=3
node x values [-1.         -0.33333333  0.33333333  1.        ]
node y values [0.36787944 0.71653131 1.39561243 2.71828183]
P(x) = 0.1762x^3 + 0.5479x^2 + 0.999x + 0.9952
The values of P(x_i) are [0.36787944 0.71653131 1.39561243 2.71828183]

There are several ways to assess the accuracy of this fit. We start graphically, and later consider the maximum and root-mean-square (RMS) errors.

x_g = linspace(a_g - 0.25, b_g + 0.25)  # Go a bit beyond the nodes in each direction
figure(figsize=[12,8])
title("With $g(x) = e^x$")
plot(x_g, g(x_g), label="y = $g(x)$")
plot(x_g_nodes, y_g_nodes, "*", label="nodes")
P_n_x = polyval(c_g, x_g)
plot(x_g, P_n_x, label=f"y = $P_{n}(x)$")
legend()
grid(True)
[Figure: g(x) = e^x, the nodes, and y = P_3(x) on [-1.25, 1.25]]
P_error = g(x_g) - P_n_x
figure(figsize=[12,8])
title(f"Error in $P_{n}(x)$ for $g(x) = e^x$")
plot(x_g, P_error)
plot(x_g_nodes, zeros_like(x_g_nodes), "*")
grid(True)
[Figure: the error g(x) - P_3(x) for g(x) = e^x]

13.5. Why the maximum degree \(n\) must be specified in polyfit

The function numpy.polyfit can also give a polynomial of lower degree, in which case it is in general no longer an exact fit; we consider this next, returning to the quartic \(y = f(x)\) and the four nodes used in Example 2.

m = 1  # m is the maximum degree of the polynomial: m < n now
c_m = polyfit(x_nodes, y_nodes, m)
print(f"The coefficients of P are {c_m}")

P_m_x = polyval(c_m, x)

# Try to print P(x) again:
print(f"P_{m}(x) = ", end="")
for j in range(m-1):
    print(f"{c_m[j]:.4}x^{m-j} + ", end="")
print(f"{c_m[-2]:.4}x + {c_m[-1]:.4}")
print(f"The values of P(x_i) are {P_values}")

figure(figsize=[12,8])
title("With $f(x)$")
plot(x, f(x), label="$y=f(x)$")
plot(x_nodes, y_nodes, "*", label="nodes")
plot(x, P_m_x, label=f"y=$P_{m}(x)$")
legend()
grid(True)
The coefficients of P are [ 97.91428571 -74.05714286]
P_1(x) = 97.91x + -74.06
The values of P_1(x_i) are [-74.05714286 121.77142857 219.68571429 317.6       ]
[Figure: y = f(x), the nodes, and the least-squares line y = P_1(x)]
P_error = f(x) - P_m_x
figure(figsize=[12,8])
title("Error in $P_1(x)$ for $f(x)$")
plot(x, P_error)
plot(x_nodes, zeros_like(x_nodes), "*")
grid(True)
[Figure: the error f(x) - P_1(x)]

13.6. What is the degree \(m\) polynomial given by polyfit(x, y, m)? Least squares fitting.

Another form of polynomial approximation is to find the polynomial of degree at most \(m\) that gives the “best fit” to more data points than the \(m+1\) coefficients can in general fit exactly. Then one cannot demand zero error at each point, and so one instead aims to minimize some overall measure of error.

Given a function \(P(x)\) intended to approximately fit a set of data points \((x_i, y_i)\), the approximation errors are the values \(E_i = y_i - P(x_i)\) at each node. Considering these as forming a vector \(E\), two error minimization strategies are most often used:

  1. Minimize the worst case error, by minimizing the largest of the absolute errors \(|E_i|\) — this is choosing \(P\) to minimize \(\| E \|_\infty\), often called Chebyshev approximation.

  2. Minimize the average error in the root-mean-square sense — this is minimizing the Euclidean norm \(\| E \|_2\).

Of these two, the Euclidean norm (RMS error) method is by far the easiest to work with, giving a simple linear algebra solution.

For now, a quick sketch of the method; the justification will come soon.

If as above we seek a polynomial of degree at most \(m\) with \(m < n\), we still get a system of equations \(V c = y\), but now \(V\) is \((n+1) \times (m+1)\), \(c\) has \(m+1\) components and \(y\) has \(n+1\), so the system is overdetermined.

At this point the “+1” bits are annoying, so we will change notation and consider a more general setting:

Given an \(n \times m\) matrix \(A\), \(n \geq m\), and an \(n\)-vector \(b\), seek the \(m\)-vector \(x\) that is the best approximate solution of \(Ax = b\) in the sense that the Euclidean norm of the residual \(Ax - b\) is as small as possible. It turns out that the solution comes from multiplying through by \(A^T\), getting

\[(A^T A) x = A^T b\]

which is now an \(m \times m\) system of equations, known as the normal equations. So long as the columns of \(A\) are linearly independent (as they are for a Vandermonde matrix with distinct nodes), this system has a unique solution — and it is the “right” one!
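
As a quick sketch of this in Python (with made-up data for a straight-line fit; the names here are only for illustration), one can assemble and solve the normal equations directly and compare with polyfit:

from numpy import array, polyfit, vander
from numpy.linalg import solve

x_i = array([0., 1., 2., 3.])
b = array([0., 2., 3., 5.])
A = vander(x_i, 2, increasing=True)    # columns 1 and x_i, so A is 4 x 2 and the fit is a straight line
x_ls = solve(A.T @ A, A.T @ b)         # solve the normal equations (A^T A) x = A^T b
print(x_ls)                            # approximately [0.1 1.6]: intercept and slope of the best-fit line
print(polyfit(x_i, b, 1))              # the same line from polyfit, coefficients given highest degree first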


This work is licensed under Creative Commons Attribution-ShareAlike 4.0 International