5. Measures of Error and Order of Convergence

Last Revised on Monday January 25

References:

  • Section 1.3.1 Forward and backward error in Sauer

  • Section 2.4 of Burden & Faires; just the part on order of convergence.

These notes cover a number of small topics:

  • Measures of error: absolute, relative, forward, backward, etc.

  • Measuring the rate of convergence of a sequence of approximations.

  • Big-O and little-o notation for describing how small a quantity (usually an error) is.

5.1. Error measures

Consider a quantity \(\tilde{x}\) that serves as an approximation of an exact value \(x\). (This can be a number or a vector.)

Definition 1: Error

The error in \(\tilde{x}\) is \(\tilde{x} - x\) (or \(x - \tilde{x}\); different sources use both versions, and either is fine so long as you use it consistently).

Definition 2: Absolute Error

The absolute error is the absolute value of the error: \(|\tilde{x} - x|\). For vector quantities this means the norm \(\| \tilde{x} - x \|\), and it can be any norm, so long as we again choose one and use it consistently. Two favorites are the Euclidean norm \(\| x \| = \sqrt{\sum |x_i|^2}\), denoted \(\| x \|_2\), and the maximum norm (also mysteriously known as the infinity norm):

\[\| x \|_\text{max} = \| x \|_\infty = \max_i |x_i|.\]

For real-valued quantities, the absolute error is related to the number of correct decimal places: \(p\) decimal places of accuracy corresponds roughly to absolute error no more than \(0.5 \times 10^{-p}\).

Definition 3: Relative Error

The relative error is the ratio of the absolute error to the size of the exact quantity: \(\displaystyle\frac{| \tilde{x} - x |}{|x|}\) (again possibly with vector norms). This is often more relevant than absolute error for inherently positive quantities, but is obviously unwise where \(x = 0\) is a possibility. For real-valued quantities, this is related to the number of significant digits: accuracy to \(p\) significant digits corresponds roughly to relative error no more than \(0.5 \times 10^{-p}\).

When working with computer arithmetic, \(p\) significant bits corresponds to relative error no more than \(2^{-(p+1)}\).
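To make these measures concrete, here is a minimal Python sketch (assuming NumPy is available; the sample values are purely illustrative) that computes absolute and relative errors in the two vector norms mentioned above:

```python
import numpy as np

# Illustrative values: an exact quantity x and an approximation x_tilde of it.
x = np.array([1.0, 2.0, 3.0])
x_tilde = np.array([1.001, 1.998, 3.002])

# Absolute error, in the Euclidean norm and in the maximum norm.
absolute_error_2 = np.linalg.norm(x_tilde - x, 2)
absolute_error_max = np.linalg.norm(x_tilde - x, np.inf)

# Relative error: the ratio of the absolute error to the size of the exact value.
relative_error_2 = absolute_error_2 / np.linalg.norm(x, 2)

print(f"Absolute error (Euclidean norm): {absolute_error_2:.3e}")
print(f"Absolute error (maximum norm):   {absolute_error_max:.3e}")
print(f"Relative error (Euclidean norm): {relative_error_2:.3e}")
```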

5.1.1. Backward error (and forward error)

An obvious problem is that we usually do not know the exact solution \(x\), so we cannot evaluate any of these error measures; instead, we typically seek upper bounds on the absolute or relative error. Thus, when talking of approximate solutions to an equation \(f(x) = 0\), the concept of backward error can be very useful, for example as a step in getting bounds on the size of the error.

Definition 4: Backward Error

The backward error in \(\tilde{x}\) as an approximate solution of the equation \(f(x) = 0\) is \(f(\tilde{x})\): the amount by which the function \(f\) would have to be changed in order for \(\tilde{x}\) to be an exact root.

For the case of solving simultaneous linear equations in matrix-vector form \(A x = b\), this is \(b - A \tilde{x}\), also known as the residual.

Definition 5: Absolute Backward Error

The absolute backward error is — as you might have guessed — the absolute value of the backward error: \(|f(\tilde{x})|\). This is sometimes also called simply the backward error. (The usual disclaimer about vector quantities applies.)

For the case of solving simultaneous linear equations in matrix-vector form \(A x = b\), this is \(\| b - A \tilde{x} \|\), also known as the residual norm.

Notes

  • One obvious advantage of the backward error concept is that you can actually evaluate it without knowing the exact solution \(x\); see the sketch following these notes.

  • Also, one significance of backward error is that if the values of \(f(x)\) are only known to be accurate within an absolute error of \(E\) then any approximation with absolute backward error less than \(E\) could in fact be exact, so there is no point in seeking greater accuracy.

  • The names forward error and absolute forward error are sometimes used as synonyms for error etc. as defined above, when they need to be distinguished from backward errors.
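As the first note observes, the backward error can be evaluated directly, with no knowledge of the exact solution. Here is a minimal Python sketch (assuming NumPy; the example equation, linear system, and approximate values are illustrative only):

```python
import numpy as np

# Backward error for an approximate root of a scalar equation f(x) = 0.
def f(x):
    return x**2 - 2.0  # the exact positive root is sqrt(2)

x_tilde = 1.41421  # an approximate root
print(f"Absolute backward error: {abs(f(x_tilde)):.3e}")

# Residual norm for an approximate solution of a linear system A x = b,
# whose exact solution here is [1, 1].
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 4.0])
x_approx = np.array([1.0001, 0.9999])
print(f"Residual norm: {np.linalg.norm(b - A @ x_approx):.3e}")
```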

5.2. Order of convergence of a sequence of approximations

We have seen that for the sequence of approximations \(x_k\) to a quantity \(x\) given by the fixed point iteration \(x_{k+1} = g(x_k)\), the absolute errors \(E_k := |x_k - x|\) typically have

\[\frac{E_{k+1}}{E_k} \to C = |g'(x)|,\]

so that eventually the errors diminish in a roughly geometric fashion: \(E_k \approx K C^k\). This is called linear convergence.

Aside: Why “linear” rather than “geometric”? Because there is an approximately linear relationship between consecutive error values,

\[E_{k+1} \approx C E_k.\]

This is a very common behavior for iterative numerical methods, but we will also see that a few methods do even better; for example, when Newton’s method converges to a simple root \(r\) of \(f\) (one with \(f'(r) \neq 0\)),

\[E_{k+1} \approx C E_k^2.\]

This is called quadratic convergence. More generally, convergence of order \(p\) means that

\[E_{k+1} \approx C E_k^p, \text{ or more precisely, } \lim_{k \to \infty} \frac{E_{k+1}}{E_k^p} \text{ is finite.}\]

We have already observed experimentally the intermediate result that “\(C = 0\)” for Newton’s method in this case; that is,

\[\frac{E_{k+1}}{E_k} \to 0.\]

This is called super-linear convergence and includes any situation with order of convergence \(p > 1\).

For most practical purposes, if you have established super-linear convergence, you can be happy, and not worry much about refinements like the particular order \(p\).
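That said, the order \(p\) can be estimated experimentally: taking logarithms in \(E_{k+1} \approx C E_k^p\) suggests the estimate \(p \approx \log(E_{k+1}/E_k) / \log(E_k/E_{k-1})\). Here is a minimal Python sketch of this idea (assuming NumPy; the helper `estimate_order` and the test equation are illustrative, not part of these notes):

```python
import numpy as np

def estimate_order(errors):
    """Estimate the order of convergence p from successive absolute errors,
    using E_{k+1} ≈ C E_k^p, which gives
    p ≈ log(E_{k+1}/E_k) / log(E_k/E_{k-1})."""
    E = np.asarray(errors)
    return np.log(E[2:] / E[1:-1]) / np.log(E[1:-1] / E[:-2])

# Newton's method for f(x) = x^2 - 2, converging to the simple root sqrt(2).
x = 1.0
errors = []
for _ in range(4):
    x = x - (x**2 - 2.0) / (2.0 * x)
    errors.append(abs(x - np.sqrt(2.0)))

print(estimate_order(errors))  # the estimates approach p = 2
```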

5.3. Big-O and little-o notation

Consider the error formula for approximation of a function \(f\) with the Taylor polynomial of degree \(n\), center \(a\):

\[ | f(a+h) - T_n(h) | \leq \frac{M_{n+1}}{(n+1)!} |h|^{n+1}, \text{ where } M_{n+1} = \max |f^{(n+1)}(x)|, \]

the maximum being taken over the interval between \(a\) and \(a+h\).

Since the coefficient of \(h^{n+1}\) is typically not known in practice, it is wise to focus on the power law part, and for this the “big-O” and “little-o” notation is convenient.

If a function \(E(h)\) goes to zero at least as fast as \(h^p\), we say that it is of order \(h^p\), written \(O(h^p)\).

More precisely, \(E(h)\) is no bigger than a multiple of \(h^p\) for \(h\) small enough; that is, there are a constant \(C\) and a positive number \(\delta\) such that

\[\frac{|E(h)|}{|h|^p} \leq C \text{ for } |h| < \delta.\]

Another way to say this is in terms of the lim-sup, if you have seen that jargon:

\[\limsup_{h \to 0} \frac{|E(h)|}{|h|^p} \text{ is finite.}\]

This can be used to rephrase the above Taylor’s theorem error bound as

\[f(x) - T_n(x) = O(|x-a|^{n+1})\]

or

\[f(a+h) - T_n(h) = O(h^{n+1}),\]

and for the case of the linearization,

\[ f(a+h) - L(a+h) = f(a+h) - (f(a) + f'(a) h) = O(h^2). \]
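As a quick numerical check of this last statement: if the linearization error is \(O(h^2)\), the ratio \(E(h)/h^2\) should stay bounded as \(h \to 0\). A minimal Python sketch, using the illustrative example \(f(x) = e^x\) at \(a = 0\) (assuming NumPy):

```python
import numpy as np

# If the linearization error E(h) = |f(a+h) - L(a+h)| is O(h^2), then E(h)/h^2
# stays bounded as h shrinks. Test with f(x) = e^x at a = 0, so L(a+h) = 1 + h;
# the ratio approaches f''(0)/2 = 0.5.
for h in [0.1, 0.05, 0.025, 0.0125]:
    E = abs(np.exp(h) - (1.0 + h))
    print(f"h = {h:.4f}: E(h) = {E:.3e}, E(h)/h^2 = {E / h**2:.4f}")
```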

5.3.1. Little-o notation, for “negligibly small terms”

Sometimes it is enough to say that some error term is small enough to be neglected, at least when \(h\) is close enough to zero. For example, with a Taylor series we might be able to neglect the powers of \(x-a\) or of \(h\) higher than \(p\).

We will thus say that a quantity \(E(h)\) is small of order \(h^p\), written \(o(h^p)\), when

\[\lim_{h \to 0} \frac{|E(h)|}{|h|^p} = 0.\]

Note the addition of the word “small”, compared to the description of the big-O case above!

With this, the Taylor’s theorem error bound can be stated as

\[f(x) - T_n(x) = o(|x-a|^n),\]

or

\[f(a+h) - T_n(h) = o(h^n),\]

and for the case of the linearization,

\[f(a+h) - L(a+h) = f(a+h) - (f(a) + f'(a) h) = o(h).\]
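The corresponding numerical check for this little-o statement is that \(E(h)/h \to 0\) as \(h \to 0\); a minimal Python sketch, with the same illustrative example (assuming NumPy):

```python
import numpy as np

# The little-o statement f(a+h) - L(a+h) = o(h) says E(h)/h -> 0 as h -> 0.
# Again with f(x) = e^x at a = 0, the ratio is roughly h/2, which vanishes:
for h in [0.1, 0.01, 0.001, 0.0001]:
    E = abs(np.exp(h) - (1.0 + h))
    print(f"h = {h:.6f}: E(h)/h = {E / h:.3e}")
```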

This work is licensed under Creative Commons Attribution-ShareAlike 4.0 International