2.6. The Convergence Rate of Newton’s Method#

References:

  • Section 1.4.1 Quadratic Convergence of Newton’s Method in [Sauer, 2019].

  • Theorem 2.9 in Section 2.4 Error Analysis of Iterative Methods in [Burden et al., 2016], but done quite differently.

Jumping to the punch line, we will see that when the iterates \(x_k\) given by Newton’s method converge to a simple root \(r\) (that is, a solution of \(f(r) = 0\) with \(f'(r) \neq 0\)) then the errors \(E_k = x_k - r\) satisfy

\[E_{k+1} = O(E_k^2) \text{ and therefore } E_{k+1} = o(E_k)\]

In words, the error at each iteration is of the order of the square of the previous error, and so is small of order the previous error.

(Yes, it this a slight abuse of the notation as defined above, but all will become clear and rigorous soon.)

The first key step is getting a recursive relationship between consecutive errors \(E_k\) and \(E_{k+1}\) from the recursion formula for Newton’s method,

\[x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}.\]

Start by subtracting \(r\):

\[E_{k+1} = x_{k+1} - r = x_k - \frac{f(x_k)}{f'(x_k)} - r = E_k - \frac{f(x_k)}{f'(x_k)}\]

The other key step is to show that the two terms at right are very close, using the linearization of \(f\) at \(x_k\) with the error \(E_k\) as the small term \(h\), and noting that \(r = x_k - E_k\):

\[0 = f(r) = f(x_k - E_k) = f(x_k) - f'(x_k) E_k + O(E_k^2)\]

Solve for \(f(x_k)\) to insert into the numerator above: \(f(x_k) = f'(x_k) E_k + O(E_k^2)\). (There is no need for a minus sign on that last term; big-O terms can be of either sign, and this new one is a different but still small enough quantity!)

Inserting above,

\[ E_{k+1} = E_k - \frac{f'(x_k) E_k + O(E_k^2)}{f'(x_k)} = E_k - E_k + \frac{O(E_k^2)}{f'(x_k)} = \frac{O(E_k^2)}{f'(x_k)} \to \frac{O(e_k^2)}{f'(r)} = O(E_k^2) \]

As \(k \to \infty\), \(f'(E_k) \to f'(r) \neq 0\), so the term at right is still no larger than a multiple of \(E_k^2\): it is \(O(E_k^2)\), as claimed.

If you wish to verify this more carefully, note that

  • this \(O(E_k^2)\) term is no bigger than \(\frac{M}{2} E_k^2\) where \(M\) is an upper bound on \(|f''(x)|\), and

  • once \(E_k\) is small enough, so that \(x_k\) is close enough to \(r\), \(|f'(x_k)| \geq |f'(r)|/2\).

Thus the term \(\displaystyle \frac{O(E_k^2)}{f'(x_k)}\) has magnitude no bigger than \(\displaystyle \frac{M/2}{|f'(r)|/2} E_k^2 = \frac{M}{|f'(r)|} E_k^2\), which meets the definition of being of order \(E_k^2\).

A more careful calculation actually shows that

\[ \lim_{k \to \infty} \frac{|E_{k+1}|}{E_k^2} = \left| \frac{f''(r)}{2 f'(r)} \right|, \]

which is the way that this result is often stated in texts. For either form, it then easily follows that

\[\lim_{k \to \infty} \frac{|E_{k+1}|}{|E_k|} = 0,\]

giving the super-linear convergence already seen using the Contraction Mapping Theorem, now restated as \(E_{k+1} = o(E_k)\).

A Practical error estimate for fast-converging iterative methods#

One problem for Newton’s Method (and many other numerical methods we will see) is that there is not a simple way to get a guaranteed upper bound on the absolute error in an approximation. Our best hope is finding an interval that is guaranteed to contain the solution, as the Bisection Method does, and we can sometimes also do that with Newton’s Method for a real root. But that approach fails as soon as the solution is a complex number or a vector.

Fortunately, when convergnce is “fast enough” is some sense, the following heuristic or “rule of thumb” applies in many cases:

The error in the latest approximation is typically smaller than the difference between the two most recent approximations.

When combined with the backward error, this can give a fairly reliable measure of accuracy, and so can serve as a fairly reliable stopping condition for the loop in an iterative calculation.

When is a fixed point iteration “fast enough” for this heuristic?#

This heuristic can be shown to be reliable in several important cases:

Proposition 2.4

For the iterations \(x_k\) given by a contraction mapping that has \(C \leq 1/2\),

\[|E_k| \leq |x_k - x_{k-1}|,\]

or in words the error in \(x_k\) is smaller than the change from \(x_{k-1}\) to \(x_k\), so the above guideline is valid.

Proposition 2.5

For a super-linearly convergent iteration, eventually \(|E_{k+1}|/|E_{k}| < 1/2\), and from that point onwards in the iterations, the above applies again.

I leave verification as an exercise, or if you wish, to discuss in class.