3.10. Solving Nonlinear Systems of Equations by generalizations of Newton’s Method — a brief introduction#

References:

  • Section 2.7 Nonlinear Systems of Equations of [Sauer, 2019]; in particular Sub-section 2.7.1 Multivariate Newton’s Method.

  • Chapter 10 Numerical Solution of Nonlinear Systems of Equations of [Burden et al., 2016]; in particular Sections 10.1 and 10.2.

Background#

A system of simultaneous nonlinear equations

\[ F(x) = 0, \; F:\mathbb{R}^n \to \mathbb{R}^n \]

can be solved by a variant of Newton’s method.

Remark 3.22 (Some notation)

For the sake of emphasise analogies between results for vector-valued quantities and what we have already seen for real- or compex-valued quantities, several notational choices are made in thes notes:

  • Using the same notation for vectors as for real or complex numbers (no over arrows or bold face).

  • Often denoting the derivative of function \(f\) as \(Df\) rather than \(f'\), and higher derivatives as \(D^p f\) rather than \(f^{(p)}\). Partial derivatives of \(f(x, y, \dots)\) are \(D_x f\), \(D_y f\), etc., and for vector arguments \(x = (x_1, \dots , x_j, \dots x_n)\), they are \(D_{x_1} f, \dots , D_{x_j} f , \dots , D_{x_n} f\), or more concisely, \(D_1 f \dots , D_j f , \dots ,D_n f\). (This also fits better with Julia code notation — even for partial derivatives.)

  • Subscripts will mostly be reserved for components of vectors, labelling the terms of a sequence with superscripts, \(x^{(k)}\) and such.

  • Explicit division is avoided.

However, I use capital letters for vector-valued functions, for analogy to the use of capital letter for matrices.

Rewriting Newton’s method according to this new style, $\(x^{(k+1)} = x^{(k)} - \frac{f(x^{(k)})}{Df(x^{(k)})}\)$

or to avoid explicit division and introducing the useful increment \(\delta^{(k)} := x^{(k+1)} - x^{(k)}\),

\[Df(x^{(k)}) (\delta^{(k)}) = f(x^{(k)}), \quad x^{(k+1)} = x^{(k)} + \delta^{(k)}.\]

Newton’s method iteration formula for systems#

For vector valued functions, we will see in a while that an analogous result is true:

\[ (D \textbf{F}(x^{(k)}) ({\delta}^{(k)}) = \textbf{F}(x^{(k)}), \quad x^{(k+1)} = x^{(k)} + \delta^{(k)} \]

where \(DF(x)\) is the \(n \times n\) matrix of all the partial derivatives \((D_{x_j} F_i)(x)\) or \((D_j F_i)(x)\), where \(x = (x_1, x_2, \dots, x_n).\)

Justification: linearization for function of several variables#

To justify the above result, we need at least a case of Taylor’s Theorem for functions of several variables, for both \(f:\mathbb{R}^n \to \mathbb{R}\) and \(F:\mathbb{R}^n \to \mathbb{R}^n\); just for linear approximations. This material from multi-variable calculus will be reviewed when we need it.

Warning: although mathematically this can be written with matrix inverses as

\[ \textbf{X}^{(k+1)} = \textbf{X}^{(k)} - (D \textbf{F}(\textbf{X}^{(k)})^{-1}(\textbf{F}(\textbf{X}^{(k)}), \]

evaluation of the inverse is in general about three times slower than solving the linear system, so is best avoided. (We have seen a good compromise; Solving Ax = b with LU factorization the LU factorization of a matrix.)

Even avoiding matrix inversion, this involves repeatedly solving systems of \(n\) simultaneous linear equations in \(n\) unknowns, \(A x = b\), where the matrix \(A\) is \(D\textbf{F}(x^{(k)})\), and that will be seen to involve about \(n^3/3\) arithmetic operations.

It also requires computing the new values of these \(n^2\) partial derivatives at each iteration, also potentially with a cost proportional to \(n^3\).

When \(n\) is large, as is common with differential equations problems, this factor of \(n^3\) indicates a potentially very large cost per iteration, so various modifications have been developed to reduce the computational cost of each iteration (with the trade-off being that more iterations are typically needed): so-called quasi-Newton methods.