26. Simultaneous Linear Equations, Part 5: Error bounds for linear algebra, condition numbers, matrix norms, etc.

Version of April 13, 2021

References:

  • Section 2.3.1 Error Magnification and Condition Number in Sauer

  • Section 7.5 Error Bounds and Iterative Refinement in Burden&Faires — but the last part, on Iterative Refinement, is not relevant here.

  • Section 8.4 in Cheney&Kincaid

26.1. Residuals, backward errors, forward errors, and condition numbers

For an approximation \(x_a\) of the solution \(x\) of \(A x = b\), the residual \(r = A x_a - b\) measures the backward error, often summarized by a single number, the residual norm \(\| A x_a - b \|\). Any norm could be used, but the maximum norm is usually preferred, for reasons that we will see soon.

The corresponding (dimensionless) measure of relative error is defined as

\[\frac{\|r\|}{\|b\|}.\]

However, these can greatly underestimate the forward errors in the solution: the absolute error \(\|x - x_a\|\) and relative error

\[\text{Rel}(x_a) = \frac{\|x - x_a\|}{\| x \|}.\]
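
For a concrete feel for these quantities, here is a minimal sketch with a tiny system; the matrix, the exact solution, and the "approximate solution" are all just made up for illustration.

import numpy as np
from numpy.linalg import norm

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])            # an arbitrary small example
x = np.array([1.0, 1.0])              # the exact solution, chosen in advance
b = A @ x                             # so that A x = b exactly

x_a = np.array([1.01, 0.99])          # an imagined approximate solution
r = A @ x_a - b                       # the residual
print(f"residual norm ||r|| = {norm(r, np.inf):.3g}")
print(f"relative backward error ||r||/||b|| = {norm(r, np.inf)/norm(b, np.inf):.3g}")
print(f"relative (forward) error ||x - x_a||/||x|| = {norm(x - x_a, np.inf)/norm(x, np.inf):.3g}")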

To relate these to the residual, we need the concepts of a matrix norm and the condition number of a matrix.

26.2. Matrix norms induced by vector norms

Given any vector norm \(\| \cdot \|\) — such as the maximum (“infinity”) norm \(\| \cdot \|_\infty\) or the Euclidean norm (length) \(\| \cdot \|_2\) — the corresponding induced matrix norm is

\[ \| A \| := \max_{x \neq 0} \frac{\| Ax \|}{\| x \|} = \max_{\|x\|=1} \| Ax \| \]

This maximum exists for either of these vector norms, and for the infinity norm there is an explicit formula for it: for any \(m\times n\) matrix,

\[ \|A\|_\infty = \max_{1 \le i \le m} \sum_{j=1}^n |a_{ij}| \]

(On the other hand, it is far harder to compute the Euclidean norm of a matrix: the formula requires computing eigenvalues.)

Note that when the matrix is a vector considered as a matrix with a single column — so \(n=1\) — the sum goes away, and this agrees with the infinity vector norm. This allows us to consider vectors as being just matrices with a single column, which we will often do from now on.
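
As a quick sanity check of this row-sum formula, here is a small sketch using an arbitrary example matrix; numpy.linalg.norm (discussed in the Python note below) is used for comparison.

import numpy as np

A = np.array([[1.0, -2.0, 3.0],
              [4.0,  5.0, -6.0]])             # an arbitrary 2 x 3 example

row_sums = np.sum(np.abs(A), axis=1)          # sum of absolute values in each row
print(f"row sums: {row_sums}")                # [ 6. 15.]
print(f"max row sum: {np.max(row_sums)}")     # 15.0
print(f"numpy.linalg.norm(A, np.inf): {np.linalg.norm(A, np.inf)}")   # also 15.0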

26.3. Properties of (induced) matrix norms

These induced matrix norms have many properties in common with Euclidean length and other vector norms, but since matrices can also be multiplied, one has to be careful with products.

  1. \(\|A\| \geq 0\) (positivity)

  2. \(\| A \| = 0\) if and only if \(A = 0\) (definiteness)

  3. \(\| c A \| = |c| \, \|A\|\) for any constant \(c\) (absolute homogeneity)

  4. \(\| A + B \| \leq \| A \| + \| B \|\) (sub-additivity or the triangle inequality),
    and when the product of two matrices makes sense (including matrix-vector products),

  5. \(\| A B \| \leq \| A \| \, \| B \|\) (sub-multiplicativity)

Note the failure to always have equality with products. Indeed one can have \(A B = 0\) with \(A\) and \(B\) both non-zero, such as when \(A\) is a singular matrix and \(B\) is a null-vector for it.
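
For example, here is a small illustration with an arbitrary singular matrix and a null vector of it, written as a one-column matrix:

import numpy as np
from numpy.linalg import norm

A = np.array([[1.0, 1.0],
              [1.0, 1.0]])        # singular: the two rows are equal
B = np.array([[1.0],
              [-1.0]])            # a null vector of A, as a one-column matrix

print(norm(A @ B, np.inf))                  # 0.0, since A B = 0
print(norm(A, np.inf) * norm(B, np.inf))    # 2.0, so the inequality is strict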

Aside: There are other matrix norms of use in some contexts, in particular the Frobenius norm. The above properties are then often used to define what it is to be a matrix norm, much as the first four define what it is to be a vector norm.

26.3.1. Python Note: numpy.linalg.norm

Python package Numpy provides the function numpy.linalg.norm for evaluating matrix norms. Beware that for a matrix the default usage numpy.linalg.norm(A) computes the Frobenius norm rather than an induced norm; the induced Euclidean norm \(\| A \|_2\) is given by numpy.linalg.norm(A, 2), and the maximum norm \(\| A \|_\infty\) by numpy.linalg.norm(A, numpy.inf).
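
A small illustration of these options; the matrix here is arbitrary, chosen so that the values are easy to check by hand.

import numpy as np
from numpy.linalg import norm

A = np.array([[3.0, 4.0],
              [0.0, 0.0]])        # an arbitrary example

print(norm(A))          # 5.0: the default for a matrix is the Frobenius norm
print(norm(A, 2))       # 5.0: the induced 2-norm (largest singular value)
print(norm(A, np.inf))  # 7.0: the maximum absolute row sum

v = np.array([3.0, 4.0])
print(norm(v))          # 5.0: for a vector the default is the Euclidean length
print(norm(v, np.inf))  # 4.0: the maximum norm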

26.4. Relative error bound and condition number

It can be proven that, for any choice of norm,

\[\text{Rel}(x_a) = \frac{\|x - x_a\|}{\| x \|} \leq \|A\|\|A^{-1}\| \frac{\|r\|}{\|b\|},\]

where the last factor \(\displaystyle \frac{\|r\|}{\|b\|}\) is the relative backward error.

Since we can (though often with considerable effort, due to the inverse!) compute the right-hand side when the infinity norm is used, we can compute an upper bound on the relative error. From this, an upper bound on the absolute error can be computed if needed.

The growth factor between the relative backward error measured by the residual and the relative (forward) error is called the condition number, \(\kappa(A)\):

\[\kappa(A) := \|A\| \|A^{-1}\|\]

so that the above bound on the relative error can be restated as

\[\text{Rel}(x_a) = \frac{\|x - x_a\|}{\| x \|} \leq \kappa(A) \frac{\|r\|}{\|b\|}\]

Strictly speaking, there is a different condition number for each choice of norm; we will mostly work with

\[\kappa_\infty(A) := \|A\|_\infty \|A^{-1}\|_\infty\]

Note that for a singular matrix, this is undefined: we can intuitively say that the condition number is then infinite.
At the other extreme, the identity matrix \(I\) has norm 1 and condition number 1 (using any norm), and this is the best possible because in general \(\kappa(A) \geq 1\). (This follows from property 5, sub-multiplicativity.)

26.4.1. Aside: estimating \(\|A^{-1}\|_\infty\) and thence the condition number, and numpy.linalg.cond

In Python, good approximations of condition numbers are given by the function numpy.linalg.cond.
As with numpy.linalg.norm, the default numpy.linalg.cond(A) gives \(\kappa_2(A)\), based on the Euclidean length \(\| \cdot \|_2\) for vectors; to get the infinity norm version \(\kappa_\infty(A)\) use numpy.linalg.cond(A, numpy.inf).
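
For example (the second matrix is an arbitrary, nearly singular example, chosen just for illustration):

import numpy as np
from numpy.linalg import cond

print(cond(np.eye(3), np.inf))      # 1.0: the identity is perfectly conditioned

A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])       # nearly singular
print(cond(A, np.inf))              # about 4.0e4: small residuals can hide large errors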

This need not be done exactly: computing the inverse is a lot of work for large matrices, and good estimates can be obtained far more quickly. The basic idea is to start with the formula

\[\| A^{-1} \| = \max_{\|x\|=1} \| A ^{-1} x \|\]

and instead take the maximum over some finite selection of values for \(x\): call them \(x^{(k)}\). Then to evaluate \(y^{(k)} = A ^{-1} x^{(k)}\), express this through the equation \(A y^{(k)} = x^{(k)}\). Once we have an LU factorization of \(A\) (which one probably has already when exploring errors in a numerical solution of \(Ax = b\)), each of these systems can be solved relatively quickly. Then

\[\| A^{-1} \| \approx \max_k \| y^{(k)} \|.\]
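
Here is a minimal sketch of this sampling strategy. It reuses the helper functions lu, forwardSubstitution and backwardSubstitution from numerical_methods_module exactly as in the code below; the function name estimate_inverse_norm and the use of random sample vectors are just for illustration, not the algorithm used by any particular library.

import numpy as np
from numpy.linalg import norm
from numpy.random import random
from numerical_methods_module import lu, forwardSubstitution, backwardSubstitution

def estimate_inverse_norm(A, n_samples=20):
    """Illustrative sketch: roughly estimate ||A^{-1}||_inf by maximizing
    ||A^{-1} x||_inf over a few random vectors x with ||x||_inf = 1."""
    n = len(A)
    (L, U) = lu(A)                       # factor once, reuse for every sample
    estimate = 0.0
    for _ in range(n_samples):
        x = 2.0 * random(n) - 1.0        # random entries in [-1, 1]
        x /= norm(x, np.inf)             # rescale so that ||x||_inf = 1
        c = forwardSubstitution(L, x)    # solve L c = x ...
        y = backwardSubstitution(U, c)   # ... then U y = c, so that A y = x
        estimate = max(estimate, norm(y, np.inf))
    return estimate                      # a lower bound on ||A^{-1}||_inf

Since the maximum is taken over only a finite sample, this can only underestimate \(\|A^{-1}\|_\infty\); more refined estimators choose the vectors \(x^{(k)}\) adaptively rather than at random.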

26.5. Well-conditioned and ill-conditioned problems and matrices

Condition numbers, giving an upper limit on the ratio of forward error to backward error, measure the amplification of errors, and have counterparts in other contexts. For example, for an approximation \(r_a\) of a root \(r\) of the equation \(f(x) = 0\), the ratio of forward error to backward error is bounded by \(\displaystyle \max 1/| f'(x) | = \frac{1}{\min | f'(x) |}\), where the maximum need only be taken over an interval known to contain both the root and the approximation. This condition number becomes “infinite” for a multiple root, \(f'(r) = 0\), which is related to the problems we have seen in that case.
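
To see how extreme this can get at a multiple root, consider \(f(x) = x^2\), which has a double root at \(r = 0\):

def f(x):
    return x**2     # double root at r = 0, so f'(r) = 0

r = 0.0             # the exact root
r_a = 1.0e-8        # an approximate root
print(f"backward error |f(r_a)| = {abs(f(r_a)):.1e}")     # 1.0e-16: looks excellent
print(f"forward error  |r - r_a| = {abs(r - r_a):.1e}")   # 1.0e-08: far larger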

Careful calculation of an approximate solution \(x_a\) of \(Ax = b\) can often get a residual that is at the level of machine rounding error, so that roughly the relative backward error is of size comparable to the machine unit, \(u\). The condition number then guarantees that the (forward) relative error is no greater than about \(u \, \kappa(A)\).

In terms of significant bits, with \(p\)-bit machine arithmetic, one can hope to get \(p - \log_2(\kappa(A))\) significant bits in the result, but cannot rely on more; that is, one loses \(\log_2(\kappa(A))\) significant bits. Compare this to the observation that one can expect to lose at least \(p/2\) significant bits when using the approximation \(Df(x) \approx D_hf(x) = (f(x+h) - f(x))/h\).
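
As a rough illustration of this bookkeeping in IEEE-64 arithmetic (\(p = 53\)); the condition number \(10^8\) here is just a hypothetical value:

import numpy as np

p = 53                         # significant bits in IEEE-64 arithmetic
kappa = 1.0e8                  # a hypothetical condition number
bits_lost = np.log2(kappa)     # about 26.6 bits
bits_left = p - bits_lost      # about 26.4 bits
print(f"roughly {bits_left:.1f} significant bits remain,")
print(f"or about {bits_left * np.log10(2):.1f} significant decimal digits")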

A well-conditioned problem is one that is not too highly sensitive to errors in rounding or input data; for an equation \(Ax = b\), this corresponds to the condition number of \(A\) not being too large; the matrix \(A\) is then sometimes also called well-conditioned. This is of course vague, but might typically mean that \(p - \log_2(\kappa(A))\) is a sufficient number of significant bits for a particular purpose.

A problem that is not deemed well-conditioned is called ill-conditioned, so that a matrix of uncomfortably large condition number is also sometimes called ill-conditioned. An ill-conditioned problem might still be well-posed, but just requiring careful and precise solution methods.

26.6. An important example: the Hilbert matrices

The \(n \times n\) Hilbert matrix \(H_n\) has elements

\[H_{i, j} = \frac{1}{i+j-1}\]

Or, if we index from zero in Pythonic fashion, the entries are \(1/(1+i+j)\).

For example

\[\begin{split}H_4 = \left[ \begin{array}{cccc} 1 & 1/2 & 1/3 & 1/4 \\ 1/2 & 1/3 & 1/4 & 1/5 \\1/3 & 1/4 & 1/5 & 1/6 \\1/4 & 1/5 & 1/6 & 1/7 \end{array} \right]\end{split}\]

and for larger or smaller \(n\), one simply adds or removes rows at the bottom and columns at the right.

These matrices arise in important situations like finding the polynomial of degree \(n-1\) that fits given data in the sense of minimizing the root-mean-square error — as we will discuss later in this course if there is time and interest.

Unfortunately as \(n\) increases the condition number grows rapidly, causing severe rounding error problems. To illustrate this, I will do something that one should usually avoid: compute the inverse of these matrices. This is also a case that shows the advantage of the LU factorization, since one computes the inverse by successively computing each of its columns, by solving \(n\) different systems of equations, each with the same matrix \(A\) on the left-hand side.

import numpy as np
from numpy import inf
from numerical_methods_module import lu, forwardSubstitution, backwardSubstitution, rowReduce
from numpy.linalg import norm, cond
from numpy.random import random
def inverse(A):
    """Use sparingly; there is usually a way to avoid computing inverses that is faster and with less rounding error!"""
    n = len(A)
    A_inverse = np.zeros_like(A)
    (L, U) = lu(A)
    for i in range(n):
        b = np.zeros(n)
        b[i] = 1.0                                   # the i-th standard basis vector
        c = forwardSubstitution(L, b)                # solve L c = e_i ...
        A_inverse[:,i] = backwardSubstitution(U, c)  # ... then U x = c gives column i of the inverse
    return A_inverse
def hilbert(n):
    """Create the n by n Hilbert matrix, with entries 1/(1+i+j) when indexing from zero."""
    H = np.zeros([n,n])
    for i in range(n):
        for j in range(n):
            H[i,j] = 1.0/(1.0 + i + j)
    return H
for n in range(2,5):
    H_n = hilbert(n)
    print(f"H_{n} is")
    print(H_n)
    H_n_inverse = inverse(H_n)
    print("and its inverse is")
    print(H_n_inverse)
    print("to verify, their product is")
    print(H_n @ H_n_inverse)
    print()
H_2 is
[[1.         0.5       ]
 [0.5        0.33333333]]
and its inverse is
[[ 4. -6.]
 [-6. 12.]]
to verify, their product is
[[1. 0.]
 [0. 1.]]

H_3 is
[[1.         0.5        0.33333333]
 [0.5        0.33333333 0.25      ]
 [0.33333333 0.25       0.2       ]]
and its inverse is
[[   9.  -36.   30.]
 [ -36.  192. -180.]
 [  30. -180.  180.]]
to verify, their product is
[[ 1.00000000e+00  9.62193288e-16  1.40628250e-15]
 [ 0.00000000e+00  1.00000000e+00  0.00000000e+00]
 [-2.22044605e-17 -1.99840144e-15  1.00000000e+00]]

H_4 is
[[1.         0.5        0.33333333 0.25      ]
 [0.5        0.33333333 0.25       0.2       ]
 [0.33333333 0.25       0.2        0.16666667]
 [0.25       0.2        0.16666667 0.14285714]]
and its inverse is
[[   16.  -120.   240.  -140.]
 [ -120.  1200. -2700.  1680.]
 [  240. -2700.  6480. -4200.]
 [ -140.  1680. -4200.  2800.]]
to verify, their product is
[[ 1.00000000e+00  0.00000000e+00  4.54747351e-13 -2.27373675e-13]
 [ 0.00000000e+00  1.00000000e+00  0.00000000e+00  0.00000000e+00]
 [-7.10542736e-15  0.00000000e+00  1.00000000e+00  0.00000000e+00]
 [ 0.00000000e+00 -5.68434189e-14  2.27373675e-13  1.00000000e+00]]

Note how the inverses have some surprisingly large elements; this is the matrix equivalent of a number being very close to zero and so with a very large reciprocal.

Since we have the inverses, we can compute the matrix norms of each \(H_n\) and its inverse, and thence their condition numbers; these can then be compared to the values given by numpy.linalg.cond.

for n in range(2,6):
    H_n = hilbert(n)
    print(f"H_{n} is")
    print(H_n)
    print(f"with infinity norm {norm(H_n, inf)}") 
    H_n_inverse = inverse(H_n)
    print("and its inverse is")
    print(H_n_inverse)
    print(f"with infinity norm {norm(H_n_inverse, inf)}") 
    print(f"Thus the condition number of H_{n} is {norm(H_n, inf) * norm(H_n_inverse, inf)}")
    print(f"For comparison, numpy.linalg.cond gives {cond(H_n, inf)}")
    print()
H_2 is
[[1.         0.5       ]
 [0.5        0.33333333]]
with infinity norm 1.5
and its inverse is
[[ 4. -6.]
 [-6. 12.]]
with infinity norm 18.000000000000007
Thus the condition number of H_2 is 27.00000000000001
For comparison, numpy.linalg.cond gives 27.00000000000001

H_3 is
[[1.         0.5        0.33333333]
 [0.5        0.33333333 0.25      ]
 [0.33333333 0.25       0.2       ]]
with infinity norm 1.8333333333333333
and its inverse is
[[   9.  -36.   30.]
 [ -36.  192. -180.]
 [  30. -180.  180.]]
with infinity norm 408.00000000000165
Thus the condition number of H_3 is 748.000000000003
For comparison, numpy.linalg.cond gives 748.0000000000028

H_4 is
[[1.         0.5        0.33333333 0.25      ]
 [0.5        0.33333333 0.25       0.2       ]
 [0.33333333 0.25       0.2        0.16666667]
 [0.25       0.2        0.16666667 0.14285714]]
with infinity norm 2.083333333333333
and its inverse is
[[   16.  -120.   240.  -140.]
 [ -120.  1200. -2700.  1680.]
 [  240. -2700.  6480. -4200.]
 [ -140.  1680. -4200.  2800.]]
with infinity norm 13619.999999996671
Thus the condition number of H_4 is 28374.999999993062
For comparison, numpy.linalg.cond gives 28374.99999999729

H_5 is
[[1.         0.5        0.33333333 0.25       0.2       ]
 [0.5        0.33333333 0.25       0.2        0.16666667]
 [0.33333333 0.25       0.2        0.16666667 0.14285714]
 [0.25       0.2        0.16666667 0.14285714 0.125     ]
 [0.2        0.16666667 0.14285714 0.125      0.11111111]]
with infinity norm 2.283333333333333
and its inverse is
[[ 2.500e+01 -3.000e+02  1.050e+03 -1.400e+03  6.300e+02]
 [-3.000e+02  4.800e+03 -1.890e+04  2.688e+04 -1.260e+04]
 [ 1.050e+03 -1.890e+04  7.938e+04 -1.176e+05  5.670e+04]
 [-1.400e+03  2.688e+04 -1.176e+05  1.792e+05 -8.820e+04]
 [ 6.300e+02 -1.260e+04  5.670e+04 -8.820e+04  4.410e+04]]
with infinity norm 413279.99999865814
Thus the condition number of H_5 is 943655.9999969361
For comparison, numpy.linalg.cond gives 943655.9999992514

Next, we experiment with solving equations, to compare residuals with actual errors.

I will use the testing strategy of starting with a known solution \(x\), from which the right-hand side \(b\) is computed; then slight simulated error is introduced into \(b\). Running this repeatedly will use different random “errors”, to give an idea of how the actual error compares to the residual.

for n in range(2,6):
    print(f"{n=}")
    H_n = hilbert(n)
    #x = np.zeros(n)
    x = np.linspace(1.0, n, n)
    print(f"x is {x}")
    b = H_n @ x
    print(f"b is {b}")
    error_scale = 1e-8
    b_imperfect = b + 2.0 * error_scale * (random(n) - 0.5) # add random "errors" between -error_scale and error_scale
    print(f"b has been slightly changed to {b_imperfect}")
    (U, c) = rowReduce(H_n, b_imperfect)
    x_computed = backwardSubstitution(U, c)
    residual = b - H_n @ x_computed
    relative_backward_error = norm(residual, inf)/norm(b, inf)
    print(f"The residual maximum norm is {norm(residual, inf)}")
    print(f"and the relative backward error ||r||/||b|| is {relative_backward_error:0.4}")
    absolute_error = norm(x - x_computed, inf)
    relative_error = absolute_error/norm(x, inf)
    print(f"The absolute error is {absolute_error:0.4}")
    print(f"The relative error is {relative_error:0.4}")
    error_bound = cond(H_n, inf) * relative_backward_error
    print(f"For comparison, the relative error bound from the formula above is {error_bound:0.4}")
    print(f"\nBeware: the relative error is larger than the relative backward error by a factor {relative_error/relative_backward_error:0.8}")
    print()
n=2
x is [1. 2.]
b is [2.         1.16666667]
b has been slightly changed to [2.00000001 1.16666667]
The residual maximum norm is 6.413956921136332e-09
and the relative backward error ||r||/||b|| is 3.207e-09
The absolute error is 6.335e-09
The relative error is 3.167e-09
For comparison, the relative error bound from the formula above is 8.659e-08

Beware: the relative error is larger than the relative backward error by a factor 0.98764383

n=3
x is [1. 2. 3.]
b is [3.         1.91666667 1.43333333]
b has been slightly changed to [2.99999999 1.91666667 1.43333333]
The residual maximum norm is 7.406832036593869e-09
and the relative backward error ||r||/||b|| is 2.469e-09
The absolute error is 1.287e-06
The relative error is 4.289e-07
For comparison, the relative error bound from the formula above is 1.847e-06

Beware: the relative error is larger than the relative backward error by a factor 173.72107

n=4
x is [1. 2. 3. 4.]
b is [4.         2.71666667 2.1        1.72142857]
b has been slightly changed to [4.         2.71666668 2.10000001 1.72142857]
The residual maximum norm is 9.815670942714405e-09
and the relative backward error ||r||/||b|| is 2.454e-09
The absolute error is 3.499e-05
The relative error is 8.748e-06
For comparison, the relative error bound from the formula above is 6.963e-05

Beware: the relative error is larger than the relative backward error by a factor 3564.9129

n=5
x is [1. 2. 3. 4. 5.]
b is [5.         3.55       2.81428571 2.34642857 2.01746032]
b has been slightly changed to [5.         3.55       2.81428572 2.34642857 2.01746031]
The residual maximum norm is 5.082728016247984e-09
and the relative backward error ||r||/||b|| is 1.017e-09
The absolute error is 0.000315
The relative error is 6.3e-05
For comparison, the relative error bound from the formula above is 0.0009593

Beware: the relative error is larger than the relative backward error by a factor 61972.109

We see in these experiments that:

  • As the condition number increases, the relative error becomes increasingly larger than the backward error computed from the residual.

  • The relative error is still less than the above bound \(\displaystyle \text{Rel}(x_a) = \frac{\|x - x_a\|}{\| x \|} \leq \kappa(A) \frac{\|r\|}{\|b\|}\), and typically quite a bit less.


This work is licensed under Creative Commons Attribution-ShareAlike 4.0 International