3. Simultaneous Linear Equations, Part 7: Faster Methods for Solving \(Ax = b\) for Tridiagonal and Banded Matrices, and Strict Diagonal Dominance

Version of April 13, 2021

Reference:

Section 6.6 Special Types of Matrices in Burden & Faires, the sub-sections on Band Matrices and Tridiagonal Matrices.

3.1. Tridiagonal systems

Differential equations often lead to the need to solve systems of equations \(T x = b\) where the matrix \(T\) is tridiagonal: the only non-zero elements are on the main diagonal and the diagonals adjacent to it on either side, so that \(T_{i,j} = 0\) if \(|i - j| > 1\). That is, the system looks like:

\[\begin{split} T x = \left[\begin{array}{cccccc} d_1 & u_1\\ l_1 & d_2 & u_2\\ & l_2 & d_3 & u_3\\ && \ddots & \ddots & \ddots\\ &&& l_{n-2} & d_{n-1} & u_{n-1}\\ &&&& l_{n-1} & d_{n} \end{array}\right] \left[\begin{array}{c} x_1\\ x_2\\ \vdots\\ x_n \end{array}\right] = \left[\begin{array}{c} b_1\\ b_2\\ \vdots\\ b_n \end{array}\right] \end{split}\]

with all “missing” entries being zeros. The notation used here suggests one efficient way to store such a matrix: as three 1D arrays \(d\), \(l\) and \(u\).

(Such equations also arise in cubic spline interpolation.)

It can be verified that LU factorization preserves the pattern of non-zero values, so that the Doolittle algorithm (if it succeeds without any division by zero) gives \(T = L U\) with the form

\[ L = \left[\begin{array}{cccccc} 1\\ L_1 & 1\\ & L_2 & 1\\ && \ddots & \ddots\\ &&& L_{n-2} & 1\\ &&&& L_{n-1} & 1 \end{array}\right], \; U = \left[\begin{array}{cccccc} D_1 & u_1\\ & D_2 & u_2\\ && D_3 & u_3\\ &&& \ddots & \ddots\\ &&&& D_{n-1} & u_{n-1}\\ &&&&& D_n \end{array}\right] \]

Note that the first non-zero element in each column is unchanged, as with a full matrix, but now this means that the upper diagonal elements \(u_i\) are unchanged.

Again, one way to describe and store this information is with just the two new 1D arrays \(L\) and \(D\), along with the unchanged array \(u\).

3.2. Algorithms

3.2.1. LU factorization

\(D_1 = d_1\)
for i from 2 to n
\(\quad L_{i-1} = l_{i-1}/D_{i-1}\)
\(\quad D_{i} = d_{i} - L_{i-1} u_{i-1}\)
end for
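
As a concrete illustration, here is a minimal Python sketch of this factorization (the function name and the use of NumPy arrays are illustrative choices; the arrays `d`, `l`, `u`, `L`, `D` match the notation above, shifted to 0-based indexing):

```python
import numpy as np

def lu_tridiagonal(d, l, u):
    """LU factorize a tridiagonal matrix given by its main diagonal d
    (length n), subdiagonal l and superdiagonal u (each of length n-1).
    Returns the subdiagonal L of the unit lower triangular factor and the
    main diagonal D of the upper triangular factor; the superdiagonal of
    U is the unchanged array u.  No pivoting, so this assumes that no
    zero "pivot" D[i] arises."""
    n = len(d)
    L = np.zeros(n-1)
    D = np.zeros(n)
    D[0] = d[0]
    for i in range(1, n):            # i from 2 to n, in 0-based indexing
        L[i-1] = l[i-1] / D[i-1]
        D[i] = d[i] - L[i-1] * u[i-1]
    return L, D
```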

3.2.2. Forward substitution

\(c_1 = b_1\)
for i from 2 to n
\(\quad c_{i} = b_{i} - L_{i-1} c_{i-1}\)
end for
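
A matching Python sketch, continuing from the code above:

```python
def forward_substitution_tridiagonal(L, b):
    """Solve Lc = b with L unit lower triangular and bidiagonal,
    described by its subdiagonal, the 1D array L."""
    n = len(b)
    c = np.zeros(n)
    c[0] = b[0]
    for i in range(1, n):
        c[i] = b[i] - L[i-1] * c[i-1]
    return c
```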

3.2.3. Backward substitution

\(x_n = c_n/D_n\)
for i from n-1 down to 1
\(\quad x_{i} = (c_{i} - u_{i} x_{i+1})/D_i\)
end for
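
And the final step as a Python sketch, followed by a small usage example combining all three functions (the matrix values here are chosen just for illustration):

```python
def backward_substitution_tridiagonal(D, u, c):
    """Solve Ux = c with U upper triangular and bidiagonal,
    described by its main diagonal D and superdiagonal u."""
    n = len(c)
    x = np.zeros(n)
    x[n-1] = c[n-1] / D[n-1]
    for i in range(n-2, -1, -1):     # i from n-1 down to 1, in 0-based indexing
        x[i] = (c[i] - u[i] * x[i+1]) / D[i]
    return x

# Example: solve Tx = b for a small SDD tridiagonal system;
# the exact solution is x = [1, 1, 1, 1].
d = np.array([4., 4., 4., 4.])
l = np.array([1., 1., 1.])
u = np.array([1., 1., 1.])
b = np.array([5., 6., 6., 5.])
L, D = lu_tridiagonal(d, l, u)
c = forward_substitution_tridiagonal(L, b)
x = backward_substitution_tridiagonal(D, u, c)
```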

3.3. Generalizing to Banded Matrices

As we have seen, approximating derivatives to higher order of accuracy, and approximating derivatives of order greater than two, require more than three nodes, but the locations needed are all close to the ones where the derivative is being approximated. For example, the simplest symmetric approximation of the fourth derivative \(D^4 f(x)\) uses values from \(f(x-2h)\) to \(f(x+2h)\). Then row \(i\) of the corresponding matrix has all its non-zero elements at locations \((i,i-2)\) to \((i, i+2)\): the non-zero elements lie in the narrow “band” where \(|i - j| \leq 2\), and thus on five “central” diagonals.

This is a pentadiagonal matrix, and an example of the larger class of banded matrices: ones in which all the non-zero elements have indices \( -p \leq j - i \leq q\) for \(p\) and \(q\) smaller than \(n\), usually far smaller; \(p = q = 2\) for a pentadiagonal matrix.

3.3.1. Recap: the general Doolittle algorithm for computing an LU factorization

The top row is unchanged:
for j from 1 to n
\(\quad\) \(u_{1,j} = a_{1,j}\)
end for
The left column requires no sums:
for i from 2 to n
\(\quad\) \(l_{i,1} = a_{i,1}/u_{1,1}\)
end for

The main loop:
for k from 2 to n
\(\quad\) for j from k to n
\(\quad\quad\) \(u_{k,j} = a_{k,j} - \sum_{s=1}^{k-1} l_{k,s}u_{s,j}\)
\(\quad\) end for
\(\quad\) for i from k+1 to n
\(\quad\quad\) \(l_{i,k} = \left( a_{i,k} - \sum_{s=1}^{k-1} l_{i,s}u_{s,k} \right)/u_{k,k}\)
\(\quad\) end for
end for
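
For later comparison with the banded version, here is a direct Python transcription of this general algorithm (a sketch only: no pivoting, and no check for a zero \(u_{k,k}\)):

```python
import numpy as np

def lu_doolittle(A):
    """Doolittle LU factorization of a square NumPy array A, without pivoting.
    Returns (L, U) with L unit lower triangular and U upper triangular."""
    n = len(A)
    L = np.eye(n)
    U = np.zeros_like(A, dtype=float)
    U[0, :] = A[0, :]                  # the top row is unchanged
    L[1:, 0] = A[1:, 0] / U[0, 0]      # the left column needs no sums
    for k in range(1, n):
        for j in range(k, n):
            U[k, j] = A[k, j] - L[k, :k] @ U[:k, j]
        for i in range(k+1, n):
            L[i, k] = (A[i, k] - L[i, :k] @ U[:k, k]) / U[k, k]
    return L, U
```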

3.3.2. Eliminating redundant calculation in the above

With a banded matrix, many of the entries on the right-hand sides of these assignments are zero, particularly in the two sums, which is where most of the operations occur. Thus we can rewrite the algorithm, exploiting the fact that all elements with indices \(j-i < -p\) or \(j-i > q\) are zero. To start with, the top non-zero diagonal is not modified, as already noted for the tridiagonal case: \(u_{k,k+q} = a_{k,k+q}\) for \(1 \leq k \leq n-q\).

3.4. Algorithm for LU factorization of a banded matrix

The top row is unchanged:
for j from 1 to 1+q
\(\quad\) \(u_{1,j} = a_{1,j}\)
end for
The top non-zero diagonal is unchanged:
for k from 1 to n - q
\(\quad\) \(u_{k,k+q} = a_{k,k+q}\)
end for
The left column requires no sums:
for i from 2 to 1+p
\(\quad\) \(l_{i,1} = a_{i,1}/u_{1,1}\)
end for
The main loop:
for k from 2 to n
\(\quad\) for j from k to min(n, k+q-1)
\(\quad\quad\) \(u_{k,j} = a_{k,j} - \displaystyle\sum_{s=\max(1, k-p, j-q)}^{k-1} l_{k,s}u_{s,j}\)
\(\quad\) end for
\(\quad\) for i from k+1 to min(n, k+p)
\(\quad\quad\) \(l_{i,k} = \left( a_{i,k} - \displaystyle\sum_{s=\max(1,i-p,k-q)}^{k-1} l_{i,s}u_{s,k} \right)/u_{k,k}\)
\(\quad\) end for
end for
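
Here is a Python sketch of this banded algorithm, still storing the matrix as a full 2D array for simplicity (a serious implementation would store only the band, for instance as one 1D array per diagonal, as in the tridiagonal case):

```python
import numpy as np

def lu_banded(A, p, q):
    """LU factorization, without pivoting, of a banded NumPy array A with
    lower bandwidth p and upper bandwidth q: A[i,j] = 0 if j-i < -p or j-i > q.
    Returns (L, U); only entries inside the bands are computed."""
    n = len(A)
    L = np.eye(n)
    U = np.zeros_like(A, dtype=float)
    U[0, :q+1] = A[0, :q+1]                # the top row is unchanged
    for k in range(n - q):
        U[k, k+q] = A[k, k+q]              # the top non-zero diagonal is unchanged
    L[1:p+1, 0] = A[1:p+1, 0] / U[0, 0]    # the left column needs no sums
    for k in range(1, n):
        for j in range(k, min(n, k+q)):    # j = k+q was already handled above
            s0 = max(0, k-p, j-q)          # skip the known zeros in the sum
            U[k, j] = A[k, j] - L[k, s0:k] @ U[s0:k, j]
        for i in range(k+1, min(n, k+p+1)):
            s0 = max(0, i-p, k-q)
            L[i, k] = (A[i, k] - L[i, s0:k] @ U[s0:k, k]) / U[k, k]
    return L, U
```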

3.5. The common symmetrical case, \(p=q\)

It is common for a banded matrix to have equal band-width on either side, \(p=q\), as with tridiagonal and pentadiagonal matrices. Then the algorithm is somewhat simpler:

The top row is unchanged:
for j from 1 to 1+p
\(\quad\) \(u_{1,j} = a_{1,j}\)
end for
The top non-zero diagonal is unchanged:
for k from 1 to n - p
\(\quad\) \(u_{k,k+p} = a_{k,k+p}\)
end for
The left column requires no sums:
for i from 2 to 1+p
\(\quad\) \(l_{i,1} = a_{i,1}/u_{1,1}\)
end for

The main loop:
for k from 2 to n
\(\quad\) for j from k to min(n, k+p-1)
\(\quad\quad\) \(u_{k,j} = a_{k,j} - \displaystyle\sum_{s=\max(1, j-p)}^{k-1} l_{k,s}u_{s,j}\)
\(\quad\) end for
\(\quad\) for i from k+1 to min(n,k+p)
\(\quad\quad\) \(l_{i,k} = \left( a_{i,k} - \displaystyle\sum_{s=\max(1,i-p)}^{k-1} l_{i,s}u_{s,k} \right)/u_{k,k}\)
\(\quad\) end for
end for
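
No separate code is needed for this case: one can simply call a banded routine like the `lu_banded` sketch above with \(p = q\). For example, with a small SDD pentadiagonal matrix (values chosen just for illustration):

```python
# Build a 6x6 SDD pentadiagonal matrix and factor it with p = q = 2.
A = ( np.diag(7. * np.ones(6))
    + np.diag(np.ones(5), 1) + np.diag(np.ones(5), -1)
    + np.diag(np.ones(4), 2) + np.diag(np.ones(4), -2) )
L, U = lu_banded(A, p=2, q=2)
print(np.max(np.abs(A - L @ U)))   # should be zero, up to rounding
```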

3.6. When is it safe to do without pivoting: Strictly Diagonally Dominant Matrices

These algorithms for banded matrices do no pivoting, which is highly desirable, because pivoting creates non-zero elements outside the “band” and so can force one back to the general algorithm. Fortunately, one characteristic of such systems makes this safe, and it is quite common for systems that come from approximating second derivatives.

Definition. An \(n \times n\) matrix \(A\) is strictly diagonally dominant [SDD] if for every row \(i\), \(1 \leq i \leq n\),

\[ |a_{i,i}| > \sum_{1 \leq j \leq n, j \neq i} |a_{i,j}| \]

That is, each diagonal element is greater in magnitude than the sum of the magnitudes of all other entries in the same row.

Note: If only the corresponding non-strict inequality holds, the matrix is called diagonally dominant.
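
Since this condition involves only row sums, it is easy to test numerically; here is a small sketch of such a check (the function name is just an illustrative choice):

```python
import numpy as np

def is_strictly_diagonally_dominant(A):
    """Test row-wise strict diagonal dominance of a square NumPy array A."""
    diagonal = np.abs(np.diag(A))
    off_diagonal_sums = np.sum(np.abs(A), axis=1) - diagonal
    return bool(np.all(diagonal > off_diagonal_sums))
```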

Theorem 1. For any strictly diagonally dominant tridiagonal matrix \(A\), each of the intermediate matrices \(A^{(k)}\) given by the naive Gaussian elimination algorithm is also strictly diagonally dominant, and so is the final upper triangular matrix \(U\). In particular, all the diagonal elements \(a_{i,i}^{(k)}\) and \(u_{i,i}\) are non-zero, so no division by zero occurs in any of these algorithms, including the backward substitution solving \(Ux = c\) for \(x\).

As a consequence, the Doolittle factorization also has \(u_{i,i} \neq 0\), and those are the only values ever divided by in either the Doolittle algorithm or in forward and backward substitution, so one never encounters division by zero.

The theorem says more: at each stage, the pivot element \(a_{i,i}^{(k)}\) has larger absolute value than all the other elements in its row. Although this does not guarantee that it is larger than the candidate pivot elements below it in its column, it does indicate that one is probably at a safe distance from very small denominators and the resulting loss of significance.

3.7. The Crout version of LU factorization

There is a generalization and improvement, but it requires a twist (actually, a reflection):

Theorem 2. If the transpose \(A^T\) is SDD, then so is the transpose of each intermediate matrix \(A^{(k)}\) in naive Gaussian elimination, and thus the Doolittle factorization succeeds, with the factor \(U\) being SDD by columns (that is, \(U^T\) is SDD).

However, we want to use this for the more natural case of \(A\) itself being SDD. Here is one way to proceed.

Let \(B = A^T\), so that \(B^T = A\) is SDD and thus there is a Doolittle LU factorization \(B = L_B U_B\).

Then \(A = (L_B U_B)^T = (U_B)^T (L_B)^T\). Note that \(L := (U_B)^T\) is lower triangular and \(U := (L_B)^T\) is unit upper triangular, so this gives an LU factorization \(A = L U\), differing from the Doolittle form in that the ones are now on the main diagonal of the upper factor \(U\).

This is the Crout decomposition; the Doolittle algorithm can easily be “flipped” to compute it, and the forward and backward substitution algorithms are also easily adapted.
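
As a sketch of what this flipped algorithm looks like in Python (interchanging the roles of rows and columns in Doolittle, so that the divisions now produce the unit upper triangular factor):

```python
import numpy as np

def lu_crout(A):
    """Crout LU factorization of a square NumPy array A, without pivoting.
    Returns (L, U) with L lower triangular and U unit upper triangular."""
    n = len(A)
    L = np.zeros_like(A, dtype=float)
    U = np.eye(n)
    for k in range(n):
        for i in range(k, n):      # column k of L, including its diagonal element
            L[i, k] = A[i, k] - L[i, :k] @ U[:k, k]
        for j in range(k+1, n):    # row k of U; the division is by L[k,k]
            U[k, j] = (A[k, j] - L[k, :k] @ U[:k, j]) / L[k, k]
    return L, U
```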

Theorem 3. If \(A\) is SDD, it has a Crout decomposition \(A = L U\): \(L\) lower triangular, \(U\) unit upper triangular.

Note also that \(L\) is the transpose of the SDD matrix \(U_B\), so it is strictly diagonally dominant by columns: each diagonal element \(l_{i,i}\) has magnitude greater than the sum of the magnitudes of all the elements below it in the same column:

\[ |l_{i,i}| > \sum_{j>i} |l_{j,i}| \]

These reassuringly big diagonal elements \(l_{i,i}\) are the only values divided by in forward substitution, and indeed also in the Crout algorithm: in the mirror-image world of the Crout algorithm, these are the pivot elements.


This work is licensed under Creative Commons Attribution-ShareAlike 4.0 International