Solving Ax = b With Both Pivoting and LU Factorization
Contents
3.5. Solving \(Ax = b\) With Both Pivoting and LU Factorization#
References:
Section 2.4 The PA=LU Factorization of [Sauer, 2019].
Section 6.5 Matrix Factorizations of [Burden et al., 2016].
Section 8.1 Matrix Factorizations of [Chenney and Kincaid, 2012].
Introduction#
The last step in producing an algorithm for solving the general case of \(n\) simultaneous linear equations in \(n\) variables that is robust, efficient and with good control of rounding error is to combine the ideas of partial pivoting from Partial Pivoting and LU factorization from Solving Ax = b with LU factorization.
This is sometimes described in three parts:
permute (reorder) the rows of the matirx \(A\) by multiplying it at left by a suitable permutation matrix \(P\); one with a single “1” in each row and each column and zeros elsewhere;
Get the LU factorization of this matrix: \(PA = LU\).
To solve \(A x = b\)
Express as \(P A x = L U x = P b\) (which just involves computing \(Pb\), which reorders the elements of \(b\))
Solve \(L c = P b\) for \(c\) by forward substitution
Solve \(U x = c\) for \(x\) by backward substitution: as before, this gives \(L U x = L c = P b\) and \(L U x = P A x\), so \(PAx = Pb\); since a permutation matrix \(P\) is invertible (just unravel the row swaps), this ensures that \(Ax = b\).
This gives a nice formulas in terms of matrices; however we can describe it a bit more compactly and efficiently by just talking about the permutation of the rows, described by a permutation vector — an \(n\) component vector \(\pi = [\pi_1, \pi_2 , \dots, \pi_n]\) whose elements are the integers from 1 to \(n\) in some order. So that is how the algorithm will be described below.
(Aside: I use the conventional name \(\pi\) for a permutation vector, partly to distinguish from the notation \(p_i\) used for pivot rows; however, feel free to use the name \(p\) instead, especially in Julia code.)
A number of details of this sketch will now be filled in, including the very useful fact that the permutation vector (or matrix) can be contsructed “on the fly”, as rows are swapped in partial pivoting.
Row swapping is all you need#
Let us look at maximal element partial pivoting, but described in terms of the entries of the factors \(L\) and \(U\), and updating matrix \(A\) with a succession of row swaps.
(For now, I omit what happens to the right-hand side vector \(b\); that is where the permutation vector \(p\) will come in, as addressed below.)
What happens if pivoting occurs at some stage \(k\), with swapping of row \(k\) with a row \(p_k > 5\)?
One might fear that the process has to start again from the top using the modified version of matrix \(A\), but in fact all previous work can be reused, just swapping those rows “everywhere”.
Example: what happens at stage 5 (\(k=5\))?#
To see this with a concrete example consider what happens if at stage \(k=5\) we swap rows 5 and 10 of \(A\).
A) Firstly, what happens to matrix \(A\)?
The previous steps of the LU factorization process only involved entries of \(A\) in its first four rows and first four columns, and this row swap has no effect of them. Likewise, in row reduction, changes at and below row \(k=5\) have no effect on the first four rows of the row reduced form, \(U\).
Thus, the only change here is to swap the entries of \(A\) between rows 5 and 10. What is more, the subsequent calculations only involve columns of index \(j=5\) upwards, so in fact we only need to update those entries. This can be written as
Thus if we are working in Julia with \(A\) stored in an array, the update is the slice operation
( A[5, 5:end], A[10, 5:end] ) = ( A[10, 5:end], A[5, 5:end] )
B) Next, look at the work done so far on \(U\).
That just consists of the previous rows \(1 \leq i \leq 4\), and the swapping of rows 5 with 10 has no effect up there:
Values already computed in \(U\) are unchanged.
C) Finally, look at the work done so far on the multipiers \(l_{i,j}\); that is, matrix \(L\).
The values computed so far are the first four columns of \(L\); the multiples \(l_{i,j}\), \(1 \leq j \leq 4\) of row \(j\) subtracted from row \(i > j\). These do change: for example, the multiple \(l_{5,2}\) of row \(2\) is now subtracted from what was row 5 but is now row 10: thus, the new value of \(l_{10,2}\) is the previous value of \(l_{5,2}\).
Likewise, the same is true in reverse: the new value of \(l_{5,2}\) is the previous value of \(l_{10,2}\). This applies for all of the first four rows, so second index \(1 \leq j \leq 4\):
The entries of \(L\) computed so far are swapped between rows 5 and 10, leaving the rest unchanged.
As this is again only for some columns — the first four — the swaps needed are:
or in Julia’a slice notation for an array \(L\):
( L[5, 1:4], L[10, 1:4] ) = ( L[10, 1:4], L[5, 1:4] )
The general pattern#
The example above extends to all stages \(k\) of row reduction or computing the LU factorization of a row-permuted version of matrix \(A\), where we adjust the pivot element at position \((k, k)\) by first swapping row \(k\) with a row \(p_k, \geq k\). (Allowing that sometimes no swap is needed, so that \(p_k = k\).)
Gathering the key formulas above, this part of the algorithm is
for k from 1 to n-1
\(\quad\) Find the pivot row \(p_k, \geq k\).
\(\quad\) if \(p_k > k\)
\(\quad\quad\) Swap \(l_{k, j} \leftrightarrow l_{p_k, j}, \quad 1 \leq j < k \)
\(\quad\quad\) Swap \(a_{k, j} \leftrightarrow a_{p_k, j}, \quad k \leq j \leq n \)
\(\quad\) end
end
Pseudo-code for LU factorization with row swapping (first version)#
Here I also adopt slice notation; for example, \(a_{k, k:n}\) denotes the slice \([a_{k, k} \dots a_{k, n}]\).
(LU factorization with row swapping, I)
for k from 1 to n
\(\quad\) Find the pivot element:
\(\quad\) \(p = k\) \(\quad\) (p will be the index of the pivot row)
\(\quad\) for i from k+1 to n
\(\quad\)\(\quad\) if |u_{i, k}| > |u_{p, k}|
\(\quad\)\(\quad\)\(\quad\) p \(\leftarrow\) i
\(\quad\)\(\quad\) end
\(\quad\) end
\(\quad\) if p > k \(\quad\) (Swap rows)
\(\quad\)\(\quad\) \(l_{k, 1:k-1} \leftrightarrow l_{p, 1:k-1}\)
\(\quad\)\(\quad\) \(a_{k, k:n} \leftrightarrow a_{p, k:n}\)
\(\quad\) end
\(\quad\) for j from k to n \(\quad\) (Get the non-zero elements in row \(k\) of \(U\))
\(\quad\quad\) \(u_{k,j}=a_{k,j}-\sum_{s=1}^{k-1}l_{k,s}u_{s,j}\)
\(\quad\) end
\(\quad\) for i from k+1 to n \(\quad\) (Get the non-zero elements in column \(k\) of \(L\) — except the 1’s on its diagonal)
\(\quad\quad\) \(l_{i,k}=\displaystyle\frac{a_{i,k}-\sum_{s=1}^{k-1}l_{i,s}u_{s,k}}{u_{k,k}}\)
\(\quad\) end
end
But what about the right-hand side, \(b\)?#
One thing is missing from this strategy so far: if we are solving with a given right-hand-side column vector \(b\), we would also swap its rows at each stage, with
but with the LU factorization we need to keep track of these swaps for use later.
This turns out to mesh nicely with another detail: we can avoid actually copying array entries around by just keeping track of the order in which we use rows to get zeros in other rows. Our goal will be a permutation vector \(\pi = [\pi_1, \pi_2, \dots \pi_n]\) which says:
First use row \(\pi_1\) to get zeros in column 1 of the \(n-1\) other rows.
Then use row \(\pi_2\) to get zeros in column 2 of the \(n-2\) remaining rows.
…
To do this:
first, initialize an array \(\pi = [1, 2, \dots, n]\)
at stage \(k\), if the pivot element is in row \(p_k \neq k\), swap the corresponding elements in \(\pi\) (rather than swapping entire rows of arrays):
Introducing the name \(A'\) for the new version of matrix \(A\), its row \(k\) has entries \(a'_{k, j} = a_{\pi_k, j}\).
This pattern persists through each row swap: instead of computing a succesion of updated versions of matrix \(A\), we leave it alone and just change the row indices:
All references to entries of \(A\) are now done with permuted row index: \(a_{\pi_i, j}\)
The same applies to the array \(L\) of multipliers:
All references to entries of \(L\) are now done with \(l_{\pi_i, j}\).
Finally, since these row swaps also apply to the right-hand side \(b\), we do the same there:
All references to entries of \(b\) are now done with \(b_{\pi_i}\).
Pseudo-code for LU factorization with a permutation vector#
(LU factorization with row swapping, II)
Initialize the permutation vector, \(\pi \leftarrow [1, 2, \dots, n]\)
for k from 1 to n
\(\quad\) Find the pivot element:
\(\quad\) \(p \leftarrow k\) \(\quad\) (p will be the index of the pivot row)
\(\quad\) for i from k+1 to n
\(\quad\)\(\quad\) if \(|u_{i, k}| > |u_{p, k}|\):
\(\quad\)\(\quad\)\(\quad\) \(p \leftarrow i\)
\(\quad\)\(\quad\) end
\(\quad\) if p > k \(\quad\) (Just swap indices, not rows)
\(\quad\)\(\quad\) \(\pi_k \leftrightarrow \pi_p\)
\(\quad\) end
\(\quad\) for j from k to n \(\quad\) (Get the non-zero elements in row \(k\) of \(U\))
\(\quad\quad\) \(u_{k,j} \leftarrow a_{k,j}-\sum_{s=1}^{k-1}l_{k,s}u_{s,j}\)
\(\quad\) end
\(\quad\) for i from k+1 to n \(\quad\) (Get the non-zero elements in column \(k\) of \(L\) — except the 1’s on its diagonal)
\(\quad\quad\) \(l_{i,k} \leftarrow \displaystyle\frac{a_{i,k}-\sum_{s=1}^{k-1}l_{i,s}u_{s,k}}{u_{k,k}}\)
\(\quad\) end
end
For the version with a permutation matrix \(P\), instead:
start with an array \(P\) that is the identity matrix, and then
swap its rows \(k \leftrightarrow p_k\) at stage \(k\) instead of swapping the entries of \(\pi\) or the rows of \(A\) and \(L\).
using LinearAlgebra: norm
using LinearAlgebra: ⋅ # For the dot product (the centered dot can be typed as \cdot then tab)
include("NumericalMethods.jl")
using .NumericalMethods: printmatrix
function plu(A; demomode=false)
# Compute the Doolittle PA=LU factorization of A —
# but with the permutation recorded as permutation vector, not as the permutation matrix P.
# Sums like $\sum_{s=1}^{k-1} l_{k,s} u_{s,j}$ are done as matrix products;
# in the above case, row matrix L[k, 1:k-1] by column matrix U[1:k-1,j] gives the sum for a give j,
# and row matrix L[k, 1:k-1] by matrix U[1:k-1,k:n] gives the relevant row vector.
n = size(A)[1] # gives the number of rows in the 2D array.
# Julia can use Greek letters (and in fact, UNICODE):
# to insert character π, type \pi, hit tab, and select "π" from the menu.
# Or just call it "perm" or such.
π = collect(1:n)
# Julia language note: function "collect" converts the abstract entity "1:n" into an array of numbers.
# Initialize U as the zero matrix;
# correct below the main diagonal, with the other entries to be computed below.
U = zeros(n,n)
# Initialize L as zeros;
# correct above the main diagonal, with the other entries to be computed below,
# including the ones on the diagonal.
L = zeros(n,n)
for k in 1:n-1
if demomode; println("k=$k"); end
# Find the pivot element in column k:
pivotrow = k
abs_u_ik_max = abs(A[π[k],k])
for row in k+1:n
abs_u_ik = abs(A[π[row],k])
if abs_u_ik > abs_u_ik_max
pivotrow = row
abs_u_ik_max = abs_u_ik
end
end
if pivotrow > k # swap rows, virtually
if demomode; println("Swap row $k with row $pivotrow"); end
(π[k], π[pivotrow]) = (π[pivotrow], π[k])
else
if demomode; println("No row swap needed."); end
end
U[k,k:end] = A[[π[k]],k:end] - L[[π[k]],1:k] * U[1:k,k:end]
L[π[k],k] = 1.
for row in k+1:n
L[π[row],k] = ( A[π[row],k] - L[π[row],1:k] ⋅ U[1:k,k] ) / U[k,k]
# Julia note: To enter the centered dot notation for the dot product, type "\cdot" and then hit the tab key.
end
if demomode
println("permuted A is:")
for row in 1:n
println(A[π[row],:])
end
println("Intermediate L is"); printmatrix(L)
println("Intermediate U is"); printmatrix(U)
end
end
# The last row (index "end") is special: nothing to do for L except put in the 1 on the "permuted main diagonal"
L[π[end],end] = 1.
U[end,end] = A[π[end],end] - L[π[end],1:end-1] ⋅ U[1:end-1,end]
if demomode
println("After the final step, k=$(n-1)")
println("L is"); printmatrix(L)
println("U is"); printmatrix(U)
end
return (L, U, π)
end;
A = [ 1.0 -3.0 22.0 ; 3.0 5.0 -6.0 ; 4.0 235.0 7.0 ]
println("A is"); printmatrix(A)
(L, U, π) = plu(A, demomode=true)
println("\nFunction plu gives")
println("L="); printmatrix(L)
println("U="); printmatrix(U)
println("row permution $(π)")
println("The 'residual' or 'backward error' A-LU is"); printmatrix(A - L*U)
A is
[ 1.0 -3.0 22.0
3.0 5.0 -6.0
4.0 235.0 7.0 ]
k=1
Swap row 1 with row 3
permuted A is:
[4.0, 235.0, 7.0]
[3.0, 5.0, -6.0]
[1.0, -3.0, 22.0]
Intermediate L is
[ 0.25 0.0 0.0
0.75 0.0 0.0
1.0 0.0 0.0 ]
Intermediate U is
[ 4.0 235.0 7.0
0.0 0.0 0.0
0.0 0.0 0.0 ]
k=2
No row swap needed.
permuted A is:
[4.0, 235.0, 7.0]
[3.0, 5.0, -6.0]
[1.0, -3.0, 22.0]
Intermediate L is
[ 0.25 0.3605839416058394 0.0
0.75 1.0 0.0
1.0 0.0 0.0 ]
Intermediate U is
[ 4.0 235.0 7.0
0.0 -171.25 -11.25
0.0 0.0 0.0 ]
After the final step, k=2
L is
[ 0.25 0.3605839416058394 1.0
0.75 1.0 0.0
1.0 0.0 0.0 ]
U is
[ 4.0 235.0 7.0
0.0 -171.25 -11.25
0.0 0.0 24.306569343065693 ]
Function plu gives
L=
[ 0.25 0.3605839416058394 1.0
0.75 1.0 0.0
1.0 0.0 0.0 ]
U=
[ 4.0 235.0 7.0
0.0 -171.25 -11.25
0.0 0.0 24.306569343065693 ]
row permution [3, 2, 1]
The 'residual' or 'backward error' A-LU is
[ 0.0 0.0 0.0
0.0 0.0 0.0
0.0 0.0 0.0 ]
Matrix \(L\) is not actually lower triangular, due to the permutation of its rows, but is still fine for a version of forward substitution, because
row \(\pi_1\) only involves \(x_1\) (multiplied by 1) and so can be used to solve for \(x_1\)
row \(\pi_2\) only involves \(x_1\) and \(x_2\) (the latter multiplied by 1) and so can be used to solve for \(x_2\)
…
(Psychologically [lower] triangular)
A matrix like this — one that is a row-permutation of a [lower] triangular matrix — is called psychologically [lower] triangular. (Maybe because it believes itself to be such?)
Forward and backward substitution with a permutation vector#
To solve \(L c = b\), all one has to change from the formulas for forward substitution seen in the previous section Solving Ax = b with LU factorization is to put the permuted row index \(\pi_i\) in both \(L\) and \(b\):
function forwardsubstitution(L, b, π)
# Version 2: with permutation of rows
# Solve L c = b for c, with permutation of the rows of L and of b.
n = length(b)
c = zeros(n)
c[1] = b[π[1]]
for i in 2:n
c[i] = b[π[i]] - L[π[i], 1:i] ⋅ c[1:i]
end
return c
end;
b = [2.0, 3.0, 4.0];
c = forwardsubstitution(L, b, π)
print("c = $c")
c = [4.0, 0.0, 1.0]
Then the final step, solving \(Ux = b\) for \(x\), needs no change, because \(U\) had no rows swapped,
so we are done; we can import the function backwardSubstitution
seen previously
using .NumericalMethods: backwardsubstitution
x = backwardsubstitution(U, c)
println("x = $x")
Ax = A*x
r = b - A*x
println("The residual r = b - Ax is \n$r\nwith maximum norm $(norm(r, Inf))")
x = [1.0867867867867869, -0.002702702702702703, 0.04114114114114114]
The residual r = b - Ax is
[0.0, 0.0, 0.0]
with maximum norm 0.0