3.9. Computing Eigenvalues and Eigenvectors: the Power Method, and a bit beyond#

References:

  • Section 12.1 Power Iteration Methods of [Sauer, 2019].

  • Section 7.2 Eigenvalues and Eigenvectors of [Burden et al., 2016].

  • Chapter 8, More on Linear Equations of [Chenney and Kincaid, 2012], in particular section 3 Power Method, and also section 2 Eigenvalues and Eigenvectors as background reading.

The eigenproblem for a square \(n \times n\) matrix \(A\) is to compute some or all non-trivial solutions of

\[A \vec{v} = \lambda \vec{v}.\]

(By non-trivial, I mean to exclude \(\vec{v} = 0\), which gives a solution for any \(\lambda\).) That is, to compute the eigenvalues \(\lambda\) (of which generically there are \(n\), but sometimes less) and the eigenvectors \(\vec{v}\) corresponding to each.

With eigenproblems, and particularly those arising from differential equations, one often needs only the few smallest and/or largest eigenvalues. For these, the power method described next can be adapted, leading to the shifted inverse power method.

Here we often restict our attention to the case of a real symmetric matrix (\(A^T = A\), or \(A_{ij} = A_{ji}\)), or a Hermitian matrix (\(A^T = A^*\)), for which many things are a bit simpler:

  • all eigenvalues are real,

  • for symmetric matrices, all eigenvectors are also real,

  • there is a complete set of orthogonal eigenvectors \(\vec{v}_k\), \(1 \leq i \leq n\) that form a basis for all vectors, and so on.

However, the methods described here can be used more generally, or can be made to work with minor adjustments.

The eigenvalue are roots of the characteristic polynomial, \(\det(A - \lambda I)\); repeated roots are possible, and they will all be named, so there are always values \(\lambda_i\), \(1 \leq i \leq n\). Here, these eigenvalues will be enumerated in decreasing order of magnitude:

\[|\lambda_1| \geq |\lambda_2| \cdots \geq |\lambda_n|.\]

Generically, all the magnitudes are different, which makes things works more easily, so that will sometimes be assumed while developing the intuition of the method.

The Power Method#

The basic tool is the Power Method, which will usually but not always succeed in computing the eigenvalue of largest magnitude, \(\lambda_1\), and a corresponding eigenvector \(\vec{v}_1\). Its success mainly involves assuming there being a unique largest eigenvalue: \(\lambda_1 > \lambda_i\) for \(i>1\).

In its simplest form, one starts with a unit-length vector \(\vec{x}^{\,0}\), so \(\|\vec{x}^{\,0}\| = 1\), constructs the successive multiples \(\vec{y}^{\,k} = A^k \vec{x}^{\,0}\) by successive multiplications, and rescales at each stage to the unit vectors \(\vec{x}^{\,k} = \vec{y}^{\,k}/\|\vec{y}^{\,k}\|\).

Note that \(\vec{y}^{\,k+1} = A \vec{x}^{\,k}\), so that once \(\vec{x}^{\,k}\) is approximately an eigenvector for eigenvalue \(\lambda\), \(\vec{y}^{\,k+1} \approx \lambda \vec{x}^{\,k}\), leading to the eigenvalue approximation

\[r^{(k)} := \langle\vec{y}^{\,k+1}, \vec{x}^{\,k} \rangle \approx \langle\lambda \vec{x}^{\,k}, \vec{x}^{\,k}\rangle \approx \lambda\]

Remark 3.20 (dot products in Julia)

Here and below, I use \(\langle \vec{a}, \vec{b} \rangle\) to denote the inner product (a.k.a. dot product or scalar product) of two vectors.

With Julia arrays this is given by function dot(a,b) from package LinearAlgebra, obtained with

using LinearAlgebra: dot

You can even type this as

a ⋅ b

(To get that centered dot in Julia, type \cdot and then tab.)

Algorithm 3.15 (A basic version of the power method)

Choose initial vector \(\vec{y}_0\), maybe with a random number generator.

Normalize to \(\vec{x}^{\,0} = \vec{y}^{\,0}/\|\vec{y}^{\,0}\|\).

for \(k\) from 0 to \(k_{max}\)
\(\quad\) \(\vec{y}^{\,k+1} = A \vec{x}^{\,k}\)
\(\quad\) \(r^{(k)} = \langle \vec{y}^{\,k+1}, \vec{x}^{\,k} \rangle\)
\(\quad\) \(\vec{x}^{\,k+1} = \vec{y}^{\,k+1}/\|\vec{y}^{\,k+1}\|\)
end

The final values of \(r^{(k)}\) and \(\vec{x}^{\,k}\) approximate \(\lambda_1\) and \(\vec{v}_1\) respectively.

See Exercise 1.

Here and below you could check your work with Julia, using function eigvals from package LinearAlgebra.

However, that is almost cheating, so note that there is also a backward error check: see how small \(\| A v - \lambda v \| / \| v \|\) is.

using LinearAlgebra: eigvals
include("NumericalMethods.jl")
using .NumericalMethods: printmatrix
delta = 0.01
A = [3.0 delta delta ; delta 8.0 delta ; delta delta 4.0]
eigenvalues = eigvals(A);
println("With delta=$(delta), so that A is"); printmatrix(A)
println("the eigenvalues are $(eigenvalues)")
With delta=0.01, so that A is
[ 3.0 0.01 0.01 
  0.01 8.0 0.01 
  0.01 0.01 4.0 ]
the eigenvalues are [2.999880409987068, 4.000074490251736, 8.00004509976119]

Remark 3.21 (On Julia)

That package LinearAlgebra also has a function eigvecs for computing eignevevctors.

It can compute them “from scratch” with eigenvectors = eigvecs(A), which returns them in the columns of that matrix eigenvectors.

However, if you already have the eigenvalues, the calculation is much quicker by using them: that is done with eigenvectors = eigvecs(A, eigenvalues)

Refinement: deciding the iteration count#

Some details are omitted above; above all, how to decide the number of iterations.

One approach is to use the fact that an eigenvector-eigenvalue pair satisfies \(A \vec{v} - \lambda \vec{v} = 0\), so the “residual norm”

\[ \frac{\|A \vec{x}^{\,k} - r^{(k)} \vec{x}^{\,k}\|}{\|\vec{x}^{\,k}\|}, = \|\vec{y}^{\,k+1} - r^{(k)} \vec{x}^{\,k}\| \text{ since $\|\vec{x}^{\,k}\| = 1$} \]

is a measure of “relative backward error”.

Thus one could repace the above for loop by a while loop based on a condition like stopping when

(3.4)#\[ \|\vec{y}^{\,k+1} - r^{(k)} \vec{x}^{\,k}\| \leq \epsilon. \]

Alternatively, keep the for loop, but exit early (with break) if this condition is met.

I generally recommend this for-if-break form for implementing iterative methods, because it makes avoidance of infinite loops simpler, and avoids the common while loop issue that you do not yet have an error estimate when the loop starts.

See Exercise 2.

The Inverse Power Method#

The next step is to note that if \(A\) is nonsingular, its inverse \(B = A^{-1}\) has the same eigenvectors, but with eigenvalues \(\mu_i = 1/\lambda_i\).

Thus we can apply the power method to \(B\) in order to compute its largest eigenvalue, which is \(\mu_n = 1/\lambda_n\), along with the corresponding eigenvector \(\vec{v}_n\).

The main change to the above is that

\[\vec{y}^{\,k+1} = A^{-1} \vec{x}_{k}.\]

However, as usual one can (and should) avoid actually computing the inverse. Instead, express the above as the sysem of equations.

\[A \vec{y}^{\,k+1} = \vec{x}_{k}.\]

Here is an important case where the LU factorization method can speed things up greatly: a single LU factorization is needed, after which for each \(k\) one only has to do the far quicker forward and backward substitution steps: \(O(n^2)\) cost for each iteration instead of \(O(n^3/3)\).

Algorithm 3.16 (A basic version of the inverse power method)

Choose initial vector \(\vec{y}_0\), maybe with a random number generator.

Normalize to \(\vec{x}^{\,0} = \vec{y}^{\,0}/\|\vec{y}^{\,0}\|\).

Compute an LU factorization \(L U = A\).

for \(k\) from 0 to \(k_{max}\)
\(\quad\) Solve \(L \vec{z}^{\,k+1} = \vec{x}^{\,k}\)
\(\quad\) Solve \(U \vec{y}^{\,k+1} = \vec{z}^{\,k+1}\)
\(\quad\) \(r^{(k)} = \langle \vec{y}^{\,k+1}, \vec{x}^{\,k} \rangle\)
\(\quad\) \(\vec{x}^{\,k+1} = \vec{y}^{\,k+1}/\| \vec{y}^{\,k+1} \|\)
end

(If all goes well) the final values of \(r^{(k)}\) and \(\vec{x}^{\,k}\) approximate \(\lambda_n\) and \(\vec{v}_n\) respectively.

See Exercise 3.

Getting other eigenvalues with the shifted inverse power method#

The inverse power method computes the eigenvalue closest to 0; by shifting, we can compute the eigenvalue closest to any chosen value \(s\). Then by searching various values of \(s\), we can hope to find all the eigenvectors. As a variant, once we have \(\lambda_1\) and \(\lambda_n\), we can search nearby for other large or small eigenvalues: often the few largest and/or the few smallest are most important.

With a symmetric (or Hermitian) matrix, once the eigenvalue of largest magnitude, \(\lambda_1\) is known, the rest are known to be real values in the interval \([-|\lambda_1|,|\lambda_1|]\), so we know roughly where to seek them.

The main idea here is that for any number \(s\), matrix \(A - s I\) has eigenvalues \(\lambda_i - s\), with the same eigenvectors as \(A\):

\[(A - sI)\vec{v}_i = (\lambda_i - s)\vec{v}_i\]

Thus, applying the inverse power method to \(A - s I\) computes its largest eigenvalue \(\gamma\), and then \(\lambda = 1/(\gamma + s)\) is the eigenvalue of \(A\) closest to \(s\).

See Exercise 4.

Further topics: getting all the eigenvalues with the QR Method, etc.#

The above methods are not ideal when many or all of the eigenvalues of a matrix are wanted; then a variety of more advanced methods have been developed, starting with the QR (factorization) method.

We will not address the details of that method in this course, but one way to think about it for a symmetric matrix is that:

  • The eigenvectors are orthogonal.

  • Thus, if after computing \(\lambda_1\) and \(\vec{v}_1\), one uses the power iteration starting with \(\vec{x}^{\,0,2}\) orthogonal to \(\vec{v}_1\), then all the new iterates \(\vec{x}^{\,k,2}\) will stay orthogonal, and one will get the eigenvector corresponding to the largest remaining eigenvector: you get \(\vec{v}_2\) and \(\lambda_2\).

  • Continuing likewise, one can get the eigenvalues in descending order of magnitude.

  • As a modification, one can do all these almost in parallel: at iteration \(k\), have an approximation \(\vec{x}^{\,k,i}\) for each \(\lambda_i\) and at each stage, got by adjusting these new approximations so that \(\vec{x}^{\,k,i}\) is orthogonal to all the approximations \(\vec{x}^{\,k,j}\), \(j < i\), for all the previous (larger) eigenvalues. This uses a variant of the Gram-Schmidt method for orthogonalizing a set of vectors.


Exercises#

Exercise 1#

Implement the power method Algorithm 3.15 and test it on the real, symmetric matrix

\[\begin{split} A = \left[ \begin{array}{ccc} 3 & 1 & 1 \\ 1 & 8 & 1 \\ 1 & 1 & 4 \end{array} \right] \end{split}\]

This all real eigenvalues, all within \(2\) of the diagonal elements (this claim should be explained as part of the project write-up), so start with it.

As a debugging strategy, you could replace all those off-diagonal ones by a small value \(\delta\):

\[\begin{split} A_\delta = \left[ \begin{array}{ccc} 3 & \delta & \delta \\ \delta & 8 & \delta \\ \delta & \delta & 4 \end{array} \right] \end{split}\]

Then the Gershgorin circle theorem ensures that each eigenvalue is within \(2\delta\) of an entry on the main diagonal. Furthermore, if \(\delta\) is small enough that the circles of radius \(2\delta\) centered on the diagonal elements do not overlap, then there is one eigenvalue in each circle.

You could even start with \(\delta=0\), for which you know exactly the eigenvalues: they are the diagonal elements.

Exercise 2#

Modify your code from Exercise 1 to implement the stopping condition in Equation (3.4).

Exercise 3#

Implement the Inverse Power Method (with a fixed iteration count, as in Exercise 1 and then create a second version that imposes an accuracy target (as in Exercise 2).

Exercise 4#

As above, implement the shifted inverse power method, probably starting with a fixed iteration count version.

For the test case above, some plausible initial choices for the shifts are each of the entries on the main diagonal, and as above, testing with \(A_s\).