The Lanczos method
Whether for simplifying the representation of complicated systems, characterizing the asymptotic behavior of differential equations, or even just fitting polynomials to data via least squares, the decomposition of linear operators has seen significant use across the sciences and engineering.
Central to operator theory is the spectral theorem, which provides conditions under which a linear operator A : \mathbb{R}^n \to \mathbb{R}^n can be diagonalized in terms of its eigenvalues and eigenvectors: A = U \Lambda U^{-1}
In the case where A is symmetric, the eigen-decomposition is not only guaranteed to exist, but its canonical form may be obtained via orthogonal diagonalization. Symmetric matrices are among the most commonly encountered in applications.
In 1950, Cornelius Lanczos studied an alternative means of decomposition via tridiagonalization: AQ = Q T \quad \Leftrightarrow \quad Q^T A Q = T The algorithm by which one produces such a T is known as the Lanczos method. Despite its age, it remains the standard algorithm1 both for computing eigensets and solving linear systems in the large-scale regime. Having intrinsic connections to the conjugate gradient method, the theory of orthogonal polynomials, and Gaussian quadrature, it is one of the most important numerical methods of all time—indeed, an IEEE guest editorial places it among the top 10 most influential algorithms of the 20th century.
As the Lanczos method lies at the heart of primate's design, this post introduces it, with a focus on its motivating principles and computational details. For its API usage, see the lanczos page.
Lanczos on a bumper sticker
Given any non-zero v \in \mathbb{R}^n, Lanczos generates a Krylov subspace via successive powers of A:
\mathcal{K}(A, v) \triangleq \mathrm{span}\{ \, A^{0} v, A^{1}v, A^{2}v, \dots, A^{n-1}v \, \}
For a generic choice of v, these vectors are linearly independent, so orthogonalizing them not only yields an orthonormal basis for \mathcal{K}(A, v) but also a change-of-basis matrix Q, allowing A to be represented by a new matrix T:
\begin{align*} K &= [\, v \mid Av \mid A^2 v \mid \dots \mid A^{n-1}v \,] && \\ Q &= [\, q_1, q_2, \dots, q_n \,] \gets \mathrm{qr}(K) && \\ T &= Q^T A Q && \end{align*}
It turns out that since A is symmetric, T is guaranteed to have a symmetric tridiagonal structure:
T = \begin{bmatrix} \alpha_1 & \beta_2 & & & \\ \beta_2 & \alpha_2 & \beta_3 & & \\ & \beta_3 & \ddots & \ddots & \\ & & \ddots & \ddots & \beta_n \\ & & & \beta_n & \alpha_n \end{bmatrix}
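To make the construction concrete, here is a minimal NumPy sketch of the direct approach above; the matrix A, the starting vector v, and the size n are placeholders chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
A = rng.standard_normal((n, n))
A = (A + A.T) / 2                    # symmetrize so the theory above applies
v = rng.standard_normal(n)

# Krylov matrix K = [v | Av | A^2 v | ... | A^{n-1} v]
K = np.column_stack([np.linalg.matrix_power(A, k) @ v for k in range(n)])

Q, _ = np.linalg.qr(K)               # orthonormal basis for K(A, v)
T = Q.T @ A @ Q                      # representation of A in the new basis

# Up to rounding error, T comes out symmetric tridiagonal: its entries off
# the three central diagonals are numerically zero.
```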
Since Q is orthogonal with \mathrm{range}(Q) = \mathcal{K}(A, v), the change-of-basis A \mapsto Q^{-1} A Q is in fact a similarity transform; similarity is an equivalence relation on \mathcal{S}^n that preserves the spectrum, so we can obtain \Lambda by diagonalizing T:
T = Y \Lambda Y^T
As T is quite structured, it can be easily diagonalized; thus, we have effectively solved the eigenvalue problem. To quote the Lanczos introduction from Parlett: could anything be more simple?
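To give a sense of how cheap this last step is, here is a small sketch using SciPy's tridiagonal eigensolver; the entries of alpha and beta below are arbitrary values picked only for illustration:

```python
import numpy as np
from scipy.linalg import eigh_tridiagonal

alpha = np.array([2.0, 1.5, 3.0, 2.5])   # diagonal of T
beta  = np.array([0.5, 0.8, 0.3])        # off-diagonal of T

# T = Y diag(lambda) Y^T, computed directly from the two defining arrays
eigenvalues, Y = eigh_tridiagonal(alpha, beta)
```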
The “iteration” part
Lanczos originally referred to his algorithm as the method of minimized iterations, and indeed nowadays it is often called an iterative method. Where’s the iterative component?
If you squint hard enough, you can deduce that for every j \in [1, n): A Q_j = Q_j T_j + \beta_{j+1} q_{j+1} e_{j}^T Equating the j-th columns on each side of the equation and rearranging yields a three-term recurrence: \begin{align*} A q_j &= \beta_{j} q_{j\text{-}1} + \alpha_j q_j + \beta_{j+1} q_{j+1} \\ \Leftrightarrow \beta_{j+1} \, q_{j+1} &= A q_j - \alpha_j \, q_j - \beta_{j} \, q_{j\text{-}1} \end{align*}
The equation above is a variable-coefficient second-order linear difference equation, and it is known that such equations have unique solutions; they are given below: \alpha_j = q_j^T A q_j, \;\; \beta_{j+1} = \lVert r_j \rVert_2, \;\; q_{j+1} = r_j / \beta_{j+1}
\text{where } r_j = (A - \alpha_j I)q_j - \beta_{j} q_{j\text{-}1}
In other words, if (q_{j\text{-}1}, \beta_j, q_j) are known, then (\alpha_j, \beta_{j+1}, q_{j+1}) are completely determined. In theory, this means we can iteratively generate both Q and T using just a couple of vectors at a time, with no need for the explicit QR factorization shown above. Pretty nifty, eh!
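As a rough sketch of how such an iteration might look in NumPy (the function name and structure here are illustrative, not primate's actual implementation), note that only q_{j-1}, q_j, and the current residual are ever needed:

```python
import numpy as np

def lanczos(A, v, k):
    """Sketch of the three-term recurrence: returns the diagonal (alpha)
    and off-diagonal (beta) of the k x k tridiagonal matrix T_k."""
    alpha, beta = np.zeros(k), np.zeros(k)
    q_prev = np.zeros(len(v))
    q = v / np.linalg.norm(v)          # q_1 = v / ||v||
    b = 0.0                            # no beta term on the first step
    for j in range(k):
        r = A @ q - b * q_prev         # r_j = A q_j - beta_j q_{j-1}
        alpha[j] = q @ r               # alpha_j = q_j^T A q_j (in exact arithmetic)
        r -= alpha[j] * q              # r_j = (A - alpha_j I) q_j - beta_j q_{j-1}
        b = np.linalg.norm(r)          # beta_{j+1}
        beta[j] = b
        if b == 0.0:                   # invariant subspace reached; stop early
            break
        q_prev, q = q, r / b           # q_{j+1} = r_j / beta_{j+1}
    return alpha, beta[:-1]            # the final beta is a residual norm, not part of T_k
```

In exact arithmetic the q_j stay orthonormal automatically; in floating point they gradually lose orthogonality, which is why practical implementations add some form of reorthogonalization.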
Wait, isn’t T arbitrary?
Unfortunately, and unlike the spectral decomposition2, there is no canonical choice of T. Indeed, since v \in \mathbb{R}^n was chosen arbitrarily, the possible T form a family with n - 1 degrees of freedom: there are infinitely many essentially distinct such decompositions.
Not all hope is lost though, as it turns out that T is actually fully characterized by v. To see this, notice that since Q is an orthogonal matrix, we have:
Q Q^T = I_n = [e_1, e_2, \dots, e_n]
By extension, given an initial pair (A, q_1) satisfying \lVert q_1 \rVert = 1, the following holds:
K_n(A, q_1) = Q Q^T K_n(A, q_1) = Q[ \, e_1 \mid T e_1 \mid T^2 e_1 \mid \dots \mid T^{n-1} e_1 \, ]
…this is actually a QR factorization, which is essentially unique! Indeed, the Implicit Q Theorem asserts that if an upper Hessenberg matrix T \in \mathbb{R}^{n \times n} has only positive elements on its first subdiagonal and there exists an orthogonal matrix Q such that Q^T A Q = T, then Q and T are uniquely determined3 by (A, q_1).
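A quick numerical way to see this (again just an illustrative sketch, with arbitrary A and q_1): if we pin down the sign ambiguity in the QR factorization so that R has a positive diagonal, the resulting T has a positive subdiagonal and, up to rounding error, is exactly the matrix the three-term recurrence produces:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
A = rng.standard_normal((n, n))
A = (A + A.T) / 2
q1 = rng.standard_normal(n)
q1 /= np.linalg.norm(q1)

# K_n(A, q1) = [q1 | A q1 | ... | A^{n-1} q1]
K = np.column_stack([np.linalg.matrix_power(A, k) @ q1 for k in range(n)])

Q, R = np.linalg.qr(K)
Q = Q * np.sign(np.diag(R))   # flip columns so the implied R has a positive diagonal
T = Q.T @ A @ Q               # subdiagonal of T is now positive, so by the
                              # Implicit Q Theorem this (Q, T) is the unique pair
                              # associated with (A, q1)
```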
Beating the complexity bounds
As elegant and theoretically well-founded as the Lanczos method may be, is it efficient in practice?
Let’s start by establishing a baseline on its complexity:
Theorem 1 (Parlett 1994) Given a symmetric rank-r matrix A \in \mathbb{R}^{n \times n} whose operator x \mapsto A x requires O(\eta) time and O(\nu) space, the Lanczos method computes \Lambda(A) in O(\max\{\eta, n\}\cdot r) time and O(\max\{\nu, n\}) space, when computation is done in exact arithmetic.
As is clear from the theorem, if we specialize it such that r = n and \eta = \nu = n, then the Lanczos method requires just O(n^2) time and O(n) space to execute. In other words, the Lanczos method drops both the time and space complexity4 of obtaining spectral information by an order of magnitude compared to similar eigen-algorithms that decompose A directly.
To see why this is true, note that a symmetric tridiagonal matrix is fully characterized by its diagonal and subdiagonal terms, which require just O(n) space. If we assume the matrix-vector product v \mapsto Av takes O(n) time, then carrying out the recurrence clearly takes at most O(n^2) time, since there are at most n such vectors \{q_i\}_{i=1}^n to generate!
Now, if we need to store all of Y or Q explicitly, we clearly need O(n^2) space to do so. However, if we only need the eigenvalues \Lambda(A) (and not their eigenvectors U), then we may execute the recurrence keeping at most three vectors \{q_{j-1}, q_{j}, q_{j+1}\} in memory at any given time. Since each of these is O(n) in size, the claim of O(n) space is justified!
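Putting the pieces together, here is an illustrative sketch of the eigenvalue-only path, reusing the lanczos function sketched earlier on a sparse operator; the tridiagonal Laplacian below is just a convenient stand-in whose matvec costs O(n):

```python
import numpy as np
from scipy.linalg import eigvalsh_tridiagonal
from scipy.sparse import diags

n = 1_000
A = diags([-np.ones(n - 1), 2 * np.ones(n), -np.ones(n - 1)], offsets=[-1, 0, 1])
v = np.random.default_rng(2).standard_normal(n)

# Only alpha, beta, and a couple of length-n vectors are ever held in memory.
# In exact arithmetic the eigenvalues of T match Lambda(A); in floating point,
# loss of orthogonality makes this more delicate without reorthogonalization.
alpha, beta = lanczos(A, v, n)                    # the sketch defined earlier
eigenvalues = eigvalsh_tridiagonal(alpha, beta)   # eigenvalues of T
```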
Footnotes
A variant of the Lanczos method is actually at the heart of scipy.sparse.linalg's default eigsh solver (which is a port of ARPACK).↩︎
The spectral decomposition A = U \Lambda U^T identifies a diagonalizable A with its spectrum \Lambda(A) up to a change of basis A \mapsto M^{-1} A M↩︎
For general A \in \mathbb{R}^{n \times n}, computing the spectral decomposition is essentially bounded by matrix-multiplication time: \Theta(n^\omega) time and \Theta(n^2) space, where \omega \approx 2.37\dots is the matrix multiplication exponent. If we exclude the Strassen model of computation, we effectively get an \Omega(n^3) time and \Omega(n^2) space bound.↩︎