Matrix function estimation with primate
The introduction covered the basics of implicit trace estimation, showing both in theory and with code how primate can be used to estimate the trace of matrix functions:
f(A) = U f(\Lambda) U^T, \quad f: [a,b] \to \mathbb{R}
In particular, the introduction briefly covered how the Lanczos method is intimately connected to Gaussian Quadrature, and how this connection enables fast randomized trace estimation of f(A).
In this post, I’ll cover how to approximate the action v \mapsto f(A)v for an arbitrary function f using primate’s matrix_function API, and how to compose this functionality with other trace estimators.
Matrix function approximation
If A \in \mathbb{R}^{n \times n} and n is large, obtaining f(A) \in \mathbb{R}^{n \times n} explicitly can be very expensive. One way to sidestep this difficulty is to approximate v \mapsto f(A)v using the degree-k Lanczos expansion:
Q^T A Q = T \quad \Leftrightarrow \quad f(A)v \approx \lVert v \rVert \cdot Q f(T) e_1
Musco, Musco, and Sidford (2018) showed that this approximation has the following guarantee:
\|f(\mathbf{A}) \mathbf{v}-\mathbf{y}\| \leq 2\|\mathbf{v}\| \cdot \min _{\substack{\text { polynomial } p \\ \text { with degree }<k}}\left(\max _{x \in\left[\lambda_{\min }(\mathbf{A}), \lambda_{\max }(\mathbf{A})\right]}|f(x)-p(x)|\right)
In other words, if \mathbf{y} denotes the Lanczos-based approximation above, then up to a factor of 2 the error \|f(\mathbf{A}) \mathbf{v}-\mathbf{y}\| is bounded by the uniform error of the best polynomial approximation to f of degree < k. For general matrix functions, this implies that Lanczos in finite precision essentially matches the strongest known exact-arithmetic bounds.
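To get a feel for the right-hand side of this bound, the best-polynomial error can be estimated numerically. Below is a minimal sketch, assuming f(x) = e^x and the spectral interval [0.1, 2] purely for illustration; a Chebyshev interpolant serves as a near-optimal proxy for the best uniform approximation of degree < k:

import numpy as np
from numpy.polynomial.chebyshev import Chebyshev

## Near-optimal proxy for: min_{deg(p) < k} max_{x in [a, b]} |f(x) - p(x)|
f, a, b = np.exp, 0.1, 2.0  ## assumed function and spectral interval
grid = np.linspace(a, b, 1000)
for k in (2, 4, 8, 16):
    p = Chebyshev.interpolate(f, k - 1, domain=[a, b])
    print(f"k = {k:2d}: uniform error ~ {np.max(np.abs(f(grid) - p(grid))):.3e}")

For smooth f this error decays rapidly in k, which is why modest Lanczos degrees often suffice.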
Modifying operators
The above approximation result suggests an idea: can we modify existing matrix-free algorithms that rely on matvec functionality v \mapsto Av to work with matrix functions v \mapsto f(A)v?
This is precisely what primate enables with its matrix_function API: given an existing matrix-like object A and a user-provided callable fun, matrix_function constructs a LinearOperator that transparently converts matrix-vector products Av into products f(A)v.
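To make the mechanics concrete, here is a minimal sketch of how such an operator can be built on top of SciPy. To be clear, this is illustrative only and not primate’s actual implementation; the name lanczos_fA and its deg parameter are hypothetical:

import numpy as np
from scipy.sparse.linalg import LinearOperator

def lanczos_fA(A, fun, deg: int = 20):
    ## Illustrative sketch only -- not primate's implementation. Runs `deg`
    ## Lanczos steps with full reorthogonalization, then computes
    ## f(A)v ~= ||v|| * Q f(T) e_1, where Q^T A Q = T.
    n = A.shape[0]
    def matvec(v):
        v = np.ravel(v)
        Q = np.zeros((n, deg))
        alpha, beta = np.zeros(deg), np.zeros(deg - 1)
        Q[:, 0] = v / np.linalg.norm(v)
        for j in range(deg):
            w = A @ Q[:, j]
            alpha[j] = np.dot(Q[:, j], w)
            w -= Q[:, : j + 1] @ (Q[:, : j + 1].T @ w)  ## full reorthogonalization
            if j < deg - 1:
                beta[j] = np.linalg.norm(w)  ## note: no breakdown (beta ~ 0) handling
                Q[:, j + 1] = w / beta[j]
        T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
        theta, V = np.linalg.eigh(T)  ## f(T) e_1 via the eigendecomposition of T
        e1 = np.zeros(deg)
        e1[0] = 1.0
        return np.linalg.norm(v) * (Q @ (V @ (fun(theta) * (V.T @ e1))))
    return LinearOperator((n, n), matvec=matvec, dtype=np.float64)

For a reasonable degree, lanczos_fA(A, fun) @ v should closely track the exact f(A)v; primate’s matrix_function plays the same role in everything below, with a more careful implementation.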
As a baseline example, consider the action that adds an \epsilon amount of mass to the diagonal of A:
v \mapsto (A + \epsilon I)v
For any fixed \epsilon, imitating this matrix action can be done in three different ways:
1. Obtain u = Av and then add u \gets u + \epsilon \cdot v
2. Form (A + \epsilon I) and explicitly carry out the multiplication
3. Multiply v by the f(A) induced by f(\lambda) = \lambda + \epsilon
Let’s see what the code to accomplish this using approach (3) looks like:
import numpy as np
from primate.random import symmetric
from primate.operator import matrix_function

## Random matrix + input vector
A = symmetric(150, pd = True)
v = np.random.uniform(size=A.shape[1])

## Ground truth v |-> f(A)v
Lambda, U = np.linalg.eigh(A)
v_truth = (U @ np.diag(Lambda + 0.10) @ U.T) @ v

## Lanczos approximation
M = matrix_function(A, fun = lambda x: x + 0.10)
v_test = np.ravel(M @ v)

vn, vn_approx = np.linalg.norm(v_truth), np.linalg.norm(v_test)
print(f"f(A)v norm: {vn:.6f}")
print(f"Approx norm: {vn_approx:.6f}")
print(f"Max diff: {np.max(np.abs(v_truth - v_test)):.6e}")
print(f"cosine sim: {np.dot(v_test, v_truth) / (vn * vn_approx):.6e}")
f(A)v norm: 4.392660
Approx norm: 4.392660
Max diff: 5.218048e-15
cosine sim: 1.000000e+00
Observe M matches the ground truth v \mapsto (A + \epsilon I)v. In this way, one benefit of using matrix_function is that it allows one to approximate f(A) by thinking only about what is happening at the spectral level (as opposed to the matrix level). We can check the result is identical to approaches (1) and (2) above:
B = A + 0.10 * np.eye(A.shape[0])  ## approach (2): form A + eps*I explicitly
np.allclose(A @ v + 0.10 * v, v_truth) and np.allclose(B @ v, v_truth)
True
Baseline established.
When v \mapsto f(A)v is not known
On the other hand, there are many situations where an explicit expression for f(A) is analytically intractable, too computationally expensive to obtain, or simply unknown. For example, consider the map:
v \mapsto (A + \epsilon I)^{-1} v
Such expressions pop up in a variety of settings, such as in Tikhonov regularization, in Schatten-norm estimation (Ubaru, Saad, and Seghouane 2017), in the Cholesky factorization of PSD matrices, and so on.
Note that unlike the previous setting, we cannot readily access v \mapsto f(A)v unless we explicitly compute the full spectral decomposition of A or the inverse of A, both of which are expensive to obtain. Alternatively, observe that for A \succeq 0, we have:
\lambda \in \Lambda(\, A \, ) \; \Leftrightarrow \; (\lambda+\epsilon)^{-1} \in \Lambda(\, (A + \epsilon I)^{-1} \,)
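As a quick sanity check of this correspondence (reusing A and Lambda from the code above), the following should print True:

## Spectrum of (A + eps*I)^{-1} vs. the transformed eigenvalues of A
w = np.linalg.eigvalsh(np.linalg.inv(A + 0.10 * np.eye(A.shape[0])))
print(np.allclose(np.sort(w), np.sort(np.reciprocal(Lambda + 0.10))))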
Thus, obtaining an operator approximating v \mapsto (A + \epsilon I)^{-1}v requires just a trivial modification of the code above:
## Alternative: v_truth = np.linalg.inv(A + 0.10 * np.eye(A.shape[0])) @ v
v_truth = (U @ np.diag(np.reciprocal(Lambda + 0.10)) @ U.T) @ v

## Lanczos approximation
M = matrix_function(A, fun = lambda x: 1.0 / (x + 0.10))
v_test = np.ravel(M @ v)

vn, vn_approx = np.linalg.norm(v_truth), np.linalg.norm(v_test)
print(f"f(A)v norm: {vn:.6f}")
print(f"Approx norm: {vn_approx:.6f}")
print(f"Max diff: {np.max(np.abs(v_truth - v_test)):.6e}")
print(f"cosine sim: {np.dot(v_test, v_truth) / (vn * vn_approx):.6e}")
f(A)v norm: 32.982583
Approx norm: 32.982583
Max diff: 1.626085e-05
cosine sim: 1.000000e+00
There is a larger degree of error compared to the baseline, as evidenced by the \lVert \cdot \rVert_\infty-norm difference between v_truth and v_test. However, this is to be expected: in general, approximating the action v \mapsto A^{-1} v will always be more difficult than approximating v \mapsto A v, even if A is well-conditioned.
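To see the accuracy/degree tradeoff from the Musco et al. bound directly, one can vary the number of Lanczos steps. The snippet below reuses the hypothetical lanczos_fA sketch from earlier; primate’s matrix_function exposes its own parameters for controlling the approximation, for which its documentation is the authority:

## Error of the degree-`deg` Lanczos approximation to (A + eps*I)^{-1} v
for deg in (5, 10, 20, 40):
    M_k = lanczos_fA(A, lambda x: 1.0 / (x + 0.10), deg=deg)
    print(f"deg = {deg:2d}: max diff = {np.max(np.abs(np.ravel(M_k @ v) - v_truth)):.3e}")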
Back to trace estimation
There are several use cases wherein one might be interested in the output f(A)v itself, e.g. principal component regression or spectral clustering. Another use case is implicit matrix function trace estimation: since matrix_function returns a LinearOperator, it can be passed directly to operator-based trace estimators like xtrace.
from primate.trace import hutch, xtrace

M = matrix_function(A, fun="log")
print(f"Logdet exact: {np.sum(np.log(np.linalg.eigvalsh(A))):.6e}")
print(f"Logdet Hutch: {hutch(A, fun='log'):.6e}")
print(f"Logdet XTrace: {xtrace(M):.6e}")
Logdet exact: -1.483218e+02
Logdet Hutch: -1.473786e+02
Logdet XTrace: -1.483155e+02
As with the hutch estimator applied to matrix functions, note that the action v \mapsto f(A)v is subject to the approximation errors discussed above, so such extensions are limited to functions that are well-approximated by the Lanczos method.
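To illustrate that limitation, consider a discontinuous spectral function such as a step, whose trace counts eigenvalues above a threshold; no low-degree polynomial approximates a step well, so convergence is much slower than for a smooth function like log. A minimal sketch, assuming matrix_function accepts any vectorized callable and using an arbitrary threshold of 1e-2:

## Step functions admit no good low-degree polynomial approximation, so
## Lanczos-based estimates of tr(step(A)) converge slowly relative to logdet
step = lambda x: (x > 1e-2).astype(np.float64)
M_step = matrix_function(A, fun=step)
print(f"# eigenvalues > 1e-2 (exact) : {np.sum(np.linalg.eigvalsh(A) > 1e-2)}")
print(f"# eigenvalues > 1e-2 (XTrace): {xtrace(M_step):.3f}")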