Matrix function estimation with primate
The introduction covered the basics of implicit trace estimation, showing both in theory and with code how primate can be used to estimate the trace of matrix functions:
f(A) = U f(\Lambda) U^T, \quad f: [a,b] \to \mathbb{R}
In particular, the introduction briefly covered how the Lanczos method is intimately connected to Gaussian Quadrature, and how this connection enables fast randomized trace estimation of f(A).
In this post, I’ll cover how to approximate the action v \mapsto f(A)v for an arbitrary function f using primate’s matrix_function API, and how to compose this functionality with other trace estimators.
Matrix function approximation
If A \in \mathbb{R}^{n \times n} and n is large, obtaining f(A) \in \mathbb{R}^{n \times n} explicitly can be very expensive. One way to sidestep this difficulty is to approximate v \mapsto f(A)v using the degree-k Lanczos expansion:
Q^T A Q = T \quad \Leftrightarrow \quad f(A)v \approx \lVert v \rVert \cdot Q f(T) e_1
Musco, Musco, and Sidford (2018) showed that this approximation has the following guarantee:
\|f(\mathbf{A}) \mathbf{v}-\mathbf{y}\| \leq 2\|\mathbf{v}\| \cdot \min _{\substack{\text { polynomial } p \\ \text { with degree }<k}}\left(\max _{x \in\left[\lambda_{\min }(\mathbf{A}), \lambda_{\max }(\mathbf{A})\right]}|f(x)-p(x)|\right)
In other words, if \mathbf{y} denotes the Lanczos-based approximation above, then up to a factor of 2 the error \|f(\mathbf{A}) \mathbf{v}-\mathbf{y}\| is bounded by the uniform error of the best polynomial approximation to f of degree < k. For general matrix functions, this implies that Lanczos in finite precision essentially matches the strongest known exact-arithmetic bounds.
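To get a feel for the right-hand side of this bound, the best-polynomial error can be estimated numerically. Below is a minimal sketch, assuming f(x) = e^x and the spectral interval [0.1, 2] purely for illustration; a Chebyshev interpolant serves as a near-optimal proxy for the best uniform approximation of degree < k:

import numpy as np
from numpy.polynomial.chebyshev import Chebyshev

## Near-optimal proxy for: min_{deg(p) < k} max_{x in [a, b]} |f(x) - p(x)|
f, a, b = np.exp, 0.1, 2.0  ## assumed function and spectral interval
grid = np.linspace(a, b, 1000)
for k in (2, 4, 8, 16):
    p = Chebyshev.interpolate(f, k - 1, domain=[a, b])
    print(f"k = {k:2d}: uniform error ~ {np.max(np.abs(f(grid) - p(grid))):.3e}")

For smooth f this error decays rapidly in k, which is why modest Lanczos degrees often suffice.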
Modifying operators
The above approximation result suggests an idea: can we modify existing matrix-free algorithms that rely on matvec functionality v \mapsto Av to work with matrix functions v \mapsto f(A)v?
This is precisely what primate enables with its matrix_function API: given an existing matrix-like object A and a user-provided callable fun, matrix_function constructs a LinearOperator that transparently converts matrix-vector products Av into products f(A)v.
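To make the mechanics concrete, here is a minimal sketch of how such an operator can be built on top of SciPy. To be clear, this is illustrative only and not primate’s actual implementation; the name lanczos_fA and its deg parameter are hypothetical:

import numpy as np
from scipy.sparse.linalg import LinearOperator

def lanczos_fA(A, fun, deg: int = 20):
    ## Illustrative sketch only -- not primate's implementation. Runs `deg`
    ## Lanczos steps with full reorthogonalization, then computes
    ## f(A)v ~= ||v|| * Q f(T) e_1, where Q^T A Q = T.
    n = A.shape[0]
    def matvec(v):
        v = np.ravel(v)
        Q = np.zeros((n, deg))
        alpha, beta = np.zeros(deg), np.zeros(deg - 1)
        Q[:, 0] = v / np.linalg.norm(v)
        for j in range(deg):
            w = A @ Q[:, j]
            alpha[j] = np.dot(Q[:, j], w)
            w -= Q[:, : j + 1] @ (Q[:, : j + 1].T @ w)  ## full reorthogonalization
            if j < deg - 1:
                beta[j] = np.linalg.norm(w)  ## note: no breakdown (beta ~ 0) handling
                Q[:, j + 1] = w / beta[j]
        T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
        theta, V = np.linalg.eigh(T)  ## f(T) e_1 via the eigendecomposition of T
        e1 = np.zeros(deg)
        e1[0] = 1.0
        return np.linalg.norm(v) * (Q @ (V @ (fun(theta) * (V.T @ e1))))
    return LinearOperator((n, n), matvec=matvec, dtype=np.float64)

For a reasonable degree, lanczos_fA(A, fun) @ v should closely track the exact f(A)v; primate’s matrix_function plays the same role in everything below, with a more careful implementation.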
As a baseline example, consider the action that adds an \epsilon amount of mass to the diagonal of A:
v \mapsto (A + \epsilon I)v
For any fixed \epsilon, imitating this matrix action can be done in three different ways:
1. Obtain u = Av and then add u \gets u + \epsilon \cdot v
2. Form (A + \epsilon I) and explicitly carry out the multiplication
3. Multiply v by the f(A) induced by f(\lambda) = \lambda + \epsilon
Let’s see what the code to accomplish this using approach (3) looks like:
import numpy as np
from primate.random import symmetric
from primate.operator import matrix_function

## Random matrix + input vector
A = symmetric(150, pd = True)
v = np.random.uniform(size=A.shape[1])

## Ground truth v |-> f(A)v
Lambda, U = np.linalg.eigh(A)
v_truth = (U @ np.diag(Lambda + 0.10) @ U.T) @ v

## Lanczos approximation
M = matrix_function(A, fun = lambda x: x + 0.10)
v_test = np.ravel(M @ v)

vn, vn_approx = np.linalg.norm(v_truth), np.linalg.norm(v_test)
print(f"f(A)v norm: {vn:.6f}")
print(f"Approx norm: {vn_approx:.6f}")
print(f"Max diff: {np.max(np.abs(v_truth - v_test)):.6e}")
print(f"cosine sim: {np.dot(v_test, v_truth) / (vn * vn_approx):.6e}")
f(A)v norm: 4.392660
Approx norm: 4.392660
Max diff: 5.218048e-15
cosine sim: 1.000000e+00
Observe M matches the ground truth v \mapsto (A + \epsilon I)v. In this way, one benefit of using matrix_function is that it allows one to approximate f(A) by thinking only about what is happening at the spectral level (as opposed to the matrix level). We can check the result is identical to approaches (1) and (2) above:
B = A + 0.10 * np.eye(A.shape[0])  ## approach (2): form A + eps*I explicitly
np.allclose(A @ v + 0.10 * v, v_truth) and np.allclose(B @ v, v_truth)
True
Baseline established.
When v \mapsto f(A)v is not known
On the other hand, there are many situations where an explicit expression for f(A) is analytically intractable, too computationally expensive to obtain, or simply unknown. For example, consider the map:
v \mapsto (A + \epsilon I)^{-1} v
Such expressions pop up in a variety of settings, such as in Tikhonov regularization, in Schatten-norm estimation (Ubaru, Saad, and Seghouane 2017), in the Cholesky factorization of PSD matrices, and so on.
Note that unlike the previous setting, we cannot readily access v \mapsto f(A)v unless we explicitly compute the full spectral decomposition of A or the inverse of A, both of which are expensive to obtain. Alternatively, observe that for A \succeq 0, we have:
\lambda \in \Lambda(\, A \, ) \; \Leftrightarrow \; (\lambda+\epsilon)^{-1} \in \Lambda(\, (A + \epsilon I)^{-1} \,)
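As a quick sanity check of this correspondence (reusing A and Lambda from the code above), the following should print True:

## Spectrum of (A + eps*I)^{-1} vs. the transformed eigenvalues of A
w = np.linalg.eigvalsh(np.linalg.inv(A + 0.10 * np.eye(A.shape[0])))
print(np.allclose(np.sort(w), np.sort(np.reciprocal(Lambda + 0.10))))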
Thus, obtaining an operator approximating v \mapsto (A + \epsilon I)^{-1}v requires just a trivial modification of the code above:
## Alternative: v_truth = np.linalg.inv(A + 0.10 * np.eye(A.shape[0])) @ v
v_truth = (U @ np.diag(np.reciprocal(Lambda + 0.10)) @ U.T) @ v

## Lanczos approximation
M = matrix_function(A, fun = lambda x: 1.0 / (x + 0.10))
v_test = np.ravel(M @ v)

vn, vn_approx = np.linalg.norm(v_truth), np.linalg.norm(v_test)
print(f"f(A)v norm: {vn:.6f}")
print(f"Approx norm: {vn_approx:.6f}")
print(f"Max diff: {np.max(np.abs(v_truth - v_test)):.6e}")
print(f"cosine sim: {np.dot(v_test, v_truth) / (vn * vn_approx):.6e}")
f(A)v norm: 32.982583
Approx norm: 32.982583
Max diff: 1.626085e-05
cosine sim: 1.000000e+00
There is a larger degree of error compared to the baseline, as evidenced by the \lVert \cdot \rVert_\infty-norm difference between v_truth and v_test. However, this is to be expected: in general, approximating the action v \mapsto A^{-1} v will always be more difficult than approximating v \mapsto A v, even if A is well-conditioned.
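To see the accuracy/degree tradeoff from the Musco et al. bound directly, one can vary the number of Lanczos steps. The snippet below reuses the hypothetical lanczos_fA sketch from earlier; primate’s matrix_function exposes its own parameters for controlling the approximation, for which its documentation is the authority:

## Error of the degree-`deg` Lanczos approximation to (A + eps*I)^{-1} v
for deg in (5, 10, 20, 40):
    M_k = lanczos_fA(A, lambda x: 1.0 / (x + 0.10), deg=deg)
    print(f"deg = {deg:2d}: max diff = {np.max(np.abs(np.ravel(M_k @ v) - v_truth)):.3e}")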
Back to trace estimation
There are several use cases wherein one might be interested in the output f(A)v itself, e.g. principal component regression or spectral clustering. Another use case is implicit matrix function trace estimation: since matrix_function returns a LinearOperator, it can be passed directly to operator-based trace estimators like xtrace.
from primate.trace import hutch, xtrace

M = matrix_function(A, fun="log")
print(f"Logdet exact: {np.sum(np.log(np.linalg.eigvalsh(A))):.6e}")
print(f"Logdet Hutch: {hutch(A, fun='log'):.6e}")
print(f"Logdet XTrace: {xtrace(M):.6e}")
Logdet exact: -1.483218e+02
Logdet Hutch: -1.473786e+02
Logdet XTrace: -1.483155e+02
As with the hutch estimator applied to matrix functions, note that the action v \mapsto f(A)v is subject to the approximation errors discussed above, so such extensions are limited to functions that are well-approximated by the Lanczos method.
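To illustrate that limitation, consider a discontinuous spectral function such as a step, whose trace counts eigenvalues above a threshold; no low-degree polynomial approximates a step well, so convergence is much slower than for a smooth function like log. A minimal sketch, assuming matrix_function accepts any vectorized callable and using an arbitrary threshold of 1e-2:

## Step functions admit no good low-degree polynomial approximation, so
## Lanczos-based estimates of tr(step(A)) converge slowly relative to logdet
step = lambda x: (x > 1e-2).astype(np.float64)
M_step = matrix_function(A, fun=step)
print(f"# eigenvalues > 1e-2 (exact) : {np.sum(np.linalg.eigvalsh(A) > 1e-2)}")
print(f"# eigenvalues > 1e-2 (XTrace): {xtrace(M_step):.3f}")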