Projects X onto a d-dimensional linear subspace via Principal Component Analysis.
PCA is a linear dimensionality reduction algorithm that projects a point set X onto a lower-dimensional space using an orthogonal projector built from the eigenvalue decomposition of its covariance matrix.
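To make the summary concrete, here is a minimal NumPy sketch of the computation described above. It is illustrative only: `pca_sketch` is a hypothetical stand-in and not the implementation in `geomcover.linalg`.

```python
import numpy as np

def pca_sketch(X, d=2, center=True):
    """Minimal PCA sketch: project X onto the top-d eigenvectors of its covariance."""
    X = np.asarray(X, dtype=float)
    if center:
        X = X - X.mean(axis=0)                # center the columns at the origin
    C = np.cov(X, rowvar=False)               # (D x D) covariance matrix
    evals, evecs = np.linalg.eigh(C)          # ascending eigenvalues, orthonormal eigenvectors
    top = evecs[:, np.argsort(-evals)[:d]]    # eigenvectors of the d largest eigenvalues
    return X @ top                            # (n x d) embedding
```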
Parameters
| Name | Type | Description | Default |
|------|------|-------------|---------|
| `X` | ArrayLike | (n x D) point cloud / design matrix of n points in D dimensions. | required |
| `d` | int | Dimension of the embedding to produce. | `2` |
| `center` | bool | Whether to center the data prior to computing the eigenvectors. | `True` |
| `coords` | bool | Whether to return the embedding coordinates (`True`) or the eigenvalues and eigenvectors (`False`). | `True` |
Returns
| Type | Description |
|------|-------------|
| Union[np.ndarray, tuple] | If `coords=True` (the default), the projection of `X` onto the eigenvectors of its covariance matrix with the `d` largest eigenvalues; otherwise, the eigenvalues and eigenvectors themselves. |
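A short usage sketch of the two return modes follows. Note the `(eigenvalues, eigenvectors)` tuple ordering shown for `coords=False` is an assumption based on the description above, not something the signature confirms.

```python
import numpy as np
from geomcover.linalg import pca

X = np.random.uniform(size=(50, 3))

Y = pca(X, d=2)                   # default: (50 x 2) embedding coordinates
out = pca(X, d=2, coords=False)   # eigen-decomposition instead of coordinates;
                                  # (eigenvalues, eigenvectors) ordering assumed
```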
Notes
PCA is dual to CMDS in the sense that the embedding produced by CMDS on the matrix of squared Euclidean distances of X minimizes the same reconstruction loss as PCA. In particular, when X comes from Euclidean space, the output of pca(…) will match the output of cmds(…) exactly, up to rotation and translation, as demonstrated in the example below.
Examples
```python
import numpy as np
from geomcover.linalg import pca, cmds

## Start with a random set of points in R^3 + its distance matrix
X = np.random.uniform(size=(50, 3))
D = np.linalg.norm(X - X[:, np.newaxis], axis=2)

## Note that CMDS takes as input *squared* distances
Y_pca = pca(X, d=2)
Y_mds = cmds(D**2, d=2)

## Get distance matrices for both embeddings
Y_pca_D = np.linalg.norm(Y_pca - Y_pca[:, np.newaxis], axis=2)
Y_mds_D = np.linalg.norm(Y_mds - Y_mds[:, np.newaxis], axis=2)

## Up to rotation and translation, the coordinates are identical
all_close = np.allclose(Y_pca_D, Y_mds_D)
print(f"PCA and MDS coord. distances identical? {all_close}")
```
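Since the two embeddings agree up to rotation and translation, their pairwise distance matrices coincide, so the final check should print `True` (up to floating-point tolerance).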