This is part of a (planned) multi-part tutorial series demonstrating both the utility and generality of the Mapper method. This vignette describes how one might use the Mapper method to compare shapes.

Motivation

One of the central preoccupations of computer vision is understanding how to recognize objects representing semantically ‘equivalent’ things. In particular, much work has been done on quantifying notions of equivalence between shapes in both robust and meaningful ways. Generally speaking, to formally ascertain whether certain shapes represent the ‘same’ object, it’s necessary to first equip these shapes with some kind of geometric structure. For example, if the object comes in the form of point cloud data, a typical approach is to equip these shapes with a distance metric and view them as metric spaces.

Recall that a metric space is a pair $$(d, X)$$ where $$X$$ is a set equipped with a function $$d: X \times X \to \mathbb{R}$$ which expresses the some measure of distance between points. To qualify as a true metric, for any points $$x, x', x'' \in X$$, $$d$$ must satisfy:

1. $$d(x, x') \geq 0, d(x, x') = 0$$ iff $$x = x'$$ (non-negativity, identity)
2. $$d(x, x') = d(x', x)$$ (symmetry)
3. $$d(x, x') + d(x', x'') \geq d(x, x'')$$ (triangle-equality)

The conditions satisfied by a metric express intuitive notions of distance in an element-wise manner. However, if one wishes to compare groups of elements, a convenient and natural extension to a metric is the Hausdorff distance.

Hausdorff distance

The Hausdorff distance measures how far apart subsets of a metric space are from each other; that is, given two non-empty subsets $$X, Y$$ of some common metric space $$(Z, d_Z)$$, it expresses the maximum distance a set is from the nearest point in the other set. The typical definition of the ‘directed’ Hausdorff distance is:

$$$\tag{1} d_{DH}(X, Y) = \sup_{x \in X}\inf_{y \in Y} d_Z(x, y)$$$

The above distance is not symmetric (hence the ‘directed’ designation) and extension to $$d_Z$$ does not result in a not a true metric. We can generalize $$d_{DH}$$ by taking the maximum directed Hausdorff distance between $$X$$ and $$Y$$ and $$Y$$ and $$X$$:

$$$\tag{2} d_{H}(X, Y) = \max \{ \, d_{DH}(X,Y), \,d_{DH}(Y,X) \, \}$$$

It can be shown that this symmetric Hausdorff distance is in fact a metric on the space of compact subsets of $$Z$$. This nice property of the Hausdorff distance enables a convenient and natural extension of any metric $$d_Z$$ to be defined between subsets of $$Z$$.

Intuitively, the Hausdorff distance provides very convenient distance measure for the mapper construction. Since each node in a mapper represents a subset of the data, if the space the mapper nodes were constructed on was a metric space, the Hausdorff distance can be used to embed the mapper into a metric space.

To illustrate this idea, I’m going to borrow some high-fidelity pose data sets from this source

target_dir <- tempdir()
target_file <- file.path(target_dir, "elephant-poses.tgz")
pose_db <- "http://people.csail.mit.edu/sumner/research/deftransfer/data"
if (!file.exists(target_file)){
}
library("readobj")
rglwidget()