The Mapper package provides a configurable implementation of the Mapper framework (see 1). The package includes:

• Efficient implementation of the Mapper framework

• Multiple options for visualizing and interacting with mappers

• Practical defaults for those unfamiliar with Mapper

• Expressive parameterization interface through method chaining

• Portable R6 generators for Mappers components

The package was designed to make modifying or extending the Mapper method simple and efficient, without limiting its generality.

### Getting started

#### Installation

Install the Mapper package from github as follows:

require("devtools")
devtools::install_github("peekxc/mapper")

A CRAN release is planned for the near future.

#### Creating a complex

Mapper takes as input a point cloud $$X$$ and a reference map $$f : X \to Z$$, and returns a topological summary of $$X$$ expressed via a cover equipped on the codomain of the map. Here’s an example where points are sampled from an ‘eight-curve’ in $$\mathbb{R}^2$$:

$g(t) = [\cos(t), \sin(t)\cos(t)],\; t \in \Big(-\frac{1}{2}\pi, \frac{3}{2}\pi\Big)$ In the example below, $$X$$ is created from equally spaced samples over $$t$$, and the map chosen is simply the $$x$$-coordinate of the shape, i.e. $$f(X) = Z = x_1$$.

t <- seq(-0.5*pi, (3/2)*pi, length.out = 100)
eight <- cbind(x1=cos(t), x2=sin(t)*cos(t))
f_x <- matrix(cos(t))

## Use a rainbow palette
rbw_col <- rainbow(length(f_x), start = 0, end = 4/6)
col <- rbw_col[cut(f_x, length(f_x), labels = FALSE)]

## View the data along with the mapping
layout(matrix(1:2, nrow = 1))
plot(eight, pch = 20, col = col, main = expression(X %subset% R^2))
stripchart(f_x, pch = "|", main = expression(f(X) %subset% R))
points(cbind(f_x, 1), pch = "|", col = col, cex = 2)

Below is one possible way to construct a mapper.

m <- MapperRef$new(X = eight)$
use_cover(filter_values = f_x,
typename = "restrained rectangular",
number_intervals = 10, percent_overlap = 50)$use_distance_measure(measure = "euclidean")$
use_clustering_algorithm(cl = "single")$compute_k_skeleton(k = 1L) print(m) ## Mapper construction for 100 objects ## Cover: (typename = Restrained Rectangular, number intervals = [10], overlap = [50]) Mapper construction for 100 objects Cover: (typename = Restrained Rectangular, number intervals = [10], overlap = [50]) There are multiple options one may use to visualize mapper. The easiest is to export the $$1$$-skeleton to a mature graph plotting library, such as igraph: am <- m$simplicial_complex$as_adjacency_matrix() g <- igraph::graph_from_adjacency_matrix(am, mode = "undirected", add.colnames = NA) avg_filter_val <- sapply(m$vertices, function(idx) mean(f_x[idx,]))
plot(g, layout=igraph::layout_with_fr(g, niter = 5500),
vertex.size = 18L, vertex.color = bin_color(avg_filter_val), label.cex = 0.3)

For other visualization options, see the section below.

#### Customizing Mapper

Almost any component of the Mapper method can be customized.

Want to change the metric? Pass the name of any proximity measure used in the proxy package.

## See ?proxy::pr_DB for more details.
m$use_distance_measure("manhattan") ## This is stored as m$measure

Prefer a different linkage criteria to cluster over? Any of the criteria used by hclust can be swapped in.

m$use_clustering_algorithm(cl = "average", num_bins = 10L) Or, just replace the clustering algorithm entirely ## If using a custom metric, just compute the distance matrix m$use_clustering_algorithm(cl = function(X, idx, num_bins=10){
dist_x <- dist(X[idx,], method = self$measure) hc <- hclust(dist_x, method = "average") cutoff_first_bin(hc, diam = max(dist_x), num_bins) }) If you prefer a different covering, just assign a valid object inheriting from CoverRef. m$use_cover(filter_values = f_x,
typename="fixed rectangular",
number_intervals = 10L, percent_overlap=50)

A list of available covering methods, their correspondings parameters, and their generators can be printed as follows:

covers_available()
## Typename:                    Generator:                         Parameters:
##  fixed rectangular            FixedRectangularCover              number_intervals, percent_overlap
##  restrained rectangular       RestrainedRectangularCover         number_intervals, percent_overlap
##  ball                         BallCover                          epsilon

Alternatively, you can create your own cover. See the article on how to make custom cover.

The cover is composed of subsets of the data indexed by an index set

print(m$cover$index_set)
##  [1] "(1)"  "(2)"  "(3)"  "(4)"  "(5)"  "(6)"  "(7)"  "(8)"  "(9)"  "(10)"

These indices can be used as keys into the collection of sets comprising the cover.

all(m$cover$index_set %in% names(m$cover$level_sets))
## [1] TRUE

This can be useful if you want to only update only part of the complex for some reason, e.g. re-cluster part of the data with different clustering parameters. For example, to re-cluster the data associated with the preimages the first two indices, you can do:

clustering_params <- list(num_bins=5L)
m$compute_vertices(which_levels = m$cover$index_set[1:2], clustering_params) The same rule applies to computing the edges. idx_pairs <- t(apply(combn(5, 2), 2, function(ls_idx) { m$cover$index_set[ls_idx] })) m$compute_edges(which_level_pairs = idx_pairs)

If the construction has too many edges, you could require the intersection size be above a certain threshold (default is 1). This isn’t technically part of Mappers definition, but could be useful in practical applications.

m$compute_edges(min_weight = 10) The simplicial complex is available via the $simplicial_complex member, and the points in contained in each vertex is available via the $vertices member. m$simplicial_complex
## Simplex Tree with (15, 15) (0, 1)-simplices
head(m$vertices, 2) ##$0
##  [1] 63 64 65 66 67 68 69 70 71 72 73 74 75 76
## attr(,"level_set")
## [1] 1
##
## $1 ## [1] 59 60 61 62 63 64 65 66 67 68 ## attr(,"level_set") ## [1] 2 The complex is stored in a Simplex Tree (see 2), and the vertices are stored as a list. Each vertex also has an attribute “level_set” denoting the index in the index set of the cover its associated with. The $$1$$-skeleton can be exported to any of the usual graph-type data structures. m$simplicial_complex$as_adjacency_matrix() m$simplicial_complex$as_adjacency_list() m$simplicial_complex$as_edge_list() ### Visualizing the mapper To get a quick overview of what the graph looks like, you can convert to an igraph object. By default, the vertices are sized logarithmically according to how many points they have in them and colored based on the mean value of the points $$f$$ values. Each vertex is given a label of the form “x:y” where ‘x’ denotes the vertex id, and ‘y’ denotes the number of points in the vertex. plot(m$as_igraph())

More more interactive plotting methods, the mapper can be converted to a grapher object.

## Visualize interactively w/ grapher
library("grapher")
m\$as_grapher(construct_widget = TRUE) %>% center()

## References

1. Singh, Gurjeet, Facundo Mémoli, and Gunnar E. Carlsson. “Topological methods for the analysis of high dimensional data sets and 3d object recognition.” SPBG. 2007.
2. Boissonnat, Jean-Daniel, and Clément Maria. “The simplex tree: An efficient data structure for general simplicial complexes.” Algorithmica 70.3 (2014): 406-427.