Uses the heuristic described in Section 3.1 of the original Mapper paper to cut a given hierarchical tree to produce a partitioning of the data.

cutoff_first_bin(hcl, num_bins, check_skew = TRUE)

Arguments

hcl

Hierarchical clustering in the form of an 'hclust' object.

num_bins

Number of bins in the histogram used to determine the cutoff.

check_skew

Whether to check if the distribution is left-skewed. See details.

Value

The cut-off value to use with cutree.

Details

This method implements a cutting heuristic to determine the cut height in the given hierarchical tree hcl. The cut value is chosen as the lowest break point corresponding to the first empty bin of a histogram of the linkage distances. The motivation for the heuristic is that, in 'nice' clustering situations, it is often observed empirically that the majority of linkage distances, which represent intra-cluster distances, are relatively smoothly distributed, whereas the inter-cluster distances are sparse. Intuitively, such a distribution may be thought of as being right-skewed, and the first empty interval in a histogram of linkage distances may be a decent spot to cut the hierarchy.

If check_skew is TRUE (default), the linkage distances are first checked to confirm that they are indeed right-skewed, and only then is the heuristic used. If the distribution is left-skewed, the assumption behind the heuristic does not hold, and the trivial clustering is returned instead.
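
The heuristic can be sketched in base R roughly as follows. This is an illustration under stated assumptions, not the package's source: the name first_empty_bin_cutoff is hypothetical, the bins are assumed to be equal-width over the range of the linkage distances, the skew check is assumed to be a moment-based sample skewness, and the "trivial clustering" is assumed to mean a cut at the largest linkage distance (a single cluster).

```r
# Illustrative sketch only; `first_empty_bin_cutoff` is a hypothetical name
# and does not reproduce the package's implementation.
first_empty_bin_cutoff <- function(hcl, num_bins = 10L, check_skew = TRUE) {
  heights <- hcl$height    # linkage distances of the successive merges
  trivial <- max(heights)  # assumption: cutting here yields a single cluster

  if (check_skew) {
    # Moment-based sample skewness (an assumed measure of skew).
    centered <- heights - mean(heights)
    skew <- mean(centered^3) / mean(centered^2)^1.5
    # Not right-skewed (or undefined): fall back to the trivial cut.
    if (!is.finite(skew) || skew <= 0) return(trivial)
  }

  # Equal-width bins spanning the range of the linkage distances.
  breaks <- seq(min(heights), max(heights), length.out = num_bins + 1L)
  bin_id <- findInterval(heights, breaks, rightmost.closed = TRUE)
  counts <- tabulate(bin_id, nbins = num_bins)

  empty <- which(counts == 0L)
  if (length(empty) == 0L) return(trivial)  # no gap in the histogram
  breaks[empty[1L]]                         # lower break of the first empty bin
}

# Two well-separated groups on a line, clustered with single linkage.
x <- c(0, 1, 2, 3, 100, 101, 102, 103)
hcl <- hclust(dist(x), method = "single")
cutoff <- first_empty_bin_cutoff(hcl, num_bins = 10L)
clusters <- cutree(hcl, h = cutoff)
```

In this toy example the six within-group merges happen at distance 1 and the final merge at distance 97, so the first empty bin lies between them and cutree separates the two groups. With real data the result depends on num_bins: too few bins may leave no empty bin at all, while too many may place the cut inside a cluster.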