• Matt Piekenbrock

  • Graduate Research Assistant
  • M.S. in Computer Science 2018 (PhD intending)
  • B.S. in Computer Science 2015 (minor in statistics)

Summary - Research Interests

I'm currently a graduate research assistant majoring in Computer Science, with a focus (and minor) in Statistics and Statistical Modeling. My graduate research focus is in Machine Learning (ML) and Artificial Intelligence (AI), and my interests lie in exploring the intersections between unsupervised learning, statistical learning theory, and empirical analysis. I also enjoy building software for scientific computing and reproducible research. Topic areas that interest me include clustering, dimensionality reduction, topology theory, and density estimation. I have supplemental research interests, background knowledge, or experience in random graph modeling, Bayesian statistics, computational geometry, reinforcement learning (such as adversarial learning!), and high performance computing. A few selected research projects are listed below.


Focused Research Areas

I first started doing research part time in 2013 at the Air Force Institute of Technology with a heavily multidisciplinary team called the Low Orbitals Radar and Electromagnetism (LORE) group. There I either 1) did research for an independent project under the supervision of Dr. Andrew Terzuoli or 2) supported graduate students' research efforts in the group. I worked actively with the group until 2016, after which I maintained an advisory role until early 2017.

In 2015, I started working for the Machine Learning and Complex Systems Lab as part of a research-based independent study, shortly after taking an introductory ML/data analysis course taught by Derek Doran. I received a graduate research assistant position in the same lab soon afterward, working towards an M.S. in Computer Science.

Since late 2016, I've been involved in a small group associated with the Air Force Research Laboratory that does applied topological data analysis (TDA) with the Mapper framework. I originally worked for the group on a very part-time basis, but since Fall 2018 I have been doing research there full time.

My computational experience is diverse. In 2015, I started using the R Project for Statistical Computing for statistical modeling, and I continue to prefer R for research. In my undergraduate years, I used both C++ (primarily C++11) and ANSI C (C89/C90) extensively for a myriad of projects (see below). Part of my undergraduate research delved into a project involving computational geometry, which required a final implementation written in Compute Unified Device Architecture (CUDA) and OpenCL. I'm also proficient in both Python and Java.

Research Highlights

Efficient Multiscale Simplicial Complex Generation for Mapper

(ongoing research effort)

The primary result of the Mapper framework is the geometric realization of a simplicial complex, depicting topological relationships and structures suitable for visualizing, analyzing, and comparing high dimensional data. As an unsupervised tool that may be used for exploring or modeling heterogeneous types of data, Mapper naturally relies on a number of parameters which explicitly control the quality of the resulting construction; one such critical parameter controls the entire relational component of the output complex. In practice, there is little guidance on what values may provide "better" or more "stable" sets of simplices. In this effort, we provide a new algorithm that enables efficient computation of successive Mapper realizations with respect to this crucial parameter. Our results not only enhance the exploratory/confirmatory aspect of Mapper, but also give tractability to recent theoretical extensions to Mapper related to persistence and stability.

Automating Point of Interest Discovery in Geospatial Contexts

(ongoing research effort)

With the rapid development and widespread deployment of sensors dedicated to location acquisition, new types of models have emerged to predict macroscopic patterns that manifest in large data sets representing "significant" group behavior. Partially due to the immense scale of geospatial data, current approaches to discovering these macroscopic patterns are primarily driven by inherently heuristic detection methods. Although useful in practice, the inductive bias adopted by the detection scheme is generally unstated or simply unknown. In this research effort, we describe a semi-supervised framework for automated point of interest discovery inspired by recent theoretical advances in efficient non-parametric density level set estimation techniques. We outline the flexibility and utility of the approach through numerous examples, and give a systematic framework for incorporating semi-supervised information while retaining finite-sample guarantees.

Bringing High Performance Density-based Clustering to R

Density-based clustering techniques have become extremely popular in the past decade. It's often conjectured that the success of these methods is due to their ability to identify "natural groups" in data. These groups are often non-convex (in terms of shape), deviating from the typical premise of 'minimal variance' that underlies parametric, model-based approaches, and often appear in very large data sets. As the era of "Big Data" continues to rise in popularity, having access to scalable, easy-to-use implementations of these density-based methods is paramount. In this research effort, we provide fast, state-of-the-art density-based algorithms in the form of an open-source package in R. We also provide several related density-based clustering tools to help make state-of-the-art density-based clustering accessible to people with large, computationally difficult problems.

Massive Parallel Iterative Closest Point

The Iterative Closest Point (ICP) problem is a well-studied problem that seeks to align a given query point cloud to a fixed reference point cloud via pairwise distance minimization. Intuitively, the "brute-force" approach is to calculate the pairwise distance from every point in the query set to every point in the reference set, resulting in quadratic runtime complexity. Alternatively, many spatial indexing data structures utilizing branch-and-bound (B&B) properties have been proposed as a means of reducing the algorithmic complexity of the ICP problem. While these structures are certainly useful, many were primarily developed for serial applications: it is well known that direct conversion to their parallel equivalents often results in slower runtime performance than GPU-employed brute-force approaches, due to the frequent suboptimal memory access patterns and conditional computations these spatial indexing structures often produce. In this application-motivated effort, we propose a novel two-step method which exposes the intrinsic parallelism of the ICP problem. Our solution involves an O(log n) approximate search, followed by a fast vectorized search we call the Delaunay Traversal, which we show empirically finishes in O(k) time on average, where k << n. We demonstrate the superiority of our method compared to the traditional B&B and brute-force implementations using a variety of heterogeneous benchmark data sets. We also show the usefulness of our method in the context of Automated Aerial Refueling by improving the runtime of the well-known ICP algorithm.
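For readers unfamiliar with the baseline, the brute-force correspondence step mentioned above can be sketched roughly as follows. This is only a minimal serial illustration in Python, not the method proposed in this project and not the Delaunay Traversal; the function name `brute_force_nearest` and the toy point sets are hypothetical, chosen purely for illustration.

```python
import math

def brute_force_nearest(query, reference):
    """For each query point, find the closest reference point.

    Compares every query point against every reference point, so the
    cost grows as len(query) * len(reference) (quadratic behaviour).
    Returns a list of (reference_index, distance) pairs.
    """
    matches = []
    for q in query:
        best_idx, best_dist = -1, math.inf
        for i, r in enumerate(reference):
            d = math.dist(q, r)  # Euclidean distance (Python 3.8+)
            if d < best_dist:
                best_idx, best_dist = i, d
        matches.append((best_idx, best_dist))
    return matches

# Hypothetical toy point clouds, for illustration only.
reference = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
query = [(0.9, 0.1), (0.1, 0.8)]
matches = brute_force_nearest(query, reference)
```

A full ICP iteration would then estimate a rigid transform from these correspondences and repeat; that part is omitted here. The two nested loops are independent across query points, which is why this baseline maps so readily onto a GPU.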


GPA: 3.83
School                   Degree                                                         Graduation Year
Wright State University  Masters of Science in Computer Science                         2018
Wright State University  Bachelor of Science in Computer Science (Minor in Statistics)
Courses Taken*
CEG 7900: Network Science
CS 7830: Machine Learning
CS 3250: Computational Tools and Techniques for Data Analysis
STT 7020: Applied Stochastic Processes
CS 7230: Information Theory
CS 4850: Foundations of Artificial Intelligence
STT 3600/3610: Applied Statistics I & II
STT 4610: Theoretical Statistics I
CS 7200: Algorithm Design and Analysis
*(relevant to field of interest)


Project Reports / Code Samples / Package Contributions

Awards, Extra Curricular, Misc.

Outreach, Volunteer Work