• Matt Piekenbrock

  • Graduate Research Assistant
  • M.S. in Computer Science 2018 (PhD intending)
  • B.S. in Computer Science 2015 (minor in statistics)

Summary - Research Interests

I'm currently a research associate with an M.S. in Computer Science, with a focus (and minor) in Statistics and Statistical Modeling. My graduate research focused on Machine Learning (ML) and Artificial Intelligence (AI); my interests lie in exploring the intersections between unsupervised learning, statistical learning theory, and empirical analysis. I also enjoy building and contributing software for scientific computing and reproducible research. Topic areas that interest me include clustering, dimensionality reduction, topology theory, manifold learning, and density estimation. I have supplemental research interests, background knowledge, or experience in random graph modeling, Bayesian statistics, computational geometry, reinforcement learning (such as adversarial learning!), and high performance computing. A few selected research projects are listed below.


Focused Research Areas

I first started doing research part time in 2013 at the Air Force Institute of Technology with a heavily multidisciplinary team called the Low Orbitals Radar and Electromagnetism (LORE) group. There I either 1) did research for an independent project under the supervision of Dr. Andrew Terzuoli or 2) supported graduate students' research efforts in the group. I worked actively with the group until 2016, after which I maintained an advisory role until early 2017.

In 2015, I started working for the Machine Learning and Complex Systems Lab as part of a research-based independent study shortly after taking an introductory ML/data analysis course taught by Derek Doran. I received a graduate research assistant position in the same lab shortly after, working towards an M.S. in Computer Science.

Since late 2016, I've been involved in a small group associated with the Air Force Research Laboratory that does applied topological data analysis (TDA) with the Mapper framework. I originally worked for the group on a very part-time basis, but since Fall 2018 I have been doing research there full time.

My computational experience is diverse. In 2015, I started using the R Project for Statistical Computing for statistical modeling, and I continue to prefer R for research. In my undergraduate years, I used both C++ (primarily C++11) and ANSI C (C89/C90) extensively for a myriad of projects (see below). Part of my undergraduate research delved into a project involving computational geometry, which required a final implementation written in Compute Unified Device Architecture (CUDA) and OpenCL. I'm also proficient in both Python and Java.

Research Highlights

Efficient Multiscale Simplicial Complex Generation for Mapper

(ongoing research effort)

The primary result of the Mapper framework is the geometric realization of a simplicial complex, depicting topological relationships and structures suitable for visualizing, analyzing, and comparing high dimensional data. As an unsupervised tool that may be used for exploring or modeling heterogeneous types of data, Mapper naturally relies on a number of parameters which explicitly control the quality of the resulting construction; one such critical parameter controls the entire relational component of the output complex. In practice, there is little guidance on what values may provide "better" or more "stable" sets of simplices. In this effort, we provide a new algorithm that enables efficient computation of successive Mapper realizations with respect to this crucial parameter. Our results not only enhance the exploratory/confirmatory aspect of Mapper, but also give tractability to recent theoretical extensions to Mapper related to persistence and stability.

Automating Point of Interest Discovery in Geospatial Contexts

(ongoing research effort)

With the rapid development and widespread deployment of sensors dedicated to location-acquisition, new types of models have emerged to predict macroscopic patterns that manifest in large data sets representing "significant" group behavior. Partially due to the immense scale of geospatial data, current approaches for discovering these macroscopic patterns are primarily driven by inherently heuristic detection methods. Although useful in practice, the inductive bias adopted by such mainstream detection schemes is often unstated or simply unknown. Inspired by recent theoretical advances in efficient non-parametric density level set estimation techniques, in this research effort we describe a semi-supervised framework for automating point of interest discovery in geospatial contexts. We outline the flexibility and utility of our approach through numerous examples, and give a systematic framework for incorporating semi-supervised information while retaining finite-sample estimation guarantees.

Bringing High Performance Density-based Clustering to R

Density-based clustering techniques have become extremely popular in the past decade. It's often conjectured that the success of these methods is due to their ability to identify "natural groups" in data. These groups are often non-convex (in terms of shape), deviating from the typical premise of 'minimal variance' that underlies parametric, model-based approaches, and often appear in very large data sets. As the era of "Big Data" continues to rise in popularity, having access to scalable, easy-to-use implementations of these density-based methods is paramount. In this research effort, we provide fast, state-of-the-art density-based algorithms in the form of an open-source package in R. We also provide several related density-based clustering tools to help make state-of-the-art density-based clustering accessible to people with large, computationally difficult problems.
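To make the idea of density-based "natural groups" concrete, here is a minimal sketch of DBSCAN, one well-known density-based clustering algorithm. It is written in plain Python for illustration only: it is not the R package described above, and the choice of DBSCAN, the function name dbscan, and the parameters eps and min_pts are assumptions made for this example. The neighbor search is a naive O(n^2) scan, which is exactly the cost that fast implementations avoid with spatial indexing.

```python
from math import dist


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise.

    A point is a "core" point if at least min_pts points (itself included)
    lie within distance eps of it. Clusters grow outward from core points.
    """
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # Naive linear scan; illustrative only.
        return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)  # j is a core point, so keep expanding
    return labels


# Two tight groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)
```

The two tight groups receive separate cluster ids and the isolated point is labeled as noise; no assumption about cluster shape or variance is needed, only a local density threshold.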

Towards Autonomous Aerial Refueling: Massive Parallel Iterative Closest Point

The Iterative Closest Point (ICP) problem is a well-studied problem that seeks to align a given query point cloud to a fixed reference point cloud. The ICP problem is computationally dominated by its first phase, a pairwise distance minimization. The "brute-force" approach, an embarrassingly parallel problem amenable to GPU acceleration, involves calculating the pairwise distance from every point in the query set to every point in the reference set. This, however, still requires linear runtime per thread, rendering the trivial solution unsuitable for, e.g., real-time applications. Alternative spatial indexing data structures utilizing branch-and-bound (B&B) properties have been proposed as a means of reducing the algorithmic complexity of the ICP problem. However, they were originally developed for serial applications: it is well known that direct conversion to their parallel equivalents often results in slower runtime performance than GPU-employed brute-force approaches, due to frequent suboptimal memory access patterns and conditional computations. In this application-motivated effort, we propose a novel two-step method which exposes the intrinsic parallelism of the ICP problem, yet retains a number of the B&B properties. Our solution involves an O(log n) approximate search, followed by a fast vectorized search we call the Delaunay Traversal, which we show empirically finishes in O(k) time on average, where k << n, and which generally exhibits extremely small growth factors. We demonstrate the superiority of our method compared to the traditional B&B and brute-force implementations using a variety of benchmark data sets, and demonstrate its usefulness in the context of Autonomous Aerial Refueling.
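For readers unfamiliar with the correspondence phase, the brute-force baseline described above can be sketched in a few lines. This is a serial, plain-Python illustration only, not the GPU implementation or the Delaunay Traversal method from this work; the function name brute_force_correspondences and the sample points are hypothetical.

```python
from math import dist


def brute_force_correspondences(query, reference):
    """For each query point, return the index of its closest reference point.

    Each query point scans every reference point, so the work per query
    point is linear in the size of the reference set. The outer loop is
    embarrassingly parallel: on a GPU, each iteration could be one thread.
    """
    matches = []
    for q in query:
        best_index = min(range(len(reference)), key=lambda i: dist(q, reference[i]))
        matches.append(best_index)
    return matches


reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
query = [(0.9, 0.1, 0.0), (0.1, 1.8, 0.0), (0.1, -0.1, 0.0)]
matches = brute_force_correspondences(query, reference)
print(matches)  # index of the nearest reference point for each query point
```

Spatial indexing structures replace the inner linear scan with a sublinear search, which is where the branch-and-bound trade-offs discussed above come in.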


GPA: 3.88
School                    Degree                                                          Graduation Year
Wright State University   Master of Science in Computer Science                           2018
Wright State University   Bachelor of Science in Computer Science (Minor in Statistics)
Courses Taken*
CEG 7900: Network Science
CS 7830: Machine Learning
CS 3250: Computational Tools and Techniques for Data Analysis
STT 7020: Applied Stochastic Processes
CS 7230: Information Theory
CS 4850: Foundations of Artificial Intelligence
STT 3600/3610: Applied Statistics I & II
STT 4610: Theoretical Statistics I
CS 7200: Algorithm Design and Analysis
*(relevant to field of interest)


Project Reports / Code Samples / Package Contributions

Awards, Extra Curricular, Misc.

Outreach, Volunteer Work