Traditional paper CV available here

I'm currently a graduate research assistant majoring in Computer Science, with a focus (and minor) in Statistics and Statistical Modeling. My graduate research focus is in Machine Learning (ML) and Artificial Intelligence (AI), and my interests lie in exploring the intersections between unsupervised learning, statistical learning theory, and empirical analysis. I also enjoy building software for scientific computing and reproducible research. Topic areas that interest me include clustering, dimensionality reduction, topology theory, and density estimation. I have supplemental research interests, background knowledge, or experience in random graph modeling, Bayesian statistics, computational geometry, reinforcement learning (such as adversarial learning!), and high-performance computing. A few selected research projects are listed below.

- Unsupervised and Semi-supervised Learning
- Density-based clustering
- Techniques for Topological Data Analysis
- Manifold Learning
- Statistical Learning Theory

I first started doing research part time in 2013 at the Air Force Institute of Technology with a heavily multidisciplinary team called the Low Orbitals Radar and Electromagnetism (LORE) group, where I either 1) conducted research for an independent project under the supervision of Dr. Andrew Terzuoli or 2) supported graduate students' research efforts in the group. I worked actively with the group until 2016, after which I maintained an advisory role until early 2017.

In 2015, I started working for the Machine Learning and Complex Systems Lab as part of a research-based independent study shortly after taking an introductory ML/data analysis course taught by Derek Doran. I received a graduate research assistant position in the same lab shortly after, working towards an M.S. in Computer Science.

Since late 2016, I've been involved in a small group associated with the Air Force Research Laboratory that does applied topological data analysis (TDA) with the Mapper framework. I originally worked for the group on a very part-time basis, but since Fall 2018 I have been doing research there full time.

My computational experience is diverse. I started using the R Project for Statistical Computing for statistical modeling in 2015, and I continue to prefer R for research. In my undergraduate years, I used both C++ (primarily C++11) and ANSI C (C89/C90) extensively for a myriad of projects (see below). Part of my undergraduate research delved into a project involving computational geometry, which required a final implementation written in Compute Unified Device Architecture (CUDA) and OpenCL. I'm also proficient in both Python and Java.

The primary result of the Mapper framework is the geometric realization of a simplicial complex, depicting topological relationships and structures suitable for visualizing, analyzing, and comparing high dimensional data. As an unsupervised tool that may be used for exploring or modeling heterogeneous types of data, Mapper naturally relies on a number of parameters which explicitly control the quality of the resulting construction; one such critical parameter controls the entire relational component of the output complex. In practice, there is little guidance on what values may provide "better" or more "stable" sets of simplices. In this effort, we provide a new algorithm that enables efficient computation of successive Mapper realizations with respect to this crucial parameter. Our results not only enhance the exploratory/confirmatory aspect of Mapper, but also give tractability to recent theoretical extensions to Mapper related to persistence and stability.
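For readers unfamiliar with the construction, the following is a minimal Python sketch of a one-dimensional Mapper. It is not the algorithm proposed above and not the API of the Mapper R package. It assumes a real-valued filter, a cover by equal-length overlapping intervals, and single-linkage clustering at a fixed distance `eps`; the names `interval_cover`, `single_linkage`, and `mapper_graph` are hypothetical. It builds only the vertices and edges (the 1-skeleton) of the complex. The `overlap` argument is used here as one example of a parameter that decides which vertices become connected; the abstract does not name its parameter, so that correspondence is an assumption.

```python
import math
from itertools import combinations

def interval_cover(lo, hi, n_intervals, overlap):
    """Cover [lo, hi] with equal-length intervals; `overlap` in [0, 1) is the
    fraction of an interval's length shared with its neighbor."""
    length = (hi - lo) / (n_intervals - (n_intervals - 1) * overlap)
    step = length * (1.0 - overlap)
    return [(lo + i * step, lo + i * step + length) for i in range(n_intervals)]

def single_linkage(points, idx, eps):
    """Connected components of the eps-neighborhood graph restricted to idx."""
    unseen = set(idx)
    clusters = []
    while unseen:
        seed = unseen.pop()
        component, frontier = {seed}, [seed]
        while frontier:
            p = frontier.pop()
            near = {q for q in unseen if math.dist(points[p], points[q]) <= eps}
            unseen -= near
            component |= near
            frontier.extend(near)
        clusters.append(frozenset(component))
    return clusters

def mapper_graph(points, filter_values, n_intervals, overlap, eps):
    """Return (vertices, edges): each vertex is a cluster of point indices and
    an edge joins two clusters that share at least one point."""
    lo, hi = min(filter_values), max(filter_values)
    tol = 1e-9  # guards against floating-point error at interval endpoints
    vertices = []
    for a, b in interval_cover(lo, hi, n_intervals, overlap):
        preimage = [i for i, f in enumerate(filter_values) if a - tol <= f <= b + tol]
        vertices.extend(single_linkage(points, preimage, eps))
    edges = {(u, v) for u, v in combinations(range(len(vertices)), 2)
             if vertices[u] & vertices[v]}
    return vertices, edges

# Ten evenly spaced points on a line, filtered by their coordinate.
points = [(float(i),) for i in range(10)]
vertices, edges = mapper_graph(points, [p[0] for p in points],
                               n_intervals=3, overlap=0.5, eps=1.0)
```

In this sketch, each new value of `overlap` or `eps` triggers a full recomputation of every cluster and every intersection, which illustrates why computing many successive realizations naively is costly.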

With the rapid development and widespread deployment of sensors dedicated to location acquisition, new types of models have emerged to predict macroscopic patterns that manifest in large data sets representing "significant" group behavior. Partially due to the immense scale of geospatial data, current approaches to discovering these macroscopic patterns are driven primarily by inherently heuristic detection methods. Although useful in practice, the inductive bias adopted by the detection scheme is generally unstated or simply unknown. In this research effort, we describe a semi-supervised framework for automated point of interest discovery inspired by recent theoretical advances in efficient non-parametric density level set estimation techniques. We outline the flexibility and utility of the approach through numerous examples, and give a systematic framework for incorporating semi-supervised information while retaining finite-sample guarantees.

Density-based clustering techniques have become extremely popular in the past decade. It's often conjectured that the
reason for the success of these methods is their ability to identify "natural groups" in data. These groups
are often non-convex (in terms of shape), deviating from the typical premise of 'minimal variance' that underlies
parametric, model-based approaches, and often appear in very large data sets. As the era of "Big Data" continues to rise in popularity,
it seems that having access to scalable, easy-to-use
implementations of these density-based methods is paramount. In this research
effort, we provide fast, state-of-the-art density-based algorithms in the form of an open-source package in R.
We also provide several related density-based clustering tools to help make state-of-the-art density-based clustering accessible
to people with large, computationally difficult problems.
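As a concrete illustration of what a density-based method computes, here is a minimal pure-Python sketch of the classic DBSCAN procedure. It is not the implementation in the R package described above: it uses a quadratic-time neighbor search with no spatial index, and the function name `dbscan` and its argument names are chosen for this example only. A point counts as a core point when at least `min_pts` points (itself included) lie within distance `eps`.

```python
import math

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    def neighbors(i):
        # Brute-force range query; includes the point itself.
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # only core points expand the cluster
                queue.extend(nbrs)
    return labels

# Two small square groups and one isolated point.
data = [(0, 0), (0, 1), (1, 0), (1, 1),
        (10, 10), (10, 11), (11, 10), (11, 11),
        (50, 50)]
labels = dbscan(data, eps=1.5, min_pts=3)
```

Because clusters grow by chaining dense neighborhoods rather than by minimizing variance around a center, the same procedure can follow non-convex shapes.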

The Iterative Closest Point (ICP) problem is a well-studied problem that seeks to align a given query point cloud
to a fixed reference point cloud via pairwise distance minimization. Intuitively, the "brute-force" approach
is to calculate the pairwise distance from every point in the query set to every point in the reference set, resulting in
quadratic runtime complexity.
Alternatively, many spatial indexing data structures utilizing branch-and-bound (B&B) properties
have been proposed as a means of reducing the algorithmic complexity of the ICP problem.
While these structures are certainly useful, many were primarily developed for serial applications:
it is well known that direct conversion to their parallel equivalents often results in *slower* runtime performance than
GPU-employed brute-force approaches due to the frequent suboptimal memory access patterns and conditional computations these
spatial indexing structures often produce.
In this application-motivated effort,
we propose a novel two-step method which exposes the intrinsic parallelism of the ICP problem. Our solution involves an O(log n) approximate search,
followed by a fast vectorized search we call the Delaunay Traversal, which we show empirically finishes in O(k) time on average, where k << n.
We demonstrate the superiority of our method compared to the traditional B&B and brute-force implementations using
a variety of heterogeneous, benchmark data sets. We also show the usefulness of our method in the context of
Automated Aerial Refueling by improving the runtime of the well known ICP algorithm.
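The brute-force baseline described above can be sketched in a few lines of Python. This is only the quadratic correspondence search, paired with a deliberately simplified alignment loop; it is not the two-step method or the Delaunay Traversal. As a stated simplification, the loop estimates a translation only, whereas ICP in general estimates a rigid transform (rotation and translation). The function names are hypothetical.

```python
import math

def brute_force_correspondences(query, reference):
    """For each query point, return the index of its nearest reference point.
    Evaluates all len(query) * len(reference) pairwise distances."""
    return [min(range(len(reference)), key=lambda j: math.dist(q, reference[j]))
            for q in query]

def icp_translation_only(query, reference, n_iter=10):
    """Simplified ICP loop: match points, then shift the query by the mean
    residual. Rotation is intentionally ignored in this sketch."""
    dim = len(reference[0])
    moved = [tuple(q) for q in query]
    for _ in range(n_iter):
        match = brute_force_correspondences(moved, reference)
        shift = [sum(reference[j][d] - p[d] for p, j in zip(moved, match)) / len(moved)
                 for d in range(dim)]
        moved = [tuple(p[d] + shift[d] for d in range(dim)) for p in moved]
    return moved

reference = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (4.0, 4.0)]
query = [(x + 0.5, y - 0.3) for x, y in reference]  # reference shifted by a small offset
aligned = icp_translation_only(query, reference)
```

The correspondence search is repeated on every iteration, so its cost dominates the loop; that repeated search is the step that spatial indexing structures and GPU brute-force implementations both try to speed up.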

School | Degree | Graduation Year |
---|---|---|
Wright State University | Master of Science in Computer Science | 2018 |
Wright State University | Bachelor of Science in Computer Science, Minor in Statistics | 2015 |

CEG 7900: Network Science | CS 7830: Machine Learning | CS 3250: Computational Tools and Techniques for Data Analysis |

STT 7020: Applied Stochastic Processes | CS 7230: Information Theory | CS 4850: Foundations of Artificial Intelligence |

STT 3600/3610: Applied Statistics I & II | STT 4610: Theoretical Statistics I | CS 7200: Algorithm Design and Analysis |

- M. Piekenbrock and D. Doran. Enabling Multi-Scale Simplicial Complex Generation for Mapper. SIAM Journal on Applied Algebra and Geometry, 2018 (intending). [Preliminary draft available]
- [Presentation material available on request]
- M. Piekenbrock, T. Ricks, and S. Arnold. Using surrogate modeling to interpret PSP linkage relationships. NASA tech report, 2019 (intending).
- Density-based clustering (R/Rcpp)
- Nonparametric Geospatial Point of Interest detection (R/C++)
- Spatio-temporal Social Network Model for spatial data (R)
- Computer Vision Project involving a parallelized Iterative Closest Point (ICP) algorithm (C++/CUDA)
- Parallelization of existing atmospheric absorption routines (MATLAB MEX/OpenCL)
- Modeling web navigation patterns using hierarchical Markov Models (R/MATLAB)
- Web interface to viewing 3D models (WebGL/JavaScript)
- J. Robinson, M. Piekenbrock, L. Burchett, et al. Parallelized Iterative Closest Point for Autonomous Aerial Refueling. In International Symposium on Visual Computing (pp. 593-602). Springer International Publishing. (2016, December)
- L. Burchett, J. Robinson, M. Piekenbrock, et al. “Automated aerial refueling: Parallelized 3d iterative closest point,” in IEEE NAECON, 2016, pp. 1–5 (2016)
- Dynamic Geospatial analysis of wide-area motion imagery (R/Python/Java)
- Maurice, Matthew, Matthew Piekenbrock, and Derek Doran. "WAMINet: An Open Source Library for Dynamic Geospace Analysis Using WAMI." Multimedia (ISM), 2015 IEEE International Symposium on. IEEE, 2015.
- Conversion of Nonlinear Optimization algorithm to C89 implementation (MATLAB/C)
- Implementation of unsplittable flow approximation algorithm (C++/Python)
- Search engine/web application for the Ozone Widget Framework (JavaScript/PHP)
- Conversion tool from Oracle’s Abstract Data Type to XMLType in Oracle’s Enterprise DBMS

Studied:

As I read more into the theoretical foundations of density-based clustering, my research began to intersect with Topology Theory and Manifold Learning. In 2017, I began to research these connections in a minor capacity with a local research group studying the intersection of TDA and machine learning. The research primarily involved understanding the basic foundations of Topology towards extending the Mapper framework, a popular and very general method which has been used successfully for data analysis.

Starting Fall 2018, I was hired full-time to begin enhancing/extending Mapper, and to assist the team in using Mapper on real-world applications. My primary research towards this end has been two-fold: (1) to enable efficient construction of mappers in a multiscale setting, and (2) to understand the full range of use-cases for the Mapper framework. For more details, see below.

Published:

NOTE: This paper is still under development, and is made available in the spirit of transparent research. Some equations may be incorrect and there may be notational errors.

Related Materials:

Studied:

I was hired by Dr. Steven Arnold under NASA's LERCIP program to apply Machine Learning to a specific Material Science problem. The research project involved creating a surrogate model for the generalized method of cells (GMC) technique using Deep Learning, and then interpreting various aspects of the data produced under the model using a non-parametric Optimal Experimental Design-motivated optimization procedure recently made possible by the Approximate Coordinate Exchange algorithm.

Related Materials:

NOTE: A technical report and an associated journal article are currently being developed to fully capture and report the results of the research project. Preliminary draft versions of both the code and the report(s) are available upon request to U.S. citizens only.

Worked on:

Studied:

My graduate research involved a large, multifaceted project aimed at modeling real-world traffic networks at a macroscopic scale. The goal of the project is to turn raw positioning/track information into a dynamic network representation, and then model that representation. The project involved researching density-based clustering algorithms, cluster validation measures, non-parametric density estimation techniques, Markov chain Monte Carlo optimization techniques, and random graph modeling (stochastic block models).

Worked on:

I submitted a successful funding proposal under the Google Summer of Code (GSOC) Initiative to the R Project for Statistical Computing to explore, develop, and unify recent developments related to the theory of density-based clustering. This involved a mixture of code development, which culminated in an R package, as well as deep research to further understand the theory and utility of the cluster tree. There was also a WSU newsroom piece that describes the proposal in a non-technical way.

Related Materials:

Worked on:

Studied:

Because the team was heavily multidisciplinary, I worked on several exploratory or educational projects involving computational, statistical, and physics-based problems. Much of the work involved assisting the Air Force graduate students with their research work. In that time, I studied topics including branch-and-bound spatial indexing data structures (kd-trees, cover trees, locality sensitive hashing), the ICP problem, finite mixture modeling, Markov chain modeling, and general parameter estimation techniques (EM/MAP estimation).

Published:

Worked on:

Studied:

Various random graph models such as Erdős–Rényi models and Exponential Random Graph Models (ERGMs), entropy measures over networks, density-based clustering techniques (DBSCAN and OPTICS), non-parametric time-varying regression models (ARMA + ARIMA models)

Published:

Related Materials:

Worked on:

Studied:

Gauss–Newton Method, approximation algorithms for unsplittable flow problems, graph theory (by extension), relational (Oracle/PostgreSQL/SQLite) and document-based (MongoDB) database interaction, natural language processing techniques for SEO (PageRank), asynchronous vs. synchronous client-server communication strategies with AJAX and NodeJS/PHP servers, XML Schema and XML Technologies [XLink, XPath, etc.]

- dbscan R Package
- clustertree R Package
- Mapper R Package
- TDA + Manifold Learning Presentation (PDF): Presentation I gave to the Data Science and Security Cluster Group
- Clustering Presentation (GIF Animations) (Video) (PDF): Presentation I gave to the Data Science and Security Cluster Group and the Kno.e.sis research group
- Machine Learning Project: Bayesian Linear Regression (PDF)
- Introductory tutorial on Markov Chain Monte Carlo Basics (Presentation) (PDF)
- Notes on Bayesian Network Training Basics (PDF)

- Outstanding Master's Student Award (Computer Science), Wright State College of Engineering and Computer Science
- Student participant and presenter (poster), TGDA NSF TRIPODS Workshop and Summer School
- Outstanding Position Paper Award, National Model United Nations Annual Conference ('14)
- Outstanding Delegation Award, National Model United Nations Annual Conference ('13)
- Honorable Mention Award, Regional Model United Nations Annual Conference ('13 and '14)

- Invited Speaker (local high school outreach): Presentation I gave as part of the CJ STEMM Idol Series (PDF)
- Volunteer Staff Position, Regional Model United Nations Annual Conference ('16 and '17)
- Developer and maintainer of: daymunc.org