#### Matt Piekenbrock

I'm currently a graduate student at NEU's Khoury College of Computer Sciences, advised by Jose Perea.

A more traditional paper CV is available here.

## Programming Experience

My computational experience is diverse. My university coursework typically required Java, C++, or Matlab ('10-'15). I used C++98 or ANSI C extensively for the AFIT-affiliated projects, occasionally writing high-level scripts in Python or Matlab (+MEX) ('13-'15). For the majority of the projects I was involved in ('15-'19), I used either R (+Rcpp) or Python (+Cython), preferring the former. Since 2020, interfacing Python with modern C++ through binding libraries (e.g., pybind11) has been my primary development workflow.

## Projects

#### Move Schedules: Fast persistence computations in coarse dynamic settings

Persistence diagrams are known to vary continuously with respect to their input, motivating the study of their computation for time-varying filtered complexes. Computationally, simulating persistence dynamically reduces to maintaining a valid decomposition under adjacent transpositions in the filtration order. Since there are quadratically many such transpositions, this maintenance procedure has limited scalability and is too fine-grained for many applications. We propose a coarser strategy for maintaining the decomposition over a 1-parameter family of filtrations that requires only subquadratic time and linear space to construct.
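To make the "quadratically many transpositions" concrete: the minimum number of adjacent transpositions that turns one filtration order into another equals the number of pairs of simplices whose relative order differs (the Kendall tau distance), which reaches n(n-1)/2 in the worst case. The sketch below counts these pairs with merge sort. Representing a filtration as a list of distinct simplex labels is an assumption for illustration, the function names are hypothetical, and the code only measures how many transpositions a dynamic update would have to process; it does not construct move schedules.

```python
def count_inversions(seq):
    """Return (sorted copy of seq, number of inversions) using merge sort."""
    if len(seq) <= 1:
        return list(seq), 0
    mid = len(seq) // 2
    left, inv_left = count_inversions(seq[:mid])
    right, inv_right = count_inversions(seq[mid:])
    merged, inv = [], inv_left + inv_right
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            # right[j] is smaller than every remaining element of left,
            # so each of those pairs is out of order.
            merged.append(right[j])
            inv += len(left) - i
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, inv


def transposition_distance(order_a, order_b):
    """Minimum number of adjacent swaps turning filtration order_a into order_b.

    Hypothetical helper: both arguments are lists of the same distinct
    simplex labels, and the result is the Kendall tau distance between them.
    """
    position_in_b = {simplex: k for k, simplex in enumerate(order_b)}
    _, inversions = count_inversions([position_in_b[s] for s in order_a])
    return inversions


if __name__ == "__main__":
    # Two valid filtrations of a filled triangle (faces precede cofaces).
    f0 = ["a", "b", "c", "ab", "bc", "ac", "abc"]
    f1 = ["b", "a", "c", "bc", "ab", "ac", "abc"]
    print(transposition_distance(f0, f1))
    # Worst case: n isolated vertices in reversed order need n(n-1)/2 swaps.
    vertices = [f"v{i}" for i in range(6)]
    print(transposition_distance(vertices, vertices[::-1]))
```

The first call reports 2 swaps (a/b and ab/bc); the second reports 15 = 6·5/2, illustrating why processing every transposition individually scales poorly.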

#### Publications

- Piekenbrock, Matthew, and Jose A. Perea. “Move Schedules: Fast persistence computations in coarse dynamic settings.” arXiv preprint arXiv:2104.12285 (2021).

#### Efficient Multiscale Simplicial Complex Generation for Mapper

The primary result of the Mapper framework is the geometric realization of a simplicial complex, depicting topological relationships and structures suitable for visualizing, analyzing, and comparing high-dimensional data...

#### Software

- Mapper R Package
- simplextree R Package
- Vignette on using Mapper

#### Automating Point of Interest Discovery in Geospatial Contexts

With the rapid development and widespread deployment of location-acquisition sensors, new types of models have emerged to predict macroscopic patterns that manifest in large data sets representing "significant" group behavior. Partly due to the immense scale of geospatial data, current approaches to discovering these macroscopic patterns are driven primarily by inherently heuristic detection methods. Although such mainstream detection schemes are useful in practice, the inductive bias they adopt is often unstated or simply unknown. Inspired by recent theoretical advances in efficient non-parametric density level set estimation, in this research effort we describe a semi-supervised framework for automating point of interest discovery in geospatial contexts. We demonstrate the flexibility and utility of our approach through numerous examples, and give a systematic framework for incorporating semi-supervised information while retaining finite-sample estimation guarantees.

#### Bringing High Performance Density-based Clustering to R

Density-based clustering techniques have become extremely popular in the past decade. It is often conjectured that the success of these methods is due to their ability to identify 'natural groups' in data. These groups are often non-convex in shape, deviating from the typical premise of 'minimal variance' that underlies parametric, model-based approaches, and they often appear in very large data sets. As the era of 'Big Data' continues, having access to scalable, easy-to-use implementations of these density-based methods is paramount. In this research effort, we provide fast, state-of-the-art density-based clustering algorithms in the form of an open-source R package. We also provide several related density-based clustering tools to help make state-of-the-art density-based clustering accessible to people with large, computationally difficult problems.

#### Publications

- Hahsler, Michael, Matthew Piekenbrock, and Derek Doran. “dbscan: Fast Density Based Clustering in R”, Journal of Statistical Software, 2018.

#### Software

- dbscan R Package
- Vignette on using HDBSCAN

#### Towards Autonomous Aerial Refueling: Massive Parallel Iterative Closest Point

The Iterative Closest Point (ICP) problem is a well-studied problem that seeks to align a given query point cloud to a fixed reference point cloud. Computationally, ICP is dominated by its first phase, a pairwise distance minimization. The "brute-force" approach to this phase is an embarrassingly parallel problem amenable to GPU acceleration...
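A minimal serial sketch of the brute-force correspondence phase described above, written in plain Python for illustration (it is not the GPU implementation from the publications below). Point clouds as lists of 3-tuples and the function name are assumptions. Each query point's nearest-neighbor search reads only shared, read-only data and writes only its own result, which is what makes the phase embarrassingly parallel.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def brute_force_correspondences(query: Sequence[Point],
                                reference: Sequence[Point]) -> List[Tuple[int, float]]:
    """For each query point, return (index of nearest reference point, distance).

    Hypothetical helper performing an exhaustive O(|query| * |reference|) search.
    Iterations of the outer loop are independent, so they could be distributed
    across threads or GPU cores without any locking.
    """
    if not reference:
        raise ValueError("reference point cloud must be non-empty")
    matches = []
    for q in query:
        best_idx, best_dist = 0, math.inf
        for j, r in enumerate(reference):
            d = math.dist(q, r)  # Euclidean distance (Python 3.8+)
            if d < best_dist:
                best_idx, best_dist = j, d
        matches.append((best_idx, best_dist))
    return matches


if __name__ == "__main__":
    ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    qry = [(0.9, 0.1, 0.0), (0.1, 0.8, 0.0)]
    for idx, dist in brute_force_correspondences(qry, ref):
        print(idx, round(dist, 3))
```

Once correspondences are fixed, standard ICP estimates the rigid transformation that minimizes the sum of squared distances between matched pairs (for example, with an SVD-based closed-form solution) and then iterates; that phase is omitted here.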

#### Publications

- Robinson, J., Piekenbrock, M., Burchett, L., et al. (2016, December). Parallelized Iterative Closest Point for Autonomous Aerial Refueling. In International Symposium on Visual Computing (pp. 593-602). Springer International Publishing.
- Piekenbrock, M., Robinson, J., Burchett, L., Nykl, S., Woolley, B., & Terzuoli, A. (2016, July). Automated aerial refueling: Parallelized 3D iterative closest point: Subject area: Guidance and control. In Aerospace and Electronics Conference (NAECON) and Ohio Innovation Summit (OIS), 2016 IEEE National (pp. 188-192). IEEE.

## Employment

#### SCaN Intern

#### National Aeronautics and Space Administration

#### Summer 2022

#### John H. Glenn Research Center at Lewis Field, OH

*Space networking, Graph theory, Flow algorithms*

#### Job Description

Towards enabling delay-tolerant satellite communications in uncertain space environments, I was re-hired at NASA as part of the Space Communications and Navigation (SCaN) program to expand the algorithmic theory of time-dependent routing. My research focused on incorporating additional geometric assumptions into routing models built for delay- and disruption-tolerant networks, particularly in the low Earth orbit regime.

#### Graduate Research Assistant

#### Perea Lab

#### Fall 2019 - Present

#### MSU/NEU

*Topological Data Analysis, Linear Algebra, Machine Learning*

Though I began my doctoral research at Michigan State University in Fall 2019, I transferred to Northeastern University in Fall 2021 after my advisor (Jose Perea) accepted a joint appointment at the Khoury College of Computer Sciences in Boston, MA.

My doctoral research focused on applying topological methods to common machine learning problems. In particular, much of my time was spent on accelerating the persistence algorithm in time-varying settings, co-developing a topological dimensionality reduction technique based on fiber bundle theory, and studying spectral relaxations of the persistent rank invariant.

#### LERCIP Intern

#### National Aeronautics and Space Administration

#### Summer 2018

#### John H. Glenn Research Center at Lewis Field, OH

*Experimental design, Machine learning, Material science*

Towards accelerating the design and discovery of materials for use in extreme environments, I was hired by Dr. Steven Arnold under NASA's 10-week LERCIP program to apply machine learning to a specific materials science problem.

The first phase of the research project involved training a fairly simple feed-forward artificial neural network (ANN) to act as a surrogate model for the Generalized Method of Cells (GMC) technique. The second (non-trivial) phase involved creating a systematic procedure for interpreting various aspects of the data produced by the surrogate model, using a non-parametric optimization procedure motivated by Optimal Experimental Design (OED) and recently made possible by the Approximate Coordinate Exchange algorithm.

#### Graduate Research Assistant

#### Web and Complex Systems Group

#### Spring 2016 - Fall 2018

#### Wright State University

*Clustering, Network analysis, Machine Learning*

My graduate research involved a large, multifaceted project aimed at modeling real-world traffic networks at a macroscopic scale. The high-level goal of the project was to model dynamic network representations, extracted from raw positioning/track information, via random (distributional) network models. On the software side, the project involved:

- Density-based clustering (R/Rcpp)
- Geospatial Point of Interest (POI) detection / Nonparametric distribution modeling (R/C++)
- Spatio-temporal network models (R)

Research topics I worked on during this time included density-based clustering algorithms, cluster validation measures, non-parametric density estimation techniques, Markov Chain Monte Carlo (MCMC) optimization techniques, and random graph modeling (stochastic block models).

#### Research Associate

#### Oak Ridge Institute for Science and Education

#### Fall 2017, Fall 2018 - Fall 2019

#### Air Force Research Laboratory, WPAFB

*Topology, Mapper, R package*

Motivated by my previous work on the foundations of density-based clustering, I focused on implementing and extending the Mapper algorithm, a popular and very general method that has been used successfully for data analysis.

In 2017, I joined a local research group under Dr. Ryan Kramer as part of AFRL’s Human Performance Wing to explore and expand the intersection between algorithms in topological data analysis (TDA) and machine learning. During my time there, I focused on implementing and extending the *Mapper* algorithm, a topological method that reframes common data analysis tasks as problems of analyzing level sets on topological spaces. An expository article explaining Mapper and its applications is available here.

I was hired full-time in Fall 2018 to help the team apply *Mapper* to various real-world problems, such as video segmentation and image analysis. My research centered on enabling the efficient construction of mappers in multiscale settings and on understanding the connections between the Mapper algorithm and other existing constructions, such as Reeb graphs, nerve complexes, and hierarchical clustering.

#### Software

- Mapper R Package
- simplextree R Package

#### Student Participant

#### Google Summer of Code

#### Summer 2017

#### R Project for Statistical Computing

*Clustering, Learning theory, R package*

Towards unifying recent developments related to the theory and utility of density-based clustering, this project involved a mixture of research and code development that culminated in an R package for estimating the empirical cluster tree.

I submitted a successful funding proposal under the Google Summer of Code (GSoC) initiative to the R Project for Statistical Computing to explore, develop, and unify recent developments related to the theory of density-based clustering (see the project page). This involved a mixture of code development, which culminated in an R package, and in-depth research to better understand the theory and utility of *the cluster tree*, a hierarchical summary of the level sets of a density function. There was also a WSU newsroom piece that describes the proposal in a non-technical way.

#### Civilian Research Assistant

#### Southwestern Ohio Council For Higher Education

#### December 2013 - June 2014

#### Air Force Institute of Technology, WPAFB

*Optimization, Graph theory, Flow algorithms*

As my first experience doing undergraduate research, I worked in a highly multidisciplinary team called the Low Orbitals Radar and Electromagnetism group on a diverse set of projects with computational, statistical, or physics-based requirements.

#### Job Description:

Under the guidance of Dr. Andrew Terzuoli, I was hired at the Air Force Institute of Technology (AFIT) as an undergraduate student to do research in a multidisciplinary team called the Low Orbitals Radar and Electromagnetism (LOREnet) group, where I worked on a diverse set of projects with computational, statistical, or physics-based requirements. As this was my first research-oriented position, I either assisted graduate students with primarily programming or educational tasks or worked on heavily computational tasks.

#### Projects:

Example projects included, but were not limited to:

- Implementing an unsplittable flow approximation algorithm (C++ and Python)
- Creating a conversion tool between Oracle’s Abstract Data Type and XMLType (Java)
- Developing a prototype UI to enhance searching and viewing of 2D/3D models (JavaScript+three.js)

## Education

#### Doctorate in CS (Pursuing)

#### Khoury College of Computer Sciences

#### Northeastern University, 2021-Present

#### Advisor: Jose Perea

#### Teaching:

- Graduate teaching assistant - Data Mining Techniques (CS 6220 / DS 5230), Summer 2023
- Graduate teaching assistant - Machine Learning (CS 6140/4420), Spring 2023
- Graduate teaching assistant - Unsupervised Learning (CS 6220 / DS 5230), Fall 2022

#### Coursework (GPA: 3.83):

- Formal Verification, Modeling, & Synthesis
- Network Visualization

#### Doctorate in CMSE (Transferred)

#### Computational Mathematics, Science and Engineering

#### Michigan State University, 2019-2021

#### Advisor: Jose Perea

#### Coursework (GPA: 3.83):

- Numerical Linear Algebra (CMSE 823)
- Numerical Differential Equations (CMSE 821)
- Math Foundations of Data Science (CMSE 890)
- Topological Methods for the Analysis of Data (CMSE 890)
- Parallel Computing (CMSE 822)
- Geometry and Topology II (MTH 869)
- Mathematical Foundations of Analysis (CMSE 890)
- Algebra I (MTH 818)

#### Master of Science in CS

#### College of Engineering and Computer Science

#### Wright State University, 2015-2018

#### Advisor: Derek Doran

#### Coursework (GPA: 3.88):

- Network Science
- Machine Learning
- Information Theory
- Applied Stochastic Processes
- Algorithm Design and Analysis
- Empirical Analysis
- Advanced Programming Languages
- Distributed Computing

#### Bachelor of Science in CS (+STT)

#### College of Engineering and Computer Science

#### Wright State University, 2010-2015

#### Coursework (GPA: 3.42, in-major):

- Applied Statistics I & II
- Optimization Techniques
- Foundations of AI
- Computational Tools for Data Analysis
- Theoretical Statistics
- Linear Algebra