#### Matt Piekenbrock

I'm currently a graduate student at NEU's Khoury College of Computer Sciences, advised by Jose Perea.

A more traditional paper CV is available here

## Programming Experience

My computational experience is diverse. My university coursework required Java, C++, or Matlab (2010-2015). I used C++98 or ANSI C extensively for the AFIT-affiliated projects, occasionally writing high-level scripts in Python or Matlab (+MEX) (2013-2015). I used either the R project (+Rcpp) or Python (+Cython) for the majority of the projects I was involved in, preferring the former (2015-2019). Since 2020, interfacing Python with modern C++ through FFI tools (e.g. pybind11) has been my primary development workflow.

## Projects

#### Spectral relaxations of persistent rank invariants

We introduce a framework for constructing families of continuous relaxations of the persistent rank invariant for persistence modules indexed over the real line. Applications to multi-parameter persistence, parameter optimization, and shape classification are also presented.

#### Publications

- Piekenbrock, Matthew, and Jose A. Perea. “Spectral families of persistent rank invariants.” Computational Persistence Workshop (2023). (link)

#### Software

- pbsig: unorganized code containing the experiments
- primate: package code for implicit matrix trace estimation
- simplextree (python package)

#### Related materials

- YouTube link for CompPers23 talk

#### Move Schedules: Fast persistence computations in coarse dynamic settings

Persistence diagrams are known to vary continuously with respect to their input, motivating the study of their computation for time-varying filtered complexes. Computationally, simulating persistence dynamically can be reduced to maintaining a valid decomposition under adjacent transpositions in the filtration order. Since there are quadratically many such transpositions, this maintenance procedure scales poorly and is often too fine-grained for many applications. We propose a coarser strategy for maintaining the decomposition over a 1-parameter family of filtrations that requires only subquadratic time and linear space to construct.
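To make the "quadratically many" remark concrete: the minimum number of adjacent transpositions needed to turn one ordering into another equals the number of pairs the two orderings rank in opposite order (the Kendall tau distance), which is n(n-1)/2 in the worst case. The sketch below only illustrates that count on plain permutations. It is not the move-schedule algorithm from the paper, it ignores the face relations a real filtration must respect, and the function name and toy orderings are hypothetical.

```python
from itertools import combinations


def adjacent_transposition_count(order_a, order_b):
    """Minimum number of adjacent swaps that turn order_a into order_b.

    This equals the number of pairs ranked in opposite order by the two
    orderings (Kendall tau distance). Written as an O(n^2) loop for clarity.
    """
    if sorted(order_a) != sorted(order_b):
        raise ValueError("both orderings must contain the same items")
    # Relabel the items of order_a by their position in order_b.
    rank_b = {item: i for i, item in enumerate(order_b)}
    relabeled = [rank_b[item] for item in order_a]
    # Count inverted pairs in the relabeled sequence.
    return sum(
        1
        for i, j in combinations(range(len(relabeled)), 2)
        if relabeled[i] > relabeled[j]
    )


# Hypothetical toy example: five items, second ordering fully reversed.
start = [0, 1, 2, 3, 4]
end = list(reversed(start))
n = len(start)
worst = adjacent_transposition_count(start, end)
print(worst, n * (n - 1) // 2)
```

A full reversal hits the n(n-1)/2 bound, which is the case a coarser schedule aims to avoid paying for one swap at a time.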

#### Publications

- Piekenbrock, Matthew, and Jose A. Perea. “Move Schedules: Fast persistence computations in coarse dynamic settings.” arXiv preprint arXiv:2104.12285 (2021).

#### Software

- Move scheduling code
- simplextree (python package)

#### Efficient Multiscale Simplicial Complex Generation for Mapper

The primary result of the Mapper framework is the geometric realization of a simplicial complex, depicting topological relationships and structures suitable for visualizing, analyzing, and comparing high dimensional data...

#### Publications

- Multiscale Mapper paper (never finished!)

#### Software

- Mapper (R Package)
- simplextree (R Package)
- Vignette on using Mapper for shape recognition

#### Automating Point of Interest Discovery in Geospatial Contexts

With the rapid development and widespread deployment of sensors dedicated to location-acquisition, new types of models have emerged to predict macroscopic patterns that manifest in large data sets representing "significant" group behavior. Partially due to the immense scale of geospatial data, current approaches to discover these macroscopic patterns are primarily driven by inherently heuristic detection methods. Although useful in practice, the inductive bias adopted by such mainstream detection schemes is often unstated or simply unknown. Inspired by recent theoretical advances in efficient non-parametric density level set estimation techniques, in this research effort we describe a semi-supervised framework for automating point of interest discovery in geospatial contexts. We outline the flexibility and utility of our approach through numerous examples, and give a systematic framework for incorporating semi-supervised information while retaining finite-sample estimation guarantees.

#### Publications

- Piekenbrock, Matthew J. Discovering Intrinsic Points of Interest from Spatial Trajectory Data Sources. Master's thesis. Wright State University, 2018. link
- Piekenbrock, Matthew, and Derek Doran. “Intrinsic point of interest discovery from trajectory data.” arXiv:1712.05247 (2017) (doi: https://doi.org/10.48550/arXiv.1712.05247)
- Matthew Maurice, Matt Piekenbrock, and Derek Doran. Waminet: An open source library for dynamic geospace analysis using WAMI. In IEEE International Symposium on Multimedia, pages 445–448. IEEE, 2015.

#### Bringing High Performance Density-based Clustering to R

Density-based clustering techniques have become extremely popular in the past decade. It is often conjectured that these methods succeed because of their ability to identify 'natural groups' in data. These groups are often non-convex in shape, deviating from the typical premise of 'minimal variance' that underlies parametric, model-based approaches, and often appear in very large data sets. As the era of 'Big Data' continues to rise in popularity, having access to scalable, easy-to-use implementations of these density-based methods is paramount. In this research effort, we provide fast, state-of-the-art density-based algorithms in the form of an open-source package in R. We also provide several related density-based clustering tools to help make state-of-the-art density-based clustering accessible to people with large, computationally difficult problems.

#### Publications

- Hahsler, Michael, Matthew Piekenbrock, and Derek Doran. “dbscan: Fast Density-Based Clustering with R”, Journal of Statistical Software, 2019. (https://doi.org/10.18637/jss.v091.i01)

#### Software

- dbscan (R Package)
- Vignette on using HDBSCAN

#### Towards Autonomous Aerial Refueling: Massive Parallel Iterative Closest Point

The Iterative Closest Point (ICP) problem is a well-studied problem that seeks to align a given query point cloud to a fixed reference point cloud. Computationally, the ICP problem is dominated by its first phase, a pairwise distance minimization. The "brute-force" approach, an embarrassingly parallel problem amenable to GPU-acceleration...
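As a rough illustration of the pairwise distance minimization phase, the sketch below finds, for every query point, the index of its closest reference point by exhaustive search. It is a plain-Python toy under my own assumptions, not the C++/CUDA implementation from the papers; the function names and sample points are hypothetical.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest_neighbors(query, reference):
    """For each query point, return the index of the closest reference point.

    Brute force: every query point is compared against every reference
    point, costing O(len(query) * len(reference)) distance evaluations.
    """
    matches = []
    for p in query:
        best = min(
            range(len(reference)),
            key=lambda j: squared_distance(p, reference[j]),
        )
        matches.append(best)
    return matches


# Hypothetical toy point clouds in 3D.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
query = [(0.9, 0.1, 0.0), (0.1, 0.8, 0.0), (0.1, 0.1, 0.1)]
matches = nearest_neighbors(query, reference)
print(matches)
```

Each iteration of the outer loop is independent of the others, which is what makes this phase embarrassingly parallel: on a GPU, each query point can be handled by its own thread.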

#### Publications

- J. Robinson, M. Piekenbrock, L. Burchett, et al. Parallelized Iterative Closest Point for Autonomous Aerial Refueling. In International Symposium on Visual Computing (pp. 593-602). Springer International Publishing. (2016, December) (doi: 10.1007/978-3-319-50835-1_53)
- Piekenbrock, M., Robinson, J., Burchett, L., Nykl, S., Woolley, B., & Terzuoli, A. (2016, July). Automated aerial refueling: Parallelized 3D iterative closest point: Subject area: Guidance and control. In Aerospace and Electronics Conference (NAECON) and Ohio Innovation Summit (OIS), 2016 IEEE National (pp. 188-192). IEEE. (doi: 10.1109/NAECON.2016.7856797)

## Employment

#### Graduate Research Assistant

#### Perea Lab

#### Fall 2019 - Present

#### MSU/NEU

Topological Data Analysis, Linear Algebra, Machine Learning

Motivated by my previous work on the foundations of density-based clustering, I focused on implementing and extending the Mapper algorithm, a popular and very general method which has been used successfully for data analysis.

After beginning my doctoral research at Michigan State University (Fall 2019), I transferred to Northeastern University (Boston, MA) in Fall 2021 after my advisor (Jose Perea) accepted a joint appointment offer at the Khoury College of Computer Sciences.

My doctoral research focused on applying topological theory to common machine learning problems. In particular, much of my time was spent on accelerating the persistence algorithm in time-varying settings, co-developing a topological dimensionality reduction method using fiber bundle theory, and studying spectral relaxations of the persistent rank invariant.

#### SCaN Intern

#### National Aeronautics and Space Administration

#### Summer 2022

#### John H. Glenn Research Center at Lewis Field, OH

Space networking, Graph Theory, Flow algorithms

Towards enabling delay-tolerant satellite communications in uncertain space environments, I was rehired at NASA as part of the Space Communications and Navigation (SCaN) program to expand the algorithmic theory on time-dependent routing.

#### Job Description

Towards enabling delay-tolerant satellite communications in uncertain space environments, I was rehired at NASA as part of the Space Communications and Navigation (SCaN) program to expand the algorithmic theory on time-dependent routing. My research focused on incorporating additional geometric assumptions into routing models built for delay- and disruption-tolerant networks, particularly in the low Earth orbit regime.

#### Research Associate

#### Oak Ridge Institute for Science and Education

#### Fall 2017, Fall 2018 - Fall 2019

#### Air Force Research Laboratory, WPAFB

Topology, Mapper, R package

Motivated by my previous work on the foundations of density-based clustering, I focused on implementing and extending the Mapper algorithm, a popular and very general method which has been used successfully for data analysis.

#### Job Description

In 2017, I joined a local research group under Dr. Ryan Kramer as part of AFRL’s Human Performance Wing to explore and expand the intersection between algorithms in TDA and machine learning. My time there was focused on implementing and extending the *Mapper* algorithm, a topological method that reframes common data analysis tasks as problems of analyzing level sets on topological spaces (introduction available here).

My research was centered around enabling the efficient construction of mappers in multiscale settings and on understanding the connections the Mapper algorithm had to other existing constructions, such as Reeb graphs, nerve complexes, and hierarchical clustering. Beyond learning the theory and developing the software, I also applied *Mapper* to various geospatial and image analysis tasks, such as video segmentation, object recognition, and clustering.

#### Software

- Mapper (R Package)
- simplextree (R Package)
- Vignette on using Mapper for shape recognition

#### Graduate Research Assistant

#### Web and Complex Systems Group

#### Spring 2016 - Fall 2018

#### Wright State University

Clustering, Network analysis, Machine Learning

Motivated by my previous work on the foundations of density-based clustering, I focused on implementing and extending the Mapper algorithm, a popular and very general method which has been used successfully for data analysis.

My graduate research aimed at modeling real-world traffic networks at a macroscopic scale. The high-level goal of the project was to model dynamic network representations extracted from raw positioning/track information via random (distributional) network models. On the software side, the project involved:

- Density-based clustering (R/Rcpp)
- Geospatial Point of Interest (POI) detection / Nonparametric distribution modeling (R/C++)
- Spatio-temporal network models (R)

Research topics involved during this time include density-based clustering algorithms, cluster validation measures, non-parametric density estimation techniques, Markov Chain Monte Carlo (MCMC) optimization techniques, and random graph modeling (stochastic block models).

#### LERCIP Intern

#### National Aeronautics and Space Administration

#### Summer 2018

#### John H. Glenn Research Center at Lewis Field, OH

Experimental design, Machine learning, Material science

Towards accelerating the design and discovery of materials for use in extreme environments, I was hired by Dr. Steven Arnold under NASA's 10-week LERCIP program to apply machine learning to a specific material science problem.

#### Job Description:

I was hired by Dr. Steven Arnold under NASA's 10-week LERCIP program to use machine learning to accelerate the simulation-based design of materials and structures through multiscale modeling, in line with NASA’s 2040 vision.

#### Publications:

- Stuckner, J., Piekenbrock, M., Arnold, S. M., & Ricks, T. M. (2021). Optimal experimental design with fast neural network surrogate models. Computational Materials Science, 200, 110747.
- Arnold, S. M., Piekenbrock, M., Ricks, T. M., & Stuckner, J. (2020). Multiscale analysis of composites using surrogate modeling and information optimal designs. In AIAA Scitech 2020 Forum (p. 1863).

#### Student Participant

#### Google Summer of Code

#### Summer 2017

#### R Project for Statistical Computing

Clustering, Learning theory, R package

Towards unifying recent developments related to the theory and utility of density-based clustering, this project involved a mixture of research and code development which culminated in the form of an R package for estimating the empirical cluster tree.

I submitted a successful funding proposal under the Google Summer of Code (GSOC) Initiative to the R Project for Statistical Computing to explore, develop, and unify recent developments related to the theory of density-based clustering (see the project page). This involved a mixture of research and code development which culminated in the form of an R package for estimating *the cluster tree*, a hierarchical summary of the level-sets of a density function. There was also a WSU newsroom piece that describes the proposal in a non-technical way.

#### Civilian Research Assistant

#### Oak Ridge Institute for Science and Education

#### Spring 2014 - Spring 2015

#### Air Force Institute of Technology, WPAFB

Optimization, Graph theory, Flow algorithms

Towards the end of my undergraduate degree, my contract at the [Air Force Institute of Technology](https://www.afit.edu/) (AFIT) was extended under ORISE, where I continued working with the LOREnet group under Dr. Andrew Terzuoli.

#### Job Description

Towards the end of my undergraduate degree, my contract at the Air Force Institute of Technology (AFIT) was extended under ORISE, where I continued working with the LOREnet group under Dr. Andrew Terzuoli. During this time I primarily worked with Dr. Scott Nykl on the development of a novel Iterative Closest Point algorithm amenable to massive parallelization, implemented in C++/CUDA, for the purposes of enabling real-time tracking of aircraft in the context of Autonomous Aerial Refueling.

#### Publications

- J. Robinson, M. Piekenbrock, L. Burchett, et al. Parallelized Iterative Closest Point for Autonomous Aerial Refueling. In International Symposium on Visual Computing (pp. 593-602). Springer International Publishing. (2016, December) (doi: 10.1007/978-3-319-50835-1_53)
- Piekenbrock, M., Robinson, J., Burchett, L., Nykl, S., Woolley, B., & Terzuoli, A. (2016, July). Automated aerial refueling: Parallelized 3D iterative closest point: Subject area: Guidance and control. In Aerospace and Electronics Conference (NAECON) and Ohio Innovation Summit (OIS), 2016 IEEE National (pp. 188-192). IEEE. (doi: 10.1109/NAECON.2016.7856797)

#### Projects

Example projects included, but were not limited to:

- Researching hierarchical Markov models for predicting web navigation patterns
- Parallelizing existing atmospheric absorption routines with OpenCL
- Coding a nonlinear optimization algorithm in ANSI-C, and making it callable from MATLAB via MEX

#### Civilian Research Assistant

#### Southwestern Ohio Council For Higher Education

#### December 2013 - June 2014

#### Air Force Institute of Technology, WPAFB

Optimization, Graph theory, Flow algorithms

As my first experience doing undergraduate research, I worked in a heavily multi-disciplinary team called the Low Orbitals Radar and Electromagnetism group, where I worked on a diverse set of projects involving computational, statistical, or physics-based requirements.

#### Job Description

Under the guidance of Dr. Andrew Terzuoli, I was hired at the Air Force Institute of Technology (AFIT) to do research in a multi-disciplinary team called the Low Orbitals Radar and Electromagnetism (LOREnet) group, where I worked on a diverse set of projects involving computational, statistical, or physics-based requirements. As my first research-oriented experience, I primarily assisted graduate students with programmatic or computationally intensive tasks.

#### Projects

Example projects included, but were not limited to:

- Implementing an unsplittable flow approximation algorithm (C++ and Python)
- Creating a conversion tool between Oracle’s Abstract Data Type and XMLType (Java)
- Developing a prototype UI for searching and viewing 3D models (JavaScript+three.js)

## Education

#### Doctorate in CS (Pursuing)

#### Khoury College of Computer Sciences

#### Northeastern University, 2021-Present

#### Advisor: Jose Perea

#### Teaching:

- Graduate teaching assistant - Data Mining Techniques (CS 6220 / DS 5230), Summer 2023
- Graduate teaching assistant - Machine Learning (CS 6140/4420), Spring 2023
- Graduate teaching assistant - Unsupervised Learning (CS 6220 / DS 5230), Fall 2022

#### Coursework (GPA: 3.83):

- Formal Verification, Modeling, & Synthesis
- Network Visualization

#### Doctorate in CMSE (Transferred)

#### Computational Mathematics, Science and Engineering

#### Michigan State University, 2019-2021

#### Advisor: Jose Perea

#### Coursework (GPA: 3.83):

- Numerical Linear Algebra (CMSE 823)
- Numerical Differential Equations (CMSE 821)
- Math Foundations of Data Science (CMSE 890)
- Topological Methods for the Analysis of Data (CMSE 890)
- Parallel Computing (CMSE 822)
- Geometry and Topology II (MTH 869)
- Mathematical foundations of analysis (CMSE 890)
- Algebra I (MTH 818)

#### Master of Science in CS

#### College of Engineering and Computer Science

#### Wright State University, 2015-2018

#### Advisor: Derek Doran

#### Coursework (GPA: 3.88):

- Network Science
- Machine Learning
- Information Theory
- Applied Stochastic Processes
- Algorithm Design and Analysis
- Empirical Analysis
- Advanced Programming Languages
- Distributed Computing

#### Bachelor of Science in CS (+STT)

#### College of Engineering and Computer Science

#### Wright State University, 2010-2015

#### Coursework (GPA: 3.42, in-major):

- Applied Statistics I & II
- Optimization Techniques
- Foundations of AI
- Computational Tools for Data Analysis
- Theoretical Statistics
- Linear Algebra