Research Projects

Interactive Clustering Using Depth Quantile Functions (2022-2023)

Clustering is a challenging problem, especially in high dimensions. The same problem may admit multiple valid answers, handling outliers is difficult, and evaluating a solution's correctness is nontrivial. These problems become even harder to understand and solve when black-box algorithms such as neural networks are employed. An interactive and informative clustering method may alleviate some of these difficulties.

Depth Quantile Functions (DQFs) are a method for extracting multiscale geometric features from high-dimensional and non-Euclidean data clouds. Using ideas of statistical depth, they map pairs of data points to real-valued functions of a single variable on the interval [0,1]. DQFs have been shown to be effective and interpretable in classification, anomaly detection, and now clustering. The clustering algorithm works as a series of anomaly detection problems determined by the user.

[Project Github]     [R Package GitHub]     [Thesis Submission]

The Manifold with Flow Hypothesis (2022-2023)

A familiar tenet of modern machine learning is the “Manifold Hypothesis”: most high-dimensional data collected in the real world is intrinsically parameterized by a low-dimensional manifold.

We consider the problem of embedding point cloud data sampled from an underlying manifold with an associated flow or velocity. Such data arises in many contexts where static snapshots of dynamic entities are measured, including high-throughput biology such as single-cell transcriptomics. Existing embedding techniques either do not use velocity information or embed the coordinates and velocities independently, i.e., they either impose velocities on top of an existing point embedding or embed points within a prescribed vector field. FlowArtist is a neural network that embeds points while jointly learning a vector field around the points. The combination allows FlowArtist to better separate and visualize velocity-informed structures.

[Preprint]*     [Poster]**     [Github]

This project was first considered from the perspective of embedding directed graphs in Euclidean space with a vector field in order to capture directionality information. The model postulates that nodes in a directed graph are observations from a low-dimensional embedding of a manifold with a vector field. Thus, we look to develop algorithms that convert between directed graph affinities and low-dimensional manifolds with a vector field, and that extract information from both the ambient and embedding spaces.
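As a rough illustration of what a directed graph affinity derived from points with velocities could look like, the sketch below weights the edge from point i to point j by how well the displacement from i to j aligns with the velocity at i, damped by distance. The function name `directed_affinity`, the cosine-times-Gaussian weighting, and the `sigma` parameter are assumptions chosen for this example; this is not the FlowArtist model or the affinity used in the preprint.

```python
import math


def directed_affinity(points, velocities, sigma=1.0):
    """Build an asymmetric affinity matrix from points and their velocities.

    Hypothetical weighting (an assumption, not the project's method):
    A[i][j] = max(cos(angle between velocity_i and x_j - x_i), 0)
              * exp(-||x_j - x_i||^2 / sigma^2)
    """
    n = len(points)
    affinity = [[0.0] * n for _ in range(n)]
    for i in range(n):
        speed = math.sqrt(sum(v * v for v in velocities[i]))
        for j in range(n):
            if i == j:
                continue
            displacement = [b - a for a, b in zip(points[i], points[j])]
            distance = math.sqrt(sum(d * d for d in displacement))
            if distance == 0.0 or speed == 0.0:
                continue  # direction is undefined, so leave the affinity at 0
            dot = sum(d * v for d, v in zip(displacement, velocities[i]))
            cosine = dot / (distance * speed)
            # Only points lying "downstream" of the flow receive weight.
            affinity[i][j] = max(cosine, 0.0) * math.exp(-distance**2 / sigma**2)
    return affinity


# Three points on a line, all flowing in the +x direction.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
velocities = [(1.0, 0.0), (1.0, 0.0), (1.0, 0.0)]
A = directed_affinity(points, velocities)
```

In this toy example the affinity from the first point to the second is positive while the reverse affinity is zero, so the matrix is asymmetric and encodes the direction of the flow.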

* Accepted for publication in 2023 IEEE 33rd International Workshop on Machine Learning for Signal Processing (MLSP).
** Appeared at the Machine Learning in Computational Biology (MLCB 2022) conference.

Directed Graphs Based Road Networks Data Structures (2021)

We consider the problem of designing a graph-based data structure to represent irregular road networks. The project was inspired by Bangkok's traffic problem in its irregular road networks. We worked closely with Thai traffic police officers, the Thai Ministry of Transportation, and the Bangkok Traffic and Transportation Department to address their concerns with traffic management in Bangkok. Traffic simulators were programmed at different levels of abstraction to see how well each captures travel times and traffic congestion. Infrastructure was put in place to test the effects of different methods of traffic light management and control, traffic laws and regulations, and self-driving car algorithms on traffic congestion. Applications of the simulation may extend to urban planning and road network design.
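A minimal sketch of the general idea, assuming intersections are nodes and one-way road segments are directed edges weighted by travel time in minutes. The class name `RoadNetwork`, its methods, and the toy network are hypothetical and are not taken from the project's code; the travel-time query uses Dijkstra's algorithm as a stand-in for whatever the simulators actually compute.

```python
import heapq
from collections import defaultdict


class RoadNetwork:
    """Directed graph of intersections (nodes) and road segments (edges)."""

    def __init__(self):
        # origin -> {destination: travel time in minutes}
        self.edges = defaultdict(dict)

    def add_road(self, origin, destination, minutes, two_way=False):
        """Add a one-way segment, or both directions if two_way is True."""
        self.edges[origin][destination] = minutes
        if two_way:
            self.edges[destination][origin] = minutes

    def fastest_time(self, start, goal):
        """Return the shortest travel time via Dijkstra, or None if unreachable."""
        best = {start: 0.0}
        queue = [(0.0, start)]
        while queue:
            elapsed, node = heapq.heappop(queue)
            if node == goal:
                return elapsed
            if elapsed > best.get(node, float("inf")):
                continue  # stale queue entry; a faster route was already found
            for neighbor, minutes in self.edges.get(node, {}).items():
                candidate = elapsed + minutes
                if candidate < best.get(neighbor, float("inf")):
                    best[neighbor] = candidate
                    heapq.heappush(queue, (candidate, neighbor))
        return None


# Toy network with one-way streets: the direct route A -> C is slower
# than the detour through B, and the only way back to A is from C.
network = RoadNetwork()
network.add_road("A", "B", 4)
network.add_road("B", "C", 3)
network.add_road("A", "C", 10)
network.add_road("C", "A", 2)
```

Because edges are directed, the travel time from A to C can differ from the time from C to A, which is what allows one-way streets and asymmetric routes in an irregular network to be represented.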

[Github]     [Poster]*

* Appeared at the 2021 Pomona College Summer Symposium.

Biofilms Research (2020)

Biofilms are structured microbial communities embedded within an extracellular matrix and are often found on surfaces. Their structure protects pathogenic bacteria, making biofilms excellent survival sites. Biofilms are often found on man-made surfaces such as pipes, ships, and pools, and are difficult to remove. This project studies E. coli genetics in relation to biofilm formation, comparing wild-type and mutant E. coli through Dictyostelium phagocytosis, with the goal of identifying which genes are responsible for the strength and formation of biofilms.