I'm a machine learning developer and open-source contributor.
I enjoy everything about the field of artificial intelligence: experimenting, developing and tuning models, tinkering with the latest software, learning from ML research, testing new ideas, and experiencing the "magic" of what can be accomplished with linear algebra, backpropagation, and lots of training data.
You can find examples of my work in my repositories. Some recent things I've worked on include:
- an interactive Polars tutorial as a marimo notebook/app, plus a demo analyzing the NYC Yellow Taxi dataset (~30M records) to show how fast Polars is. The repo also includes dynamic, asynchronous data download and the combination of multiple data sources and formats (e.g., Parquet, JSON); a minimal sketch of this kind of Polars query appears just after this list
- History Buff, a Python and CLI app for semantic search over browser history (currently a work in progress)
- Baby Names, a Streamlit app for analyzing trends in first names over the past 100 years
- a collection of machine learning projects, including key algorithms implemented from scratch in NumPy, with detailed explanations of the algorithms and the underlying math, structured to be easy to follow and to encourage learning
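To give a flavor of the Polars taxi demo mentioned above, here is a minimal, hypothetical sketch of a lazy query over the data. The file path and aggregation are illustrative assumptions, not the repo's actual code; the column names follow the public Yellow Taxi schema.

```python
import polars as pl

# Lazily scan local Parquet extracts of the NYC Yellow Taxi data
# (path is illustrative; columns follow the public schema).
lazy_trips = pl.scan_parquet("yellow_tripdata_2023-*.parquet")

summary = (
    lazy_trips
    .filter(pl.col("trip_distance") > 0)
    .group_by(pl.col("tpep_pickup_datetime").dt.hour().alias("pickup_hour"))
    .agg(
        pl.len().alias("trips"),
        pl.col("total_amount").mean().alias("avg_fare"),
    )
    .sort("pickup_hour")
    .collect()  # the query plan is optimized and executed only here
)
print(summary)
```

Because the scan is lazy, Polars pushes the filter and aggregation down into the Parquet read, which is a big part of why it stays fast on ~30M rows.
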
| ML from scratch - Machine learning algorithms implemented in NumPy |
| --- |
| Neural network (for regression or classification) |
| KMeans clustering |
| Logistic regression (a minimal from-scratch sketch follows this table) |
| and more... |
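As an illustration of the from-scratch style, here is a minimal sketch of logistic regression trained with batch gradient descent in plain NumPy. It is not the repo's code, just a compact example of the approach:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_regression(X, y, lr=0.1, n_iters=1000):
    """Batch gradient descent on the average log-loss; X is (n_samples, n_features)."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_iters):
        # Forward pass: predicted probabilities
        p = sigmoid(X @ w + b)
        # Gradients of the average log-loss w.r.t. w and b
        grad_w = X.T @ (p - y) / n_samples
        grad_b = np.mean(p - y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Tiny usage example on synthetic data
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
w, b = fit_logistic_regression(X, y)
preds = (sigmoid(X @ w + b) > 0.5).astype(float)
print("accuracy:", (preds == y).mean())
```
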
| Natural language processing (NLP) |
| --- |
| Topic modeling on 50 years of magazine issues - using Non-Negative Matrix Factorization, Latent Dirichlet Allocation, and doc-topic cosine similarity (a small NMF sketch follows this table) |
| Extractive text summarization - with an application to Wikipedia articles |
| Feature engineering with regex pattern matching - to analyze groups within a corpus |
| Dictionary key search - with fuzzy matching, to find keys in a nested JSON or dictionary object |
| and more... |
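A hedged sketch of the NMF side of a topic-modeling workflow, assuming scikit-learn and a plain list of documents (the tiny corpus below is a stand-in, not the magazine data, and this is not the project's actual pipeline):

```python
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer

# A tiny stand-in corpus; the real project uses 50 years of magazine issues.
docs = [
    "the telescope observed a distant galaxy",
    "a new vaccine trial showed promising results",
    "galaxies and black holes in deep space",
    "clinical trials and public health policy",
]

# TF-IDF features, then Non-Negative Matrix Factorization into topics
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)

nmf = NMF(n_components=2, random_state=0)
doc_topic = nmf.fit_transform(X)          # document-topic weights
terms = vectorizer.get_feature_names_out()

# Show the top terms per topic
for k, component in enumerate(nmf.components_):
    top_terms = [terms[i] for i in component.argsort()[::-1][:5]]
    print(f"topic {k}: {top_terms}")
```

The `doc_topic` matrix is what cosine similarity would then be computed over to find documents with similar topic mixes.
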
| Deep learning in PyTorch |
| --- |
| Full-page handwritten text recognition - implementation of a research paper: a ResNet encoder combined with a Transformer decoder to capture text from a full page of my handwritten journal. Covers 1D and 2D positional encoding; dataset and dataloader prep (e.g., torchvision.transforms image transformations); gradient accumulation for memory-constrained GPUs (a short sketch follows this table); synthetic data generation and data augmentation; input sequence masking; and training and validation |
| Class projects from an upper-level university computer science course - image style transfer; fine-tuning a ResNet classifier; training GANs; building language models with Transformer and RNN architectures; image segmentation with a U-Net; reinforcement learning (Deep Q-Networks and PPO) |
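The gradient-accumulation trick mentioned for the handwriting model can be sketched as a generic PyTorch pattern. This is not the project's training loop; `model`, `loader`, `optimizer`, and `loss_fn` are placeholders:

```python
import torch

ACCUMULATION_STEPS = 4  # effective batch size = loader batch size * 4

def train_one_epoch(model, loader, optimizer, loss_fn, device="cuda"):
    model.train()
    optimizer.zero_grad()
    for step, (images, targets) in enumerate(loader):
        images, targets = images.to(device), targets.to(device)
        outputs = model(images)
        # Scale the loss so the accumulated gradient matches one large batch
        loss = loss_fn(outputs, targets) / ACCUMULATION_STEPS
        loss.backward()  # gradients accumulate across iterations
        if (step + 1) % ACCUMULATION_STEPS == 0:
            optimizer.step()
            optimizer.zero_grad()
```

The idea is simply to call `optimizer.step()` only every N mini-batches, which trades extra forward/backward passes for a smaller per-step memory footprint on a constrained GPU.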