Hola 👋
I am a Machine Learning Engineer, originally from Mexico 🇲🇽 and based in Seattle, WA. I have a B.S. in Computer Science from Tecnologico de Monterrey and an M.S. in Computer Science with a Specialization in Machine Learning from the Georgia Institute of Technology. I currently work as a Senior Staff Machine Learning Engineer at SoFi.
I'm interested in Artificial Intelligence, especially Reinforcement Learning, and Robotics. I'm particularly interested in how AI can adopt complex social behaviors that adhere to human preferences, and how these agents can be deployed to improve our lives. I envision a world where humans and intelligent agents (both embodied, i.e. robots, and non-embodied) cooperate to solve the world's most pressing problems.
- Generative AI
- Deep Learning
- Deep Reinforcement Learning
- Reinforcement Learning
- Computer Vision
- Robotics
- General Machine Learning
- Knowledge-based AI
Lee is an AI-powered cartoon version of my wife Ash that interacts with her and her audience on Twitch and YouTube, acting as a co-host for her streams. The bot is composed of three main components, each leveraging recent speech and language technologies:
- Speech-to-text: Allows Lee to listen to Ash's voice when she talks to her.
- Text Generation: Allows Lee to respond to Ash's and her audience's comments, using a Large Language Model to generate her replies.
- Text-to-speech: Allows Lee to talk back to Ash and her audience with her own unique voice.
Lee has captivated audiences during streams, and she is now even playing games alongside Ash!
This project allowed me to learn more about Generative AI and associated technologies, and how to build a production-grade system around them.
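Below is a minimal sketch of Lee's listen-think-speak loop. The three service functions are stubbed stand-ins, not the actual speech and language providers used in production.

```python
# Minimal sketch of Lee's listen -> think -> speak loop.
# The three service calls are hypothetical stand-ins, not the real providers.

def transcribe(audio_chunk: bytes) -> str:
    """Speech-to-text: turn a chunk of Ash's mic audio into text (stubbed)."""
    return "Hey Lee, what game should we play next?"

def generate_reply(message: str, chat_history: list[str]) -> str:
    """Text generation: ask an LLM for Lee's next line, given recent context (stubbed)."""
    return "Let's try the new platformer chat keeps asking for!"

def speak(text: str) -> None:
    """Text-to-speech: synthesize Lee's voice and play it on stream (stubbed)."""
    print(f"[Lee says] {text}")

def run_turn(audio_chunk: bytes, chat_history: list[str]) -> None:
    """One conversational turn: listen, generate a response, speak it."""
    heard = transcribe(audio_chunk)
    chat_history.append(f"Ash: {heard}")
    reply = generate_reply(heard, chat_history)
    chat_history.append(f"Lee: {reply}")
    speak(reply)

if __name__ == "__main__":
    run_turn(b"<raw mic audio>", chat_history=[])
```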
Aguefort is a chatbot that combines the power of LLMs with all the knowledge from Dropout's Adventuring Academy episodes (a podcast where the host and his guests talk about all things TTRPG).
With this chatbot, I can ask questions about DM'ing, D&D, and role-playing in general, and get answers grounded in everything that has been discussed in the episodes!
The chatbot was built using:
- Gradio for the UI
- LangChain for the backend
- Anthropic's Claude Haiku (via Amazon Bedrock) for the LLM
- Meta's FAISS for the embeddings store
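The sketch below shows roughly how these pieces fit together. The package imports, embedding model ID, and the transcript placeholder are assumptions for illustration, not the production code.

```python
# Hedged sketch of a Gradio + LangChain + Bedrock + FAISS RAG chatbot.
import gradio as gr
from langchain_aws import ChatBedrock, BedrockEmbeddings
from langchain_community.vectorstores import FAISS

# Placeholder: pre-chunked Adventuring Academy transcripts (assumed to exist).
transcripts = ["...episode transcript chunks..."]

embeddings = BedrockEmbeddings(model_id="amazon.titan-embed-text-v1")  # assumed embedding model
vector_store = FAISS.from_texts(transcripts, embeddings)
retriever = vector_store.as_retriever(search_kwargs={"k": 4})

llm = ChatBedrock(model_id="anthropic.claude-3-haiku-20240307-v1:0")  # Claude Haiku via Bedrock

def answer(message, history):
    # Retrieve the most relevant transcript chunks and ask the LLM to answer from them.
    docs = retriever.invoke(message)
    context = "\n\n".join(d.page_content for d in docs)
    prompt = (
        "Answer the question about TTRPGs using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {message}"
    )
    return llm.invoke(prompt).content

gr.ChatInterface(answer).launch()
```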
These are mini applications of various Deep Learning algorithms implemented in PyTorch. The inspiration for these applications comes from assignments of Coursera's Deep Learning Specialization that were originally developed using TensorFlow. I decided to re-implement them from scratch in PyTorch to deepen my knowledge of the framework. Below is a list of these mini applications:
- Feed Forward
  - Cat vs Non-cat Classification
- Convolutional
  - Hand Sign Recognition Using CNN
  - Hand Sign Recognition Using ResNet
  - Art Generation Using Neural Style Transfer
- Sequence
  - Dinosaur Name Generation Using RNN
  - Jazz Improvisation Using LSTM
  - Emojifier Using LSTM and Word Embeddings
  - Date Translation Using Neural Machine Translation
- Generative
  - Wardrobe Generation Using DCGAN
  - Wardrobe Generation Using VAE
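For a flavor of what these re-implementations look like, here is a hedged sketch of the simplest one, a feed-forward binary classifier in the spirit of the cat vs non-cat exercise. The layer sizes and data are placeholders, not the assignment's exact setup.

```python
import torch
from torch import nn

# Hypothetical stand-in for the cat vs non-cat data: 64x64 RGB images flattened
# into vectors, with binary labels (1 = cat, 0 = non-cat).
X = torch.rand(256, 64 * 64 * 3)
y = torch.randint(0, 2, (256, 1)).float()

model = nn.Sequential(
    nn.Linear(64 * 64 * 3, 20), nn.ReLU(),
    nn.Linear(20, 7), nn.ReLU(),
    nn.Linear(7, 1),  # logits; BCEWithLogitsLoss applies the sigmoid internally
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for epoch in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()
```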
This is my attempt at solving comma.ai's speed prediction challenge: given an input video from the front-facing camera of a car, can you predict the speed of the vehicle? Using a video with ground-truth speed measurements, I trained a ResNet-18 model (pre-trained on ImageNet) on features generated by converting the video frames into Motion History Images. The model achieved an MSE of ~5.22 on a dev set of frames obtained from a random split of the training video.
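A rough sketch of the setup is below, assuming a simple decaying frame-difference formulation of Motion History Images; the threshold, decay, and toy batch are illustrative, not the exact values I used.

```python
import numpy as np
import torch
from torch import nn
from torchvision import models

def update_mhi(mhi: np.ndarray, prev_frame: np.ndarray, frame: np.ndarray,
               decay: float = 0.05, motion_threshold: int = 25) -> np.ndarray:
    """Decay the motion history image and stamp pixels that moved in this frame."""
    mhi = np.clip(mhi - decay, 0.0, 1.0)
    moved = np.abs(frame.astype(np.int16) - prev_frame.astype(np.int16)) > motion_threshold
    mhi[moved] = 1.0
    return mhi

# ResNet-18 pre-trained on ImageNet, with the classification head swapped for a
# single regression output (the predicted speed).
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, 1)
loss_fn = nn.MSELoss()

# Toy batch: single-channel MHIs replicated to 3 channels so they fit the pre-trained stem.
mhi_batch = torch.rand(8, 1, 224, 224).repeat(1, 3, 1, 1)
speeds = torch.rand(8, 1) * 30
loss = loss_fn(model(mhi_batch), speeds)
```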
In this project, I implemented Deep Convolutional Generative Adversarial Networks (DCGAN) to generate new characters from the Super Smash Bros. Ultimate Nintendo game. Although the results weren't exactly what I expected, the model was able to learn some of the fundamental elements that make up the characters: legs, arms, weapon-like silhouettes, and fighting poses.
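The generator followed the standard DCGAN recipe of transposed convolutions with batch norm and ReLU; the sketch below is a generic version of that architecture, with illustrative channel sizes rather than the exact ones I used.

```python
import torch
from torch import nn

class Generator(nn.Module):
    """Standard DCGAN generator: project a latent vector up to a 64x64 RGB image."""
    def __init__(self, latent_dim: int = 100, feat: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(latent_dim, feat * 8, 4, 1, 0, bias=False),
            nn.BatchNorm2d(feat * 8), nn.ReLU(True),
            nn.ConvTranspose2d(feat * 8, feat * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(feat * 4), nn.ReLU(True),
            nn.ConvTranspose2d(feat * 4, feat * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(feat * 2), nn.ReLU(True),
            nn.ConvTranspose2d(feat * 2, feat, 4, 2, 1, bias=False),
            nn.BatchNorm2d(feat), nn.ReLU(True),
            nn.ConvTranspose2d(feat, 3, 4, 2, 1, bias=False),
            nn.Tanh(),  # outputs in [-1, 1], matching normalized training images
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.net(z)

fake_images = Generator()(torch.randn(16, 100, 1, 1))  # -> (16, 3, 64, 64)
```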
These are implementations of different Deep Reinforcement Learning algorithms in PyTorch, as suggested by OpenAI's Spinning Up in Deep RL. Below is a list of the algorithms implemented:
- Vanilla Policy Gradients
- Deep Q-Networks
- Advantage Actor-Critic
- Proximal Policy Optimization
- Deep Deterministic Policy Gradients
Agents were trained using each algorithm to solve two classic control Gym environments: CartPole-v1 and Pendulum-v0. All algorithms except DDPG were trained on the first environment; DDPG was trained on the second, since it can only be used with continuous action spaces.
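As an illustration of the style of these implementations, here is a hedged sketch of the core update for the simplest of them, Vanilla Policy Gradients on CartPole. The network size and hyperparameters are illustrative, and it is written against the newer Gym reset/step API.

```python
import gym
import torch
from torch import nn
from torch.distributions import Categorical

env = gym.make("CartPole-v1")
policy = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 2))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-2)

def run_episode():
    """Roll out one episode, keeping log-probs and rewards for the update."""
    obs, _ = env.reset()
    log_probs, rewards, done = [], [], False
    while not done:
        dist = Categorical(logits=policy(torch.as_tensor(obs, dtype=torch.float32)))
        action = dist.sample()
        obs, reward, terminated, truncated, _ = env.step(action.item())
        done = terminated or truncated
        log_probs.append(dist.log_prob(action))
        rewards.append(reward)
    return log_probs, rewards

for _ in range(200):
    log_probs, rewards = run_episode()
    returns = torch.tensor([sum(rewards[t:]) for t in range(len(rewards))])  # reward-to-go
    loss = -(torch.stack(log_probs) * returns).mean()  # policy gradient loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```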
The following figure shows a trained DQN agent solving the CartPole environment, and a DDPG agent solving the Pendulum one.
In this project, I solved OpenAI's Lunar Lander Gym environment using Deep Reinforcement Learning. I implemented a Deep Q-Network with Experience Replay, and the agent was able to maneuver and land the spacecraft without crashing.
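The Experience Replay component is what stabilizes the Q-network updates; a minimal sketch of such a buffer is below, with illustrative capacity and batch size.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size buffer of (state, action, reward, next_state, done) transitions."""
    def __init__(self, capacity: int = 100_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size: int = 64):
        # Uniformly sampling past transitions breaks the correlation between
        # consecutive steps, which is what makes the Q-network updates stable.
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)
```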
- Code available on request only
- Paper
- Video presentation
In this project, I recreated the Theory of Mind experiment conducted on chimpanzees in 2001 by Joseph Call as an RL environment, and investigated whether an RL agent could learn to behave like the subordinate chimpanzee in the experiment. The agent was able to learn how to read the movement of the other subject and make optimal decisions.
The following figure shows the agent (blue) acting optimally in the two experiment settings.
These are implementations of different Reinforcement Learning algorithms as described in Sutton and Barto's book Reinforcement Learning: An Introduction, 2nd Edition. Below is a list of the algorithms implemented:
- Monte Carlo Methods
  - Monte Carlo with Exploring Starts
  - On-policy Monte Carlo
  - Off-policy Monte Carlo with weighted importance sampling
- Temporal Difference Methods
  - On-policy Sarsa
  - Q-learning
- n-Step Bootstrapping Methods
  - On-policy n-step Sarsa
  - Off-policy n-step Sarsa
  - n-step Tree Backup
- Planning Methods
  - Dyna-Q
  - Prioritized Sweeping
  - Monte Carlo Tree Search
- Function Approximation Methods
  - On-policy Gradient Monte Carlo
  - Semi-gradient TD(0)
  - Semi-gradient n-step TD
  - On-policy semi-gradient Sarsa
  - On-policy semi-gradient n-step Sarsa
- Function Approximation with Eligibility Traces Methods
  - Semi-gradient TD(lambda)
  - On-policy semi-gradient Sarsa(lambda)
- Policy Gradient Methods
  - REINFORCE with Baseline
  - One-step Actor-Critic
  - Actor-Critic with Eligibility Traces
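As a flavor of the tabular methods above, here is a hedged sketch of the Q-learning update, written against a generic environment interface (the reset/step/actions attributes are assumptions, not Easy21's exact API).

```python
import random
from collections import defaultdict

def q_learning(env, episodes=10_000, alpha=0.1, gamma=1.0, epsilon=0.1):
    """Tabular Q-learning with an epsilon-greedy behavior policy.

    `env` is assumed to expose reset() -> state, step(action) -> (next_state,
    reward, done), and a list of discrete actions in `env.actions`."""
    Q = defaultdict(float)
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            if random.random() < epsilon:
                action = random.choice(env.actions)
            else:
                action = max(env.actions, key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(action)
            # Off-policy TD target: bootstrap from the greedy action in the next state.
            best_next = 0.0 if done else max(Q[(next_state, a)] for a in env.actions)
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
    return Q
```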
Each algorithm was used to compute the optimal value function for Easy21, a simplified Blackjack-like game. The following figure shows these value functions.
In this project, I implemented three Multi-Agent Reinforcement Learning algorithms, Friend-Q, Foe-Q, and Correlated-Q, as well as the standard Q-Learning algorithm. These algorithms were evaluated on a "soccer" environment modeled as a Markov game. The final results reproduce the original ones obtained by Amy Greenwald and Keith Hall in their paper Correlated Q-learning (2003).
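As a flavor of how these variants differ from standard Q-Learning, below is a hedged sketch of the Friend-Q backup for one player, assuming a toy joint-action Q-table keyed by state (not the exact soccer-environment code).

```python
import numpy as np

def friend_q_update(Q, state, a1, a2, reward, next_state, alpha=0.1, gamma=0.9):
    """One Friend-Q backup for player 1.

    Q is assumed to map state -> ndarray of shape (n_actions_p1, n_actions_p2)
    holding player 1's joint-action values."""
    # Friend-Q assumes the other player cooperates, so the next-state value is
    # the maximum over *joint* actions rather than only over player 1's actions.
    v_next = np.max(Q[next_state])
    Q[state][a1, a2] += alpha * (reward + gamma * v_next - Q[state][a1, a2])
```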
- Code available on request only
- Paper
- Video presentation
In this project, I reproduced the results presented in Richard S. Sutton's seminal paper Learning to Predict by the Methods of Temporal Differences (1988). I implemented the TD algorithm against a random walk environment to demonstrate how learning can be achieved by updating weights using the gradients of temporal difference errors.
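Below is a minimal sketch of that idea: TD(lambda) with linear (one-hot) features on a small random walk. The number of states, lambda, and the step size are illustrative, not the paper's exact setup.

```python
import numpy as np

def td_lambda_random_walk(episodes=100, n_states=5, lam=0.3, alpha=0.05):
    """TD(lambda) with one-hot features on a random walk with terminal rewards 0 / 1."""
    w = np.full(n_states, 0.5)          # one weight per non-terminal state
    for _ in range(episodes):
        state = n_states // 2           # start in the middle
        trace = np.zeros(n_states)
        while True:
            next_state = state + np.random.choice([-1, 1])
            if next_state < 0:
                reward, v_next, done = 0.0, 0.0, True      # left terminal state
            elif next_state >= n_states:
                reward, v_next, done = 1.0, 0.0, True      # right terminal state
            else:
                reward, v_next, done = 0.0, w[next_state], False
            # Accumulate the eligibility trace and update weights with the TD error.
            trace = lam * trace
            trace[state] += 1.0
            td_error = reward + v_next - w[state]
            w += alpha * td_error * trace
            if done:
                break
            state = next_state
    return w
```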
- Code available on request only
- Paper
- Video presentation
In this project, I applied Q-Learning to the problem of trading equities in the stock market. Technical indicators were used to define the states, with the daily return as the reward. The agent was allowed to take three actions: long, short, or cash (i.e. close a position). The Q-Learning agent was able to find an optimal policy that beat both the benchmark and a rule-based, hand-crafted strategy.
- Code available on request only
In this project, I implemented an activity classifier using Random Forests to distinguish between six different human activities. The input vector was composed of features derived from different Motion History-based techniques. Using the well-known KTH activity dataset, my model was able to achieve a performance of 86.73% on the test set. Using a video of myself performing the activities, the model was also able to classify multiple activities sequentially with satisfactory performance.
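A hedged sketch of the classification stage is shown below; the feature matrix is random placeholder data standing in for the motion-history features, not the actual KTH features.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

ACTIVITIES = ["walking", "jogging", "running", "boxing", "hand waving", "hand clapping"]

# Placeholder for the motion-history-derived feature vectors; in the real
# project these were extracted from the KTH videos.
X = np.random.rand(600, 14)
y = np.random.randint(0, len(ACTIVITIES), size=600)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
print(f"test accuracy: {clf.score(X_test, y_test):.2%}")
```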
- Code available on request only
- Paper
- Video presentation
- Multi-activity recognition demo
In this project, I developed a robot capable of navigating through a simulated 2D warehouse with the objective of collecting and delivering packages to a specified dropzone. The layout of the warehouse was unknown to the robot; instead, the robot was provided with an ultrasonic sensor that measured the distance to surfaces (e.g. walls, obstacles, and boxes). The "brain" of the robot consisted of two main modules: a localizer-mapper that used Graph SLAM to reconstruct an estimated layout of the warehouse and an approximate position of the robot within it, and a planner that used A* search to navigate to and from the dropzone once regions were discovered.
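The planning half is essentially textbook A* over the discovered grid cells; below is a minimal sketch with a Manhattan-distance heuristic and a hypothetical `neighbors` callback standing in for the SLAM-derived map.

```python
import heapq

def a_star(start, goal, neighbors):
    """A* over grid cells. `neighbors(cell)` yields traversable adjacent cells
    (a hypothetical callback backed by the estimated map)."""
    def heuristic(cell):
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])  # Manhattan distance

    frontier = [(heuristic(start), 0, start, [start])]
    best_cost = {start: 0}
    while frontier:
        _, cost, cell, path = heapq.heappop(frontier)
        if cell == goal:
            return path
        for nxt in neighbors(cell):
            new_cost = cost + 1
            if new_cost < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = new_cost
                heapq.heappush(frontier, (new_cost + heuristic(nxt), new_cost, nxt, path + [nxt]))
    return None  # goal unreachable with what has been mapped so far
```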
Left: Estimated map and position. Right: Actual map and position.
- Code available on request only
In this project, I implemented an agent to solve Raven's Progressive Matrices tests. RPMs are intended to test human intelligence, in particular logical associations. My agent works in two phases, each leveraging the knowledge-based technique known as Generate & Test. It first attempts to solve the problem visually via affine transformations of the images; then it attempts to solve the problem semantically by building Semantic Networks. My agent obtained a final accuracy of ~69% (133 correct answers out of 192 problems).
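The visual phase is Generate & Test over a small set of image transformations; below is a simplified sketch of that idea for a 2x1 (A:B :: C:?) problem, assuming square binary image arrays and a plain pixel-similarity score rather than the agent's exact scoring.

```python
import numpy as np

# Candidate affine-style transforms tried during the "generate" step.
TRANSFORMS = {
    "identity": lambda img: img,
    "rotate_90": lambda img: np.rot90(img),
    "rotate_180": lambda img: np.rot90(img, 2),
    "flip_horizontal": np.fliplr,
    "flip_vertical": np.flipud,
}

def similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Fraction of matching pixels between two same-shaped binary images."""
    return float(np.mean(a == b))

def solve_2x1(img_a, img_b, img_c, candidates):
    """Generate & test for an A:B :: C:? analogy.

    Find the transform that best explains A -> B, apply it to C, and pick the
    candidate answer most similar to the predicted image."""
    best_transform = max(TRANSFORMS.values(), key=lambda t: similarity(t(img_a), img_b))
    prediction = best_transform(img_c)
    scores = [similarity(prediction, cand) for cand in candidates]
    return int(np.argmax(scores)) + 1  # answers are 1-indexed in RPM problems
```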
Sample problem with the correct answer (1)
In this research project, I investigated how attribute noise in the training data affects the performance of various Supervised Learning algorithms, in particular binary classifiers. I studied the behavior of the classifiers under varying levels of noise on two different datasets, and compared their performance using different metrics. I also investigated the impact of the dimensionality of the feature space with respect to attribute noise, based on the contrasting characteristics of the two datasets. The objective was to understand which classifiers perform best in the presence of noise.
- Code available on request only
- Paper
In this research project, I investigated the performance of four different Randomized Optimization algorithms: Randomized Hill Climbing, Simulated Annealing, Genetic Algorithms, and MIMIC. I studied their behavior on three different types of optimization problems: Continuous Peaks, Knapsack, and Traveling Salesman. I compared their performance using several metrics, such as fitness and runtime. The objective was to understand which algorithms perform best for each of the problems.
- Code available on request only
- Paper
In this research project, I investigated the performance of three of the most common RL algorithms: Value Iteration, Policy Iteration, and Q-Learning. I studied their behavior, in terms of rewards obtained and running times, on two environments with different characteristics: a grid world with frictionless floors, and the classic Taxi problem. The objective was to investigate the impact that the size of the state space and the complexity of the problem itself have on the performance of the algorithms.
- Code available on request only
- Paper
In this research project, I studied two of the most common clustering algorithms, K-Means and Expectation-Maximization. I also looked at four different dimensionality reduction techniques: Principal Component Analysis, Independent Component Analysis, Randomized Projections and Factor Analysis. I investigated their features by performing four different experiments using two different datasets. The objective was to understand the behavior of each algorithm under various types of problem spaces, and compare and contrast their advantages and disadvantages.
- Code available on request only
- Paper