E4571: Columbia University, Fall 2019
Data Science Institute
Industrial Engineering and Operations Research
Instructor: Brett Vintch, PhD
with guest lectures by:
Sam Garrett | Lead Data Science Engineer at iHeartRadio
Personalization is a key tool for enhancing customer experience across industries, and it drives user loyalty and customer value. It is therefore no surprise that building and improving personalization systems is increasingly a core responsibility of data science teams and a key focus of many of the machine learning algorithms in the sector.
This course will focus on common personalization algorithms and theory, including behavior-based and content-based recommendation, commonly encountered issues such as scaling and cold starts, and state-of-the-art research. It will also look at how businesses use, and misuse, these techniques in real-world applications.
Math: Linear algebra preferred, but not required
CS: A scripting language, preferably Python
- History
- Evaluation - how do we know when we’re successful?
- Questions to keep in mind throughout the course
- The filter bubble
- Rich get richer (popularity breeds popularity)
- Cold start (users and/or items)
- Accuracy
- Interpretability
- Scalability
- Real-time vs batch
- Serendipity
- Diversity
- How to incorporate new types of data
- User or Item features
- Real-time user context
- Time-variance
- User states or attribute drifts
- Positive feedback loop
- Aspirational vs actual (stated vs revealed)
“Users with history like yours also like …”
- Nearest neighbors
- Collaborative filtering
- Matrix factorization
- Explicit
- ALS
- Error metrics
- Ranking metrics
- Techniques for treating missing data as unobserved data
- Regularization
- Approximate nearest neighbors
- Collective factorization & factorization machines
- Incorporating metadata and additional item or user dimensions
- Restricted Boltzmann Machines & Auto-encoders
- Bayesian approaches
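As a rough illustration of the "users with history like yours" idea behind the nearest-neighbor and collaborative filtering topics above, here is a minimal user-based sketch in Python. It is not course material: the toy `ratings` matrix, the choice of cosine similarity, and the helper names `cosine_similarity` and `recommend` are all assumptions made for this example.

```python
import numpy as np

# Toy user-by-item rating matrix (rows: users, columns: items); 0 means unrated.
# The values are invented purely for illustration.
ratings = np.array([
    [5, 4, 0, 0],
    [5, 5, 4, 0],
    [0, 0, 5, 4],
], dtype=float)


def cosine_similarity(a, b):
    """Cosine similarity of two rating vectors (0.0 if either is all zeros)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def recommend(ratings, user, k=1):
    """Rank the items `user` has not rated, using the k most similar other users."""
    others = np.array([u for u in range(ratings.shape[0]) if u != user])
    sims = np.array([cosine_similarity(ratings[user], ratings[u]) for u in others])
    order = np.argsort(sims)[::-1][:k]            # positions of the k nearest users
    neighbors = others[order]
    scores = sims[order] @ ratings[neighbors]     # similarity-weighted sum of neighbor ratings
    scores[ratings[user] > 0] = -np.inf           # exclude items the user already rated
    return np.argsort(scores)[::-1], scores


ranked_items, scores = recommend(ratings, user=0, k=1)
print(ranked_items[0])  # 2: the item the most similar user rated that user 0 has not
```

User 0's closest neighbor is user 1, whose rating for item 2 drives the recommendation. Treating unrated entries as zeros is a simplification; the missing-data and regularization topics listed above concern more careful alternatives.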
“Users that like content that [looks/sounds/reads] like this also might like…”
- Domains:
- Semantic
- Image
- Audio
- Ensembles
- Explicit combinations
- Wide and deep
- Active learning
- Reinforcement learning
- Context: time and location sensitivity
- Recommender rationale (e.g. tagging)
- Deep learning and neural network approaches
- Homework 1 (20%)
- Homework 2 (30%)
- Final project (50%)