Programming assignments for the Machine Learning Specialization courses from the University of Washington, taken through Coursera.
Tools used: Python, pandas, NumPy, scikit-learn, GraphLab, R
In terms of libraries and packages, I only used GraphLab and SFrame for Machine Learning Foundations. For all the other courses (Regression, Classification and Clustering) I used pandas for feature engineering and scikit-learn to build the models.
-
Machine Learning Foundations: A Case Study Approach
Regression: Predicting House Prices (use Zillow data to build a linear regression model that predicts house prices)
Classification: Analyzing Sentiment (build a logistic classification model to analyze product sentiment)
Clustering and Similarity: Retrieving Documents (run cluster analysis for document retrieval using tf-idf)
Recommending Products: Recommending Songs (build a matrix factorization model and use Jaccard similarity to recommend songs)
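As a rough illustration of the Jaccard similarity mentioned for song recommendation, here is a minimal pure-Python sketch. The song names, user ids and the most_similar helper are hypothetical and are not taken from the course data or from GraphLab.

```python
def jaccard_similarity(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(union)


# Hypothetical data: the set of users who listened to each song.
listeners = {
    "song_a": {"u1", "u2", "u3"},
    "song_b": {"u2", "u3", "u4"},
    "song_c": {"u5"},
}


def most_similar(song, listeners):
    """Return the other song whose listener set overlaps most with `song`."""
    others = (s for s in listeners if s != song)
    return max(
        others,
        key=lambda s: jaccard_similarity(listeners[song], listeners[s]),
    )


print(jaccard_similarity(listeners["song_a"], listeners["song_b"]))
print(most_similar("song_a", listeners))
```

Two songs count as similar when largely the same users listened to both, so a simple recommender can suggest the songs most similar to the ones a user already played.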
-
Machine Learning: Regression
Project Overview: How to predict a house's price? How to evaluate a model? How to prevent a model from overfitting?
Simple Linear Regression: Implementing the closed-form solution for simple linear regression
Multiple Linear Regression: Exploring multiple regression models for house price prediction; implementing gradient descent for multiple regression
Assessing Performance
Ridge regression
Lasso regression
Kernel regression
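The closed-form solution for simple linear regression listed above can be sketched in a few lines. This is a minimal pure-Python version under my own naming (simple_linear_regression is not a course or library function), and the toy numbers are made up rather than taken from the house price data.

```python
def simple_linear_regression(x, y):
    """Least-squares fit of y = intercept + slope * x, in closed form."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # slope = covariance(x, y) / variance(x); the 1/n factors cancel
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up data lying exactly on the line y = 1 + 2x.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]
intercept, slope = simple_linear_regression(x, y)
print(intercept, slope)
```

The same fit can be cross-checked against scikit-learn's LinearRegression on a single feature.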
-
Machine Learning: Classification
Project 1 Overview: Build a classification model to predict whether an Amazon review is positive.
Project 2 Overview: Is this loan safe or risky?
In these assignments, I built logistic regression and decision tree models to predict whether a loan is risky or safe, and compared the classification error of the different models, both by using scikit-learn and by implementing the algorithms (gradient ascent, greedy decision tree, etc.) from scratch.
Linear Classifiers & Logistic Regression
Learning Classifiers; Overfitting & Regularization in Logistic Regression
Decision Trees
Precision-Recall
Stochastic Gradient Ascent
SVM http://www.svm-tutorial.com/2014/11/svm-understanding-math-part-2/
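To give an idea of what implementing logistic regression from scratch involves, here is a minimal sketch of batch gradient ascent on the log-likelihood in pure Python. The function names, step size, iteration count and toy data are my own assumptions, not the assignment code. The stochastic variant would update the weights from one example (or a small batch) at a time instead of the full data set.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(features, labels, step_size=0.1, n_iters=1000):
    """Fit logistic regression weights by batch gradient ascent.

    labels are 0 or 1; weights[0] is the intercept.
    """
    weights = [0.0] * (len(features[0]) + 1)
    for _ in range(n_iters):
        gradient = [0.0] * len(weights)
        for x, y in zip(features, labels):
            row = [1.0] + list(x)  # prepend the constant feature
            p = sigmoid(sum(w * v for w, v in zip(weights, row)))
            # Derivative of the log-likelihood w.r.t. weight j is
            # the sum over examples of (y - p) * row[j].
            for j, v in enumerate(row):
                gradient[j] += (y - p) * v
        weights = [w + step_size * g for w, g in zip(weights, gradient)]
    return weights


def predict(weights, x):
    row = [1.0] + list(x)
    score = sum(w * v for w, v in zip(weights, row))
    return 1 if sigmoid(score) >= 0.5 else 0


# Made-up one-feature data: small values are class 0, large are class 1.
features = [[0.0], [1.0], [3.0], [4.0]]
labels = [0, 0, 1, 1]
weights = fit_logistic(features, labels)
print([predict(weights, x) for x in features])
```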
-
Machine Learning: Clustering & Retrieval
Nearest Neighbor Search
Clustering with K-Means
Mixture Models (Implementing Expectation Maximization Algorithm for Gaussian mixtures; Clustering text data with Gaussian mixtures)
Mixed Membership Modeling via Latent Dirichlet Allocation
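For the k-means topic above, the core loop alternates between assigning points to the nearest centroid and moving each centroid to the mean of its points. Below is a minimal pure-Python sketch. The helper names and toy points are hypothetical, and the initial centroids are passed in explicitly rather than chosen randomly or by a smarter initialization.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [
        min(range(len(centroids)),
            key=lambda k: squared_distance(p, centroids[k]))
        for p in points
    ]


def kmeans(points, initial_centroids, n_iters=100):
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(n_iters):
        assignments = assign(points, centroids)
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:
                # Move the centroid to the coordinate-wise mean of its members.
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep an empty cluster where it was
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, assign(points, centroids)


# Made-up 2-D points forming two well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centroids, assignments = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
print(centroids, assignments)
```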
Others:
Computational cost (complexity): http://stackoverflow.com/questions/2307283/what-does-olog-n-mean-exactly
Bitwise operators (0 1): https://wiki.python.org/moin/BitwiseOperators
Additional blog that helps with understanding LDA: http://confusedlanguagetech.blogspot.com/