movie-matchmaker

The following is the documentation for Katelyn Stringer and Alex Riley's course project for STAT 689: Statistical Computing with R and Python. Movie Matchmaker applies collaborative filtering methods to the MovieLens dataset. See the accompanying report for more detail on the methodology.

Getting started

These instructions should get you a copy of the project up and running on your local machine for development and testing purposes.

Prerequisites

You will need the standard scientific Python packages numpy, pandas, matplotlib, and scikit-learn (imported as sklearn). All of them can be installed with your preferred package manager, for example with conda:

conda install <package>
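As a quick sanity check before running utils.py, a small script like the one below reports which of these packages Python cannot import. It is a hypothetical helper (missing_packages is not part of the repository). Note that scikit-learn is installed under that name but imported as sklearn, so the check uses import names.

```python
import importlib

# Import names (not install names) of the packages listed above.
# scikit-learn is installed as "scikit-learn" but imported as "sklearn".
REQUIRED = ["numpy", "pandas", "matplotlib", "sklearn"]


def missing_packages(names=REQUIRED):
    """Return the import names that fail to import (hypothetical helper)."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Could not import:", ", ".join(missing))
    else:
        print("All required packages can be imported.")
```

Any name it reports can be installed with the conda command above.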

Installing

To install, navigate to the directory where you want the project to live and clone the repository:

git clone https://github.com/stringkm/movie-matchmaker.git

To check that the project can find your Python packages, run utils.py:

cd movie-matchmaker
python utils.py

This imports all of the required packages and defines some functions used throughout the project. If you see the message "Everything looks good!", things are working.

Testing

If you are not already in the project directory, move into it:

cd path/to/movie-matchmaker

Then run ratemovies.py as shown below. Set your name with the --userId option, the number of movies you want to rate with --cutoff, the output filename with --output, and the similarity metric used to predict ratings with --method (either cosine or pearson).

python ratemovies.py --userId=test --cutoff=5 --output=test.csv --method=cosine
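The source of ratemovies.py is not reproduced here, but options like these can be read with the standard-library argparse module. The sketch below is a hypothetical reconstruction: the parse_args helper and the defaults for --cutoff and --output are assumptions; only the option names and the two --method values come from this README.

```python
import argparse


def parse_args(argv=None):
    """Parse ratemovies.py-style options.

    Hypothetical reconstruction: option names and --method choices follow
    the README; the defaults for --cutoff and --output are assumptions.
    """
    parser = argparse.ArgumentParser(
        description="Rate movies and get collaborative-filtering predictions.")
    parser.add_argument("--userId", required=True,
                        help="name used to label your ratings")
    parser.add_argument("--cutoff", type=int, default=5,
                        help="how many movies you want to rate")
    parser.add_argument("--output", default="ratings.csv",
                        help="CSV file where your ratings are saved")
    parser.add_argument("--method", choices=["cosine", "pearson"],
                        default="cosine",
                        help="similarity metric used to predict ratings")
    return parser.parse_args(argv)
```

Calling parse_args(["--userId=test", "--cutoff=5", "--output=test.csv", "--method=cosine"]) returns a namespace with args.cutoff == 5. Restricting --method with choices means an unsupported metric makes argparse print an error and exit instead of failing later in the computation.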

You should see a prompt similar to the one below (the movie will very likely be different):

For each movie, type a numeric rating (0-5) or <Enter> if you haven't seen it.
What is your rating for "Toy Story (1995)":

Exit the program by typing exit or pressing ^C:

For each movie, type a numeric rating (0-5) or <Enter> if you haven't seen it.
What is your rating for "Toy Story (1995)": exit

Exiting recommendation program

If you've made it to this point, everything is probably set up correctly.
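The rating session follows a simple protocol: a number from 0 to 5 records a rating, an empty line skips the movie, and exit (or ^C) ends the session. A minimal sketch of such a loop is shown below, assuming a list of movie titles and that the loop stops once cutoff movies have been rated. The names collect_ratings and ask are hypothetical; ask defaults to input() so the loop can be driven by a stub.

```python
def collect_ratings(titles, cutoff, ask=input):
    """Ask for ratings until `cutoff` movies are rated or the user exits.

    Hypothetical sketch: `ask` defaults to input() so a stub can replace it.
    Returns a dict mapping each rated title to its rating.
    """
    print("For each movie, type a numeric rating (0-5) or <Enter> "
          "if you haven't seen it.")
    ratings = {}
    for title in titles:
        if len(ratings) >= cutoff:
            break
        while True:
            try:
                answer = ask(f'What is your rating for "{title}": ').strip()
            except (KeyboardInterrupt, EOFError):  # ^C or end of input
                answer = "exit"
            if answer == "":                       # not seen: next movie
                break
            if answer.lower() == "exit":
                print("Exiting recommendation program")
                return ratings
            try:
                value = float(answer)
            except ValueError:
                value = float("nan")               # fails the range check
            if 0 <= value <= 5:
                ratings[title] = value
                break
            print("Please enter a number from 0 to 5.")
    return ratings
```

Out-of-range or non-numeric input re-prompts for the same movie rather than silently skipping it.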

Optional download

This repository includes the small development version of the MovieLens dataset (the latest as of May 2018), which contains about 100,000 ratings. Developers who want to apply this package to the full stable benchmark version (about 20 million ratings) can download that dataset from the MovieLens website, unzip it, and set the FULL_DATA parameter in 0_explore_data.ipynb to the folder containing the data. To use the larger dataset elsewhere in the analysis, you will need to modify each file to point to that version's ratings.csv and movies.csv (see the 0_*.ipynb files for more information on the contents of the dataset).
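Both versions share the same file layout: ratings.csv holds one row per rating (userId, movieId, rating, timestamp) and movies.csv maps each movieId to a title and genres. A minimal pandas sketch for loading either folder and attaching titles to ratings is below. It assumes FULL_DATA is simply a folder path; the load_movielens helper and the example path are hypothetical.

```python
from pathlib import Path

import pandas as pd

# Hypothetical location of the unzipped full dataset; adjust to your machine.
FULL_DATA = Path("path/to/ml-20m")


def load_movielens(folder):
    """Read ratings.csv and movies.csv from `folder` and add a title column.

    ratings.csv columns: userId, movieId, rating, timestamp
    movies.csv columns:  movieId, title, genres
    """
    folder = Path(folder)
    ratings = pd.read_csv(folder / "ratings.csv")
    movies = pd.read_csv(folder / "movies.csv")
    # Left join so every rating is kept even if a movie is missing from movies.csv.
    return ratings.merge(movies[["movieId", "title"]], on="movieId", how="left")


# Example (the full dataset has ~20 million rows, so this may take a while):
# ratings = load_movielens(FULL_DATA)
```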

Contents

Code

0_explore_data.ipynb: data exploration of the full stable benchmark version
0_explore_ratings.ipynb: further data exploration focused on the ratings of the small development version
1_pearson.ipynb: implementation of collaborative filtering with Pearson correlation coefficient weights
2_cosine.ipynb: implementation of collaborative filtering with vector cosine similarity weights
3_top_k.ipynb: implementation of top-k collaborative filtering
ratemovies.py: command-line interface that asks you to rate randomly selected movies, saves those ratings, and uses either weighting method to report the 5 movies with the highest and the 5 with the lowest predicted ratings
utils.py: defines useful functions used throughout the project
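The notebooks themselves are not reproduced here. As a rough illustration of what they compute, the sketch below implements user-user collaborative filtering in NumPy, assuming a dense user-by-movie matrix R with NaN for unrated movies. Similarities are computed over co-rated movies only, and a prediction is the user's mean rating plus a similarity-weighted average of the neighbours' deviations from their own means. This is one common formulation and the notebooks may use a different variant; all function names are hypothetical.

```python
import numpy as np


def _co_rated(a, b):
    """Boolean mask of movies rated by both users (NaN marks 'unrated')."""
    return ~np.isnan(a) & ~np.isnan(b)


def cosine_sim(a, b):
    """Vector cosine similarity restricted to co-rated movies."""
    m = _co_rated(a, b)
    x, y = a[m], b[m]
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    return float(x @ y / denom) if denom > 0 else 0.0


def pearson_sim(a, b):
    """Pearson correlation restricted to co-rated movies."""
    m = _co_rated(a, b)
    if m.sum() < 2:
        return 0.0
    x = a[m] - a[m].mean()
    y = b[m] - b[m].mean()
    denom = np.sqrt((x @ x) * (y @ y))
    return float(x @ y / denom) if denom > 0 else 0.0


def predict(R, user, item, sim=cosine_sim, k=None):
    """Mean-centred, similarity-weighted prediction of R[user, item].

    Neighbours are the other users who rated `item`; if k is given, only
    the k most similar neighbours are kept (top-k filtering).
    """
    others = [v for v in range(R.shape[0])
              if v != user and not np.isnan(R[v, item])]
    w = np.array([sim(R[user], R[v]) for v in others], dtype=float)
    dev = np.array([R[v, item] - np.nanmean(R[v]) for v in others], dtype=float)
    if k is not None:
        keep = np.argsort(-w)[:k]          # indices of the k largest weights
        w, dev = w[keep], dev[keep]
    base = np.nanmean(R[user])
    denom = np.abs(w).sum()
    return float(base + w @ dev / denom) if denom > 0 else float(base)
```

Passing sim=pearson_sim switches the weighting, and k=None uses every neighbour who rated the movie. Keeping only the k most similar neighbours stops weakly or negatively correlated users from pulling the prediction around, which is the usual motivation for top-k filtering.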

Folders

data/: contents of the small development version of the MovieLens dataset as of May 2018. See 0_explore_data.ipynb for an exploration of the full stable benchmark version of these files, which is quite similar to the small version. The actual analysis uses the small dataset throughout.
docs/: this README and other project documentation (proposal, presentation slides, and final report)
figures/: figures from the analysis included in the project report
processed/: saved files created in 0_explore_data.ipynb and practice ratings generated by the authors

Authors

Katelyn Stringer - Pearson correlation - stringkm
Alex Riley - Cosine similarity, top-k filtering - ahriley

Acknowledgements

This project was created as part of the Spring 2018 course STAT 689: Statistical Computing with R and Python taught by Dr. James Long at Texas A&M University.

We acknowledge the helpful advice contained in the following sources that helped us design and implement our algorithms:

Michael Ekstrand, "Similarity Functions for User-User Collaborative Filtering," GroupLens (blog), October 24, 2013.
Suresh Kumar Gorakala, Building Recommendation Engines (Birmingham, UK: Packt Publishing Ltd, 2016).
James Long, "Netflix Prize and Collaborative Filtering" (lecture, Statistical Computing in R and Python, Texas A&M University, College Station, TX, March 8, 2018).
"Netflix Prize," Netflix, Inc., accessed April 30, 2018.
Ethan Rosenthal, "Intro to Recommender Systems: Collaborative Filtering," Data Piques (blog), November 2, 2015.
