Spotify Million Playlists (RecSys 2018) Challenge Submission

UC Berkeley Team: Jack Vasylenko, Chitwan Kaudan, Anith Patel, Tyler Larsen and William Wang.

Databricks Spark Demo video

This project is a song recommendation system built with the Spark MLlib Alternating Least Squares (ALS) collaborative filtering algorithm and trained on the 1 million playlists open-sourced by Spotify.
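The following is a rough illustration of what ALS does. It is not the project's Spark code. The sketch factorizes a small binary playlist × track matrix using confidence-weighted alternating least squares, the variant commonly used for implicit feedback. The function name `implicit_als` and the values of `factors`, `reg` and `alpha` are illustrative assumptions. Each half-step holds one factor matrix fixed and solves a small ridge regression for every row of the other, so the weighted objective never increases.

```python
import numpy as np

def implicit_als(R, factors=2, reg=0.1, alpha=10.0, iters=10, seed=0):
    """Minimal dense, confidence-weighted ALS sketch (illustrative only).

    R: binary playlist x track matrix (1 = track appears in the playlist).
    Returns playlist factors X, track factors Y and the loss after each pass.
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    X = rng.normal(scale=0.1, size=(n_users, factors))
    Y = rng.normal(scale=0.1, size=(n_items, factors))
    P = (R > 0).astype(float)   # preference p_ui: 1 if the track is in the playlist
    C = 1.0 + alpha * R         # confidence c_ui = 1 + alpha * r_ui
    I = np.eye(factors)

    def loss():
        err = P - X @ Y.T
        return float(np.sum(C * err ** 2) + reg * (np.sum(X ** 2) + np.sum(Y ** 2)))

    losses = [loss()]
    for _ in range(iters):
        # Fix Y: solve (Y^T C_u Y + reg I) x_u = Y^T C_u p_u for each playlist u.
        for u in range(n_users):
            Cu = np.diag(C[u])
            X[u] = np.linalg.solve(Y.T @ Cu @ Y + reg * I, Y.T @ Cu @ P[u])
        # Fix X: solve the symmetric problem for each track i.
        for i in range(n_items):
            Ci = np.diag(C[:, i])
            Y[i] = np.linalg.solve(X.T @ Ci @ X + reg * I, X.T @ Ci @ P[:, i])
        losses.append(loss())
    return X, Y, losses

# Toy data: playlists 0-1 share tracks 0-1, and playlists 2-3 share tracks 2-3.
R = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 1]], dtype=float)
X, Y, losses = implicit_als(R)
scores = X @ Y.T  # higher score = stronger predicted fit of a track to a playlist
print(losses[0], losses[-1])
print(scores.round(2))
```

At recommendation time, tracks already in a playlist would normally be excluded, and the highest-scoring remaining tracks returned. Spark distributes the same per-row solves across the cluster.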

About the dataset:

The Million Playlist Dataset (MPD) contains one million user-generated playlists, created between January 2010 and October 2017. Each playlist in the MPD includes a playlist title, the track list (including track metadata), editing information (last edit time, number of playlist edits) and other miscellaneous information about the playlist.
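To make the record structure concrete, here is a minimal sketch that reads one playlist and extracts the fields listed above. It assumes the JSON layout of the MPD slice files. In that layout, a top-level `playlists` list holds entries with `name`, `pid`, `modified_at` (a Unix timestamp), `num_edits` and a `tracks` list. The sample record is invented, and the helper `summarize` is hypothetical.

```python
import json
from datetime import datetime, timezone

# A tiny, made-up slice shaped like an MPD file (assumed field names).
SAMPLE_SLICE = """
{"info": {"slice": "0-0"},
 "playlists": [
  {"name": "Throwbacks", "pid": 0, "modified_at": 1493424000, "num_edits": 6,
   "num_tracks": 2,
   "tracks": [
     {"pos": 0, "track_uri": "spotify:track:AAA", "track_name": "Song A",
      "artist_name": "Artist 1", "album_name": "Album X", "duration_ms": 200000},
     {"pos": 1, "track_uri": "spotify:track:BBB", "track_name": "Song B",
      "artist_name": "Artist 2", "album_name": "Album Y", "duration_ms": 180000}
   ]}
 ]}
"""

def summarize(playlist):
    """Return the title, track count, last edit date (UTC) and edit count."""
    edited = datetime.fromtimestamp(playlist["modified_at"], tz=timezone.utc)
    return {
        "title": playlist["name"],
        "n_tracks": len(playlist["tracks"]),
        "last_edit": edited.date().isoformat(),
        "num_edits": playlist["num_edits"],
    }

data = json.loads(SAMPLE_SLICE)
for pl in data["playlists"]:
    print(summarize(pl))
```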

Obtaining The Data

Follow these steps to download Spotify's dataset (33 GB) and convert it into a memory-efficient format (~5 GB) for use on the Databricks platform:

Download Spotify's official dataset and place the 'data' folder into the root folder of the project.
Run the following command:

python restructureData.py

This script populates the data_CSV folder with data that can be used to create a Databricks table.
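The conversion script itself is not reproduced here. The following is a hypothetical sketch of this kind of restructuring. It flattens one parsed slice into a CSV with one row per (playlist, track) pair. The selected columns and the helper names `playlist_rows` and `write_csv` are assumptions, and the real `restructureData.py` may work differently.

```python
import csv
import io

COLUMNS = ["pid", "pos", "track_uri", "artist_name", "track_name"]

def playlist_rows(slice_data):
    """Yield one row per track, keyed by the playlist id it belongs to."""
    for pl in slice_data["playlists"]:
        for t in pl["tracks"]:
            yield (pl["pid"], t["pos"], t["track_uri"], t["artist_name"], t["track_name"])

def write_csv(slice_data, out):
    """Write a header plus the flattened rows to a file-like object."""
    writer = csv.writer(out)
    writer.writerow(COLUMNS)
    writer.writerows(playlist_rows(slice_data))

# Toy input in the assumed MPD slice shape (invented data).
example = {"playlists": [
    {"pid": 7, "tracks": [
        {"pos": 0, "track_uri": "spotify:track:AAA", "artist_name": "Artist 1", "track_name": "Song A"},
        {"pos": 1, "track_uri": "spotify:track:BBB", "artist_name": "Artist 2", "track_name": "Song B"},
    ]},
]}

buf = io.StringIO()
write_csv(example, buf)
print(buf.getvalue())
```

In a full run, you would loop over the JSON slice files in the 'data' folder (for example with `glob`). Each result would go to a file opened with `newline=""`, as the `csv` module documentation recommends.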

1. Exploratory data analysis:

EDA.ipynb
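The notebook's contents are not shown here. The following is an illustrative sketch of two typical first EDA questions for this data. It counts how many playlists each track appears in and summarizes playlist lengths. The toy rows and the choice of statistics are assumptions.

```python
from collections import Counter
from statistics import mean, median

# Toy (pid, track_uri) rows, e.g. as read back from the flattened CSV (invented).
rows = [
    (0, "spotify:track:AAA"), (0, "spotify:track:BBB"),
    (1, "spotify:track:AAA"), (1, "spotify:track:CCC"), (1, "spotify:track:DDD"),
    (2, "spotify:track:AAA"),
]

# Number of distinct playlists each track appears in (track popularity).
track_counts = Counter(uri for _, uri in set(rows))
# Number of track rows in each playlist (playlist length).
lengths = list(Counter(pid for pid, _ in rows).values())

print("most common track:", track_counts.most_common(1))
print("mean length:", mean(lengths), "median length:", median(lengths))
```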

2. Using a Neural Collaborative Filtering approach on a subset of the data:

Neural-Collaborative-Filtering.ipynb
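The Neural Collaborative Filtering framework includes a Generalized Matrix Factorization (GMF) component. GMF scores a playlist-track pair as sigmoid(h · (p_u ⊙ q_i)), where ⊙ is the element-wise product of the two embeddings. The NumPy sketch below implements that score and plain SGD on binary cross-entropy, using one sampled negative track. The embedding size, learning rate, toy data and helper names are assumptions. The notebook may use a deep-learning framework and a different architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
n_playlists, n_tracks, k = 3, 5, 4
params = {
    "P": rng.normal(scale=0.5, size=(n_playlists, k)),  # playlist embeddings p_u
    "Q": rng.normal(scale=0.5, size=(n_tracks, k)),     # track embeddings q_i
    "h": rng.normal(scale=0.5, size=k),                 # output-layer weights
}

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gmf_predict(params, u, i):
    """GMF score: sigmoid(h . (p_u * q_i)), with * the element-wise product."""
    return sigmoid(params["h"] @ (params["P"][u] * params["Q"][i]))

def bce(y, y_hat):
    """Binary cross-entropy for one (label, prediction) pair."""
    return float(-(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat)))

def sgd_step(params, u, i, y, lr=0.1):
    """One SGD step on the BCE loss; all gradients use pre-update values."""
    P, Q, h = params["P"], params["Q"], params["h"]
    g = gmf_predict(params, u, i) - y   # dLoss/dlogit for sigmoid + BCE
    grad_h = g * P[u] * Q[i]
    grad_p = g * h * Q[i]
    grad_q = g * h * P[u]
    h -= lr * grad_h                    # in-place updates on the stored arrays
    P[u] -= lr * grad_p
    Q[i] -= lr * grad_q

# Playlist 0 contains track 1 (positive); track 2 is a sampled negative.
data = [(0, 1, 1.0), (0, 2, 0.0)]

def total_loss():
    return sum(bce(y, gmf_predict(params, u, i)) for u, i, y in data)

before = total_loss()
for _ in range(200):
    for u, i, y in data:
        sgd_step(params, u, i, y)
after = total_loss()
print(before, after)
```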

3. Training & using the Spark MLlib Alternating Least Squares algorithm on all of the data:

Spark-MLib-ALS.ipynb
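Spark's ALS estimator expects user and item IDs that are numeric and fit in the 32-bit integer range. String track URIs therefore have to be mapped to integers first. In Spark, this is commonly done with `StringIndexer`. The sketch below shows that preparation step in pure Python. It assumes playlists act as users, tracks act as items, and every (playlist, track) pair gets an implicit rating of 1.0. The function name `index_interactions` is hypothetical. The resulting columns would then be passed to `ALS(userCol=..., itemCol=..., ratingCol=..., implicitPrefs=True)`.

```python
def index_interactions(pairs):
    """Map (pid, track_uri) pairs to integer (user, item, rating) triples.

    Duplicate tracks within a playlist are collapsed, and each remaining
    pair gets an implicit rating of 1.0. Sorting keeps the IDs deterministic.
    """
    track_index = {}
    triples = []
    for pid, uri in sorted(set(pairs)):
        item = track_index.setdefault(uri, len(track_index))
        triples.append((int(pid), item, 1.0))
    return triples, track_index

pairs = [(0, "spotify:track:B"), (0, "spotify:track:A"),
         (1, "spotify:track:A"), (0, "spotify:track:A")]
triples, track_index = index_interactions(pairs)
print(triples)
print(track_index)
```

Keeping the `track_index` mapping allows the integer item IDs returned by the model to be translated back into track URIs.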

License

Usage of the Million Playlist Dataset is subject to the dataset's license terms.
