Enjoy exploring about 2 million songs...
This section provides a step-by-step guide on setting up and running the project locally.
- Kubernetes: Ensure Kubernetes is installed and configured.
- Clone the Repository:
git clone https://github.com/rezaakb/CSCI-5253-Final-Project.git
cd CSCI-5253-Final-Project
- Run the Deployment Script: This script sets up Redis, REST, logs, and worker services and forwards the Redis service connection to your local machine.
./deploy-local-dev.sh
- Verify the Services: Check that all services are running.
kubectl get pods
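If you would rather check pod status programmatically than read the table by eye, the sketch below filters the JSON form of the same command (`kubectl get pods -o json`). The helper name `pods_not_running` and the sample pod names are hypothetical and not part of the repository. The sketch treats any phase other than `Running` as a problem, which is a simplification.

```python
import json


def pods_not_running(pod_list):
    """Return (name, phase) for every pod whose phase is not "Running"."""
    problems = []
    for item in pod_list.get("items", []):
        phase = item.get("status", {}).get("phase", "Unknown")
        if phase != "Running":
            problems.append((item["metadata"]["name"], phase))
    return problems


# Sample shaped like the output of `kubectl get pods -o json`.
# The pod names below are made up for illustration.
sample = json.loads("""
{
  "kind": "List",
  "items": [
    {"metadata": {"name": "redis-example"}, "status": {"phase": "Running"}},
    {"metadata": {"name": "rest-example"}, "status": {"phase": "Running"}},
    {"metadata": {"name": "worker-example"}, "status": {"phase": "Pending"}}
  ]
}
""")

not_ready = pods_not_running(sample)
print(not_ready)
```

Against a live cluster, the same function could be given `json.loads(subprocess.run(["kubectl", "get", "pods", "-o", "json"], capture_output=True, text=True, check=True).stdout)`. An empty result means every pod reports `Running`.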
Our project focuses on developing a Music Recommendation System using the Spotify Million Playlist Dataset. We processed approximately two million unique songs to create feature vectors for each of them. Users can search for songs by track name and artist name, select their preferred songs from the list, and receive relevant recommendations from the system.
Data preprocessing included cleaning 32 GB of raw data and performing sentiment analysis on Dataproc. Feature vectors were obtained from the Spotify API for 1,998,516 unique songs. The features were then joined and stored in MongoDB for further processing.
To find similar songs, we calculate the cosine similarity between the mean feature vector of the songs the user selected and every feature vector in our database. Each feature vector comprises 27 features, including danceability, energy, loudness, speechiness, and more.
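The similarity step can be illustrated with a minimal, dependency-free sketch. It is not the project's actual code: the function names, the toy three-feature catalog, and the choice to exclude already-selected songs from the results are all assumptions made for illustration. At the scale of about two million songs, a vectorized library would likely be used instead of plain Python loops.

```python
import math
from statistics import fmean


def mean_vector(vectors):
    """Element-wise mean of equally sized feature vectors."""
    return [fmean(column) for column in zip(*vectors)]


def cosine_similarity(a, b):
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(selected_ids, catalog, top_k=5):
    """Rank unselected songs by similarity to the mean of the selected ones."""
    selected = set(selected_ids)
    query = mean_vector([catalog[song_id] for song_id in selected_ids])
    scored = [
        (song_id, cosine_similarity(query, vector))
        for song_id, vector in catalog.items()
        if song_id not in selected  # assumption: do not recommend the user's own picks
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy catalog with 3 features per song instead of the 27 used in the project.
catalog = {
    "song_a": [0.9, 0.8, 0.1],
    "song_b": [0.8, 0.9, 0.2],
    "song_c": [0.1, 0.2, 0.9],
    "song_d": [0.85, 0.85, 0.15],
}

results = recommend(["song_a", "song_b"], catalog, top_k=2)
print(results)
```

Here `song_d` ranks first because it points in the same direction as the mean of `song_a` and `song_b`, while `song_c` scores much lower. Cosine similarity compares direction rather than magnitude, so features on very different scales (loudness, for example) may need normalizing first. The source does not say whether the project does this.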
- Flask: Framework for developing the REST server.
- HTML, CSS, and JS: Used for the frontend UI.
- PySpark: Used for preprocessing the Spotify Million Playlist Dataset.
- APIs: Spotify API for extracting features and cover art images.
- Redis: Used for message queuing.
- MongoDB: Storage for extracted features.
- Google Storage Bucket: Used to store raw data.
- Docker and Kubernetes: Deployment of the application.
Interested in contributing? Fork the repository on GitHub, clone it to your local machine, and create your own branch to commit your changes. After pushing your changes to your fork, open a pull request to submit your contributions for review. Make sure your branch is up to date with the main branch before submitting. We appreciate your collaboration!