Music-Recommend-Model

Self-supervised GNN Music Recommendation Model with Data collected using Spotify API.

AWS serverless backend is Here
Frontend is Here
Website hosted on Music Recommendation

V3 Graph Transformer Contrastive Learning For Real-time Inference

Description: Run GNN on most recent User-Song graph to update embedding for each song on schedule.
Training Objective: The cos similarity between two songs' embedding represents the probability that they are neighbor.
Recommendation Query the saved embedding the find songs that are most similar to the songs in user's liked list as recommendation on real-time.

Define: Song-1 and Song-2 are neighbor if there exist at least one user, whose liked list contains both Song-1 and Song-2. Normal way to check if two songs are neighbor requires search through all user's liked list, which will take forever as the database grows bigger. I will use tricks from graph theory to solve this problem.

Considering the following mini example with only 3 users and 4 tracks/songs

Build the dataset by dataset.SpotifyGraph2 and the following the the visualization of the procedure

Optimization detail: By computing and saving B using sparse adjacent matrix multiplication, we are able to find the neighbor of any song in O(1) time. Otherwise, we need to search through all user's likedlist to check if two songs are neighbor, that's O(N^2).
Caching B makes sampling training data super efficient (See below).

Using Contrastive self-supervised pipeline to train the Encoder on the dataset by, code in trainG2 and SpotifyGraph2.getitem

Efficient & Elegant Real-time Inference. inference.py Consider the following case when a user named Jane requests for a recommendation list. Our database indicated that Jane has n songs in her liked list:

The embedding is normalized, which does no affect cosine distance. And largely simplified operatioin.

Performance

680ms average response time on: minimum AWS Lambda Serverless, User Like Graph contains 125300+ edges
- Comparing with V2, which use exactly the same encoder and User Like Graph takes 410000ms to run recommendation for one user. Because the it need to add the user into the graph and run the model on the HUGE graph.
- Advantage is more obvious when graph grow even bigger. The performance of V3 is not affect by graph size

Performance Explaination

The above algorithm elegantly obtains the most similar (in terms of cosine similarity in embedding space) songs to the user's existing favorite list through matrix operations .
By directly using (track, track) similarity, The pipeline bypasses the computation of user embedding, which requires ML model service (Huge Overhead)
By using the above method, I was able to put the time-consuming machine learning computation offline on schedule. And query the result in real-time

V1 MLP

User MLP to embed user and song according to the song in user's playlist.
Use cos similarity, K-d tree in embedding space between user and song for recommendation
Problem accuracy

V2 Graph Transformer

Convert data to (User-Song) bipartite graph, edge means "like". Run GNN to generate embedding
Use cos similarity, K-d tree in embedding space between user and song for recommendation. Accuracy rocket!!!
Problem When new user visit. Need to add user to the HUGE graph and re-run GNN to generate embedding for the user, super expansive.

Data Collection

All the data were collected through Spotify API

Collect Data from Spotify

2 main functions in data_collection.py

collect_more_user
- RUN DFS: user-playlist-user to get more user ids and save to txt file.
collect_train_data
- Expand to the neighbor users of users collected above and request their liked list, save to userid:[track ids] json file user_like.json
- Collect 11 features of each song in the songs liked by users and save to id_map.csv. Features are
```
['acousticness'],['danceability'],['energy'],['instrumentalness'],['key'],['liveness'],['loudness'],['mode'],['speechiness'],['tempo'],['valence']
```

Format the data

Use format_like.py to format the files. Required to build the Graph

Name		Name	Last commit message	Last commit date
Latest commit History 33 Commits
src		src
util		util
.gitignore		.gitignore
Dockerfile		Dockerfile
README.md		README.md
Report.pdf		Report.pdf

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Music-Recommend-Model

V3 Graph Transformer Contrastive Learning For Real-time Inference

Considering the following mini example with only 3 users and 4 tracks/songs

V1 MLP

V2 Graph Transformer

Data Collection

Collect Data from Spotify

Format the data

About

Releases

Packages

Languages

CChenLi/Music-Recommend-Model

Folders and files

Latest commit

History

Repository files navigation

Music-Recommend-Model

V3 Graph Transformer Contrastive Learning For Real-time Inference

Considering the following mini example with only 3 users and 4 tracks/songs

V1 MLP

V2 Graph Transformer

Data Collection

Collect Data from Spotify

Format the data

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages