Song recommendations using K-NN Algorithm.
-
Used a subset of the data released by EchoNest for the Million Song Dataset Challenge.
-
I haven't uploaded the data used to build this application. (It exceeded the storage limit, ~128 MB.)
-
Data is freely available at -> Here.
-
Dataset information -> Here
-
I developed the web portal using the Spring framework, which needed album art and artist images, so I scraped them from Spotify.
- There are ~340K songs, and scraping each one and adding the results to the appropriate dataframe was a headache. So, I scraped the data by album instead.
- I fetched all unique albums, scraped the data once per album, and appended it to the matching songs. It saved a hell of a lot of time. 😌
- Scraping code
-
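The album-based approach above can be sketched as a small cache keyed by album. This is only an illustration, not the original scraping code: the `artist` and `album` field names, the `attach_album_art` function, and the `fake_fetch` stand-in for the Spotify lookup are all assumptions, and plain dictionaries are used here in place of a dataframe.

```python
from typing import Callable, Dict, List, Tuple


def attach_album_art(
    songs: List[dict],
    fetch_album_art: Callable[[str, str], str],
) -> List[dict]:
    """Look up art once per unique (artist, album) pair and copy it to every song."""
    cache: Dict[Tuple[str, str], str] = {}
    enriched = []
    for song in songs:
        key = (song["artist"], song["album"])
        if key not in cache:
            # Only the first song of each album triggers a lookup.
            cache[key] = fetch_album_art(*key)
        enriched.append({**song, "album_art": cache[key]})
    return enriched


# Hypothetical stand-in for the real scraper; it records how often it is called.
calls: List[Tuple[str, str]] = []


def fake_fetch(artist: str, album: str) -> str:
    calls.append((artist, album))
    return f"https://example.invalid/art/{len(calls)}.jpg"


songs = [
    {"title": "Song 1", "artist": "Artist A", "album": "Album X"},
    {"title": "Song 2", "artist": "Artist A", "album": "Album X"},
    {"title": "Song 3", "artist": "Artist B", "album": "Album Y"},
]
result = attach_album_art(songs, fake_fetch)
```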
After getting all the data -> I used the K-NN algorithm to find the K nearest neighbours under cosine distance. The songs with the highest similarity are recommended to the user.
-
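As a rough illustration of K-NN with cosine distance, here is a brute-force sketch in plain Python. The three-dimensional feature vectors, the song ids, and the function names are made up for the example. The real feature layout is not described here, and a library implementation may well have been used instead.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity; 0 means same direction, 1 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat a zero vector as having no similarity to anything
    return 1.0 - dot / (norm_a * norm_b)


def k_nearest(
    query: Sequence[float],
    catalog: Dict[str, Sequence[float]],
    k: int,
) -> List[Tuple[str, float]]:
    """Score every song against the query and keep the k smallest distances."""
    scored = [(song_id, cosine_distance(query, vec)) for song_id, vec in catalog.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


# Hypothetical catalog of song id -> feature vector.
catalog = {
    "song_a": [1.0, 0.0, 0.0],
    "song_b": [0.9, 0.1, 0.0],
    "song_c": [0.0, 1.0, 0.0],
    "song_d": [0.0, 0.0, 1.0],
}
neighbours = k_nearest([1.0, 0.05, 0.0], catalog, k=2)
```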
The web portal also needed an MP3 file for each track. So, I did the following steps.
- Scraped YouTube for the link to a particular track.
  - For example: if the song is "Faded" by Ben Harper, simply taking YouTube's first result for the song name won't help.
    Results are ranked by popularity, so you would get "Faded" by Alan Walker instead. So, the search query should be
    "(song_name)+by+(artist_name)". Scraping code -> Here
- From the YouTube video link, I used the youtube-dl library to get a temporary MP3 URL and used it on the front end.
-
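The "(song_name)+by+(artist_name)" query described above can be built with the standard library. This sketch covers only the query string and the search URL; the function names are hypothetical, and fetching the results page and extracting the MP3 URL with youtube-dl are left out.

```python
from urllib.parse import quote_plus


def build_search_query(song_name: str, artist_name: str) -> str:
    """Join song and artist with 'by' and URL-encode it, turning spaces into '+'."""
    return quote_plus(f"{song_name} by {artist_name}")


def build_search_url(song_name: str, artist_name: str) -> str:
    """Return the YouTube results URL for the artist-qualified query."""
    return "https://www.youtube.com/results?search_query=" + build_search_query(
        song_name, artist_name
    )


query = build_search_query("faded", "Ben Harper")
url = build_search_url("faded", "Ben Harper")
```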
Finally, I developed a web API using Flask. The endpoints are as follows.
- (/search) searchResult:- Fetches data based on the query given by the user.
  - Query parameters: song_input, artist_input and album_input. Uses getDataFromQuery as a helper function.
    It fetches the data from the .csv file.
- (/recommend) generateRecommend:- Generates recommendations from the input data given by the user. (More in the Java repo.)
  - Query parameter: FeatureVector provided by the client. It is used as input to the KNN algorithm, which finds the K=50 nearest neighbors and recommends them.
    The helper function is -> getRecommendations.
- (/getmp3url) getMP3URL:- Generates the MP3 URL for the track selected by the user.
  - Query parameters: artist name, song name, song id. Uses this as a helper function. It returns the MP3 URL of that song.
- (/recommendByTrack) recommendByTrack:- Recommends songs similar to the one the user wants to play.
  - Query parameter: song id. It uses the same helper function that recommends songs from the feature vector.
    - To distinguish the two methods -> I used a parameter called way.
      When recommending from the feature vector extracted from the user input, way is "fromProfile". Otherwise it is "fromTrackID".
-
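The shared helper with its way parameter might look roughly like the sketch below. It is written as a plain function without the Flask routing. The signature of `get_recommendations`, the tiny in-memory `CATALOG`, and the choice to leave the seed track out of its own recommendations are all assumptions made for illustration; only the two way values and the default K=50 come from the description above.

```python
import math
from typing import List, Sequence, Union

# Hypothetical catalog of track id -> feature vector.
CATALOG = {
    "T1": [1.0, 0.0],
    "T2": [0.9, 0.1],
    "T3": [0.0, 1.0],
}


def _cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 if norms == 0 else 1.0 - dot / norms


def get_recommendations(
    data: Union[str, Sequence[float]], way: str, k: int = 50
) -> List[str]:
    """Return up to k track ids, nearest first, for a profile vector or a track id."""
    if way == "fromProfile":
        # data is the feature vector sent by the client.
        query = [float(x) for x in data]
        exclude = None
    elif way == "fromTrackID":
        # data is a track id; its stored vector becomes the query.
        query = CATALOG[data]
        exclude = data  # assumption: do not recommend the seed track itself
    else:
        raise ValueError(f"unknown way: {way!r}")

    candidates = [track_id for track_id in CATALOG if track_id != exclude]
    candidates.sort(key=lambda track_id: _cosine_distance(query, CATALOG[track_id]))
    return candidates[:k]


from_track = get_recommendations("T1", "fromTrackID", k=2)
from_profile = get_recommendations([0.0, 1.0], "fromProfile", k=1)
```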
The web portal is developed with the Spring framework (J2EE). I will be uploading it in another repo called "RUMusicPortal".
-
Output demo: