To evaluate whether Machine Learning can be used to automate playlist creation.
Moosic is a small startup that creates playlists curated manually by music experts. Their listeners love the personal touch, which they achieve by capturing the "mood" or "vibe".
Board: Believes that they need at least a degree of automation, as the music experts are not able to keep up with the demand. Currently the whole creation process is done manually.
Music Experts: Are skeptical that audio features on their own are enough to capture the "mood", which is so subjective that only a human can judge it.
Moosic wants the data science team to use a dataset that was collected from the Spotify API and contains the audio features (tempo, energy, danceability…) for a few thousand songs. After using a basic clustering algorithm such as K-Means to divide the dataset into a few clusters, the data team will answer the following two questions:
- Are Spotify’s audio features able to identify “similar songs”, as defined by humanly detectable criteria?
- Is K-Means a good method to create playlists?
- Import a list of 5000 songs collected from the Spotify API
- Use a basic clustering algorithm, e.g. K-Means, to divide the dataset into clusters
- Validate clusters, export clusters (playlists) to Spotify and listen to some of the songs
- Difficult to evaluate the results without listening to each playlist
- No tangible way to measure accuracy
- Unevenly large clusters
- Subjective - what is a good playlist?
- Must be visualized, so we can see the overlaps and the outliers
- Limit the number of features to 3 (or multiples of 3) so they can be visualized in a 3D scatterplot
- Find a balance between K-score and the business objectives
- Instead of replacing music experts, ML does the "heavy lifting" and they fine-tune the results
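
As a rough sketch of the 3D visualization idea in the list above, three features can be plotted against each other and coloured by cluster. The data here is synthetic, the choice of `energy`, `danceability` and `tempo` as axes is an assumption for illustration, and the non-interactive `Agg` backend is used only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans

# Synthetic, already-scaled features (assumption: columns stand for
# energy, danceability and tempo).
rng = np.random.default_rng(2)
features = rng.uniform(0, 1, size=(200, 3))
labels = KMeans(n_clusters=4, n_init=10, random_state=2).fit_predict(features)

fig = plt.figure(figsize=(6, 5))
ax = fig.add_subplot(projection="3d")

# One point per song, coloured by the cluster it was assigned to.
ax.scatter(features[:, 0], features[:, 1], features[:, 2],
           c=labels, cmap="viridis", s=15)
ax.set_xlabel("energy (scaled)")
ax.set_ylabel("danceability (scaled)")
ax.set_zlabel("tempo (scaled)")
ax.set_title("Clusters in 3D feature space")
```

In an interactive session, calling `plt.show()` lets the plot be rotated, which makes overlaps between clusters and isolated outliers easier to spot.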
- Evaluate the dataset: basic cleaning, e.g. missing or corrupted values, correct data types
- Exploration of audio features
- Decide which features to drop, and which features to use
- K-Means clustering
- Evaluation of clusters
- Sub-clustering
- Evaluation of final clusters
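
The sub-clustering step in the list above could look something like the sketch below: run K-Means once, pick the largest cluster, and split it again. How the project actually chose which clusters to split is not stated, so "split the largest cluster in two" is an assumption, and the data is synthetic.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic, already-scaled features standing in for the song data.
rng = np.random.default_rng(1)
features = rng.uniform(0, 1, size=(400, 3))

# First pass: coarse clusters (4 is an illustrative choice).
first_pass = KMeans(n_clusters=4, n_init=10, random_state=1).fit_predict(features)

# Find the largest cluster; it is the candidate for sub-clustering.
sizes = np.bincount(first_pass, minlength=4)
largest = int(np.argmax(sizes))
mask = first_pass == largest

# Second pass: split only the songs of the largest cluster in two.
sub_labels = KMeans(n_clusters=2, n_init=10, random_state=1).fit_predict(features[mask])

# Merge the results: sub-cluster 0 keeps the parent label,
# sub-cluster 1 receives a new label.
final = first_pass.copy()
final[np.flatnonzero(mask)[sub_labels == 1]] = first_pass.max() + 1

print(np.bincount(final))
```

Splitting an oversized cluster this way is one possible response to unevenly large clusters, at the cost of one more parameter (the number of sub-clusters) to justify.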
A 5-minute PowerPoint presentation to the Board of Directors (found here) that summarizes the findings and suggests a course of action. The Python code is found here.
- Data Cleaning & Quality Assurance
- Data Preprocessing: Scaling
- K-Means Clustering
- Elbow Method and Silhouette Score
- Data Visualization (3D Scatterplot)
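
To illustrate the elbow method and silhouette score listed above, the sketch below fits K-Means for several values of k and records both metrics. The data is synthetic with three planted groups (the `centers` values are made up for the example), and the range of k is an arbitrary assumption.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data with three well-separated groups (illustrative only).
features, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0, 0], [5, 5, 5], [-5, 5, 0]],
    cluster_std=0.5,
    random_state=0,
)

inertias = {}     # within-cluster sum of squares, used for the elbow plot
silhouettes = {}  # mean silhouette score, higher means better separation
for k in range(2, 8):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(features)
    inertias[k] = model.inertia_
    silhouettes[k] = silhouette_score(features, model.labels_)

# The k with the highest silhouette score is one candidate for the final model.
best_k = max(silhouettes, key=silhouettes.get)

for k in inertias:
    print(k, round(inertias[k], 1), round(silhouettes[k], 3))
```

Inertia always falls as k grows, so the elbow is read off as the point where the drop flattens, while the silhouette score gives a single number to compare. On real audio features neither metric is likely to be as clear-cut as on planted groups, so the chosen k still has to be weighed against the business objectives.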