By Drew White
Summary | Technologies Used | Sources | Description | Visualizations | Goals | Setup | Known Bugs | License
Musical Journeys is a project that trains a machine learning model to classify music by genre, recommends songs similar to the one you classify, and suggests a place to visit where listeners share a similar taste in music.
- Python
- Pandas
- Spotipy
- scikit-learn
- Librosa
- matplotlib
- BigQuery
- Looker Studio
GTZAN Dataset - Music Classification
- Process followed for project:
- Load classification dataset.
- Classify data using machine learning with the Python library scikit-learn.
- Visualize and analyze accuracy results by creating a confusion matrix with matplotlib.
- Gauge similarities in the music with cosine similarity.
- List recommendations based on the song chosen to analyze.
- Analyze markets and genre popularity by pulling select data from Spotify using the Spotify API via spotipy.
- Put extracted data into a Pandas dataframe and make transformations for readability.
- Load data to BigQuery using the Google Python Client.
- Visualize findings in Looker Studio.
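The cosine similarity and recommendation steps above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the project's actual code: the song names, the three-number feature vectors, and the `recommend` helper are all hypothetical, and real audio features would normally be scaled before comparison.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def recommend(chosen, features, top_n=2):
    """Return the names of the top_n songs most similar to `chosen`, best first."""
    target = features[chosen]
    scores = [
        (cosine_similarity(target, vector), name)
        for name, vector in features.items()
        if name != chosen  # never recommend the song itself
    ]
    scores.sort(reverse=True)
    return [name for _, name in scores[:top_n]]


# Hypothetical feature vectors (made up for illustration only).
features = {
    "song_a": [1.0, 0.0, 0.0],
    "song_b": [0.9, 0.1, 0.0],
    "song_c": [0.0, 1.0, 0.0],
    "song_d": [0.5, 0.5, 0.0],
}

print(recommend("song_a", features))
```

Cosine similarity compares the direction of two vectors rather than their length, so two songs with the same balance of features score close to 1 even if one is louder or longer overall.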
A confusion matrix is a table that is commonly used to evaluate the performance of a machine learning model. It displays the number of true positives, true negatives, false positives, and false negatives for a given set of predictions when compared to the actual outcomes.
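As a rough illustration of what the table counts, here is a minimal standard-library sketch for a three-genre case. The labels and predictions are made up, and the `confusion_matrix` helper here is a hypothetical stand-in. In the multi-class case each row is the actual genre and each column is the predicted genre, so correct predictions sit on the diagonal. scikit-learn's `sklearn.metrics.confusion_matrix` uses the same row and column convention.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are actual labels, columns are predicted labels."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(actual, predicted)] for predicted in labels] for actual in labels]


# Hypothetical actual and predicted genres (made up for illustration only).
labels = ["blues", "metal", "pop"]
y_true = ["blues", "blues", "metal", "metal", "pop", "pop"]
y_pred = ["blues", "metal", "metal", "metal", "pop", "blues"]

matrix = confusion_matrix(y_true, y_pred, labels)

# Accuracy is the share of predictions that land on the diagonal.
accuracy = sum(matrix[i][i] for i in range(len(labels))) / len(y_true)

for label, row in zip(labels, matrix):
    print(label, row)
print("accuracy:", round(accuracy, 3))
```

Off-diagonal cells show which genres get mistaken for each other. In this made-up data, one blues track was predicted as metal and one pop track as blues.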
I would like to keep adding to this project and have the following goals:
- In the data that I analyzed, no metadata for Artist or Title was present. I would like to find a way to analyze the music and pull that information. I attempted this with the Shazam API and AcoustID but did not have any luck.
- Continue training this model to raise the accuracy score across all genres.
- With genres in mind, I would like to bring sub-genres into the model.
- Explore the possibilities of adding user interaction to this pipeline. This could perhaps be done by retrofitting the code into a Flask application that lets the user either upload a song or choose a song from a database to explore.
- Pull more market data and explore different means of doing so. I was able to get a general market popularity rating, but I suspect that some information is missing and that including it would change the results. For example, Finland is said to be the most passionate country about metal music, yet my results pointed to the Canada and USA markets.
- Clone the repository by entering the following in a terminal:
git clone https://github.com/Drewrwhite/musical_journeys.git
- Navigate to directory:
cd <directory>
- Create a virtual environment:
python3.7 -m venv venv
- Activate virtual environment:
source venv/bin/activate
- Install requirements:
pip install -r requirements.txt
- Create data directory:
mkdir data
- Open directory in VSCode:
code .
- Download data from Kaggle and save in data directory:
https://www.kaggle.com/datasets/andradaolteanu/gtzan-dataset-music-genre-classification
- No known bugs
If you find any issues, please reach out at: d.white0002@gmail.com.
Copyright (c) 2023 Drew White