A deep learning method for automatically labeling songs by genre using Torch. The primary reason for creating this was to become more familiar with audio deep learning.
- Make sure to install Torch by following the instructions on the website.
- You also need sox. If you don't already have it, get the source with the command below, then build and install it:
git clone git://git.code.sf.net/p/sox/code sox
First, because a lot of music on SoundCloud is free, we take advantage of a pretty sweet SoundCloud song scraper called SoundScrape, which you can install with:
pip install soundscrape
Here is an example of how to download songs by genre:
- Simply type the genre into the SoundCloud search bar online, select Playlists on the sidebar and pick a set.
- Open a terminal one level outside this repo and create a Data folder there (we assume you download all sets into it):
mkdir Data
cd Data
- Then download each set with SoundScrape:
soundscrape https://soundcloud.com/full-url-to-the-selected-genre-set
- You should create a separate folder containing the tracks for each genre inside the Data folder.
- Repeat this process for each genre of your choice and then update config.lua to support the genres you've selected.
- Currently the config file is set up for Classical, Country, Hip-Hop, Rock, and (of course) Tropical-House.
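The download steps above can be scripted. The following is a minimal bash sketch that is not part of this repo: the function names download_sets and count_tracks are hypothetical, and it assumes SoundScrape saves .mp3 files into the current directory and that you supply one "Genre set-URL" pair per line.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of this repo) for the Data/<Genre>/ layout.

# Read "<Genre> <set-url>" lines on stdin and download each set into DATA_DIR/<Genre>/.
download_sets() {
  local data_dir=$1 genre url
  while read -r genre url; do
    [ -n "$genre" ] || continue                 # skip blank lines
    mkdir -p "$data_dir/$genre"
    # Run SoundScrape inside the genre folder; the subshell leaves our cwd unchanged,
    # and </dev/null stops it from consuming the remaining input lines.
    (cd "$data_dir/$genre" && soundscrape "$url" </dev/null)
  done
}

# Print "<Genre> <number of .mp3 files>" for every genre folder in DATA_DIR.
count_tracks() {
  local data_dir=$1 genre_dir n
  for genre_dir in "$data_dir"/*/; do
    [ -d "$genre_dir" ] || continue             # the glob matched nothing
    n=$(find "$genre_dir" -maxdepth 1 -type f -name '*.mp3' | wc -l)
    echo "$(basename "$genre_dir") $((n))"      # $((n)) strips wc's padding
  done
}
```

For example, run from the directory that contains Data: echo "Rock https://soundcloud.com/full-url-to-the-selected-genre-set" | download_sets Data, then count_tracks Data to check how many tracks each genre folder holds before updating config.lua.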
Next, using sox, we mix the two stereo channels down to mono and convert every song to a spectrogram, so that the ConvNet can process each song much like an ordinary image.
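A minimal bash sketch of these two sox steps is shown below. It is not the repo's own code (the conversion happens inside train.lua); the helper names are hypothetical, and the 50 pixels per second and 200-pixel height are assumed values, not settings read from config.lua. Reading MP3 files also assumes your sox build has MP3 support.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: stereo -> mono, then mono audio -> spectrogram PNG.
# Assumed values, not taken from config.lua:
PIXELS_PER_SECOND=50
SPECTROGRAM_HEIGHT=200

# Print the command instead of running it when DRY_RUN=1 (handy for checking paths).
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Mix input channels 1 and 2 into a single mono output channel.
to_mono() {
  run sox "$1" "$2" remix 1,2
}

# -n: no audio output; spectrogram effect options:
#   -Y total image height, -X pixels per second of audio,
#   -m monochrome, -r raw (no axes or legend), -o output PNG file.
to_spectrogram() {
  run sox "$1" -n spectrogram -Y "$SPECTROGRAM_HEIGHT" -X "$PIXELS_PER_SECOND" -m -r -o "$2"
}
```

For example, to_mono song.mp3 /tmp/song_mono.wav && to_spectrogram /tmp/song_mono.wav song.png. Writing the intermediate mono file as WAV avoids needing an MP3 encoder in sox.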
To make the most of the data we have, we slice up these spectrograms into several small ~2-second clips that we can train on and treat as individual instances.
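The slicing arithmetic can be sketched as follows. At an assumed 50 pixels per second, a 2-second slice is 100 pixels wide, and any leftover columns at the end of a song are dropped. The function name is hypothetical and it only prints ImageMagick-style crop geometries (WIDTHxHEIGHT+X+Y); using ImageMagick for the actual cropping is a stand-in, not what this repo does internally.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one crop geometry per non-overlapping slice.
# Args: spectrogram width (px), spectrogram height (px), pixels per second, seconds per slice.
slice_geometries() {
  local width=$1 height=$2 pps=$3 secs=$4
  local slice_w=$(( pps * secs ))      # e.g. 50 px/s * 2 s = 100 px
  local n=$(( width / slice_w ))       # integer division drops the partial last slice
  local i=0
  while [ "$i" -lt "$n" ]; do
    echo "${slice_w}x${height}+$(( i * slice_w ))+0"
    i=$(( i + 1 ))
  done
}
```

For example, slice_geometries 350 128 50 2 prints 100x128+0+0, 100x128+100+0 and 100x128+200+0. Each geometry could then be passed to ImageMagick, e.g. convert song.png -crop 100x128+100+0 +repage song_1.png, with the image width read via identify -format %w song.png.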
- To adapt the configuration to your setup, you may have to change some of the options in config.lua, train.lua, or test.lua.
- We only needed to train for 20 epochs to reach impressive accuracy (~95%), and by default the model is saved after every epoch.
- Again, all details like this can be changed by editing the files listed above.
- Note: To run on an Nvidia GPU, set the -backend flag to 'cudnn' rather than the default 'nn'.
- To train, run:
th train.lua
- Note: Conversion to mono, conversion to spectrograms, and spectrogram slicing are all done by default when running th train.lua. To skip these steps on subsequent runs, set the -createSpectrograms flag to false.
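If you move between machines with and without a GPU, choosing the -backend value can be scripted. This is a hypothetical sketch: it assumes that an nvidia-smi command on the PATH means an NVIDIA GPU is present and that the cudnn Torch bindings are installed, and it only chooses between the two backend values named above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print 'cudnn' when the NVIDIA driver tool is on the PATH,
# otherwise the default 'nn'. This is only a heuristic; it does not check
# that the cudnn Torch package is actually installed.
pick_backend() {
  if command -v nvidia-smi >/dev/null 2>&1; then
    echo cudnn
  else
    echo nn
  fi
}
```

For example: th train.lua -backend "$(pick_backend)".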
- To test the accuracy of a trained model, run:
th test.lua
- This evaluates the trained model on the held-out evaluation set and prints the accuracy for each genre as well as the overall accuracy.
- Training on SoundCloud sets containing the genres mentioned above (Tropical-House, etc.), we were able to achieve an overall classification accuracy of 95%.
- Keep in mind that the model makes these predictions from audio clips of just ~2 seconds each, which is pretty cool!
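For reference, the per-genre and overall accuracy that test.lua reports can be computed as sketched below. This is an illustrative awk script, not the repo's Lua code, and it assumes a hypothetical input of one "true-genre predicted-genre" pair per line, one line per evaluated clip.

```shell
#!/usr/bin/env bash
# Hypothetical helper: per-genre accuracy (sorted by genre), then overall accuracy.
# Input: one "<true_genre> <predicted_genre>" line per clip, from files or stdin.
genre_accuracy() {
  awk '
    NF < 2 { next }                           # ignore blank or malformed lines
    { total[$1]++; n++ }
    $1 == $2 { correct[$1]++; hits++ }
    END {
      if (n == 0) exit
      for (g in total)
        printf("%s %.1f%%\n", g, 100 * correct[g] / total[g]) | "sort"
      close("sort")                           # emit the sorted genre lines first
      printf("Overall %.1f%%\n", 100 * hits / n)
    }
  ' "$@"
}
```

For example, three clips labeled Rock/Rock, Rock/Classical and Classical/Classical give Classical 100.0%, Rock 50.0% and Overall 66.7%. Note that overall accuracy here is computed over clips, so genres with more clips weigh more.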
All of the above images can be found in Julien Despois' really cool blog post, which is also a good reference for more in-depth details. This code is largely based on his original TensorFlow implementation, which can be found here.