fabien-brulport/RNN-Sound-classification

RNN implementation with Tensorflow (LSTM) to classify variable length sound sequences


Sound classification using Recurrent Neural Networks

This repository contains an RNN implementation in TensorFlow for classifying audio clips of variable length. The input to the network is not the raw sound but MFCC features (20 features per frame).

As shown in the following figure, each audio file is first transformed into MFCC features and then divided into sub-samples of 2 seconds. The result of the preprocessing is a list of fixed-length sequences of 20 features (here, the file produces 3 sequences).
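The chunking step can be sketched as follows. This is a NumPy sketch, not the repository's actual code: the MFCC matrix is assumed to be computed beforehand (e.g. with `librosa.feature.mfcc(..., n_mfcc=20)`), and `split_into_sequences` and `frames_per_chunk` are illustrative names.

```python
import numpy as np

def split_into_sequences(mfcc, frames_per_chunk):
    """Split an MFCC matrix of shape (n_frames, 20) into fixed-length
    chunks, zero-padding the last chunk if it is shorter."""
    n_frames, n_features = mfcc.shape
    n_chunks = int(np.ceil(n_frames / frames_per_chunk))
    padded = np.zeros((n_chunks * frames_per_chunk, n_features))
    padded[:n_frames] = mfcc
    # Reshape into n_chunks sequences of (frames_per_chunk, n_features)
    return padded.reshape(n_chunks, frames_per_chunk, n_features)

# Example: a clip of 301 MFCC frames split into chunks of 86 frames
# (roughly 2 seconds at typical hop lengths) gives 4 sequences,
# the last one zero-padded.
mfcc = np.random.randn(301, 20)
sequences = split_into_sequences(mfcc, frames_per_chunk=86)
print(sequences.shape)  # (4, 86, 20)
```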

If necessary, the sequences are padded with zeros so that the input of the neural network has a fixed size. The network is nevertheless able to retrieve the effective length of each sequence and ignore the zero padding, which makes it more efficient.
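The effective length can be recovered from the padded batch itself, since padded frames are all-zero: reduce each frame to a 0/1 flag and sum the flags over time. The following is a NumPy equivalent of that computation, shown as a sketch rather than the repository's code.

```python
import numpy as np

def effective_lengths(batch):
    """batch: (batch_size, max_time, n_features), zero-padded at the end.
    A frame is 'used' if any of its features is non-zero; summing the
    per-frame flags over time gives the true sequence length."""
    used = np.sign(np.max(np.abs(batch), axis=2))  # (batch, max_time), 0 or 1
    return used.sum(axis=1).astype(int)            # (batch_size,)

batch = np.zeros((2, 5, 20))
batch[0, :3] = np.random.randn(3, 20)  # true length 3, 2 padded frames
batch[1, :5] = np.random.randn(5, 20)  # true length 5, no padding
print(effective_lengths(batch))  # [3 5]
```

In TensorFlow 1.x the same computation (with `tf.sign`, `tf.reduce_max` and `tf.reduce_sum`) can be fed as the `sequence_length` argument of `tf.nn.dynamic_rnn`, so the LSTM stops at the real end of each sequence instead of processing the padding.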

Since one file can be composed of several sequences, the predictions of all sequences corresponding to a file are averaged, so that a single label is produced per file.
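The per-file averaging amounts to a group-by on file indices. A minimal NumPy sketch (function and variable names are illustrative, not from the repository):

```python
import numpy as np

def predict_per_file(seq_probs, file_ids):
    """seq_probs: (n_sequences, n_classes) class probabilities, one row
    per 2-second sequence; file_ids: which file each sequence came from.
    Returns one predicted label per file, obtained by averaging the
    probabilities of all sequences belonging to that file."""
    labels = {}
    for fid in np.unique(file_ids):
        mean_probs = seq_probs[file_ids == fid].mean(axis=0)
        labels[int(fid)] = int(np.argmax(mean_probs))
    return labels

# Two files: file 0 produced 3 sequences, file 1 produced 2.
probs = np.array([[0.9, 0.1], [0.6, 0.4], [0.2, 0.8],
                  [0.3, 0.7], [0.1, 0.9]])
ids = np.array([0, 0, 0, 1, 1])
print(predict_per_file(probs, ids))  # {0: 0, 1: 1}
```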

I used this network to classify sounds for my first Kaggle competition, but I still need to dig into the data to improve the results.

Sources

  • this repository and this notebook helped me understand MFCC feature extraction.
  • this post explains how to take the variable length of the sequences into account.
