🔢Audio MNIST Classification on the Free Spoken Digit Dataset🔊

This project is the audio equivalent of MNIST. The dataset contains recordings of speakers saying the digits 0 to 9. The recordings are converted to mel spectrograms and classified using a modified ResNet18 model.
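As a rough illustration of what the "mel" in mel spectrogram refers to, the sketch below converts between Hz and mels and spaces band centres evenly on the mel scale. It uses the common HTK-style formula; the band count (8) and the 4000 Hz upper limit are assumptions chosen for the example, not values taken from this project.

```python
import math


def hz_to_mel(freq_hz: float) -> float:
    """Convert a frequency in Hz to mels (HTK-style formula)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_centres(n_mels: int, f_min: float, f_max: float) -> list[float]:
    """Centre frequencies (Hz) of n_mels bands spaced evenly on the mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_mels)]


# Assumed values for illustration only: 8 bands between 0 Hz and 4000 Hz.
centres = mel_band_centres(n_mels=8, f_min=0.0, f_max=4000.0)
for i, c in enumerate(centres):
    print(f"band {i}: {c:.1f} Hz")
```

The centres sit closer together at low frequencies and spread out at high frequencies, which is the property a mel spectrogram uses to mimic how pitch is perceived.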

Dataset

Free Spoken Digit Dataset (FSDD) is a simple audio/speech dataset consisting of recordings of spoken digits stored as WAV files. The dataset is approximately 20 MB and contains 3,000 recordings of the digits 0 to 9: six speakers each recorded every digit 50 times. The recordings are trimmed so that they have near-minimal silence at the beginning and end.
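The example file name used later in this README (`0_george_0.png`) suggests a `digit_speaker_index` naming pattern. Assuming that pattern holds for every file, a label can be read straight from the name. The `Recording` type and `parse_fsdd_name` helper below are hypothetical names for this sketch and are not part of the repository.

```python
from pathlib import PureWindowsPath
from typing import NamedTuple


class Recording(NamedTuple):
    digit: int
    speaker: str
    index: int


def parse_fsdd_name(path: str) -> Recording:
    """Split a name like '0_george_0.png' into (digit, speaker, index).

    PureWindowsPath is used because it accepts both '\\' and '/' separators.
    Assumes the digit_speaker_index pattern; raises ValueError otherwise.
    """
    stem = PureWindowsPath(path).stem
    digit, speaker, index = stem.split("_")
    return Recording(int(digit), speaker, int(index))


example = parse_fsdd_name(r"Data\Mel\0\0_george_0.png")
print(example)
```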

FSDD is an open dataset, which means it will grow over time as data is contributed. It is a useful dataset for speech recognition tasks and can be thought of as an audio version of the popular MNIST dataset, which consists of handwritten digits.

Training

Notebook: TrainAndTest - CNN.ipynb

Input data: Mel spectrograms (2400 train, 600 test)
Model: Modified ResNet18 model
Total epochs: 300
LR: 0.001, step size 20, gamma 0.9
Loss: 1.4690
Val Loss: 1.5069
Accuracy: 0.9571
Val Acc: 0.9696
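The LR line above reads like a step-decay schedule, where the learning rate is multiplied by gamma every `step_size` epochs (the behaviour of PyTorch's `StepLR`, for example). Assuming that interpretation and zero-indexed epochs, the learning rate at any epoch can be worked out as follows. The `step_lr` function is a hypothetical helper for this sketch.

```python
def step_lr(epoch: int, base_lr: float = 0.001, step_size: int = 20, gamma: float = 0.9) -> float:
    """Learning rate at a given zero-indexed epoch under step decay."""
    return base_lr * gamma ** (epoch // step_size)


# Print the learning rate at the start, after the first decay step,
# and at the last of 300 epochs.
for epoch in (0, 20, 299):
    print(f"epoch {epoch}: lr = {step_lr(epoch):.6f}")
```

Under these assumptions the learning rate has decayed 14 times by the final epoch, so late epochs make much smaller updates than early ones.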

Best epoch (highest validation accuracy):

Epoch: 271
Precision: 0.9691420399131312
Recall: 0.9694164524957861
F1 score: 0.9690412348965143
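The README does not say how precision, recall and F1 were averaged across the ten digits. A minimal sketch of one common choice, macro averaging (compute each score per class, then take the unweighted mean), is shown below on a tiny made-up example. The labels are invented for illustration and are not results from this project.

```python
def macro_scores(y_true, y_pred, n_classes=10):
    """Return macro-averaged (precision, recall, F1) over n_classes."""
    precisions, recalls, f1s = [], [], []
    pairs = list(zip(y_true, y_pred))
    for c in range(n_classes):
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    return (sum(precisions) / n_classes, sum(recalls) / n_classes, sum(f1s) / n_classes)


# Made-up labels for three classes, purely to exercise the function.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]
precision, recall, f1 = macro_scores(y_true, y_pred, n_classes=3)
print(f"Precision: {precision:.4f}  Recall: {recall:.4f}  F1 score: {f1:.4f}")
```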

Inference

You can use the trained model to run inference on a single mel spectrogram image:
python inference.py Data\Mel\0\0_george_0.png

Output:
Prediction: 0 (Confidence: 0.9999996423721313)
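A confidence value like the one above is typically the largest softmax probability over the ten class scores. The sketch below shows that calculation in plain Python, assuming this is how `inference.py` derives its output (the script itself is not shown here). The `logits` values are invented for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw class scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits):
    """Return (predicted class index, softmax probability of that class)."""
    probs = softmax(logits)
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, probs[label]


# Hypothetical raw scores for digits 0-9; not taken from the trained model.
logits = [15.0, 0.1, -0.3, 0.4, -1.2, 0.0, 0.7, -0.5, 0.2, -0.9]
label, confidence = predict(logits)
print(f"Prediction: {label} (Confidence: {confidence})")
```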

Future steps

Any empty signal at the beginning or end of each clip is removed. However, this does not remove noise that precedes or follows the desired signal. To fix this, the data could be passed through a noise gate that removes noise before and after the desired signal without affecting the signal itself.
The model should be modified to accept inputs of different lengths without resizing them.
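The noise-gate idea above could be approximated with a simple energy-threshold trim: split the clip into short frames, measure each frame's RMS level, and keep everything between the first and last frame that exceeds a threshold. This is a minimal sketch of that one approach, not something the project implements. The frame length, threshold, 8000 Hz sample rate and synthetic test signal are all assumptions chosen for illustration.

```python
import math


def frame_rms(samples, frame_len):
    """RMS level of each consecutive, non-overlapping frame."""
    levels = []
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        levels.append(math.sqrt(sum(s * s for s in frame) / len(frame)))
    return levels


def trim_by_energy(samples, frame_len=80, threshold=0.05):
    """Drop leading and trailing frames whose RMS is below threshold.

    Frames between the first and last active frame are kept untouched,
    so quiet passages inside the spoken digit are not altered.
    """
    levels = frame_rms(samples, frame_len)
    active = [i for i, level in enumerate(levels) if level >= threshold]
    if not active:
        return []
    start = active[0] * frame_len
    end = min((active[-1] + 1) * frame_len, len(samples))
    return samples[start:end]


# Synthetic clip: low-level noise, a 440 Hz tone standing in for speech, more noise.
sample_rate = 8000  # assumed sample rate
noise = [0.01, -0.01] * 80
speech = [0.5 * math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(800)]
clip = noise + speech + noise

gated = trim_by_energy(clip)
print(len(clip), "->", len(gated), "samples")
```

A fixed threshold is the simplest choice; real recordings with varying noise floors would likely need a threshold set relative to each clip's level.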

dilne/Free-Spoken-Digit-Dataset
