
Cough audio classification using a simple network implemented in PyTorch



COVID-19 Cough Classification using PyTorch


The aim of this project is to classify audio recordings of coughs into COVID-19 positive and negative.


Install required libraries

pip install -r requirements.txt

Running the code

The code is split into four notebooks:

  1. 001_prepare_data.ipynb: Extracts features from the WAV files and writes them to a new CSV file.
  2. 002_training.ipynb: Trains and evaluates the CoughNet using the extracted features, then saves a checkpoint.
  3. 003_inference.ipynb: Uses the saved checkpoint to predict on an input WAV file. A pretrained checkpoint is also available.
  4. 004_k_fold_cross_validation.ipynb: Runs K-fold cross validation for a more objective evaluation.


Kaggle Cough-Classifier Dataset

The major problem with this dataset is that it is highly unbalanced: only 19 of the 170 examples are labeled as positive.
Therefore, despite the good test accuracy, the model shows a relatively high false-negative rate.
More data, especially positive examples, is needed to develop a reliable cough-audio-based COVID-19 test.
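
To illustrate why accuracy alone is misleading on this dataset, here is a minimal sketch using the class counts stated above (19 positive, 170 total). The "always negative" baseline and the `binary_metrics` helper are hypothetical and are not part of the notebooks.

```python
def binary_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Accuracy and false-negative rate from confusion-matrix counts."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        # Share of truly positive examples that were predicted negative.
        "false_negative_rate": fn / (tp + fn) if (tp + fn) else 0.0,
    }


# Kaggle dataset as described in the text: 19 positive out of 170 examples.
n_pos, n_total = 19, 170
n_neg = n_total - n_pos

# Hypothetical baseline that labels every recording as negative.
baseline = binary_metrics(tp=0, fn=n_pos, fp=0, tn=n_neg)
print(f"accuracy: {baseline['accuracy']:.1%}")                        # 88.8%
print(f"false-negative rate: {baseline['false_negative_rate']:.1%}")  # 100.0%
```

A classifier that never predicts "positive" already reaches about 89 % accuracy here, so the false-negative rate is the more informative number.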

Virufy Dataset

This dataset is provided by the developers of the Virufy app. They offer a web service that can detect a COVID-19 signature in recordings using an AI algorithm. The open dataset on GitHub contains 121 examples, of which 48 are labeled as positive.

Balanced Dataset

The balanced dataset is created by combining the Kaggle and Virufy datasets. Since both contain more negative than positive examples, the negative examples are downsampled to create a perfectly balanced dataset.
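
The downsampling step could look roughly like the sketch below. It uses only the class counts given in this README. The file names, the `downsample_to_balance` helper, and the fixed seed are assumptions for illustration, and the notebooks may do this differently.

```python
import random


def downsample_to_balance(examples, seed=0):
    """Randomly drop majority-class examples until both classes are equal in size.

    `examples` is a list of (item, label) pairs, label 1 = positive, 0 = negative.
    """
    rng = random.Random(seed)
    positives = [e for e in examples if e[1] == 1]
    negatives = [e for e in examples if e[1] == 0]
    n = min(len(positives), len(negatives))
    balanced = rng.sample(positives, n) + rng.sample(negatives, n)
    rng.shuffle(balanced)
    return balanced


# Class counts from the text; the file names are placeholders.
kaggle = [(f"kaggle_{i}.wav", 1 if i < 19 else 0) for i in range(170)]
virufy = [(f"virufy_{i}.wav", 1 if i < 48 else 0) for i in range(121)]

balanced = downsample_to_balance(kaggle + virufy)
n_positive = sum(label for _, label in balanced)
print(len(balanced), n_positive)  # 134 67
```

If every example from both datasets is usable, this leaves 67 positive and 67 negative recordings.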

Other Datasets

Other datasets have not yet been investigated.

The Model

The model is a simple DNN with 6 layers. The input is a 26-value feature vector calculated from the audio files using librosa.
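
As a rough illustration of such a network, the sketch below builds a fully connected PyTorch model with six linear layers and a 26-value input. The class name `CoughNetSketch`, the hidden layer widths, the ReLU activations, and the two-logit output are all assumptions. The actual CoughNet is defined in 002_training.ipynb and may differ.

```python
import torch
from torch import nn


class CoughNetSketch(nn.Module):
    """Hypothetical 6-layer fully connected classifier for 26 input features."""

    def __init__(self, n_features: int = 26, n_classes: int = 2):
        super().__init__()
        # Layer widths are illustrative assumptions, not the project's values.
        widths = [n_features, 128, 64, 32, 16, 8]
        layers = []
        for in_dim, out_dim in zip(widths[:-1], widths[1:]):
            layers += [nn.Linear(in_dim, out_dim), nn.ReLU()]
        # Sixth linear layer: maps to raw class logits (no softmax here).
        layers.append(nn.Linear(widths[-1], n_classes))
        self.net = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


model = CoughNetSketch()
batch = torch.randn(4, 26)   # 4 random stand-ins for real feature vectors
logits = model(batch)        # shape: (4, 2)
n_linear = sum(isinstance(m, nn.Linear) for m in model.modules())
```

Returning raw logits keeps the sketch compatible with `nn.CrossEntropyLoss`, which applies the softmax internally.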


The balanced dataset is split into 8 folds and evaluated using cross validation.
The model converges after 20 epochs of training.

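| Fold    | Train Accuracy | Test Accuracy |
|---------|----------------|---------------|
| Fold 0  | 100.00 %       | 94.12 %       |
| Fold 1  | 100.00 %       | 88.24 %       |
| Fold 2  | 100.00 %       | 94.12 %       |
| Fold 3  | 100.00 %       | 76.47 %       |
| Fold 4  | 100.00 %       | 88.24 %       |
| Fold 5  | 100.00 %       | 94.12 %       |
| Fold 6  | 100.00 %       | 100.00 %      |
| Fold 7  | 96.61 %        | 87.50 %       |
| Average | 99.58 %        | 90.35 %       |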
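
The 8-fold procedure and the averages reported above can be reproduced in outline with the following sketch. `k_fold_indices` is a hypothetical helper written for this example. The sample count of 134 is an assumption derived from the dataset sizes in this README (67 positive plus 67 negative examples), and the accuracies are copied from the results.

```python
from statistics import mean


def k_fold_indices(n_samples: int, k: int):
    """Yield (train_idx, test_idx) for k contiguous folds.

    The first n_samples % k folds receive one extra sample.
    In practice the indices should be shuffled before splitting.
    """
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


# Assumed dataset size: 2 x (19 + 48) = 134 balanced examples.
fold_sizes = [len(test) for _, test in k_fold_indices(134, 8)]
print(fold_sizes)  # [17, 17, 17, 17, 17, 17, 16, 16]

# Per-fold accuracies in percent, copied from the reported results.
train_acc = [100.00] * 7 + [96.61]
test_acc = [94.12, 88.24, 94.12, 76.47, 88.24, 94.12, 100.00, 87.50]
avg_train = round(mean(train_acc), 2)
avg_test = round(mean(test_acc), 2)
print(avg_train, avg_test)  # 99.58 90.35
```

With test folds of only 16 or 17 recordings, one misclassified example shifts a fold's accuracy by about 6 percentage points, which accounts for the spread between folds.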

Confusion Matrix


The model is based on the Keras notebook by Himanshu, which can be found here:







