Speech Enhancement

Tinkering with speech enhancement models.

Borrowed code, models, and techniques from the following (a minimal Wave-U-Net sketch follows this list):

  • Improved Speech Enhancement with the Wave-U-Net (arXiv)
  • Wave-U-Net: A Multi-Scale Neural Network for End-to-End Audio Source Separation (arXiv)
  • Speech Denoising with Deep Feature Losses (arXiv, sound examples, GitHub)
  • MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis (arXiv, sound examples, GitHub)
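For orientation, here is a minimal sketch of the Wave-U-Net idea behind the first two papers: stacked 1-D convolutions with decimation on the way down, and interpolation plus skip connections on the way up. PyTorch is assumed, and the depth, channel counts, and kernel sizes below are illustrative rather than the papers' exact settings:

import torch
import torch.nn as nn
import torch.nn.functional as F

class WaveUNet(nn.Module):
    """Minimal Wave-U-Net sketch operating on raw waveforms."""

    def __init__(self, depth=4, growth=24):
        super().__init__()
        downs, ch = [], 1
        for i in range(depth):
            downs.append(nn.Conv1d(ch, growth * (i + 1), 15, padding=7))
            ch = growth * (i + 1)
        self.downs = nn.ModuleList(downs)
        self.bottleneck = nn.Conv1d(ch, ch, 15, padding=7)
        ups = []
        for i in reversed(range(depth)):
            out_ch = growth * i if i > 0 else growth
            # Input channels: upsampled features plus the matching skip.
            ups.append(nn.Conv1d(ch + growth * (i + 1), out_ch, 5, padding=2))
            ch = out_ch
        self.ups = nn.ModuleList(ups)
        self.out = nn.Conv1d(ch, 1, 1)

    def forward(self, x):  # x: (batch, 1, samples)
        skips = []
        for conv in self.downs:
            x = torch.relu(conv(x))
            skips.append(x)
            x = x[:, :, ::2]  # decimate by 2
        x = torch.relu(self.bottleneck(x))
        for conv in self.ups:
            skip = skips.pop()
            x = F.interpolate(x, size=skip.shape[-1], mode="linear",
                              align_corners=False)
            x = torch.relu(conv(torch.cat([x, skip], dim=1)))
        return self.out(x)  # enhanced waveform

# Example: enhance a one-second 16 kHz clip.
net = WaveUNet()
enhanced = net(torch.randn(1, 1, 16000))  # -> (1, 1, 16000)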

Datasets

The following datasets are used:

  • The University of Edinburgh Noisy speech database for speech enhancement problem
  • The TUT Acoustic scenes 2016 dataset, used to train the scene-classifier network that provides the deep feature loss. (dataset paper)
  • The CHiME-Home (Computational Hearing in Multisource Environments) dataset (2015), also used for the scene classifier in some experiments.
  • The "train-clean-100" dataset from LibriSpeech, mixed with the TUT acoustic scenes dataset (a minimal mixing sketch follows this list).
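The repository does not spell out its mixing procedure, so the following is only a rough sketch of mixing a clean LibriSpeech utterance with a TUT scene recording at a chosen signal-to-noise ratio. The file names are placeholders, mono input is assumed, and numpy/soundfile are assumed dependencies:

import numpy as np
import soundfile as sf

def mix_at_snr(clean, noise, snr_db):
    """Scale the noise so clean + noise hits the requested SNR, then mix."""
    # Tile or trim the noise to match the clean-speech length.
    reps = int(np.ceil(len(clean) / len(noise)))
    noise = np.tile(noise, reps)[: len(clean)]
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + scale * noise

# Placeholder file names; real paths depend on your dataset layout.
clean, sr = sf.read("librispeech_utterance.wav", dtype="float32")
noise, _ = sf.read("tut_scene.wav", dtype="float32")
noisy = mix_at_snr(clean, noise, snr_db=5.0)
sf.write("noisy_utterance.wav", noisy, sr, subtype="FLOAT")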

Data format

At the moment, the algorithm requires 32-bit floating-point audio files at a 16 kHz sampling rate to work correctly. You can use sox to convert your files. For example, to convert audiofile.wav to 32-bit floating-point audio at a 16 kHz sampling rate, run:

sox audiofile.wav -r 16000 -b 32 -e float audiofile.float.wav
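
If sox is not available, the same conversion can be done in Python. This sketch assumes the librosa and soundfile packages, which are not stated dependencies of this repository:

import librosa
import soundfile as sf

# librosa resamples to 16 kHz and returns float32 samples by default.
audio, sr = librosa.load("audiofile.wav", sr=16000)

# Write a 32-bit floating-point WAV, matching the sox command above.
sf.write("audiofile.float.wav", audio, sr, subtype="FLOAT")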
