PyTorch_Speaker_Verification

PyTorch implementation of the speech embedding network and the generalized end-to-end (GE2E) loss described in https://arxiv.org/pdf/1710.10467.pdf.

Also contains code to create embeddings that can be used as input to the UIS-RNN speaker diarization model at https://github.com/google/uis-rnn.

The model was trained on the TIMIT speech corpus, available at https://catalog.ldc.upenn.edu/LDC93S1 or https://github.com/philipperemy/timit.
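The GE2E loss scores every utterance against every speaker centroid. The following is a minimal NumPy sketch of its softmax variant, not the code in speech_embedder_net.py. It assumes a batch of N speakers by M utterances already embedded into D-dimensional vectors, and it keeps the scale w and bias b fixed at (10, -5), whereas the paper learns them starting from those values.

```python
import numpy as np


def _unit(x):
    # Scale vectors along the last axis to unit L2 norm
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def ge2e_softmax_loss(emb, w=10.0, b=-5.0):
    """GE2E softmax loss for embeddings of shape (N speakers, M utterances, D).

    w and b are fixed here for illustration; in the paper they are learned.
    Requires M >= 2 so that the exclusive centroid is defined.
    """
    n, m, _ = emb.shape
    emb = _unit(emb)
    centroids = _unit(emb.mean(axis=1))                              # (N, D)
    # Exclusive centroid: speaker j's centroid with utterance i left out
    excl = _unit((emb.sum(axis=1, keepdims=True) - emb) / (m - 1))  # (N, M, D)

    # Cosine similarity of every utterance to every speaker centroid: (N, M, N)
    cos = np.einsum("jid,kd->jik", emb, centroids)
    for j in range(n):
        # For the utterance's own speaker, compare against the exclusive centroid
        cos[j, :, j] = np.sum(emb[j] * excl[j], axis=-1)

    sim = w * cos + b
    # Per-utterance loss: -S_ji,j + log sum_k exp(S_ji,k), via a stable log-sum-exp
    top = sim.max(axis=-1, keepdims=True)
    lse = top[..., 0] + np.log(np.exp(sim - top).sum(axis=-1))
    own = np.stack([sim[j, :, j] for j in range(n)])                # (N, M)
    return float((lse - own).sum())
```

Minimizing this loss pulls each utterance toward its own speaker's centroid and pushes it away from the others. When all speakers are indistinguishable, each utterance contributes log N.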

Dependencies

PyTorch 0.4.1
Python 3.5+
numpy 1.15.4
librosa 0.6.1

The Python WebRTC VAD (https://github.com/wiseman/py-webrtcvad) is required to run dvector_create.py, but not to train the neural network.
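py-webrtcvad classifies fixed-length frames of 16-bit mono PCM audio. Each frame must be 10, 20 or 30 ms long, at 8000, 16000, 32000 or 48000 Hz. The sketch below shows one way to cut raw PCM bytes into 30 ms frames and keep the voiced ones. The function names are hypothetical and are not taken from VAD_segments.py. The `vad` argument is assumed to be a `webrtcvad.Vad` instance, or anything with an `is_speech(frame_bytes, sample_rate)` method.

```python
def pcm_frames(pcm: bytes, sample_rate: int = 16000, frame_ms: int = 30):
    """Yield consecutive frames of 16-bit mono PCM, dropping a trailing partial frame."""
    # 2 bytes per 16-bit sample
    frame_bytes = int(sample_rate * frame_ms / 1000) * 2
    for start in range(0, len(pcm) - frame_bytes + 1, frame_bytes):
        yield pcm[start:start + frame_bytes]


def voiced_frames(pcm: bytes, vad, sample_rate: int = 16000, frame_ms: int = 30):
    """Return only the frames that the VAD classifies as speech."""
    return [f for f in pcm_frames(pcm, sample_rate, frame_ms)
            if vad.is_speech(f, sample_rate)]
```

With py-webrtcvad installed, `webrtcvad.Vad(2)` creates a detector with aggressiveness 2. The mode ranges from 0 (least aggressive) to 3 (most aggressive about filtering out non-speech).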

Preprocessing

Set the following config.yaml key to a glob pattern that matches all .WAV files in your downloaded TIMIT dataset. The original TIMIT .WAV files use NIST SPHERE headers. They must be converted to the standard RIFF WAV format for the dvector_create.py script, but not for training the neural network.

unprocessed_data: './TIMIT/*/*/*/*.wav'
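Each `*` in the pattern stands for one directory level of the standard TIMIT layout, for example TIMIT/TRAIN/DR1/FCJF0/SA1.WAV. The hypothetical helper below lists the files a pattern matches and checks each file's header. It reports which files still carry a NIST SPHERE header (these need converting before dvector_create.py can use them) and which are already RIFF WAV. It only inspects headers and converts nothing.

```python
import glob


def classify_timit_audio(pattern):
    """Map each file matched by a glob pattern to its header format."""
    formats = {}
    for path in sorted(glob.glob(pattern)):
        with open(path, "rb") as f:
            head = f.read(12)
        if head.startswith(b"NIST_1A"):
            formats[path] = "sphere"   # original TIMIT header; needs conversion
        elif head[:4] == b"RIFF" and head[8:12] == b"WAVE":
            formats[path] = "riff"     # standard WAV container
        else:
            formats[path] = "unknown"
    return formats
```

For example, `classify_timit_audio('./TIMIT/*/*/*/*.wav')` returns one entry per matched file, so an empty result means the pattern does not match your directory layout or file-name casing.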

Run the preprocessing script:

./data_preprocess.py

Two folders, train_tisv and test_tisv, will be created. They contain .npy files holding NumPy arrays of each speaker's utterances, split 90%/10% between training and testing.
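The following is a minimal sketch of a 90%/10% split. It assumes that whole speakers, rather than individual utterances, are assigned to one side, which the per-speaker .npy files suggest. The helper name and the placeholder speaker names are hypothetical and are not taken from data_preprocess.py.

```python
def split_speakers(speaker_dirs):
    """Assign whole speakers to train/test (90%/10%), preserving input order."""
    # Integer arithmetic avoids floating-point rounding in the 90% cut-off
    n_train = len(speaker_dirs) * 9 // 10
    return speaker_dirs[:n_train], speaker_dirs[n_train:]


# TIMIT has 630 speakers; placeholder names stand in for the speaker directories
speakers = ["spk{:03d}".format(i) for i in range(630)]
train, test = split_speakers(speakers)
print(len(train), len(test))  # 567 63
```

Splitting by speaker means that no test speaker is ever seen during training, which is what a speaker verification evaluation needs.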

Training

To train the speaker verification model, run:

./train_speech_embedder.py

with the following config.yaml key set to true:

training: !!bool "true"

For testing, set the key to:

training: !!bool "false"
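`!!bool` is YAML's explicit boolean tag, so the quoted string "true" or "false" is loaded as a boolean rather than as text. To switch between training and testing without editing the file by hand, a small hypothetical helper like the one below could rewrite the key. It assumes the key appears exactly once in config.yaml, in the `training: !!bool "..."` form shown above.

```python
import re


def set_training_mode(config_text, training):
    """Return config.yaml text with the `training` key set to true or false."""
    value = "true" if training else "false"
    new_text, count = re.subn(
        r'^([ \t]*training:[ \t]*!!bool[ \t]*)"(?:true|false)"',
        lambda m: '{}"{}"'.format(m.group(1), value),
        config_text,
        flags=re.MULTILINE,
    )
    if count != 1:
        # Refuse to guess if the key is missing or duplicated
        raise ValueError("expected one training key, found {}".format(count))
    return new_text
```

Read config.yaml, pass its text through `set_training_mode(text, False)` and write it back before running ./train_speech_embedder.py to evaluate instead of train.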

The log file and checkpoint save locations are controlled by the following values:

log_file: './speech_id_checkpoint/Stats'
checkpoint_dir: './speech_id_checkpoint'

Only text-independent speaker verification (TI-SV) is implemented.
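For text-independent verification, the GE2E paper builds an utterance-level d-vector from a sliding window over the utterance. Each window is embedded, the window embeddings are L2-normalized, and their element-wise average is taken. Below is a NumPy sketch of that averaging. It assumes `embed_fn` is a hypothetical stand-in for the trained network, and it uses 80-frame windows with 50% overlap as illustrative defaults. This repository's test code may build its embeddings differently.

```python
import numpy as np


def utterance_dvector(frames, embed_fn, win=80):
    """Average L2-normalised window embeddings over an utterance.

    frames: (T, n_features) array, e.g. log-mel filterbank frames.
    embed_fn: hypothetical stand-in for the trained network, mapping a
              (win, n_features) window to a 1-D embedding.
    """
    hop = win // 2  # 50% overlap between consecutive windows
    last_start = max(len(frames) - win, 0)
    vectors = []
    for start in range(0, last_start + 1, hop):
        v = embed_fn(frames[start:start + win])
        vectors.append(v / np.linalg.norm(v))  # L2-normalise each window's embedding
    return np.mean(vectors, axis=0)
```

With this stepping, trailing frames that do not fill a complete hop past the last window are ignored. Utterances shorter than one window are embedded as a single, shorter window.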

Performance

Equal error rate (EER), averaged across 10 test epochs: 0.0377
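The equal error rate is the operating point where the false acceptance rate (impostor trials accepted) equals the false rejection rate (genuine trials rejected). Lower is better, and 0.0377 corresponds to roughly 3.8%. Below is a minimal threshold-sweep sketch in plain Python. It assumes similarity scores labeled 1 for same-speaker trials and 0 for different-speaker trials. The repository's own test loop may sweep thresholds differently.

```python
def equal_error_rate(scores, labels):
    """Approximate EER by sweeping the threshold over the observed scores.

    scores: similarity scores; labels: 1 = same speaker, 0 = different speaker.
    Both classes must be present.
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    best_gap, eer = float("inf"), None
    for thr in sorted(set(scores)):
        far = sum(s >= thr for s in neg) / len(neg)  # impostors accepted
        frr = sum(s < thr for s in pos) / len(pos)   # genuine speakers rejected
        if abs(far - frr) < best_gap:
            # Keep the threshold where the two error rates are closest
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer
```

Averaging FAR and FRR at the closest threshold is a common approximation when the two curves never cross exactly.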

D-vector embedding creation

After training and testing the model, run dvector_create.py to create the numpy files train_sequence.npy, train_cluster_ids.npy, test_sequence.npy, and test_cluster_ids.npy.

These files can be loaded and used to train the uis-rnn model found at https://github.com/google/uis-rnn
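Below is a minimal sketch of how per-segment d-vectors and speaker labels could be packed into a sequence array and a matching cluster-ID array, saved with `np.save` and loaded back. The input layout, a list of (speaker_id, d_vector) pairs in temporal order, and the placeholder values are assumptions for illustration. They are not necessarily the exact format written by dvector_create.py. The uis-rnn README trains with `model.fit(train_sequences, train_cluster_ids, training_args)`, with cluster IDs given as strings.

```python
import os
import tempfile

import numpy as np


def pack_sequence(segments):
    """Stack (speaker_id, d_vector) pairs into a 2-D sequence and a string label array."""
    speaker_ids, vectors = zip(*segments)
    sequence = np.stack(vectors).astype(np.float64)         # (num_segments, D)
    cluster_ids = np.array([str(s) for s in speaker_ids])  # one string label per row
    return sequence, cluster_ids


# Placeholder segments; real d-vectors would come from the trained embedder
segments = [("speaker_0", np.ones(4)), ("speaker_0", np.ones(4)), ("speaker_1", np.zeros(4))]
sequence, cluster_ids = pack_sequence(segments)

out_dir = tempfile.mkdtemp()
np.save(os.path.join(out_dir, "train_sequence.npy"), sequence)
np.save(os.path.join(out_dir, "train_cluster_ids.npy"), cluster_ids)

# Load them back, e.g. before handing them to uis-rnn
loaded_seq = np.load(os.path.join(out_dir, "train_sequence.npy"))
loaded_ids = np.load(os.path.join(out_dir, "train_cluster_ids.npy"))
```

Storing labels as a fixed-width Unicode array, rather than as Python objects, lets `np.load` read them back without `allow_pickle=True`.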

HarryVolek/PyTorch_Speaker_Verification

Folders and files

Latest commit

History

Repository files navigation

PyTorch_Speaker_Verification

Dependencies

Preprocessing

Training

Performance

D vector embedding creation

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 4

Languages

Packages