[vector] ecapa-tdnn on voxceleb #1523

Merged: 45 commits merged on Mar 24, 2022
Changes from 32 commits

Commits (45):
- 16108de add voxceleb1 dataset prepare process (LeoMax-Xiong, Feb 24, 2022)
- 35b7968 remove invalid directory (LeoMax-Xiong, Feb 25, 2022)
- 6f7e965 add kaldi feats ark dataset (LeoMax-Xiong, Feb 25, 2022)
- d7da629 add kaldi feats egs dataset (LeoMax-Xiong, Feb 26, 2022)
- 70d3b01 remove invalid code (LeoMax-Xiong, Feb 26, 2022)
- 1395b5f Merge branch 'PaddlePaddle:develop' into develop (LeoMax-Xiong, Mar 2, 2022)
- 7ef60eb add voxceleb1 data prepare (LeoMax-Xiong, Mar 2, 2022)
- 0780d18 remove personal code test=doc (LeoMax-Xiong, Mar 2, 2022)
- 3a943ca repair the variable name bug (LeoMax-Xiong, Mar 2, 2022)
- dc28ebe move the csv vox format to paddleaudio, test=doc (LeoMax-Xiong, Mar 3, 2022)
- 57c4f4a add sid learning rate and training model (LeoMax-Xiong, Mar 3, 2022)
- 6af2bc3 add sid loss wraper for voxceleb, test=doc (LeoMax-Xiong, Mar 3, 2022)
- 7668f61 add sid dataloader for training, test=doc (LeoMax-Xiong, Mar 3, 2022)
- 4648059 add training process for sid, test=doc (LeoMax-Xiong, Mar 3, 2022)
- 1f74af1 add training log info and comment, test=doc (LeoMax-Xiong, Mar 3, 2022)
- 97ec012 add speaker verification using cosine score, test=doc (LeoMax-Xiong, Mar 4, 2022)
- 016ed6d repair the code according to the part comment, test=doc (LeoMax-Xiong, Mar 4, 2022)
- ac4967e optimize the data prepare process (LeoMax-Xiong, Mar 6, 2022)
- 2d89c80 add waveform augment pipeline, test=doc (LeoMax-Xiong, Mar 7, 2022)
- 7db7eb8 add extract audio embedding api, test=doc (LeoMax-Xiong, Mar 7, 2022)
- 386ef3f add voxceleb augment unit test, test=doc (LeoMax-Xiong, Mar 8, 2022)
- 14efbf5 check extract embedding result, test=doc (LeoMax-Xiong, Mar 8, 2022)
- 60d73bb add state 0 to prepare the voxcele data and augment data (LeoMax-Xiong, Mar 9, 2022)
- 0dee8f4 Merge branch 'PaddlePaddle:develop' into develop (LeoMax-Xiong, Mar 9, 2022)
- 4473405 merge develop to vox12, test=doc (LeoMax-Xiong, Mar 9, 2022)
- 0e87037 refactor to compilance paddleaudio (LeoMax-Xiong, Mar 9, 2022)
- 993d678 remove unused code, test=doc (LeoMax-Xiong, Mar 9, 2022)
- 584a2c0 add ecapa-tdnn config yaml file (LeoMax-Xiong, Mar 9, 2022)
- 8ed5c28 add vox2 data into VoxCeleb class (LeoMax-Xiong, Mar 10, 2022)
- 311fa87 add some comments to the code (LeoMax-Xiong, Mar 13, 2022)
- 7eb8fa7 convert save_freq to save_interval, test=doc (LeoMax-Xiong, Mar 13, 2022)
- 506d26a change the code style to s2t code style, test=doc (LeoMax-Xiong, Mar 14, 2022)
- d28ccfa add vector cli component, test=doc (LeoMax-Xiong, Mar 20, 2022)
- 9c6735f add vector voxceleb12 base mode url, test=doc (LeoMax-Xiong, Mar 21, 2022)
- b9eafdd change - to _ to distinguish field (LeoMax-Xiong, Mar 21, 2022)
- 9874fb7 add some comments in code (LeoMax-Xiong, Mar 22, 2022)
- d85d1de exec pre-commit in paddlespeech vector, test=doc (LeoMax-Xiong, Mar 22, 2022)
- 5221c27 add voxceleb dataset and trial info, test=doc (LeoMax-Xiong, Mar 23, 2022)
- e2684e7 refactor the data prepare process (LeoMax-Xiong, Mar 23, 2022)
- 62cbce6 add vectorwrapper to extract audio embedding (LeoMax-Xiong, Mar 24, 2022)
- 0bb67d8 add vector cli unit test, test=doc (LeoMax-Xiong, Mar 24, 2022)
- 305bacd Merge branch 'develop' into vox12 (LeoMax-Xiong, Mar 24, 2022)
- 0f78d25 add vector cli batch and pipeline test demo, test=doc (LeoMax-Xiong, Mar 24, 2022)
- 3054659 remove debug info, test=doc (LeoMax-Xiong, Mar 24, 2022)
- faf6b8d add the vec cli test audio name, test=doc (LeoMax-Xiong, Mar 24, 2022)
2 changes: 1 addition & 1 deletion dataset/voxceleb/voxceleb1.py
@@ -189,4 +189,4 @@ def main():


if __name__ == '__main__':
-    main()
+    main()
20 changes: 20 additions & 0 deletions examples/voxceleb/README.md
@@ -6,3 +6,23 @@ sv0 - speaker verification with softmax backend etc, all python code

sv1 - depends on kaldi, speaker verification with plda/sc backend,
for more info, refer to sv1/readme.txt


## VoxCeleb2 preparation

VoxCeleb2 audio files are released in m4a format. All the VoxCeleb2 m4a audio files must be converted to wav files before feeding them into PaddleSpeech.
Please follow these steps to prepare the dataset correctly:

1. Download VoxCeleb2.
You can find the download instructions here: http://www.robots.ox.ac.uk/~vgg/data/voxceleb/

2. Convert .m4a to wav
VoxCeleb2 stores files with the m4a audio format. To use them in PaddleSpeech, you have to convert all the m4a audio files into wav files.

``` shell
# input.m4a and output.wav are placeholders for the source and target file paths
ffmpeg -y -i input.m4a -ac 1 -vn -acodec pcm_s16le -ar 16000 output.wav
```

You can do the conversion using ffmpeg (see https://gist.github.com/seungwonpark/4f273739beef2691cd53b5c39629d830). This operation might take several hours, but it only needs to be done once.

3. Put all the wav files in a folder called `wav`. You should have something like `voxceleb2/wav/id*/*/*.wav` (e.g., `voxceleb2/wav/id00012/21Uxsk56VDQ/00001.wav`).
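Because the conversion runs over every file, it can help to script it. The sketch below is a minimal illustration, assuming ffmpeg is on `PATH`; it reuses the ffmpeg flags from step 2 and mirrors the speaker/video/utterance layout shown in the example path `id00012/21Uxsk56VDQ/00001.wav`. The helper names and the source directory are hypothetical and not part of PaddleSpeech.

```python
import subprocess
from pathlib import Path


def ffmpeg_cmd(src, dst, sample_rate=16000):
    """Build the step-2 command: mono, no video, 16-bit PCM, resampled to sample_rate."""
    return ["ffmpeg", "-y", "-i", str(src), "-ac", "1", "-vn",
            "-acodec", "pcm_s16le", "-ar", str(sample_rate), str(dst)]


def wav_path_for(m4a_path, src_root, dst_root):
    """Map <src_root>/<spk>/<video>/<utt>.m4a to <dst_root>/<spk>/<video>/<utt>.wav."""
    rel = Path(m4a_path).relative_to(src_root)
    return Path(dst_root) / rel.with_suffix(".wav")


def convert_all(src_root, dst_root):
    """Convert every .m4a under src_root, mirroring the directory layout under dst_root."""
    for m4a in sorted(Path(src_root).rglob("*.m4a")):
        wav = wav_path_for(m4a, src_root, dst_root)
        wav.parent.mkdir(parents=True, exist_ok=True)
        # check=True raises CalledProcessError if ffmpeg fails on a file
        subprocess.run(ffmpeg_cmd(m4a, wav), check=True, capture_output=True)
```

For example, `convert_all("voxceleb2/aac", "voxceleb2/wav")`, where `voxceleb2/aac` is a hypothetical location of the extracted m4a files. The loop is sequential for clarity; given the multi-hour runtime, distributing files over a `multiprocessing.Pool` is a natural extension.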
52 changes: 52 additions & 0 deletions examples/voxceleb/sv0/conf/ecapa_tdnn.yaml
@@ -0,0 +1,52 @@
###########################################
# Data #
###########################################
# we should explicitly specify the wav path of vox2 audio data converted from m4a
vox2_base_path:
augment: True
batch_size: 16
num_workers: 2
num_speakers: 7205 # 1211 vox1, 5994 vox2, 7205 vox1+2, test speakers: 41
shuffle: True
random_chunk: True

###########################################################
# FEATURE EXTRACTION SETTING #
###########################################################
# currently, we only support fbank
sample_rate: 16000
n_mels: 80
window_size: 400 #25ms, sample rate 16000, 25 * 16000 / 1000 = 400
hop_length: 160 #10ms, sample rate 16000, 10 * 16000 / 1000 = 160

###########################################################
# MODEL SETTING #
###########################################################
# currently, only ecapa-tdnn is supported in ecapa_tdnn.yaml
# to use another model, choose another configuration yaml file
model:
  input_size: 80
  # channels: [512, 512, 512, 512, 1536]
  channels: [1024, 1024, 1024, 1024, 3072]
  kernel_sizes: [5, 3, 3, 3, 1]
  dilations: [1, 2, 3, 4, 1]
  attention_channels: 128
  lin_neurons: 192

###########################################
# Training #
###########################################
seed: 1986 # taken from the speechbrain configuration
epochs: 10
save_interval: 1
log_interval: 1
learning_rate: 1e-8


###########################################
# Testing #
###########################################
global_embedding_norm: True
embedding_mean_norm: True
embedding_std_norm: False
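The `window_size` and `hop_length` comments in the feature extraction section convert milliseconds to samples at 16 kHz. A minimal sketch of that arithmetic and the resulting frame count for one second of audio is below. It assumes non-centered framing (no padding); if the fbank frontend pads or centers frames, the count differs (with centering, 1 + 16000 // 160 = 101). Each frame yields an `n_mels`-dimensional vector, which is why `model.input_size` is set to 80.

```python
def ms_to_samples(ms, sample_rate):
    """Convert a duration in milliseconds to a whole number of samples."""
    return ms * sample_rate // 1000


def num_frames(num_samples, window_size, hop_length):
    """Frames from non-centered framing: one frame per full window, stepping by hop_length."""
    if num_samples < window_size:
        return 0
    return 1 + (num_samples - window_size) // hop_length


sample_rate = 16000
window_size = ms_to_samples(25, sample_rate)  # 400 samples, as in the config
hop_length = ms_to_samples(10, sample_rate)   # 160 samples, as in the config
frames_per_second = num_frames(sample_rate, window_size, hop_length)
```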

18 changes: 18 additions & 0 deletions examples/voxceleb/sv0/local/data.sh
@@ -0,0 +1,18 @@
#!/bin/bash

stage=-1
stop_stage=100

. ${MAIN_ROOT}/utils/parse_options.sh || exit 1;

dir=$1
conf_path=$2
mkdir -p ${dir}

if [ ${stage} -le -1 ] && [ ${stop_stage} -ge -1 ]; then
    # data preparation for vox1 and vox2; vox2 must be converted from m4a to wav first
    # use local/convert.sh to convert m4a to wav
    python3 local/data_prepare.py \
        --data-dir ${dir} \
        --config ${conf_path}
fi
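`data.sh` gates its stage with `[ ${stage} -le N ] && [ ${stop_stage} -ge N ]`, and `run.sh` uses the same pattern for stages 0 and 1: a stage runs only when N lies in the inclusive range [stage, stop_stage]. The sketch below mirrors that rule in Python; the helper names are illustrative only.

```python
def should_run(stage, stop_stage, n):
    """Mirror `[ ${stage} -le n ] && [ ${stop_stage} -ge n ]` from the recipe scripts."""
    return stage <= n <= stop_stage


def stages_to_run(stage, stop_stage, defined_stages):
    """Return the defined stages that a given stage/stop_stage pair would execute."""
    return [n for n in defined_stages if should_run(stage, stop_stage, n)]
```

With the defaults in `data.sh` (stage=-1, stop_stage=100), stage -1 runs. Assuming the Kaldi-style `utils/parse_options.sh`, `bash run.sh --stage 1 --stop_stage 1` would restrict the recipe to stage 1 alone.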
71 changes: 71 additions & 0 deletions examples/voxceleb/sv0/local/data_prepare.py
@@ -0,0 +1,71 @@
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
import os

import paddle
from yacs.config import CfgNode

from paddleaudio.datasets.voxceleb import VoxCeleb
from paddlespeech.s2t.utils.log import Log
from paddlespeech.vector.io.augment import build_augment_pipeline
from paddlespeech.vector.training.seeding import seed_everything

logger = Log(__name__).getlog()


def main(args, config):

    # stage 0: set the cpu device; all data preparation is done in cpu mode
    paddle.set_device("cpu")
    # set the random seed; this is required for multiprocess training
    seed_everything(config.seed)

    # stage 1: generate the voxceleb csv file
    # Note: this may raise a c++ exception, but the program still runs correctly,
    # so we ignore the exception
    # we explicitly pass the vox2 base path to the data preparation to generate the audio info
    logger.info("start to generate the voxceleb dataset info")
    train_dataset = VoxCeleb(
        'train', target_dir=args.data_dir, vox2_base_path=config.vox2_base_path)

    # stage 2: generate the augment noise csv file
    if config.augment:
        logger.info("start to generate the augment dataset info")
        augment_pipeline = build_augment_pipeline(target_dir=args.data_dir)


if __name__ == "__main__":
    # yapf: disable
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument("--data-dir",
                        default="./data/",
                        type=str,
                        help="data directory")
    parser.add_argument("--config",
                        default=None,
                        type=str,
                        help="configuration file")
    args = parser.parse_args()
    # yapf: enable

    # https://yaml.org/type/float.html
    config = CfgNode(new_allowed=True)
    if args.config:
        config.merge_from_file(args.config)

    config.freeze()
    print(config)

    main(args, config)
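The csv format written by paddleaudio's `VoxCeleb` class is not shown in this diff. As a hypothetical illustration of what a speaker-labeled file list involves, the sketch below scans a wav tree laid out like the README example (`id00012/21Uxsk56VDQ/00001.wav`), takes the speaker id from the top-level directory, and assigns contiguous class labels; the number of labels is what `num_speakers` in the config counts. Column names, the utterance-id scheme, and all helper names are assumptions.

```python
import csv
from pathlib import Path


def build_utterance_rows(wav_root):
    """Collect (utt_id, spk_id, wav_path) rows from a <wav_root>/<spk>/<video>/<utt>.wav tree."""
    rows = []
    for wav in sorted(Path(wav_root).glob("id*/*/*.wav")):
        spk_id = wav.parts[-3]                             # e.g. "id00012"
        utt_id = "-".join(wav.with_suffix("").parts[-3:])  # e.g. "id00012-21Uxsk56VDQ-00001"
        rows.append((utt_id, spk_id, str(wav)))
    return rows


def speaker_to_label(rows):
    """Assign each speaker a contiguous integer class label, in sorted order."""
    return {spk: i for i, spk in enumerate(sorted({spk for _, spk, _ in rows}))}


def write_csv(rows, csv_path):
    """Write the rows to csv_path with a header line (hypothetical column names)."""
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["utt_id", "spk_id", "wav"])
        writer.writerows(rows)
```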
13 changes: 13 additions & 0 deletions examples/voxceleb/sv0/local/emb.sh
@@ -0,0 +1,13 @@
#!/bin/bash
. ./path.sh

exp_dir=exp/ecapa-tdnn-vox12-big/epoch_10/ # experiment directory
conf_path=conf/ecapa_tdnn.yaml
audio_path="demo/voxceleb/00001.wav"

source ${MAIN_ROOT}/utils/parse_options.sh || exit 1;

# extract the audio embedding
python3 ${BIN_DIR}/extract_emb.py --device "gpu" \
--config ${conf_path} \
--audio-path ${audio_path} --load-checkpoint ${exp_dir}
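`run.sh` scores speaker verification trials with the cosine function. Given two embeddings such as the one extracted by this script (192-dimensional, per `lin_neurons` in the config), a minimal pure-Python sketch of cosine scoring is below. The threshold-based decision helper is hypothetical; in practice the threshold would be tuned on held-out trials.

```python
import math


def cosine_score(emb_a, emb_b):
    """Cosine similarity of two equal-length embeddings, in [-1, 1]."""
    if len(emb_a) != len(emb_b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(emb_a, emb_b))
    norm_a = math.sqrt(sum(a * a for a in emb_a))
    norm_b = math.sqrt(sum(b * b for b in emb_b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot score an all-zero embedding")
    return dot / (norm_a * norm_b)


def same_speaker(emb_a, emb_b, threshold):
    """Hypothetical decision rule: accept the trial if the score reaches the threshold."""
    return cosine_score(emb_a, emb_b) >= threshold
```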
8 changes: 8 additions & 0 deletions examples/voxceleb/sv0/local/test.sh
@@ -0,0 +1,8 @@
dir=$1
exp_dir=$2
conf_path=$3

python3 ${BIN_DIR}/test.py \
--config ${conf_path} \
--data-dir ${dir} \
--load-checkpoint ${exp_dir}
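The testing section of `ecapa_tdnn.yaml` exposes `global_embedding_norm`, `embedding_mean_norm`, and `embedding_std_norm`. How `test.py` applies them is not shown in this diff; the sketch below assumes a common interpretation: per-dimension mean and standard deviation are estimated once over a reference set of embeddings, then the mean is subtracted and, optionally, each dimension is divided by its standard deviation before scoring. The defaults mirror the config values (mean normalization on, std normalization off).

```python
import math


def fit_global_stats(embeddings):
    """Per-dimension mean and (population) standard deviation over equal-length embeddings."""
    n = len(embeddings)
    dim = len(embeddings[0])
    mean = [sum(e[i] for e in embeddings) / n for i in range(dim)]
    std = [math.sqrt(sum((e[i] - mean[i]) ** 2 for e in embeddings) / n)
           for i in range(dim)]
    return mean, std


def normalize(emb, mean, std, mean_norm=True, std_norm=False, eps=1e-10):
    """Subtract the global mean and optionally divide by the global std, per dimension."""
    out = list(emb)
    if mean_norm:
        out = [x - m for x, m in zip(out, mean)]
    if std_norm:
        out = [x / (s + eps) for x, s in zip(out, std)]
    return out
```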
22 changes: 22 additions & 0 deletions examples/voxceleb/sv0/local/train.sh
@@ -0,0 +1,22 @@
#!/bin/bash

dir=$1
exp_dir=$2
conf_path=$3

ngpu=$(echo $CUDA_VISIBLE_DEVICES | awk -F "," '{print NF}')
echo "using $ngpu gpus..."

# train the speaker identification task with voxceleb data
# Note: we will store the log file in exp/log directory
python3 -m paddle.distributed.launch --gpus=$CUDA_VISIBLE_DEVICES \
${BIN_DIR}/train.py --device "gpu" --checkpoint-dir ${exp_dir} --augment \
--data-dir ${dir} --config ${conf_path}


if [ $? -ne 0 ]; then
    echo "Failed in training!"
    exit 1
fi

exit 0
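`train.sh` derives `ngpu` by counting the comma-separated fields of `CUDA_VISIBLE_DEVICES` with awk. A Python equivalent is sketched below (the function name is illustrative). An unset or empty variable yields 0, matching awk's `NF` for an empty line.

```python
def count_visible_gpus(cuda_visible_devices):
    """Count comma-separated fields, like `echo $CUDA_VISIBLE_DEVICES | awk -F "," '{print NF}'`."""
    if not cuda_visible_devices:
        return 0  # awk prints NF=0 for an empty line; also covers an unset variable (None)
    return len(cuda_visible_devices.split(","))
```

For example, `count_visible_gpus(os.environ.get("CUDA_VISIBLE_DEVICES"))` returns 4 when `run.sh` exports its default `gpus=0,1,2,3`.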
28 changes: 28 additions & 0 deletions examples/voxceleb/sv0/path.sh
@@ -0,0 +1,28 @@
#!/bin/bash
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
export MAIN_ROOT=`realpath ${PWD}/../../../`

export PATH=${MAIN_ROOT}:${MAIN_ROOT}/utils:${PATH}
export LC_ALL=C

export PYTHONDONTWRITEBYTECODE=1
# Use UTF-8 in Python to avoid UnicodeDecodeError when LC_ALL=C
export PYTHONIOENCODING=UTF-8
export PYTHONPATH=${MAIN_ROOT}:${PYTHONPATH}

export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/lib/

MODEL=ecapa_tdnn
export BIN_DIR=${MAIN_ROOT}/paddlespeech/vector/exps/${MODEL}
71 changes: 71 additions & 0 deletions examples/voxceleb/sv0/run.sh
@@ -0,0 +1,71 @@
#!/bin/bash
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

. ./path.sh
set -e

#######################################################################
# stage 0: data preparation, including downloading voxceleb1 and generating {train,dev,enroll,test}.csv
#          voxceleb2 data is in m4a format, so users must convert it to wav themselves, as described in README.md, with the script local/convert.sh
# stage 1: train the speaker identification model
# stage 2: test speaker verification with cosine scoring
# stage 3: extract the training embeddings to train the LDA and PLDA
######################################################################

# set the variable PPAUDIO_HOME to specify the root directory of the downloaded vox1 and vox2 datasets
# by default, the datasets are stored in ~/.paddleaudio/
# the vox2 dataset is released in m4a format, so you need to convert the audio from m4a to wav yourself
# and put all of the wav files under ${PPAUDIO_HOME}/datasets/vox2/wav
# the wav files are read from ${PPAUDIO_HOME}/datasets/vox1/wav and ${PPAUDIO_HOME}/datasets/vox2/wav
# export PPAUDIO_HOME=
stage=0
stop_stage=50

# data directory
# if ${dir} is set, the wav info is stored in this directory;
# otherwise, the wav info is stored in the vox1 and vox2 directories respectively
# note: the vox2 wav files must be converted from m4a format first
# dir=data-demo/ # data info directory
dir=demo/ # data info directory

exp_dir=exp/ecapa-tdnn-vox12-big/ # experiment directory
conf_path=conf/ecapa_tdnn.yaml
gpus=0,1,2,3

source ${MAIN_ROOT}/utils/parse_options.sh || exit 1;

mkdir -p ${exp_dir}

if [ $stage -le 0 ] && [ ${stop_stage} -ge 0 ]; then
    # stage 0: data preparation for vox1 and vox2; vox2 must be converted from m4a to wav
    # and vox2_base_path must be set in the config file
    bash ./local/data.sh ${dir} ${conf_path} || exit 1;
fi

if [ $stage -le 1 ] && [ ${stop_stage} -ge 1 ]; then
    # stage 1: train the speaker identification model
    CUDA_VISIBLE_DEVICES=${gpus} bash ./local/train.sh ${dir} ${exp_dir} ${conf_path}
fi

if [ $stage -le 2 ] && [ ${stop_stage} -ge 2 ]; then
    # stage 2: compute the speaker verification scores with the cosine function
    # currently, only cosine scoring is supported
    CUDA_VISIBLE_DEVICES=0 bash ./local/test.sh ${dir} ${exp_dir} ${conf_path}
fi

# if [ $stage -le 3 ] && [ ${stop_stage} -ge 3 ]; then
#     # stage 3: extract the training embeddings to train the LDA and PLDA
#     # todo: extract the training embedding
# fi
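Speaker verification on VoxCeleb is usually reported as equal error rate (EER): the operating point where the false-acceptance rate on non-target trials equals the false-rejection rate on target trials. Whether `test.py` reports EER is not shown in this diff; the sketch below is a simple threshold sweep over cosine scores, assuming a trial is accepted when its score is at least the threshold.

```python
def compute_eer(target_scores, nontarget_scores):
    """Approximate the equal error rate by sweeping thresholds over the observed scores.

    A trial is accepted when score >= threshold. FAR is the fraction of non-target
    trials accepted; FRR is the fraction of target trials rejected.
    """
    if not target_scores or not nontarget_scores:
        raise ValueError("need at least one target and one non-target score")
    best_gap, eer = float("inf"), None
    for thr in sorted(set(target_scores) | set(nontarget_scores)):
        far = sum(s >= thr for s in nontarget_scores) / len(nontarget_scores)
        frr = sum(s < thr for s in target_scores) / len(target_scores)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer
```

The sweep is quadratic in the number of trials, which is fine for illustration; for a full trial list, sorting the scores once and sweeping cumulative counts (or using an ROC utility such as scikit-learn's `roc_curve`) is the usual choice.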
1 change: 1 addition & 0 deletions examples/voxceleb/sv0/utils
2 changes: 2 additions & 0 deletions paddleaudio/paddleaudio/datasets/__init__.py
@@ -15,3 +15,5 @@
from .gtzan import GTZAN
from .tess import TESS
from .urban_sound import UrbanSound8K
from .voxceleb import VoxCeleb
from .rirs_noises import OpenRIRNoise