StarGAN Voice Conversion Notes

Notes (originally written in Japanese) on the StarGAN-VC (PyTorch version) repository, together with minor fixes to the source code.

[Original Paper]
StarGAN-VC: Non-parallel many-to-many voice conversion with star generative adversarial networks.

Converted speech is saved in the converted directory.

Dependencies

Python 3.6 (or higher)
PyTorch 1.1.0 (see https://pytorch.org/)
TensorFlow 2.0 (be careful: do not use TensorFlow 1.x)
librosa
pyworld
tensorboard 2.0
scikit-learn 0.21.3 (or higher)

NOTE: According to some feedback, we recommend using TensorFlow 1.8 exactly (TensorFlow 1.11 generates nonsense results).
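Before running any script, it can help to confirm that the packages listed above are importable. The following is a minimal sketch, not part of the repository: the helper name `missing_modules` is hypothetical, and it checks only that each module can be found, not its version. Note that PyTorch is imported as `torch` and scikit-learn as `sklearn`.

```python
import importlib.util

# Import names for the dependencies listed above.
REQUIRED_MODULES = ("torch", "tensorflow", "librosa", "pyworld", "tensorboard", "sklearn")


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in the current environment.

    Hypothetical helper: only importability is checked, not version numbers.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Calling `missing_modules()` returns an empty list when every dependency can be found; any names in the returned list still need to be installed.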

Usage

Download dataset

The VCC 2016 dataset can be downloaded from here:

Data Share - SUPERSEDED - The Voice Conversion Challenge 2016
https://datashare.is.ed.ac.uk/handle/10283/2042

Training dataset: download the ZIP file "VCC training data: source and target evaluation data (7.357Mb)" from the page above and unzip it. Save the extracted vcc2016_training folder inside the data folder.
Test dataset: download the ZIP file "VCC training data: evaluation data released to participants during the challenge (3.576Mb)" from the page above and unzip it. Save the extracted evaluation_all folder inside the data folder.

Then copy part of the saved data into the training-set and test-set folders as follows.

Training set: select four speakers (SF1, SF2, TM1, TM2) from ./data/vcc2016_training and save them in ./data/speakers.
Test set: select four speakers (SF1, SF2, TM1, TM2) from ./data/evaluation_all and save them in ./data/speakers_test.

The resulting folder layout is summarized below.

data
├── speakers  (training set: copy four speaker folders from vcc2016_training)
│   ├── SF1
│   ├── SF2
│   ├── TM1
│   └── TM2
├── speakers_test  (test set: copy four speaker folders from evaluation_all)
│   ├── SF1
│   ├── SF2
│   ├── TM1
│   └── TM2
├── vcc2016_training  (folder extracted from vcc2016_training.zip)
│   ├── SF1
│   ├── SF2
│   ├── SF3
│   ├── ...
├── evaluation_all  (folder extracted from evaluation_all.zip)
│   ├── ...
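
The copy step for the four speakers can also be scripted instead of done by hand. This is a minimal sketch using only the standard library; the helper name `copy_speakers` is hypothetical and not part of the repository, and it assumes the extracted folders already sit under ./data as described above.

```python
import shutil
from pathlib import Path

# The four speakers used in this README.
SPEAKERS = ("SF1", "SF2", "TM1", "TM2")


def copy_speakers(src_root, dst_root, speakers=SPEAKERS):
    """Copy the selected speaker folders from src_root into dst_root.

    Hypothetical helper. Folders that already exist in dst_root are left
    untouched; a missing source folder raises FileNotFoundError.
    Returns the destination paths that were actually copied.
    """
    src_root, dst_root = Path(src_root), Path(dst_root)
    copied = []
    for name in speakers:
        src = src_root / name
        if not src.is_dir():
            raise FileNotFoundError(f"speaker folder not found: {src}")
        dst = dst_root / name
        if dst.exists():
            continue  # do not overwrite an existing copy
        shutil.copytree(src, dst)  # also creates dst_root if needed
        copied.append(dst)
    return copied
```

Under these assumptions, `copy_speakers("data/vcc2016_training", "data/speakers")` builds the training set and `copy_speakers("data/evaluation_all", "data/speakers_test")` builds the test set.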

Preprocess

First, run preprocess.py to extract features (mcep: mel-cepstrum, f0: fundamental frequency, ap: aperiodicity) from the audio clips. The features are saved as .npy files. Run preprocess.py with the following command.

python preprocess.py

This takes roughly 5 to 10 minutes. Preprocessing is not needed at test time.
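
For readers unfamiliar with these features, the sketch below shows one common way to obtain f0, mcep and ap with pyworld and librosa. It is not the code of preprocess.py, which is not reproduced here: the function names, the 16 kHz sampling rate, the 5 ms frame period, the 36 coefficients and the output file names are all assumptions made for illustration.

```python
from pathlib import Path


def list_wavs(speaker_dir):
    """Return the .wav files of one speaker folder in sorted order."""
    return sorted(Path(speaker_dir).glob("*.wav"))


def extract_world_features(wav_path, sample_rate=16000, frame_period=5.0, n_mcep=36):
    """Extract (f0, mcep, ap) from one clip with the WORLD vocoder.

    All parameter defaults are assumed values, not taken from preprocess.py.
    """
    # Imported here so the file listing helper works without the audio stack.
    import librosa
    import numpy as np
    import pyworld

    wav, _ = librosa.load(str(wav_path), sr=sample_rate, mono=True)
    wav = wav.astype(np.float64)  # pyworld expects float64 samples

    f0, timeaxis = pyworld.harvest(wav, sample_rate, frame_period=frame_period)
    sp = pyworld.cheaptrick(wav, f0, timeaxis, sample_rate)  # spectral envelope
    ap = pyworld.d4c(wav, f0, timeaxis, sample_rate)         # aperiodicity
    mcep = pyworld.code_spectral_envelope(sp, sample_rate, n_mcep)
    return f0, mcep, ap


def preprocess_speaker(speaker_dir, out_dir):
    """Save the features of every clip of one speaker as .npy files.

    The <clip>_f0.npy / _mcep.npy / _ap.npy naming is hypothetical.
    """
    import numpy as np

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for wav_path in list_wavs(speaker_dir):
        f0, mcep, ap = extract_world_features(wav_path)
        np.save(str(out_dir / f"{wav_path.stem}_f0.npy"), f0)
        np.save(str(out_dir / f"{wav_path.stem}_mcep.npy"), mcep)
        np.save(str(out_dir / f"{wav_path.stem}_ap.npy"), ap)
```

Under these assumptions, `preprocess_speaker("data/speakers/SF1", "features/SF1")` would write three arrays per clip. The actual layout produced by preprocess.py may differ.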

Train

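During training, the processing results for the training data are saved to the result_*** directory every few epochs. The processing results for the test data are (probably) saved to result_*** as well. The command is as follows.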

python main.py

Convert

python main.py --mode test --test_iters 200000 --src_speaker TM1 --trg_speaker "['TM1','SF1']"
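
In the command above, the `--trg_speaker` value is a Python list literal passed as a single quoted string. The sketch below builds the same argument list programmatically and shows one way such a string can be turned back into a list. The helper names are hypothetical, and parsing with `ast.literal_eval` is an assumption; how main.py actually parses the option is not described here.

```python
import ast


def build_convert_command(src_speaker, trg_speakers, test_iters=200000):
    """Build the argument list for the conversion command shown above.

    Hypothetical helper; the flags are the ones used in this README.
    """
    return [
        "python", "main.py",
        "--mode", "test",
        "--test_iters", str(test_iters),
        "--src_speaker", src_speaker,
        # The target speakers travel as one string holding a list literal.
        "--trg_speaker", repr(list(trg_speakers)),
    ]


def parse_speaker_list(text):
    """Parse a string such as "['TM1','SF1']" into a list of speaker names.

    Assumed parsing strategy: ast.literal_eval accepts only literals,
    so arbitrary code in the argument is rejected.
    """
    speakers = ast.literal_eval(text)
    if not isinstance(speakers, list) or not all(isinstance(s, str) for s in speakers):
        raise ValueError(f"expected a list of speaker names, got: {text!r}")
    return speakers
```

The list returned by `build_convert_command("TM1", ["TM1", "SF1"])` can be passed directly to `subprocess.run`, which avoids shell quoting problems with the brackets and quotes.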

Summary

The network structure is shown below:

Reference

CycleGAN-VC code

pytorch StarGAN-VC code

Shimamura-Lab-SU/SGV