Use Merlin toolkit to convert .wav files to .gcm files, and modified original MCD implement.
- install Merlin toolkit as https://github.com/CSTR-Edinburgh/merlin taught.(to extract some useful features like .bap/.lf0/.mgc)
- cd into merlin/tool , and install as new script under that file
- cd into merlin/tools/SPTK*3.9, follow INSTALL md taught( because we mainly use sptk toolkit to extract features)
- cd merlin/egs/voice_conversion/s1
- ./01_setup.sh speakerA speakerB (it will mkdir to files named database/speakerA && apeakerB)
-
./02_prepare_acoustic_features.sh <path_to_wav_dir> <path_to_feat_dir> (you need to mkdir two new folder to contain the features to be extracted, recommend to build under database folder)
-
then u got two .mgc files
- install MCD tools as https://github.com/MattShannon/mcd/tree/c86266a2caf6a7cb248ea89ea56f90fd161a297e taguht
10 modify some codes in htk_io/vecseq.py, as picture show:
def readFile(self, vecSeqFile):
"""Reads a raw vector sequence file.
The dtype of the returned numpy array is always the numpy default
np.float, which may be 32-bit or 64-bit depending on architecture, etc.
"""
Vec = np.fromfile(vecSeqFile, dtype=self.dtypeFile)
lengthOfVec = len(Vec)
misLenToPad = lengthOfVec % self.vecSize
means = np.mean(Vec)
for i in range(misLenToPad):
Vec = np.insert(Vec, lengthOfVec, means)
return np.reshape(
Vec,
(-1, self.vecSize)
).astype(np.float)
# return np.reshape(
# np.fromfile(vecSeqFile, dtype=self.dtypeFile),
# (-1, self.vecSize)
# ).astype(np.float)
up modified code is to solve reshape problem during read .mgc files data: original auther's algorithm here make me always counter with error: (if this pic can't perform normal, use this site: https://blog-1301959139.cos.ap-beijing.myqcloud.com/picGo/20200718155714.png )
-
Here my solution is pad the mean number of .mgc into source .mgc so as to make it could be % by 40 dimension.
-
this change work well on all kind of wav files (Notice that : Merlin only accept 16bit format of wav, you can change this parameter by Audition or 'sox' toolkit)
-
Last but not least: we would better use 16k sample rate as our wav sr. In exp, 44100hz's MCD would be 16+ when it just 13+ in 16k hz
14 Ps. one more thing puzzled me: i use stargan-vc2's original demo .wav files for testing, which authority told that MCD is only 6.+ in their paper, however i calculate then as 13+.
In another hand, i use my personal VoiceConversion tast final result for testing, can get 7.1+score, which makes sense (MCD should be among [4, 8], the less the better )
- Any question, or any improment suggestion, welcome to make issues!
16.one more details to notice: under MCD folder, should copy all files under bin out:
and then can put ur original .mgc in test_data/ref-examples/ folder, converted/synthesized .mgc in test_data/synth-examples/ folder 🌟These two files name should be the same!
modified contents in corpus.lst under test_data/ folder as : <your .mgc files name, without .mgc>
then use cammand: ** cat test_data/corpus.lst | xargs dtw_synth test_data/ref-examples test_data/synth-examples out **
☑️ 🌟 To do , release some examples