[make_voxceleb1.pl] Fix train/test data split for the latest version of the voxceleb1 dataset #3249

Merged Apr 19, 2019 (8 commits)

Conversation

sunshines14 (Contributor) commented Apr 18, 2019

I found that the previous scripts do not work with the latest version of the VoxCeleb1 dataset.
Therefore, I fixed the script for the latest version as follows:

  1. Updated the 'url' to download the latest version (from the official site)
  2. Deleted the code handling 'vox1_meta.csv', which is no longer needed
  3. Fixed the code to produce the correct output for the latest version (variable names now match make_voxceleb2.pl)

I confirmed that the script produces the expected output.
e.g., voxceleb1_test -- spk2utt, trials, utt2spk, wav.scp
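
For readers unfamiliar with the files listed above: in a Kaldi data directory, each line of utt2spk has the form `<utterance-id> <speaker-id>`, and each line of spk2utt has the form `<speaker-id> <utterance-id> ...`. The following is a minimal Python sketch of how the two relate, not the code from this pull request. The function name and the example IDs are made up for illustration, and Kaldi ships its own utility for this conversion.

```python
from collections import defaultdict


def utt2spk_to_spk2utt(utt2spk_lines):
    """Invert '<utt> <spk>' lines into '<spk> <utt1> <utt2> ...' lines.

    Hypothetical helper for illustration only.
    """
    spk2utt = defaultdict(list)
    for line in utt2spk_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        utt, spk = line.split()
        spk2utt[spk].append(utt)
    # Sort speakers and their utterances so the output is deterministic.
    return [
        f"{spk} {' '.join(sorted(utts))}"
        for spk, utts in sorted(spk2utt.items())
    ]


# Made-up IDs; real VoxCeleb1 utterance IDs look different.
example = [
    "spkA-utt2 spkA",
    "spkB-utt1 spkB",
    "spkA-utt1 spkA",
]

for out_line in utt2spk_to_spk2utt(example):
    print(out_line)
```

A check like this can help confirm that the utt2spk and spk2utt files written by a data preparation script agree with each other.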

sunshines14 (Contributor Author) commented Apr 18, 2019

(+) If the previous script is still needed, I can add this modified script as a separate version (e.g., v2) rather than replacing the existing script.

danpovey (Contributor) commented

Thanks a lot!! @david-ryan-snyder can you please check this?

david-ryan-snyder (Contributor) commented Apr 18, 2019

Thanks @sunshines14.

Could you do what you offered and copy this to a new script called make_voxceleb1_v2.pl?

Then, in the run.sh script, make the v2 script the default one. Comment out the old version of the script, and above it write a short comment describing the situation. E.g., mention that if you downloaded the dataset soon after it was released, you will want to use the make_voxceleb1.pl script instead.

sunshines14 (Contributor Author) commented Apr 19, 2019

Thanks @david-ryan-snyder.

I made the changes you suggested, as follows:

  1. I have made a new script called 'make_voxceleb1_v2.pl'.
  2. In the run.sh script, the v2 script is now used as the default one.
  3. In addition, I have made the code simpler in 'make_voxceleb1_v2.pl'.

I reconfirmed that all of the fixed scripts produce the expected output.
Thanks.

david-ryan-snyder (Contributor) commented

Thanks @sunshines14! I just suggested you credit your work in the v2 perl script and fix a preexisting typo. Then we can merge it.

sunshines14 (Contributor Author) commented

I did it all.
Thanks @david-ryan-snyder.

david-ryan-snyder (Contributor) commented

Thanks @sunshines14, looks good to me. @danpovey, I think it's fine to merge this.

@danpovey danpovey merged commit c3260f2 into kaldi-asr:master Apr 19, 2019