
Hindi Finetuning Not supported yet? #424

Closed

Mark2619 opened this issue Nov 27, 2024 · 10 comments

Comments
@Mark2619

Hi, I just wanted to finetune a model, and I see there is no Hindi language in the finetune training tab. Is it not available yet?

@erew123 (Owner) commented Nov 27, 2024

@Mark2619 I've posted an update, so you will need to git pull.

You would have to use an XTTS 2.0.3 model; the earlier ones don't support Hindi.

The tokenizers were only extended to fully support Hindi training about a month ago. I have not tested it, so if it goes wrong or doesn't work, your best bet would be to go to https://github.com/idiap/coqui-ai-TTS/issues, where they work on keeping the back-end Coqui scripts alive and working.

[screenshot attached]

Thanks

@Mark2619 (Author)

Thanks for the quick reply. Even after pulling, I don't see a Hindi option to process the dataset.
[screenshot attached]

@erew123 (Owner) commented Nov 27, 2024

It's definitely there. This is the commit: af29b1e

[screenshots attached]

You can always manually download it from https://github.com/erew123/alltalk_tts/blob/alltalkbeta/finetune.py (the download arrow, two buttons over from the word "Raw", at the top right).

@Mark2619 (Author)

Oh yes, manually downloading did work, but now I'm getting an error:

To create a public link, set share=True in launch().
[FINETUNE] [INFO] Initializing output directory: F:\VOICEOP\ModelTraining\alltalk_tts\finetune\mark
[FINETUNE] [MODEL] Using device: cuda
[FINETUNE] [MODEL] Loading Whisper model: large-v3
[FINETUNE] [MODEL] Using mixed precision
[FINETUNE] [MODEL] Initializing Silero VAD
Using cache found in C:\Users\jhaka/.cache\torch\hub\snakers4_silero-vad_master
[FINETUNE] [INFO] Updated language to: hi
[FINETUNE] [INFO] Processing: Mark_Cleaned_22050
Traceback (most recent call last):
File "F:\VOICEOP\ModelTraining\alltalk_tts\finetune.py", line 3866, in preprocess_dataset
pd_train_meta, pd_eval_meta, pd_audio_total_size = format_audio_list(
^^^^^^^^^^^^^^^^^^
File "F:\VOICEOP\ModelTraining\alltalk_tts\finetune.py", line 1241, in format_audio_list
process_transcription_result(
File "F:\VOICEOP\ModelTraining\alltalk_tts\finetune.py", line 1824, in process_transcription_result
save_audio_segment(
File "F:\VOICEOP\ModelTraining\alltalk_tts\finetune.py", line 1719, in save_audio_segment
sas_sentence = multilingual_cleaners(sas_sentence, sas_target_language)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "F:\VOICEOP\ModelTraining\alltalk_tts\system\ft_tokenizer\tokenizer.py", line 584, in multilingual_cleaners
text = expand_numbers_multilingual(text, lang)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "F:\VOICEOP\ModelTraining\alltalk_tts\system\ft_tokenizer\tokenizer.py", line 564, in expand_numbers_multilingual
text = re.sub(_ordinal_re[lang], lambda m: _expand_ordinal(m, lang), text)
~~~~~~~~~~~^^^^^^
KeyError: 'hi'
Traceback (most recent call last):
File "F:\VOICEOP\ModelTraining\alltalk_tts\alltalk_environment\env\Lib\site-packages\gradio\queueing.py", line 521, in process_events
response = await route_utils.call_process_api(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "F:\VOICEOP\ModelTraining\alltalk_tts\alltalk_environment\env\Lib\site-packages\gradio\route_utils.py", line 276, in call_process_api
output = await app.get_blocks().process_api(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "F:\VOICEOP\ModelTraining\alltalk_tts\alltalk_environment\env\Lib\site-packages\gradio\blocks.py", line 1955, in process_api
data, changed_state_ids = await self.postprocess_data(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "F:\VOICEOP\ModelTraining\alltalk_tts\alltalk_environment\env\Lib\site-packages\gradio\blocks.py", line 1718, in postprocess_data
self.validate_outputs(block_fn, predictions) # type: ignore
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "F:\VOICEOP\ModelTraining\alltalk_tts\alltalk_environment\env\Lib\site-packages\gradio\blocks.py", line 1684, in validate_outputs
raise ValueError(
ValueError: An event handler (preprocess_dataset) didn't receive enough output values (needed: 6, received: 3).
Wanted outputs:
[<gradio.components.label.Label object at 0x0000015544E13710>, <gradio.components.textbox.Textbox object at 0x000001554164EA90>, <gradio.components.textbox.Textbox object at 0x0000015544D625D0>, <gradio.components.textbox.Textbox object at 0x0000015544E44E10>, <gradio.components.textbox.Textbox object at 0x000001554663CDD0>, <gradio.components.textbox.Textbox object at 0x0000015546695AD0>]
Received outputs:
["The data processing was interrupted due to an error!! Please check the console to verify the full error message!
Error summary: Traceback (most recent call last):
File "F:\VOICEOP\ModelTraining\alltalk_tts\finetune.py", line 3866, in preprocess_dataset
pd_train_meta, pd_eval_meta, pd_audio_total_size = format_audio_list(
^^^^^^^^^^^^^^^^^^
File "F:\VOICEOP\ModelTraining\alltalk_tts\finetune.py", line 1241, in format_audio_list
process_transcription_result(
File "F:\VOICEOP\ModelTraining\alltalk_tts\finetune.py", line 1824, in process_transcription_result
save_audio_segment(
File "F:\VOICEOP\ModelTraining\alltalk_tts\finetune.py", line 1719, in save_audio_segment
sas_sentence = multilingual_cleaners(sas_sentence, sas_target_language)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "F:\VOICEOP\ModelTraining\alltalk_tts\system\ft_tokenizer\tokenizer.py", line 584, in multilingual_cleaners
text = expand_numbers_multilingual(text, lang)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "F:\VOICEOP\ModelTraining\alltalk_tts\system\ft_tokenizer\tokenizer.py", line 564, in expand_numbers_multilingual
text = re.sub(_ordinal_re[lang], lambda m: _expand_ordinal(m, lang), text)
~~~~~~~~~~~^^^^^^
KeyError: 'hi'
", "", ""]

@erew123 (Owner) commented Nov 27, 2024

Thanks for letting me know. As mentioned, I would suggest you take this up the chain with the people who maintain the Coqui TTS engine/scripts: https://github.com/idiap/coqui-ai-TTS/issues. The long and short of it is that KeyError: 'hi' means the Coqui training scripts don't recognise "hi", the language code for Hindi.

If I get time in the future I will look further into it. However, as I say, they manage the core scripts, so I have no control over that, and Hindi support in XTTS was a largely undocumented feature from Coqui.
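Going purely by the traceback above, one possible local workaround (untested, and no substitute for a proper upstream fix) would be to guard the dictionary lookup in system/ft_tokenizer/tokenizer.py so that languages with no registered ordinal regex are skipped instead of raising:

```python
# Sketch of a defensive guard inside expand_numbers_multilingual():
# only run ordinal expansion when a regex is registered for this
# language, so unsupported codes like "hi" fall through quietly
# instead of raising KeyError.
if lang in _ordinal_re:
    text = re.sub(_ordinal_re[lang], lambda m: _expand_ordinal(m, lang), text)
```

The same guard would likely be needed for any other per-language lookup tables that module uses, which is why this really belongs upstream.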

erew123 closed this as completed Nov 27, 2024
@Mark2619 (Author)

Thanks, I'll take it up with Coqui to get this fixed. Just a quick question: if I process the data manually (split up the audio and create the metadata), the training part would still work fine, right? Because right now the only issue I see is dataset creation.

@erew123 (Owner) commented Nov 27, 2024

If it happened at that stage then that's Whisper. Which Whisper model did you use? https://github.com/openai/whisper?tab=readme-ov-file#available-models-and-languages

@Mark2619 (Author)

large-v3 is the one I used.

@erew123 (Owner) commented Nov 27, 2024

In theory it should have worked. OpenAI's code accepts hi as Hindi: https://github.com/openai/whisper/blob/main/whisper/tokenizer.py#L28

And large-v3 is a multilingual model.

There is nothing special about the AllTalk Whisper installation, so whatever OpenAI supports should work.
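If you want to rule Whisper out, a minimal sketch of transcribing Hindi with the openai-whisper package directly (the audio path is just a placeholder) would be:

```python
import whisper  # pip install openai-whisper

# Load the multilingual large-v3 checkpoint and force Hindi decoding.
model = whisper.load_model("large-v3")
result = model.transcribe("sample_hindi_clip.wav", language="hi")  # placeholder file
print(result["text"])
```

If that prints Devanagari text, Whisper itself is fine and the failure is in the tokenizer/cleaner step shown in your traceback.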

Yes, you can build a dataset yourself. Using one is documented in the finetuning interface (as I recall). The Coqui Trainer structure for the CSV files is detailed here: https://docs.coqui.ai/en/latest/formatting_your_dataset.html
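For illustration only, a pipe-delimited metadata file in the LJSpeech-style layout those docs describe might look like the sketch below. The exact column set (here audio_file|text|speaker_name, which XTTS-style finetuning metadata commonly uses) is an assumption, so check the files Step 1 generates for the layout your version expects:

```
audio_file|text|speaker_name
wavs/segment_0001.wav|यह एक परीक्षण वाक्य है।|speaker1
wavs/segment_0002.wav|यह दूसरा परीक्षण वाक्य है।|speaker1
```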

The simplest way to understand it, though, would be to quickly run Step 1 with an English (or other supported language) dataset, which will build the files so you can see how they are laid out on disk and within the CSV files.

I'd suggest you make a small Hindi dataset after that for testing purposes. You will need to change the language in the lang.txt file to hi (again, documented in the finetuning interface), then pick up from Step 2 to actually train. But as I say, make a small dataset first just to see what happens, as I can't say whether it will work, and there is no point wasting an hour or more just to find out it fails.
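Assuming lang.txt really does just hold the two-letter language code (per the note above), updating it from Python is trivial; the dataset path here is hypothetical:

```python
from pathlib import Path

# Hypothetical location; point this at the dataset folder Step 1 created.
dataset_dir = Path("finetune/mark/dataset")
(dataset_dir / "lang.txt").write_text("hi", encoding="utf-8")
```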

@Mark2619 (Author)

Thanks again for the detailed info. I will try training on a small dataset and update here for anyone else wondering whether it works. Cheers!
