
Hindi Finetuning Not supported yet? #424

Closed

Mark2619 opened this issue Nov 27, 2024 · 10 comments

Comments
@Mark2619

Hi, I just wanted to finetune a model, and I see there is no Hindi language in the finetune training tab. Is it not available yet?

@erew123 (Owner) commented Nov 27, 2024

@Mark2619 I've posted an update, so you will need to git pull.

You would have to use an XTTS 2.0.3 model; the earlier ones don't support Hindi.

The tokenizers were only extended to fully support Hindi training about a month ago. I have not tested it, so if it goes wrong or doesn't work, your best bet would be to go to https://github.com/idiap/coqui-ai-TTS/issues, where they work on keeping the back-end Coqui scripts alive and working.

[screenshot attached]

Thanks

@Mark2619 (Author)

Thanks for the quick reply. Even after pulling, I don't see a Hindi option to process the dataset.
[screenshot attached]

@erew123 (Owner) commented Nov 27, 2024

It's definitely there. This is the commit: af29b1e

[screenshots attached]

You can always manually download it from https://github.com/erew123/alltalk_tts/blob/alltalkbeta/finetune.py (the download arrow, two buttons over from the word "Raw", at the top right).

@Mark2619 (Author)

Oh yes, manually downloading did work, but now I'm getting an error:

To create a public link, set share=True in launch().
[FINETUNE] [INFO] Initializing output directory: F:\VOICEOP\ModelTraining\alltalk_tts\finetune\mark
[FINETUNE] [MODEL] Using device: cuda
[FINETUNE] [MODEL] Loading Whisper model: large-v3
[FINETUNE] [MODEL] Using mixed precision
[FINETUNE] [MODEL] Initializing Silero VAD
Using cache found in C:\Users\jhaka/.cache\torch\hub\snakers4_silero-vad_master
[FINETUNE] [INFO] Updated language to: hi
[FINETUNE] [INFO] Processing: Mark_Cleaned_22050
Traceback (most recent call last):
File "F:\VOICEOP\ModelTraining\alltalk_tts\finetune.py", line 3866, in preprocess_dataset
pd_train_meta, pd_eval_meta, pd_audio_total_size = format_audio_list(
^^^^^^^^^^^^^^^^^^
File "F:\VOICEOP\ModelTraining\alltalk_tts\finetune.py", line 1241, in format_audio_list
process_transcription_result(
File "F:\VOICEOP\ModelTraining\alltalk_tts\finetune.py", line 1824, in process_transcription_result
save_audio_segment(
File "F:\VOICEOP\ModelTraining\alltalk_tts\finetune.py", line 1719, in save_audio_segment
sas_sentence = multilingual_cleaners(sas_sentence, sas_target_language)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "F:\VOICEOP\ModelTraining\alltalk_tts\system\ft_tokenizer\tokenizer.py", line 584, in multilingual_cleaners
text = expand_numbers_multilingual(text, lang)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "F:\VOICEOP\ModelTraining\alltalk_tts\system\ft_tokenizer\tokenizer.py", line 564, in expand_numbers_multilingual
text = re.sub(_ordinal_re[lang], lambda m: _expand_ordinal(m, lang), text)
~~~~~~~~~~~^^^^^^
KeyError: 'hi'
Traceback (most recent call last):
File "F:\VOICEOP\ModelTraining\alltalk_tts\alltalk_environment\env\Lib\site-packages\gradio\queueing.py", line 521, in process_events
response = await route_utils.call_process_api(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "F:\VOICEOP\ModelTraining\alltalk_tts\alltalk_environment\env\Lib\site-packages\gradio\route_utils.py", line 276, in call_process_api
output = await app.get_blocks().process_api(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "F:\VOICEOP\ModelTraining\alltalk_tts\alltalk_environment\env\Lib\site-packages\gradio\blocks.py", line 1955, in process_api
data, changed_state_ids = await self.postprocess_data(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "F:\VOICEOP\ModelTraining\alltalk_tts\alltalk_environment\env\Lib\site-packages\gradio\blocks.py", line 1718, in postprocess_data
self.validate_outputs(block_fn, predictions) # type: ignore
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "F:\VOICEOP\ModelTraining\alltalk_tts\alltalk_environment\env\Lib\site-packages\gradio\blocks.py", line 1684, in validate_outputs
raise ValueError(
ValueError: An event handler (preprocess_dataset) didn't receive enough output values (needed: 6, received: 3).
Wanted outputs:
[<gradio.components.label.Label object at 0x0000015544E13710>, <gradio.components.textbox.Textbox object at 0x000001554164EA90>, <gradio.components.textbox.Textbox object at 0x0000015544D625D0>, <gradio.components.textbox.Textbox object at 0x0000015544E44E10>, <gradio.components.textbox.Textbox object at 0x000001554663CDD0>, <gradio.components.textbox.Textbox object at 0x0000015546695AD0>]
Received outputs:
["The data processing was interrupted due to an error!! Please check the console to verify the full error message!
Error summary: Traceback (most recent call last):
File "F:\VOICEOP\ModelTraining\alltalk_tts\finetune.py", line 3866, in preprocess_dataset
pd_train_meta, pd_eval_meta, pd_audio_total_size = format_audio_list(
^^^^^^^^^^^^^^^^^^
File "F:\VOICEOP\ModelTraining\alltalk_tts\finetune.py", line 1241, in format_audio_list
process_transcription_result(
File "F:\VOICEOP\ModelTraining\alltalk_tts\finetune.py", line 1824, in process_transcription_result
save_audio_segment(
File "F:\VOICEOP\ModelTraining\alltalk_tts\finetune.py", line 1719, in save_audio_segment
sas_sentence = multilingual_cleaners(sas_sentence, sas_target_language)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "F:\VOICEOP\ModelTraining\alltalk_tts\system\ft_tokenizer\tokenizer.py", line 584, in multilingual_cleaners
text = expand_numbers_multilingual(text, lang)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "F:\VOICEOP\ModelTraining\alltalk_tts\system\ft_tokenizer\tokenizer.py", line 564, in expand_numbers_multilingual
text = re.sub(_ordinal_re[lang], lambda m: _expand_ordinal(m, lang), text)
~~~~~~~~~~~^^^^^^
KeyError: 'hi'
", "", ""]

@erew123 (Owner) commented Nov 27, 2024

Thanks for letting me know. As mentioned, I would suggest you take this up the chain with the people who maintain the Coqui TTS engine/scripts: https://github.com/idiap/coqui-ai-TTS/issues. The long and short of it is that KeyError: 'hi' means the Coqui training scripts don't recognise "hi", the language code for Hindi.

If I get time in the future I will look further into it. However, as I say, they manage the core scripts, so I have no control over that, and Hindi support in XTTS was a largely undocumented feature from Coqui.
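Going purely by the traceback above, one possible local workaround (untested, and no substitute for a proper upstream fix) would be to guard the dictionary lookup in system/ft_tokenizer/tokenizer.py so that languages with no registered ordinal regex are skipped instead of raising:

```python
# Sketch of a defensive guard inside expand_numbers_multilingual():
# only run ordinal expansion when a regex is registered for this
# language, so unsupported codes like "hi" fall through quietly
# instead of raising KeyError.
if lang in _ordinal_re:
    text = re.sub(_ordinal_re[lang], lambda m: _expand_ordinal(m, lang), text)
```

The same guard would likely be needed for any other per-language lookup tables that module uses, which is why this really belongs upstream.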

erew123 closed this as completed Nov 27, 2024
@Mark2619 (Author)

Thanks, I'll take it up with Coqui to get this fixed. Just a quick question: if I process the data manually (split up the audio and create the metadata), the training part would still work fine, right? Because right now the only issue I see is dataset creation.

@erew123 (Owner) commented Nov 27, 2024

If it happened at that stage then that's Whisper. Which Whisper model did you use? https://github.com/openai/whisper?tab=readme-ov-file#available-models-and-languages

@Mark2619 (Author)

large-v3 is the one I used.

@erew123 (Owner) commented Nov 27, 2024

In theory it should have worked. OpenAI's code accepts hi as Hindi: https://github.com/openai/whisper/blob/main/whisper/tokenizer.py#L28

And large-v3 is a multilingual model.

There is nothing special about the AllTalk Whisper installation, so whatever OpenAI supports should work.
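If you want to rule Whisper out, a minimal sketch of transcribing Hindi with the openai-whisper package directly (the audio path is just a placeholder) would be:

```python
import whisper  # pip install openai-whisper

# Load the multilingual large-v3 checkpoint and force Hindi decoding.
model = whisper.load_model("large-v3")
result = model.transcribe("sample_hindi_clip.wav", language="hi")  # placeholder file
print(result["text"])
```

If that prints Devanagari text, Whisper itself is fine and the failure is in the tokenizer/cleaner step shown in your traceback.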

Yes, you can build a dataset yourself. Using one is documented in the finetuning interface (as I recall). The Coqui Trainer structure for the CSV files is detailed here: https://docs.coqui.ai/en/latest/formatting_your_dataset.html
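For illustration only, a pipe-delimited metadata file in the LJSpeech-style layout those docs describe might look like the sketch below. The exact column set (here audio_file|text|speaker_name, which XTTS-style finetuning metadata commonly uses) is an assumption, so check the files Step 1 generates for the layout your version expects:

```
audio_file|text|speaker_name
wavs/segment_0001.wav|यह एक परीक्षण वाक्य है।|speaker1
wavs/segment_0002.wav|यह दूसरा परीक्षण वाक्य है।|speaker1
```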

The simplest way to understand it, though, would be to quickly run Step 1 with an English (or other supported language) dataset, which will build the files so you can see how they are laid out on disk and within the CSV files.

I'd suggest you make a small Hindi dataset after that for testing purposes. You will need to change the language in the lang.txt file to hi (again, documented in the finetuning interface), then pick up from Step 2 to actually train. But as I say, make a small dataset first just to see what happens, as I can't say whether it will work, and there is no point wasting an hour or more just to find out it fails.
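Assuming lang.txt really does just hold the two-letter language code (per the note above), updating it from Python is trivial; the dataset path here is hypothetical:

```python
from pathlib import Path

# Hypothetical location; point this at the dataset folder Step 1 created.
dataset_dir = Path("finetune/mark/dataset")
(dataset_dir / "lang.txt").write_text("hi", encoding="utf-8")
```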

@Mark2619 (Author)

Thanks again for the detailed info. I will try training on a small dataset and update here for anyone else wondering whether it works. Cheers!
