I'm looking for a guide or example showing how to train the model and tokenizer on a new language, i.e. any language that wasn't part of pre-training and isn't listed among the tokenizer's supported languages. There's a similar thread, but it's about fine-tuning on a locale of a language that is already pre-trained.
Edit: I tried fine-tuning, but it fails when the language isn't in the list of supported languages; the tokenizer raises the error below. I can't use any of the currently supported languages instead, because none of them is the same as mine.

    /usr/local/lib/python3.10/dist-packages/transformers/models/whisper/tokenization_whisper.py in prefix_tokens()
        418         else:
        419             is_language_code = len(self.language) == 2
    --> 420             raise ValueError(
        421                 f"Unsupported language: {self.language}. Language should be one of:"
        422                 f" {list(TO_LANGUAGE_CODE.values()) if is_language_code else list(TO_LANGUAGE_CODE.keys())}."

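The traceback shows that the Whisper tokenizer checks `language` against a fixed lookup table before it builds the decoder prompt. Below is a minimal stdlib sketch that mimics that check, to make clear why an unlisted language fails. It assumes a three-entry illustrative subset of `TO_LANGUAGE_CODE` (the real table lives in `tokenization_whisper.py` and is much larger) and assumes that a resolved code is turned into a special token of the form `<|code|>`. `resolve_language_token` is a hypothetical name, not part of `transformers`.

```python
# Illustrative subset only (assumption): the real TO_LANGUAGE_CODE table in
# transformers/models/whisper/tokenization_whisper.py covers many more languages.
TO_LANGUAGE_CODE = {
    "english": "en",
    "german": "de",
    "swahili": "sw",
}


def resolve_language_token(language: str) -> str:
    """Hypothetical helper mimicking the check seen in prefix_tokens()."""
    language = language.lower()
    if language in TO_LANGUAGE_CODE:
        # A full language name was given; map it to its code.
        code = TO_LANGUAGE_CODE[language]
    elif language in TO_LANGUAGE_CODE.values():
        # A code was given directly.
        code = language
    else:
        # Same branching as the traceback: a 2-character input is treated as a
        # code, so the error lists codes; otherwise it lists full names.
        is_language_code = len(language) == 2
        options = (
            list(TO_LANGUAGE_CODE.values())
            if is_language_code
            else list(TO_LANGUAGE_CODE.keys())
        )
        raise ValueError(
            f"Unsupported language: {language}. Language should be one of: {options}."
        )
    # Assumption: the tokenizer then looks up a special token of this form.
    return f"<|{code}|>"


if __name__ == "__main__":
    print(resolve_language_token("German"))  # <|de|>
    print(resolve_language_token("sw"))      # <|sw|>
    try:
        resolve_language_token("xx")  # hypothetical unsupported code
    except ValueError as err:
        print(err)
```

Because the lookup is a hard-coded table, an unlisted language fails in the tokenizer before any training step runs, and (under the assumption above) there is no matching `<|code|>` token in the vocabulary for it either.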