Support training a student model from an already existing teacher model #180
I'm already testing the GreenNLP fork. My concern here is that the quality of the OPUS models might not be good enough. According to their website, they don't use backtranslations. If we can train better-quality models from scratch, the pre-trained OPUS models would be useful only to expand coverage faster, or maybe as backward models.
Hi, the most current OPUS models are actually in a different repository (I know this is confusing, sorry about that): https://github.com/Helsinki-NLP/Tatoeba-Challenge/tree/master/models. These newer models generally include backtranslated data (indicated by +bt in the model name), and there are also transformer-big models available. You can see comparisons of different models (with model links) here: https://opus.nlpl.eu/leaderboard/
This is a great insight, thank you @TommiNieminen!
I trained a student model with this config:
and got the following BLEU scores for flores-devtest. I don't know if they are directly comparable with the model metrics on the OPUS dashboard, because the evaluation procedure and BLEU settings might differ. The comparison with the cloud APIs is here; the quality looks OK for a dev model, but it might not be sufficient to release in prod. I'll proceed with merging the GreenNLP fork into a separate branch to address the issues important for us, and then we can continue experimenting with this. @TommiNieminen, do you think I picked the right base model for Finnish? Maybe you have other ideas on how to get better quality? I'm also planning to train the opposite direction, as well as Swedish.
We now have the functionality to use pre-trained models or to fine-tune them, though they need to be compatible with our architecture. I don't think we're planning on using OPUS-MT models at this point, since we generally train higher-quality models from scratch.
This will greatly speed up covering more languages (by using existing open-source models as teacher models, depending on the results of #179) and improving languages we have already trained (by fine-tuning them).
See also #117.