I'm aware the translation models are trained on data from the OPUS corpus, but it's unclear to me exactly how much data was used and whether ALL available OPUS data for a given language direction was included.
Does it make sense to download the OPUS data and further fine-tune these models?
Does it make sense to find other data sources and fine-tune the models? If so, approximately how many sentence pairs do I need to see an improvement?
I'm particularly interested in fine-tuning "Helsinki-NLP/opus-mt-nl-en" and "Helsinki-NLP/opus-mt-en-nl".
Yes, more or less all data in OPUS at the time of training. I'm not sure about fine-tuning; the model may also forget previously learned information. You could continue training with a larger data set, but then you may need a longer warm-up period as well to get the optimizer back on track.
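A minimal sketch of what continued training could look like with the Hugging Face transformers Trainer, assuming a JSON-lines file of nl/en sentence pairs; the file name, column names, and hyperparameters (small learning rate, longer warm-up) are placeholders you would adapt, not a recommended recipe:

```python
from datasets import load_dataset
from transformers import (
    MarianMTModel,
    MarianTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

model_name = "Helsinki-NLP/opus-mt-nl-en"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

# Hypothetical JSON-lines file with {"nl": "...", "en": "..."} records.
raw = load_dataset("json", data_files={"train": "pairs.jsonl"})

def preprocess(batch):
    # Tokenize Dutch source and English target sentences.
    inputs = tokenizer(batch["nl"], truncation=True, max_length=128)
    labels = tokenizer(text_target=batch["en"], truncation=True, max_length=128)
    inputs["labels"] = labels["input_ids"]
    return inputs

train = raw["train"].map(preprocess, batched=True, remove_columns=["nl", "en"])

args = Seq2SeqTrainingArguments(
    output_dir="opus-mt-nl-en-finetuned",
    per_device_train_batch_size=16,
    learning_rate=2e-5,   # small LR to limit catastrophic forgetting
    warmup_steps=2000,    # longer warm-up to get the optimizer back on track
    num_train_epochs=1,
    save_steps=5000,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=train,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```

Whether this helps depends on how different your data is from the OPUS mix the model was originally trained on; evaluating on a held-out set before and after fine-tuning is the only reliable check.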