The English-to-Korean translations produced by the 'Helsinki-NLP/opus-mt-tc-big-en-ko' model do not make sense at all.
from transformers import MarianMTModel, MarianTokenizer

# MODEL_PATH3 was left undefined in the original snippet; it is assumed here
# to point at the model named in the title.
MODEL_PATH3 = "Helsinki-NLP/opus-mt-tc-big-en-ko"

src_text = [
    "2, 4, 6 etc. are even numbers.",
    "Yes."
]

tokenizer = MarianTokenizer.from_pretrained(MODEL_PATH3)
model = MarianMTModel.from_pretrained(MODEL_PATH3)
translated = model.generate(**tokenizer(src_text, return_tensors="pt", padding=True))

for t in translated:
    print(tokenizer.decode(t, skip_special_tokens=True))
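The output is not ['2, 4, 6 등은 짝수입니다.', '그래'] ("2, 4, 6, etc. are even numbers.", "Yes") as in the example, but ['그들은,우리는,우리는 모자입니다. 신뢰할 수 있습니다.', 'ATP입니다.'] (roughly "They, we, we are hats. It can be trusted.", "It is ATP."), which makes no sense at all.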
I tried a few more sentences, and I suspect that the correct tokenizer or vocab file would fix this problem.
Could you take a look at it?
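To test the hypothesis that the tokenizer or vocab file does not match the model, one could start with two quick sanity checks. This is a sketch, not a fix: the helper names `vocab_size_report` and `unk_ratio` are hypothetical, and the code only relies on standard Hugging Face attributes (`len(tokenizer)`, `tokenizer.unk_token_id`, `model.config.vocab_size`, `model.get_input_embeddings()`). A size mismatch or a high unknown-token rate would point at a mismatched vocabulary, but passing both checks does not rule one out, since they look only at sizes and at the source side.

```python
# Hedged diagnostic sketch for a tokenizer/model pairing.
# Assumes `tokenizer` behaves like a Hugging Face tokenizer (len(), __call__
# returning a mapping with "input_ids", `unk_token_id`) and `model` exposes
# `config.vocab_size` and `get_input_embeddings()`. Helper names are hypothetical.


def vocab_size_report(tokenizer, model):
    """Compare the tokenizer vocabulary size with the model's embedding table."""
    tok_size = len(tokenizer)
    cfg_size = model.config.vocab_size
    emb_size = model.get_input_embeddings().num_embeddings
    return {
        "tokenizer": tok_size,
        "config": cfg_size,
        "embeddings": emb_size,
        # A mismatch is a red flag; a match does not prove the vocab is right.
        "consistent": tok_size == cfg_size == emb_size,
    }


def unk_ratio(tokenizer, texts):
    """Fraction of source-side token ids that map to the unknown token."""
    unk_id = tokenizer.unk_token_id
    total = 0
    unk = 0
    for text in texts:
        ids = tokenizer(text)["input_ids"]
        total += len(ids)
        unk += sum(1 for i in ids if i == unk_id)
    return unk / total if total else 0.0
```

With the `tokenizer`, `model` and `src_text` objects from the snippet above, calling `vocab_size_report(tokenizer, model)` and `unk_ratio(tokenizer, src_text)` gives a first indication. If the checkpoint uses separate source and target vocabularies, these checks are incomplete, because the garbled output could come from the target-side mapping.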
regpath changed the title from "Wrong tokenizer for the 'Helsinki-NLP/opus-mt-tc-big-en-ko' model" to "Wrong tokenizer/vocab for the 'Helsinki-NLP/opus-mt-tc-big-en-ko' model" on Sep 13, 2022.