
What could cause widely varying inference time when using pre-trained opus-mt-en-fr model with python transformers library? #80

Open
shandou opened this issue Sep 8, 2022 · 2 comments

Comments


shandou commented Sep 8, 2022

I have been testing pre-trained Opus-MT models ported to the transformers library for a Python implementation. Specifically, I am using opus-mt-en-fr for English-to-French translation, and the tokenizer and translation model are loaded via MarianTokenizer and MarianMTModel, similar to the code examples shown here on huggingface. Strangely, for the same pre-trained model translating the same English input on an identical machine, I have observed anywhere between 80+ ms and a whopping 4 s per translation (example input = "kiwi strawberry").

I wonder if anyone has observed similar behaviour, and what could cause such a wide variation? Thank you very much!
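For reference, a minimal sketch along the lines of what I am timing (the exact script is not shown here, so details such as the timing loop are approximate):

```python
import time

from transformers import MarianMTModel, MarianTokenizer

# Load the pre-trained English-to-French Opus-MT model ported to transformers.
model_name = "Helsinki-NLP/opus-mt-en-fr"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

text = "kiwi strawberry"

# Time several repeated translations of the same input.
for _ in range(5):
    start = time.perf_counter()
    batch = tokenizer([text], return_tensors="pt")
    generated = model.generate(**batch)
    translation = tokenizer.batch_decode(generated, skip_special_tokens=True)
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"{translation[0]!r} in {elapsed_ms:.1f} ms")
```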

@jorgtied
Member

Maybe asking people at huggingface and the transformers git repo would help?

@artyomboyko

Good afternoon. Hypothetically, could CPU or GPU load have affected the performance of the model? Have you tried monitoring the load on the hardware while taking the measurements?
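A rough way to check the CPU side of that (a sketch assuming psutil is available; GPU utilization would need a separate tool such as nvidia-smi):

```python
import threading
import time

import psutil

# Sample system-wide CPU utilization in the background while a translation runs.
samples = []
stop = threading.Event()

def sample_cpu():
    while not stop.is_set():
        samples.append(psutil.cpu_percent(interval=0.1))

monitor = threading.Thread(target=sample_cpu)
monitor.start()

start = time.perf_counter()
# ... run the translation here, e.g. model.generate(**batch) from the snippet above ...
elapsed = time.perf_counter() - start

stop.set()
monitor.join()
if samples:
    print(f"translation took {elapsed:.3f} s, peak CPU load {max(samples):.0f}%")
```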
