Implement T5 decoding #864
Conversation
Looks great, thanks for adding this!
Just to mention that caching for the encoder output and the decoder KV cache has just been added, and it speeds things up quite significantly. Thanks again @jbochi for adding this, looking forward to more models being added!
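To illustrate what the decoder KV cache buys: without it, every generation step re-projects keys and values for the whole sequence; with it, each step appends only the new token's key/value and attention reads the stored history. The sketch below is a minimal, framework-free illustration of that idea, not candle's actual API.

```rust
// Minimal sketch of a decoder KV cache (illustrative only, not candle's API).
// Each decoding step appends one key/value pair instead of recomputing all of them.

struct KvCache {
    keys: Vec<Vec<f32>>,   // one key vector per generated token
    values: Vec<Vec<f32>>, // one value vector per generated token
}

impl KvCache {
    fn new() -> Self {
        Self { keys: Vec::new(), values: Vec::new() }
    }

    // Append the new token's projections; attention then reads the full history.
    fn append(&mut self, k: Vec<f32>, v: Vec<f32>) -> (&[Vec<f32>], &[Vec<f32>]) {
        self.keys.push(k);
        self.values.push(v);
        (&self.keys, &self.values)
    }
}

fn main() {
    let mut cache = KvCache::new();
    for step in 0..3 {
        let k = vec![step as f32; 4];
        let v = vec![step as f32; 4];
        let (ks, _vs) = cache.append(k, v);
        // At step t, attention sees t + 1 cached keys; only one was newly computed.
        println!("step {}: {} cached keys", step, ks.len());
    }
}
```

This turns the per-step attention cost from quadratic recomputation into a single projection plus a lookup over the cached history, which is where the speedup comes from.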
Thanks for adding these optimizations! I am happy I could contribute.
Candle is a really nice framework.
Something seems off with the cache. With temperature zero, the cached output differs from the no-cache output; the first difference is in "mississippipi". Edit: I opened #892 to add the option of disabling the cache.
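For context on why this divergence is a red flag: with temperature zero, sampling degenerates to an argmax over the logits, so cached and uncached runs should produce identical tokens, and any difference points at the cached attention path rather than at sampling noise. A minimal greedy-selection sketch (not candle's sampling code):

```rust
// Greedy (temperature-zero) token selection: always pick the index of the
// highest logit. This is fully deterministic, so a cache-enabled run and a
// cache-free run should agree token for token.

fn argmax(logits: &[f32]) -> usize {
    logits
        .iter()
        .enumerate()
        .max_by(|(_, a), (_, b)| a.partial_cmp(b).unwrap())
        .map(|(i, _)| i)
        .unwrap()
}

fn main() {
    let logits = vec![0.1, 2.5, -0.3, 2.4];
    // Deterministic: the same logits always yield the same token id.
    assert_eq!(argmax(&logits), 1);
    println!("next token id: {}", argmax(&logits));
}
```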
T5 can be used for several tasks out of the box, such as translation and summarization, as requested in #543.
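T5 selects the task through a text prefix on the input (e.g. "translate English to German: …" or "summarize: …"), which is why one checkpoint covers several tasks out of the box. A small hedged sketch of building such prompts; the `t5_prompt` helper and the example texts are illustrative, not part of this PR:

```rust
// T5 task selection via input prefixes (a standard T5 convention).
// `t5_prompt` is a hypothetical helper that only builds the prompt string;
// it does not call the model.

fn t5_prompt(task: &str, text: &str) -> String {
    format!("{}: {}", task, text)
}

fn main() {
    let translate = t5_prompt("translate English to German", "The house is wonderful.");
    let summarize = t5_prompt("summarize", "Candle is a minimalist ML framework for Rust.");
    println!("{}", translate);
    println!("{}", summarize);
}
```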
Translation to German:
Perhaps this is not the best example of summarization, but it matches the output from huggingface/transformers:
I have also compared the output of the last hidden state to the output from the torch-based implementation.
This is terribly slow for larger models because I didn't implement any optimizations: