How to run LLaMa inference? #243
Status: Unanswered
SinanAkkoyun asked this question in Q&A
Replies: 1 comment
Hello!
How is it possible to run LLaMa models with the great FP8 inference speedup?
Would one need to train a new LLM from scratch, or is it possible to convert existing models while keeping the same accuracy?
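For context on the conversion question: one common post-training approach is to take an existing model's weights and quantize them to FP8 (E4M3) with a per-tensor scale, rather than retraining from scratch. The sketch below only *simulates* FP8 E4M3 rounding numerics in plain Python to show how small the conversion error per weight is; real FP8 speedups additionally require hardware and kernel support (and, typically, calibrated activation scales). The function name and structure here are illustrative, not an API from this project.

```python
# Hypothetical sketch: simulated ("fake") FP8 E4M3 quantization of an
# existing weight tensor, illustrating post-training conversion.
import math
import random

E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3

def quantize_dequantize_fp8(weights):
    """Scale weights into the E4M3 range, round to 3 mantissa bits, rescale."""
    amax = max(abs(w) for w in weights) or 1.0
    scale = E4M3_MAX / amax  # per-tensor scaling factor
    out = []
    for w in weights:
        x = w * scale
        if x == 0.0:
            out.append(0.0)
            continue
        # E4M3 keeps 3 mantissa bits: snap x to the nearest representable
        # step for its binade (ignoring subnormals for simplicity).
        exp = math.floor(math.log2(abs(x)))
        step = 2.0 ** (exp - 3)
        q = round(x / step) * step
        out.append(q / scale)
    return out

random.seed(0)
w = [random.gauss(0.0, 0.02) for _ in range(1024)]
wq = quantize_dequantize_fp8(w)

# With 3 mantissa bits, per-element relative rounding error is bounded
# by roughly 2**-4 = 0.0625.
rel_err = max(abs(a - b) / (abs(a) + 1e-12) for a, b in zip(w, wq))
```

This only answers the numerics half of the question: existing weights survive the FP8 cast with small per-element error, which is why post-training conversion (with calibration for activations) is viable without training a new LLM from scratch.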
Thank you very much and thank you for all the awesome work!