GGUF appears to run much slower compared to same version of GGML model #959
Unanswered — asked by J-Scott-Dav in Q&A
Replies: 0 comments
I am comparing the performance of similar models pulled from Hugging Face. Using the prompt "Name the planets in the solar system?", I found a striking difference in response times. Could someone comment on my observations below? Is this a fair comparison, and should these performance differences be expected, or am I doing something wrong?
Model Name: wizardlm-13b-v1.1-superhot-8k-ggmlv3.q4_0.bin
File size: 7.1 GB
llama-cpp-python version: 0.1.78
Time to answer question: 7.9 seconds
Model Name: wizardlm-13b-v1.2.ggmlv3.q4_0.bin
File size: 7.1 GB
llama-cpp-python version: 0.1.78
Time to answer question: 15.6 seconds
Model Name: wizardlm-13b-v1.2.q4_k_m.gguf
File size: 7.6 GB
llama-cpp-python version: 0.2.12
Time to answer question: 64.4 seconds
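For reference, the timings were measured with a simple wall-clock wrapper like the sketch below. The harness itself is generic; the llama-cpp-python call shown in the comment is only an assumption about how the models were loaded (model paths and `max_tokens` are illustrative, not taken from an actual run).

```python
import time

def time_completion(generate, prompt):
    """Time a single prompt completion.

    `generate` is any callable that takes a prompt string and
    returns a response; returns (response, elapsed_seconds).
    """
    start = time.perf_counter()
    response = generate(prompt)
    elapsed = time.perf_counter() - start
    return response, elapsed

# With llama-cpp-python, the measurement would look roughly like
# (hypothetical usage, model path from the comparison above):
#
#   from llama_cpp import Llama
#   llm = Llama(model_path="wizardlm-13b-v1.2.q4_k_m.gguf")
#   _, seconds = time_completion(
#       lambda p: llm(p, max_tokens=128),
#       "Name the planets in the solar system?",
#   )

# Dummy stand-in so the harness can be demonstrated without a model:
answer, seconds = time_completion(lambda p: p.upper(), "Name the planets?")
print(f"answered in {seconds:.4f}s")
```

Each model was asked the same question once through the same wrapper, so the numbers above include the full generation time but not model load time.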
Any comments/suggestions would be appreciated.