GGUF appears to run much slower compared to same version of GGML model #959
Unanswered — asked by J-Scott-Dav in Q&A
Replies: 0 comments
I am comparing the performance of similar models pulled from Hugging Face. Using the prompt "Name the planets in the solar system?", I found a striking difference in response times. Could someone comment on my observations below? Is this a fair comparison, and should these performance differences be expected, or am I doing something wrong?
Model Name: wizardlm-13b-v1.1-superhot-8k-ggmlv3.q4_0.bin
File size: 7.1 GB
llama-cpp-python version: 0.1.78
Time to answer question: 7.9 seconds
Model Name: wizardlm-13b-v1.2.ggmlv3.q4_0.bin
File size: 7.1 GB
llama-cpp-python version: 0.1.78
Time to answer question: 15.6 seconds
Model Name: wizardlm-13b-v1.2.q4_k_m.gguf
File size: 7.6 GB
llama-cpp-python version: 0.2.12
Time to answer question: 64.4 seconds
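For reference, the timings were measured with a simple wall-clock wrapper like the sketch below. The harness itself is generic; the llama-cpp-python call shown in the comment is only an assumption about how the models were loaded (model paths and `max_tokens` are illustrative, not taken from an actual run).

```python
import time

def time_completion(generate, prompt):
    """Time a single prompt completion.

    `generate` is any callable that takes a prompt string and
    returns a response; returns (response, elapsed_seconds).
    """
    start = time.perf_counter()
    response = generate(prompt)
    elapsed = time.perf_counter() - start
    return response, elapsed

# With llama-cpp-python, the measurement would look roughly like
# (hypothetical usage, model path from the comparison above):
#
#   from llama_cpp import Llama
#   llm = Llama(model_path="wizardlm-13b-v1.2.q4_k_m.gguf")
#   _, seconds = time_completion(
#       lambda p: llm(p, max_tokens=128),
#       "Name the planets in the solar system?",
#   )

# Dummy stand-in so the harness can be demonstrated without a model:
answer, seconds = time_completion(lambda p: p.upper(), "Name the planets?")
print(f"answered in {seconds:.4f}s")
```

Each model was asked the same question once through the same wrapper, so the numbers above include the full generation time but not model load time.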
Any comments/suggestions would be appreciated.