ggml : remove bit shuffling #1405
Conversation
This reverts commit 948d124.
**Hot topics:**

- Qauntization formats `Q4` and `Q5` have changed - requantize any old models [(info)](https://github.com/ggerganov/llama.cpp/pull/1405)
Is "qauntization" a typo? 🤔
Is there a script to upgrade the old models to new? I don't have the source models because they're huge.
#1384 does not work for NEON because when we remove the … This is the relevant section before this PR (lines 3335 to 3360 in b608b55):

We were ORing the 5th bit after the …
Hello all, could someone please share how to requantize?

Update: it also seems the README.md lacks the non-q4 variants, and a table matching the `-n` value to the quantization of the selected model.
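For anyone asking the same thing: the usual route is to requantize from the original f16 (or f32) GGML file with the repo's `quantize` tool after rebuilding at this commit; the paths below are hypothetical placeholders:

```shell
# Hypothetical paths -- adjust to wherever your models live.
# Requantize from the original f16 GGML file into one of the new formats:
./quantize models/7B/ggml-model-f16.bin models/7B/ggml-model-q5_0.bin q5_0
```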
Doesn't seem like it though, I don't see references where file version 2 is set during quantization either. Edit: I'm wrong. It's set in …
Check my repos again. I've re-quantised all my GGMLs using the latest code, in q4_0, q5_0, q5_1 and q8_0 variants. So no need to do it yourself unless you want to.
Close #1241
- Drop `Q4_2` support
- `Q4` and `Q5` formats have changed (breaking change)

New timings:
Old timings:
Overall, all these numbers seem to have about ±10% variability from run to run. Not an ideal benchmark, but I'm not sure what else to do.