ggml : add Q8_0 quantization format (rename the old one to Q8_1) (ARM NEON) #1179
Conversation
For AVX2/AVX/scalar, we might want to keep [...]. I'm actually surprised that they're worth using on ARM NEON, as the alternative is simply subtracting 8 from the Q4 quants.
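A minimal scalar sketch of the alternative mentioned above: instead of carrying a separately stored per-block term, the dot product can subtract 8 from each 4-bit quant on the fly to undo the unsigned storage bias. The block layouts, names (`block_q4_0`, `block_q8_0`, `QK`) and nibble ordering here are assumptions modelled on ggml at the time, not the PR's exact code.

```c
#include <stdint.h>

#define QK 32  // weights per quantization block (assumption)

// Hypothetical block layouts, modelled on ggml's Q4_0 / Q8_0 of that era.
typedef struct {
    float   d;          // scale
    uint8_t qs[QK / 2]; // 4-bit quants, two per byte, stored with a +8 bias
} block_q4_0;

typedef struct {
    float  d;       // scale
    int8_t qs[QK];  // signed 8-bit quants
} block_q8_0;

// Scalar dot product of one Q4_0 block with one Q8_0 block.
// The "subtract 8" trick removes the storage bias per element,
// so no precomputed per-block sum is needed.
static float vec_dot_q4_0_q8_0_block(const block_q4_0 *x, const block_q8_0 *y) {
    int sumi = 0;
    for (int i = 0; i < QK / 2; ++i) {
        const int v0 = (x->qs[i] & 0x0F) - 8; // low nibble, de-biased
        const int v1 = (x->qs[i] >>   4) - 8; // high nibble, de-biased
        sumi += v0 * y->qs[2*i + 0] + v1 * y->qs[2*i + 1];
    }
    return x->d * y->d * (float) sumi;
}
```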
@sw there is no noticeable difference between the two. Still, changed to use [...].
I guess it's not finished? You're using [...].
Wow - this is difficult 😄 I keep messing up something.
Looks good now; I think it's very slightly slower for Q4_0 and Q4_2 because we're now missing the SIMD optimizations for [...].
Ok, will merge now and we can finish the AVX stuff from [...].
8-bit integer quantization support
Perplexity: 5.9563
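For reference, a hedged sketch of what a scalar Q8_0 quantizer along these lines might look like: each block stores a single scale and QK signed 8-bit quants, while the previous 8-bit format (now renamed Q8_1) additionally carries a precomputed sum term. Names and layout are assumptions modelled on ggml, not copied from this PR.

```c
#include <math.h>
#include <stdint.h>

#define QK 32  // weights per block (assumption)

typedef struct {
    float  d;       // scale
    int8_t qs[QK];  // signed 8-bit quants
} block_q8_0;

// Reference (non-SIMD) quantization of k floats into Q8_0 blocks.
// k is assumed to be a multiple of QK.
static void quantize_row_q8_0_ref(const float *x, block_q8_0 *y, int k) {
    const int nb = k / QK;
    for (int i = 0; i < nb; ++i) {
        float amax = 0.0f; // absolute max within the block
        for (int j = 0; j < QK; ++j) {
            const float v = fabsf(x[i*QK + j]);
            if (v > amax) amax = v;
        }
        const float d  = amax / 127.0f;          // map [-amax, amax] to [-127, 127]
        const float id = d != 0.0f ? 1.0f / d : 0.0f;

        y[i].d = d;
        for (int j = 0; j < QK; ++j) {
            y[i].qs[j] = (int8_t) roundf(x[i*QK + j] * id);
        }
    }
}
```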