-
Notifications
You must be signed in to change notification settings - Fork 9.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ggml : reuse quantum structs across backends #5943
Conversation
40b0a47
to
7a8c050
Compare
I wonder if this change is worth it. @slaren what do you think? We reduce code duplication, but maybe the macros are bit too much. Any better way to do this? |
I am not sure if this would be better, but it is also possible to access the data as a |
c5fa4ed
to
f33ab14
Compare
Yup, that's an option, but I'm also not sure it would be better I'm 50/50 about merging this. We can potentially adopt this pattern and use it for other stuff that can be shared by introducing more |
6075b27
to
7741456
Compare
Avoiding code duplication is definitely good. We can always iterate over it in the future if we figure a cleaner way to do this. |
|
||
#define QK4_0 32 | ||
#define QI4_0 (QK4_0 / (4 * QR4_0)) | ||
#define QR4_0 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these CUDA-specific macros (QIX, QRX
) shouldn't be propagated to the other backends. If it so happens that another back-end uses these, it would be better to just duplicate there.
ed01e0c
to
dca5020
Compare
Ok, will merge this after the CI is green |
These thread sanitizer failures seem unrelated to our changes: https://github.com/ggerganov/llama.cpp/actions/runs/8246447769/job/22552498578?pr=5943#step:5:27 Maybe related to this: google/sanitizers#1716? |
* ggml : reuse quant blocks across backends ggml-ci * ggml : define helper constants only for CUDA and SYCL ggml-ci * ggml : define helper quantum constants for SYCL ggml-ci
* ggml : reuse quant blocks across backends ggml-ci * ggml : define helper constants only for CUDA and SYCL ggml-ci * ggml : define helper quantum constants for SYCL ggml-ci
* ggml : reuse quant blocks across backends ggml-ci * ggml : define helper constants only for CUDA and SYCL ggml-ci * ggml : define helper quantum constants for SYCL ggml-ci
Trying to see if we can have the quantum structs declared in a single place (e.g.
ggml-common.h
)#define GGML_COMMON_AGGR data
is needed becausenvcc
does not allow members with constructors (half
) in anonymousstruct
ggml-common.h
is structured in such a way to allow the following usage:block_q8_1
now usesfp16
instead offp32
ford
ands
. This was already the case for CUDA and to make things similar, I decided to apply the same format on the CPU. This type is used only to quantize activations, so it is not breaking any existing modelsTODO:
Vulkan(might not be applicable here)Kompute(might not be applicable here)