[core / Quantization] AWQ integration #27045
If the version in the config is GEMV, this will fail to load that version. Would it be appropriate to pick the WQLinear class based on the version in the config?
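For illustration, a minimal sketch of what that dispatch might look like (not the actual PR code; it assumes AutoAWQ exposes WQLinear_GEMM / WQLinear_GEMV under awq.modules.linear, which may differ across AutoAWQ versions):

```python
def get_wqlinear_cls(quant_config: dict):
    # Pick the WQLinear variant from the quantization config instead of
    # always assuming GEMM.
    version = quant_config.get("version", "gemm").lower()
    if version == "gemm":
        from awq.modules.linear import WQLinear_GEMM
        return WQLinear_GEMM
    if version == "gemv":
        from awq.modules.linear import WQLinear_GEMV
        return WQLinear_GEMV
    raise ValueError(f"Unknown AWQ version: {version}")
```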
Makes sense yes!
Let me know WDYT of 13abcf2!
This looks good, and should work as intended. If you want to test GEMV without quantizing yourself, I have quantized Vicuna 7B v1.5 in GEMV:
https://huggingface.co/casperhansen/vicuna-7b-v1.5-awq-gemv
The modules_to_not_convert=["lm_head"] may eventually cause issues if the head is not named "lm_head" exactly. In AutoAWQ we avoid this by only looking at the decoder layers of the model, obtained by calling the model's get_model_layers() function (e.g. llama).
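A rough sketch of that decoder-layers approach, assuming a Llama-style layout where the blocks live under model.model.layers (which is what AutoAWQ's llama get_model_layers() returns; other architectures keep the block list under a different attribute):

```python
import torch.nn as nn

def iter_decoder_linears(model):
    # Only walk the decoder blocks, so the output head is skipped no matter
    # what its attribute is called.
    for block in model.model.layers:
        for name, module in block.named_modules():
            if isinstance(module, nn.Linear):
                yield block, name, module
```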
Nice! For bnb we have the same issue and use this method: https://github.com/huggingface/transformers/blob/main/src/transformers/integrations/bitsandbytes.py#L243 which should be quite generic for most transformers models. I planned to use that instead.
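For reference, a minimal sketch of that kind of generic approach (the helper name here is illustrative, not the actual transformers function): derive the module to keep in full precision from the model itself via get_output_embeddings() instead of hard-coding "lm_head".

```python
def guess_modules_to_not_convert(model) -> list[str]:
    # Find the name of whatever module get_output_embeddings() returns,
    # rather than assuming the head is called "lm_head".
    output_emb = model.get_output_embeddings()
    names = [name for name, module in model.named_modules() if module is output_emb]
    return names or ["lm_head"]  # fall back to the common default
```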