Add Qwen2Moe GGUF loading support #33264

Merged


VladOS95-cyber
Contributor

@VladOS95-cyber VladOS95-cyber commented Sep 2, 2024

What does this PR do?

Add Qwen2Moe GGUF loading support
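As a rough illustration of what this loading path looks like from the user side, here is a minimal sketch modeled on the `from_pretrained(..., gguf_file=...)` calls used in the PR's own test. The helper names `load_qwen2_moe_gguf` and `generate_text` are hypothetical, and the repo id and GGUF file name are left as parameters because the exact file used by the test is not shown in this thread.

```python
def load_qwen2_moe_gguf(repo_id: str, gguf_file: str):
    """Hypothetical helper: load a tokenizer and model from a GGUF file in a Hub repo."""
    # Imported lazily so this module can be imported without the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Both calls take the GGUF file name through the `gguf_file` keyword,
    # mirroring the test shown in this PR.
    tokenizer = AutoTokenizer.from_pretrained(repo_id, gguf_file=gguf_file)
    model = AutoModelForCausalLM.from_pretrained(
        repo_id,
        gguf_file=gguf_file,
        torch_dtype=torch.float16,
    )
    return tokenizer, model


def generate_text(tokenizer, model, prompt: str, max_new_tokens: int = 10) -> str:
    """Hypothetical helper: run a short greedy generation and decode it to text."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(out[0], skip_special_tokens=True)
```

Calling `load_qwen2_moe_gguf` downloads the checkpoint, so it is only defined here, not executed.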

Before submitting

Who can review?

Regarding the task @SunMarc @LysandreJik @ArthurZucker .

@VladOS95-cyber
Contributor Author

VladOS95-cyber commented Sep 4, 2024

Hello @SunMarc @LysandreJik @ArthurZucker! I would like to ask for a code review

Member

@SunMarc SunMarc left a comment

Thanks for this clean PR @VladOS95-cyber ! LGTM !

Review thread on src/transformers/integrations/ggml.py (outdated, resolved)
@SunMarc SunMarc requested a review from ArthurZucker September 5, 2024 14:11
@VladOS95-cyber
Contributor Author

VladOS95-cyber commented Sep 5, 2024

> Thanks for this clean PR @vanpelt ! LGTM !

Thank you for the review! Why @vanpelt? =))

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@SunMarc
Member

SunMarc commented Sep 5, 2024

> Thank you for the review! Why @vanpelt? =))

My bad, edited ;) Usually the first name GitHub suggests is the author of the PR.

@VladOS95-cyber
Contributor Author

> > Thank you for the review! Why @vanpelt? =))
>
> My bad, edited ;) Usually the first name GitHub suggests is the author of the PR.

No worries at all :)

Member

@LysandreJik LysandreJik left a comment

Thanks @VladOS95-cyber!

@LysandreJik LysandreJik merged commit 5d11de4 into huggingface:main Sep 5, 2024
23 checks passed
@vanpelt
Contributor

vanpelt commented Sep 5, 2024

Anytime, guys 🤪. Nice work @VladOS95-cyber!

zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request Sep 6, 2024
* update gguf doc, config and tensor mapping

* add qwen2moe architecture support, GGUFQwen2MoeConverter and q4 unit tests

* apply code style fixes

* reformat files

* assign GGUFQwen2Converter to qwen2_moe
itazap pushed a commit to NielsRogge/transformers that referenced this pull request Sep 20, 2024
@SunMarc
Member

SunMarc commented Oct 2, 2024

The test_qwen2_moe_q4_0 test is failing both on our CI and locally, @VladOS95-cyber. Can you have a look?
Here's the traceback:

self = <ggml.test_ggml.GgufIntegrationTests testMethod=test_qwen2_moe_q4_0>

    def test_qwen2_moe_q4_0(self):
        tokenizer = AutoTokenizer.from_pretrained(self.qwen2_moe_model_id, gguf_file=self.q4_0_qwen2_moe_model_id)
        model = AutoModelForCausalLM.from_pretrained(
            self.qwen2_moe_model_id,
            gguf_file=self.q4_0_qwen2_moe_model_id,
            device_map="auto",
            torch_dtype=torch.float16,
        )

        text = tokenizer(self.example_text, return_tensors="pt").to(torch_device)
        out = model.generate(**text, max_new_tokens=10)

        EXPECTED_TEXT = "Hello everyone, I'm a newbie here and would like"
>       self.assertEqual(tokenizer.decode(out[0], skip_special_tokens=True), EXPECTED_TEXT)
E       AssertionError: 'Hello部分齐值得关注erc区域堪称 btnCancel跳舞�ASC' != "Hello everyone, I'm a newbie here and would like"
E       - Hello部分齐值得关注erc区域堪称 btnCancel跳舞ASC
E       + Hello everyone, I'm a newbie here and would like

tests/quantization/ggml/test_ggml.py:359: AssertionError

Here are some interesting logs as well:

Some weights of Qwen2MoeForCausalLM were not initialized from the model checkpoint at RichardErkhov/Qwen_-_Qwen1.5-MoE-A2.7B-Chat-gguf and are newly initialized: ['model.layers.0.mlp.experts.0.down_proj.weight', 'model.layers.0.mlp.experts.0.gate_proj.weight', 'model.layers.0.mlp.experts.0.up_proj.weight', 'model.layers.0.mlp.experts.1.down_proj.weight', 'model.layers.0.mlp.experts.1.gate_proj.weight', 'model.layers.0.mlp.experts.1.up_proj.weight', 'model.layers.0.mlp.experts.10.down_proj.weight', 'model.layers.0.mlp.experts.10.gate_proj.weight', 'model.layers.0.mlp.experts.10.up_proj.weight', 'model.layers.0.mlp.experts.11.down_proj.weight', 'model.layers.0.mlp.experts.11.gate_proj.weight', 'model.layers.0.mlp.experts.11.up_proj.weight', 'model.layers.0.mlp.experts.12.down_proj.weight', 'model.layers.0.mlp.experts.12.gate_proj.weight', 'model.layers.0.mlp.experts.12.up_proj.weight', 
.....

It looks like we didn't manage to load the weights correctly. The GgufIntegrationTests.test_bloom_q8_0 test is also failing, but it should be easier to fix:

>       self.assertEqual(tokenizer.decode(out[0], skip_special_tokens=True), EXPECTED_TEXT)
E       AssertionError: 'Hello, I just want to say that I am just' != 'Hello, I just want to say that I am very'
E       - Hello, I just want to say that I am just
E       ?                                     ^^^^
E       + Hello, I just want to say that I am very
E       ?                                     ^^^^

tests/quantization/ggml/test_ggml.py:422: AssertionError
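
To get a quick overview of the "newly initialized" warning above, the reported key names can be grouped by layer and projection. The following is only a sketch: `summarize_missing_expert_weights` is a hypothetical helper, the sample keys are copied from the truncated log, and the key pattern is assumed from those names alone.

```python
import re
from collections import defaultdict

# Matches names such as "model.layers.0.mlp.experts.10.down_proj.weight"
# (pattern inferred from the keys printed in the warning).
EXPERT_KEY = re.compile(r"^model\.layers\.(\d+)\.mlp\.experts\.(\d+)\.(\w+)\.weight$")


def summarize_missing_expert_weights(missing_keys):
    """Group expert weight names as {layer: {projection: [expert indices]}}.

    Keys that do not look like expert weights are returned separately.
    """
    summary = defaultdict(lambda: defaultdict(set))
    other = []
    for key in missing_keys:
        match = EXPERT_KEY.match(key)
        if match is None:
            other.append(key)
            continue
        layer, expert, proj = int(match.group(1)), int(match.group(2)), match.group(3)
        summary[layer][proj].add(expert)
    grouped = {
        layer: {proj: sorted(experts) for proj, experts in projs.items()}
        for layer, projs in summary.items()
    }
    return grouped, other


# A few of the keys listed in the warning (the full list is truncated in the log).
sample_keys = [
    "model.layers.0.mlp.experts.0.down_proj.weight",
    "model.layers.0.mlp.experts.0.gate_proj.weight",
    "model.layers.0.mlp.experts.0.up_proj.weight",
    "model.layers.0.mlp.experts.1.down_proj.weight",
    "model.layers.0.mlp.experts.1.gate_proj.weight",
    "model.layers.0.mlp.experts.1.up_proj.weight",
    "model.layers.0.mlp.experts.10.down_proj.weight",
    "model.layers.0.mlp.experts.10.gate_proj.weight",
    "model.layers.0.mlp.experts.10.up_proj.weight",
]
grouped, other = summarize_missing_expert_weights(sample_keys)
print(grouped)
```

If every expert index shows up for every projection in a layer, that points at the expert tensors as a whole not being mapped, rather than at a few isolated names.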

@VladOS95-cyber
Contributor Author

@SunMarc sure, I'll take a look

@VladOS95-cyber VladOS95-cyber mentioned this pull request Oct 4, 2024
5 tasks
BernardZach pushed a commit to BernardZach/transformers that referenced this pull request Dec 5, 2024
BernardZach pushed a commit to innovationcore/transformers that referenced this pull request Dec 6, 2024