Adding Support for Custom Qwen2moe Architectures with mergekit-qwen2 #6453
Conversation
Thanks @DisOOM for creating this PR based on our discussion about why MoE models based on Qwen don't work properly. I will tag @compilade and @slaren, who were involved in the PR you mentioned. However, have you tried using PR #6387 to see whether MoE models based on the Qwen architecture work properly? I am testing #6387 now for DBRX, but if it is meant to solve issues with MoE models (I'm not sure whether there is a difference between a MergeKit MoE and others like Qwen, Mixtral, and DBRX), I would personally try it to see whether my quantized Qwen MoE model works.
I haven't tried this PR yet. I will give it a try later.
I have pulled and used the latest changes from this PR. It works very well and has coherent output. However, any model quantized from this fp16 GGUF results in the following error:
@ggerganov I am not sure what causes this error. This is an MoE made by MergeKit based on Qwen models (one of those situations where the fp16 GGUF model works fine, but the quantized one either crashes or outputs nonsense).
To clarify: this has nothing to do with the fine-grained MoE architecture in Qwen/Qwen1.5-MoE-A2.7B. It is more akin to a traditional MoE, except that its experts are derived from Qwen2 (Qwen1.5) models.
I was previously using mergekit-moe to merge Qwen1.5 models into an MoE, but the resulting models were corrupted after being converted to the GGUF format.
Subsequently, I discovered a custom MergeKit script that successfully merges them into a Qwen2 MoE: https://github.com/Aratako/mergekit-qwen2. Following the example of #4912, I made some modifications to llama.cpp, enabling it to correctly convert, quantize, and run MoEs merged with this custom script.
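For readers unfamiliar with how the converter is extended, changes of this kind typically boil down to registering a Model subclass in convert-hf-to-gguf.py that maps the Hugging Face config onto GGUF metadata. A minimal sketch, assuming the converter's existing @Model.register pattern and the usual Qwen-MoE config keys (num_experts, num_experts_per_tok); this is illustrative, not the actual diff in this PR:

```python
# Sketch of a convert-hf-to-gguf.py addition. Model and gguf are already
# in scope inside that script; the exact names here are assumptions.
@Model.register("Qwen2MoeForCausalLM")
class Qwen2MoeModel(Model):
    model_arch = gguf.MODEL_ARCH.QWEN2MOE

    def set_gguf_parameters(self):
        super().set_gguf_parameters()
        # Propagate MoE hyperparameters from config.json into GGUF metadata
        # so the runtime knows how many experts exist and how many are
        # active per token.
        if (n_experts := self.hparams.get("num_experts")) is not None:
            self.gguf_writer.add_expert_count(n_experts)
        if (n_used := self.hparams.get("num_experts_per_tok")) is not None:
            self.gguf_writer.add_expert_used_count(n_used)
```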
It works well on older versions, but I encountered errors with the latest version: it can still convert and quantize correctly, but it fails to run. I believe the issue is an incompatibility with the changes made to llama.cpp in #6122, but I am unsure how to resolve it.
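For context on that kind of breakage: recent refactors of llama.cpp's MoE support moved from one 2D weight matrix per expert to a single stacked 3D tensor per layer, so a converter that still writes per-expert tensors can convert and quantize cleanly yet fail at load time. A generic sketch of the stacking step, with NumPy and illustrative shapes and names (not the actual converter code):

```python
import numpy as np

def stack_expert_tensors(expert_weights: list[np.ndarray]) -> np.ndarray:
    """Merge per-expert 2D matrices of shape [n_out, n_in] into a single
    [n_expert, n_out, n_in] tensor, the layout stacked-expert loaders expect."""
    assert all(w.shape == expert_weights[0].shape for w in expert_weights)
    return np.stack(expert_weights, axis=0)

# e.g. collect every expert's ffn_gate weight for one layer, then write
# one stacked tensor instead of N per-expert tensors (shapes are made up)
gate_experts = [np.zeros((11008, 4096), dtype=np.float16) for _ in range(4)]
stacked = stack_expert_tensors(gate_experts)  # shape: (4, 11008, 4096)
```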
I am new to coding and this is my first PR, so please be lenient.
I encountered no issues converting with convert-hf-to-gguf.py or quantizing with quantize.exe, but I ran into the following issues when running main.exe.
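When a model converts and quantizes fine but fails at load time, one quick way to narrow the problem down is to inspect the GGUF's tensor names and shapes and compare them against what the loader expects (per-expert 2D tensors vs. a stacked 3D tensor). A small sketch using the gguf Python package from llama.cpp's gguf-py; the file path is a placeholder:

```python
# List MoE-related tensors in a GGUF file to check whether experts are
# stored per-expert or stacked. The path is a placeholder.
from gguf import GGUFReader

reader = GGUFReader("qwen2moe-merge-f16.gguf")
for t in reader.tensors:
    if "ffn" in t.name or "exp" in t.name:
        print(t.name, list(t.shape), t.tensor_type)
```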