Community contribution: Adding GGUF support for more architectures #33260
Comments
@SunMarc I am going to take Qwen2Moe
@SunMarc I want to take Gemma2
@SunMarc May I suggest and take T5? The GGUF version of the T5 encoder seems to be widely used alongside FLUX.
@SunMarc Hello! Unless someone else is already working on this model, may I take MiniCPM-V?
Added @junejae!
Hi @010kim, thanks for the interest! MiniCPM-V model relies on
@SunMarc Thank you so much for your response. It also makes sense that the author should work on it. What about Cohere? Can I take it?
Hi @SunMarc 👋🏻
Hey @SunMarc 🙋♂️
Hi @SunMarc, I'll take BLOOM if nobody is working on it
Hi @SunMarc, I'd like to handle the work related to Codestral :)
Hi @SunMarc,
Hi @SunMarc, I'd like to work on the BLIP model, but after researching, I found that it might be challenging due to the vision model structure. Would it be alright if I switched to working on the Smol model instead?
Hey @SunMarc 🤗
@SunMarc If not, I'll proceed with switching to the DBRX model.
Oh indeed, this is because it is a llama architecture. |
Hi @SunMarc! I am going to start working on the StableLM model
Is anyone working on Gemma2? If not, I would like to take it on!
Hi @SunMarc! I believe GPT-2 GGUF is not supported yet; if that is the case, I'll take it
Codestral's tokenizer turned out to be just the llama tokenizer.
I went through the code and was able to load the Cohere gguf model, but could not load the tokenizer. This is because a slow tokenizer for Cohere is not implemented in Hugging Face transformers (only a fast tokenizer is available for Cohere). Is there a workaround for this? @SunMarc
Hey @SunMarc! I'll take Starcoder2 as the next model
Hi @SunMarc! I am going to start working on Mamba
Are you still working on Gemma2, @yijun-lee @KingNish24? If not, could I try working on it? Thank you!
I'm running behind schedule, but I'm making progress! I'll handle it.
Glad to hear it! Then could I try working on Nemotron? @SunMarc
Could you please check my PR, @SunMarc? Thank you
Feature request
Recently, we have added the ability to load `gguf` files within transformers. The goal was to offer users the possibility to further train/fine-tune their gguf models.
Workflow:
1) Load the gguf file in transformers: we dequantize the weights to fp32, then load them to be used with PyTorch.
2) Train/fine-tune.
3) Convert the model back to gguf to use in the ggml ecosystem, using the `convert_hf_to_gguf` script or the `gguf-my-repo` space if you pushed your model to the Hub.
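As a rough illustration of the "dequantize the weights to fp32" step, here is a minimal pure-Python sketch for a single GGUF quantization type, Q8_0. It assumes ggml's Q8_0 block layout: blocks of 32 values, each stored as a little-endian fp16 scale `d` followed by 32 signed 8-bit quants `q`, with each value recovered as `d * q`. This is not the transformers implementation, and the function name `dequantize_q8_0` is hypothetical.

```python
import struct

QK8_0 = 32              # values per Q8_0 block (assumed ggml layout)
BLOCK_SIZE = 2 + QK8_0  # 2-byte fp16 scale + 32 int8 quants


def dequantize_q8_0(data: bytes) -> list[float]:
    """Hypothetical helper: turn raw Q8_0 bytes into a flat list of floats.

    Each block is a little-endian fp16 scale `d` followed by 32 signed
    8-bit quants `q`; every dequantized value is d * q.
    """
    if len(data) % BLOCK_SIZE != 0:
        raise ValueError("data length is not a multiple of the Q8_0 block size")
    out: list[float] = []
    for offset in range(0, len(data), BLOCK_SIZE):
        # "<e" unpacks one little-endian IEEE half-precision float.
        (d,) = struct.unpack_from("<e", data, offset)
        # The next 32 bytes are the signed int8 quants of this block.
        quants = struct.unpack_from(f"<{QK8_0}b", data, offset + 2)
        out.extend(d * q for q in quants)
    return out
```

For example, a single block with scale 0.5 and quants -16..15 built via `struct.pack("<e", 0.5) + struct.pack("<32b", *range(-16, 16))` dequantizes to -8.0, -7.5, ..., 7.5. In a real loader, the flat values would then be reshaped to the tensor's shape and stored as an fp32 PyTorch tensor; Python floats only stand in for that here.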
Let's try to add GGUF support for more architectures! Currently supported architectures are
It would be great to add support for more architectures such as
... and many more (feel free to suggest more architectures! The model needs to be integrated in transformers)
Adding this feature requires following the same protocol as in this PR:
1) Add entries to `GGUF_TENSOR_MAPPING` and `GGUF_CONFIG_MAPPING` in order to map the tensors/config of the gguf file to the ones in transformers.
2) Add a `GGUFXXXConverter(XXXConverter)` class to convert the gguf tokenizer to a transformers one.

If you are interested in taking up the challenge, comment below with the name of the architecture you want to integrate and open a PR!
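To make the mapping step more concrete, here is a simplified sketch of how the two mappings could be applied to a llama-style checkpoint. The key names mirror GGUF's llama naming (`blk.N.attn_q.weight`, `llama.block_count`, ...), but the dictionaries `TENSOR_MAPPING` / `CONFIG_MAPPING` and the helpers `rename_gguf_tensor` and `build_config` are hypothetical illustrations, not the exact contents of transformers' `GGUF_TENSOR_MAPPING` / `GGUF_CONFIG_MAPPING`.

```python
# Hypothetical, simplified entries for a llama-style architecture.
# Order matters: "attn_output" must be replaced before "output.weight",
# because "attn_output.weight" also contains the substring "output.weight".
TENSOR_MAPPING = {
    "token_embd": "model.embed_tokens",
    "blk": "model.layers",
    "ffn_up": "mlp.up_proj",
    "ffn_down": "mlp.down_proj",
    "ffn_gate": "mlp.gate_proj",
    "ffn_norm": "post_attention_layernorm",
    "attn_norm": "input_layernorm",
    "attn_q": "self_attn.q_proj",
    "attn_k": "self_attn.k_proj",
    "attn_v": "self_attn.v_proj",
    "attn_output": "self_attn.o_proj",
    "output.weight": "lm_head.weight",
    "output_norm": "model.norm",
}

# GGUF metadata key (architecture prefix stripped) -> config attribute.
# None marks keys that are read but have no direct config counterpart.
CONFIG_MAPPING = {
    "context_length": "max_position_embeddings",
    "block_count": "num_hidden_layers",
    "feed_forward_length": "intermediate_size",
    "embedding_length": "hidden_size",
    "attention.head_count": "num_attention_heads",
    "attention.head_count_kv": "num_key_value_heads",
    "attention.layer_norm_rms_epsilon": "rms_norm_eps",
    "rope.freq_base": "rope_theta",
    "rope.dimension_count": None,
}


def rename_gguf_tensor(name: str) -> str:
    """Translate a GGUF tensor name into a transformers state-dict key."""
    for gguf_part, hf_part in TENSOR_MAPPING.items():
        if gguf_part in name:
            name = name.replace(gguf_part, hf_part)
    return name


def build_config(metadata: dict) -> dict:
    """Build config kwargs from GGUF metadata such as {"llama.block_count": 22}."""
    arch = metadata["general.architecture"]
    prefix = arch + "."
    config = {"model_type": arch}
    for key, value in metadata.items():
        if not key.startswith(prefix):
            continue  # skip "general.*", "tokenizer.*", etc.
        hf_key = CONFIG_MAPPING.get(key[len(prefix):])
        if hf_key is not None:
            config[hf_key] = value
    return config
```

With these tables, `rename_gguf_tensor("blk.0.attn_q.weight")` gives `"model.layers.0.self_attn.q_proj.weight"`. Plain substring replacement keeps the tables short, at the cost of making their ordering significant, which is worth keeping in mind when adding an architecture whose names overlap.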
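For the tokenizer item, a converter first has to read the vocabulary stored in the GGUF metadata. The sketch below assumes the standard GGUF tokenizer keys (`tokenizer.ggml.model`, `tokenizer.ggml.tokens`, `tokenizer.ggml.merges`, `tokenizer.ggml.bos_token_id`, `tokenizer.ggml.eos_token_id`) and gathers them into a plain container that a `GGUFXXXConverter` could then feed into the existing `XXXConverter` logic. `GGUFTokenizerFields` and `read_tokenizer_fields` are hypothetical names, and building the actual `tokenizers` object is left out.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class GGUFTokenizerFields:
    """Hypothetical container for the tokenizer data stored in a GGUF file."""
    model: str                 # e.g. "llama" (SentencePiece-style) or "gpt2" (BPE)
    vocab: dict[str, int]      # token -> id
    merges: list[tuple[str, str]] = field(default_factory=list)
    bos_token_id: Optional[int] = None
    eos_token_id: Optional[int] = None


def read_tokenizer_fields(metadata: dict) -> GGUFTokenizerFields:
    """Collect tokenizer fields from already-parsed GGUF metadata."""
    prefix = "tokenizer.ggml."
    tokens = metadata[prefix + "tokens"]
    # GGUF stores tokens as a list indexed by id; invert it into a vocab dict.
    vocab = {token: idx for idx, token in enumerate(tokens)}
    # BPE merges are stored as space-separated pairs such as "h e".
    merges = [tuple(m.split(" ", 1)) for m in metadata.get(prefix + "merges", [])]
    return GGUFTokenizerFields(
        model=metadata[prefix + "model"],
        vocab=vocab,
        merges=merges,
        bos_token_id=metadata.get(prefix + "bos_token_id"),
        eos_token_id=metadata.get(prefix + "eos_token_id"),
    )
```

One plausible design, under these assumptions, is for the `GGUFXXXConverter` subclass to override only the parts of `XXXConverter` that look up the vocabulary and merges, so the rest of the conversion logic is reused unchanged.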
Once you open a PR, feel free to ping @SunMarc @LysandreJik @ArthurZucker for a review!
Motivation
Support for more gguf models
Your contribution
Reviewing PRs and possibly adding support for more models