convert.py fails importing a new model architecture #7406
Even if this works, it will likely think this is just two linear layers. I don't know enough about llama.cpp to help more, but IIRC the Qwen models have some affine projections in them and use …
If it's a variation of an existing architecture, you might be able to simply specify new optional tensors on model load and then detect their presence in the compute graph to use them when they are present. This is kind of how StableLM2 1.6B support was added in #5052.
https://github.com/ggerganov/llama.cpp/blob/master/docs/HOWTO-add-model.md
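To make that suggestion a bit more concrete, here is a minimal sketch of the graph-side check, assuming the extra tensors were registered as optional at load time. The names output_extra_w and output_extra_b are hypothetical, not existing llama.cpp symbols:

```cpp
#include "ggml.h"

// Sketch only: output_extra_w / output_extra_b are hypothetical names for the
// new optional tensors. Optional tensors that are absent from the GGUF file
// stay null after model load, so the graph builder can simply test for them.
static struct ggml_tensor * build_output_head(
        struct ggml_context * ctx0,
        struct ggml_tensor  * cur,            // hidden state from the last block
        struct ggml_tensor  * output,         // the usual lm_head projection
        struct ggml_tensor  * output_extra_w, // optional extra projection (may be null)
        struct ggml_tensor  * output_extra_b) // optional bias for it (may be null)
{
    // extra affine projection, applied only when the tensors were present
    if (output_extra_w != nullptr) {
        cur = ggml_mul_mat(ctx0, output_extra_w, cur);
        if (output_extra_b != nullptr) {
            cur = ggml_add(ctx0, cur, output_extra_b);
        }
    }

    // standard output projection to vocabulary logits
    cur = ggml_mul_mat(ctx0, output, cur);
    return cur;
}
```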
Thanks, this looks like what I need. Google has gotten really bad at finding things lately.
I am trying to port a new model I've created to GGUF; however, I'm hitting issues in convert.py. Specifically, it seems to be confused that my lm_head has two linear layers.
I get the error:
If I add lm_head.linear1 and lm_head.linear2 to gguf-py/gguf/tensor_mapping.py, the conversion will run; however, when I actually try to use the result with llama.cpp, it complains about 2 missing layers.
Can you provide some tips on what I need to modify to make this work? Also, if there is any documentation on porting new model architectures, I would appreciate it if you could point me to it.
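The "missing layers" complaint is what the load-time half of the earlier suggestion is meant to address: register the extra lm_head tensors as optional so that model files without them still load. A rough sketch of that registration follows; the tensor enum names, the model member names, and the trailing required flag are assumptions for illustration, not actual llama.cpp symbols:

```cpp
// Rough sketch, not actual llama.cpp code: LLM_TENSOR_OUTPUT_LINEAR1/2, the
// model.output_linear* members, and the final "required" flag are assumed
// names. Registering the tensors as not required lets files without them
// still load, and the graph builder can then check whether they are non-null
// (see the sketch earlier in the thread).
model.output_linear1 = ml.create_tensor(ctx_output,
        tn(LLM_TENSOR_OUTPUT_LINEAR1, "weight"), {n_embd, n_embd},  /*required =*/ false);
model.output_linear2 = ml.create_tensor(ctx_output,
        tn(LLM_TENSOR_OUTPUT_LINEAR2, "weight"), {n_embd, n_vocab}, /*required =*/ false);
```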