
convert.py fails importing a new model architecture #7406

Closed · JohnSully opened this issue May 20, 2024 · 3 comments
Labels: question (Further information is requested)

JohnSully commented May 20, 2024

I am trying to port a new model I've created to GGUF; however, I'm hitting issues in convert.py. Specifically, it seems to be confused by my lm_head having two linear layers.

I get the error:

Exception: Unexpected tensor name: lm_head.linear1.bias

If I add lm_head.linear1 and lm_head.linear2 to gguf-py/gguf/tensor_mapping.py, convert.py will run; however, when I actually try to use the result with llama.cpp, it complains about two missing tensors:

error loading model: done_getting_tensors: wrong number of tensors; expected 293, got 291
llama_load_model_from_file: failed to load model
main: error: unable to load model

Can you provide some tips on what I need to modify to make this work? Also, if there is any documentation on porting new model architectures, I would appreciate it if you could point me to it.
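For context: gguf-py/gguf/tensor_mapping.py maps each architecture-level tensor to the checkpoint names it may appear under, and convert.py rejects any name it cannot map. Below is a minimal, self-contained sketch of that lookup, not the real file; the MODEL_TENSOR enum here is a stand-in, and the lm_head.linear1/linear2 entries are the workaround from this issue, not real entries:

```python
# Sketch of gguf-py's tensor name mapping (assumed dict-of-tuples layout,
# heavily simplified; not the actual contents of tensor_mapping.py).
from enum import Enum, auto

class MODEL_TENSOR(Enum):
    OUTPUT = auto()  # the single lm_head projection llama.cpp expects

# Each architecture tensor maps to the checkpoint names it may appear under.
mappings_cfg: dict[MODEL_TENSOR, tuple[str, ...]] = {
    MODEL_TENSOR.OUTPUT: (
        "embed_out",        # gptneox-style checkpoints
        "lm_head",          # llama-style checkpoints
        "lm_head.linear1",  # hypothetical workaround from this issue
        "lm_head.linear2",  # hypothetical workaround from this issue
    ),
}

def map_name(checkpoint_name: str) -> MODEL_TENSOR | None:
    """Reverse lookup: which architecture tensor owns this checkpoint name?"""
    base = checkpoint_name.removesuffix(".weight").removesuffix(".bias")
    for tensor, names in mappings_cfg.items():
        if base in names:
            return tensor
    return None  # unmapped -> "Unexpected tensor name" in convert.py
```

Note that the mapping only renames tensors on the conversion side; llama.cpp's C++ loader still enumerates a fixed tensor set per architecture, which is consistent with the "wrong number of tensors; expected 293, got 291" failure above: the GGUF file carries tensors the C++ side never claims.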

jukofyork (Contributor) commented May 20, 2024

> If I add lm_head.linear1 and lm_head.linear2

Even if this works, they will likely be treated as just two weight-only linear projections in series, whereas using a .bias requires an affine projection.

I don't know enough about llama.cpp to help more, but IIRC the Qwen models have some affine projections in them and use .bias as well as .weight, so those might be worth a look.
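To illustrate the distinction being drawn here: a pure linear projection computes y = Wx, while an affine projection computes y = Wx + b. A minimal PyTorch sketch of the kind of two-layer head this issue describes (the layer names match the error message, but the sizes and module structure are assumptions):

```python
import torch
import torch.nn as nn

# Hypothetical sketch of an lm_head with two affine sublayers.
# Each nn.Linear with bias=True is an affine map y = W x + b,
# not a pure linear map y = W x.
class TwoLayerHead(nn.Module):
    def __init__(self, hidden_size: int, vocab_size: int):
        super().__init__()
        self.linear1 = nn.Linear(hidden_size, hidden_size, bias=True)  # lm_head.linear1.{weight,bias}
        self.linear2 = nn.Linear(hidden_size, vocab_size, bias=True)   # lm_head.linear2.{weight,bias}

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.linear2(self.linear1(x))

head = TwoLayerHead(hidden_size=4096, vocab_size=32000)
print([name for name, _ in head.named_parameters()])
# ['linear1.weight', 'linear1.bias', 'linear2.weight', 'linear2.bias']
# The .bias tensors are exactly what a weight-only mapping would drop.
```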

mofosyne added the question (Further information is requested) label on May 21, 2024
compilade (Collaborator) commented

> Can you provide some tips on what I need to modify to make this work?

If it's a variation of an existing architecture, you might be able to simply declare the new tensors as optional on model load, then check for their presence when building the compute graph and use them only when they exist.

This is kind of how StableLM2 1.6B support was added in #5052.
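A rough sketch of that pattern, written in Python purely for illustration (llama.cpp implements it in C++ with ggml, and the tensor names here are hypothetical): declare the extra tensor optional at load time, then branch on its presence while building the forward pass.

```python
import numpy as np

def build_output_head(x: np.ndarray, tensors: dict[str, np.ndarray]) -> np.ndarray:
    """Apply the output projection, using an optional bias when present."""
    w = tensors["output.weight"]    # required: loading fails without it
    b = tensors.get("output.bias")  # optional: absent in most checkpoints
    y = x @ w.T
    if b is not None:               # detect presence, use only when present
        y = y + b
    return y

# Works with or without the optional tensor:
rng = np.random.default_rng(0)
tensors = {"output.weight": rng.standard_normal((8, 4), dtype=np.float32)}
logits = build_output_head(rng.standard_normal((2, 4), dtype=np.float32), tensors)
```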

> Also, if there is any documentation on porting new model architectures, I would appreciate it if you could point me to it.

https://github.com/ggerganov/llama.cpp/blob/master/docs/HOWTO-add-model.md

JohnSully (Author) commented May 21, 2024 via email

Galunid closed this as completed on May 21, 2024