
Not Compatible with Models quantized with updated llama.cpp q4 and q5 quantization. #227

Closed
awinml opened this issue May 17, 2023 · 7 comments
Labels
llama.cpp (Problem with llama.cpp shared lib), model (Model specific issue)

Comments


awinml commented May 17, 2023

llama-cpp-python is not compatible with models quantized with the updated llama.cpp q4 and q5 quantization formats released in llama.cpp PR 1405.

awinml changed the title from "Not Compatible with Models quantized with updated llama.cpp q4 and q5 quantization released in [llama.cpp PR 1405](https://github.com/ggerganov/llama.cpp/pull/1405)" to "Not Compatible with Models quantized with updated llama.cpp q4 and q5 quantization." May 17, 2023
awinml (Author) commented May 17, 2023

@abetlen Any ideas on how to use the new models?

abetlen (Owner) commented May 17, 2023

@awinml Are you using the latest version? The current version, 0.1.50, is pinned to this commit, which includes those changes.
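
One quick way to confirm which version is actually installed (this is just the standard-library route; pip install -U llama-cpp-python upgrades it):

import importlib.metadata

# Should print 0.1.50 or newer; if not, upgrade with: pip install -U llama-cpp-python
print(importlib.metadata.version("llama-cpp-python"))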

gjmulder added the llama.cpp (Problem with llama.cpp shared lib) and model (Model specific issue) labels May 17, 2023
awinml (Author) commented May 19, 2023

@abetlen Updating the version to 0.1.50 resolved the issue. Thanks!

@awinml awinml closed this as completed May 19, 2023
KocWozniakPiotr commented:

When using versions 0.1.51 and 0.1.52 the problem still persists:

llama.cpp: loading model from Wizard-Vicuna-13B-q4_1.bin
error loading model: unknown (magic, version) combination: 67676a74, 00000003; is this really a GGML file?
llama_init_from_file: failed to load model

Old models work fine through Python, though, and the new models run fine when I use llama.cpp directly.

Am I missing something or is there a bug somewhere?

marcus800 commented:

> When using versions 0.1.51 and 0.1.52 the problem still persists:
> error loading model: unknown (magic, version) combination: 67676a74, 00000003; is this really a GGML file?

Similarly, I see the same problem: the model works through an os.system call to ./main, but not through Llama().
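
A minimal sketch of the two paths, for comparison (the filename is just the one from the error above; Llama(model_path=...) is the binding's entry point):

import os
from llama_cpp import Llama

MODEL = "./Wizard-Vicuna-13B-q4_1.bin"

# Path 1: shell out to the llama.cpp binary; the new (ggjt v3) file loads fine here.
os.system(f"./main -m {MODEL} -p 'Hello' -n 16")

# Path 2: load through the Python binding; with an older llama-cpp-python this is
# where the "unknown (magic, version) combination" error appears.
llm = Llama(model_path=MODEL)
print(llm("Hello", max_tokens=16))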

gjmulder (Contributor) commented:

There are now three versions of llama model files, following rapid development of the quantization code in llama.cpp.

About a week ago the format changed to v2, which required llama-cpp-python==0.1.50. 00000003, however, looks like a bad version number.

Below is a quick way to verify your model versions.

$ cat ../ggml_file.sh
#!/bin/bash
# Print the magic and version words from the start of a GGML model file.
# xxd groups its output into 2-byte words: $2/$3 hold the 4-byte magic and $4
# the low half of the version field (so v1/v2/v3 print as 0x0100/0x0200/0x0300).
xxd "$1" | awk -v fname="$1" '{printf("magic: 0x%8s, version: 0x%4s, file: %s\n", $3$2, $4, fname); exit}'

$ find . -name "*.bin" -exec ../ggml_file.sh {} \; | sort -n -k 4,4 | egrep "0x0[123]00"
magic: 0x6767666d, version: 0x0100, file: ./alpaca-13B-ggml/ggml-model-q4_0.bin
magic: 0x6767666d, version: 0x0100, file: ./alpaca-30B-ggml/ggml-model-q4_0.bin
magic: 0x6767666d, version: 0x0100, file: ./alpaca-7B-ggml/ggml-model-q4_0.bin
magic: 0x6767746a, version: 0x0100, file: ./alpaca-7b-native-enhanced/ggml-model-q4_0.bin
magic: 0x6767746a, version: 0x0100, file: ./alpaca-7b-native-enhanced/ggml-model-q4_1.bin
magic: 0x6767746a, version: 0x0100, file: ./alpaca-7b-native-enhanced/ggml-model-q4_2.bin
magic: 0x6767746a, version: 0x0100, file: ./alpaca-7b-native-enhanced/ggml-model-q4_3.bin
magic: 0x6767746a, version: 0x0100, file: ./alpaca-7b-native-enhanced/ggml-model-q5_0.bin
magic: 0x6767746a, version: 0x0100, file: ./alpaca-7b-native-enhanced/ggml-model-q5_1.bin
magic: 0x6767746a, version: 0x0100, file: ./alpaca-7b-native-enhanced/ggml-model-q8_0.bin
magic: 0x6767746a, version: 0x0100, file: ./legacy-ggml-vicuna-13b-4bit/ggml-vicuna-13b-4bit-rev1.bin
magic: 0x6767746a, version: 0x0100, file: ./legacy-ggml-vicuna-13b-4bit/ggml-vicuna-13b-4bit.bin
magic: 0x6767746a, version: 0x0100, file: ./vicuna-13B-1.1-GPTQ-4bit-128g-GGML/vicuna-13B-1.1-GPTQ-4bit-128g.GGML.bin
magic: 0x6767746a, version: 0x0100, file: ./vicuna-13B-1.1-GPTQ-4bit-128g-GGML/vicuna-13B-1.1-GPTQ-4bit-32g.GGML.bin
magic: 0x6767746a, version: 0x0200, file: ./GPT4All-13B-snoozy-GGML/GPT4All-13B-snoozy.ggml.q4_0.bin
magic: 0x6767746a, version: 0x0200, file: ./GPT4All-13B-snoozy-GGML/GPT4All-13B-snoozy.ggml.q5_0.bin
magic: 0x6767746a, version: 0x0200, file: ./GPT4All-13B-snoozy-GGML/GPT4All-13B-snoozy.ggml.q5_1.bin
magic: 0x6767746a, version: 0x0200, file: ./GPT4All-13B-snoozy-GGML/GPT4All-13B-snoozy.ggml.q8_0.bin
magic: 0x6767746a, version: 0x0200, file: ./Wizard-Vicuna-13B-Uncensored-GGML/Wizard-Vicuna-13B-Uncensored.ggml.q4_0.bin
magic: 0x6767746a, version: 0x0200, file: ./Wizard-Vicuna-13B-Uncensored-GGML/Wizard-Vicuna-13B-Uncensored.ggml.q5_0.bin
magic: 0x6767746a, version: 0x0200, file: ./Wizard-Vicuna-13B-Uncensored-GGML/Wizard-Vicuna-13B-Uncensored.ggml.q5_1.bin
magic: 0x6767746a, version: 0x0200, file: ./Wizard-Vicuna-13B-Uncensored-GGML/Wizard-Vicuna-13B-Uncensored.ggml.q8_0.bin
magic: 0x6767746a, version: 0x0200, file: ./gpt4-alpaca-lora-30B-4bit-GGML/gpt4-alpaca-lora-30b.ggml.q4_0.bin
magic: 0x6767746a, version: 0x0200, file: ./gpt4-alpaca-lora-30B-4bit-GGML/gpt4-alpaca-lora-30b.ggml.q5_0.bin
magic: 0x6767746a, version: 0x0200, file: ./gpt4-alpaca-lora-30B-4bit-GGML/gpt4-alpaca-lora-30b.ggml.q5_1.bin
magic: 0x6767746a, version: 0x0200, file: ./gpt4-alpaca-lora-30B-4bit-GGML/gpt4-alpaca-lora-30b.ggml.q8_0.bin
magic: 0x6767746a, version: 0x0300, file: ./dromedary-65B-lora-GGML/dromedary-lora-65B.ggmlv3.q4_0.bin
magic: 0x6767746a, version: 0x0300, file: ./dromedary-65B-lora-GGML/dromedary-lora-65B.ggmlv3.q4_1.bin
magic: 0x6767746a, version: 0x0300, file: ./dromedary-65B-lora-GGML/dromedary-lora-65B.ggmlv3.q5_0.bin
magic: 0x6767746a, version: 0x0300, file: ./dromedary-65B-lora-GGML/dromedary-lora-65B.ggmlv3.q5_1.bin
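
A rough Python equivalent of the shell check, assuming only that the file starts with llama.cpp's 4-byte magic followed by a 4-byte version word:

import struct
import sys

# GGML container magics as defined in llama.cpp; old 'ggml' files carry no
# version field, so the second word is meaningless for those.
MAGICS = {0x67676D6C: "ggml", 0x67676D66: "ggmf", 0x67676A74: "ggjt"}

with open(sys.argv[1], "rb") as f:
    magic, version = struct.unpack("<II", f.read(8))

print(f"magic: {magic:#010x} ({MAGICS.get(magic, 'unknown')}), version: {version}, file: {sys.argv[1]}")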

marcus800 commented:

Works with commit 08737ef:
llama_model_load_internal: format = ggjt v2 (latest)

But not with models quantized by newer versions of ggerganov/llama.cpp. I guess llama-cpp-python is not yet ready to work with version 3?
