
Not Compatible with Models quantized with updated llama.cpp q4 and q5 quantization. #227

Closed
awinml opened this issue May 17, 2023 · 7 comments
Labels
llama.cpp (Problem with llama.cpp shared lib), model (Model specific issue)

Comments


awinml commented May 17, 2023

llama-cpp-python is not compatible with models quantized with the updated llama.cpp q4 and q5 quantization formats released in llama.cpp PR 1405.

awinml changed the title from "Not Compatible with Models quantized with updated llama.cpp q4 and q5 quantization released in [llama.cpp PR 1405](https://github.com/ggerganov/llama.cpp/pull/1405)" to "Not Compatible with Models quantized with updated llama.cpp q4 and q5 quantization." May 17, 2023
awinml (Author) commented May 17, 2023

@abetlen Any ideas on how to use the new models?

abetlen (Owner) commented May 17, 2023

@awinml Are you using the latest version? The current version, 0.1.50, is pinned to this commit, which includes those changes.
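
One quick way to confirm which version is actually installed (this is just the standard-library route; pip install -U llama-cpp-python upgrades it):

import importlib.metadata

# Should print 0.1.50 or newer; if not, upgrade with: pip install -U llama-cpp-python
print(importlib.metadata.version("llama-cpp-python"))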

gjmulder added the llama.cpp (Problem with llama.cpp shared lib) and model (Model specific issue) labels May 17, 2023
awinml (Author) commented May 19, 2023

@abetlen Updating the version to 0.1.50 resolved the issue. Thanks!

@awinml awinml closed this as completed May 19, 2023
KocWozniakPiotr commented:

When using versions 0.1.51 and 0.1.52 the problem still persists:

llama.cpp: loading model from Wizard-Vicuna-13B-q4_1.bin
error loading model: unknown (magic, version) combination: 67676a74, 00000003; is this really a GGML file?
llama_init_from_file: failed to load model

Old models work fine through Python, though, and the new models run fine when I use llama.cpp directly.

Am I missing something or is there a bug somewhere?

marcus800 commented:

> When using versions 0.1.51 and 0.1.52 the problem still persists:
> error loading model: unknown (magic, version) combination: 67676a74, 00000003; is this really a GGML file?

Similarly, I see the same problem: the model works through an os.system call to ./main, but not through Llama().
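
A minimal sketch of the two paths, for comparison (the filename is just the one from the error above; Llama(model_path=...) is the binding's entry point):

import os
from llama_cpp import Llama

MODEL = "./Wizard-Vicuna-13B-q4_1.bin"

# Path 1: shell out to the llama.cpp binary; the new (ggjt v3) file loads fine here.
os.system(f"./main -m {MODEL} -p 'Hello' -n 16")

# Path 2: load through the Python binding; with an older llama-cpp-python this is
# where the "unknown (magic, version) combination" error appears.
llm = Llama(model_path=MODEL)
print(llm("Hello", max_tokens=16))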

gjmulder (Contributor) commented:

There are now three versions of llama model files, following rapid development of the quantization code in llama.cpp.

About a week ago the format changed to v2, which required llama-cpp-python==0.1.50. 00000003, however, looks like a bad version number.

Below is a quick way to verify your model versions.

$ cat ../ggml_file.sh
#!/bin/bash
# Print the magic and version words from the start of a GGML model file.
# xxd groups its output into 2-byte words: $2/$3 hold the 4-byte magic and $4
# the low half of the version field (so v1/v2/v3 print as 0x0100/0x0200/0x0300).
xxd "$1" | awk -v fname="$1" '{printf("magic: 0x%8s, version: 0x%4s, file: %s\n", $3$2, $4, fname); exit}'

$ find . -name "*.bin" -exec ../ggml_file.sh {} \; | sort -n -k 4,4 | egrep "0x0[123]00"
magic: 0x6767666d, version: 0x0100, file: ./alpaca-13B-ggml/ggml-model-q4_0.bin
magic: 0x6767666d, version: 0x0100, file: ./alpaca-30B-ggml/ggml-model-q4_0.bin
magic: 0x6767666d, version: 0x0100, file: ./alpaca-7B-ggml/ggml-model-q4_0.bin
magic: 0x6767746a, version: 0x0100, file: ./alpaca-7b-native-enhanced/ggml-model-q4_0.bin
magic: 0x6767746a, version: 0x0100, file: ./alpaca-7b-native-enhanced/ggml-model-q4_1.bin
magic: 0x6767746a, version: 0x0100, file: ./alpaca-7b-native-enhanced/ggml-model-q4_2.bin
magic: 0x6767746a, version: 0x0100, file: ./alpaca-7b-native-enhanced/ggml-model-q4_3.bin
magic: 0x6767746a, version: 0x0100, file: ./alpaca-7b-native-enhanced/ggml-model-q5_0.bin
magic: 0x6767746a, version: 0x0100, file: ./alpaca-7b-native-enhanced/ggml-model-q5_1.bin
magic: 0x6767746a, version: 0x0100, file: ./alpaca-7b-native-enhanced/ggml-model-q8_0.bin
magic: 0x6767746a, version: 0x0100, file: ./legacy-ggml-vicuna-13b-4bit/ggml-vicuna-13b-4bit-rev1.bin
magic: 0x6767746a, version: 0x0100, file: ./legacy-ggml-vicuna-13b-4bit/ggml-vicuna-13b-4bit.bin
magic: 0x6767746a, version: 0x0100, file: ./vicuna-13B-1.1-GPTQ-4bit-128g-GGML/vicuna-13B-1.1-GPTQ-4bit-128g.GGML.bin
magic: 0x6767746a, version: 0x0100, file: ./vicuna-13B-1.1-GPTQ-4bit-128g-GGML/vicuna-13B-1.1-GPTQ-4bit-32g.GGML.bin
magic: 0x6767746a, version: 0x0200, file: ./GPT4All-13B-snoozy-GGML/GPT4All-13B-snoozy.ggml.q4_0.bin
magic: 0x6767746a, version: 0x0200, file: ./GPT4All-13B-snoozy-GGML/GPT4All-13B-snoozy.ggml.q5_0.bin
magic: 0x6767746a, version: 0x0200, file: ./GPT4All-13B-snoozy-GGML/GPT4All-13B-snoozy.ggml.q5_1.bin
magic: 0x6767746a, version: 0x0200, file: ./GPT4All-13B-snoozy-GGML/GPT4All-13B-snoozy.ggml.q8_0.bin
magic: 0x6767746a, version: 0x0200, file: ./Wizard-Vicuna-13B-Uncensored-GGML/Wizard-Vicuna-13B-Uncensored.ggml.q4_0.bin
magic: 0x6767746a, version: 0x0200, file: ./Wizard-Vicuna-13B-Uncensored-GGML/Wizard-Vicuna-13B-Uncensored.ggml.q5_0.bin
magic: 0x6767746a, version: 0x0200, file: ./Wizard-Vicuna-13B-Uncensored-GGML/Wizard-Vicuna-13B-Uncensored.ggml.q5_1.bin
magic: 0x6767746a, version: 0x0200, file: ./Wizard-Vicuna-13B-Uncensored-GGML/Wizard-Vicuna-13B-Uncensored.ggml.q8_0.bin
magic: 0x6767746a, version: 0x0200, file: ./gpt4-alpaca-lora-30B-4bit-GGML/gpt4-alpaca-lora-30b.ggml.q4_0.bin
magic: 0x6767746a, version: 0x0200, file: ./gpt4-alpaca-lora-30B-4bit-GGML/gpt4-alpaca-lora-30b.ggml.q5_0.bin
magic: 0x6767746a, version: 0x0200, file: ./gpt4-alpaca-lora-30B-4bit-GGML/gpt4-alpaca-lora-30b.ggml.q5_1.bin
magic: 0x6767746a, version: 0x0200, file: ./gpt4-alpaca-lora-30B-4bit-GGML/gpt4-alpaca-lora-30b.ggml.q8_0.bin
magic: 0x6767746a, version: 0x0300, file: ./dromedary-65B-lora-GGML/dromedary-lora-65B.ggmlv3.q4_0.bin
magic: 0x6767746a, version: 0x0300, file: ./dromedary-65B-lora-GGML/dromedary-lora-65B.ggmlv3.q4_1.bin
magic: 0x6767746a, version: 0x0300, file: ./dromedary-65B-lora-GGML/dromedary-lora-65B.ggmlv3.q5_0.bin
magic: 0x6767746a, version: 0x0300, file: ./dromedary-65B-lora-GGML/dromedary-lora-65B.ggmlv3.q5_1.bin
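
A rough Python equivalent of the shell check, assuming only that the file starts with llama.cpp's 4-byte magic followed by a 4-byte version word:

import struct
import sys

# GGML container magics as defined in llama.cpp; old 'ggml' files carry no
# version field, so the second word is meaningless for those.
MAGICS = {0x67676D6C: "ggml", 0x67676D66: "ggmf", 0x67676A74: "ggjt"}

with open(sys.argv[1], "rb") as f:
    magic, version = struct.unpack("<II", f.read(8))

print(f"magic: {magic:#010x} ({MAGICS.get(magic, 'unknown')}), version: {version}, file: {sys.argv[1]}")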

marcus800 commented:

Works with commit 08737ef:
llama_model_load_internal: format = ggjt v2 (latest)

But not with models quantized by newer versions of ggerganov/llama.cpp. I guess llama-cpp-python is not yet ready to work with version 3?
