
runtime error in example/server #1557

Closed
DDANGEUN opened this issue May 22, 2023 · 15 comments

Comments

@DDANGEUN

To build and run the just-released example/server executable, I built the server with CMake (adding the option -DLLAMA_BUILD_SERVER=ON).

Then I followed the README.md and ran the following command:

./build/bin/server -m models/ggml-vicuna-13b-1.1/ggml-vicuna-13b-1.1-q4_1.bin --ctx_size 2048

The following error occurred.

On macOS:

main: seed = 1684723159
llama.cpp: loading model from models/ggml-vicuna-13b-1.1/ggml-vicuna-13b-1.1-q4_1.bin
libc++abi: terminating due to uncaught exception of type std::runtime_error: unexpectedly reached end of file
zsh: abort      ./build/bin/server -m models/ggml-vicuna-13b-1.1/ggml-vicuna-13b-1.1-q4_1.bin

On Ubuntu (with cuBLAS):

main: seed = 1684728245
llama.cpp: loading model from models/ggml-vicuna-13b-1.1/ggml-vicuna-13b-1.1-q4_1.bin
terminate called after throwing an instance of 'std::runtime_error'
  what():  unexpectedly reached end of file
Aborted (core dumped)

The same runtime error in both cases.
What more do I need to do?

@vicwer

vicwer commented May 22, 2023

Same error with a gpt2 ggml model when running ./quantize ./gpt2_13b/ggml-model-f16.bin ./gpt2_13b/ggml-model-f16.bin:

terminate called after throwing an instance of 'std::runtime_error'
  what():  unexpectedly reached end of file
Aborted (core dumped)

@mconsidine

Same error on Linux Mint (Ubuntu-based) with ggml-alpaca-7b-native-q4.bin.

@adamierymenko

Same here on several models, but gpt4-alpaca-lora still works.

@Jason0214

Same error with a gpt2 ggml model when running ./quantize ./gpt2_13b/ggml-model-f16.bin ./gpt2_13b/ggml-model-f16.bin:

terminate called after throwing an instance of 'std::runtime_error'
  what():  unexpectedly reached end of file
Aborted (core dumped)

I hit a similar issue. Mine was caused by 2d5db48: the delta field in each quantized block was changed from fp32 to fp16, so older model files fail to load. There is a file version bump and a version check, but the check happens too late. Once "some" tensor data is read with the wrong size, the following length-related fields such as name_len read corrupted data, which leads to the unexpected end-of-file error.
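To illustrate the failure mode, here is a minimal sketch. It is NOT the actual llama.cpp loader: the struct layouts, field names, and sizes are simplified assumptions, with fp16 stood in by uint16_t. It only shows how choosing the wrong block layout makes every later offset wrong, so the loader eventually runs past the end of the file.

// Minimal sketch (hypothetical, not llama.cpp code): a block layout change
// (fp32 -> fp16 scale fields) breaks loading of old files if the file-version
// check happens after tensor data is already being read.
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Simplified q4_1-style blocks of 32 quantized weights (4 bits each).
struct block_old { float    delta; float    min; uint8_t qs[16]; }; // fp32 fields, typically 24 bytes
struct block_new { uint16_t delta; uint16_t min; uint8_t qs[16]; }; // fp16 fields, typically 20 bytes

// A reader that assumes the new layout but is fed a file written with the old
// one advances the file position by too few bytes per block. The next fields
// it reads (e.g. the name length of the following tensor) land at wrong
// offsets and contain garbage, so a later read runs past EOF ->
// "unexpectedly reached end of file".
std::size_t bytes_expected(std::size_t n_blocks, bool new_layout) {
    return n_blocks * (new_layout ? sizeof(block_new) : sizeof(block_old));
}

int main() {
    const std::size_t n_blocks = 1024;
    std::printf("old layout: %zu bytes, new layout: %zu bytes\n",
                bytes_expected(n_blocks, false), bytes_expected(n_blocks, true));
    // Checking the file version before choosing the block layout (rather than
    // after some tensors have been read) turns this into a clear
    // "unsupported file version" error instead of a misleading EOF.
}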

@mconsidine

Thanks for that. I overlooked the breaking change. It looks like, from #1405 (comment), there are re-done models available.

@FSSRepo
Collaborator

FSSRepo commented May 22, 2023

Hello, I am in charge of implementing the server example in llama.cpp. That error occurs because you need to re-quantize the model when using the latest version of the project.

@x4080

x4080 commented May 22, 2023

Hi guys, I confirmed it's working with a recent model like airoboros-13B.q5_1.bin.

@FSSRepo there are some options that don't seem to be available at runtime, such as:
-t 6
-n 2048
--repeat_penalty 1.0
-f prompts/chat.txt
-r "User:"

So should they be initialized using the Node call in the example?

const axios = require("axios");

const prompt = `Building a website can be done in 10 simple steps:`;

async function Test() {
    let result = await axios.post("http://127.0.0.1:8080/completion", {
        prompt,
        batch_size: 128,
        n_predict: 512,
    });

    // the response is received once the completion finishes
    console.log(result.data.content);
}

Test();

And thanks again for your hard work @FSSRepo

@DDANGEUN
Author

Hello, I am in charge of implementing the server example in llama.cpp. That error occurs because you need to re-quantize the model when using the latest version of the project.

I see that I needed to re-quantize; it works now. Thank you.

@mconsidine

mconsidine commented May 23, 2023 via email

@x4080

x4080 commented May 23, 2023

@mconsidine just download the new version of the model on HF.

@DDANGEUN
Author

@mconsidine I don't know what you are missing, so I'll write out all the steps.

git pull origin master
make clean
make quantize

and from the README.md:

# convert the 7B model to ggml FP16 format
python3 convert.py models/7B/

# quantize the model to 4-bits (using q4_0 method)
./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin q4_0

Just re-quantize your model.

@DDANGEUN
Author

May I close this issue?

@mconsidine

mconsidine commented May 24, 2023 via email

@simonepontz

I still experience a similar issue, with the following error during quantization:

$ ./quantize models/7B/ggml-model-q4_0.bin models/7B/ggml-model-q4_0.bin.quantized 2
main: build = 588 (ac7876a)
main: quantizing 'models/7B/ggml-model-q4_0.bin' to 'models/7B/ggml-model-q4_0.bin.quantized' as q4_0
llama.cpp: loading model from models/7B/ggml-model-q4_0.bin
libc++abi: terminating with uncaught exception of type std::runtime_error: unexpectedly reached end of file
[1]    76573 abort      ./quantize models/7B/ggml-model-q4_0.bin  2

The steps so far have been:

git pull
make clean
make quantize

python3 convert.py models/7B/  # which creates ggml-model-q4_0.bin

./quantize models/7B/ggml-model-q4_0.bin models/7B/ggml-model-q4_0.bin.quantized q4_0

Maybe it's where I am downloading the models from? I have tried the different approaches in the README and they all fail the same way.

@simonepontz

I was able to solve it for gpt4all by doing convert + quantization:

python3 convert.py models/gpt4all-7B/gpt4all-lora-quantized.bin --outtype f16

./quantize models/gpt4all-7B/ggml-model-f16.bin models/gpt4all-7B/ggml-model-q4_0.bin q4_0

Which, looking at the output of convert, it should have been doing already.
