runtime error in example/server #1557
Comments
Same error in gpt2_ggml_model when running it.
Same error on Linux Mint (Ubuntu-based) with ggml-alpaca-7b-native-q4.bin.
Same here on several models, but gpt4-alpaca-lora still works.
I hit a similar issue. Mine was caused by 2d5db48. The delta field in each quantize block changed from fp32 to fp16, so older model files fail to load. There is a file version bump and a check of the file version, but the check happens too late: once some tensor data has been loaded with an incorrect size, the length-related fields that follow (such as …) are misread.
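To illustrate the layout change, here is a minimal C sketch (the _v1/_v2 struct names are mine, and the exact ggml definitions may differ by version):

#include <stdint.h>

#define QK4_0 32

/* Before 2d5db48: the scale (delta) was a 32-bit float. */
typedef struct {
    float   d;              /* delta (scale factor), fp32 */
    uint8_t qs[QK4_0 / 2];  /* 32 4-bit quants packed into 16 bytes */
} block_q4_0_v1;            /* 20 bytes per block */

/* After 2d5db48: the scale is stored as a 16-bit half float. */
typedef struct {
    uint16_t d;             /* delta stored as fp16 bits */
    uint8_t  qs[QK4_0 / 2]; /* quants are unchanged */
} block_q4_0_v2;            /* 18 bytes per block */

Reading an old 20-byte-per-block file with the new 18-byte stride leaves the read offset wrong after the first quantized tensor, which is why everything that follows looks corrupted.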
Thanks for that. I overlooked the breaking change. Looks like it's from this …
Hello, I am in charge of implementing the server example in llama.cpp. That error occurs because you need to re-quantize the model when using the latest version of the project. |
Hi guys, I confirmed it's working with the latest models, like airoboros-13B.q5_1.bin. @FSSRepo there is some config that is not available at runtime, like … So will it be initialized using the node call in the example?
And thanks again for your hard work @FSSRepo
I knew I needed to re-quantize. It works. Thank you.
Can you describe or point me to a link that shows how you re-quantized?
Thank you.
@mconsidine just download the new version of the model from HF (Hugging Face).
@mconsidine I'm not sure what you are missing, so here are all the steps:

git pull origin master
make clean
make quantize

and from README.md:

# convert the 7B model to ggml FP16 format
python3 convert.py models/7B/

# quantize the model to 4-bits (using q4_0 method)
./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin q4_0

Just re-quantize your model.
May I close this issue?
Thank you. I'm squared away.
I still experience a similar issue, with the following error during quantization: …
The steps so far have been: …
Maybe it's where I am downloading the models from? I have tried the different approaches in the README and they all fail the same way.
I was able to solve it for gpt4all by doing convert + quantization, which, judging by the output of convert, it should have been doing already. Roughly as sketched below.
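The sequence was along these lines (the paths are illustrative, not the exact ones I used, and the f16 output file name is an assumption about convert.py's default):

python3 convert.py models/gpt4all-7B/gpt4all-lora-quantized.bin
./quantize ./models/gpt4all-7B/ggml-model-f16.bin ./models/gpt4all-7B/ggml-model-q4_0.bin q4_0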
To build and run the just-released example/server executable, I built the server with CMake (adding the option -DLLAMA_BUILD_SERVER=ON), roughly as sketched below.
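A sketch of the build, assuming an out-of-tree CMake build (the exact invocation may differ):

mkdir build
cd build
cmake .. -DLLAMA_BUILD_SERVER=ON
cmake --build . --config Release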
Then I followed the README.md and ran the server with a command like the one below.
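Something like the following (the model path here is hypothetical; the actual command came from the README):

./bin/server -m ./models/7B/ggml-model-q4_0.bin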
The following error occurred:
On Mac: …
On Ubuntu (with cuBLAS): …
Same runtime error on both.
What more do I need?