[bug] mistral-7b-openorca crashes main.exe after BPE update. #3454
Comments
I'm not surprised if the model is using a GPT2-based tokenizer. How do we convert […]?
OK, so the model seems to use a […]
I used TheBloke's converted version, if that helps.
@goerch I can reproduce. Anything you would like me to check? I believe this model adds the `<|im_start|>` and `<|im_end|>` tokens.
Edit: from the model page: https://huggingface.co/Open-Orca/Mistral-7B-OpenOrca/raw/main/added_tokens.json
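For reference, a minimal sketch of how one might confirm this from the Hugging Face files (the paths are placeholders, and Python with the sentencepiece package is an assumption on my part, not something used in this thread):

```python
# Compare the sentencepiece base vocab with the ids declared in
# added_tokens.json. Ids at or past vocab_size() exist only as added
# tokens; that is the case the failing assertion did not account for.
import json
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="tokenizer.model")
added = json.load(open("added_tokens.json"))

print("base vocab size:", sp.vocab_size())  # 32000 for Mistral-7B-OpenOrca
for tok, tok_id in sorted(added.items(), key=lambda kv: kv[1]):
    origin = "added" if tok_id >= sp.vocab_size() else "base"
    print(f"{tok!r} -> id {tok_id} ({origin})")
```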
Thanks, something like […]. To avoid further damage I tend to disable these assertions in […].
Edit: what I don't like in our current logic is that even […]
It would be great if you could check #3455.
So your fix works; however, naively changing […] (llama.cpp, line 408 in ff5a3f0) […]
So do I need to re-make OpenOrca Mistral GGUF? For the FOURTH time? 🤣 (they kept updating the JSON files with tokenizer changes, so I ended up making them three times yesterday) Or are you asking me to test if this PR works with the existing GGUFs?
(Edit: the PR is #3455.) I already tested it and it does.
This PR should make already-converted models work, but the change in […].
In case people start reporting broken conversions, the solution is either to wait for this PR to get merged, or redo the conversion with a modified […].
So I guess the choice is yours, whether you want people to aim their pitchforks at you or llama.cpp :)
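(For anyone who wants to check their existing files first: a rough sketch using gguf-py's GGUFReader. It assumes a recent gguf package, and the field name and expected count below are illustrative rather than quoted from this thread.)

```python
# Inspect an already-converted GGUF to see whether the added ChatML
# tokens made it into the embedded vocabulary.
from gguf import GGUFReader

reader = GGUFReader("mistral-7b-openorca.Q4_K_S.gguf")
tokens = reader.fields["tokenizer.ggml.tokens"]

# 32002 entries would mean the 32000 base pieces plus the two added tokens
print("token count:", len(tokens.data))

# for this model the last two entries should decode to the added tokens
for idx in tokens.data[-2:]:
    print(bytes(tokens.parts[idx]).decode("utf-8", errors="replace"))
```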
Well, once support for SWA is added, Mistral models will probably need to be converted again to add it to the metadata.
Using […]
Fix: `sentencepiece` tokenizers with added tokens failed with an incorrect assertion
…example

* 'master' of github.com:ggerganov/llama.cpp:
  py : change version of numpy requirement to 1.24.4 (ggerganov#3515)
  quantize : fail fast on write errors (ggerganov#3521)
  metal : support default.metallib load & reuse code for swift package (ggerganov#3522)
  llm : support Adept Persimmon 8B (ggerganov#3410)
  Fix for ggerganov#3454 (ggerganov#3455)
  readme : update models, cuda + ppl instructions (ggerganov#3510)
  server : docs fix default values and add n_probs (ggerganov#3506)
I believe the issue is resolved now.
Prerequisites
Please answer the following questions for yourself before submitting an issue.
Expected Behavior
mistral-7b-openorca.Q4_K_S.gguf works correctly, as it did before the BPE update.
Current Behavior
mistral-7b-openorca.Q4_K_S.gguf crashes main.exe after entering (and processing?) the prompt.
Additionally, I've merged that commit into my own chat project (a slightly rewritten main example), and it generates, but crashes at the end of generation (an EOS issue?).
i5 3470 (AVX only).
Windows 8.1
Compiled with w64devkit-fortran-1.20.0
Additionally, I've tested it and got the same crash with main.exe from the b1311 AVX release.
Failure Information (for bugs)
The crash message points at llama.cpp, line 7716: `GGML_ASSERT(false);`
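(The thread only confirms the file and line; as a loose illustration of the failure mode being discussed, not the actual llama.cpp source, a Python analogue might look like this.)

```python
# Token-to-text conversion that handles a fixed set of token types and
# hard-asserts on anything else. Tokens introduced via added_tokens.json
# carried a type the dispatch did not expect, so they hit the assert.
NORMAL, UNKNOWN, CONTROL, USER_DEFINED = range(1, 5)

def token_to_piece(token_id: int, token_type: int, pieces: list[str]) -> str:
    if token_type == NORMAL:
        return pieces[token_id]
    if token_type == UNKNOWN:
        return "\u2047"   # placeholder glyph for unknown pieces
    if token_type == CONTROL:
        return ""         # control tokens render as nothing
    # before the fix, added tokens fell through to here
    raise AssertionError("GGML_ASSERT(false)")
```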
Failure Logs