[bug] mistral-7b-openorca crashes main.exe after BPE update. #3454
Comments
I'm not surprised if the model is using a GPT2-based tokenizer. How do we convert […]?
OK, so the model seems to use a […]
I used TheBloke's converted version, if that helps.
@goerch I can reproduce. Anything you would like me to check? I believe this model adds the `<|im_start|>` and `<|im_end|>` tokens.
Edit: from the model page: https://huggingface.co/Open-Orca/Mistral-7B-OpenOrca/raw/main/added_tokens.json
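For reference, a minimal sketch of how one might confirm this from the Hugging Face files (the paths are placeholders, and Python with the sentencepiece package is an assumption on my part, not something used in this thread):

```python
# Compare the sentencepiece base vocab with the ids declared in
# added_tokens.json. Ids at or past vocab_size() exist only as added
# tokens; that is the case the failing assertion did not account for.
import json
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="tokenizer.model")
added = json.load(open("added_tokens.json"))

print("base vocab size:", sp.vocab_size())  # 32000 for Mistral-7B-OpenOrca
for tok, tok_id in sorted(added.items(), key=lambda kv: kv[1]):
    origin = "added" if tok_id >= sp.vocab_size() else "base"
    print(f"{tok!r} -> id {tok_id} ({origin})")
```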
Thanks, something like […]. To avoid further damage I tend to disable these assertions in […].
Edit: what I don't like in our current logic is that even […]
It would be great if you could check #3455.
So your fix works; however, naively changing […] (llama.cpp, line 408 in ff5a3f0) […]
So do I need to re-make OpenOrca Mistral GGUF? For the FOURTH time? 🤣 (they kept updating the JSON files with tokenizer changes, so I ended up making them three times yesterday) Or are you asking me to test if this PR works with the existing GGUFs?
(Edit: the PR is #3455.) I already tested it and it does.
This PR should make already-converted models work, but the change in […].
In case people start reporting broken conversions, the solution is either to wait for this PR to get merged, or redo the conversion with a modified […].
So I guess the choice is yours, whether you want people to aim their pitchforks at you or llama.cpp :)
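(For anyone who wants to check their existing files first: a rough sketch using gguf-py's GGUFReader. It assumes a recent gguf package, and the field name and expected count below are illustrative rather than quoted from this thread.)

```python
# Inspect an already-converted GGUF to see whether the added ChatML
# tokens made it into the embedded vocabulary.
from gguf import GGUFReader

reader = GGUFReader("mistral-7b-openorca.Q4_K_S.gguf")
tokens = reader.fields["tokenizer.ggml.tokens"]

# 32002 entries would mean the 32000 base pieces plus the two added tokens
print("token count:", len(tokens.data))

# for this model the last two entries should decode to the added tokens
for idx in tokens.data[-2:]:
    print(bytes(tokens.parts[idx]).decode("utf-8", errors="replace"))
```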
Well, once support for SWA is added, Mistral models will probably need to be converted again to add it to the metadata.
Using […]
Fix: `sentencepiece` tokenizers with added tokens failed with an incorrect assertion
…example

* 'master' of github.com:ggerganov/llama.cpp:
  py : change version of numpy requirement to 1.24.4 (ggerganov#3515)
  quantize : fail fast on write errors (ggerganov#3521)
  metal : support default.metallib load & reuse code for swift package (ggerganov#3522)
  llm : support Adept Persimmon 8B (ggerganov#3410)
  Fix for ggerganov#3454 (ggerganov#3455)
  readme : update models, cuda + ppl instructions (ggerganov#3510)
  server : docs fix default values and add n_probs (ggerganov#3506)
I believe the issue is resolved now.
Prerequisites
Please answer the following questions for yourself before submitting an issue.
Expected Behavior
mistral-7b-openorca.Q4_K_S.gguf works correctly, as it did before the BPE update.
Current Behavior
mistral-7b-openorca.Q4_K_S.gguf crashes main.exe after entering (and processing?) the prompt.
Additionally, I've merged that commit into my own chat project (a slightly rewritten main example), and it generates, but crashes at the end of generation (an EOS issue?).
i5 3470 (AVX only).
Windows 8.1
Compiled with w64devkit-fortran-1.20.0
Additionally, I've tested it and got the same crash with main.exe from the b1311 AVX release.
Failure Information (for bugs)
The crash message points at llama.cpp, line 7716: `GGML_ASSERT(false);`
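(The thread only confirms the file and line; as a loose illustration of the failure mode being discussed, not the actual llama.cpp source, a Python analogue might look like this.)

```python
# Token-to-text conversion that handles a fixed set of token types and
# hard-asserts on anything else. Tokens introduced via added_tokens.json
# carried a type the dispatch did not expect, so they hit the assert.
NORMAL, UNKNOWN, CONTROL, USER_DEFINED = range(1, 5)

def token_to_piece(token_id: int, token_type: int, pieces: list[str]) -> str:
    if token_type == NORMAL:
        return pieces[token_id]
    if token_type == UNKNOWN:
        return "\u2047"   # placeholder glyph for unknown pieces
    if token_type == CONTROL:
        return ""         # control tokens render as nothing
    # before the fix, added tokens fell through to here
    raise AssertionError("GGML_ASSERT(false)")
```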
Failure Logs