-
Notifications
You must be signed in to change notification settings - Fork 10.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Workaround for #3454 #3455
Workaround for #3454 #3455
Conversation
mistral_7b_openorca doesn't stop generating with this PR. Tested on main in interactive mode. |
OK. Does this mean |
I'm currently testing this, it appears that not using |
I tried this according to suggested format, and is seems to work: However, it's not consistent: it added <|im_end|> after the first prompt, but didn't after the second.
Maybe this model just requires using prefix and suffix. |
@MaggotHATE i believe the correct prompt format expects "system" "user" "assistant" followed by endline instead of ":" https://huggingface.co/Open-Orca/Mistral-7B-OpenOrca#example-prompt-exchange Edit: if you add |
It seems to me you are testing a rather new model? W.r.t. #3525 it is probably important to distinguish regressions (which the assertions disabled in this patch are).
@staviq : I believe with this PR we should be backwards compatible for |
@staviq Thank you! Forgot about it completely. After testing I found that something like With -e, however, if the prompt format is slightly different, it breaks - for example, if adding \n after <|im_start|> in antiprompt, the model doesn't even consider it:
With corrected format:
In this example, my inputs were only first and last, it just didn't stop on the second one. Maybe it's because \n is added at the end of each input. And finally, it works best without suffix, completing the format for you. Still not ideal, though.
which is also a waste of tokens, I guess, since it generates all that extra formatting each time. Still, it's just a 7b model, so it's hard to tell when it's hallucinating and when it works as intended. |
I already tested it and it does solve that particular problem with assert crash, though I would prefer @cebtenzzre to confirm, he's been following recent tokenizer changes. Convert and quantize actually take less than 10mins on fairly old CPU, if that would make it easier for you, but it's not a problem for me either. |
I think I found what's wrong with prompt format Reverse prompt Can you try editing Line 618 in 79f34ab
paste this:
so It looks like this:
After that, do make clean and rebuild, and try those parameters ( actual main prompt text isn't important here ):
I just tested this and managed to have couple of pages worth of conversation without a single problem. |
I'm confused as what's been tested here.
|
Yes, |
@staviq thank you! Processing for escape sequences fixes the issue with stopping (sometimes even without <|im_start|> and <|im_end|>). For example, questions about Vulkan API seem to test the model best - it stops properly only with <|im_start|> and <|im_end|>, while with simpler questions or conversations it can work without special tokens. Overall, I think this is just a quirky model that is too much of an effort to work with at the moment, unless there is a way to insert the tokens as tokens into prompt and antiprompt. Thank you for help anyway! |
I still haven't gotten a chance to read over the BPE tokenizer PR. Could we just revert whatever was changed in the sentencepiece tokenizer? It wasn't intended to be part of the PR's scope anyway, right? |
I believe adding assertion(s) are the only changes done to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to more or less revert 17ca832 for SPM, so it should be fine.
Fix: `sentencepiece` tokenizers with added tokens failed with an incorrect assertion
…example * 'master' of github.com:ggerganov/llama.cpp: py : change version of numpy requirement to 1.24.4 (ggerganov#3515) quantize : fail fast on write errors (ggerganov#3521) metal : support default.metallib load & reuse code for swift package (ggerganov#3522) llm : support Adept Persimmon 8B (ggerganov#3410) Fix for ggerganov#3454 (ggerganov#3455) readme : update models, cuda + ppl instructions (ggerganov#3510) server : docs fix default values and add n_probs (ggerganov#3506)
Accepting all unsupported token types and handling them like CONTROL tokens.