Stream 2 sentence output is sometimes removing space between words from input tokens #5
I am using llama-cpp-python, and my generator yields one token at a time into stream2sentence. Sometimes words are combined in the output sentences. I am using the default settings with the NLTK tokenizer. Notice that "thingwas" and "nervouslychuckles" appear as single words in the output below; I confirmed this wasn't the case in the input tokens (using Llama 3B Instruct). I am observing this once every 50-100 tokens or so, and I haven't noticed a pattern in when it occurs. I can provide code to reproduce later if this isn't a known issue, and if you point me in the right direction I can try to fix it myself.

Sentence 4: That thingwas older than my aunt from Quebec, which is saying something, right?
Sentence 5: (nervouslychuckles once more) Anyway, that was the oldest car I've ever seen near the border of Canada, and I'm glad I got to see it...

Comments

Will look into that; code to reproduce would be awesome.

I'm running this on a Raspberry Pi 5; it seems to happen every 5 sentences or so. Thanks for taking a look.

I am also experiencing this issue. Any updates?

Thanks for reporting. This should be fixed now in v0.2.7. Feedback would be awesome.

Yes, it seems perfect now! Thank you so much for this awesome library.

The issue is fixed, thanks for updating.
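One plausible mechanism for this kind of word fusion (an assumption for illustration, not confirmed from the stream2sentence source) is an accumulator that strips each token before joining. Tokenizers used by Llama-family models typically encode the word boundary as a leading space on the token itself, so stripping tokens individually discards the only record of the space. A minimal self-contained sketch of that failure mode, with hypothetical helper names:

```python
def stream_tokens():
    # Simulated LLM token stream: tokens carry their own leading
    # spaces, as SentencePiece/BPE tokenizers typically emit them.
    for tok in ["That", " thing", " was", " older", " than", " my", " aunt", "."]:
        yield tok

def assemble_naive(tokens):
    # Buggy accumulation: stripping each token before joining
    # discards the leading space and fuses adjacent words.
    return "".join(t.strip() for t in tokens)

def assemble_verbatim(tokens):
    # Safe accumulation: concatenate tokens unchanged and leave any
    # trimming to the sentence-boundary stage.
    return "".join(tokens)

print(assemble_naive(stream_tokens()))     # Thatthingwasolderthanmyaunt.
print(assemble_verbatim(stream_tokens()))  # That thing was older than my aunt.
```

If this is the failure mode, the fix is to buffer tokens verbatim and only strip whitespace at detected sentence boundaries, never per token.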