
stream2sentence output is sometimes removing the space between words from input tokens #5

Closed
ekcrisp opened this issue Oct 31, 2024 · 6 comments


@ekcrisp

ekcrisp commented Oct 31, 2024

I am using llama-cpp-python; my generator yields one token at a time and passes it into stream2sentence. Sometimes words are combined in the output sentences. I am using the default settings with the nltk tokenizer. Notice that "thingwas" and "nervouslychuckles" appear as single words in the output below. I confirmed the spaces were present in the input tokens (using Llama 3B Instruct). I am seeing this roughly once every 50-100 tokens, and I haven't noticed a pattern to when it occurs. I can provide code to reproduce later if this isn't a known issue, and if you point me in the right direction I can try to fix it myself.

Sentence 4: That thingwas older than my aunt from Quebec, which is saying something, right?
Sentence 5: (nervouslychuckles once more) Anyway, that was the oldest car I've ever seen near the border of Canada, and I'm glad I got to see it...

@KoljaB
Copy link
Owner

KoljaB commented Oct 31, 2024

Will look into that; code to reproduce would be awesome.

@ekcrisp
Author

ekcrisp commented Nov 1, 2024

I'm running this on a Raspberry Pi 5; it seems to happen every 5 sentences or so. Thanks for taking a look.


import random
from llama_cpp import Llama
from stream2sentence import generate_sentences

chat_input = '''
<|system|>
You are a creative writer who is interested in nature. You have traveled the world and have many stories to tell.
</s>
<|user|>
Where have you traveled recently?
</s>
<|assistant|>
'''

llm = Llama(
    model_path='./Llama-3.2-3B-Instruct-Q8_0.gguf',
    n_ctx=4096,
    n_threads=4,
    verbose=False
)

# Yield the raw text of each streamed token, one at a time.
def output_generator():
    for output in llm(
        chat_input,
        stream=True,
        seed=random.randint(1, 1000000),
        max_tokens=1000
    ):
        yield output['choices'][0]['text']

# Feed the token stream into stream2sentence and print each detected sentence.
for idx, sentence in enumerate(
    generate_sentences(
        output_generator()
    ), start=1):
    print(f"Sentence {idx}: {sentence}")

@davidchi31415

I am also experiencing this issue. Any updates?

@KoljaB
Owner

KoljaB commented Nov 7, 2024

Thanks for reporting. Should be fixed now in v0.2.7. Feedback would be awesome.
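
After upgrading (e.g. via pip install --upgrade stream2sentence), a quick way to confirm the fixed version is installed, assuming the distribution name matches the PyPI package name:

from importlib.metadata import version

# Expect '0.2.7' or newer for the word-spacing fix
print(version("stream2sentence"))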

@davidchi31415

Yes, it seems perfect now! Thank you so much for this awesome library.

@ekcrisp
Author

ekcrisp commented Nov 8, 2024

Issue is fixed, thanks for updating.

ekcrisp closed this as completed on Nov 8, 2024