I have been reading about context length and how it affects the performance of transformer-based large language models. My understanding is that the context length is the maximum number of tokens the current token can attend to during attention, which means the KV cache has a maximum size equal to the context length. However, when I set the context length to 128, llama.cpp segfaults in the decode stage. I am trying to understand why this happens. Theoretically, the context length should not cause any issues, so perhaps it is related to how the model was trained or something at the architectural level. Any articles or explanations of why this is happening would be greatly appreciated. I believe the error occurs in llama_decode_internal, around line 17095 (as of when I cloned the repo).
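For reference, here is a minimal sketch of how I understand a 128-token context would be configured through the public API (the function and field names below are the common ones from llama.h, but exact signatures vary between llama.cpp versions, and the model path is just a placeholder):

```cpp
// Sketch only: create a context with n_ctx = 128, which should also cap
// the KV cache at 128 token slots. API names may differ by version.
#include "llama.h"

int main() {
    llama_model_params mparams = llama_model_default_params();
    // "model.gguf" is a placeholder path
    llama_model * model = llama_load_model_from_file("model.gguf", mparams);
    if (!model) return 1;

    llama_context_params cparams = llama_context_default_params();
    cparams.n_ctx   = 128; // context length == maximum number of KV-cache slots
    cparams.n_batch = 128; // tokens per llama_decode call should not exceed this

    llama_context * ctx = llama_new_context_with_model(model, cparams);
    if (!ctx) return 1;

    // ... build a llama_batch and call llama_decode(ctx, batch) here ...
    // My expectation: once more than 128 tokens have been submitted, llama_decode
    // should return a non-zero "KV cache full" error rather than segfault.

    llama_free(ctx);
    llama_free_model(model);
    return 0;
}
```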