Inconsistent results using thebloke\llama-2-13b-chat.Q5_K_M.gguf #845

Jurys22 · 2023-10-25T22:14:15Z

Some context: I have just started using the model from Hugging Face, thebloke\llama-2-13b-chat.Q5_K_M.gguf. I am using it through llama_cpp bindings in Python and I use 1 GPU.

My goal: to retrieve pros and cons from restaurant reviews.

What I am trying to achieve at the moment: I want to test the consistency of the output by running the same question several times and evaluating the generated text. Since sampling is probabilistic I don't expect identical results, but I do expect similar ones.

My issue: sometimes (8 of 31 runs) the generated text seems cut off. I don't change the parameters or the prompt, so I would expect similar output each time, but this is not the case.

This is my input:
Give a precise answer to the question based on the context. Don't be verbose. Context: If you enjoy Indian food, this is a must try restaurant! Great atmosphere and welcoming service. We were at Swad with another couple and shared a few dishes. Be sure and ask for them to come at the same time and not family style as they will come one at a time. I had to try the butter chicken which was at the top of the list for the best I have ever tasted. We ordered two fabulous vegetable dishes, Aloo Gobhi Vegetable Korma, both were wonderful. Lastly we had a delightful white fish that was cooked to perfection. The service was excellent and the food amazing. I strongly recommend reservations on a Friday or Saturday night. Q: what are the pros and cons of this restaurant?\n

These are the possible results:

Pros: Great atmosphere, welcoming service, delicious Indian food, best butter chicken, wonderful vegetable dishes, delightful white fish, excellent service. Cons: None mentioned in the review.

A: Pros:

A: Based on the review, here are the pros and cons of the restaurant:
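One rough way to count the cut-off runs automatically is sketched below. The helper names (`answer_part`, `looks_truncated`, `similarity`) and the thresholds are made up for illustration, and the "ends with a colon or is very short" rule is only a heuristic based on the three results above. The helpers work on plain strings; with llama_cpp the text would presumably come from `output[i]["choices"][0]["text"]`.

```python
from difflib import SequenceMatcher


def answer_part(text, marker="A:"):
    """Return whatever follows the last answer marker, or the whole text if absent."""
    _, found, tail = text.rpartition(marker)
    return tail.strip() if found else text.strip()


def looks_truncated(text, min_words=5):
    """Heuristic: very short answers, or answers ending in ':', were probably cut."""
    answer = answer_part(text)
    return len(answer.split()) < min_words or answer.endswith(":")


def similarity(a, b):
    """Character-level similarity ratio between two answers, from 0.0 to 1.0."""
    return SequenceMatcher(None, answer_part(a), answer_part(b)).ratio()


# The three results quoted above.
results = [
    "Pros: Great atmosphere, welcoming service, delicious Indian food, best butter chicken, "
    "wonderful vegetable dishes, delightful white fish, excellent service. "
    "Cons: None mentioned in the review.",
    "A: Pros:",
    "A: Based on the review, here are the pros and cons of the restaurant:",
]

flags = [looks_truncated(r) for r in results]
print(flags)  # [False, True, True]
```

Comparing only the runs that are not flagged with `similarity` gives a crude consistency score without the cut-off outputs skewing it.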

My code:

from llama_cpp import Llama

model_path = "models_gguf\\llama-2-13b-chat.Q5_K_M.gguf"
output = []

review = "If you enjoy Indian food, this is a must try restaurant! Great atmosphere and welcoming service. We were at Swad with another couple and shared a few dishes. Be sure and ask for them to come at the same time and not family style as they will come one at a time. I had to try the butter chicken which was at the top of the list for the best I have ever tasted. We ordered two fabulous vegetable dishes, Aloo Gobhi Vegetable Korma, both were wonderful. Lastly we had a delightful white fish that was cooked to perfection. The service was excellent and the food amazing. I strongly recommend reservations on a Friday or Saturday night."
sys_prompt = "Q: Give a precise answer to the question based on the context. Don't be verbose. Context: "

for test_no in range(0, 25):
    llm = Llama(
        model_path=model_path,
        n_ctx=2048,
        n_gpu_layers=43,
        temp=0.7,
        top_k=10,
    )
    output.append(
        llm(
            sys_prompt + review + " Question: what are the pros and cons of this restaurant?\n A: ",
            max_tokens=1000,
            stop=["Q:", "\n"],
            echo=True,
        )
    )

What am I doing wrong? Why is the text being cut?

tk-master · 2023-11-13T10:55:44Z

@Jurys22 late reply, but it's probably because of your prompt format. While the model is smart enough to work with Q: {input} A:, it was definitely not trained that way (no model uses that kind of format afaik); that is just an example format for testing whether a script works.
You should use the appropriate prompt template for the model for best results. For llama-2 it's probably:

[INST] <<SYS>>
You are a helpful, respectful and honest assistant.
<</SYS>>
{prompt}[/INST]
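A minimal sketch of filling that template in Python follows. The function names are hypothetical, the layout copies the template exactly as written above (other write-ups of the llama-2 chat format differ slightly in whitespace), and moving the "precise answer" instruction into the system block is just one possible choice.

```python
DEFAULT_SYSTEM = "You are a helpful, respectful and honest assistant."


def build_llama2_prompt(user_message, system_message=DEFAULT_SYSTEM):
    """Fill the [INST] / <<SYS>> template shown above with a system and user message."""
    return (
        "[INST] <<SYS>>\n"
        f"{system_message}\n"
        "<</SYS>>\n"
        f"{user_message}[/INST]"
    )


def build_review_question(review):
    """Wrap a review and the pros/cons question as the user part of the prompt."""
    return f"Context: {review}\nQuestion: what are the pros and cons of this restaurant?"


# Hypothetical usage with a shortened review.
prompt = build_llama2_prompt(
    build_review_question("Great atmosphere and welcoming service."),
    system_message="Give a precise answer to the question based on the context. Don't be verbose.",
)
print(prompt)
```

The resulting string replaces the `sys_prompt + review + " Question: ..."` concatenation passed to the model.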

Also remove stop=["Q:", "\n"] from the call.
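To see why the stop list matters: both cut-off results in the question end in a colon, which is consistent with the model emitting a newline right after "Pros:" and the "\n" stop string ending generation there. The sketch below only imitates that effect on a made-up answer string; `apply_stop` is an illustration, not llama.cpp's actual implementation. The `ask` helper assumes `llm` is an already loaded `llama_cpp.Llama` object and passes `temperature` and `top_k` at call time, since those are arguments of the completion call.

```python
def apply_stop(generated, stop):
    """Cut `generated` at the earliest occurrence of any stop string."""
    cut = len(generated)
    for s in stop:
        idx = generated.find(s)
        if idx != -1:
            cut = min(cut, idx)
    return generated[:cut]


# Made-up multi-line answer of the kind a chat model tends to produce.
full_answer = "Pros:\n- Great atmosphere\n- Welcoming service\nCons:\n- None mentioned"

print(repr(apply_stop(full_answer, ["Q:", "\n"])))  # 'Pros:'
print(apply_stop(full_answer, ["Q:"]) == full_answer)  # True


def ask(llm, prompt):
    """Run one completion without the newline stop; `llm` is assumed to be a loaded Llama."""
    result = llm(prompt, max_tokens=1000, temperature=0.7, top_k=10)
    return result["choices"][0]["text"]
```

Loading the model once before the loop and calling `ask` repeatedly should also avoid reloading the weights on every run.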
At this point, though, there are better models out there than vanilla llama-2-13b. For instance, mistral 7b outperforms it on average.
