Responses are truncated too early #6
Before: I'm glad you're interested in finding a unique name for your feline friend! Here are 400 creative and fun name suggestions for cats:
After: I'm glad you're interested in finding a unique name for your feline friend! Here are 400 creative and fun name suggestions for cats:
Only 98 names for some reason from Llama 2, but many more tokens than the 110ish I was getting before.
Also running into this -- responses getting truncated; this is an amazing tool already though :)
This seems to help:
diff --git a/llm_llama_cpp.py b/llm_llama_cpp.py
index f2fc977..62f716b 100644
--- a/llm_llama_cpp.py
+++ b/llm_llama_cpp.py
@@ -226,7 +226,9 @@ class LlamaModel(llm.Model):
def execute(self, prompt, stream, response, conversation):
with SuppressOutput(verbose=prompt.options.verbose):
llm_model = Llama(
- model_path=self.path, verbose=prompt.options.verbose, n_ctx=4000
+ model_path=self.path,
+ verbose=prompt.options.verbose,
+ n_ctx=4000,
)
if self.is_llama2_chat:
prompt_bits = self.build_llama2_chat_prompt(prompt, conversation)
@@ -234,7 +236,7 @@ class LlamaModel(llm.Model):
response._prompt_json = {"prompt_bits": prompt_bits}
else:
prompt_text = prompt.prompt
- stream = llm_model(prompt_text, stream=True)
+ stream = llm_model(prompt_text, stream=True, max_tokens=4000)
for item in stream:
# Each item looks like this:
             # {'id': 'cmpl-00...', 'object': 'text_completion', 'created': .., 'model': '/path', 'choices': [
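For anyone hitting this outside the plugin, here is a minimal standalone sketch of the same fix, assuming llama-cpp-python's Llama class; the model path, prompt, and 4000-token values below are illustrative, not the plugin's real values. Without an explicit max_tokens, the completion call stops at the library's small default, which matches the early truncation described above.

# Minimal standalone sketch, assuming llama-cpp-python's Llama class.
# The model path, prompt, and token counts are illustrative.
from llama_cpp import Llama

llm_model = Llama(
    model_path="path/to/llama-2-7b-chat.bin",  # illustrative path
    n_ctx=4000,      # context window sized for long prompts and completions
    verbose=False,
)

# Without an explicit max_tokens, generation stops at the library's small
# default, which is what produces the early truncation reported in this issue.
stream = llm_model(
    "Suggest 400 creative and fun names for cats:",
    stream=True,
    max_tokens=4000,  # allow a much longer response
)

for item in stream:
    # Each streamed chunk is a dict; the generated text lives in choices[0]["text"]
    print(item["choices"][0]["text"], end="", flush=True)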
I ran into this problem immediately with local models and this fixed it FWIW.
https://twitter.com/mullinsms/status/1686480711211945984
Solution may be the max_tokens parameter.
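Rather than hard-coding 4000, the plugin could expose this as a user-settable option alongside the existing verbose option. The following is only a sketch of that idea, assuming the llm.Options pattern the plugin already uses for verbose; the option name, default, and field description are assumptions, not the plugin's actual API.

# Sketch only: option name and default are assumptions, not the plugin's real API.
from typing import Optional

import llm
from pydantic import Field


class LlamaModel(llm.Model):
    class Options(llm.Options):
        verbose: bool = Field(
            default=False, description="Print verbose llama.cpp output"
        )
        max_tokens: Optional[int] = Field(
            default=4000, description="Maximum number of tokens to generate"
        )

    def execute(self, prompt, stream, response, conversation):
        # ... model loading and prompt construction as in the plugin ...
        stream = llm_model(
            prompt_text,
            stream=True,
            max_tokens=prompt.options.max_tokens,  # honour the user-supplied limit
        )
        # ... yield text from the stream as before ...

With something like this in place, the limit could then be raised per call from the CLI, for example: llm -m <model> -o max_tokens 8000 'Suggest 400 creative and fun names for cats'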