
Prompt Truncation Bug for models with smaller max_seq_length - truncation of all prompt #4379

Closed · 1 task done
TuanaCelik opened this issue Mar 10, 2023 · 3 comments · Fixed by #4651
Labels: topic:promptnode, type:bug (Something isn't working)

Comments

@TuanaCelik (Contributor)

Bug was discovered by @recrudesce, @rolandtannous, and @danielbichuetti.

It seems that when the prompt truncation PR for OpenAI (#4179) was merged, it introduced the following behaviour: if the maximum length specified in the PromptNode definition is equal to or larger than the model's own max sequence length, truncation can also cut away the prompt itself, including the documents. This hits models with a significantly lower token limit, such as the flan models, particularly hard: the prompt ends up empty and the documents are truncated as well.
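
A minimal sketch of how this can happen (illustrative arithmetic only, not the actual Haystack code from #4179): if the space reserved for the generated answer is greater than or equal to the model's max sequence length, zero tokens remain for the prompt, so the template, documents, and query are all cut away.

# Illustrative only: not the Haystack implementation.
model_max_seq_len = 512    # e.g. the limit reported for the flan-T5 models
answer_max_length = 512    # max_length passed to PromptNode

# Tokens left over for the prompt (template + documents + query)
tokens_left_for_prompt = max(model_max_seq_len - answer_max_length, 0)
print(tokens_left_for_prompt)  # 0 -> the entire prompt gets truncated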

To Reproduce

from haystack.nodes import EmbeddingRetriever, PromptNode, PromptTemplate
from haystack.document_stores import WeaviateDocumentStore
from haystack.pipelines import Pipeline
import sys

document_store = WeaviateDocumentStore(similarity="cosine", embedding_dim=768)
lfqa_prompt = PromptTemplate(
    name="lfqa",
    prompt_text="Generate a comprehensive, summarized answer to the given question using the provided paragraphs and reply as if you are Yoda.  \n\n Paragraphs: $documents \n\n Question: $query \n\n Answer:",
)

retriever = EmbeddingRetriever(
    document_store=document_store,
    embedding_model="flax-sentence-embeddings/all_datasets_v3_mpnet-base",
    model_format="sentence_transformers",
    top_k=20,
)

# flan-T5 models have a max sequence length of 512 tokens, so max_length=512
# reserves the whole window for the answer and triggers the bug
prompt_node = PromptNode(
    model_name_or_path="google/flan-t5-xl",
    default_prompt_template=lfqa_prompt,
    top_k=9,
    max_length=512,
)

pipe = Pipeline()
pipe.add_node(component=retriever, name="retriever", inputs=["Query"])
pipe.add_node(component=prompt_node, name="prompt_node", inputs=["retriever"])

output = pipe.run(query=sys.argv[1])

full_sentence = list(filter(lambda x: x.endswith("."), output["results"]))

longest_output = max(full_sentence, key=len)

print("**Answer:** " + longest_output

FAQ Check

@TuanaCelik added the type:bug (Something isn't working) and topic:promptnode labels on Mar 10, 2023
@danielbichuetti (Contributor) commented Mar 10, 2023

I think that we should not truncate from the prompt towards the answer, but in the opposite direction. And we should warn the user:
“Your prompt uses Z tokens, this LLM has a max seq. length of X tokens, and you requested an answer space of Y tokens, which exceeds the model limit by I tokens. You may get no answer. Please reduce the size or number of your documents, and/or your requested max answer size.”

Going in the opposite direction means that we are reserving space for an answer while not minding the prompt, the documents, and the query itself. Any LLM without a prompt is unusable, and the consequence is no answer or just a randomly generated answer (noise).
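
A rough sketch of such a check (the function name and signature are hypothetical, not an existing Haystack API):

import logging

logger = logging.getLogger(__name__)

def warn_if_over_budget(prompt_tokens: int, answer_tokens: int, model_max_seq_len: int) -> None:
    # Warn the user instead of silently truncating the prompt away
    overflow = prompt_tokens + answer_tokens - model_max_seq_len
    if overflow > 0:
        logger.warning(
            "Your prompt uses %d tokens, this LLM has a max seq. length of %d tokens, and you "
            "requested an answer space of %d tokens, which exceeds the model limit by %d tokens. "
            "You may get no answer. Please reduce the size or number of your documents and/or "
            "your requested max answer size.",
            prompt_tokens, model_max_seq_len, answer_tokens, overflow,
        )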

@sjrl (Contributor) commented Mar 13, 2023

Hey everyone, @zoltan-fedor also brought up a really good point in issue #4388. For the flan-T5 models at least, it doesn't seem to make sense to enforce this token limit. Check out this comment from the linked issue for more info.
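
For reference, a minimal sketch (using the transformers library directly, not Haystack code) showing that the flan-T5 tokenizer's declared limit behaves as a soft limit: only a length warning is logged when it is exceeded.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-xl")
print(tokenizer.model_max_length)  # 512 for the flan-T5 checkpoints

long_text = "a paragraph about something " * 300  # well beyond 512 tokens
input_ids = tokenizer(long_text, truncation=False).input_ids
print(len(input_ids))  # longer than model_max_length; only a warning is emitted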

@danielbichuetti (Contributor) commented Mar 13, 2023

Yes. The FLAN limit set by the tokenizer is not really a hard limit because of the attention mechanism. There is a post about this on the HF forum.

It's a common limit, though, as memory usage increases exponentially with sequence length.

@sjrl But anyway, even if we don't enforce the recommended value, which fits most scenarios, there are other models and use cases where the limit matters, e.g. even for OpenAI if we set the answer length to 4000. So maybe changing the way we warn users, or the message itself, would be advisable.
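
Back-of-the-envelope illustration of that OpenAI case (assuming a ~4k-token context window shared between prompt and completion, as for text-davinci-003):

model_context_window = 4097    # prompt + completion share this budget
requested_answer_tokens = 4000

prompt_budget = model_context_window - requested_answer_tokens
print(prompt_budget)  # only ~97 tokens left for instructions, documents and the query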
