It seems that when the prompt truncation PR for OpenAI (#4179) was merged, it became possible for truncation to cut into the prompt itself: if the maximum length specified in the PromptNode definition is equal to or larger than the model's own max sequence length, there is no token budget left for the prompt. This hits models with a comparatively low token limit, such as the flan-T5 models, particularly hard: the prompt ends up empty and the documents are truncated as well.
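Roughly, the budget arithmetic looks like this (the names below are illustrative, not the actual implementation):

model_max_length = 512       # flan-t5-xl's max sequence length, as reported by its tokenizer
answer_max_length = 512      # max_length passed to the PromptNode
prompt_budget = model_max_length - answer_max_length
print(prompt_budget)         # 0 -- no tokens left for the prompt template, documents and query

With max_length equal to or above the model limit, everything before the answer gets truncated away.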
To Reproduce
from haystack.nodes import EmbeddingRetriever, PromptNode, PromptTemplate
from haystack.document_stores import WeaviateDocumentStore
from haystack.pipelines import Pipeline
import sys

document_store = WeaviateDocumentStore(similarity="cosine", embedding_dim=768)

lfqa_prompt = PromptTemplate(
    name="lfqa",
    prompt_text="Generate a comprehensive, summarized answer to the given question using the provided paragraphs and reply as if you are Yoda. \n\n Paragraphs: $documents \n\n Question: $query \n\n Answer:",
)

retriever = EmbeddingRetriever(
    document_store=document_store,
    embedding_model="flax-sentence-embeddings/all_datasets_v3_mpnet-base",
    model_format="sentence_transformers",
    top_k=20,
)

prompt_node = PromptNode(
    model_name_or_path="google/flan-t5-xl",
    default_prompt_template=lfqa_prompt,
    top_k=9,
    max_length=512,  # equal to flan-t5-xl's max sequence length, which triggers the truncation
)

pipe = Pipeline()
pipe.add_node(component=retriever, name="retriever", inputs=["Query"])
pipe.add_node(component=prompt_node, name="prompt_node", inputs=["retriever"])

output = pipe.run(query=sys.argv[1])

full_sentence = list(filter(lambda x: x.endswith("."), output["results"]))
longest_output = max(full_sentence, key=len)
print("**Answer:** " + longest_output)
I think we should not truncate the prompt to preserve the answer space, but do the opposite, and warn the user:
"Your prompt uses Z tokens, this LLM has a max seq. length of X tokens, and you requested an answer space of Y tokens, which exceeds the model limit by I tokens. You may get no answer. Please reduce the size or number of your documents, and/or your requested max answer size."
Truncating in the current direction means reserving space for an answer while disregarding the prompt, the documents, and the query itself. An LLM without a prompt is unusable, and the consequence is either no answer at all or a randomly generated answer (noise).
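A minimal sketch of the check I have in mind (the function and variable names are hypothetical, not the actual Haystack API):

def answer_token_budget(prompt_tokens: int, requested_answer_tokens: int, model_max_tokens: int) -> int:
    """Keep the prompt intact and shrink the answer budget instead, warning the user."""
    available = model_max_tokens - prompt_tokens
    overflow = requested_answer_tokens - available
    if overflow > 0:
        print(
            f"Your prompt uses {prompt_tokens} tokens, this LLM has a max seq. length of {model_max_tokens} tokens, "
            f"and you requested an answer space of {requested_answer_tokens} tokens, which exceeds the model limit "
            f"by {overflow} tokens. You may get no answer. Please reduce the size or number of your documents, "
            f"and/or your requested max answer size."
        )
    return max(available, 0)

In other words: warn, reserve whatever is left for the answer, and never empty the prompt silently.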
Hey everyone, @zoltan-fedor also brought up a really good point in this issue #4388. For at least the flan-T5 models it doesn't seem to make sense to have this token limit enforced. Check out this comment from the linked issue for more info.
Yes. The FLAN limit set by the tokenizer is not really a hard limit, because of the attention mechanism. There is a post about this on the HF forum.
It is just the common practical limit, since memory usage grows steeply as sequences get longer.
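For reference, the reported limit can be inspected directly; this is just a quick check with transformers (assuming the value comes from the tokenizer config):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("google/flan-t5-xl")
print(tok.model_max_length)  # typically 512 -- a tokenizer default, not a hard architectural cap (T5 uses relative position embeddings)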
@sjrl But anyway, even if we don't enforce the recommended value (which fits most scenarios), there are other models and use cases where this matters, e.g. even for OpenAI if we set the answer length to a value like 4000 tokens. So it may be advisable to change the way we warn users, or the warning message itself.
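For example, assuming a model context of roughly 4096 tokens, requesting an answer length of 4000 leaves fewer than 100 tokens for the prompt template, documents and query combined, so the prompt would be silently cut down there as well.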
The bug was discovered by @recrudesce, @rolandtannous and @danielbichuetti.