
Support models that don't output past_key_values #91

Merged
1 commit merged into guidance-ai:main on May 22, 2023

Conversation

younesbelkada (Contributor) commented May 22, 2023

What does this PR do?

Originally pointed out in huggingface/transformers#22797 (comment) by @fullstackwebdev

By design, some models in transformers do not output past_key_values.
This is the case for RWKV, a new architecture recently integrated into Hugging Face's transformers: huggingface/transformers#22797
RWKV is an 'attention-free' LLM: it does not rely on the past-key-value mechanism to cache the model's computation, since tokens are always processed one by one.
This PR adds support for these models by returning None when past_key_values is not present in the model's output; the generate method in transformers should automatically take care of the rest under the hood.
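For illustration, here is a minimal sketch of the kind of fallback this PR adds (the helper name is hypothetical, not the actual guidance code): if the model output carries no past_key_values, fall back to None and let transformers' generate handle the rest.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical helper mirroring the safety check added in this PR:
# read past_key_values from a model output if present, otherwise return None.
def get_past_key_values(model_output):
    return getattr(model_output, "past_key_values", None)

tokenizer = AutoTokenizer.from_pretrained("RWKV/rwkv-4-169m-pile")
model = AutoModelForCausalLM.from_pretrained("RWKV/rwkv-4-169m-pile")

input_ids = tokenizer("Hello my name is", return_tensors="pt").input_ids
with torch.no_grad():
    out = model(input_ids=input_ids, use_cache=True)

# RWKV returns a recurrent `state` instead of past_key_values, so this prints None.
print(get_past_key_values(out))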

To reproduce

Simply run the snippet below:

import guidance

# RWKV is attention-free and does not return past_key_values
guidance.llm = guidance.llms.Transformers("RWKV/rwkv-4-169m-pile", device=0)

program = guidance('''Hello my name is {{gen max_tokens=10}}''')
print(program())

Commit: add safety checker to retrieve `None` in case the model has no `past_key_values`
slundberg (Collaborator)

Thanks! I have not read up on RWKV yet, so I'll take a look at that and this PR and then merge, assuming it all looks good.

slundberg (Collaborator) commented May 22, 2023

@younesbelkada So, after digging into things a bit, it seems this PR would make RWKV work but would disable all the Guidance acceleration we normally get. Is there an easy way to reuse the state vector for RWKV? Basically, if we have a program like the one below, we want to save the state at the end of the first generation, do a batch computation that extends the state with the fixed text between the generations, and then run the second generation.

import guidance

guidance.llm = guidance.llms.Transformers("RWKV/rwkv-4-169m-pile", device=0)

program = guidance('''Hello my name is {{gen 'name' max_tokens=10 stop=" "}}, and I have a story titled "{{gen 'title'}}"''')
print(program())

We can merge as is, but it is not a good long-term solution for performance-sensitive uses (meaning any time you are waiting a while for results).
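
For context, a rough sketch of the prefix reuse this acceleration relies on with a standard KV-cache model (gpt2 is used here only as a stand-in; this is not guidance's actual implementation). The open question above is how to do the equivalent with RWKV's recurrent state.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Rough sketch of KV-cache reuse: keep the cache from the first pass and feed
# only the new tokens plus that cache on the next pass, so the prefix is not
# recomputed.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prefix = tokenizer("Hello my name is", return_tensors="pt")
with torch.no_grad():
    out = model(**prefix, use_cache=True)  # cache now covers the prefix

extension = tokenizer(" Alice, and I have a story titled", return_tensors="pt")
with torch.no_grad():
    # Only the new tokens are processed; the cached prefix is reused.
    out = model(input_ids=extension.input_ids,
                past_key_values=out.past_key_values,
                use_cache=True)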

younesbelkada (Contributor, Author)

Thank you very much for reviewing!
I believe one can retrieve the state vector from outputs.state: https://github.com/huggingface/transformers/blob/e69feab8a13cf6cbf99fd6f3ff6cbc105d2183d9/src/transformers/models/rwkv/modeling_rwkv.py#LL533C1-L533C1
However, this might require some work on guidance, so it is probably better to do a proper accelerated RWKV integration in a follow-up PR.
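
For illustration, a minimal sketch of carrying that state across calls with the transformers RWKV implementation (the prompts are arbitrary and this is the assumed usage pattern, not guidance code):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# RWKV returns a recurrent `state` instead of past_key_values and accepts it
# back via the `state` argument, so a prefix does not have to be reprocessed.
tokenizer = AutoTokenizer.from_pretrained("RWKV/rwkv-4-169m-pile")
model = AutoModelForCausalLM.from_pretrained("RWKV/rwkv-4-169m-pile")

first = tokenizer("Hello my name is", return_tensors="pt").input_ids
with torch.no_grad():
    out = model(input_ids=first, use_cache=True)  # out.state holds the recurrent state

# Continue from the saved state with only the new tokens.
second = tokenizer(" Alice, and I have a story titled", return_tensors="pt").input_ids
with torch.no_grad():
    out = model(input_ids=second, state=out.state, use_cache=True)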

slundberg (Collaborator)

Sounds good. I'll merge this and am happy to review any follow-up PR that does proper acceleration. Also, for the benefit of transformers, it might be good to consider how to expose session-based state caching in a more standard way; guidance has to use monkey patching to get what we want, which is probably not a great long-term solution.

slundberg merged commit 0a64fa9 into guidance-ai:main on May 22, 2023
younesbelkada deleted the patch-1 branch on May 22, 2023 at 21:23