Add inverse chat templating #33321
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Thanks @Rocketknight1! I'll review ASAP

@LysandreJik Don't worry, it's not quite ready! I'm still working on the little details of how this interacts with generation / incomplete inputs, etc.
Update @LysandreJik, this is now ready for review! The last issue I was testing was how the inverse templates would handle incomplete prompts - for example, applying the inverse template to a single generation output rather than a full chat. My conclusion was that this is just very messy and impossible to handle correctly - in particular, the prompt often contains the assistant header, so it's extremely hard for the inverse template to figure out what's going on if you only pass the generation output. Therefore, I think we should only support applying inverse templates to entire chats, and not single generation outputs, and so there's no need to test it with generation outputs only.
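To see why generation-only inputs are ambiguous, here is a small illustration with a Mistral-style format (the strings below are illustrative, not taken from the PR):

```python
# Prompt as sent to the model - note it ends at the assistant position,
# i.e. the assistant "header" lives in the prompt, not the output.
prompt = "<s>[INST] What's 2 + 2? [/INST]"

# What generation returns beyond the prompt: plain text with no opening
# role marker, so an inverse template has nothing to anchor a role on.
generation_output = " 4</s>"

# Only the concatenation is unambiguous for the inverse template.
full_chat = prompt + generation_output
```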
Awesome, will take a look ASAP
Shouldn't this support multiple inverse templates? In the cases where it makes sense not to make a unified chat template, I'd imagine it won't make sense to make a unified inverse template either...
We might consider adding that in future! In general, we're trying to move away from having multiple chat templates, in favour of a single template that branches internally (for example, on whether tools are passed).
Sure, I'll ping them! cc @lewtun @xenova @Narsil and @zucchini-nlp, since they've been working with templates (or have requested this feature).
From the VLM side I don't think this feature will be very useful, as I don't see an easy way to implement that if the input text has already been processed. But I think it's okay, as VLMs are one step behind and don't support the full potential of chat templates yet.
"apply_chat_template requires jinja2>=3.1.0 to be installed. Your version is " f"{jinja2.__version__}." | ||
) | ||
|
||
jinja_env = SandboxedEnvironment(trim_blocks=True, lstrip_blocks=True, extensions=[jinja2.ext.loopcontrols]) |
I may be missing something here, but do you really need a mutable sandbox? Your testing template doesn't look like it does...
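For reference, a minimal sketch of the immutable alternative the reviewer is hinting at; jinja2 ships both classes, and the immutable variant additionally blocks in-place mutation of template variables:

```python
import jinja2.ext
from jinja2.sandbox import ImmutableSandboxedEnvironment

# Same configuration as the diff above, but using the stricter sandbox,
# which also forbids mutating operations (list.append, dict.update, etc.)
# inside templates.
jinja_env = ImmutableSandboxedEnvironment(
    trim_blocks=True,
    lstrip_blocks=True,
    extensions=[jinja2.ext.loopcontrols],
)
```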
{%- set tools = finditer("\[AVAILABLE_TOOLS\] (.*?)\[\/AVAILABLE_TOOLS\]", chat, flags=16) %}
{%- set user_messages = finditer('(?:\[INST\] )(.+?)\[\/INST\]', chat, flags=16, add_tag="user") %}
{%- set asst_messages = finditer('(?:\[\/INST\]|\[\/TOOL_RESULTS\]) (.+?)<\/s>', chat, flags=16, add_tag="assistant") %}
{%- set available_tools = finditer('\[AVAILABLE_TOOLS\] (.*?)\[\/AVAILABLE_TOOLS\]', chat, flags=16, add_tag="available_tools") %}
{%- set tool_calls = finditer('\[TOOL_CALLS\] (.+?\])<\/s>', chat, flags=16, add_tag="tool_calls") %}
{%- set tool_results = finditer('\[TOOL_RESULTS\] (.+?)\[\/TOOL_RESULTS\]', chat, flags=16, add_tag="tool") %}
Since you're exposing regex flags to the template anyway, shouldn't all of these use flags=DOTALL instead of a magic number?
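For context, 16 is the integer value of Python's DOTALL flag, which makes `.` also match newlines - important when the regex has to span multi-line chat turns. A quick check:

```python
import re

# flags=16 in the template above is just the numeric value of re.DOTALL
# (alias re.S), which lets ".*?" match across newlines.
assert re.DOTALL == re.S == 16
```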
@Rocketknight1 BTW, really looking forward to this feature, it will be super useful. I'm working on a chat template editor (Gradio space) which will make it super easy to view, modify, test and create PRs for chat templates on HF. I'm hoping to add support for inverse templates as well, but it will require the HF API to return the inverse template too.
Hi @CISC - in that case, I'd pay attention to this PR too! #33957 We're planning a medium-term refactor because of how much chat templates have blown up, with the goal of eventually moving them to a single file in most model repos. (I may delay this PR until we settle on a spec in #33957, to avoid putting inverse templates somewhere and then immediately having to move them again)
That PR seems like it needs to be phased in very gradually though, since it will impact a lot of projects that need to catch up. Does it really make sense to delay this one until that is done?
@CISC good point, but we'll probably merge that PR relatively soon so that Transformers starts to support the new format, even though we won't actually transition repos until later!
Very experimental PR for now! This PR adds inverse chat templating, where we convert a formatted chat back into message dicts in a universal format. This has been requested several times, and it's critical to allow seamless tool use in pipelines, without requiring users to manually write parsers for each model.
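To make the round trip concrete, here is a minimal sketch under assumed names - `apply_inverse_chat_template` is hypothetical, since the final public method name isn't confirmed in this excerpt:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.3")

messages = [
    {"role": "user", "content": "Hi!"},
    {"role": "assistant", "content": "Hello! How can I help?"},
]

# Forward direction (existing API): message dicts -> formatted chat string.
chat_str = tokenizer.apply_chat_template(messages, tokenize=False)

# Inverse direction: formatted chat string -> message dicts.
# NOTE: the method name here is a guess; this PR adds the capability,
# but the exact public API may differ.
recovered = tokenizer.apply_inverse_chat_template(chat_str)
assert recovered == messages
```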
Inverse templating requires the ability to extract text from a large input string. After testing several templates, I found that we can generally handle this by exposing two functions in the inverse templating environment: a slightly modified re.finditer() and a totally unmodified json.loads().
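The excerpt doesn't show the modified finditer itself, but based on how the Mistral template above calls it (a first capture group plus an add_tag keyword), a plausible sketch might look like this - treat every detail as an assumption rather than the PR's actual implementation:

```python
import re

def finditer(pattern, string, flags=0, add_tag=None):
    """Hypothetical sketch of the template-exposed helper: wraps
    re.finditer, keeps group(1) of each match, and optionally tags
    each hit with a role so matches from several passes over the chat
    can later be merged and sorted by start position."""
    results = []
    for m in re.finditer(pattern, string, flags=flags):
        item = {"content": m.group(1), "start": m.start()}
        if add_tag is not None:
            item["role"] = add_tag  # e.g. "user", "assistant", "tool"
        results.append(item)
    return results
```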
TODO:
- Move the inverse templating tests out of tokenization_common so they're not run for every model
- Make sure extraction works correctly with generation, and add some tests that use (static!) generation outputs
- Add tool use to pipelines (will put this in a separate PR)