
Add inverse chat templating #33321

Open · wants to merge 16 commits into main
Conversation

Rocketknight1
Member

@Rocketknight1 Rocketknight1 commented Sep 5, 2024

Very experimental PR for now! This PR adds inverse chat templating, where we convert a formatted chat back into message dicts in a universal format. This has been requested several times, and it's critical to allow seamless tool use in pipelines, without requiring users to manually write parsers for each model.

Inverse templating requires the ability to extract text from a large input string. After testing several templates, I found that we can generally handle this by exposing two functions in the inverse templating environment: a slightly modified re.finditer() and a totally unmodified json.loads().
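The exact helpers aren't reproduced in this thread, so here is a minimal stdlib-only sketch of what a tag-aware finditer() exposed to the template environment might look like; the `add_tag` keyword mirrors the example template excerpt later in this thread, but the (position, text, tag) return shape and the merge step are my assumptions, not the PR's actual API:

```python
import re

def finditer(pattern, string, flags=0, add_tag=None):
    # Hypothetical sketch of the "slightly modified re.finditer()": it returns
    # (start, text, tag) tuples rather than Match objects, so hits from several
    # patterns can be merged and re-sorted back into chat order.
    return [(m.start(), m.group(1), add_tag) for m in re.finditer(pattern, string, flags)]

# json.loads() is exposed unmodified, e.g. for decoding [TOOL_CALLS] payloads.
chat = "[INST] What is 2+2? [/INST] The answer is 4.</s>"
user = finditer(r"\[INST\] (.+?) ?\[\/INST\]", chat, flags=re.DOTALL, add_tag="user")
asst = finditer(r"\[\/INST\] (.+?)<\/s>", chat, flags=re.DOTALL, add_tag="assistant")
messages = [{"role": tag, "content": text} for _, text, tag in sorted(user + asst)]
```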

TODO:

  • Make some model PRs to test with this, but don't merge yet! (PRs not open, but templates written)
  • Add tests for loading/saving of inverse templates
  • Add tests for inverse template function
  • Test recovery of tools as well as messages
  • Make sure I don't need any extra functions
  • Refactor chat template tests out of tokenization_common so they're not run for every model
  • Add chat template tests to CircleCI
  • Make sure extraction works correctly with generation, and add some tests that use (static!) generation outputs
  • Add tool use to pipelines (will put this in a separate PR)

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@Rocketknight1 Rocketknight1 marked this pull request as ready for review September 16, 2024 14:59
@LysandreJik
Member

Thanks @Rocketknight1! I'll review ASAP

@Rocketknight1
Member Author

@LysandreJik Don't worry, it's not quite ready! I'm still working on the little details of how this interacts with generation / incomplete inputs, etc.

@Rocketknight1
Member Author

Update @LysandreJik, this is now ready for review! The last issue I was testing was how the inverse templates would handle incomplete prompts - for example, if the user called model.generate(), but only passed the new tokens to the inverse template, instead of the entire prompt + output.

My conclusion was that this is just very messy and impossible to handle correctly - in particular, the prompt often contains the assistant header, so it's extremely hard for the inverse template to figure out what's going on if you only pass the generation output. Therefore, I think we should only support applying inverse templates to entire chats, and not single generation outputs, and so there's no need to test it with generation outputs only.
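To illustrate the anchoring problem with a toy Mistral-style example (the pattern is modelled on the template excerpt later in this thread, not on the PR's real code):

```python
import re

# In a full chat, the assistant header ("[/INST] " in this toy example) comes
# from the prompt, so the inverse template can anchor its regex on it:
full_chat = "<s>[INST] Hi [/INST] Hello!</s>"
generation_only = "Hello!</s>"  # decoding only the newly generated tokens

pattern = r"\[\/INST\] (.+?)<\/s>"
assert re.search(pattern, full_chat).group(1) == "Hello!"
# With only the generation output, the anchor is missing and extraction fails:
assert re.search(pattern, generation_only) is None
```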

@LysandreJik
Member

Awesome, will take a look ASAP

@CISC
Contributor

CISC commented Sep 25, 2024

Shouldn't this support multiple (tool_use/rag) templates, just like chat templates?

In cases where it doesn't make sense to have a unified chat template, I'd imagine a unified inverse template won't make sense either...

@Rocketknight1
Member Author

We might consider adding that in future! In general, we're trying to move away from having multiple chat templates, in favour of a single template that just uses conditions like {% if tools %} to switch its behaviour, although we'll continue supporting multiple (non-inverse) templates indefinitely.
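For illustration, a hypothetical fragment of the single-template style described above, switching on `{% if tools %}` (Mistral-style tags chosen for the example, not taken from any specific repo):

```jinja2
{%- if tools %}
    {{- "[AVAILABLE_TOOLS] " + (tools | tojson) + "[/AVAILABLE_TOOLS]" }}
{%- endif %}
{%- for message in messages %}
    {%- if message["role"] == "user" %}
        {{- "[INST] " + message["content"] + "[/INST]" }}
    {%- else %}
        {{- " " + message["content"] + "</s>" }}
    {%- endif %}
{%- endfor %}
```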

@LysandreJik
Member

Looks like a simple enough PR to me; would you like to check with others that are using chat templates extensively, like @xenova or @Narsil, to get a review?

@Rocketknight1
Member Author

Sure, I'll ping them! cc @lewtun @xenova @Narsil and @zucchini-nlp, since they've been working with templates (or have requested this feature).

@zucchini-nlp
Member

From the VLM side I don't think this feature will be very useful, as I don't see an easy way to implement it if the input text has already been processed (processor.__call__()) and then detokenized back. I say this because I assume it will be used to format generated text back into message dicts.

But I think it's okay, as VLMs are one step behind and don't support the full potential of chat templates yet.

"apply_chat_template requires jinja2>=3.1.0 to be installed. Your version is " f"{jinja2.__version__}."
)

jinja_env = SandboxedEnvironment(trim_blocks=True, lstrip_blocks=True, extensions=[jinja2.ext.loopcontrols])
Contributor


I may be missing something here, but do you really need a mutable sandbox? Your testing template doesn't look like it does...

Comment on lines +786 to +791
{%- set tools = finditer("\[AVAILABLE_TOOLS\] (.*?)\[\/AVAILABLE_TOOLS\]", chat, flags=16) %}
{%- set user_messages = finditer('(?:\[INST\] )(.+?)\[\/INST\]', chat, flags=16, add_tag="user") %}
{%- set asst_messages = finditer('(?:\[\/INST\]|\[\/TOOL_RESULTS\]) (.+?)<\/s>', chat, flags=16, add_tag="assistant") %}
{%- set available_tools = finditer('\[AVAILABLE_TOOLS\] (.*?)\[\/AVAILABLE_TOOLS\]', chat, flags=16, add_tag="available_tools") %}
{%- set tool_calls = finditer('\[TOOL_CALLS\] (.+?\])<\/s>', chat, flags=16, add_tag="tool_calls") %}
{%- set tool_results = finditer('\[TOOL_RESULTS\] (.+?)\[\/TOOL_RESULTS\]', chat, flags=16, add_tag="tool") %}
Contributor


Since you're exposing regex flags to the template anyway, shouldn't all of these use flags=DOTALL instead of a magic number?
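For reference, 16 is indeed the integer value of re.DOTALL in Python's re module, which is what lets "." cross newlines so multi-line message contents are captured; a quick demonstration:

```python
import re

assert int(re.DOTALL) == 16  # the magic number is just re.DOTALL (alias re.S)

chat = "[INST] line one\nline two [/INST]"
# Without DOTALL, "." refuses to cross the newline and the match fails:
assert re.search(r"\[INST\] (.+?) \[\/INST\]", chat) is None
# With flags=16 (i.e. re.DOTALL), the multi-line content is captured:
assert re.search(r"\[INST\] (.+?) \[\/INST\]", chat, flags=16).group(1) == "line one\nline two"
```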

@CISC
Contributor

CISC commented Oct 9, 2024

@Rocketknight1 BTW, really looking forward to this feature, it will be super useful.

I'm working on a chat template editor (a Gradio space) which will make it super easy to view, modify, test and create PRs for chat templates on HF. I'm hoping to add support for inverse templates as well, but that will require the HF API to return the inverse template in the config.tokenizer_config entry in ModelInfo. Who can ensure the API is updated once this feature is added?

@Rocketknight1
Member Author

Rocketknight1 commented Oct 9, 2024

Hi @CISC - in that case, I'd pay attention to this PR too! #33957

We're planning a medium-term refactor because of how much chat templates have blown up, with the goal of eventually moving them to a single file in most model repos.

(I may delay this PR until we settle on a spec in #33957, to avoid putting inverse templates somewhere and then immediately having to move them again)

@CISC
Contributor

CISC commented Oct 9, 2024

(I may delay this PR until we settle on a spec in #33957, to avoid putting inverse templates somewhere and then immediately having to move them again)

That PR seems like it needs to be phased in very gradually, though; it will impact a lot of projects that need to catch up. Does it really make sense to delay this one until that is done?

@Rocketknight1
Member Author

@CISC good point, but we'll probably merge that PR relatively soon so that Transformers starts to support the new format, even though we won't actually transition repos until later!
