
Separate chat templates into a single file #33957

Merged 33 commits into main on Nov 26, 2024

Conversation

Rocketknight1
Member

@Rocketknight1 Rocketknight1 commented Oct 4, 2024

We have several issues with chat templates because they're stored as single lines in the JSON config files:

  • Impossible to review diffs
  • Very hard to edit in the web UI (or in general)
  • Differences between processor templates in chat_template.json and tokenizer templates in tokenizer_config.json causing confusion
  • Some models use multiple templates, requiring a template dict, but we're trying to discourage that in future and move those models to single templates with conditional behaviour instead

The solution:

  • Just move chat templates to a single chat_template.jinja file in the repo
  • If multiple templates are required, then they should still be stored in the JSON file. This is not supported for Processor classes, so processors should always be able to save their template as a raw Jinja file. In general, we'll be gently deprecating multiple templates in future.
  • If a chat_template.jinja file is present, it overrides the JSON files. If a tokenizer is loaded with both Jinja and JSON chat templates and resaved, it should save only the Jinja file, and not have any chat_template entry in tokenizer_config.json.
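
The precedence rule in the last bullet can be sketched roughly as follows. This is a simplified illustration of the intended behaviour, not the actual transformers loading code; only the file names come from this PR:

```python
import json
import os

def resolve_chat_template(repo_dir):
    """Illustrative precedence: a chat_template.jinja file, if present,
    wins over any chat_template entry in tokenizer_config.json."""
    jinja_path = os.path.join(repo_dir, "chat_template.jinja")
    if os.path.isfile(jinja_path):
        with open(jinja_path) as f:
            return f.read()
    config_path = os.path.join(repo_dir, "tokenizer_config.json")
    if os.path.isfile(config_path):
        with open(config_path) as f:
            return json.load(f).get("chat_template")
    return None
```

When both files are present, the Jinja file is returned and the JSON entry is ignored, matching the override behaviour described above.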

For now, we continue saving in the old format by default. I'll probably keep it this way for several versions before making the new format the default, to ensure that most users are able to load the new format before it becomes common. Until then, the new format should mostly be used for testing, to make sure it's ready for deployment when we do the switch.

Extremely draft PR for now, since it'll probably break lots of things!

TODO:

  • Add loading/saving single files in tokenizers
  • Add loading/saving single files in processors
  • Add tokenizer tests for saving + loading + push_to_hub working correctly
  • Add processor tests for saving + loading + push_to_hub working correctly
  • Ensure processors + tokenizers don't clash when both saving chat templates

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@ArthurZucker
Collaborator

ArthurZucker commented Oct 4, 2024

Interesting! I think we would need to be extra careful, as this can influence quite a lot of other libs, for which we need to open PRs before merging this one!

FYI @LysandreJik and @Narsil

@Rocketknight1 Rocketknight1 force-pushed the chat_template_files branch 2 times, most recently from b4cc131 to 0216160 on October 8, 2024
@LysandreJik
Member

Also cc @zucchini-nlp

@CISC
Contributor

CISC commented Oct 9, 2024

Oh, wow, this will affect a lot of other projects. Also keep in mind that the HF API exposes the chat templates through the config.tokenizer_config entry in ModelInfo.

I will release a chat template editor (Gradio space) soon that lets you easily view, modify, test and submit PRs (just to address a couple of OP bullet-points), so thanks to @Rocketknight1 for pointing me towards this PR, I will need to keep a close eye on changes. :)

@Rocketknight1
Member Author

@CISC yes, I'm aware that this could be disruptive! When this PR is ready we'll discuss it internally and with the community, and we might end up aborting it at that point (and I'll try not to let that delay the inverse templates PR too much either way).

@Rocketknight1
Member Author

This is ready for review! There's one major question: Right now I'm saving processor_chat_template.jinja and chat_template.jinja as separate files to ensure no conflicts, even though most repos should only have one.

We could consider always using chat_template.jinja for both. However, in that case we would need to override the loading behaviour of the tokenizer when it is the child of a processor: if the processor saves a chat template file, the tokenizer will also load that file and use it to set its own chat_template attribute, which might have strange side effects.

Other than that, this should be ready for review! (cc some possibly affected people @zucchini-nlp @Narsil @xenova @LysandreJik)

Failing tests in UDOP seem to be unrelated, and I can't reproduce them locally.

@Rocketknight1 Rocketknight1 marked this pull request as ready for review October 10, 2024 15:02
Member

@zucchini-nlp zucchini-nlp left a comment


Seems good to me, but for processors we'll now have three ways of saving and loading templates. Some people (including repos maintained by us) still store them the old way, as part of processor_config.json. Introducing one more template file might be confusing for users, so we should probably cover this in the docs when we finally decide to roll out saving/loading from chat_template.jinja files by default.

@Rocketknight1
Member Author

@zucchini-nlp - agreed, but hopefully we can deprecate the others! I hope when we're finally finished, we just have chat_template.jinja and processor_chat_template.jinja. I was considering merging both into chat_template.jinja but that creates lots of weird side effects, and I decided against it.

@Rocketknight1 Rocketknight1 force-pushed the chat_template_files branch 2 times, most recently from aef89c6 to 0c164ab on October 15, 2024
@Rocketknight1
Member Author

Update: This should be ready for final review now! The failing UDOP tests are not related to this PR, and I have another PR open to fix them here #34180

To be clear, we will not move the chat templates when this PR is merged. This PR just adds support for saving/loading to the new location, but we will not save to the new file locations by default. This will give the many other frameworks that load chat templates time to adapt, and give users time to update to versions that support the new locations.

cc affected people: @Narsil @xenova @Wauplin

@Wauplin
Contributor

Wauplin commented Oct 16, 2024

Thanks for the ping @Rocketknight1! I'll check what it implies server-side to make the parsing of the chat template available in the API.

@Wauplin
Contributor

Wauplin commented Oct 16, 2024

High-level feedback: to avoid any confusion in the future, shouldn't the long-term files be called tokenizer_chat_template.jinja and processor_chat_template.jinja?
Otherwise some model repos will end up with chat_template.json/processor_chat_template.jinja for processors and tokenizer_config.json/chat_template.jinja for tokenizers (for backward compatibility), meaning that chat_template.jinja and chat_template.json won't have the same meaning.

Having a longer and more explicit final name would avoid confusion in my opinion.

@Rocketknight1
Member Author

I think that makes sense, yes! It'll be a bit longer but very explicit about which files affect which classes. I'll make that change if other reviewers agree as well.

@Wauplin
Contributor

Wauplin commented Oct 18, 2024

I was considering merging both into chat_template.jinja but that creates lots of weird side effects, and I decided against it.

Hi @Rocketknight1 @zucchini-nlp, getting back to this topic as I'd like to better understand the reasoning behind this file structure. I have a few questions:

  1. What are the side effects in play if we set a single jinja file for both tokenizers and processors?
  2. Apart from the text-only vs text+image(+<any modality?>) difference, is there another fundamental difference between a tokenizer and a processor? Can it happen that a VLM has 1 template for text-only and 1 template for text+image?
  3. (maybe a stupid question) if we have extra information to store about a Jinja template, could it make sense to define a header structure that could be added to Jinja templates (as Jinja comments) with high-level metadata? Typically to indicate whether the template is suitable for tokenizers and/or processors (an alternative could be to use a YAML/TOML config file instead of pure Jinja)
  4. what are the main libraries/tools that have to be updated when we update this standard? At least transformers/transformers.js/TGI for sure. Any other one to think about?
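
Question 3 could look something like the following. This is a purely hypothetical sketch of a metadata header embedded as Jinja comments; no such convention exists in transformers, and the `suitable-for` key is invented for illustration:

```python
import re

def parse_template_metadata(template_text):
    """Parse a hypothetical '{# key: value #}' metadata header from the
    leading lines of a Jinja chat template. Parsing stops at the first
    line that is not a metadata comment, so the template body is untouched."""
    metadata = {}
    for line in template_text.splitlines():
        match = re.match(r"\{#\s*([\w-]+)\s*:\s*(.+?)\s*#\}", line)
        if not match:
            break  # header ends at the first non-comment line
        metadata[match.group(1)] = match.group(2)
    return metadata
```

Because `{# ... #}` is a comment in Jinja, such a header would be ignored by any existing template renderer, which is what makes the idea backward compatible.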

Without knowing all the technical details, I'd advocate for a truly single-file chat template if possible. If that's the case, we would support things differently server-side, typically having a single model_info.config.chat_template field instead of model_info.config.tokenizer_config.chat_template and model_info.config.processor_config.chat_template. I discussed it further with @julien-c, who advocated for supporting only the (expected) final version on the Hub instead of the intermediate chat_template.json/... alternatives, to nudge model authors to use the "standard" asap (meaning reverting the PR I merged yesterday on moon-landing).

And sorry if all of this has already been discussed elsewhere!

@julien-c
Member

Discussed it further with @julien-c who advocated to support only the (expected) final version on the Hub instead of the intermediate chat_template.json/... alternatives, to nudge model authors to use the "standard" asap

100%

@Rocketknight1
Member Author

Rocketknight1 commented Oct 18, 2024

Hi @Wauplin, 1. is a good question! The main reason I wanted separate templates is that I thought the side effects would be unpredictable. For example, many processors have processor.chat_template set while tokenizer.chat_template is None. However, if we put all chat templates in a shared chat_template.jinja, then when the tokenizer is loaded for that model, it will also set processor.tokenizer.chat_template. This is probably safe, but it might break some stuff that I can't predict!
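
The side effect being described can be illustrated with toy stand-ins (these are not the transformers classes, just minimal sketches of the ownership relationship):

```python
class Tokenizer:
    """Toy stand-in for a tokenizer; not the transformers class."""
    def __init__(self, chat_template=None):
        self.chat_template = chat_template

class Processor:
    """Toy stand-in for a processor that owns a child tokenizer."""
    def __init__(self, tokenizer, chat_template=None):
        self.tokenizer = tokenizer
        self.chat_template = chat_template

# Today: the processor has a template, but its child tokenizer does not.
proc = Processor(Tokenizer(), chat_template="{% for m in messages %}...{% endfor %}")
assert proc.tokenizer.chat_template is None

# With a single shared chat_template.jinja, loading the tokenizer would also
# pick up the processor's template, changing the tokenizer's behaviour:
proc.tokenizer.chat_template = proc.chat_template
assert proc.tokenizer.chat_template is not None
```

Any downstream code that relied on `tokenizer.chat_template` being `None` for such models would see different behaviour after the merge, which is the unpredictability the comment refers to.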

@CISC
Contributor

CISC commented Oct 23, 2024

I've published the initial version of Chat Template Editor, however the full functionality is currently blocked by the gr.LoginButton not working.

I've started work related to this and the inverse template PR, but it's currently disabled/hidden until finalized.

However, I have a small question that I hope someone can answer: this PR mentions processor chat templates, and I've tried locating repos containing them to see how they work and possibly add support for them, but I can't seem to find any.

Found community Pixtral which was very helpful, added support for chat_template.json and multimodal messages.

Collaborator

@ArthurZucker ArthurZucker left a comment


Let's go with a single-file system and define a good standard for it. If the chat template stays as it is today, it does not have to be "in" the tokenizer: it's a new form of pre-processing that comes before the tokenizer, and as such we don't need different files.

Let's sync on the best format to use; the simpler the better.

@Rocketknight1 Rocketknight1 force-pushed the chat_template_files branch 2 times, most recently from ef6a33c to a6ff538 on November 5, 2024
@Rocketknight1
Member Author

I got waylaid by a couple of urgent tasks, but this should finally be ready! The summary is:

  • The only raw chat template file is chat_template.jinja, for both processors and tokenizers. This means that processors and tokenizers saved in the same repo cannot have different chat templates.
  • For now, we continue saving in the old format by default, but we support loading the new format. We keep it like that for a couple of versions so that when we start saving the new format it doesn't break everyone's experience.
  • When a tokenizer has multiple templates, it cannot be saved in the new format. This probably won't be changed - we want to gently deprecate multiple templates.
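
The saving rules in this summary can be sketched like so. The `save_raw_chat_template` argument name comes from this PR's commit list; the function body is an illustrative simplification, not the actual transformers implementation:

```python
import json
import os

def save_chat_template(save_dir, chat_template, save_raw_chat_template=False):
    """Illustrative save logic: a single template (a plain string) can be
    written to chat_template.jinja when the new format is requested, but a
    dict of named templates always falls back to the JSON config."""
    if save_raw_chat_template and isinstance(chat_template, str):
        path = os.path.join(save_dir, "chat_template.jinja")
        with open(path, "w") as f:
            f.write(chat_template)
        return path
    path = os.path.join(save_dir, "tokenizer_config.json")
    with open(path, "w") as f:
        json.dump({"chat_template": chat_template}, f)
    return path
```

Note that with `save_raw_chat_template=False` (the default, mirroring this PR's behaviour), even a single template still lands in the JSON config, so existing consumers keep working until the new format becomes the default.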

Collaborator

@ArthurZucker ArthurZucker left a comment


LGTM but the PR description needs to be updated!

@Rocketknight1
Member Author

PR description updated, and merging!

@Rocketknight1 Rocketknight1 merged commit d5cf91b into main Nov 26, 2024
27 checks passed
@Rocketknight1 Rocketknight1 deleted the chat_template_files branch November 26, 2024 14:18
@Wauplin
Contributor

Wauplin commented Nov 27, 2024

Thanks @Rocketknight1 ! Is there a first repo example on the Hub to check that? Asking since we'll have to support this new (and definitive!) file in the server-side parsing

@Rocketknight1
Member Author

cc @Wauplin not yet, but I'll make one in hf-internal-testing today!

BernardZach pushed a commit to BernardZach/transformers that referenced this pull request Dec 5, 2024
* Initial draft

* Add .jinja file loading for processors

* Add processor saving of naked chat template files

* make fixup

* Add save-load test for tokenizers

* Add save-load test for tokenizers

* stash commit

* Try popping the file

* make fixup

* Pop the arg correctly

* Pop the arg correctly

* Add processor test

* Fix processor code

* stash commit

* Processor clobbers child tokenizer's chat template

* Processor clobbers child tokenizer's chat template

* make fixup

* Split processor/tokenizer files to avoid interactions

* fix test

* Expand processor tests

* Rename arg to "save_raw_chat_template" across all classes

* Update processor warning

* Move templates to single file

* Move templates to single file

* Improve testing for processor/tokenizer clashes

* Improve testing for processor/tokenizer clashes

* Extend saving test

* Test file priority correctly

* make fixup

* Don't pop the chat template file before the slow tokenizer gets a look

* Remove breakpoint

* make fixup

* Fix error
BernardZach pushed a commit to innovationcore/transformers that referenced this pull request Dec 6, 2024