[`TokenizerFast`] `can_save_slow_tokenizer` as a property for when `vocab_file`'s folder was removed #25626

ArthurZucker · 2023-08-21T10:51:09Z

What does this PR do?

Fixes #25602 by making `can_save_slow_tokenizer` a property rather than an attribute, since we need to check whether the `vocab_file` still exists.
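To illustrate why a property helps here, the following is a minimal sketch, not the code from this PR. The classes `EagerTokenizer` and `LazyTokenizer`, the helper `demo`, and the file name `spiece.model` are all hypothetical; only `can_save_slow_tokenizer` and `vocab_file` come from the PR. The sketch assumes the property checks the file with `os.path.isfile`, as the commit "check if isfile vocabfile" suggests.

```python
import os
import tempfile


class EagerTokenizer:
    """Hypothetical stand-in: the flag is a plain attribute set once in __init__."""

    def __init__(self, vocab_file):
        self.vocab_file = vocab_file
        # Evaluated a single time; never refreshed if the file disappears later.
        self.can_save_slow_tokenizer = bool(vocab_file) and os.path.isfile(vocab_file)


class LazyTokenizer:
    """Hypothetical stand-in: the flag is a property, re-evaluated on every access."""

    def __init__(self, vocab_file):
        self.vocab_file = vocab_file

    @property
    def can_save_slow_tokenizer(self) -> bool:
        # Assumption: the check is a file-existence test on vocab_file.
        return os.path.isfile(self.vocab_file) if self.vocab_file else False


def demo():
    """Create a vocab file, build both tokenizers, then delete the folder."""
    with tempfile.TemporaryDirectory() as folder:
        vocab_file = os.path.join(folder, "spiece.model")  # hypothetical file name
        with open(vocab_file, "wb") as handle:
            handle.write(b"dummy")
        eager = EagerTokenizer(vocab_file)
        lazy = LazyTokenizer(vocab_file)
        before = (eager.can_save_slow_tokenizer, lazy.can_save_slow_tokenizer)
    # The temporary folder (and the vocab file) no longer exists at this point.
    after = (eager.can_save_slow_tokenizer, lazy.can_save_slow_tokenizer)
    return before, after
```

In this sketch the attribute keeps reporting a stale value after the folder is removed, while the property reflects the current state of the file system each time it is read.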

HuggingFaceDocBuilderDev · 2023-08-21T11:15:31Z

The documentation is not available anymore as the PR was closed or merged.

amyeroberts

Thanks for updating!

amyeroberts · 2023-08-30T13:50:11Z

src/transformers/models/xlm_prophetnet/tokenization_xlm_prophetnet.py

@@ -189,6 +189,10 @@ def __init__(
        for k in self.fairseq_tokens_to_ids.keys():
            self.unique_no_split_tokens.append(k)

+    @property
+    def can_save_slow_tokenizer(self) -> bool:


Do we know why the tokenizer didn't have this as an attribute before?

Haha, good question. I think it's because the problem was never really detected. In this special case, the folder containing the sentencepiece model was deleted; when that folder lives in the transformers cache, this never or rarely happens.

…ocab_file`'s folder was removed (huggingface#25626)

* pad token should be None by default
* fix tests
* nits
* check if isfile vocabfile
* add warning if sp model folder was deleted
* save SPM when missing folder for sloz
* update the `can_save_slow_tokenizer` to be a property
* first batch
* second batch
* missing one

ArthurZucker added 6 commits July 20, 2023 18:59

pad token should be None by default

1bd6da0

fix tests

7c244b6

nits

23254c5

check if isfile vocabfile

8334785

add warning if sp model folder was deleted

4405788

Merge branch 'nit-llama-fast' of https://github.com/arthurzucker/tran…

6911716

…sformers into nit-llama-fast

ArthurZucker marked this pull request as ready for review August 21, 2023 11:34

ArthurZucker added 5 commits August 30, 2023 10:57

save SPM when missing folder for sloz

fb51069

update the can_save_slow_tokenizer to be a property

b04e274

first batch

8dc74a8

second batch

458f9e2

missing one

bebfb9a

ArthurZucker changed the title ~~[TokenizerFast] Warn when vocab_file folder was removed~~ [TokenizerFast] can_save_slow as a property for when vocab_file's folder was removed Aug 30, 2023

ArthurZucker changed the title ~~[TokenizerFast] can_save_slow as a property for when vocab_file's folder was removed~~ [TokenizerFast] can_save_slow_tokenizer as a property for when vocab_file's folder was removed Aug 30, 2023

ArthurZucker requested a review from amyeroberts August 30, 2023 12:06

amyeroberts approved these changes Aug 30, 2023

ArthurZucker merged commit 3b39b90 into huggingface:main Aug 31, 2023
3 checks passed
