[`TokenizerFast`] `can_save_slow_tokenizer` as a property for when `vocab_file`'s folder was removed #25626
The documentation is not available anymore as the PR was closed or merged.
Title changed from "[`TokenizerFast`] Warn when `vocab_file` folder was removed" to "[`TokenizerFast`] `can_save_slow` as a property for when `vocab_file`'s folder was removed"
Title changed from "[`TokenizerFast`] `can_save_slow` as a property for when `vocab_file`'s folder was removed" to "[`TokenizerFast`] `can_save_slow_tokenizer` as a property for when `vocab_file`'s folder was removed"
Thanks for updating!
@@ -189,6 +189,10 @@ def __init__(
         for k in self.fairseq_tokens_to_ids.keys():
             self.unique_no_split_tokens.append(k)

+    @property
+    def can_save_slow_tokenizer(self) -> bool:
Do we know why the tokenizer didn't have this as an attribute before?
Haha, good question. I think it's because the problem was never really detected. In this special case the folder containing the sentencepiece model was deleted, which never or rarely happens when the model lives in the `transformers` cache.
…ocab_file`'s folder was removed (huggingface#25626)
* pad token should be None by default
* fix tests
* nits
* check if isfile vocabfile
* add warning if sp model folder was deleted
* save SPM when missing folder for sloz
* update the `can_save_slow_tokenizer` to be a property
* first batch
* second batch
* missing one
What does this PR do?
Fixes #25602 by making `can_save_slow_tokenizer` a property rather than an attribute, since we need to check whether the `vocab_file` still exists at the time of saving, not only when the tokenizer is created.
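
To illustrate why a property helps here, the following is a minimal sketch and not the code from this PR. The two classes, the `demo` helper, and the file name are hypothetical stand-ins; only the name `can_save_slow_tokenizer` and the idea of checking that `vocab_file` still exists come from the discussion above. The sketch compares a flag computed once at construction with a property that re-checks the file system on every access.

```python
import os
import shutil
import tempfile


class StaleAttributeTokenizer:
    """Hypothetical stand-in: the flag is computed once, at construction."""

    def __init__(self, vocab_file):
        self.vocab_file = vocab_file
        # Plain attribute: never updated if the file disappears later.
        self.can_save_slow_tokenizer = bool(vocab_file)


class PropertyTokenizer:
    """Hypothetical stand-in: the flag is re-evaluated on every access."""

    def __init__(self, vocab_file):
        self.vocab_file = vocab_file

    @property
    def can_save_slow_tokenizer(self) -> bool:
        # Assumed check: the vocab file must still exist on disk.
        return os.path.isfile(self.vocab_file) if self.vocab_file else False


def demo():
    # Create a temporary folder holding a placeholder "sentencepiece" file.
    folder = tempfile.mkdtemp()
    vocab_file = os.path.join(folder, "sentencepiece.bpe.model")
    with open(vocab_file, "wb") as f:
        f.write(b"placeholder")

    stale = StaleAttributeTokenizer(vocab_file)
    live = PropertyTokenizer(vocab_file)
    before = (stale.can_save_slow_tokenizer, live.can_save_slow_tokenizer)

    # Simulate the reported situation: the folder is removed after loading.
    shutil.rmtree(folder)
    after = (stale.can_save_slow_tokenizer, live.can_save_slow_tokenizer)
    return before, after
```

Calling `demo()` shows both flags agreeing while the file exists; once the folder is removed, only the property reflects the change, which is the behavior a save routine needs before trying to copy the vocab file.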