System Info
transformers==4.31.0
torch==2.0.1
Who can help?
@ArthurZucker
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)
Reproduction
Traceback (most recent call last):
  ....
  File "python3.9/site-packages/transformers/trainer.py", line 1539, in train
    return inner_training_loop(
  File "python3.9/site-packages/transformers/trainer.py", line 1916, in _inner_training_loop
    self._maybe_log_save_evaluate(tr_loss, model, trial, epoch, ignore_keys_for_eval)
  File "python3.9/site-packages/transformers/trainer.py", line 2237, in _maybe_log_save_evaluate
    self._save_checkpoint(model, trial, metrics=metrics)
  File "python3.9/site-packages/transformers/trainer.py", line 2294, in _save_checkpoint
    self.save_model(output_dir, _internal_call=True)
  File "python3.9/site-packages/transformers/trainer.py", line 2749, in save_model
    self._save(output_dir, state_dict=state_dict)
  File "python3.9/site-packages/transformers/trainer.py", line 2832, in _save
    self.tokenizer.save_pretrained(output_dir)
  File "python3.9/site-packages/transformers/tokenization_utils_base.py", line 2221, in save_pretrained
    save_files = self._save_pretrained(
  File "python3.9/site-packages/transformers/tokenization_utils_fast.py", line 595, in _save_pretrained
    vocab_files = self.save_vocabulary(save_directory, filename_prefix=filename_prefix)
  File "python3.9/site-packages/transformers/models/llama/tokenization_llama_fast.py", line 186, in save_vocabulary
    copyfile(self.vocab_file, out_vocab_file)
  File "/opt/bb/lib/python3.9/shutil.py", line 264, in copyfile
    with open(src, 'rb') as fsrc:
FileNotFoundError: [Errno 2] No such file or directory: './model/tokenizer.model'
Expected behavior
When I fine-tune LLaMA, this error is thrown while saving the first checkpoint, because the original model directory was deleted.
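For what it's worth, the crash does not need the Trainer at all: the fast tokenizer keeps a path to the original sentencepiece file (self.vocab_file) and tries to copy it on every save. A minimal sketch of the situation, assuming the tokenizer was loaded from a local ./model directory that is later removed (the paths are illustrative):

import shutil
from transformers import AutoTokenizer

# self.vocab_file ends up pointing at ./model/tokenizer.model
tokenizer = AutoTokenizer.from_pretrained("./model", use_fast=True)

# Remove the original directory, as happened during fine-tuning
shutil.rmtree("./model")

# save_pretrained() reaches LlamaTokenizerFast.save_vocabulary(), which calls
# copyfile(self.vocab_file, out_vocab_file) and raises the FileNotFoundError above
tokenizer.save_pretrained("./output/checkpoint-1")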
I noticed that in transformers/src/transformers/models/llama/tokenization_llama.py (Line 281 in ef15342) this case is handled for the slow tokenizer. Could this be added to tokenization_llama_fast.py too?

Sure. The problem is that in the fast tokenizer we cannot recover the content of vocab_file if the repo was deleted. We can, however, produce a warning mentioning that you won't be able to initialize a slow tokenizer. Opening a PR to fix this! Thanks for reporting.
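Not claiming this is what the PR does; just a minimal sketch of the warn-instead-of-crash behaviour described above, written as a subclass so it can be tried without patching the library (the class name and the warning text are made up):

import os
from shutil import copyfile
from transformers import LlamaTokenizerFast
from transformers.utils import logging

logger = logging.get_logger(__name__)

class PatchedLlamaTokenizerFast(LlamaTokenizerFast):
    def save_vocabulary(self, save_directory, filename_prefix=None):
        # If the original sentencepiece file is gone, warn and skip the copy
        # instead of crashing inside copyfile.
        if not self.vocab_file or not os.path.isfile(self.vocab_file):
            logger.warning(
                "The original vocab_file (%s) is missing, so tokenizer.model cannot be "
                "copied; you won't be able to initialize a slow tokenizer from this "
                "checkpoint.",
                self.vocab_file,
            )
            return ()
        out_vocab_file = os.path.join(
            save_directory,
            (filename_prefix + "-" if filename_prefix else "") + "tokenizer.model",
        )
        if os.path.abspath(self.vocab_file) != os.path.abspath(out_vocab_file):
            copyfile(self.vocab_file, out_vocab_file)
        return (out_vocab_file,)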
It depends: if you have a tokenizer.json file then yes; if not, you cannot convert to a slow tokenizer if the vocab_file (which in this case is the sentencepiece model) was deleted, no?
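To spell the distinction out with a concrete (hypothetical) checkpoint path: a directory that contains tokenizer.json can still be loaded as a fast tokenizer, but instantiating the slow, sentencepiece-based tokenizer needs tokenizer.model:

from transformers import AutoTokenizer

# Works as long as checkpoint-500 contains the tokenizer.json written by the fast tokenizer
fast_tok = AutoTokenizer.from_pretrained("./output/checkpoint-500", use_fast=True)

# Needs checkpoint-500/tokenizer.model (the sentencepiece file); if the original model
# directory was deleted before the checkpoint was saved, that file is missing and the
# slow tokenizer cannot be built
slow_tok = AutoTokenizer.from_pretrained("./output/checkpoint-500", use_fast=False)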