When saving a checkpoint during model training, an OSError: [Errno 122] Disk quota exceeded message appears, but there is still space left in the save directory. #12069

Open
Wzh10032 opened this issue Dec 9, 2024 · 0 comments

Wzh10032 commented Dec 9, 2024

12/09 06:49:23 - mmengine - INFO - Exp name: yolov3_d53_8xb8-ms-608-273e_coco_20241208_233948
12/09 06:49:23 - mmengine - INFO - Saving checkpoint at 14 epochs
Traceback (most recent call last):
  File "tools/train.py", line 122, in <module>
    main()
  File "tools/train.py", line 118, in main
    runner.train()
  File "/mnt/wzh/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/runner.py", line 1777, in train
    model = self.train_loop.run()  # type: ignore
  File "/mnt/wzh/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/loops.py", line 98, in run
    self.run_epoch()
  File "/mnt/wzh/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/loops.py", line 117, in run_epoch
    self.runner.call_hook('after_train_epoch')
  File "/mnt/wzh/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/runner.py", line 1839, in call_hook
    getattr(hook, fn_name)(self, **kwargs)
  File "/mnt/wzh/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/hooks/checkpoint_hook.py", line 345, in after_train_epoch
    self._save_checkpoint(runner)
  File "/mnt/wzh/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/hooks/checkpoint_hook.py", line 476, in _save_checkpoint
    self._save_checkpoint_with_step(runner, step, meta=meta)
  File "/mnt/wzh/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/hooks/checkpoint_hook.py", line 443, in _save_checkpoint_with_step
    runner.save_checkpoint(
  File "/mnt/wzh/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/dist/utils.py", line 427, in wrapper
    return func(*args, **kwargs)
  File "/mnt/wzh/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/runner.py", line 2271, in save_checkpoint
    save_checkpoint(
  File "/mnt/wzh/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/checkpoint.py", line 793, in save_checkpoint
    file_backend.put(f.getvalue(), filename)
  File "/mnt/wzh/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/fileio/backends/local_backend.py", line 78, in put
    f.write(obj)
OSError: [Errno 122] Disk quota exceeded

My save directory still has 46G of free space:
Size  Used  Avail  Use%  Mounted on
12T   12T   46G    100%  /mnt

Does the checkpoint save method also write files to another disk? My /home disk is indeed full.
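
One way to narrow this down is to probe each candidate directory directly. Errno 122 is EDQUOT, which the kernel returns when a per-user or per-group quota is exhausted, so it can occur even when `df` reports free space on the filesystem. The sketch below is only a diagnostic under assumptions: the helper names `free_gib`, `probe_write`, and `report` are made up for illustration, and the paths in the example call (`/mnt`, the home directory, the temp directory) are guesses that should be replaced with the real work directory of the training run.

```python
import errno
import os
import shutil
import tempfile


def free_gib(path):
    """Return the free space, in GiB, of the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / 2**30


def probe_write(directory, size_bytes=1 << 20):
    """Write and then delete a small temporary file inside `directory`.

    Returns None if the write succeeds, otherwise the errno of the OSError.
    """
    try:
        with tempfile.NamedTemporaryFile(dir=directory) as f:
            f.write(b"\0" * size_bytes)
            f.flush()
            os.fsync(f.fileno())  # force the data to disk so quota errors surface
        return None
    except OSError as exc:
        return exc.errno


def report(paths):
    """Print free space and the result of a test write for each labelled path."""
    edquot = getattr(errno, "EDQUOT", None)  # not defined on every platform
    for label, path in paths.items():
        if not os.path.isdir(path):
            print(f"{label}: {path} is not a directory, skipped")
            continue
        err = probe_write(path)
        if err is None:
            status = "write ok"
        elif err == edquot:
            status = "quota exceeded (EDQUOT)"
        else:
            status = f"write failed: {os.strerror(err)}"
        print(f"{label}: {path} free={free_gib(path):.1f} GiB, {status}")


if __name__ == "__main__":
    # Assumed locations; substitute the actual checkpoint directory.
    report({
        "save dir": "/mnt",
        "home": os.path.expanduser("~"),
        "temp": tempfile.gettempdir(),
    })
```

If the save directory itself reports EDQUOT while showing free space, a user or group quota on that filesystem is the likely cause rather than a write to /home. If only the home directory fails, it is worth checking whether the work directory of the run resolves to a path under /home.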
