When saving a checkpoint during model training, an OSError: [Errno 122] Disk quota exceeded message appears, but there is still space left in the save directory.
#12069
Open
Wzh10032 opened this issue
Dec 9, 2024
· 0 comments
12/09 06:49:23 - mmengine - INFO - Exp name: yolov3_d53_8xb8-ms-608-273e_coco_20241208_233948
12/09 06:49:23 - mmengine - INFO - Saving checkpoint at 14 epochs
Traceback (most recent call last):
  File "tools/train.py", line 122, in <module>
    main()
  File "tools/train.py", line 118, in main
    runner.train()
  File "/mnt/wzh/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/runner.py", line 1777, in train
    model = self.train_loop.run()  # type: ignore
  File "/mnt/wzh/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/loops.py", line 98, in run
    self.run_epoch()
  File "/mnt/wzh/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/loops.py", line 117, in run_epoch
    self.runner.call_hook('after_train_epoch')
  File "/mnt/wzh/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/runner.py", line 1839, in call_hook
    getattr(hook, fn_name)(self, **kwargs)
  File "/mnt/wzh/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/hooks/checkpoint_hook.py", line 345, in after_train_epoch
    self._save_checkpoint(runner)
  File "/mnt/wzh/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/hooks/checkpoint_hook.py", line 476, in _save_checkpoint
    self._save_checkpoint_with_step(runner, step, meta=meta)
  File "/mnt/wzh/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/hooks/checkpoint_hook.py", line 443, in _save_checkpoint_with_step
    runner.save_checkpoint(
  File "/mnt/wzh/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/dist/utils.py", line 427, in wrapper
    return func(*args, **kwargs)
  File "/mnt/wzh/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/runner.py", line 2271, in save_checkpoint
    save_checkpoint(
  File "/mnt/wzh/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/checkpoint.py", line 793, in save_checkpoint
    file_backend.put(f.getvalue(), filename)
  File "/mnt/wzh/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/fileio/backends/local_backend.py", line 78, in put
    f.write(obj)
OSError: [Errno 122] Disk quota exceeded
My save directory still has 46G of free space:
Size  Used  Avail  Use%  Mounted on
12T   12T   46G    100%  /mnt
Does the checkpoint save also write files to another disk? My /home disk is indeed full.
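For anyone trying to narrow this down: on Linux, errno 122 is `EDQUOT`, which signals that a per-user or per-group quota was hit. That is a different condition from the filesystem being full (`ENOSPC`), so free space reported by `df` does not rule it out. The sketch below is a diagnostic only, not part of mmengine. It writes a temporary file into each candidate directory and reports which one, if any, rejects the write. The helper names `probe_write` and `describe` are made up for this example, and the default directories are guesses; pass the real checkpoint directory on the command line.

```python
import errno
import os
import shutil
import sys
import tempfile


def probe_write(directory, size_bytes=1024 * 1024):
    """Write size_bytes into a temporary file inside `directory`.

    Returns None if the write succeeds, otherwise the errno name
    of the failure (for example 'EDQUOT' or 'ENOSPC').
    """
    chunk = b"\0" * min(size_bytes, 1024 * 1024)
    try:
        # The temporary file is deleted automatically when the block exits.
        with tempfile.NamedTemporaryFile(dir=directory) as f:
            remaining = size_bytes
            while remaining > 0:
                n = min(remaining, len(chunk))
                f.write(chunk[:n])
                remaining -= n
            f.flush()
            # Force the data to storage; some filesystems only report
            # quota errors once the data is actually flushed.
            os.fsync(f.fileno())
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    return None


def describe(directory, size_bytes=1024 * 1024):
    """Return free space and the write-probe result for `directory`."""
    usage = shutil.disk_usage(directory)
    return {
        "path": os.path.abspath(directory),
        "free_gib": round(usage.free / 2**30, 1),
        "probe_error": probe_write(directory, size_bytes),
    }


if __name__ == "__main__":
    # Assumed candidates: current directory, home, and the temp directory.
    candidates = sys.argv[1:] or [
        os.getcwd(),
        os.path.expanduser("~"),
        tempfile.gettempdir(),
    ]
    for path in candidates:
        print(describe(path))
```

If the checkpoint directory reports plenty of free space but `probe_error` is `EDQUOT`, the limit is a quota on that filesystem rather than its capacity. A probe size close to the real checkpoint size gives a more faithful result than the 1 MiB default.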