Thank you very much for your excellent work. I am now encountering this problem while training my model in a virtual environment.
The passed generator was created on 'cpu' even though a tensor on cuda:0 was expected. Tensors will be created on 'cpu' and then moved to cuda:0. Note that one can probably slighly speed up this function by passing a generator that was created on the cuda:0 device.
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 40/40 [00:24<00:00, 1.65it/s]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 14/14 [00:00<00:00, 33.16it/s]
Moviepy - Building video ./exp_output/stage2/validation/1_1_1.mp4.
MoviePy - Writing audio in 1_1_1TEMP_MPY_wvf_snd.mp4
MoviePy - Done.
Moviepy - Writing video ./exp_output/stage2/validation/1_1_1.mp4
Moviepy - Done !
Moviepy - video ready ./exp_output/stage2/validation/1_1_1.mp4
Steps: 0%| | 1/3000 [06:10<6:40:15, 8.01s/it, lr=1e-5, step_loss=0.271, td=3.17s][2024-08-10 10:01:34,981] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 2147483648, reducing to 1073741824
Steps: 0%| | 2/3000 [06:20<185:23:13, 222.61s/it, lr=1e-5, step_loss=0.258, td=4.30s][2024-08-10 10:01:44,611] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 1073741824, reducing to 536870912
Steps: 0%|▏ | 3/3000 [06:30<104:21:45, 125.36s/it, lr=1e-5, step_loss=0.371, td=3.83s][2024-08-10 10:01:53,991] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 536870912, reducing to 268435456
Steps: 0%|▏ | 4/3000 [06:39<66:13:19, 79.57s/it, lr=1e-5, step_loss=0.374, td=3.64s][2024-08-10 10:02:03,559] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 268435456, reducing to 134217728
Steps: 0%|▎ | 5/3000 [06:49<45:11:54, 54.33s/it, lr=1e-5, step_loss=0.373, td=4.09s][2024-08-10 10:02:13,085] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 134217728, reducing to 67108864
Steps: 0%|▎ | 6/3000 [06:58<32:30:51, 39.10s/it, lr=1e-5, step_loss=0.262, td=3.67s][2024-08-10 10:02:21,432] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 67108864, reducing to 33554432
Steps: 0%|▍ | 7/3000 [07:07<24:08:47, 29.04s/it, lr=1e-5, step_loss=0.259, td=3.89s][2024-08-10 10:02:31,178] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 33554432, reducing to 16777216
Steps: 0%|▍ | 8/3000 [07:17<19:01:57, 22.90s/it, lr=1e-5, step_loss=0.297, td=3.85s][2024-08-10 10:02:40,832] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 16777216, reducing to 8388608
Steps: 0%|▌ | 9/3000 [07:26<15:35:07, 18.76s/it, lr=1e-5, step_loss=0.284, td=3.83s][2024-08-10 10:02:50,562] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 8388608, reducing to 4194304
Steps: 0%|▌ | 10/3000 [07:36<13:15:55, 15.97s/it, lr=1e-5, step_loss=0.316, td=3.91s][2024-08-10 10:03:00,072] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 4194304, reducing to 2097152
Steps: 0%|▋ | 11/3000 [07:45<11:37:07, 13.99s/it, lr=1e-5, step_loss=0.243, td=3.69s]
ERROR:root:Failed to execute the training process: name 'str2optimizer8bit_blockwise' is not defined
ERROR:root:Failed to execute the training process: name 'str2optimizer8bit_blockwise' is not defined
ERROR:root:Failed to execute the training process: name 'str2optimizer8bit_blockwise' is not defined
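For anyone triaging the `NameError` above, here is a minimal sketch of a check that could be run inside the same virtual environment. It only reports whether a module imports and whether a given name exists at module level. The helper `check_symbol` is hypothetical (it is not part of this repository or of any library). The assumption that `str2optimizer8bit_blockwise` is expected to live in `bitsandbytes.functional` is mine, not something the log confirms.

```python
import importlib


def check_symbol(module_name: str, symbol: str) -> str:
    """Report whether `symbol` is defined at module level in `module_name`.

    Hypothetical diagnostic helper; it does not fix anything, it only reports.
    """
    try:
        module = importlib.import_module(module_name)
    except Exception as exc:
        # Importing can fail for reasons other than ImportError
        # (for example, a native library that fails to load).
        return f"cannot import {module_name}: {type(exc).__name__}: {exc}"
    if hasattr(module, symbol):
        return "ok"
    return f"{module_name} imported, but {symbol!r} is not defined"
```

Under the assumption above, calling `check_symbol("bitsandbytes.functional", "str2optimizer8bit_blockwise")` in the training environment would show whether the module is missing entirely or imports without defining that name. The second case may point to an installation that did not load its GPU backend.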