training problem #40

vision-heng · 2022-03-18T07:39:32Z

Hello, Professor! I ran into the following problem when running the code on Windows 11. Could you explain what the error means and how to fix it? (My GPU memory is 8 GB.) Thank you very much!

python main.py --nb_cl_fg=50 --nb_cl=10 --gpu=0 --random_seed=1993 --baseline=lucir --branch_mode=dual --branch_1=ss --branch_2=free --dataset=cifar100
Namespace(K=2, base_lr1=0.1, base_lr2=0.1, baseline='lucir', branch_1='ss', branch_2='free', branch_mode='dual', ckpt_dir_fg='-', ckpt_label='exp01', custom_momentum=0.9, custom_weight_decay=0.0005, data_dir='data/seed_1993_subset_100_imagenet/data', dataset='cifar100', disable_gpu_occupancy=True, dist=0.5, dynamic_budget=False, epochs=160, eval_batch_size=128, fusion_lr=1e-08, gpu='0', icarl_T=2, icarl_beta=0.25, imgnet_backbone='resnet18', lr_factor=0.1, lw_mr=1, nb_cl=10, nb_cl_fg=50, nb_protos=20, num_classes=100, num_workers=1, random_seed=1993, resume=False, resume_fg=False, test_batch_size=100, the_lambda=5, train_batch_size=128)
Using gpu: 0
Total memory: 8192, used memory: 829
Occupy GPU memory in advance.
Files already downloaded and verified
Files already downloaded and verified
Order name:./logs/cifar100_nfg50_ncls10_nproto20_lucir_dual_b1ss_b2free_fixed_exp01\seed_1993_cifar100_order.pkl
Loading the saved class order
[68, 56, 78, 8, 23, 84, 90, 65, 74, 76, 40, 89, 3, 92, 55, 9, 26, 80, 43, 38, 58, 70, 77, 1, 85, 19, 17, 50, 28, 53, 13, 81, 45, 82, 6, 59, 83, 16, 15, 44, 91, 41, 72, 60, 79, 52, 20, 10, 31, 54, 37, 95, 14, 71, 96, 98, 97, 2, 64, 66, 42, 22, 35, 86, 24, 34, 87, 21, 99, 0, 88, 27, 18, 94, 11, 12, 47, 25, 30, 46, 62, 69, 36, 61, 7, 63, 75, 5, 32, 4, 51, 48, 73, 93, 39, 67, 29, 49, 57, 33]
Feature: 64 Class: 50
Setting the dataloaders ...
Check point name: ./logs/cifar100_nfg50_ncls10_nproto20_lucir_dual_b1ss_b2free_fixed_exp01\iter_4_b1.pth

Epoch: 0, learning rate: 0.1
Traceback (most recent call last):
  File "main.py", line 88, in <module>
    trainer.train()
  File "E:\AlgSpace\pycharm\AANets\trainer\trainer.py", line 171, in train
    cur_lambda, self.args.dist, self.args.K, self.args.lw_mr)
  File "E:\AlgSpace\pycharm\AANets\trainer\zeroth_phase.py", line 63, in incremental_train_and_eval_zeroth_phase
    outputs = b1_model(inputs)
  File "E:\Anaconda\envs\aanets\lib\site-packages\torch\nn\modules\module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "E:\AlgSpace\pycharm\AANets\models\modified_resnet_cifar.py", line 109, in forward
    x = self.fc(x)
  File "E:\Anaconda\envs\aanets\lib\site-packages\torch\nn\modules\module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "E:\AlgSpace\pycharm\AANets\models\modified_linear.py", line 37, in forward
    F.normalize(self.weight, p=2, dim=1))
  File "E:\Anaconda\envs\aanets\lib\site-packages\torch\nn\functional.py", line 1371, in linear
    output = input.matmul(weight.t())
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)
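
A quick way to narrow down an error like this is to run the same kind of operation outside the training code. The sketch below is not from the AANets repository: the function name `cublas_sanity_check` is hypothetical, and the shapes simply mirror the log above (batch size 128, 64 features, 50 classes). If this small matrix multiply also fails, the problem more likely lies in the PyTorch/CUDA/GPU setup than in the training script.

```python
def cublas_sanity_check(batch=128, features=64, classes=50):
    """Run one small float32 matmul on the GPU and report what happened.

    Hypothetical helper: the shapes mirror the failing linear layer in the
    log (inputs of size batch x features, weights of size classes x features).
    """
    try:
        import torch
    except ImportError:
        return "torch not installed"
    if not torch.cuda.is_available():
        return "CUDA not available"
    try:
        x = torch.randn(batch, features, device="cuda")
        w = torch.randn(classes, features, device="cuda")
        out = x.matmul(w.t())  # same call pattern as the last traceback frame
        torch.cuda.synchronize()  # wait for the kernel so errors surface here
    except RuntimeError as err:
        return "matmul failed: {}".format(err)
    return "matmul ok, output shape {}".format(tuple(out.shape))


if __name__ == "__main__":
    print(cublas_sanity_check())
```

Running it in the same conda environment as `main.py` keeps the comparison fair, since the check then uses the same PyTorch binary.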

yaoyao-liu · 2022-03-18T08:10:11Z

Thanks for your interest in our work!

I'm not entirely sure what's causing this problem. I think it might be due to a mismatch between the PyTorch and CUDA versions.
My NVIDIA driver version is 460.84, and my CUDA version is 11.2. I hope this information might help you.
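
To compare setups on the PyTorch side, a small report like the following may help. It is only a sketch, and `torch_cuda_report` is a hypothetical name. Note that `torch.version.cuda` is the CUDA toolkit version the PyTorch binary was built with, which can differ from the CUDA version printed by `nvidia-smi` (that one is the highest version the driver supports). A GPU whose compute capability is newer than what the binary's toolkit supports is one commonly reported source of cuBLAS failures; whether that applies here is an assumption the thread does not confirm.

```python
def torch_cuda_report():
    """Collect the PyTorch-side version details relevant to a CUDA mismatch.

    Hypothetical helper; returns a plain dict so it can be pasted into an issue.
    """
    try:
        import torch
    except ImportError:
        return {"torch": None}
    report = {
        "torch": torch.__version__,
        # CUDA toolkit version this PyTorch binary was built with (None on CPU builds)
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        report["gpu_name"] = torch.cuda.get_device_name(0)
        # (major, minor) compute capability of the first visible GPU
        report["compute_capability"] = torch.cuda.get_device_capability(0)
    return report


if __name__ == "__main__":
    for key, value in torch_cuda_report().items():
        print("{}: {}".format(key, value))
```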

If you have any further questions, please feel free to add comments to this issue.

vision-heng · 2022-03-18T08:29:14Z

Thanks for your reply. I will check the versions of my NVIDIA driver and CUDA.


cocogt96 · 2022-04-22T15:06:07Z

Hi, I have the same problem. Did you find out how to fix it? I would really appreciate your help.

yaoyao-liu · 2022-04-22T19:47:30Z

I don't see this issue when I run the code. Could you please provide your GPU info, PyTorch version, and CUDA version? Thanks a lot!
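
The system-side details requested here can be gathered in one go. The following is a minimal sketch, assuming `nvidia-smi` is on the PATH; the helper name `system_gpu_report` is hypothetical, and the function falls back to a short message when the tool is missing.

```python
import platform
import shutil
import subprocess
import sys


def system_gpu_report():
    """Gather OS, Python, and GPU/driver details for a bug report.

    Hypothetical helper; relies on nvidia-smi being installed with the driver.
    """
    report = {"python": sys.version.split()[0], "os": platform.platform()}
    if shutil.which("nvidia-smi") is None:
        report["gpu"] = "nvidia-smi not found on PATH"
        return report
    query = [
        "nvidia-smi",
        "--query-gpu=name,driver_version,memory.total",
        "--format=csv,noheader",
    ]
    try:
        result = subprocess.run(
            query,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,  # text output; works on Python 3.6 too
            timeout=30,
        )
        report["gpu"] = result.stdout.strip()
    except (OSError, subprocess.SubprocessError) as err:
        report["gpu"] = "nvidia-smi failed: {}".format(err)
    return report


if __name__ == "__main__":
    for key, value in system_gpu_report().items():
        print("{}: {}".format(key, value))
```

Pasting the printed lines into the issue gives the GPU model and driver version alongside the Python version in a consistent form.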

cocogt96 · 2022-04-23T15:05:43Z

Hi, thank you for your reply. My Ubuntu version is 20.04, my driver version is 470.103.01, and my CUDA version is 11.4. I really appreciate your help.

yaoyao-liu · 2022-04-23T17:45:06Z

Are you using PyTorch 1.2.0 and Python 3.6?

cocogt96 · 2022-04-24T03:15:32Z

Yes, I am using exactly those versions.

yaoyao-liu · 2022-04-24T12:57:10Z

Thanks for providing this information. I currently cannot reproduce this issue on my system, so I don't have a solution yet. I am very sorry about that. I will keep you posted if I find one.

If you find a solution, please post it here as well. I would be truly grateful.

cocogt96 · 2022-04-25T14:48:17Z

Sure, thank you for your help.
