
Reproduce results #4

Open
moghadas76 opened this issue Aug 10, 2024 · 2 comments

@moghadas76
Hi,

python Run.py -dataset_test PEMS07M -mode eval -model MTGNN

produces:

============================scaler_mae_loss
Applying learning rate decay.
2024-08-10 16:46: Experiment log path in: /home/seyed/PycharmProjects/step/FlashST/model/../SAVE/eval/MTGNN
0%| | 0/20 [00:01<?, ?it/s]
Traceback (most recent call last):
File "/home/seyed/PycharmProjects/step/FlashST/model/Run.py", line 173, in
trainer.train_eval()
File "/home/seyed/PycharmProjects/step/FlashST/model/Trainer.py", line 128, in train_eval
train_epoch_loss, loss_pre = self.eval_trn_eps()
File "/home/seyed/PycharmProjects/step/FlashST/model/Trainer.py", line 180, in eval_trn_eps
out, q = self.model(data, data, self.args.dataset_test, self.batch_seen, nadj=nadj, lpls=lpls, useGNN=True, DSU=True)
File "/home/seyed/miniconda3/envs/FlashST/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/seyed/miniconda3/envs/FlashST/lib/python3.9/site-packages/torch/nn/parallel/data_parallel.py", line 168, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/home/seyed/miniconda3/envs/FlashST/lib/python3.9/site-packages/torch/nn/parallel/data_parallel.py", line 178, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/home/seyed/miniconda3/envs/FlashST/lib/python3.9/site-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
output.reraise()
File "/home/seyed/miniconda3/envs/FlashST/lib/python3.9/site-packages/torch/_utils.py", line 425, in reraise
raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
File "/home/seyed/miniconda3/envs/FlashST/lib/python3.9/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
output = module(*input, **kwargs)
File "/home/seyed/miniconda3/envs/FlashST/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/seyed/PycharmProjects/step/FlashST/model/FlashST.py", line 152, in forward
return self.forward_pretrain(source, label, select_dataset, batch_seen, nadj, lpls, useGNN, DSU)
File "/home/seyed/PycharmProjects/step/FlashST/model/FlashST.py", line 155, in forward_pretrain
x_prompt_return = self.pretrain_model(source[..., :self.input_base_dim], source, None, nadj, lpls, useGNN)
File "/home/seyed/miniconda3/envs/FlashST/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/seyed/PycharmProjects/step/FlashST/model/PromptNet.py", line 118, in forward
hidden = torch.cat([time_series_emb] + node_emb + tem_emb, dim=-1).transpose(1, 3)
RuntimeError: Sizes of tensors must match except in dimension 2. Got 228 and 114 (The offending index is 1)

What should I do?
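The 228-vs-114 mismatch is consistent with how torch.nn.DataParallel scatters arguments: every tensor input is chunked along dim 0 across the replicas, so a batched input keeps its full node axis while a node-indexed tensor such as a [228, 228] adjacency is halved on two GPUs. A minimal sketch of that scatter behavior, assuming two visible GPUs (the shapes are illustrative, not FlashST's actual ones):

```python
# Sketch: nn.DataParallel chunks EVERY tensor argument along dim 0.
# A [B, T, N, C] batch splits on the batch axis, but an [N, N]
# adjacency splits on the node axis -- one replica then mixes a full
# 228-node tensor with a 114-node one, as in the cat() error above.
import torch
import torch.nn as nn

class ShapeProbe(nn.Module):
    def forward(self, x, adj):
        print(f"replica sees x={tuple(x.shape)} adj={tuple(adj.shape)}")
        return x

if torch.cuda.device_count() >= 2:
    probe = nn.DataParallel(ShapeProbe(), device_ids=[0, 1])
    x = torch.randn(64, 12, 228, 3, device="cuda:0")  # [B, T, N, C]
    adj = torch.eye(228, device="cuda:0")             # [N, N]
    probe(x, adj)
    # each replica prints: replica sees x=(32, 12, 228, 3) adj=(114, 228)
```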

@moghadas76
Author

python Run.py -dataset_test PEMS07M -mode eval -model ori

0%| | 0/100 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/home/seyed/PycharmProjects/step/FlashST/model/Run.py", line 175, in
trainer.train_eval()
File "/home/seyed/PycharmProjects/step/FlashST/model/Trainer.py", line 128, in train_eval
train_epoch_loss, loss_pre = self.eval_trn_eps()
File "/home/seyed/PycharmProjects/step/FlashST/model/Trainer.py", line 180, in eval_trn_eps
out, q = self.model(data, data, self.args.dataset_test, self.batch_seen, nadj=nadj, lpls=lpls, useGNN=True, DSU=True)
File "/home/seyed/miniconda3/envs/FlashST/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/seyed/miniconda3/envs/FlashST/lib/python3.9/site-packages/torch/nn/parallel/data_parallel.py", line 154, in forward
raise RuntimeError("module must have its parameters and buffers "
RuntimeError: module must have its parameters and buffers on device cuda:0 (device_ids[0]) but found one of them on device: cuda:1
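This second failure is DataParallel's own sanity check: before replicating, it requires every parameter and buffer to sit on device_ids[0]. A small sketch that reproduces the same RuntimeError, assuming at least two GPUs (the module is a stand-in, not FlashST code):

```python
# Sketch: nn.DataParallel raises if any parameter lives off device_ids[0].
import torch
import torch.nn as nn

class SplitModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.a = nn.Linear(4, 4).to("cuda:0")
        self.b = nn.Linear(4, 4).to("cuda:1")  # stray parameter on cuda:1

    def forward(self, x):
        return self.a(x)

if torch.cuda.device_count() >= 2:
    model = nn.DataParallel(SplitModel(), device_ids=[0, 1])
    try:
        model(torch.randn(2, 4, device="cuda:0"))
    except RuntimeError as e:
        print(e)  # module must have its parameters and buffers on device cuda:0 ...
```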

@LZH-YS1998
Collaborator

You can solve this problem by disabling multi-GPU parallelism. We will update the code accordingly. Thanks.
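One way to disable the parallelism without touching the code is to expose a single GPU to the process, e.g. `CUDA_VISIBLE_DEVICES=0 python Run.py -dataset_test PEMS07M -mode eval -model MTGNN`. Alternatively, drop the nn.DataParallel wrapper wherever the script applies it; a minimal sketch of the single-device pattern (nn.Linear stands in for the FlashST model):

```python
# Sketch: keep the model on one device so inputs are never scattered.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # set before torch touches the GPUs

import torch
import torch.nn as nn

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model = nn.Linear(8, 1)          # stand-in for the FlashST model
# Instead of: model = nn.DataParallel(model.to(device))
model = model.to(device)         # single device: no scatter, no replicas

x = torch.randn(4, 8, device=device)
print(model(x).shape)            # torch.Size([4, 1])
```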
