The Shape of gt_sub_bboxes and gt_obj_bboxes didn't Match #3

Open
lijingzhu1 opened this issue Oct 14, 2022 · 3 comments

@lijingzhu1

Hi,

I would like to reproduce QPIC using mmhoidet, but I ran into two issues. First, the shapes of sub_bbox_targets and pos_gt_sub_bboxes_targets do not match in qpic_head.py. Below is the error (a small debug check is sketched after the traceback):
Traceback (most recent call last):
File "tools/hoi_train.py", line 195, in
main()
File "tools/hoi_train.py", line 191, in main
meta=meta)
File "/users/PCS0256/lijing/mmdetection/mmdet/apis/train.py", line 208, in train_detector
runner.run(data_loaders, cfg.workflow)
File "/users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
epoch_runner(data_loaders[i], **kwargs)
File "/users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 50, in train
self.run_iter(data_batch, train_mode=True, **kwargs)
File "/users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 30, in run_iter
**kwargs)
File "/users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/mmcv/parallel/data_parallel.py", line 75, in train_step
return self.module.train_step(*inputs[0], **kwargs[0])
File "/users/PCS0256/lijing/mmdetection/mmdet/models/detectors/basehoidetector.py", line 249, in train_step
losses = self(**data)
File "/users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 110, in new_func
return old_func(*args, **kwargs)
File "/users/PCS0256/lijing/mmdetection/mmdet/models/detectors/basehoidetector.py", line 183, in forward
return self.forward_train(img, img_metas, **kwargs)
File "/users/PCS0256/lijing/mmdetection/mmdet/models/detectors/qpic.py", line 64, in forward_train
gt_obj_labels, gt_verb_labels, **kwargs)
File "/users/PCS0256/lijing/mmdetection/mmdet/models/hoi_heads/qpic_head.py", line 213, in forward_train
return self.loss(*loss_inputs)
File "/users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 198, in new_func
return old_func(*args, **kwargs)
File "/users/PCS0256/lijing/mmdetection/mmdet/models/hoi_heads/qpic_head.py", line 320, in loss
img_metas_list)
File "/users/PCS0256/lijing/mmdetection/mmdet/core/utils/misc.py", line 30, in multi_apply
return tuple(map(list, zip(*map_results)))
File "/users/PCS0256/lijing/mmdetection/mmdet/models/hoi_heads/qpic_head.py", line 382, in loss_single
img_metas)
File "/users/PCS0256/lijing/mmdetection/mmdet/models/hoi_heads/qpic_head.py", line 510, in get_targets
gt_sub_bboxes_list, gt_obj_bboxes_list, gt_obj_labels_list, gt_verb_labels_list, img_metas)
File "/users/PCS0256/lijing/mmdetection/mmdet/core/utils/misc.py", line 30, in multi_apply
return tuple(map(list, zip(*map_results)))
File "/users/PCS0256/lijing/mmdetection/mmdet/models/hoi_heads/qpic_head.py", line 607, in _get_target_single
sub_bbox_targets[pos_inds] = pos_gt_sub_bboxes_targets
RuntimeError: shape mismatch: value tensor of shape [2, 4] cannot be broadcast to indexing result of shape [0, 4]
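
Not a fix, but to see which image triggers the mismatch I would add a small check right before the failing assignment in _get_target_single. This is only a sketch; it assumes the names visible in the traceback (pos_inds, pos_gt_sub_bboxes_targets, sub_bbox_targets, gt_sub_bboxes, gt_obj_bboxes) are all in scope at that point:

# Hypothetical debug check, inserted just above the failing line in
# _get_target_single (qpic_head.py). It reports the per-image counts
# instead of failing with a bare broadcast error.
num_pos = pos_inds.numel()
num_pos_targets = pos_gt_sub_bboxes_targets.size(0)
if num_pos != num_pos_targets:
    raise RuntimeError(
        f'pos_inds selects {num_pos} rows but pos_gt_sub_bboxes_targets has '
        f'{num_pos_targets} rows; the sampler output and the GT lists for this '
        f'image are out of sync (gt_sub_bboxes: {tuple(gt_sub_bboxes.shape)}, '
        f'gt_obj_bboxes: {tuple(gt_obj_bboxes.shape)})')
sub_bbox_targets[pos_inds] = pos_gt_sub_bboxes_targets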

The second issue is that gt_sub_bboxes and gt_obj_bboxes end up with different lengths for some images, which is odd: I checked trainval_hico.json, and the subject, object, and HOI category ground truths should come as matched triplets. Below is the error (a bounds-check sketch follows the traceback):

../aten/src/ATen/native/cuda/IndexKernel.cu:91: operator(): block: [0,0,0], thread: [4,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:91: operator(): block: [0,0,0], thread: [5,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:91: operator(): block: [0,0,0], thread: [6,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:91: operator(): block: [0,0,0], thread: [7,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
Traceback (most recent call last):
File "tools/hoi_train.py", line 195, in
main()
File "tools/hoi_train.py", line 191, in main
meta=meta)
File "/users/PCS0256/lijing/mmdetection/mmdet/apis/train.py", line 208, in train_detector
runner.run(data_loaders, cfg.workflow)
File "/users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
epoch_runner(data_loaders[i], **kwargs)
File "/users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 50, in train
self.run_iter(data_batch, train_mode=True, **kwargs)
File "/users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 30, in run_iter
**kwargs)
File "/users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/mmcv/parallel/data_parallel.py", line 75, in train_step
return self.module.train_step(*inputs[0], **kwargs[0])
File "/users/PCS0256/lijing/mmdetection/mmdet/models/detectors/basehoidetector.py", line 249, in train_step
losses = self(**data)
File "/users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 110, in new_func
return old_func(*args, **kwargs)
File "/users/PCS0256/lijing/mmdetection/mmdet/models/detectors/basehoidetector.py", line 183, in forward
return self.forward_train(img, img_metas, **kwargs)
File "/users/PCS0256/lijing/mmdetection/mmdet/models/detectors/qpic.py", line 64, in forward_train
gt_obj_labels, gt_verb_labels, **kwargs)
File "/users/PCS0256/lijing/mmdetection/mmdet/models/hoi_heads/qpic_head.py", line 213, in forward_train
return self.loss(*loss_inputs)
File "/users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 198, in new_func
return old_func(*args, **kwargs)
File "/users/PCS0256/lijing/mmdetection/mmdet/models/hoi_heads/qpic_head.py", line 320, in loss
img_metas_list)
File "/users/PCS0256/lijing/mmdetection/mmdet/core/utils/misc.py", line 30, in multi_apply
return tuple(map(list, zip(*map_results)))
File "/users/PCS0256/lijing/mmdetection/mmdet/models/hoi_heads/qpic_head.py", line 382, in loss_single
img_metas)
File "/users/PCS0256/lijing/mmdetection/mmdet/models/hoi_heads/qpic_head.py", line 510, in get_targets
gt_sub_bboxes_list, gt_obj_bboxes_list, gt_obj_labels_list, gt_verb_labels_list, img_metas)
File "/users/PCS0256/lijing/mmdetection/mmdet/core/utils/misc.py", line 30, in multi_apply
return tuple(map(list, zip(*map_results)))
File "/users/PCS0256/lijing/mmdetection/mmdet/models/hoi_heads/qpic_head.py", line 574, in _get_target_single
gt_sub_bboxes, gt_obj_bboxes)
File "/users/PCS0256/lijing/mmdetection/mmdet/core/hoi/samplers/pseudo_sampler.py", line 47, in sample
assign_result, gt_flags)
File "/users/PCS0256/lijing/mmdetection/mmdet/core/hoi/samplers/sampling_result.py", line 48, in init
self.pos_gt_sub_bboxes = gt_sub_bboxes[self.pos_assigned_gt_inds.long(), :]
RuntimeError: CUDA error: device-side assert triggered
terminate called after throwing an instance of 'c10::CUDAError'
what(): CUDA error: device-side assert triggered
Exception raised from create_event_internal at ../c10/cuda/CUDACachingAllocator.cpp:1230 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x2ac2b2a807d2 in /users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: + 0x2319e (0x2ac2b279e19e in /users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0x22d (0x2ac2b279fd3d in /users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #3: + 0x300f48 (0x2ac25f073f48 in /users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #4: c10::TensorImpl::release_resources() + 0x175 (0x2ac2b2a69005 in /users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #5: + 0x1ed619 (0x2ac25ef60619 in /users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #6: + 0x4e4ec8 (0x2ac25f257ec8 in /users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #7: THPVariable_subclass_dealloc(_object*) + 0x299 (0x2ac25f2581c9 in /users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/torch/lib/libtorch_python.so)

frame #25: __libc_start_main + 0xf5 (0x2ac25654e555 in /lib64/libc.so.6)

/var/spool/slurmd/job13247525/slurm_script: line 19: 233669 Aborted (core dumped) CUDA_LAUNCH_BLOCKING=1 python tools/hoi_train.py configs/qpic/qpic_r50_150e_hico.py
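
Running with CUDA_LAUNCH_BLOCKING=1 already localizes the assert to the indexing in sampling_result.py. To see the offending indices instead of a kernel abort, a bounds check just before that line helps. Again only a sketch, assuming pos_assigned_gt_inds and gt_sub_bboxes are the tensors shown in the traceback and that gt_obj_bboxes is also available in SamplingResult's constructor:

# Hypothetical bounds check, placed right before the failing line in
# SamplingResult.__init__ (sampling_result.py). It raises a Python exception
# with the offending index instead of letting the gather hit a device-side assert.
inds = self.pos_assigned_gt_inds.long()
if inds.numel() > 0:
    max_ind = int(inds.max())
    if max_ind >= gt_sub_bboxes.size(0) or max_ind >= gt_obj_bboxes.size(0):
        raise IndexError(
            f'pos_assigned_gt_inds reach {max_ind}, but this image only has '
            f'{gt_sub_bboxes.size(0)} gt_sub_bboxes and '
            f'{gt_obj_bboxes.size(0)} gt_obj_bboxes')
self.pos_gt_sub_bboxes = gt_sub_bboxes[inds, :]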

Both problems surface in qpic_head.py, but I suspect the root cause is in data loading. I trained the model after running the data conversion script described in INSTALL.md, and I noticed comments like # TODO: unfinished in qpic_head.py. Do you have a finished version of this repository? If you could update or share it, that would be great. Thanks! (A small sanity check over trainval_hico.json is sketched below.)
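
For what it's worth, a quick offline check over trainval_hico.json can rule out bad annotations before the pipeline runs. This is a sketch that assumes the QPIC-style layout (each image entry carries a file_name, an annotations list of boxes, and an hoi_annotation list whose subject_id/object_id index into annotations); the path and key names may need adjusting for the converted file:

import json

# Hypothetical sanity check for trainval_hico.json (QPIC-style keys assumed).
# Every HOI triplet should reference valid subject/object boxes, so the
# gt_sub_bboxes/gt_obj_bboxes built from it always have equal length.
with open('data/hico/annotations/trainval_hico.json') as f:  # adjust the path
    images = json.load(f)

bad = 0
for img in images:
    n_boxes = len(img.get('annotations', []))
    for hoi in img.get('hoi_annotation', []):
        s, o = hoi.get('subject_id', -1), hoi.get('object_id', -1)
        if not (0 <= s < n_boxes and 0 <= o < n_boxes):
            bad += 1
            print(img.get('file_name'), 'has subject_id/object_id', (s, o),
                  'but only', n_boxes, 'boxes')
print('checked', len(images), 'images,', bad, 'bad HOI triplets')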

@KainingYing
Owner

@lijingzhu1 Could you share the command you ran?

@lijingzhu1
Author

@lijingzhu1 Could you share the command you ran?

CUDA_LAUNCH_BLOCKING=1 python tools/hoi_train.py configs/qpic/qpic_r50_150e_hico.py
I added CUDA_LAUNCH_BLOCKING=1, but that shouldn't affect any results.

@KainingYing
Owner

I uploaded a release; please check whether it works for you.
BTW, this repository has not been maintained for a long time. It used to run for me, but it is incomplete and may contain some bugs, so please treat it as a reference only.
