Hi,
I would like to reproduce the QPIC model using mmhoidet, but I ran into two issues. First, in qpic_head.py the shape of the indexed sub_bbox_targets does not match the shape of pos_gt_sub_bboxes_targets. Below is the error:
Traceback (most recent call last):
File "tools/hoi_train.py", line 195, in <module>
main()
File "tools/hoi_train.py", line 191, in main
meta=meta)
File "/users/PCS0256/lijing/mmdetection/mmdet/apis/train.py", line 208, in train_detector
runner.run(data_loaders, cfg.workflow)
File "/users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
epoch_runner(data_loaders[i], **kwargs)
File "/users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 50, in train
self.run_iter(data_batch, train_mode=True, **kwargs)
File "/users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 30, in run_iter
**kwargs)
File "/users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/mmcv/parallel/data_parallel.py", line 75, in train_step
return self.module.train_step(*inputs[0], **kwargs[0])
File "/users/PCS0256/lijing/mmdetection/mmdet/models/detectors/basehoidetector.py", line 249, in train_step
losses = self(**data)
File "/users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 110, in new_func
return old_func(*args, **kwargs)
File "/users/PCS0256/lijing/mmdetection/mmdet/models/detectors/basehoidetector.py", line 183, in forward
return self.forward_train(img, img_metas, **kwargs)
File "/users/PCS0256/lijing/mmdetection/mmdet/models/detectors/qpic.py", line 64, in forward_train
gt_obj_labels, gt_verb_labels, **kwargs)
File "/users/PCS0256/lijing/mmdetection/mmdet/models/hoi_heads/qpic_head.py", line 213, in forward_train
return self.loss(*loss_inputs)
File "/users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 198, in new_func
return old_func(*args, **kwargs)
File "/users/PCS0256/lijing/mmdetection/mmdet/models/hoi_heads/qpic_head.py", line 320, in loss
img_metas_list)
File "/users/PCS0256/lijing/mmdetection/mmdet/core/utils/misc.py", line 30, in multi_apply
return tuple(map(list, zip(*map_results)))
File "/users/PCS0256/lijing/mmdetection/mmdet/models/hoi_heads/qpic_head.py", line 382, in loss_single
img_metas)
File "/users/PCS0256/lijing/mmdetection/mmdet/models/hoi_heads/qpic_head.py", line 510, in get_targets
gt_sub_bboxes_list, gt_obj_bboxes_list, gt_obj_labels_list, gt_verb_labels_list, img_metas)
File "/users/PCS0256/lijing/mmdetection/mmdet/core/utils/misc.py", line 30, in multi_apply
return tuple(map(list, zip(*map_results)))
File "/users/PCS0256/lijing/mmdetection/mmdet/models/hoi_heads/qpic_head.py", line 607, in _get_target_single
sub_bbox_targets[pos_inds] = pos_gt_sub_bboxes_targets
RuntimeError: shape mismatch: value tensor of shape [2, 4] cannot be broadcast to indexing result of shape [0, 4]
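Reading the message literally: indexing sub_bbox_targets with pos_inds selected 0 rows, while pos_gt_sub_bboxes_targets has 2 rows. Below is a minimal sketch of a guard that could be called just before the assignment to report the disagreement with a clearer message. The helper name check_pos_assignment is hypothetical, and plain Python lists stand in for the tensors; this is not code from qpic_head.py.

```python
def check_pos_assignment(pos_inds, pos_targets):
    """Hypothetical helper: verify that the number of positive indices
    equals the number of positive target rows before assigning.

    pos_inds    -- sequence of positive query indices
    pos_targets -- sequence of target boxes, one row per positive
    """
    n_inds = len(pos_inds)
    n_rows = len(pos_targets)
    if n_inds != n_rows:
        # Same situation as the RuntimeError above, caught earlier.
        raise ValueError(
            f"{n_inds} positive indices but {n_rows} target rows"
        )
    return n_inds


# Matching counts pass through and return the count.
ok = check_pos_assignment([3, 7], [[0, 0, 1, 1], [0, 0, 2, 2]])

# The case from the traceback: 0 indices, 2 target rows.
try:
    check_pos_assignment([], [[0, 0, 1, 1], [0, 0, 2, 2]])
    message = None
except ValueError as err:
    message = str(err)
```

With the tensors in _get_target_single, the same comparison would be between the length of pos_inds and the first dimension of pos_gt_sub_bboxes_targets.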
The second issue is that gt_sub_bboxes and gt_obj_bboxes have different lengths for some images. This is unexpected: I checked trainval_hico.json, and every ground-truth annotation there is a complete (subject, object, HOI category) triplet. Below is the error:
../aten/src/ATen/native/cuda/IndexKernel.cu:91: operator(): block: [0,0,0], thread: [4,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:91: operator(): block: [0,0,0], thread: [5,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:91: operator(): block: [0,0,0], thread: [6,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:91: operator(): block: [0,0,0], thread: [7,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
Traceback (most recent call last):
File "tools/hoi_train.py", line 195, in <module>
main()
File "tools/hoi_train.py", line 191, in main
meta=meta)
File "/users/PCS0256/lijing/mmdetection/mmdet/apis/train.py", line 208, in train_detector
runner.run(data_loaders, cfg.workflow)
File "/users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
epoch_runner(data_loaders[i], **kwargs)
File "/users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 50, in train
self.run_iter(data_batch, train_mode=True, **kwargs)
File "/users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 30, in run_iter
**kwargs)
File "/users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/mmcv/parallel/data_parallel.py", line 75, in train_step
return self.module.train_step(*inputs[0], **kwargs[0])
File "/users/PCS0256/lijing/mmdetection/mmdet/models/detectors/basehoidetector.py", line 249, in train_step
losses = self(**data)
File "/users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 110, in new_func
return old_func(*args, **kwargs)
File "/users/PCS0256/lijing/mmdetection/mmdet/models/detectors/basehoidetector.py", line 183, in forward
return self.forward_train(img, img_metas, **kwargs)
File "/users/PCS0256/lijing/mmdetection/mmdet/models/detectors/qpic.py", line 64, in forward_train
gt_obj_labels, gt_verb_labels, **kwargs)
File "/users/PCS0256/lijing/mmdetection/mmdet/models/hoi_heads/qpic_head.py", line 213, in forward_train
return self.loss(*loss_inputs)
File "/users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 198, in new_func
return old_func(*args, **kwargs)
File "/users/PCS0256/lijing/mmdetection/mmdet/models/hoi_heads/qpic_head.py", line 320, in loss
img_metas_list)
File "/users/PCS0256/lijing/mmdetection/mmdet/core/utils/misc.py", line 30, in multi_apply
return tuple(map(list, zip(*map_results)))
File "/users/PCS0256/lijing/mmdetection/mmdet/models/hoi_heads/qpic_head.py", line 382, in loss_single
img_metas)
File "/users/PCS0256/lijing/mmdetection/mmdet/models/hoi_heads/qpic_head.py", line 510, in get_targets
gt_sub_bboxes_list, gt_obj_bboxes_list, gt_obj_labels_list, gt_verb_labels_list, img_metas)
File "/users/PCS0256/lijing/mmdetection/mmdet/core/utils/misc.py", line 30, in multi_apply
return tuple(map(list, zip(*map_results)))
File "/users/PCS0256/lijing/mmdetection/mmdet/models/hoi_heads/qpic_head.py", line 574, in _get_target_single
gt_sub_bboxes, gt_obj_bboxes)
File "/users/PCS0256/lijing/mmdetection/mmdet/core/hoi/samplers/pseudo_sampler.py", line 47, in sample
assign_result, gt_flags)
File "/users/PCS0256/lijing/mmdetection/mmdet/core/hoi/samplers/sampling_result.py", line 48, in __init__
self.pos_gt_sub_bboxes = gt_sub_bboxes[self.pos_assigned_gt_inds.long(), :]
RuntimeError: CUDA error: device-side assert triggered
terminate called after throwing an instance of 'c10::CUDAError'
what(): CUDA error: device-side assert triggered
Exception raised from create_event_internal at ../c10/cuda/CUDACachingAllocator.cpp:1230 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x2ac2b2a807d2 in /users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x2319e (0x2ac2b279e19e in /users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0x22d (0x2ac2b279fd3d in /users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #3: <unknown function> + 0x300f48 (0x2ac25f073f48 in /users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #4: c10::TensorImpl::release_resources() + 0x175 (0x2ac2b2a69005 in /users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #5: <unknown function> + 0x1ed619 (0x2ac25ef60619 in /users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #6: <unknown function> + 0x4e4ec8 (0x2ac25f257ec8 in /users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #7: THPVariable_subclass_dealloc(_object*) + 0x299 (0x2ac25f2581c9 in /users/PCS0256/lijing/indiviual_research/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #25: __libc_start_main + 0xf5 (0x2ac25654e555 in /lib64/libc.so.6)
/var/spool/slurmd/job13247525/slurm_script: line 19: 233669 Aborted (core dumped) CUDA_LAUNCH_BLOCKING=1 python tools/hoi_train.py configs/qpic/qpic_r50_150e_hico.py
Both errors are raised in qpic_head.py, but I suspect the root cause is in data loading. I trained the model after running the data_convetr file mentioned in INSTALL.MD. I also noticed comments such as #TODO: unfinished in qpic_head.py. Do you have a finished version of this repository? If you could update it or share it, that would be great. Thanks!
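To narrow down whether the data loading is at fault, a check along the following lines could be run over the loaded annotations before training. It is only a sketch: the function name find_inconsistent_images and the dict layout (one dict per image with a filename and four per-pair lists) are assumptions for illustration, not the actual mmhoidet data format.

```python
# Assumed per-image keys; each should hold one entry per HOI pair.
PAIR_KEYS = ("gt_sub_bboxes", "gt_obj_bboxes", "gt_obj_labels", "gt_verb_labels")


def find_inconsistent_images(annotations):
    """Return (filename, lengths) for every image whose per-pair
    ground-truth lists do not all have the same length."""
    bad = []
    for ann in annotations:
        lengths = {key: len(ann[key]) for key in PAIR_KEYS}
        # All four lists must describe the same number of pairs.
        if len(set(lengths.values())) != 1:
            bad.append((ann["filename"], lengths))
    return bad


# Toy data: the second image has two subject boxes but one object box.
samples = [
    {
        "filename": "a.jpg",
        "gt_sub_bboxes": [[0, 0, 10, 10]],
        "gt_obj_bboxes": [[5, 5, 20, 20]],
        "gt_obj_labels": [3],
        "gt_verb_labels": [[0, 1]],
    },
    {
        "filename": "b.jpg",
        "gt_sub_bboxes": [[0, 0, 10, 10], [1, 1, 9, 9]],
        "gt_obj_bboxes": [[5, 5, 20, 20]],
        "gt_obj_labels": [3],
        "gt_verb_labels": [[0, 1]],
    },
]
inconsistent = find_inconsistent_images(samples)
```

If such a check reports nothing on the raw annotations, the mismatch is more likely introduced later, for example by a pipeline step that filters subject and object boxes independently.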