I want to train P2PNet on my custom dataset. While training, I hit an error in the collate_fn_crowd(batch) function:
def collate_fn_crowd(batch):
    # re-organize the batch
    batch_new = []
    for b in batch:
        imgs, points = b
        if imgs.ndim == 3:
            imgs = imgs.unsqueeze(0)
        for i in range(len(imgs)):
            # if len(points) > 0:
            #     batch_new.append((imgs[i, :, :, :], points[i]))
            batch_new.append((imgs[i, :, :, :], points[i]))
    batch = batch_new
    batch = list(zip(*batch))
    batch[0] = nested_tensor_from_tensor_list(batch[0])
    return tuple(batch)
in the util/misc.py file. The error is as follows:
CUDA_VISIBLE_DEVICES=0 python train.py --data_root $DATA_ROOT --dataset_file SHHA --epochs 3500 --lr_drop 3500 --output_dir ./logs --checkpoints_dir ./weights --tensorboard_dir ./logs --lr 0.0001 --lr_backbone 0.00001 --batch_size 8 --eval_freq 1 --gpu_id 0 --frozen_weights /home/abi/p2p-training/CrowdCounting-P2PNet/weights/prev.pth
Frozen training
Namespace(backbone='vgg16_bn', batch_size=8, checkpoints_dir='./weights', clip_max_norm=0.1, data_root='/home/abi/p2p-training/CrowdCounting-P2PNet/p2p', dataset_file='SHHA', eos_coef=0.5, epochs=3500, eval=False, eval_freq=1, frozen_weights='/home/abi/p2p-training/CrowdCounting-P2PNet/weights/prev.pth', gpu_id=0, line=2, lr=0.0001, lr_backbone=1e-05, lr_drop=3500, num_workers=8, output_dir='./logs', point_loss_coef=0.0002, resume='', row=2, seed=42, set_cost_class=1, set_cost_point=0.05, start_epoch=0, tensorboard_dir='./logs', weight_decay=0.0001)
number of params: 21579344
Start training
Traceback (most recent call last):
  File "train.py", line 223, in <module>
    main(args)
  File "train.py", line 162, in main
    args.clip_max_norm)
  File "/home/abi/p2p-training/CrowdCounting-P2PNet/engine.py", line 85, in train_one_epoch
    for samples, targets in data_loader:
  File "/home/abi/anaconda3/envs/p2p-train/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 517, in __next__
    data = self._next_data()
  File "/home/abi/anaconda3/envs/p2p-train/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1199, in _next_data
    return self._process_data(data)
  File "/home/abi/anaconda3/envs/p2p-train/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1225, in _process_data
    data.reraise()
  File "/home/abi/anaconda3/envs/p2p-train/lib/python3.7/site-packages/torch/_utils.py", line 429, in reraise
    raise self.exc_type(msg)
IndexError: Caught IndexError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/abi/anaconda3/envs/p2p-train/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 202, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/abi/anaconda3/envs/p2p-train/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 47, in fetch
    return self.collate_fn(data)
  File "/home/abi/p2p-training/CrowdCounting-P2PNet/util/misc.py", line 311, in collate_fn_crowd
    batch_new.append((imgs[i, :, :, :], points[i]))
IndexError: list index out of range
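The failing line indexes points[i] for every image i in a sample, so the IndexError would occur whenever a sample carries fewer point entries than images. A small check like the sketch below might help find which samples of a custom dataset do that. The helper name find_mismatched_samples is hypothetical and not part of the P2PNet code; it only assumes each sample is an (imgs, points) pair where imgs has an ndim attribute and both support len().

```python
def find_mismatched_samples(batch):
    """Return (sample_index, num_images, num_point_entries) for suspicious samples.

    Hypothetical diagnostic helper, not part of P2PNet. It mirrors the
    bookkeeping of collate_fn_crowd without touching the data itself.
    """
    mismatches = []
    for idx, (imgs, points) in enumerate(batch):
        # A single 3-D image (C, H, W) counts as one image, matching the
        # unsqueeze(0) branch in collate_fn_crowd.
        num_imgs = 1 if imgs.ndim == 3 else len(imgs)
        num_point_entries = len(points)
        # Fewer point entries than images is what makes points[i] fail.
        if num_imgs != num_point_entries:
            mismatches.append((idx, num_imgs, num_point_entries))
    return mismatches
```

Calling this on the batch at the top of collate_fn_crowd and printing the result, ideally with --num_workers 0 so the output is not hidden inside a worker process, should show whether the custom dataset returns samples with missing point lists.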
To work around the IndexError, I added an if condition so that only samples with len(points) > 0 are processed.
After doing this, training fails with RuntimeError: CUDA error: no kernel image is available for execution on the device
What is the issue?
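The "no kernel image is available" message commonly means the installed PyTorch binary was not compiled for the GPU's compute capability, which would make it unrelated to the collate change. The sketch below is one possible way to check this. It assumes torch.cuda.get_device_capability and torch.cuda.get_arch_list are available in the installed PyTorch version; the helper names are hypothetical. The exact-match test is a simplification, since a build can sometimes still run on a GPU that is not listed exactly.

```python
def capability_to_arch(capability):
    """Turn a (major, minor) compute capability into an 'sm_XY' string."""
    major, minor = capability
    return f"sm_{major}{minor}"


def build_lists_exact_arch(capability, arch_list):
    """True if the build's architecture list names this GPU exactly.

    Simplified check: a False result is a strong hint, not proof,
    that the binary cannot run kernels on the device.
    """
    return capability_to_arch(capability) in arch_list


def report_current_device(device_index=0):
    # torch is imported here so the helpers above work without it installed.
    import torch

    capability = torch.cuda.get_device_capability(device_index)
    arch_list = torch.cuda.get_arch_list()
    print("GPU architecture:", capability_to_arch(capability))
    print("Architectures in this PyTorch build:", arch_list)
    print("Exact match:", build_lists_exact_arch(capability, arch_list))
```

Running report_current_device() inside the p2p-train environment would print both values. If the GPU architecture is missing from the list, installing a PyTorch build made for a CUDA version that supports the GPU is the usual remedy.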