RT-DETR training gives bbox AP = 0 #8431

Closed
2 of 3 tasks
zouxiaodong opened this issue Jul 13, 2023 · 19 comments
Labels: bug, status/close

@zouxiaodong

zouxiaodong commented Jul 13, 2023

Search before asking

  • I have searched the issues and found no similar bug report.

Bug Component

Training

Describe the Bug

Training RT-DETR on a 7-class insect dataset (agriculture and forestry insect detection) yields a bbox AP of 0.
python tools/train.py -c configs/rtdetr/rtdetr_hgnetv2_x_6x_coco.yml --eval \
    --use_vdl=true \
    --vdl_log_dir=vdl_dir/scalar

DONE (t=0.42s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.001
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.003
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.001
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.007
[07/13 15:46:41] ppdet.engine INFO: Total sample number: 245, average FPS: 11.944735034594364
[07/13 15:46:41] ppdet.engine INFO: Best test bbox ap is 0.000.
[07/13 15:46:53] ppdet.utils.checkpoint INFO: Save checkpoint: output
[07/13 15:46:55] ppdet.engine INFO: Epoch: [1] [ 0/423] learning_rate: 0.000010 loss_class: 0.544548 loss_bbox: 0.165296 loss_giou: 0.795024 loss_class_aux: 5.065946 loss_bbox_aux: 1.195941 loss_g
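
A minimal sketch for checking a summary like the one above programmatically, assuming the log uses the pycocotools-style summary lines shown in this issue. The names `parse_coco_summary` and `looks_untrained` and the 0.01 threshold are illustrative assumptions, not part of PaddleDetection. Entries printed as -1.000 are treated as undefined (pycocotools prints -1 when it has nothing to average for that setting, typically because no ground truth objects fall in that area range), so they are skipped rather than counted as zero.

```python
import re

# Matches lines such as:
#   Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.000
SUMMARY_RE = re.compile(
    r"Average (?:Precision|Recall)\s+\((AP|AR)\)\s+@\[\s*IoU=([\d.:]+)\s*\|"
    r"\s*area=\s*(\w+)\s*\|\s*maxDets=\s*(\d+)\s*\]\s*=\s*(-?\d+\.\d+)"
)

# Sample copied from the AP lines of the log in this issue.
SAMPLE = """
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.001
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
"""


def parse_coco_summary(log_text):
    """Extract every AP/AR summary line from a training log as a list of dicts."""
    rows = []
    for match in SUMMARY_RE.finditer(log_text):
        metric, iou, area, max_dets, value = match.groups()
        rows.append({
            "metric": metric,
            "iou": iou,
            "area": area,
            "max_dets": int(max_dets),
            "value": float(value),
        })
    return rows


def looks_untrained(rows, threshold=0.01):
    """Return True if every defined AP entry is below the (assumed) threshold.

    Entries with a negative value are undefined in the summary and are ignored.
    """
    defined_ap = [r["value"] for r in rows if r["metric"] == "AP" and r["value"] >= 0]
    return bool(defined_ap) and max(defined_ap) < threshold


if __name__ == "__main__":
    rows = parse_coco_summary(SAMPLE)
    print("parsed entries:", len(rows))
    print("looks untrained:", looks_untrained(rows))
```

Running this after each evaluation makes it easy to stop a run early when AP stays at zero instead of waiting for all epochs to finish.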

Environment

I am using the AI Studio environment.
The project is
RTDETR-INSECT: https://aistudio.baidu.com/aistudio/projectdetail/6531666?sUid=466051&shared=1&ts=1689235332613

Bug description confirmation

  • I confirm that the bug replication steps, code change instructions, and environment information have been provided, and the problem can be reproduced.

Are you willing to submit a PR?

  • I'd like to help by submitting a PR!
zouxiaodong added the bug label Jul 13, 2023
@lyuwenyu
Collaborator

Are you using the latest code? Please post the commit id.

@lyuwenyu
Collaborator

We fixed a bug in #8409, so please pull the latest code. Alternatively, if you only need RT-DETR, you can verify with our experimental code: https://github.com/lyuwenyu/RT-DETR

@zouxiaodong
Author

@lyuwenyu I am using the latest code:
git clone https://ghproxy.com/https://github.com/paddlepaddle/PaddleDetection.git
!git checkout develop

AI Studio URL: https://aistudio.baidu.com/aistudio/projectdetail/6531666?sUid=466051&shared=1&ts=1689238539735
I will try the experimental code you linked.

@lyuwenyu
Collaborator

OK, try again and see whether the code is the problem. Report back if anything goes wrong.

zouxiaodong changed the title from "RT-DETER training gives bbox AP = 0" to "RT-DETR training gives bbox AP = 0" Jul 13, 2023
@zouxiaodong
Author

@lyuwenyu Hello, testing https://github.com/lyuwenyu/RT-DETR on AI Studio failed. It looks like CUDA 10.2 is required, and AI Studio does not offer an environment with that version:
ImportError: libcudart.so.10.2: cannot open shared object file: No such file or directory

I created a paddlepaddle-gpu:2.4.2.post117 environment on Kaggle. With rtdetr_r101vd_6x_coco.yml, base_lr: 0.00001, on a single P100, AP at IoU=0.50 is 0.732 after 10 epochs:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.511
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.732
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.624
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.513
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.415
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.459
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.721
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.782
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.776
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.673
[07/13 11:35:56] ppdet.engine INFO: Total sample number: 245, average FPS: 17.744516988012954
[07/13 11:35:56] ppdet.engine INFO: Best test bbox ap is 0.511.
With PP-YOLOE (ppyoloe_plus_crn_s_80e_coco.yml, base_lr: 0.000125), on the same dataset and for the same number of epochs, AP at IoU=0.50 is 0.836:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.611
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.836
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.742
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.591
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.558
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.518
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.778
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.787
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.785
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.766
[07/12 15:28:42] ppdet.engine INFO: Total sample number: 245, average FPS: 10.062852305401066
[07/12 15:28:42] ppdet.engine INFO: Best test bbox ap is 0.646.

Question: why does RT-DETR perform so much worse than PP-YOLOE here? Shouldn't RT-DETR be the more accurate model? Is it because RT-DETR converges more slowly?

@lyuwenyu
Collaborator

lyuwenyu commented Jul 13, 2023

How many epochs did you train, and did both runs load COCO-pretrained weights?

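@zouxiaodong
Author

Both ran for 10 epochs, and both used the pretrained weights specified in their config files.
One more question: in the same PaddlePaddle environment, on the develop branch of PaddleDetection, AP stays at 0 throughout training. Is something wrong with the code on the develop branch?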

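@lyuwenyu
Collaborator

> Both ran for 10 epochs, and both used the pretrained weights specified in their config files. One more question: in the same PaddlePaddle environment, on the develop branch of PaddleDetection, AP stays at 0 throughout training. Is something wrong with the code on the develop branch?

For a small dataset, it is best to load COCO-pretrained weights for the comparison. What is the commit id of the ppdet checkout where AP == 0?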
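
A small sketch of one way to collect the commit id requested above, assuming `git` is installed and the script is pointed at a PaddleDetection checkout. The helper name `current_commit` is hypothetical; it only wraps the standard `git rev-parse HEAD` command.

```python
import subprocess


def current_commit(repo_dir="."):
    """Return the commit hash of HEAD in repo_dir, or None if it cannot be read.

    None is returned when git is missing, the directory does not exist,
    or the directory is not inside a git repository.
    """
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip()


if __name__ == "__main__":
    # Run from the root of the cloned repository.
    print("commit id:", current_commit("."))
```

Posting this value together with the branch name lets maintainers tell whether a given fix is already included in the checkout.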

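@yeyupiaoling

yeyupiaoling commented Jul 14, 2023

@lyuwenyu Hello, I have the same problem. I am using the develop branch, pulled today. The model is rtdetr_r50vd_6x_coco.yml. The pretrained weights are the default ones, which should be ImageNet weights, and I trained for 72 epochs.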

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
[07/14 14:49:25] ppdet.engine INFO: Total sample number: 186, average FPS: 36.135789395574754
[07/14 14:49:25] ppdet.engine INFO: Best test bbox ap is 0.000.
[07/14 14:49:26] ppdet.utils.checkpoint INFO: Save checkpoint: output


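@lyuwenyu
Collaborator

lyuwenyu commented Jul 14, 2023

@yeyupiaoling Which dataset are you using? Could you also try this first and see whether it trains: https://github.com/lyuwenyu/RT-DETR (I verified it again yesterday, and it works on COCO). That would tell us whether the problem is in RT-DETR itself or is a bug caused by something else merged into the ppdet library.

Please report back once you have results.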

@yeyupiaoling

@lyuwenyu I will try your https://github.com/lyuwenyu/RT-DETR. My dataset is very small, only a little over 200 images.

@yeyupiaoling

@lyuwenyu Yours works. Below are the results after 55 epochs.

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.969
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.997
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.995
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.969
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.977
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.982
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.996
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.996

@todesti2

> ImportError: libcudart.so.10.2: cannot open shared object file: No such file or directory

I get the same error. However, installing CUDA 10.2 seems to require installing cuDNN as well, which I tried before and found very troublesome. How did you solve it?

@yeyupiaoling

@49xiyu This should be unrelated to your other problem. You are missing some CUDA dynamic libraries. It is a PaddlePaddle installation issue, and you can install PaddlePaddle with conda:

conda install paddlepaddle-gpu==2.5.0 cudatoolkit=10.2 --channel https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/
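
A hedged sketch for confirming whether the missing library is visible to the dynamic loader before starting training. It uses only Python's standard `ctypes` module; the helper name `can_load` is an assumption for illustration, and the library name is the one from the ImportError in this thread.

```python
import ctypes
import ctypes.util


def can_load(lib_name):
    """Return True if the dynamic loader can open lib_name, False otherwise."""
    try:
        ctypes.CDLL(lib_name)
        return True
    except OSError:
        return False


if __name__ == "__main__":
    name = "libcudart.so.10.2"
    print(name, "loadable:", can_load(name))
    # find_library searches the standard locations and returns None when
    # no CUDA runtime library is found there.
    print("find_library('cudart'):", ctypes.util.find_library("cudart"))
```

If this reports that the library is not loadable from inside the environment used for training, the CUDA runtime is not installed there or is not on the loader's search path, which matches the ImportError above.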

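@todesti2

> @49xiyu This should be unrelated to your other problem. You are missing some CUDA dynamic libraries. It is a PaddlePaddle installation issue, and you can install PaddlePaddle with conda:
>
> conda install paddlepaddle-gpu==2.5.0 cudatoolkit=10.2 --channel https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/

Hello, thank you for your reply. I originally used the official PaddleDetection repository and also got AP and AR of 0. After switching to this RT-DETR repository, I ran into the error above. I will try your suggestion.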

@todesti2

> @49xiyu This should be unrelated to your other problem. You are missing some CUDA dynamic libraries. It is a PaddlePaddle installation issue, and you can install PaddlePaddle with conda:
>
> conda install paddlepaddle-gpu==2.5.0 cudatoolkit=10.2 --channel https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/

Hello, I followed your suggestion and ran that command under /RT-DETR/rtdetr_paddle, but when I ran the training code again I still got this error:
ImportError: libcudart.so.10.2: cannot open shared object file: No such file or directory
Previously I used paddledetection.git, and AP and AR both came out as 0. After switching to RT-DETR.git I get the error above.
I could not work it out and look forward to your reply.

@yeyupiaoling

@49xiyu Create a new virtual environment and reinstall PaddlePaddle. Did you actually switch into the virtual environment?
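
A rough sketch for checking which environment a script is actually running in, assuming a conda or venv setup. The function name `environment_report` is hypothetical. `CONDA_DEFAULT_ENV` is the variable conda sets on activation, and the `sys.prefix` comparison only detects venv-style environments, so both are reported.

```python
import os
import sys


def environment_report():
    """Collect basic facts about the active Python environment."""
    report = {
        # Path of the interpreter that is really executing this script.
        "python": sys.executable,
        # True inside a venv/virtualenv; conda environments report False here.
        "in_venv": sys.prefix != getattr(sys, "base_prefix", sys.prefix),
        # Name of the activated conda environment, or None if not set.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }
    try:
        import paddle  # may fail if CUDA libraries are missing
        report["paddle_version"] = paddle.__version__
        report["paddle_error"] = None
    except ImportError as exc:
        report["paddle_version"] = None
        report["paddle_error"] = str(exc)
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

If the interpreter path or conda environment name is not the one where PaddlePaddle was just installed, the install command and the training command are running in different environments. Once the import succeeds, PaddlePaddle's own `paddle.utils.run_check()` can be used to confirm the installation works.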

@yeyupiaoling

@49xiyu My suggestion is to use PP-YOLOE+ for now. It is not much worse than RT-DETR, and at least it works.

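@todesti2

todesti2 commented Aug 2, 2023

> @49xiyu My suggestion is to use PP-YOLOE+ for now. It is not much worse than RT-DETR, and at least it works.

Thank you for the suggestion!! I reinstalled the environment and it worked, haha.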
