
about reproduction #8633

Closed · 1 task done
konioy opened this issue Jul 19, 2022 · 14 comments
Labels: question (Further information is requested), Stale

Comments

konioy commented Jul 19, 2022

Search before asking

Question

Same data, same code: I trained it twice. The loss curves are very close, but the performance on the test data is very different. Why is this?

Additional

No response

konioy added the question label on Jul 19, 2022
glenn-jocher (Member) commented:

@konioy current master with torch>=1.12.0 is fully reproducible:
[screenshot omitted]
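For context, a minimal sketch of what a deterministic setup looks like on torch>=1.12. It mirrors the kind of seeding YOLOv5 performs, but it is not a verbatim copy of the repo's `init_seeds()`, and the seed value is arbitrary:

```python
# Minimal sketch of a deterministic training setup, assuming torch>=1.12 and CUDA.
# Not a verbatim copy of YOLOv5's utils.general.init_seeds(); shown only to illustrate the idea.
import os
import random

import numpy as np
import torch


def init_seeds(seed: int = 0, deterministic: bool = True) -> None:
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # safe to call even without a GPU
    if deterministic:
        # Force cuDNN and other CUDA ops to pick deterministic kernels
        torch.backends.cudnn.benchmark = False
        torch.backends.cudnn.deterministic = True
        torch.use_deterministic_algorithms(True, warn_only=True)  # warn_only available in torch>=1.11
        # Some cuBLAS ops require this; ideally set before any CUDA work starts
        os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"


init_seeds(0)
```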


konioy commented Jul 20, 2022

Is it a problem with the PyTorch version? Is PyTorch 1.9 not reproducible?


konioy commented Jul 21, 2022

glenn-jocher (Member) commented:

@konioy torch>=1.12 should be fully reproducible on a single GPU. Multi-GPU is not yet reproducible, and we don't have a clear reason why.


konioy commented Aug 2, 2022

I am using PyTorch 1.9, which I know is not reproducible. It may be that the choice of convolution algorithm differs between runs (because torch.backends.cudnn.benchmark=True).
But comparing the losses from the two repeated trainings (same code, same data), the difference is very small.
[loss curves omitted: train1 on the left, train2 on the right]

But there is a big difference on the validation set. Do you have any ideas or suggestions?
[results plots omitted: train1 (resulte_baseline) and train2 (results)]
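A rough sketch of what can be done while staying on torch 1.9: disabling cuDNN autotuning and seeding everything narrows the run-to-run variance, though it does not guarantee bit-exact runs the way torch>=1.12 can. The seed and the DataLoader wiring shown here are illustrative only:

```python
# Rough sketch for torch 1.9: reduces run-to-run variance, but does not
# guarantee bit-exact reproducibility the way torch>=1.12 can.
import random

import numpy as np
import torch

SEED = 0  # arbitrary, illustrative seed
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
torch.cuda.manual_seed_all(SEED)

# Stop cuDNN from auto-tuning (and therefore switching) convolution algorithms between runs
torch.backends.cudnn.benchmark = False
torch.backends.cudnn.deterministic = True

# DataLoader workers need seeding too, e.g. via a generator and worker_init_fn:
g = torch.Generator()
g.manual_seed(SEED)
# torch.utils.data.DataLoader(dataset, ...,
#                             worker_init_fn=lambda wid: np.random.seed(SEED + wid),
#                             generator=g)
```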


glenn-jocher commented Aug 2, 2022

@konioy zero val loss typically indicates your validation set has no labels

EDIT: if you used --no-val then the above is normal
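If validation was skipped during training, the metrics can still be measured afterwards with YOLOv5's standalone validation module. A sketch, assuming the run lives under the default runs/train/exp path and that the checked-out val.py exposes run() with these keyword arguments (the dataset yaml and paths are hypothetical):

```python
# Run standalone validation from the YOLOv5 repo root after a --no-val training run.
# Check the run() signature in your checked-out version of val.py before relying on it.
import val  # yolov5/val.py

results = val.run(
    data="data/custom.yaml",                   # hypothetical dataset yaml
    weights="runs/train/exp/weights/best.pt",  # hypothetical weights path
    imgsz=640,
)
print(results)  # precision, recall, mAP@0.5, mAP@0.5:0.95, losses
```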


konioy commented Aug 3, 2022

Yes, I used --no-val.
I think that for two repetitions of the training, the results on the validation set should be similar. But my experiments show that repeating the training twice does produce a big gap, as shown below. Do you have any suggestions?
| Metric | train1 | train2 |
| --- | --- | --- |
| mAP@0.5 | 0.8162 | 0.9167 |
| Precision | 0.8159 | 0.8985 |
| Recall | 0.7381 | 0.8517 |

glenn-jocher (Member) commented:

@konioy use torch>=1.12.0 for reproducible single-GPU CUDA training runs.


konioy commented Aug 3, 2022

Thanks, I know.
Do you have any suggestions for my situation?


konioy commented Aug 4, 2022

Have you encountered a similar situation?

glenn-jocher (Member) commented:

@konioy your results are expected. torch<1.12 will not produce reproducible results.


konioy commented Aug 5, 2022

However, the results fluctuate greatly. This is not expected.


github-actions bot commented Sep 5, 2022

👋 Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.


Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐!

glenn-jocher (Member) commented:

@konioy Apologies for any confusion. It's possible that the fluctuation in results could be due to the non-reproducibility of training runs with torch<1.12. Upgrading to torch>=1.12 should help minimize these fluctuations.
