about reproduction #8633
Comments
@konioy current master with torch>=1.12.0 is fully reproducible.
Is it a problem with the PyTorch version? Is PyTorch 1.9 not reproducible?
I looked at #8213 and the torch.use_deterministic_algorithms documentation (https://pytorch.org/docs/stable/generated/torch.use_deterministic_algorithms.html?highlight=torch%20use_deterministic_algorithms#torch.use_deterministic_algorithms), but I don't understand how torch.nn.Upsample can be reproducible.
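For reference, a minimal sketch (assuming torch >= 1.12 is installed, not YOLOv5-specific code) of how to check that nn.Upsample behaves deterministically: enable deterministic algorithms so PyTorch errors on any nondeterministic op, then compare two seeded runs.

```python
import torch
import torch.nn as nn

# Raise an error if any op without a deterministic implementation is used.
torch.use_deterministic_algorithms(True)

def run_once(seed: int) -> torch.Tensor:
    """One seeded forward pass through nn.Upsample."""
    torch.manual_seed(seed)
    x = torch.randn(1, 3, 8, 8)
    up = nn.Upsample(scale_factor=2, mode="nearest")
    return up(x)

a = run_once(0)
b = run_once(0)
print(torch.equal(a, b))  # identical seeds and deterministic ops -> identical tensors
```

If a layer has no deterministic implementation on your device, the call raises a RuntimeError instead of silently producing run-to-run differences.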
@konioy torch>=1.12 should be fully reproducible using a single GPU. Multi-GPU is not yet reproducible and we don't have a clear reason why.
@konioy zero val loss typically indicates your validation set has no labels. EDIT: if you used --no-val then the above is normal.
Yes, I used --no-val.
@konioy use torch>=1.12.0 for reproducible single-GPU CUDA training runs.
Thanks, I know. |
Have you encountered a similar situation?
@konioy your results are expected. torch<1.12 will not produce reproducible results.
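The kind of setup that torch >= 1.12 reproducibility relies on can be sketched as follows. This is a hypothetical helper (the name `init_seeds` and its exact contents are ours, not YOLOv5's code): it seeds Python, NumPy, and PyTorch, pins the cuDNN flags, and opts into deterministic algorithms.

```python
import os
import random

import numpy as np
import torch

def init_seeds(seed: int = 0) -> None:
    """Seed all RNGs and request deterministic behavior (illustrative helper)."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)            # no-op on CPU-only machines
    torch.backends.cudnn.deterministic = True   # force deterministic cuDNN kernels
    torch.backends.cudnn.benchmark = False      # autotuner picks kernels nondeterministically
    # Required by CUDA >= 10.2 for deterministic cuBLAS calls.
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
    torch.use_deterministic_algorithms(True, warn_only=True)

init_seeds(0)
x1 = torch.rand(4)
init_seeds(0)
x2 = torch.rand(4)
print(torch.equal(x1, x2))  # same seed -> same random draws
```

With torch < 1.12, some ops used during training simply had no deterministic implementation, so even this setup could not make two runs bit-identical.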
However, the results fluctuated greatly. This is not expected.
👋 Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.
Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed! Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐!
@konioy Apologies for any confusion. It's possible that the fluctuation in results could be due to the non-reproducibility of training runs with torch<1.12. Upgrading to torch>=1.12 should help minimize these fluctuations.
Search before asking
Question
Same data, same code, and I trained it twice. The loss curves are very close, but the performance on the test data is very different. Why is this?
Additional
No response