Improve FB metric for DDP #1805
Regarding the TPU test failure https://github.com/pytorch/ignite/pull/1805/checks?check_run_id=2122146650, I guess we hit the same issue as in #1699. I will try to fix it the same way.
@KickItLikeShika thanks! Yes, I think it should be a division by zero somewhere...
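For readers following along, here is a minimal plain-Python sketch of where a zero denominator could plausibly appear in a fractional-bias style metric, and of why keeping only a running sum and a count lets the result be combined across processes. It assumes the per-sample term 2 * (y - y_pred) / (y + y_pred). The function names, the eps value, and the simulated "ranks" are hypothetical and are not taken from the PR.

```python
# Illustrative only: a plain-Python stand-in, not ignite's actual implementation.
EPS = 1e-30  # assumed guard value; the thread mentions values such as 1e-15 and 1e-30


def fractional_bias_state(y_pred, y, eps=EPS):
    """Return (sum_of_errors, num_examples) for one process's share of the data."""
    total = 0.0
    for p, a in zip(y_pred, y):
        # Per-sample term 2 * (a - p) / (a + p). When a + p == 0 the bare
        # division raises ZeroDivisionError in plain Python (with tensors it
        # would typically produce nan), so eps keeps the denominator non-zero.
        total += 2.0 * (a - p) / (a + p + eps)
    return total, len(y)


def reduce_states(states):
    """Combine per-process states the way a sum all-reduce would, then average."""
    sum_of_errors = sum(s for s, _ in states)
    num_examples = sum(n for _, n in states)
    return sum_of_errors / num_examples


# Two simulated processes, each holding half of the data.
rank0 = fractional_bias_state([1.0, 2.0], [1.5, 2.0])
rank1 = fractional_bias_state([0.0, 4.0], [0.0, 3.0])  # first sample has a + p == 0
distributed = reduce_states([rank0, rank1])

# The same data seen by a single process.
single = reduce_states(
    [fractional_bias_state([1.0, 2.0, 0.0, 4.0], [1.5, 2.0, 0.0, 3.0])]
)

# Both routes should agree up to floating-point rounding.
assert abs(distributed - single) < 1e-12
```

In a real DDP run the per-process sums and counts would be combined by a sum all-reduce across processes rather than by a Python list.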
@vfdev-5 can you please check this? https://github.com/pytorch/ignite/pull/1805/checks?check_run_id=2122812029
Yes, this can happen. Let's reduce the tolerance to 1e-5 for XLA.
@vfdev-5 You mean that by reducing the tolerance it should be smaller than 1e-15, right? It should be, for example, 1e-30?
I mean the pytest.approx tolerance, like here:
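The snippet referenced in that comment is not reproduced in this thread. As a hypothetical illustration of what adjusting the pytest.approx tolerance looks like, the sketch below compares two made-up values. It assumes pytest is installed; `rel` is pytest's relative-tolerance argument, whose default is 1e-6.

```python
import pytest

# Made-up numbers for illustration; they are not taken from the failing CI run.
reference = 0.123456789  # e.g. a value from a higher-precision reference computation
computed = 0.123457      # e.g. the same quantity computed on a lower-precision device

# With the default relative tolerance (1e-6) the two values do not match.
assert not (computed == pytest.approx(reference))

# Loosening the relative tolerance to 1e-5 makes the comparison pass.
assert computed == pytest.approx(reference, rel=1e-5)
```

Note that 1e-5 is a looser bound than the default, so "reducing the tolerance" in this thread appears to mean relaxing the comparison, which is a separate knob from the eps used inside the metric.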
@vfdev-5 I have tried adjusting the tolerance and the eps, but it didn't work. I'm still getting the same error.
@KickItLikeShika let's try to repro the issue locally. Please follow these installation steps: ignite/.github/workflows/tpu-tests.yml Lines 52 to 65 in b973213
and run the test: ignite/.github/workflows/tpu-tests.yml Lines 67 to 74 in b973213
or even pytest tests/ignite/contrib/metrics/regression/test_fractional_bias.py::test_distrib_single_device_xla -vvv -m tpu
@vfdev-5 it works now, thanks!
@KickItLikeShika Thank you for this work! LGTM
Horovod is KO 😔
@sdesrozis yes, I see! But I think it's not related, right? As the tests for
@sdesrozis please restart the CI to see what happens.
Let’s see tomorrow about this failure @vfdev-5
@sdesrozis it works fine, thanks!
@KickItLikeShika we probably have to adjust the tolerance for the GPU tests as well: https://app.circleci.com/pipelines/github/pytorch/ignite/1607/workflows/381ab87c-6226-466e-ab64-4bbe8a835f56/jobs/4649 Please send a follow-up PR with tol ~1e-5.
Improving the FractionalBias implementation to be compatible with DDP and adding the necessary tests.
Check list: