Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update _safe_divide to allow Accuracy to run on the GPU #2640

Merged
merged 20 commits into from
Jul 30, 2024

Conversation

ndrwrbgs
Copy link
Contributor

@ndrwrbgs ndrwrbgs commented Jul 22, 2024

Caused a CPU/GPU sync point. torch.where can accept the float directly

What does this PR do?

Fixes #2638

Before submitting
  • Was this discussed/agreed via a Github issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?
Ensure CPU/GPU sync point avoided for Accuracy metric

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃


📚 Documentation preview 📚: https://torchmetrics--2640.org.readthedocs.build/en/2640/

Caused a CPU/GPU sync point. torch.where can accept the float directly
ndrwrbgs and others added 3 commits July 22, 2024 10:29
Update comment locations from pre-commit (I got it set up after the original change and missed it locally)

Also remove redundant comment
@Borda Borda changed the title Update _safe_divide to allow Accuracy to run on the GPU Update _safe_divide to allow Accuracy to run on the GPU Jul 22, 2024
@ndrwrbgs
Copy link
Contributor Author

Oh! Thank you @Borda for doing cleanups for me. I apologize I didn't get pre-commit installed until after the initial commit.

@Borda
Copy link
Member

Borda commented Jul 22, 2024

Oh! Thank you @Borda for doing cleanups for me. I apologize I didn't get pre-commit installed until after the initial commit.

no worries :)

Copy link

codecov bot commented Jul 22, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 69%. Comparing base (bd777c9) to head (721bcf8).
Report is 103 commits behind head on master.

Additional details and impacted files
@@          Coverage Diff           @@
##           master   #2640   +/-   ##
======================================
  Coverage      69%     69%           
======================================
  Files         316     316           
  Lines       17881   17881           
======================================
  Hits        12331   12331           
  Misses       5550    5550           

@Borda
Copy link
Member

Borda commented Jul 23, 2024

@ndrwrbgs mind checking the failing tests?

@ndrwrbgs
Copy link
Contributor Author

Of course! I’m grateful for the good test coverage and can see what’s happening here.

Downgrading to draft so it’s off the docket until it is ready, I overestimated the ease.

@ndrwrbgs ndrwrbgs marked this pull request as draft July 23, 2024 11:43
@Borda
Copy link
Member

Borda commented Jul 23, 2024

Downgrading to draft so it’s off the docket until it is ready, I overestimated the ease.

as a draft you will not run all tests so as it is at its advanced stage, let's keep it as full PR just add wip in the title

@Borda Borda marked this pull request as ready for review July 23, 2024 12:04
@Borda Borda changed the title Update _safe_divide to allow Accuracy to run on the GPU Update _safe_divide to allow Accuracy to run on the GPU [wip] Jul 23, 2024
@ndrwrbgs ndrwrbgs changed the title Update _safe_divide to allow Accuracy to run on the GPU [wip] [wip] Update _safe_divide to allow Accuracy to run on the GPU [wip] Jul 23, 2024
@SkafteNicki
Copy link
Member

@ndrwrbgs the error only happens for older versions of PyTorch. The problem is that torch.where:

torch.where(condition, input, other, *, out=None)

expects other and input to have same dtype in older versions of Pytorch whereas in newer versions of Pytorch it will automatically upcast to the highest precision. So we most likely need to explicit cast zero_division to the same dtype as num/denom

@SkafteNicki SkafteNicki added the bug / fix Something isn't working label Jul 25, 2024
@SkafteNicki SkafteNicki added this to the v1.4.x milestone Jul 25, 2024
@ndrwrbgs
Copy link
Contributor Author

ndrwrbgs commented Jul 25, 2024 via email

@ndrwrbgs
Copy link
Contributor Author

ndrwrbgs commented Jul 28, 2024

Thank you for your patience!


I noticed errors like below in this Azure job:

error:  cannot create tests/_cache-references/tests/_cache-references/.unittests.retrieval.helpers._compute_sklearn_metric_276bbfaa62e3d293c69b71ebae3d3e4a266434eef12c60faa590f4111afbb6f7
        No space left on device

Here's an example both from my build and another before to confirm it's not introduced with this PR:

🤔 I'm not sure whether it has any observable impact or not.

cc @Borda

@ndrwrbgs ndrwrbgs changed the title [wip] Update _safe_divide to allow Accuracy to run on the GPU [wip] Update _safe_divide to allow Accuracy to run on the GPU Jul 28, 2024
@mergify mergify bot added the ready label Jul 28, 2024
@Borda Borda enabled auto-merge (squash) July 29, 2024 19:51
@ndrwrbgs
Copy link
Contributor Author

cc @SkafteNicki Another failure on the Windows 2.4.0 torch.Unique: https://github.com/Lightning-AI/torchmetrics/actions/runs/10150851419/job/28068885801?pr=2640

@Borda the other failures appear to be unrelated to the change, but I'm not seeing any rerun button available on my end. Is that something you're able to to with write permissions?

@Borda Borda merged commit 56d3495 into Lightning-AI:master Jul 30, 2024
65 checks passed
Borda pushed a commit that referenced this pull request Aug 2, 2024
---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: jirka <jirka.borovec@seznam.cz>
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
(cherry picked from commit 56d3495)
Borda pushed a commit that referenced this pull request Aug 2, 2024
---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: jirka <jirka.borovec@seznam.cz>
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
(cherry picked from commit 56d3495)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug / fix Something isn't working ready
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Accuracy incurs a GPU/CPU sync point
3 participants