
Fix slow performance for confusion matrix based metrics #1302

Merged
merged 4 commits into master from bugfix/slow_bincount on Oct 31, 2022

Conversation


@SkafteNicki commented Oct 31, 2022

What does this PR do?

Fixes #1267
Fixes #1277
After the refactor, the underlying _bincount function, which does a lot of the computation for confusion matrix based metrics, was changed. I did some testing at the time and everything seemed to be fine. However, for large inputs the new implementation is really slow.

Here is a direct comparison in colab:
https://colab.research.google.com/drive/18tGZj_dPria6NSwVOIgwPXO8mJBr21kc?usp=sharing

The results:
[screenshot: timing comparison of the old and new _bincount implementations on CPU and GPU]
The TLDR:

  • Regardless of batch size, the new implementation is roughly 2x slower on CPU
  • For batch size 100, the new implementation is roughly 4x FASTER on GPU
  • For batch size 10,000, the new implementation is roughly 30x SLOWER on GPU

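As a rough sketch of how the comparison could be reproduced locally (this is not the notebook code; the helper names, sizes, and repeat count are placeholders I chose for illustration), something like the following times both paths on CPU and, if available, GPU:

import time
import torch

def bincount_old(x, minlength):
    # old implementation: native torch.bincount
    return torch.bincount(x, minlength=minlength)

def bincount_new(x, minlength):
    # new implementation: zeros + index_add_ with a ones tensor
    z = torch.zeros(minlength, device=x.device, dtype=x.dtype)
    return z.index_add_(0, x, torch.ones_like(x))

def time_fn(fn, x, minlength, repeats=20):
    # crude wall-clock timing; synchronize so gpu kernels are included in the measurement
    if x.is_cuda:
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(repeats):
        fn(x, minlength)
    if x.is_cuda:
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / repeats

devices = ["cpu", "cuda"] if torch.cuda.is_available() else ["cpu"]
for device in devices:
    for batch_size in (100, 10_000):
        x = torch.randint(0, 4, (batch_size,), device=device)
        print(device, batch_size, time_fn(bincount_old, x, 4), time_fn(bincount_new, x, 4))
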
The simple fix currently in this PR is to change it back to the old implementation. The alternative would be to have something like this in the code:

def _bincount(x, minlength):
    ...
    # N: crossover threshold, to be determined empirically
    # fall back to torch.bincount on cpu or for large inputs,
    # and only use the index_add_ trick for small inputs on gpu
    if len(x) > N or not x.is_cuda:
        return torch.bincount(x, minlength=minlength)
    else:
        z = torch.zeros(minlength, device=x.device, dtype=x.dtype)
        return z.index_add_(0, x, torch.ones_like(x))

where we have to set N based on some experimentation.
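For context, a usage sketch of how such a helper turns flattened (target, prediction) pairs into a confusion matrix (illustrative names only, not the actual torchmetrics call sites; assumes torch is imported and the threshold N above has been chosen):

num_classes = 10
preds = torch.randint(0, num_classes, (1000,))
target = torch.randint(0, num_classes, (1000,))
# encode each (target, pred) pair as a single index in [0, num_classes**2)
unique_mapping = target * num_classes + preds
confmat = _bincount(unique_mapping, minlength=num_classes**2).reshape(num_classes, num_classes)
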
@Borda, @justusschock opinions?

Before submitting

  • Was this discussed/approved via a GitHub issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@SkafteNicki added the bug / fix label Oct 31, 2022
@SkafteNicki added this to the v0.10 milestone Oct 31, 2022

@Borda left a comment


LGTM 🎉

@Borda enabled auto-merge (squash) October 31, 2022 12:13

codecov bot commented Oct 31, 2022

Codecov Report

Merging #1302 (fb3a387) into master (604ed80) will decrease coverage by 0%.
The diff coverage is 100%.

Additional details and impacted files
@@          Coverage Diff           @@
##           master   #1302   +/-   ##
======================================
- Coverage      87%     87%   -0%     
======================================
  Files         190     190           
  Lines       11121   11120    -1     
======================================
- Hits         9621    9620    -1     
  Misses       1500    1500           


@justusschock left a comment


@SkafteNicki For now, I think sticking to the default here makes sense. I would be open to changing this conditionally in the future, but that would require more experimentation :)


Borda commented Oct 31, 2022

I would be open to changing this conditionally in the future, but that would require more experimentation :)

Do you want to open an issue for it so we don't forget? 🦦

@mergify bot added the ready label and removed the has conflicts label Oct 31, 2022
@Borda disabled auto-merge October 31, 2022 19:10
@Borda merged commit 8cc0cd7 into master Oct 31, 2022
@Borda deleted the bugfix/slow_bincount branch October 31, 2022 19:11
Borda pushed a commit that referenced this pull request Oct 31, 2022
* bincount fix
* chlog

(cherry picked from commit 8cc0cd7)
@Callidior (Contributor) commented

@SkafteNicki I stumbled upon this issue and did some investigation.

The new implementation used torch.Tensor.index_add_, which is known to be slow on integer tensors (see pytorch/pytorch#42109). That issue has now been open for over 2 years, so a fix in the near future seems unlikely. However, the issue mentions using a float64 tensor as a workaround.

I ran a quick performance test with different tensor sizes.

Integer implementation:

z = torch.zeros(4, device=unique_mapping.device, dtype=unique_mapping.dtype)
return z.index_add_(0, unique_mapping, torch.ones_like(unique_mapping))

Float implementation:

z = torch.zeros(4, device=unique_mapping.device, dtype=torch.float64)
return z.index_add_(0, unique_mapping, torch.ones_like(unique_mapping, dtype=torch.float64))
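
One caveat worth noting (my addition, not part of the snippets above): the float path accumulates counts in a float64 tensor, so before being used as confusion-matrix counts the result would presumably be cast back to an integer dtype. Continuing the float snippet:

z = torch.zeros(4, device=unique_mapping.device, dtype=torch.float64)
counts = z.index_add_(0, unique_mapping, torch.ones_like(unique_mapping, dtype=torch.float64))
counts = counts.long()  # counts are exact integers well below 2**53, so the cast back is lossless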

Timing results on RTX 3090 (in milliseconds):

Number of Samples    Integer index_add    Float index_add    torch.bincount
1,000                0.3                  0.1                0.2
100,000              144.7                0.1                0.2
10,000,000           18,497.1             9.3                4.2

Timing results on CPU (in milliseconds):

Number of Samples    Integer index_add    Float index_add    torch.bincount
1,000                0.20                 0.03               0.01
100,000              0.48                 0.39               0.19
10,000,000           18.0                 16.3               8.4

Looks like torch.bincount is the best option, aside from other issues with it like #1413.
