Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add identity hasher #514

Merged
merged 12 commits into from
Jun 28, 2024
Merged

Add identity hasher #514

merged 12 commits into from
Jun 28, 2024

Conversation

njones93531
Copy link
Contributor

@njones93531 njones93531 commented Jun 24, 2024

Contribute to #487
Partially fixes #505

  • Added cuco/detail/hash_functions/identityhash.cuh
    • Identity hasher uses thrust::identity to apply the identity function to its key
  • Added IdentityHash to hash_functions.cuh
  • Added unit tests to validate that the identity hash function works properly. Tests passed successfully.

Copy link

copy-pr-bot bot commented Jun 24, 2024

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@njones93531 njones93531 marked this pull request as ready for review June 24, 2024 20:39
@PointKernel PointKernel added the type: feature request New feature request label Jun 24, 2024
include/cuco/detail/hash_functions/identityhash.cuh Outdated Show resolved Hide resolved
include/cuco/detail/hash_functions/identityhash.cuh Outdated Show resolved Hide resolved
include/cuco/detail/hash_functions/identityhash.cuh Outdated Show resolved Hide resolved
include/cuco/detail/hash_functions/identityhash.cuh Outdated Show resolved Hide resolved
tests/utility/hash_test.cu Outdated Show resolved Hide resolved
include/cuco/hash_functions.cuh Outdated Show resolved Hide resolved
tests/utility/hash_test.cu Show resolved Hide resolved
@PointKernel PointKernel added the Needs Review Awaiting reviews before merging label Jun 24, 2024
include/cuco/detail/hash_functions/identityhash.cuh Outdated Show resolved Hide resolved
include/cuco/detail/hash_functions/identityhash.cuh Outdated Show resolved Hide resolved
include/cuco/detail/hash_functions/identityhash.cuh Outdated Show resolved Hide resolved
tests/utility/hash_test.cu Show resolved Hide resolved
Copy link
Member

@PointKernel PointKernel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some final suggestions otherwise look good to me.

tests/utility/hash_test.cu Show resolved Hide resolved
include/cuco/detail/hash_functions/identityhash.cuh Outdated Show resolved Hide resolved
include/cuco/detail/hash_functions/identityhash.cuh Outdated Show resolved Hide resolved
include/cuco/detail/hash_functions/identityhash.cuh Outdated Show resolved Hide resolved
@njones93531 njones93531 requested a review from PointKernel June 25, 2024 17:58
@PointKernel
Copy link
Member

/ok to test

Copy link
Member

@PointKernel PointKernel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🔥

Thanks! @njones93531

@PointKernel PointKernel changed the title [REVIEW] Identity hasher Add identity hasher Jun 25, 2024
@PointKernel
Copy link
Member

@njones93531 Just FYI, I've updated the PR title since it will become the commit message with squash merge.

Using the github keyword resolves will close the #487 issue when merging this PR thus I've changed it as well.

Copy link

@esoha-nvidia esoha-nvidia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for writing this PR! I like that it is small and easy to reason about!

include/cuco/detail/hash_functions/identity_hash.cuh Outdated Show resolved Hide resolved
tests/utility/hash_test.cu Show resolved Hide resolved
tests/utility/hash_test.cu Outdated Show resolved Hide resolved
tests/utility/hash_test.cu Show resolved Hide resolved
@sleeepyjack
Copy link
Collaborator

The tag type approach has been abandoned some time ago in many libraries and replaced with specialized type traits, e.g., specializing std::iterator_traits<my_iterator_type>.
Based on offline discussions (involving all of us IIRC) we settled on a different design to make using the perfect_hashing probing scheme safe:

  • Make the ctor of the perfect_hashing class private and only allow it to be constructed via a make_perfect_hashing factory function
  • The factory returns a cuda::std::optional<perfect_hashing<my_hasher>> object which will only hold an actual probing scheme object in case the factory could verify that it's actually perfect and thus safe to use

In case construction fails, the user has to provide an alternative code path, i.e., a different non-perfect probing scheme.

One thing is unclear to me: @esoha-nvidia you mentioned we want the is_perfect check to be available at compile time but in my understanding there can be scenarios where this information is only known after introspecting the data, i.e., at runtime.

@esoha-nvidia
Copy link

* Make the ctor of the `perfect_hashing` class `private` and only allow it to be constructed via a `make_perfect_hashing` factory function
* The factory returns a `cuda::std::optional<perfect_hashing<my_hasher>>` object which will only hold an actual probing scheme object in case the factory could verify that it's actually perfect and thus safe to use

You can do it this way if you want but it has the downside that you can no longer default construct the perfect_hashing. Though you weren't really able to in the first place.

If you make a private constructor that does all the work in the constructor then, as far as I can tell, your factory function is just:

std::optional<perfect_hashing> make_perfect_hashing(args) {
  try {
    return {perfect_hashing(args)};
  } catch {
    return std::nullopt;
  }
}

Right? It's kind of a trivial factory fuction, just converting an exception to an optional. Is it pointless?

Or, you could make a constructor that never throws and then put the failure logic into the factory. The problem with this is that now you have split-brained code where the creation of the perfect hash is happening partly in the factory and party in the constructor. If you get into a situation where you end up doing some math in the factory just to send it to the constructor and then having to examine that constructor to determine if a valid hash functor was created...

I guess that we'll see how it shakes out. Just be prepared for maybe having to put everything into the constructor because you want to avoid split-brained code. And then be prepare to ask yourself, "Why do we even have this factory?"

One thing is unclear to me: @esoha-nvidia you mentioned we want the is_perfect check to be available at compile time but in my understanding there can be scenarios where this information is only known after introspecting the data, i.e., at runtime.

Hmm, let me think about this... Yes, you're right. There will be some hashers for which is_perfect could be constexpr but some for which it cannot be. For example, if the SigBits hasher computes more significant bits than there are bits in the result type then it will end up creating a hasher that is not perfect (though it might still be a pretty good hasher!).

Okay, so is_perfect is not always constexpr and, likewise, the check for is_perfect cannot be if constexpr. (That's okay, we didn't really need the if constexpr because if constexpr doesn't actually make code run faster, it just makes code that previously did not compile now compile.)

Copy link
Member

@PointKernel PointKernel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great.

Two final suggestions otherwise the PR is ready to go.

As we discussed offline, discussions and evaluation of the perfect hashing design will be continued in a private repo until it matures.

include/cuco/detail/hash_functions/identity_hash.cuh Outdated Show resolved Hide resolved
include/cuco/detail/hash_functions/identity_hash.cuh Outdated Show resolved Hide resolved
njones93531 and others added 2 commits June 28, 2024 10:20
Co-authored-by: Daniel Jünger <djuenger@nvidia.com>
Co-authored-by: Yunsong Wang <yunsongw@nvidia.com>
@sleeepyjack
Copy link
Collaborator

/ok to test

@PointKernel
Copy link
Member

/ok to test

Copy link
Member

@PointKernel PointKernel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the docs accordingly to unblock CI and all looks good.

@sleeepyjack sleeepyjack merged commit 84cd528 into NVIDIA:dev Jun 28, 2024
15 checks passed
@njones93531 njones93531 deleted the IdentityHash branch July 1, 2024 14:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Needs Review Awaiting reviews before merging type: feature request New feature request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants