Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add tests for hash collisions #842

Merged
merged 4 commits into from
Aug 9, 2021

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Aug 9, 2021

Which issue does this PR close?

Closes #841

Rationale for this change

  1. The hash implementation in ahash::RandomState is very good, and thus it is very hard to cause hash collisions in tests
  2. Since it is hard to cause hash collisions, it means we do not have test coverage of this case. This is a problem because checking for hash collisions is often required in the performance critical section of code and thus prone to being optimized out

What changes are included in this PR?

  1. Add a (off by default) feature flag that forces hash collisions
  2. Add a CI test that runs queries using those flags
  3. Updates tests to pass, and filed tickets for those that did not

Are there any user-facing changes?

No

@alamb alamb added the development-process Related to development process of DataFusion label Aug 9, 2021
@github-actions github-actions bot added the datafusion Changes in the datafusion crate label Aug 9, 2021
Copy link
Contributor

@Dandandan Dandandan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool approach!
I am wondering if we can keep this in all test cases as for bigger joins / aggregates it could slow it down significantly.

Also we might want to add some explicit test for the problem (might be null-related? #843 (comment)) later.

Copy link
Member

@jorgecarleitao jorgecarleitao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really neat and long-lasting approach!

@Dandandan Dandandan merged commit 30cc674 into apache:master Aug 9, 2021
@alamb alamb deleted the alamb/test_hash_collisions branch August 8, 2023 20:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
datafusion Changes in the datafusion crate development-process Related to development process of DataFusion
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add tests for hash collisions
3 participants