
Move hash_array into hash_utils.rs #807

Merged (1 commit) on Aug 1, 2021


@alamb (Contributor) commented Aug 1, 2021

Which issue does this PR close?

Re #790 (implementing a new design for group by hash)

Rationale for this change

  1. The design in Rework GroupByHash for faster performance and support grouping by nulls #790 calls for hashing the group by hash input in a manner almost identical to create_hashes in hash_join, which I hope to reuse.
  2. It turns out the same hashing algorithm is also used by repartition exec.
  3. I want to add test coverage for the hash code (to ensure it stays in sync with ScalarValue).
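The rationale above rests on one hashing routine that turns a batch of key columns into one hash per row, plus a per-value (scalar) path that must produce the same hashes. The sketch below illustrates that idea in plain Rust. It assumes hypothetical `Column` and `Scalar` enums in place of Arrow arrays and `ScalarValue`, and `create_row_hashes`, `hash_scalars`, and the `combine_hashes` mixing formula are illustrative names, not DataFusion's actual API.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hash, Hasher};

/// Hypothetical stand-in for an Arrow array: one nullable column.
#[derive(Debug, Clone)]
enum Column {
    Int64(Vec<Option<i64>>),
    Utf8(Vec<Option<String>>),
}

/// Hypothetical stand-in for a single `ScalarValue`.
#[derive(Debug, Clone, PartialEq)]
enum Scalar {
    Int64(Option<i64>),
    Utf8(Option<String>),
}

impl Column {
    fn len(&self) -> usize {
        match self {
            Column::Int64(v) => v.len(),
            Column::Utf8(v) => v.len(),
        }
    }

    /// Extract row `i` as a scalar (used here only to cross-check hashes).
    fn value(&self, i: usize) -> Scalar {
        match self {
            Column::Int64(v) => Scalar::Int64(v[i]),
            Column::Utf8(v) => Scalar::Utf8(v[i].clone()),
        }
    }
}

/// Hash one value with the shared `RandomState` so every caller agrees.
fn hash_value<T: Hash>(state: &RandomState, v: &T) -> u64 {
    let mut hasher = state.build_hasher();
    v.hash(&mut hasher);
    hasher.finish()
}

/// Assumed mixing step that folds one column's hash into the row hash.
fn combine_hashes(acc: u64, h: u64) -> u64 {
    acc.wrapping_mul(37).wrapping_add(h)
}

/// Column-at-a-time path: one hash per row across all key columns.
fn create_row_hashes(columns: &[Column], state: &RandomState) -> Vec<u64> {
    let num_rows = columns.first().map_or(0, |c| c.len());
    let mut hashes = vec![0u64; num_rows];
    for col in columns {
        assert_eq!(col.len(), num_rows, "key columns must have equal length");
        match col {
            Column::Int64(values) => {
                for (h, v) in hashes.iter_mut().zip(values) {
                    *h = combine_hashes(*h, hash_value(state, v));
                }
            }
            Column::Utf8(values) => {
                for (h, v) in hashes.iter_mut().zip(values) {
                    *h = combine_hashes(*h, hash_value(state, v));
                }
            }
        }
    }
    hashes
}

/// Row-at-a-time scalar path: must match `create_row_hashes` exactly.
fn hash_scalars(row: &[Scalar], state: &RandomState) -> u64 {
    row.iter().fold(0, |acc, s| {
        let h = match s {
            Scalar::Int64(v) => hash_value(state, v),
            Scalar::Utf8(v) => hash_value(state, v),
        };
        combine_hashes(acc, h)
    })
}
```

A test that compares `create_row_hashes` against `hash_scalars` row by row is one way to get the kind of "stays in sync with ScalarValue" coverage described in point 3.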

What changes are included in this PR?

Move create_hashes into hash_utils so it can be shared with hash_join and hash_aggregate.
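A minimal sketch of what sharing one hashing module buys: a join and an aggregation both call the same `create_hashes`, so equal keys always get equal hashes in both operators. The module layout and signatures here are hypothetical simplifications (a single non-null `i64` key column, std `RandomState`), not the real `hash_utils`, `hash_join`, or `hash_aggregate` code.

```rust
use std::collections::hash_map::RandomState;

/// Shared hashing code (hypothetical stand-in for `hash_utils::create_hashes`).
mod hash_utils {
    use std::collections::hash_map::RandomState;
    use std::hash::{BuildHasher, Hash, Hasher};

    /// One hash per key; a single i64 key column keeps the sketch short.
    pub fn create_hashes(keys: &[i64], state: &RandomState) -> Vec<u64> {
        keys.iter()
            .map(|k| {
                let mut hasher = state.build_hasher();
                k.hash(&mut hasher);
                hasher.finish()
            })
            .collect()
    }
}

/// Join: build a hash table on one input, probe it with the other.
mod hash_join {
    use super::hash_utils::create_hashes;
    use std::collections::hash_map::RandomState;
    use std::collections::HashMap;

    /// Returns (probe_row, build_row) pairs whose keys are equal.
    pub fn join(build: &[i64], probe: &[i64], state: &RandomState) -> Vec<(usize, usize)> {
        let mut table: HashMap<u64, Vec<usize>> = HashMap::new();
        for (row, h) in create_hashes(build, state).into_iter().enumerate() {
            table.entry(h).or_default().push(row);
        }
        let mut out = Vec::new();
        for (p_row, h) in create_hashes(probe, state).into_iter().enumerate() {
            if let Some(candidates) = table.get(&h) {
                for &b_row in candidates {
                    // Distinct keys can share a hash, so confirm equality.
                    if build[b_row] == probe[p_row] {
                        out.push((p_row, b_row));
                    }
                }
            }
        }
        out
    }
}

/// Aggregate: count rows per distinct key.
mod hash_aggregate {
    use super::hash_utils::create_hashes;
    use std::collections::hash_map::RandomState;
    use std::collections::HashMap;

    /// Returns (key, count) pairs in first-seen order.
    pub fn count_by_key(keys: &[i64], state: &RandomState) -> Vec<(i64, usize)> {
        // hash -> indices into `groups` whose keys produced that hash
        let mut index: HashMap<u64, Vec<usize>> = HashMap::new();
        let mut groups: Vec<(i64, usize)> = Vec::new();
        for (row, h) in create_hashes(keys, state).into_iter().enumerate() {
            let key = keys[row];
            let bucket = index.entry(h).or_default();
            // Same collision check as the join: compare the actual keys.
            let existing = bucket.iter().copied().find(|&g| groups[g].0 == key);
            match existing {
                Some(g) => groups[g].1 += 1,
                None => {
                    bucket.push(groups.len());
                    groups.push((key, 1));
                }
            }
        }
        groups
    }
}
```

Both consumers treat a hash match only as a candidate and re-check key equality, which is why the hash function itself can live in one shared place without either operator depending on the other.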

Are there any user-facing changes?

No. This PR only moves code; no functional changes are intended.

@github-actions github-actions bot added the datafusion Changes in the datafusion crate label Aug 1, 2021
@alamb alamb requested a review from Dandandan August 1, 2021 11:23
@Dandandan (Contributor) left a comment


👍

@alamb alamb merged commit 1173479 into apache:master Aug 1, 2021
@alamb alamb deleted the alamb/hash_array_refactor branch August 8, 2023 20:12
2 participants