Optimize out bounds checking for joins when the gather map has only valid entries #3799

Merged
abellina merged 1 commit into NVIDIA:branch-21.12 from perf/bounds_checker
Oct 13, 2021

Collaborator

@abellina abellina commented Oct 12, 2021

Closes #3798

This PR implements an optimization to not check for bounds when gathering rows from a gather map during joins, for certain types of joins as described in the linked issue.
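To illustrate the idea, here is a minimal CPU-side sketch in Python. It is not the code from this PR. It assumes that unmatched rows in an outer join are marked in the gather map with an out-of-range sentinel and that a bounds-checked gather turns those entries into nulls. The names `NO_MATCH`, `gather_checked`, `gather_unchecked`, and `gather_for_join` are hypothetical and exist only for this example.

```python
from typing import List, Optional, Sequence

# Hypothetical sentinel for "no matching row"; an assumption for this sketch,
# not a value taken from the PR.
NO_MATCH = -1


def gather_checked(column: Sequence[str], gather_map: Sequence[int]) -> List[Optional[str]]:
    """Gather with a bounds check: any out-of-range index yields None (a null row)."""
    n = len(column)
    return [column[i] if 0 <= i < n else None for i in gather_map]


def gather_unchecked(column: Sequence[str], gather_map: Sequence[int]) -> List[Optional[str]]:
    """Gather without a bounds check.

    Only correct when every index is known to be in range. In Python a
    negative index silently wraps around, which gives a wrong row rather
    than a null, so this must never see NO_MATCH.
    """
    return [column[i] for i in gather_map]


def gather_for_join(
    column: Sequence[str],
    gather_map: Sequence[int],
    may_have_invalid: bool,
) -> List[Optional[str]]:
    """Pick the gather variant based on what the join type guarantees.

    may_have_invalid is False when the join can only produce valid entries
    (for example an inner join in this sketch), so the check can be skipped.
    """
    if may_have_invalid:
        return gather_checked(column, gather_map)
    return gather_unchecked(column, gather_map)


if __name__ == "__main__":
    right = ["a", "b", "c"]
    # Inner-join style map: every entry refers to a real row.
    print(gather_for_join(right, [2, 0, 0], may_have_invalid=False))
    # Outer-join style map: the middle row had no match.
    print(gather_for_join(right, [1, NO_MATCH, 0], may_have_invalid=True))
```

The saving comes from the first branch: when the join type guarantees that the map holds only valid entries, the per-row range test is pure overhead and can be dropped without changing the result.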

Overall, I see q72 improve by around 40 seconds, or 12%-19%, from the run times we see in spark2a. Summed over all queries, the total time is about 1 minute lower, roughly a 5% improvement. There is some noise in the numbers, but nothing falls below 80% on the single-digit-second queries, so I am not seeing a regression; it looks like normal system noise, though we need to quantify this part better.

@abellina abellina changed the title from "Bounds checker change does not build" to "Optimize out bounds checking for joins when the gather map has only valid entries" Oct 12, 2021

Signed-off-by: Alessandro Bellina <abellina@nvidia.com>
@abellina abellina force-pushed the perf/bounds_checker branch from f483226 to 44f4b09 October 12, 2021 14:52
@abellina abellina requested a review from revans2 October 12, 2021 14:53
@abellina
Collaborator Author

build

@abellina abellina added the performance A performance related task/issue label Oct 12, 2021
@abellina abellina self-assigned this Oct 12, 2021
@abellina abellina added this to the Oct 4 - Oct 15 milestone Oct 12, 2021
Collaborator

@revans2 revans2 left a comment

Looks great

@abellina abellina merged commit 7f0be50 into NVIDIA:branch-21.12 Oct 13, 2021
@abellina abellina deleted the perf/bounds_checker branch October 13, 2021 13:21
Development

Successfully merging this pull request may close these issues.

[FEA] bounds checking in joins can be expensive