GpuShuffleCoalesceIterator acquire semaphore after host concat #4396

Merged: abellina merged 3 commits into NVIDIA:branch-22.02 from perf/join_sem_tweaks on Dec 21, 2021

abellina
Collaborator

Closes #4395

This is a small optimization spotted while looking into NDS Q64 traces. With the change, Q64 saves up to 3 seconds, though the gain varies quite a bit from run to run. Across all of NDS it saved ~1 minute for the whole run, with per-query savings ranging from a few hundred ms up to 5 seconds for q94.

I saw some queries get slower, with the worst case being q42 (which was 2x slower in 1 sample out of 10). I have not been able to reproduce this case; all subsequent runs were at 1x or above. q42 took 3.8 seconds in the last weekly run, and with the patch it hovers between 3.6 and 4.5 seconds.
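For readers unfamiliar with the idea, here is a minimal Python sketch of the ordering this PR is about: do the host-side concatenation first and only then take the GPU semaphore, so a task does not hold a GPU slot while it is doing CPU-only work. This is not the spark-rapids code. `gpu_semaphore`, `concat_on_host`, `copy_to_device`, and `coalesce_next` are hypothetical names made up for illustration, and the sketch releases the semaphore right after the copy for simplicity, which may not match how long the real semaphore is held. The rows-only branch mirrors the "batches without columns" commit under the same assumptions.

```python
import threading

# Hypothetical stand-in for the GPU semaphore (not the spark-rapids API):
# it caps how many tasks may use the GPU at the same time.
gpu_semaphore = threading.Semaphore(1)
events = []  # records the order of steps so the ordering can be inspected


def concat_on_host(buffers):
    # Host-only work: join the serialized shuffle buffers. No GPU needed.
    events.append("host_concat")
    return b"".join(buffers)


def copy_to_device(host_batch):
    # Placeholder for the host-to-device copy; the caller holds the semaphore.
    events.append("device_copy")
    return len(host_batch)


def coalesce_next(buffers, num_rows):
    # Returns (row count, bytes "copied" to the device).
    if not buffers:
        # Rows-only batch (no columns): nothing to concatenate, but the
        # semaphore is still acquired before the batch goes downstream.
        with gpu_semaphore:
            events.append("semaphore_acquired")
            return (num_rows, 0)
    host_batch = concat_on_host(buffers)  # done before taking the semaphore
    with gpu_semaphore:
        events.append("semaphore_acquired")
        return (num_rows, copy_to_device(host_batch))


result = coalesce_next([b"abc", b"de"], num_rows=2)
```

After the call, `events` lists `host_concat` before `semaphore_acquired`, which is the point of the change: other tasks waiting on the semaphore are not blocked while this task is still concatenating on the host.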

Signed-off-by: Alessandro Bellina <abellina@nvidia.com>
abellina added the performance label (A performance related task/issue) on Dec 20, 2021
@jlowe
Member

jlowe commented Dec 20, 2021

build

@abellina
Collaborator Author

build

abellina merged commit 27cc725 into NVIDIA:branch-22.02 on Dec 21, 2021
abellina deleted the perf/join_sem_tweaks branch on December 21, 2021 18:48
abellina added a commit to abellina/spark-rapids that referenced this pull request Dec 30, 2021
…A#4396)

* GpuShuffleCoalesceIterator acquire semaphore after host concat

Signed-off-by: Alessandro Bellina <abellina@nvidia.com>

* Add semaphore acquire for batches without columns
Linked issue: [FEA] acquire the semaphore after concatToHost in GpuShuffleCoalesceIterator