Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix ORC reader when using device_read_async while the destination device buffers are not ready #17074

Merged
merged 1 commit into from
Oct 14, 2024

Conversation

ttnghia
Copy link
Contributor

@ttnghia ttnghia commented Oct 14, 2024

This fixes a bug in ORC reader when device_read_async is called while the destination device buffers are not ready to write in. In detail, this bug is because device_read_async does not use the user-provided stream but its own generated stream for data copying. As such, the copying ops could happen before the destination device buffers are being allocated, causing data corruption.

This bug only shows up in certain conditions, and also hard to reproduce. It occurs when copying buffers with small sizes (below gds_threshold) and most likely to show up with setting rmm_mode=async.

Signed-off-by: Nghia Truong <nghiat@nvidia.com>
@ttnghia ttnghia added bug Something isn't working 3 - Ready for Review Ready for review by team libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change labels Oct 14, 2024
@ttnghia ttnghia self-assigned this Oct 14, 2024
@ttnghia ttnghia changed the title Fix ORC reader when using device_read_async when the destination device buffers are not ready Fix ORC reader when using device_read_async while the destination device buffers are not ready Oct 14, 2024
@ttnghia ttnghia marked this pull request as ready for review October 14, 2024 06:30
@ttnghia ttnghia requested a review from a team as a code owner October 14, 2024 06:30
// Instead, it may use some other stream(s) to sync the H->D memcpy.
// As such, we need to make sure the device buffers in `lvl_stripe_data` are ready first.
if (!stream_synchronized) {
_stream.synchronize();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be just sync before the loop?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that works, but would that be inefficient if we don't have any device read? We will sync after the loop if here is only host read.

@ttnghia
Copy link
Contributor Author

ttnghia commented Oct 14, 2024

/merge

@rapids-bot rapids-bot bot merged commit 768fbaa into rapidsai:branch-24.12 Oct 14, 2024
107 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team bug Something isn't working libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Projects
Status: No status
Development

Successfully merging this pull request may close these issues.

3 participants