Fix multi-source reading in JSON byte range reader #15671

Merged
merged 11 commits into from
May 10, 2024

@shrshi shrshi commented May 6, 2024

Description

This PR fixes the number of bytes read and corrects the offsets for the delimiters added to the buffer when reading across multiple sources.
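
As a rough illustration of the bookkeeping involved, the host-only sketch below concatenates several in-memory sources, inserts one '\n' between consecutive sources, and records the buffer offset of each inserted delimiter. It is not the libcudf implementation: `concatenated`, `concat_with_delimiters`, and the use of `std::string` sources are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result type: the combined buffer plus the offset of every inserted delimiter.
struct concatenated {
  std::string buffer;
  std::vector<std::size_t> delimiter_offsets;
};

// Joins the sources with a single-character delimiter between consecutive sources.
// Assumption for this sketch: no delimiter is added after the last source.
inline concatenated concat_with_delimiters(std::vector<std::string> const& sources,
                                           char delimiter = '\n')
{
  concatenated out;

  // Total bytes = sum of source sizes + one delimiter per gap between sources.
  std::size_t total = 0;
  for (auto const& s : sources) { total += s.size(); }
  if (!sources.empty()) { total += sources.size() - 1; }
  out.buffer.reserve(total);

  for (std::size_t i = 0; i < sources.size(); ++i) {
    out.buffer += sources[i];
    if (i + 1 < sources.size()) {
      // The delimiter lands at the current end of the buffer.
      out.delimiter_offsets.push_back(out.buffer.size());
      out.buffer += delimiter;
    }
  }
  return out;
}
```

With three 7-byte sources, the combined buffer holds 23 bytes and the delimiters sit at offsets 7 and 15. Getting either the total size or these offsets wrong by one is the kind of error the description refers to.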

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label May 6, 2024
@shrshi shrshi added bug Something isn't working cuIO cuIO issue non-breaking Non-breaking change labels May 6, 2024
@shrshi shrshi marked this pull request as ready for review May 6, 2024 21:26
@shrshi shrshi requested a review from a team as a code owner May 6, 2024 21:26
@shrshi shrshi requested review from bdice, ttnghia and vuule May 6, 2024 21:26
@vuule vuule self-requested a review May 7, 2024 22:55
@vuule vuule left a comment

Looks good; just a few small comments.

@shrshi shrshi requested a review from vuule May 10, 2024 00:47
@shrshi shrshi requested a review from ttnghia May 10, 2024 18:48

  // If this is a multi-file source, we scatter the JSON line delimiters between files
  if (sources.size() > 1) {
    static_assert(num_delimiter_chars == 1,
                  "Currently only single-character delimiters are supported");
    auto const delimiter_source = thrust::make_constant_iterator('\n');
    auto const d_delimiter_map  = cudf::detail::make_device_uvector_async(
-     host_span<size_type const>{delimiter_map.data(), delimiter_map.size() - 1},
+     host_span<size_type const>{delimiter_map.data(), delimiter_map.size()},

Indeed a vector is implicitly converted into host_span.

Suggested change
- host_span<size_type const>{delimiter_map.data(), delimiter_map.size()},
+ delimiter_map,

Implicit conversion doesn't work well with templated calls.
Maybe the explicit construction here can be avoided by specifying the type with make_device_uvector_async<size_type const>. The combination of implicit conversion and templated functions is pretty annoying.
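
The deduction issue raised here can be reproduced without cudf. In the sketch below, `span_like` and `sum` are hypothetical stand-ins (not cudf's `host_span` or `make_device_uvector_async`). Template argument deduction does not consider the implicit vector-to-span conversion, so a plain `sum(v)` would not compile, while naming the type explicitly leaves nothing to deduce and lets the conversion apply.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal span type, implicitly constructible from a std::vector.
template <typename T>
struct span_like {
  T const* ptr;
  std::size_t len;
  span_like(T const* p, std::size_t n) : ptr(p), len(n) {}
  span_like(std::vector<T> const& v) : ptr(v.data()), len(v.size()) {}  // implicit on purpose
};

// Function template whose only parameter is the span type.
template <typename T>
T sum(span_like<T> s)
{
  T total{};
  for (std::size_t i = 0; i < s.len; ++i) { total += s.ptr[i]; }
  return total;
}

inline int sum_with_explicit_type(std::vector<int> const& v)
{
  // sum(v);  // would not compile: T cannot be deduced through a user-defined conversion
  return sum<int>(v);  // T is given explicitly, so v converts implicitly to span_like<int>
}

inline int sum_with_explicit_span(std::vector<int> const& v)
{
  // T is deduced from the explicitly constructed span.
  return sum(span_like<int>{v.data(), v.size()});
}
```

Both helpers return the same value for the same input; they differ only in where the span type is spelled out.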

That's what I thought as well, but the implicit conversion works here 😲

ah, because it's calling a different overload of make_device_uvector_async:

rmm::device_uvector<typename Container::value_type> make_device_uvector_async(
  Container const& c, rmm::cuda_stream_view stream, rmm::device_async_resource_ref mr)
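
A generic container overload sidesteps the conversion entirely, because the vector itself becomes the deduced argument. The sketch below mirrors only the shape of that signature: `copy_to_vector`, `copy_result`, `overload_used`, and `span_like` are hypothetical names, and a `std::vector` return value stands in for `rmm::device_uvector`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal span type, implicitly constructible from a std::vector.
template <typename T>
struct span_like {
  T const* ptr;
  std::size_t len;
  span_like(T const* p, std::size_t n) : ptr(p), len(n) {}
  span_like(std::vector<T> const& v) : ptr(v.data()), len(v.size()) {}
};

// Records which overload handled the call, purely for demonstration.
enum class overload_used { span, container };

template <typename T>
struct copy_result {
  std::vector<T> data;
  overload_used used;
};

// Overload 1: span parameter. T cannot be deduced from a std::vector argument.
template <typename T>
copy_result<T> copy_to_vector(span_like<T> s)
{
  return {std::vector<T>(s.ptr, s.ptr + s.len), overload_used::span};
}

// Overload 2: any container exposing value_type, begin() and end().
// A std::vector argument deduces Container directly, with no conversion needed.
template <typename Container>
copy_result<typename Container::value_type> copy_to_vector(Container const& c)
{
  return {std::vector<typename Container::value_type>(c.begin(), c.end()),
          overload_used::container};
}
```

Passing a `std::vector<int>` selects the container overload, while passing an explicitly constructed `span_like<int>` selects the span overload (the container overload drops out because `span_like` has no `value_type` in this sketch).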

@vuule vuule requested a review from ttnghia May 10, 2024 22:05
@vuule vuule added the 5 - Ready to Merge Testing and reviews complete, ready to merge label May 10, 2024
shrshi commented May 10, 2024

/merge

@rapids-bot rapids-bot bot merged commit b5a9c4b into rapidsai:branch-24.06 May 10, 2024
70 checks passed