Fix multi-source reading in JSON byte range reader #15671
Looks good; a few small comments.
cpp/src/io/json/read_json.cu
```diff
 // If this is a multi-file source, we scatter the JSON line delimiters between files
 if (sources.size() > 1) {
   static_assert(num_delimiter_chars == 1,
                 "Currently only single-character delimiters are supported");
   auto const delimiter_source = thrust::make_constant_iterator('\n');
   auto const d_delimiter_map = cudf::detail::make_device_uvector_async(
-    host_span<size_type const>{delimiter_map.data(), delimiter_map.size() - 1},
+    host_span<size_type const>{delimiter_map.data(), delimiter_map.size()},
```
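A rough host-side sketch of the idea behind this hunk, under stated assumptions: each source is appended to one buffer with a single delimiter byte reserved between consecutive sources, the offset of each reserved byte is recorded in a map, and '\n' is then scattered into every recorded offset. `concat_sources` and `ConcatResult` are hypothetical names, the buffer is a host `std::string` rather than device memory, a plain loop stands in for `thrust::scatter`, and byte ranges and empty sources are ignored. This is not cudf's ingest code.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Buffer produced by concatenating several JSON Lines sources (hypothetical type).
struct ConcatResult {
  std::string buffer;                      // sources back to back, separated by delimiters
  std::vector<std::size_t> delimiter_map;  // offset of each delimiter slot in `buffer`
};

// Hypothetical host-side sketch (not cudf code): append each source to one
// buffer, reserving one delimiter byte between consecutive sources and
// recording its offset. No delimiter follows the last source.
ConcatResult concat_sources(std::vector<std::string> const& sources) {
  constexpr char delimiter = '\n';  // single-character delimiter, as in the static_assert
  ConcatResult out;
  for (std::size_t i = 0; i < sources.size(); ++i) {
    if (i > 0) {
      out.delimiter_map.push_back(out.buffer.size());  // remember where the slot is
      out.buffer.push_back('\0');                      // placeholder, overwritten below
    }
    out.buffer += sources[i];
  }
  // Scatter step: a plain loop standing in for thrust::scatter. Every entry of
  // the map is used, so it must hold exactly one offset per gap between sources.
  for (std::size_t offset : out.delimiter_map) {
    out.buffer[offset] = delimiter;
  }
  return out;
}
```

With three 7-byte sources such as {"a":1}, the buffer is 23 bytes long and the delimiters sit at offsets 7 and 15.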
Indeed, a vector is implicitly converted into a host_span.
Suggested change:
```diff
-    host_span<size_type const>{delimiter_map.data(), delimiter_map.size()},
+    delimiter_map,
```
That doesn't work well with templated calls. Maybe the explicit construction here can be avoided by specifying the type with make_device_uvector_async<size_type const>. The combination of implicit conversion and templated functions is pretty annoying.
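A minimal sketch of the mechanism behind this comment, using hypothetical host-only stand-ins: `HostSpan` plays the role of `cudf::host_span` and `copy_to_vector` the role of a `host_span`-taking function template such as `make_device_uvector_async`; neither is cudf API. Template argument deduction does not consider user-defined conversions, so passing a vector directly fails to deduce `T`, while naming the template argument turns the parameter into a fixed type that the vector can convert to. In this sketch the parameter is spelled `HostSpan<T const>`, so the matching explicit argument is `<int>`; the right argument for the real cudf function depends on its declaration.

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for cudf::host_span: a non-owning view that a
// std::vector converts to implicitly.
template <typename T>
class HostSpan {
 public:
  HostSpan(T* data, std::size_t size) : data_(data), size_(size) {}
  // Non-explicit converting constructor: enables vector -> HostSpan<T const>.
  HostSpan(std::vector<std::remove_const_t<T>> const& v) : data_(v.data()), size_(v.size()) {}

  T* begin() const { return data_; }
  T* end() const { return data_ + size_; }

 private:
  T* data_;
  std::size_t size_;
};

// Hypothetical function template: T must be deduced from HostSpan<T const>.
template <typename T>
std::vector<T> copy_to_vector(HostSpan<T const> src) {
  return std::vector<T>(src.begin(), src.end());
}

// std::vector<int> v{1, 2, 3};
// copy_to_vector(v);        // does not compile: deduction ignores the implicit
//                           // vector -> HostSpan conversion, so T is unknown
// copy_to_vector<int>(v);   // OK: T is given explicitly, so v converts implicitly
// copy_to_vector(HostSpan<int const>{v.data(), v.size()});  // OK: explicit span
```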
That's what I thought as well, but the implicit conversion works here 😲
Ah, because it's calling a different overload of make_device_uvector_async:
```cpp
rmm::device_uvector<typename Container::value_type> make_device_uvector_async(
  Container const& c, rmm::cuda_stream_view stream, rmm::device_async_resource_ref mr)
```
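A sketch of why a Container overload makes the plain vector argument work, using hypothetical host-only stand-ins (`HostSpan` for `cudf::host_span`, `copy_to_vector` for `make_device_uvector_async`, with the stream and memory-resource parameters omitted). It mirrors the reasoning in this thread, not cudf's actual source. The Container overload deduces its parameter directly from the argument, so no conversion is needed during deduction, and its body then builds the span explicitly and forwards to the span overload.

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for cudf::host_span (deliberately has no value_type).
template <typename T>
class HostSpan {
 public:
  HostSpan(T* data, std::size_t size) : data_(data), size_(size) {}
  HostSpan(std::vector<std::remove_const_t<T>> const& v) : data_(v.data()), size_(v.size()) {}
  T* begin() const { return data_; }
  T* end() const { return data_ + size_; }

 private:
  T* data_;
  std::size_t size_;
};

// Span overload: T is deduced from HostSpan<T const>, so a std::vector
// argument cannot match it without an explicit template argument.
template <typename T>
std::vector<T> copy_to_vector(HostSpan<T const> src) {
  return std::vector<T>(src.begin(), src.end());
}

// Container overload, shaped like the quoted signature: Container is deduced
// directly from the argument. For Container = HostSpan<...> the return type is
// ill-formed (no value_type), so this overload drops out via SFINAE and the
// forwarding call below resolves to the span overload instead of recursing.
template <typename Container>
std::vector<typename Container::value_type> copy_to_vector(Container const& c) {
  return copy_to_vector(HostSpan<typename Container::value_type const>{c});
}
```

With both overloads visible, copy_to_vector(v) for a std::vector<int> v compiles without naming a template argument, which matches the behavior observed in the thread.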
/merge
Description
This PR fixes the number of bytes read and corrects the offsets for the delimiters added to the buffer when reading across multiple sources.
Checklist