Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove UNKNOWN_NULL_COUNT #13372

Merged
merged 12 commits into from
May 24, 2023

Conversation

vyasr
Copy link
Contributor

@vyasr vyasr commented May 17, 2023

Description

This is the final PR for removing UNKNOWN_NULL_COUNT and the implicit kernel launch in the null_count methods of column and column_view.

Depends on #13355 and #13341.

Closes #11968

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@vyasr vyasr added 3 - Ready for Review Ready for review by team libcudf Affects libcudf (C++/CUDA) code. improvement Improvement / enhancement to an existing function breaking Breaking change labels May 17, 2023
@vyasr vyasr self-assigned this May 17, 2023
@github-actions github-actions bot added Java Affects Java cuDF API. Python Affects Python cuDF API. labels May 17, 2023
mythrocks added a commit to mythrocks/spark-rapids-jni that referenced this pull request May 17, 2023
This is in prep for rapidsai/cudf#11968 and
rapidsai/cudf#13372.

`libcudf` will soon require that all CUDF columns are created with a
known null-count. `UNKNOWN_NULL_COUNT` will no longer be supported,
or even available as a code constant.

This change replicates part of rapidsai/cudf#13355,
as it applies to `row_conversion.cu`. The (single) reference to
the unknown-null-count is replaced with a pre-calculated value.

Signed-off-by: MithunR <mythrocks@gmail.com>
mythrocks added a commit to NVIDIA/spark-rapids-jni that referenced this pull request May 18, 2023
This is in prep for rapidsai/cudf#11968 and
rapidsai/cudf#13372.

`libcudf` will soon require that all CUDF columns are created with a
known null-count. `UNKNOWN_NULL_COUNT` will no longer be supported,
or even available as a code constant.

This change replicates part of rapidsai/cudf#13355,
as it applies to `row_conversion.cu`. The (single) reference to
the unknown-null-count is replaced with a pre-calculated value.

Signed-off-by: MithunR <mythrocks@gmail.com>
@vyasr vyasr force-pushed the feat/remove_null_count_kernel branch from a2d444e to 85c146a Compare May 18, 2023 21:32
@vyasr vyasr force-pushed the feat/remove_null_count_kernel branch from 85c146a to 4fde571 Compare May 19, 2023 00:18
@vyasr vyasr marked this pull request as ready for review May 19, 2023 00:19
@vyasr vyasr requested a review from a team as a code owner May 19, 2023 00:19
@vyasr
Copy link
Contributor Author

vyasr commented May 19, 2023

There are a couple of Java failures left that I can reproduce locally. Will attempt to debug them myself, but may need to pull in some advice.

@vyasr vyasr requested a review from a team as a code owner May 19, 2023 18:46
@vyasr vyasr requested a review from davidwendt May 19, 2023 18:53
@vyasr vyasr added 5 - DO NOT MERGE Hold off on merging; see PR for details 3 - Ready for Review Ready for review by team and removed 3 - Ready for Review Ready for review by team labels May 19, 2023
@vyasr
Copy link
Contributor Author

vyasr commented May 19, 2023

I'm blocking merging here until we've had a chance to check on the behavior of the Spark plugin with this change, but it is otherwise ready for review.

Copy link
Contributor

@mythrocks mythrocks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm +1 on the JNI changes.

Note that spark-rapids does not depend on this code. row_conversion.cu will be removed as part of #13374. The splash damage here should be limited.

cpp/src/column/column.cu Outdated Show resolved Hide resolved
mythrocks added a commit to mythrocks/spark-rapids-jni that referenced this pull request May 19, 2023
This is a followup to NVIDIA#1148.

`row_conversion.cu` was modified in rapidsai/cudf#13372
to explicitly calculate null-counts for output columns.

This commit replicates the changes in cudf/pull/13372, and adds explicit
null-count calculation for the string offsets column.

Signed-off-by: MithunR <mythrocks@gmail.com>
mythrocks added a commit to mythrocks/spark-rapids-jni that referenced this pull request May 22, 2023
This is a followup to NVIDIA#1148.

`row_conversion.cu` was modified in rapidsai/cudf#13372
to explicitly calculate null-counts for output columns.

This commit replicates the changes in cudf/pull/13372, and adds explicit
null-count calculation for the string offsets column.

Signed-off-by: MithunR <mythrocks@gmail.com>
mythrocks added a commit to NVIDIA/spark-rapids-jni that referenced this pull request May 23, 2023
* Followup for null count fixup in row_conversion.cu.

This is a followup to #1148.

`row_conversion.cu` was modified in rapidsai/cudf#13372
to explicitly calculate null-counts for output columns.

This commit replicates the changes in cudf/pull/13372, and adds explicit
null-count calculation for the string offsets column.

Signed-off-by: MithunR <mythrocks@gmail.com>
@vyasr vyasr added 5 - Ready to Merge Testing and reviews complete, ready to merge and removed 3 - Ready for Review Ready for review by team 5 - DO NOT MERGE Hold off on merging; see PR for details labels May 23, 2023
@vyasr
Copy link
Contributor Author

vyasr commented May 23, 2023

/merge

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5 - Ready to Merge Testing and reviews complete, ready to merge breaking Breaking change improvement Improvement / enhancement to an existing function Java Affects Java cuDF API. libcudf Affects libcudf (C++/CUDA) code.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] Require null counts on column construction
5 participants