Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Java] Purge non-empty nulls when setting validity #13335

Merged
merged 6 commits into from
May 16, 2023

Conversation

razajafri
Copy link
Contributor

Description

When setting the validity buffer in mergeAndSetValidity method, we check to see two things

  1. Purge any non-empty nulls that may have arise due to setting values to null which were previously non-null
  2. If we override the existing null values in the validity buffer then throw an IllegalStateException.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@razajafri razajafri requested a review from a team as a code owner May 11, 2023 00:17
@github-actions github-actions bot added the Java Affects Java cuDF API. label May 11, 2023
@razajafri
Copy link
Contributor Author

Tests pass locally when I run ci/test_java.sh in rapidsai/ci:cuda11.8.0-ubuntu22.04-py3.10

[INFO] -------------------------------------------------------
[INFO]  T E S T S
[INFO] -------------------------------------------------------
[INFO] Running ai.rapids.cudf.ColumnViewNonEmptyNullsTest
[INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.066 s - in ai.rapids.cudf.ColumnViewNonEmptyNullsTest
[INFO] 
[INFO] Results:
[INFO] 
[INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0
[INFO] 
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  03:17 min
[INFO] Finished at: 2023-05-11T05:38:07Z
[INFO] ------------------------------------------------------------------------

@razajafri
Copy link
Contributor Author

I am running benchmarks on a different version of this implementation based on suggestions from here. Will post an update soon

@razajafri
Copy link
Contributor Author

I have run some numbers on my local workstation using different implementations. The script just reads a parquet file with a list[int] column and casts it to a string. The casting in turn calls the mergeAndSetValidity method

Baseline Ifelse Binop CPP Modification Input Size
5,975 ms 7,025 ms 6,013 ms 5,936 ms 5.8 MiB
26,365 ms 35,874 ms 26,335 ms 26,284 ms 52.43 MiB

It looks like the modifications in this PR, listed as cpp modification above, has no performance degradation.

@razajafri razajafri added non-breaking Non-breaking change improvement Improvement / enhancement to an existing function labels May 16, 2023
@razajafri
Copy link
Contributor Author

/merge

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement Improvement / enhancement to an existing function Java Affects Java cuDF API. non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants