Raise error when sorting by categorical column in dask-cudf #15788

Merged
merged 9 commits into from
May 20, 2024

Conversation

rjzamora
Member

@rjzamora rjzamora commented May 20, 2024

Description

Some dask-cudf tests currently segfault when sorting by categorical columns. These tests were already marked as "xfail". This PR goes one step further and raises a NotImplementedError in the top-level sort_values API. The error can be removed as soon as the problem is fixed upstream (I am working on this now, but the fix probably won't be available for 24.06).
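
For illustration, the kind of guard described above could look roughly like the sketch below. It is not the code from this PR: `check_sortable` is a hypothetical helper name, and pandas stands in for cudf so the snippet runs without a GPU. It only shows the idea of rejecting categorical sort keys up front instead of letting the sort crash later.

```python
import pandas as pd


def check_sortable(df, by):
    """Raise NotImplementedError if any sort key is categorical.

    Hypothetical helper for illustration; not the dask-cudf implementation.
    """
    # Accept either a single column name or a list of names.
    keys = [by] if isinstance(by, str) else list(by)
    for key in keys:
        if isinstance(df[key].dtype, pd.CategoricalDtype):
            raise NotImplementedError(
                f"Sorting by categorical column {key!r} is not supported."
            )


df = pd.DataFrame({"a": pd.Categorical(["x", "y", "x"]), "b": [3, 1, 2]})

check_sortable(df, "b")  # numeric key: passes silently

try:
    check_sortable(df, ["b", "a"])  # "a" is categorical
    raised = False
except NotImplementedError:
    raised = True
```

Failing early with a clear exception is the point of the design: the caller gets an actionable error rather than a segfault in the worker process.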

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@rjzamora rjzamora added bug Something isn't working 2 - In Progress Currently a work in progress non-breaking Non-breaking change labels May 20, 2024
@rjzamora rjzamora self-assigned this May 20, 2024
@github-actions github-actions bot added the Python Affects Python cuDF API. label May 20, 2024
@rjzamora rjzamora marked this pull request as ready for review May 20, 2024 18:05
@rjzamora rjzamora requested a review from a team as a code owner May 20, 2024 18:05
@rjzamora rjzamora added 3 - Ready for Review Ready for review by team and removed 2 - In Progress Currently a work in progress labels May 20, 2024
@jameslamb
Member

@rjzamora could you pull in the latest branch-24.06? Now that #15782 is merged, you shouldn't see the libarrow issue any more once you do that.

rjzamora and others added 2 commits May 20, 2024 14:21
Co-authored-by: Charles Blackmon-Luca <20627856+charlesbluca@users.noreply.github.com>
Co-authored-by: Charles Blackmon-Luca <20627856+charlesbluca@users.noreply.github.com>
@galipremsagar galipremsagar added 5 - Ready to Merge Testing and reviews complete, ready to merge and removed 3 - Ready for Review Ready for review by team labels May 20, 2024
@galipremsagar
Contributor

/merge

@rapids-bot rapids-bot bot merged commit 4da00ea into rapidsai:branch-24.06 May 20, 2024
71 checks passed
@rjzamora rjzamora deleted the avoid-segfault branch May 20, 2024 23:59
rapids-bot bot pushed a commit that referenced this pull request May 22, 2024
Follow up to #15788

Adds a temporary workaround for sorting on categorical columns in 24.06: we convert only the partitioning column to pandas to calculate divisions.
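
As a rough sketch of what "calculating divisions" from a pandas copy of the partitioning column can mean, the snippet below picks evenly spaced boundary values from the sorted column. It is an assumption-laden illustration, not the code from #15801: `compute_divisions` is a hypothetical helper, the evenly spaced selection stands in for whatever quantile logic Dask actually uses, and the input is already a pandas Series (with cudf data one would first call `to_pandas()` on that single column).

```python
import pandas as pd


def compute_divisions(partition_col, npartitions):
    """Return npartitions + 1 boundary values for a pandas column.

    Hypothetical helper for illustration; not the dask-cudf implementation.
    """
    # Categorical columns sort by category order, not by insertion order.
    ordered = partition_col.sort_values().reset_index(drop=True)
    n = len(ordered)
    # Evenly spaced positions, clamped so the last one is a valid index.
    positions = [min(i * n // npartitions, n - 1) for i in range(npartitions + 1)]
    return [ordered.iloc[p] for p in positions]


# Only the partitioning column is materialized in pandas;
# the rest of the frame would stay on the GPU in the real workflow.
col = pd.Series(pd.Categorical(["b", "a", "c", "a", "b", "c"]))
divisions = compute_divisions(col, npartitions=3)
```

Converting only the one column keeps the host-side copy small while avoiding the GPU code path that currently fails for categoricals.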

This is related to #11795, but I don't want to "close" that issue until `RepartitionQuantiles` works with cudf-backed data.

Authors:
  - Richard (Rick) Zamora (https://github.com/rjzamora)

Approvers:
  - Charles Blackmon-Luca (https://github.com/charlesbluca)

URL: #15801
Labels
5 - Ready to Merge (Testing and reviews complete, ready to merge), bug (Something isn't working), non-breaking (Non-breaking change), Python (Affects Python cuDF API.)
4 participants