Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for pandas-2.2 in cudf #15100

Merged
merged 35 commits into from
Feb 26, 2024

Conversation

galipremsagar
Copy link
Contributor

@galipremsagar galipremsagar commented Feb 21, 2024

Description

This PR:

  • Enables pandas-2.2 in cudf by upgrading the upper bound pinnings.
  • Cleans up a lot of dead-code.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@galipremsagar galipremsagar added 3 - Ready for Review Ready for review by team 4 - Needs cuDF (Python) Reviewer improvement Improvement / enhancement to an existing function breaking Breaking change labels Feb 21, 2024
@galipremsagar galipremsagar self-assigned this Feb 21, 2024
@galipremsagar galipremsagar requested review from a team as code owners February 21, 2024 01:17
@github-actions github-actions bot added Python Affects Python cuDF API. conda labels Feb 21, 2024
return pd.Series(
host_values,
self.to_arrow(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just in noting in the future we should pass pandas a numpy array instead of a pyarrow array and pandas.Series treats pyarrow arrays as objects

Copy link
Contributor

@mroeschke mroeschke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If/when we decide to test against the minimum supported pandas version (pandas 2.0) would we have to add any of these checks back in?

@galipremsagar
Copy link
Contributor Author

galipremsagar commented Feb 21, 2024

If/when we decide to test against the minimum supported pandas version (pandas 2.0) would we have to add any of these checks back in?

The current PANDAS_GE_200 code deletions will also hold if we want to test with pandas-2.0 (the lowest supported version), but I think it's highly unlikely we can keep up and maintain such a test suite in CI. Most of the checks being removed in this PR were gating us to distinguish test code from pre-2.0 vs post 2.0 behavior.

@@ -65,7 +65,7 @@ dependencies:
- nvcomp==3.0.5
- nvtx>=0.2.1
- packaging
- pandas>=2.0,<2.1.5dev0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the right pinning, or should we go with <2.3.0dev0? Do we expect patch releases to break us?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel various fixes are going into 2.2.1 that may have an impact on us, but would defer to @mroeschke on that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I fixed various issues @galipremsagar found in pandas 2.2.1 (coming out next week) which there are xfails for so the update will break the test suite. We should probably relax the pin in a follow up PR when that's out

Copy link
Contributor

@shwina shwina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving with a question about pinnings that we should resolve before merging.

@github-actions github-actions bot removed the ci label Feb 26, 2024
@galipremsagar
Copy link
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit 4d26596 into rapidsai:branch-24.04 Feb 26, 2024
66 checks passed
@galipremsagar
Copy link
Contributor Author

@shwina @raydouglass @mroeschke In the interest of time, I upgraded the pinnings to include the latest patch release of pandas-2.2.1 too. If you have any concerns on these changes, let me know I'll address it in a follow-up pr.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5 - Ready to Merge Testing and reviews complete, ready to merge breaking Breaking change dask Dask issue improvement Improvement / enhancement to an existing function Python Affects Python cuDF API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] DataFrame.stack with MultiIndex columns does not preserve order
7 participants