Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pandas21 compat #196

Merged
merged 14 commits into from
Aug 14, 2023
Merged

Pandas21 compat #196

merged 14 commits into from
Aug 14, 2023

Conversation

MichaelTiemannOSC
Copy link
Collaborator

@MichaelTiemannOSC MichaelTiemannOSC commented Aug 7, 2023

  • Closes # (insert issue number)
  • Executed pre-commit run --all-files with no errors
  • The change is fully covered by automated unit tests
  • Documented in docs/ as appropriate
  • Added an entry to the CHANGES file

CI/CD doesn't quite work yet because we don't have a Pandas 2.1 rc to point to. But comments are welcome!

Additional code synchronizations (and the addition of a dtype-preserving map method).  These changes were initially developed to support uncertainties, but the uncertainty changes have all been stripped out to simplify merging of underlying code.  Once these changes are fully synced with a release version of Pandas 2.1, we can look at adding back uncertainties.

These changes also tolerate complex128 as a base type for magnitudes, with one except (under discussion as pandas-dev/pandas#54445).

Signed-off-by: Michael Tiemann <72577720+MichaelTiemannOSC@users.noreply.github.com>
Signed-off-by: Michael Tiemann <72577720+MichaelTiemannOSC@users.noreply.github.com>
The PintArray map function should return a PintArray if the mapper returns PintQuantities, but otherwise it should just return whatever was the result of the mapper.

We follow the examples from pandas in the definition and use of _get_expected_exception in the test suite.

Signed-off-by: Michael Tiemann <72577720+MichaelTiemannOSC@users.noreply.github.com>
@MichaelTiemannOSC
Copy link
Collaborator Author

Pandas 2.1rc0 has been released: https://github.com/pandas-dev/pandas/releases/tag/v2.1.0rc0

Fix canonicalization of NaNs to use na_value (with units) rather than pure np.nan.  While np.nan is correctly handled as a value in PintArrays, and work as expected with addition (adding a quantity to a NaN produces a NaN, which remains value), it doesn't work for multiplication (where a quantity times a NaN produces not a NaN but new quantity of NaN magnitude the units of the non-NaN value).  Canonicalizing to na_value provides consistent unit handling with NaNs through all arithmetic operations.

Signed-off-by: Michael Tiemann <72577720+MichaelTiemannOSC@users.noreply.github.com>
Make black happy.

Signed-off-by: Michael Tiemann <72577720+MichaelTiemannOSC@users.noreply.github.com>
@andrewgsavage
Copy link
Collaborator

you can add the 2.1rc to https://github.com/hgrecco/pint-pandas/blob/bf84e37ed13d032a13edec948c1990909d920dec/.github/workflows/ci.yml#L11C10-L11C10 to get CI to test against it

@MichaelTiemannOSC MichaelTiemannOSC marked this pull request as draft August 12, 2023 20:07
Fix errors that crept into value_counts concerning na_type.

And canonicalize na_value in quantities a bit more generally.  As stated in a previous commit comment, if we leave naked NA/NaNs in our Quantity array, the code gracefully handles unit addition/subtraction, but unit multiplication/division loses.

Signed-off-by: Michael Tiemann <72577720+MichaelTiemannOSC@users.noreply.github.com>
Simplify type annotations for _get_expected_exception to make Python 3.9 happy.

Signed-off-by: Michael Tiemann <72577720+MichaelTiemannOSC@users.noreply.github.com>
@MichaelTiemannOSC
Copy link
Collaborator Author

MichaelTiemannOSC commented Aug 13, 2023

I see that while these changes run happily with later versions of Pint and Pandas, there are many errors in the CI/CD system for Pandas 2.0.2 and Pint 0.21.3. I'll look at those and see how much of a gap there is to bridge, or whether these changes really are only for 2.1 and beyond.

Delete the last vestiges of some old code from a different PR.

Also use Pandas public APIs for `infer_dtype` (used in `_from_factorized`).

Signed-off-by: Michael Tiemann <72577720+MichaelTiemannOSC@users.noreply.github.com>
Remove unrelated comment about UFloats.

Signed-off-by: Michael Tiemann <72577720+MichaelTiemannOSC@users.noreply.github.com>
@MichaelTiemannOSC
Copy link
Collaborator Author

Looking at Pandas 2.0.3 failures:

  • TestIssue174 failures appear fixed by this Pandas commit: ENH: better dtype inference when doing DataFrame reductions (ENH: better dtype inference when doing DataFrame reductions pandas-dev/pandas#52788). I imported this test case into the PintPandas test suite to catch reduce failures early and specifically. I can disable the test when Pandas version < 2.1.
  • test_map is a new extension test added as part of this change set. It fails because earlier versions of Pandas simply raise NotImplementedError when na_action is not None. I'll add a pytest.skip rule to skip the ignore case when Pandas version < 2.1.
  • many test_arith_series_with_scalar tests failing, likely fixed by this Pandas commit: REF: dont pass exception to check_opname pandas-dev/pandas#54365. Since we control the code for calling check_opname we can add (back) the missing exc parameter that was removed when Pandas 2.1 refactored this interface.

I'll update this list with other problems I find, or rewrite text to describe how the fixes were implemented (as a start for the WhatsNew).

Restore some Pandas 2.0.x interfaces and use pandas_version_info from pint_array to condition logic.

Also change some more `self.assert_series_equal` to `tm.assert_series_equal` in the restored code to accommodate Pandas 2.1 BaseExtensionTests behavior.

Signed-off-by: Michael Tiemann <72577720+MichaelTiemannOSC@users.noreply.github.com>
@MichaelTiemannOSC MichaelTiemannOSC marked this pull request as ready for review August 13, 2023 13:04
@andrewgsavage
Copy link
Collaborator

this is looking great, thanks for putting this together.

Just needs adding the 2.1rc to the CI and a changes note

Simplify PintArray.map to let `pandas.core.algorithms.map_array` do the mapping and we just clean up the PintArray details if necessary.

Pandas 2.1.0rc0 added to CI

Updated CHANGES

Signed-off-by: Michael Tiemann <72577720+MichaelTiemannOSC@users.noreply.github.com>
@MichaelTiemannOSC
Copy link
Collaborator Author

Ugh..the map changes worked for Pandas 2.1, but need fixing for 2.0.2. I'm on it...

Implement `map` for PintArrays also only with Pandas 2.0.2 interfaces.

Signed-off-by: Michael Tiemann <72577720+MichaelTiemannOSC@users.noreply.github.com>
Until Pandas resolves pandas-dev/pandas#54445 we cannot feed complex128 types to the test_setitem_2d_values test case.

Signed-off-by: Michael Tiemann <72577720+MichaelTiemannOSC@users.noreply.github.com>
@andrewgsavage
Copy link
Collaborator

bors r+

@bors
Copy link
Contributor

bors bot commented Aug 14, 2023

Build succeeded!

The publicly hosted instance of bors-ng is deprecated and will go away soon.

If you want to self-host your own instance, instructions are here.
For more help, visit the forum.

If you want to switch to GitHub's built-in merge queue, visit their help page.

@bors bors bot merged commit c43c18b into hgrecco:master Aug 14, 2023
28 checks passed
MichaelTiemannOSC added a commit to MichaelTiemannOSC/pint-pandas that referenced this pull request Aug 15, 2023
First draft of 100% working test cases after (re)merging with changes extracted from these changes as part of PR hgrecco#196.

Signed-off-by: Michael Tiemann <72577720+MichaelTiemannOSC@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants