ENH: Generalize groupby to better support ExtensionArray #53904
Comments
Adding @jbrockmendel, who seems to know a lot about the Pandas EA world. |
If this can help, I'm wondering if I would have to go back to the code I wrote 14 years ago, so I cannot answer immediately. I just had a superficial look and I'm wondering how much |
@MichaelTiemannOSC can you give an example with traceback of what goes wrong? |
Here's a traceback of the first failure:
|
Thanks. Well the good news is that the line |
Fixes include:
* Factorization
* NaN handling (several more issues still need to be resolved)
* Proper unit declarations in test_offset_concat
* Integration of new `numeric_dtype` parameter

A major outstanding issue (presently being discussed as pandas-dev/pandas#53904) concerns whether we can make AffineScalarFunc hashable and/or whether other legacy Pandas code (which has been deprecated) can be further removed.

Signed-off-by: Michael Tiemann <72577720+MichaelTiemannOSC@users.noreply.github.com>
Feature Type
Adding new functionality to pandas
Changing existing functionality in pandas
Removing existing functionality in pandas
Problem Description
I have written changes to add `uncertainties` to Pint (hgrecco/pint#1615) and Pint-Pandas (hgrecco/pint-pandas#140). New developments in Pint and Pint-Pandas now deeply embrace the ExtensionArray API (which I also encouraged), but it's now causing my changes grief.

The uncertainties package uses wrapping functions to interoperate with floats and NumPy (https://pythonhosted.org/uncertainties/index.html). The uncertainty datatype is `<class 'uncertainties.core.AffineScalarFunc'>`, which is not hashable. I have largely been able to work around this within the EA framework, but I'm stuck on how to make them work with `groupby` and related. I wonder whether the groupby functionality can be generalized to work better with unhashable EA types.
Feature Description
Here's an example of a small change that allows my EA type to interoperate with `groupby`. Specifically, it does not assume that a NaN value is np.nan; instead it treats as NaN whatever value `isna` reports as NaN. In the case of uncertainties, that is typically `ufloat(np.nan, 0)`, but it could be a `UFloat` with a np.nan nominal value, a np.nan error value, or both.

But here's the really sticky problem:
In the PintArray world (the ExtensionArray implemented in PintPandas) I've been able to make `factorize` functionality work independently of any Pandas changes, but the factorized results don't survive subsequent groupby actions (that come from splitting). And that's where I'm stuck.

@andrewgsavage @rhshadrach @lebigot @hgrecco
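To make the idea concrete, here is a minimal pure-Python sketch of factorizing and grouping unhashable values by equality, with missing values decided by an `isna`-style predicate rather than by comparison to np.nan. It is not pandas or Pint-Pandas code: `FakeUFloat`, `is_na`, `factorize_unhashable`, and `group_by_codes` are hypothetical names invented for illustration, and `FakeUFloat` is only a stand-in for an unhashable uncertainty scalar. The `-1` code for missing values mirrors the default sentinel that `pandas.factorize` uses.

```python
import math
from dataclasses import dataclass


@dataclass(eq=True)  # eq without frozen sets __hash__ to None, so instances are unhashable
class FakeUFloat:
    """Hypothetical stand-in for an unhashable uncertainty scalar."""
    nominal: float
    std_dev: float


def is_na(value):
    """Assumed NaN rule: missing if the nominal value or the error is NaN."""
    return math.isnan(value.nominal) or math.isnan(value.std_dev)


def factorize_unhashable(values, isna=is_na):
    """Return (codes, uniques) using == instead of hashing.

    Missing values get code -1. Cost is O(n * k) for k unique values,
    which is the price of not relying on a hash table.
    """
    codes, uniques = [], []
    for v in values:
        if isna(v):
            codes.append(-1)
            continue
        for i, u in enumerate(uniques):
            if u == v:
                codes.append(i)
                break
        else:
            uniques.append(v)
            codes.append(len(uniques) - 1)
    return codes, uniques


def group_by_codes(codes, payload):
    """Group payload items by integer code, so the keys themselves are never hashed."""
    groups = {}
    for code, item in zip(codes, payload):
        if code == -1:
            continue  # drop rows whose key is missing
        groups.setdefault(code, []).append(item)
    return groups


keys = [
    FakeUFloat(1.0, 0.1),
    FakeUFloat(float("nan"), 0.0),
    FakeUFloat(1.0, 0.1),
    FakeUFloat(2.0, 0.2),
    FakeUFloat(3.0, float("nan")),
]
codes, uniques = factorize_unhashable(keys)
groups = group_by_codes(codes, ["a", "b", "c", "d", "e"])
```

The point of the sketch is that once the keys are reduced to integer codes, every later grouping step can work on the codes alone. The difficulty described above is that the real groupby path goes back to the original values after splitting, which is where unhashable scalars cause trouble.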
Alternative Solutions
If the Pandas test framework could xfail unhashable EA types for groupby tests, that might be an acceptable workaround (need to check with Pint and Pint-Pandas maintainers).
Additional Context
No response