-
-
Notifications
You must be signed in to change notification settings - Fork 18k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable many complex number tests #54761
Enable many complex number tests #54761
Conversation
These changes put complex128 on an even footing with float64 and int64 as far as numerical testing is concerned. These changes have been tested both against the Pandas test suite as well as the Pint-Pandas testsuite (using complex magnitudes). These changes are a simpler version of a previous pull-request that was destroyed by GitHub's fork synchronize behavior. Signed-off-by: Michael Tiemann <72577720+MichaelTiemannOSC@users.noreply.github.com>
Add description of this PR to what's new file. Signed-off-by: Michael Tiemann <72577720+MichaelTiemannOSC@users.noreply.github.com>
False alarm: these simplified changes DO work with uncertainties in Pint...when used with the version of Pint that also supports uncertainties. 😊 |
Tests with uncertainties all pass (with similar |
Signed-off-by: Michael Tiemann <72577720+MichaelTiemannOSC@users.noreply.github.com>
xfail complex tests, but otherwise defer to parent object to implement test case. Signed-off-by: Michael Tiemann <72577720+MichaelTiemannOSC@users.noreply.github.com>
@mroeschke can you have a look at this and let me know if any additional changes are needed? |
pandas/core/dtypes/astype.py
Outdated
@@ -100,6 +100,11 @@ def _astype_nansafe( | |||
elif np.issubdtype(arr.dtype, np.floating) and dtype.kind in "iu": | |||
return _astype_float_to_int_nansafe(arr, dtype, copy) | |||
|
|||
elif np.issubdtype(arr.dtype, np.complexfloating) and is_object_dtype(dtype): | |||
if np.isnan(arr).any(): | |||
res = np.asarray([np.nan if np.isnan(x) else x for x in arr], dtype) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you just do res[np.isnan(arr)] = np.nan
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
res
has to be set to something first (it's the "result"). But I can look at using copy
to decide if we really need something fresh before we do something that may have a side-effect.
Attempt to resolve all comments from @mroeschke Signed-off-by: Michael Tiemann <72577720+MichaelTiemannOSC@users.noreply.github.com>
Replace `request` parameter with `*args, **kwargs` in many places. This allows us to avoid needlessly passing request parameters just to satisfy method signatures. Also remove whatsnew entry as this enhancement to test cases is not really user-visible. Signed-off-by: Michael Tiemann <72577720+MichaelTiemannOSC@users.noreply.github.com>
I strongly suspect the mypy failure was due to unrelated issues in CI/CD. In the intervening weeks, things have moved forward. Is it preferred practice to update via merge or rebase to catch up to latest after sitting idle for 3 weeks? |
Merging in main is sufficient. History is squashed during merging into main |
The _duplicate functions expect complex128 data in ndarrays, not EAs. Signed-off-by: Michael Tiemann <72577720+MichaelTiemannOSC@users.noreply.github.com>
The command `pre-comment run -a` doesn't do all the things that Pandas CI/CD does. This commit fixes two problems found in the mypy section: * Adding some comments to ignore some errors in _ensure_data for complex dtypes * Fix the type signature of _get_expected_exception to match LSP stylings Signed-off-by: Michael Tiemann <72577720+MichaelTiemannOSC@users.noreply.github.com>
Fix indentation errors. Signed-off-by: Michael Tiemann <72577720+MichaelTiemannOSC@users.noreply.github.com>
Don't need `*args, **kwargs` changes anymore due to refactoring upstream. Signed-off-by: Michael Tiemann <72577720+MichaelTiemannOSC@users.noreply.github.com>
Re-harmonize with upstream. Signed-off-by: Michael Tiemann <72577720+MichaelTiemannOSC@users.noreply.github.com>
It turns out that `data` in `test_sparse.py` never contains complex numbers, and furthermore, that it's quite a lot of extra work to make the sparse tests complex-friendly. So we leave complex number testing out of test_sparse. Contributions welcome, as usual. Signed-off-by: Michael Tiemann <72577720+MichaelTiemannOSC@users.noreply.github.com>
Arrow doesn't support complex numbers, so no need to special case tests as if it does. Signed-off-by: Michael Tiemann <72577720+MichaelTiemannOSC@users.noreply.github.com>
Restore function accidentally deleted. Intention was to only delete unneeded complex code handling, not the whole function. Signed-off-by: Michael Tiemann <72577720+MichaelTiemannOSC@users.noreply.github.com>
Delete `request` parameter no longer needed for `test_fillna_no_op_returns_copy`. Signed-off-by: Michael Tiemann <72577720+MichaelTiemannOSC@users.noreply.github.com>
Thanks to some upstream changes, the changes required to support many complex test cases is greatly simplified. Please let me know your feedback on these changes. |
In certain cases where Python throws a TypeError for complex values where it would throw ValueError for real values, transform exception to ValueError. Signed-off-by: Michael Tiemann <72577720+MichaelTiemannOSC@users.noreply.github.com>
Can somebody remove the stale tag from this PR? |
I have updated this PR. |
Make ruff happy by raising from `exc`. Signed-off-by: Michael Tiemann <72577720+MichaelTiemannOSC@users.noreply.github.com>
Note that the only failure now is Numpy Dev (and most likely "not my fault"). |
pandas/core/arrays/_mixins.py
Outdated
# Don't let Python's handling of `complex` make extra complexity for Pandas | ||
if self._ndarray.dtype.kind == "c": | ||
raise ValueError( | ||
*(x.replace("real", "complex") for x in exc.args) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are we tinkering with the exception message?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this is actually a NumPy bug. When value
is an object, NumPy appears to try constructing a complex number by passing the object in as the real component, but that, too fails with the message TypeError: must be a real number, not object
. But that's a wrong message, because what we really want is to report the failure to construct a complex number. Here's a test case of sorts:
import numpy as np
xx = object()
np.asarray(xx).astype('float64')
# *** TypeError: float() argument must be a string or a real number, not 'object'
np.asarray(xx).astype('complex128')
# *** TypeError: must be real number, not object
That's what I'm trying to fix. So I could file an issue against NumPy, remove the above special case, and Pandas will fix itself when NumPy fixes itself. Is that the best approach?
pandas/core/algorithms.py
Outdated
# error: Item "ExtensionDtype" of "Union[Any, ExtensionDtype]" | ||
# has no attribute "itemsize" | ||
if values.dtype.itemsize == 16: # type: ignore[union-attr] | ||
# We have support for complex128 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not complex64?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be happy to add complex64 to the test suite, but thought it better to do one case (complex128) at a time. I didn't want to add something (values.dtype.itemsize == 8
) that wasn't tested by the test suite.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not a deal-breaker, but id prefer to do them both in this pass. or at least leave a comment explaining why one is excluded
Updated code as per code review suggestions. Signed-off-by: Michael Tiemann <72577720+MichaelTiemannOSC@users.noreply.github.com>
`test_fillna_no_op_returns_copy` now works for all cases in Pandas 3.0.0 without special casing. Signed-off-by: Michael Tiemann <72577720+MichaelTiemannOSC@users.noreply.github.com>
The new 3.0.0 build rules make these changes happy. Previous build rules across a variety of merge points in the 2.2.x created problems with inconsistent parameters unrelated to these changes. Perhaps this is now ready to merge? |
# Note: when `self._ndarray.dtype.kind == "c"`, Numpy incorrectly complains | ||
# that `must be real number, not ...` when in reality | ||
# a complex argument is more likely what's expected | ||
raise ValueError(exc.args) from exc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the comment here is helpful, thanks. it suggests to me that we may want to only catch-and-re-raise in a subset of cases? otherwise we'll be re-raising as ValueError more often than we really want?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously @mroeschke questioned when ever this would be a TypeError in the first place...the above assignment was expected only ever raise a ValueError. So, instead of trying to also handle TypeError in higher-level code, we translate this "impossible" case into the canonical form that Pandas expects. The comment helps the user understand a completely unhelpful error message that comes from Python. Previously I attempted to edit the error message to something more reasonable, but that was challenged. At the end of the day the question is: how much steering do we want to do for this case vs. just letting the exception raise in the expected way and let users decipher what was wrong with their code in the first place.
Allow all defined numpy complex dtypes to work as well as complex128. Note that by default the test suite only tests the complex128 case, but users can add or alter that default by modifying the test suite source code to test other cases. Signed-off-by: Michael Tiemann <72577720+MichaelTiemannOSC@users.noreply.github.com>
Looks like this PR as gone stale so closing to clear the queue, if interested in contributing please merge in main and we can reopen |
Twice I've tried to get this merged, and twice it's gone stale. I thought it was a good idea, and I'm willing to try again, but only if there's enough interest on the Pandas side to make complex128 a first-class test case. |
These changes put complex128 on an even footing with float64 and int64 as far as numerical testing is concerned. These changes have been tested both against the Pandas test suite as well as the Pint-Pandas testsuite (using complex magnitudes).
These changes are a simpler version of a previous pull-request that was destroyed by GitHub's fork synchronize behavior.
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.