CLN: Remove/deprecate unused/misleading dtype functions #23917
Hello @jbrockmendel! Thanks for submitting the PR.
@jbrockmendel @h-vetinari : Have a look at #16242 as well. This issue might be resolvable based on what you guys are doing here...
assert is_datetimetz(s.dtype)
assert not is_datetimetz(np.dtype('float64'))
assert not is_datetimetz(1.0)
with tm.assert_produces_warning(FutureWarning):
If each call is supposed to generate a warning, this will need more context managers.
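To illustrate the point about context managers: one warning-capturing block wrapped around several calls only shows that at least one of them warned. The sketch below uses the standard library's `warnings` module rather than `tm.assert_produces_warning`, and `is_datetimetz_stub` and `count_future_warnings` are hypothetical names introduced for illustration, not pandas code.

```python
import warnings


def is_datetimetz_stub(obj):
    """Hypothetical stand-in for a deprecated function that warns on every call."""
    warnings.warn("deprecated, use the replacement instead",
                  FutureWarning, stacklevel=2)
    return False


def count_future_warnings(func, *args):
    """Call func once inside its own context manager and count FutureWarnings."""
    with warnings.catch_warnings(record=True) as caught:
        # "always" stops Python from suppressing repeated identical warnings.
        warnings.simplefilter("always")
        func(*args)
    return sum(issubclass(w.category, FutureWarning) for w in caught)


# One context manager per call, so each call is checked independently.
per_call = [count_future_warnings(is_datetimetz_stub, arg)
            for arg in ("a", 1.0, None)]
print(per_call)
```

Wrapping each call separately means a call that silently stops warning is caught by the test instead of being masked by its neighbours.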
Thanks for tagging me. There's a couple of things I encountered in #23796, mainly:
doc/source/whatsnew/v0.24.0.rst
Outdated
@@ -1125,6 +1127,7 @@ Removal of prior version deprecations/changes
- :meth:`SparseSeries.to_dense` has dropped the ``sparse_only`` parameter (:issue:`14686`)
- :meth:`DataFrame.astype` and :meth:`Series.astype` have renamed the ``raise_on_error`` argument to ``errors`` (:issue:`14967`)
- ``is_sequence``, ``is_any_int_dtype``, and ``is_floating_dtype`` have been removed from ``pandas.api.types`` (:issue:`16163`, :issue:`16189`)
- ``is_floating_dtype`` has been removed (:issue:`????`)
this looks like a dupe of the line above
The existing whatsnew makes it look like this has already been removed, but it is still present in master, and not listed (as deprecated or removed) in #6581.
The whatsnew also says is_sequence and is_any_int_dtype have been removed, but each of these are still used in a few places internally. Get rid of those usages? (any_int_dtype only has two usages so that one will be easy)
pandas/core/reshape/merge.py
Outdated
@@ -1604,7 +1604,10 @@ def _factorize_keys(lk, rk, sort=True):
lk = ensure_int64(lk.codes)
rk = ensure_int64(rk)
elif is_int_or_datetime_dtype(lk) and is_int_or_datetime_dtype(rk):
elif (issubclass(lk.dtype.type, (np.integer, np.timedelta64,
this is not right, these should be in separate branches here, the datetime / timedeltas can be in a single elif clause.
If the behavior here is wrong then we should also add a test for it. I'll open a separate issue to address this, since I'm not sure off the top of my head what the correct behavior is.
opened #23929
right, these should actually be 2 clauses:
elif is_integer_dtype(....)
    ....
elif needs_i8_conversion(...)
    ...
i don't really like changing the way you are doing here.
i was objecting to the object conversion below, not really sure why that is done at all. if you can separate to 2 elif's would be better here
OK. I'd really rather these be separated in a dedicated PR, but I guess #23929 can be a Needs Test Issue... will change.
yeah that's fine. I think there is another PR about adding EA types here as well (though of course orthogonal). see if this passes and can add tests / perf checking later.
had to revert use of needs_i8_conversion since it breaks on period_dtype
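The two-branch structure discussed in this thread can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses NumPy dtype kinds as a stand-in for `is_integer_dtype` and for the datetime/timedelta check, `pick_branch` is a hypothetical helper, and it is not the actual body of `_factorize_keys`. Period and other extension dtypes are deliberately out of scope here.

```python
import numpy as np


def pick_branch(lk, rk):
    """Report which branch a pair of join keys would take (illustrative only)."""
    lk, rk = np.asarray(lk), np.asarray(rk)
    if lk.dtype.kind in "iu" and rk.dtype.kind in "iu":
        # Integer branch: signed ("i") or unsigned ("u") integers on both sides.
        return "integer"
    elif lk.dtype.kind in "mM" and rk.dtype.kind in "mM":
        # Datetime-like branch: timedelta64 ("m") or naive datetime64 ("M").
        return "datetimelike"
    # Everything else falls through to a generic path.
    return "other"


ints = np.array([1, 2, 3])
dates = np.array(["2018-11-26", "2018-11-27"], dtype="datetime64[ns]")
print(pick_branch(ints, ints))
print(pick_branch(dates, dates))
print(pick_branch(ints, dates))
```

Keeping the integer and datetime-like cases in separate clauses makes it explicit which dtypes each path is meant to handle, which is harder to see when one combined predicate covers both.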
Codecov Report
@@ Coverage Diff @@
## master #23917 +/- ##
==========================================
+ Coverage 92.31% 92.31% +<.01%
==========================================
Files 161 161
Lines 51488 51471 -17
==========================================
- Hits 47530 47515 -15
+ Misses 3958 3956 -2
lgtm. merge on green.
rebased, also fixed a docstring mistake I made in #23937.
ok! ping on green.
Ping
thanks!
For "misleading" the leading case is `is_int_or_datetime_dtype`, which includes timedelta64 but excludes datetime64tz.
- Several branches in `lib.infer_dtype` are unreachable, are removed.
- `is_datetimetz` is redundant with `is_datetime64tz_dtype`, is deprecated, and internal uses of it are changed.
- `is_period` is both redundant and misleading, is deprecated (no internal usages outside of tests)
- `maybe_convert_string_to_object` and `maybe_convert_scalar` are never used outside of tests, removed

xref #22137
cc: @h-vetinari IIRC you've been looking at other functions in core.dtypes.cast with an eye towards cleanup or de-duplication. Suggestions welcome.
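As a rough illustration of why a name like `is_int_or_datetime_dtype` can mislead, the sketch below defines a hypothetical predicate, `int_or_datetime_like`, built on an `issubclass` check against NumPy scalar types. The exact tuple of types is an assumption made for this example and is not copied from pandas. It shows only the timedelta64 half of the observation; tz-aware datetimes are a pandas extension dtype and are not covered by this NumPy-only check.

```python
import numpy as np


def int_or_datetime_like(dtype):
    """Hypothetical predicate; the type tuple is assumed for illustration."""
    tipo = np.dtype(dtype).type
    return issubclass(tipo, (np.integer, np.datetime64, np.timedelta64))


results = {name: int_or_datetime_like(name)
           for name in ("int64", "datetime64[ns]", "timedelta64[ns]", "float64")}
# timedelta64 is accepted even though the name mentions only int and datetime.
print(results)
```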