Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CLN: Remove/deprecate unused/misleading dtype functions #23917

Merged
merged 25 commits into from
Nov 27, 2018

Conversation

jbrockmendel
Copy link
Member

For "misleading" the leading case is is_int_or_datetime_dtype, which includes timedelta64 but excludes datetime64tz.

Several branches in lib.infer_dtype are unreachable, are removed.

is_datetimetz is redundant with is_datetime64tz_dtype, is deprecated, and internal uses of it are changed.

is_period is both redundant and misleading, is deprecated (no internal usages outside of tests)

maybe_convert_string_to_object and maybe_convert_scalar are never used outside of tests, removed

xref #22137

cc: @h-vetinari IIRC you've been looking at other functions in core.dtypes.cast with an eye towards cleanup or de-duplication. Suggestions welcome.

@pep8speaks
Copy link

Hello @jbrockmendel! Thanks for submitting the PR.

@gfyoung gfyoung added the Dtype Conversions Unexpected or buggy dtype conversions label Nov 26, 2018
@gfyoung
Copy link
Member

gfyoung commented Nov 26, 2018

@jbrockmendel @h-vetinari : Have a look at #16242 as well. This issue might be resolvable based on what you guys are doing here...

@gfyoung gfyoung added the Clean label Nov 26, 2018
assert is_datetimetz(s.dtype)
assert not is_datetimetz(np.dtype('float64'))
assert not is_datetimetz(1.0)
with tm.assert_produces_warning(FutureWarning):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If each call is supposed to generate a warning, this will need more context managers.

@h-vetinari
Copy link
Contributor

h-vetinari commented Nov 26, 2018

Thanks for tagging me.

There's a couple of things I encountered in #23796, mainly:

  • astype_intsafe from _libs.lib is only used once in dtypes.cast.astype_nansafe on the object branch, where there's some confusing/contradictory/redundant branches (related to your integer/datetime/timedelta point)
    • the branch for if np.issubdtype(dtype.type, np.integer): catches all the timedeltas (but not the datetimes), both of which would have a dedicated branch further down. However, letting the timedeltas actually hit their dedicated branch means re-calling astype_nansafe after explicitly casting object dtype to timedelta, hitting a dedicated timedelta-dtype-branch, which casts to float. This then contradicts/fails the TestMerge::test_other_timedelta_unit test, because it expects timedelta output.
    • furthermore, that integer-condition is actually relevant for some IntegerArray cases, because there's no dedicated branch for extension array types (on the side of the caller) yet.
  • maybe_promote and maybe_upcast_putmask have some issues, BUG/Internals: maybe_upcast_putmask #23823 / BUG/Internals: maybe_promote #23833 (actually many more than listed in those issues). Both are also completely untested directly, AFAICT. I'm currently in the process of writing tests for the former, and then hoping to fix the behaviour.

doc/source/whatsnew/v0.24.0.rst Show resolved Hide resolved
@@ -1125,6 +1127,7 @@ Removal of prior version deprecations/changes
- :meth:`SparseSeries.to_dense` has dropped the ``sparse_only`` parameter (:issue:`14686`)
- :meth:`DataFrame.astype` and :meth:`Series.astype` have renamed the ``raise_on_error`` argument to ``errors`` (:issue:`14967`)
- ``is_sequence``, ``is_any_int_dtype``, and ``is_floating_dtype`` have been removed from ``pandas.api.types`` (:issue:`16163`, :issue:`16189`)
- ``is_floating_dtype`` has been removed (:issue:`????`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks like a dupe of the line above

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The existing whatsnew makes it look like this has already been removed, but it is still present in master, and not listed (as deprecated or removed) in #6581.

The whatsnew also says is_sequence and is_any_int_dtype have been removed, but each of these are still used in a few places internally. Get rid of those usages? (any_int_dtype only has two usages so that one will be easy)

pandas/_libs/lib.pyx Outdated Show resolved Hide resolved
pandas/core/dtypes/common.py Outdated Show resolved Hide resolved
pandas/core/dtypes/common.py Show resolved Hide resolved
@@ -1604,7 +1604,10 @@ def _factorize_keys(lk, rk, sort=True):

lk = ensure_int64(lk.codes)
rk = ensure_int64(rk)
elif is_int_or_datetime_dtype(lk) and is_int_or_datetime_dtype(rk):
elif (issubclass(lk.dtype.type, (np.integer, np.timedelta64,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not right, these should be in separate branches here, the datetime / timedeltas can be in a single elif clause.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the behavior here is wrong then we should also add a test for it. I'll open a separate issue to address this, since I'm not sure off the top of my head what the correct behavior is.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

opened #23929

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right these actually this should be 2 clauses of

elif is_integer_dtype(....)
....
elif needs_i8_conversion(...)
...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't really like changing the way you are doing here.

i was objecting to the object conversion below, not really sure why that is done at all. if you can separate to 2 elif's would be better here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I'd really rather these be separated in a dedicated PR, but I guess #23929 can be a Needs Test Issue... will change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah that's fine. I think there is another PR about adding EA types here as well (though of course orthogonal). see if this passes and can add tests / perf checking later.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

had to revert use of needs_i8_conversion since it breaks on period_dtype

pandas/tests/dtypes/test_common.py Outdated Show resolved Hide resolved
@codecov
Copy link

codecov bot commented Nov 26, 2018

Codecov Report

Merging #23917 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #23917      +/-   ##
==========================================
+ Coverage   92.31%   92.31%   +<.01%     
==========================================
  Files         161      161              
  Lines       51488    51471      -17     
==========================================
- Hits        47530    47515      -15     
+ Misses       3958     3956       -2
Flag Coverage Δ
#multiple 90.7% <100%> (ø) ⬆️
#single 42.43% <66.66%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/arrays/datetimes.py 98.37% <ø> (ø) ⬆️
pandas/io/formats/format.py 97.76% <100%> (ø) ⬆️
pandas/core/indexes/datetimes.py 96.47% <100%> (ø) ⬆️
pandas/core/dtypes/common.py 94.77% <100%> (+0.39%) ⬆️
pandas/core/frame.py 97.02% <100%> (ø) ⬆️
pandas/core/algorithms.py 95.11% <100%> (ø) ⬆️
pandas/core/dtypes/concat.py 96.66% <100%> (ø) ⬆️
pandas/core/nanops.py 95.05% <100%> (ø) ⬆️
pandas/core/internals/blocks.py 93.71% <100%> (ø) ⬆️
pandas/core/reshape/merge.py 94.27% <100%> (+0.03%) ⬆️
... and 2 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d53a4cc...5bdc157. Read the comment docs.

@@ -1604,7 +1604,10 @@ def _factorize_keys(lk, rk, sort=True):

lk = ensure_int64(lk.codes)
rk = ensure_int64(rk)
elif is_int_or_datetime_dtype(lk) and is_int_or_datetime_dtype(rk):
elif (issubclass(lk.dtype.type, (np.integer, np.timedelta64,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right these actually this should be 2 clauses of

elif is_integer_dtype(....)
....
elif needs_i8_conversion(...)
...

@@ -1604,7 +1604,10 @@ def _factorize_keys(lk, rk, sort=True):

lk = ensure_int64(lk.codes)
rk = ensure_int64(rk)
elif is_int_or_datetime_dtype(lk) and is_int_or_datetime_dtype(rk):
elif (issubclass(lk.dtype.type, (np.integer, np.timedelta64,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't really like changing the way you are doing here.

i was objecting to the object conversion below, not really sure why that is done at all. if you can separate to 2 elif's would be better here

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. merge on green.

@jreback jreback added this to the 0.24.0 milestone Nov 27, 2018
@jbrockmendel
Copy link
Member Author

rebased, also fixed a docstring mistake I made in #23937.

@jreback
Copy link
Contributor

jreback commented Nov 27, 2018

ok! ping on green.

@jbrockmendel
Copy link
Member Author

Ping

@jreback jreback merged commit ee283fa into pandas-dev:master Nov 27, 2018
@jreback
Copy link
Contributor

jreback commented Nov 27, 2018

thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Clean Dtype Conversions Unexpected or buggy dtype conversions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants