-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
API: fix corner case of lib.infer_dtype #23422
Conversation
Hello @h-vetinari! Thanks for submitting the PR.
|
Codecov Report
@@ Coverage Diff @@
## master #23422 +/- ##
=======================================
Coverage 92.23% 92.23%
=======================================
Files 161 161
Lines 51197 51197
=======================================
Hits 47220 47220
Misses 3977 3977
Continue to review full report at Codecov.
|
Annoyingly, I still think it's a good idea, but fixing that - depending on the use case - would probably need another keyword that determines whether to prioritize In any case, what remains of this PR is fixing a corner case that is more clearly wrong (returning |
doc/source/whatsnew/v0.24.0.txt
Outdated
@@ -244,6 +244,8 @@ Backwards incompatible API changes | |||
|
|||
- A newly constructed empty :class:`DataFrame` with integer as the ``dtype`` will now only be cast to ``float64`` if ``index`` is specified (:issue:`22858`) | |||
- :meth:`Series.str.cat` will now raise if `others` is a `set` (:issue:`23009`) | |||
- The method `pandas._libs.lib.infer_dtype` now returns `'empty'` rather than (sometimes) the dtype of the array, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this bug fix API-facing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO no, but I was being careful here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not a public function, remove
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
@@ -1171,6 +1172,9 @@ def infer_dtype(object value, bint skipna=False): | |||
values = construct_1d_object_array_from_listlike(value) | |||
|
|||
values = getattr(values, 'values', values) | |||
if skipna: | |||
values = values[~isnaobj(values)] | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we can incorporate this skipna
logic into the for-loop below. Perhaps have an indicator to tell us whether we have seen an element in the values
array that is non-null (when skipna
is True
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, that's not directly possible (nor performant), because the line directly below (with _try_infer_map
) will return prematurely as soon as it can grab hold of a dtype.
@gfyoung |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@h-vetinari you need to respond on the issue
this is as expected behavior
@jreback
I've reverted the fix for 1. (after the SQL errors I mentioned above), but kept 2. This is also not about the issue you linked #17261, but about an oversight of #17066. All-NA |
]) | ||
def test_object_empty(self, dtype, skipna, expected): | ||
# GH 23421 | ||
arr = pd.Series([np.nan, np.nan], dtype=dtype) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you also test with None passed in (for object dtype)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
doc/source/whatsnew/v0.24.0.txt
Outdated
@@ -244,6 +244,8 @@ Backwards incompatible API changes | |||
|
|||
- A newly constructed empty :class:`DataFrame` with integer as the ``dtype`` will now only be cast to ``float64`` if ``index`` is specified (:issue:`22858`) | |||
- :meth:`Series.str.cat` will now raise if `others` is a `set` (:issue:`23009`) | |||
- The method `pandas._libs.lib.infer_dtype` now returns `'empty'` rather than (sometimes) the dtype of the array, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not a public function, remove
@@ -1171,6 +1172,9 @@ def infer_dtype(object value, bint skipna=False): | |||
values = construct_1d_object_array_from_listlike(value) | |||
|
|||
values = getattr(values, 'values', values) | |||
if skipna: | |||
values = values[~isnaobj(values)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a python and not a cimport, why are you not using checknull?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
checknull only returns a single bint, and not an array. I would have liked to cimport
isnaobj, but that didn't work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so this isn array, ok, then add isnaobj
to missing.pxd
and make it a cpdef
. then you can cimport it. (and you need to type return value)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for review
doc/source/whatsnew/v0.24.0.txt
Outdated
@@ -244,6 +244,8 @@ Backwards incompatible API changes | |||
|
|||
- A newly constructed empty :class:`DataFrame` with integer as the ``dtype`` will now only be cast to ``float64`` if ``index`` is specified (:issue:`22858`) | |||
- :meth:`Series.str.cat` will now raise if `others` is a `set` (:issue:`23009`) | |||
- The method `pandas._libs.lib.infer_dtype` now returns `'empty'` rather than (sometimes) the dtype of the array, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
@@ -1171,6 +1172,9 @@ def infer_dtype(object value, bint skipna=False): | |||
values = construct_1d_object_array_from_listlike(value) | |||
|
|||
values = getattr(values, 'values', values) | |||
if skipna: | |||
values = values[~isnaobj(values)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
checknull only returns a single bint, and not an array. I would have liked to cimport
isnaobj, but that didn't work.
]) | ||
def test_object_empty(self, dtype, skipna, expected): | ||
# GH 23421 | ||
arr = pd.Series([np.nan, np.nan], dtype=dtype) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
@@ -1171,6 +1172,9 @@ def infer_dtype(object value, bint skipna=False): | |||
values = construct_1d_object_array_from_listlike(value) | |||
|
|||
values = getattr(values, 'values', values) | |||
if skipna: | |||
values = values[~isnaobj(values)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so this isn array, ok, then add isnaobj
to missing.pxd
and make it a cpdef
. then you can cimport it. (and you need to type return value)
@jreback |
can you rebase |
pandas/_libs/missing.pxd
Outdated
@@ -1,8 +1,14 @@ | |||
# -*- coding: utf-8 -*- | |||
|
|||
from numpy cimport ndarray, uint8_t | |||
|
|||
from tslibs.nattype cimport is_null_datetimelike |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you added this back in rebase. pls remove
pandas/_libs/missing.pxd
Outdated
from numpy cimport ndarray, uint8_t | ||
|
||
from tslibs.nattype cimport is_null_datetimelike | ||
|
||
cpdef bint checknull(object val) | ||
cpdef bint checknull_old(object val) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no extra blank lines
thanks |
git diff upstream/master -u -- "*.py" | flake8 --diff
Working towards #23167 needs fixing of the type inference for some corner cases (especially not inferring an object column full of NaNs to be
'floating'
).