API: fix corner case of lib.infer_dtype #23422

h-vetinari · 2018-10-30T17:54:25Z

- closes #23421 (API: fix corner cases of lib.infer_dtype)
- tests added / passed
- passes git diff upstream/master -u -- "*.py" | flake8 --diff
- whatsnew entry

Work towards #23167 requires fixing type inference for some corner cases (in particular, an object column full of NaNs should not be inferred as 'floating').
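To make the corner case concrete, here is a minimal pure-Python model of skipna-aware inference. It is not pandas' implementation: `infer_kind` and its small set of labels are hypothetical, chosen only to mirror the behaviour described for lib.infer_dtype. Dropping missing values first means an all-NaN object column has nothing left to classify, so it should report 'empty' rather than 'floating'.

```python
import math
from typing import Iterable


def is_missing(value: object) -> bool:
    """Treat None and float NaN as missing (simplified model)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def infer_kind(values: Iterable[object], skipna: bool = False) -> str:
    """Hypothetical, simplified stand-in for lib.infer_dtype on object data."""
    items = list(values)
    if skipna:
        # Drop missing entries before classifying anything.
        items = [v for v in items if not is_missing(v)]
    if not items:
        # Nothing left to look at: report 'empty' instead of guessing a type.
        return "empty"
    if all(isinstance(v, bool) for v in items):
        return "boolean"
    if all(isinstance(v, int) and not isinstance(v, bool) for v in items):
        return "integer"
    if all(isinstance(v, float) for v in items):
        return "floating"
    if all(isinstance(v, str) for v in items):
        return "string"
    return "mixed"


print(infer_kind([float("nan"), float("nan")]))               # floating
print(infer_kind([float("nan"), float("nan")], skipna=True))  # empty
```

Without skipna, NaN is simply a float, so the all-NaN column looks 'floating'; with skipna=True the model never sees a value and says so.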

pep8speaks · 2018-10-30T17:54:29Z

Hello @h-vetinari! Thanks for submitting the PR.

There are no PEP8 issues in the file pandas/tests/dtypes/test_inference.py!

codecov · 2018-10-30T20:49:27Z

Codecov Report

Merging #23422 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master   #23422   +/-   ##
=======================================
  Coverage   92.23%   92.23%           
=======================================
  Files         161      161           
  Lines       51197    51197           
=======================================
  Hits        47220    47220           
  Misses       3977     3977

Flag	Coverage Δ
#multiple	`90.61% <ø> (ø)`	⬆️
#single	`42.27% <ø> (ø)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Powered by Codecov. Last update d78bd7a...19cf2dd.

h-vetinari · 2018-10-30T20:52:23Z

Annoyingly, io.sql.SQLTable.create_table needs to have the dtype returned even for empty columns, otherwise the test for empty tables fails. I had been able to fix the pytables failures that appeared in the CI, but I'm now reverting that (breaking) change.

I still think it's a good idea, but fixing that would probably need another keyword (depending on the use case) that determines whether to prioritize 'empty' over the dtype or vice versa.

In any case, what remains of this PR is fixing a corner case that is more clearly wrong (returning 'floating' for an all-NA object column if skipna=True).
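The precedence question for empty input can be sketched as follows. `prefer_empty` is a hypothetical name for the extra keyword mentioned above (pandas has no such parameter), and `DTYPE_LABELS` is a toy mapping; the default branch mirrors the behaviour this PR keeps, where an empty column with a declared dtype still reports that dtype.

```python
from typing import Optional, Sequence

# Toy mapping from a declared dtype name to an inference label (hypothetical).
DTYPE_LABELS = {"float64": "floating", "int64": "integer", "bool": "boolean"}


def infer_with_dtype(values: Sequence[object], dtype: Optional[str],
                     prefer_empty: bool = False) -> str:
    """Sketch of the 'empty' vs. dtype precedence for zero-length input.

    prefer_empty is a hypothetical keyword, not a pandas parameter.
    """
    if len(values) == 0:
        if prefer_empty or dtype not in DTYPE_LABELS:
            return "empty"
        # Behaviour kept by the PR: the declared dtype wins, which is what
        # io.sql.SQLTable.create_table needs for empty columns.
        return DTYPE_LABELS[dtype]
    raise NotImplementedError("only the zero-length case is modelled here")


print(infer_with_dtype([], "float64"))                     # floating
print(infer_with_dtype([], "float64", prefer_empty=True))  # empty
print(infer_with_dtype([], "object"))                      # empty
```

A single default cannot serve both callers: SQL table creation wants the dtype, while a caller that only asks "is there any data?" would want 'empty'.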

gfyoung · 2018-10-30T21:19:01Z

doc/source/whatsnew/v0.24.0.txt

@@ -244,6 +244,8 @@ Backwards incompatible API changes

 - A newly constructed empty :class:`DataFrame` with integer as the ``dtype`` will now only be cast to ``float64`` if ``index`` is specified (:issue:`22858`)
 - :meth:`Series.str.cat` will now raise if `others` is a `set` (:issue:`23009`)
+- The method `pandas._libs.lib.infer_dtype` now returns `'empty'` rather than (sometimes) the dtype of the array,


Is this bug fix API-facing?

h-vetinari: IMO no, but I was being careful here.

jreback: this is not a public function, remove

gfyoung · 2018-10-30T21:24:13Z

pandas/_libs/lib.pyx

@@ -1171,6 +1172,9 @@ def infer_dtype(object value, bint skipna=False):
        values = construct_1d_object_array_from_listlike(value)

    values = getattr(values, 'values', values)
+    if skipna:
+        values = values[~isnaobj(values)]
+


I wonder if we can incorporate this skipna logic into the for-loop below. Perhaps have an indicator to tell us whether we have seen an element in the values array that is non-null (when skipna is True).

h-vetinari: Unfortunately, that's not directly possible (nor performant), because the line directly below (with _try_infer_map) returns prematurely as soon as it can determine a dtype.
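The ordering argument can be sketched with a simplified pure-Python model. `FAST_PATH`, `try_infer_map` and `infer` are hypothetical stand-ins for _try_infer_map and infer_dtype, not pandas code; the point is only that a dtype-based fast path returns before any per-element loop runs, so skipping NaNs has to happen before that point.

```python
import math
from typing import Optional, Sequence

# Hypothetical stand-in for _try_infer_map: for non-object dtypes the label
# is known from the dtype alone.
FAST_PATH = {"float64": "floating", "int64": "integer", "bool": "boolean"}


def try_infer_map(dtype: str) -> Optional[str]:
    return FAST_PATH.get(dtype)


def is_missing(v: object) -> bool:
    return v is None or (isinstance(v, float) and math.isnan(v))


def infer(values: Sequence[object], dtype: str, skipna: bool) -> str:
    items = list(values)
    if skipna:
        # The step the PR adds: filter missing values up front.
        items = [v for v in items if not is_missing(v)]

    # Fast path: returns as soon as the dtype gives an answer, so nothing
    # below (including any "seen a non-null value" flag) runs for it.
    label = try_infer_map(dtype)
    if label is not None:
        return label

    if not items:
        return "empty"

    # Simplified stand-in for the element-wise loop: NaN is a float, so an
    # all-NaN object column looks 'floating' unless skipna removed the NaNs.
    if all(isinstance(v, float) for v in items):
        return "floating"
    return "mixed"


nan = float("nan")
print(infer([nan, nan], "object", skipna=False))  # floating
print(infer([nan, nan], "object", skipna=True))   # empty
print(infer([nan, nan], "float64", skipna=True))  # floating
```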

h-vetinari · 2018-10-30T22:51:48Z

@gfyoung
Could you please restart the failed Travis job? It was a hypothesis timeout, nothing else.

jreback

@h-vetinari you need to respond on the issue; this is expected behavior.

h-vetinari · 2018-10-30T23:09:59Z

@jreback
There are two aspects to #23421:

1. whether lib.infer_dtype(pd.Series([])) returns 'floating' vs. 'empty'
2. whether lib.infer_dtype(pd.Series([np.nan, np.nan], dtype=object), skipna=True) returns 'floating' vs. 'empty'

I've reverted the fix for 1 (after the SQL errors I mentioned above), but kept 2. This is also not about the issue you linked (#17261), but about an oversight in #17066: an all-NA object column with skipna=True should clearly not return 'floating' (and this is causing errors in #23167).

jreback · 2018-10-31T12:05:49Z

pandas/tests/dtypes/test_inference.py

+    ])
+    def test_object_empty(self, dtype, skipna, expected):
+        # GH 23421
+        arr = pd.Series([np.nan, np.nan], dtype=dtype)


can you also test with None passed in (for object dtype)
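Extending the test to None as well as NaN can be sketched as a table-driven check. It runs against a hypothetical pure-Python model (`infer_object_column`), not lib.infer_dtype; the expected labels are those of the model, and in particular the 'mixed' result for None with skipna=False has not been checked against pandas here. The real test would presumably add a parametrized argument to test_object_empty instead.

```python
import math
from typing import List, Sequence, Tuple


def is_missing(v: object) -> bool:
    """None and float NaN both count as missing in this model."""
    return v is None or (isinstance(v, float) and math.isnan(v))


def infer_object_column(values: Sequence[object], skipna: bool) -> str:
    """Hypothetical model of inference on an object column."""
    items = list(values)
    if skipna:
        items = [v for v in items if not is_missing(v)]
    if not items:
        return "empty"
    if all(isinstance(v, float) for v in items):
        return "floating"
    return "mixed"


# (missing marker, skipna, label expected from this model)
CASES: List[Tuple[object, bool, str]] = [
    (float("nan"), False, "floating"),
    (float("nan"), True, "empty"),
    (None, False, "mixed"),
    (None, True, "empty"),
]


def run_cases() -> List[Tuple[object, bool, str, str]]:
    """Return the cases whose result differs from the expectation."""
    failures = []
    for missing, skipna, expected in CASES:
        result = infer_object_column([missing, missing], skipna)
        if result != expected:
            failures.append((missing, skipna, expected, result))
    return failures


print(run_cases())  # []
```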

jreback · 2018-10-31T12:06:06Z

doc/source/whatsnew/v0.24.0.txt

@@ -244,6 +244,8 @@ Backwards incompatible API changes

 - A newly constructed empty :class:`DataFrame` with integer as the ``dtype`` will now only be cast to ``float64`` if ``index`` is specified (:issue:`22858`)
 - :meth:`Series.str.cat` will now raise if `others` is a `set` (:issue:`23009`)
+- The method `pandas._libs.lib.infer_dtype` now returns `'empty'` rather than (sometimes) the dtype of the array,


this is not a public function, remove

jreback · 2018-10-31T12:06:30Z

pandas/_libs/lib.pyx

@@ -1171,6 +1172,9 @@ def infer_dtype(object value, bint skipna=False):
        values = construct_1d_object_array_from_listlike(value)

    values = getattr(values, 'values', values)
+    if skipna:
+        values = values[~isnaobj(values)]


this is a Python import and not a cimport; why are you not using checknull?

h-vetinari: checknull only returns a single bint, not an array. I would have liked to cimport isnaobj, but that didn't work.

jreback: so this is an array; ok, then add isnaobj to missing.pxd and make it a cpdef. Then you can cimport it (and you need to type the return value).
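The difference between a scalar null check and an array mask, and how the mask drives the filtering in values[~isnaobj(values)], can be sketched in plain Python. These `checknull` and `isnaobj` functions are simplified stand-ins that only recognise None and float NaN; pandas' Cython versions handle more null types, and `drop_missing` is a hypothetical helper.

```python
import math
from typing import List, Sequence


def checknull(val: object) -> bool:
    """Scalar check: one value in, one bool out (like a bint return)."""
    return val is None or (isinstance(val, float) and math.isnan(val))


def isnaobj(values: Sequence[object]) -> List[bool]:
    """Array check: one mask entry per element."""
    return [checknull(v) for v in values]


def drop_missing(values: Sequence[object]) -> List[object]:
    """Plain-list equivalent of values[~isnaobj(values)]."""
    mask = isnaobj(values)
    return [v for v, is_na in zip(values, mask) if not is_na]


data = [1.0, float("nan"), None, "a"]
print(isnaobj(data))       # [False, True, True, False]
print(drop_missing(data))  # [1.0, 'a']
```

Because the filter needs one answer per element, a scalar function like checknull alone does not fit; hence the request to expose isnaobj through missing.pxd.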

pandas/tests/dtypes/test_inference.py

h-vetinari

thanks for review

h-vetinari · 2018-10-31T15:20:07Z

doc/source/whatsnew/v0.24.0.txt


h-vetinari · 2018-10-31T15:21:07Z

pandas/_libs/lib.pyx


checknull only returns a single bint, and not an array. I would have liked to cimport isnaobj, but that didn't work.

pandas/tests/dtypes/test_inference.py

h-vetinari · 2018-10-31T16:14:17Z

pandas/tests/dtypes/test_inference.py


jreback · 2018-11-01T00:40:30Z

pandas/_libs/lib.pyx


so this is an array; ok, then add isnaobj to missing.pxd and make it a cpdef. Then you can cimport it (and you need to type the return value).

h-vetinari · 2018-11-01T08:19:07Z

@jreback
All green.

jreback · 2018-11-02T14:23:25Z

can you rebase

jreback · 2018-11-03T14:12:57Z

pandas/_libs/missing.pxd

@@ -1,8 +1,14 @@
 # -*- coding: utf-8 -*-

+from numpy cimport ndarray, uint8_t
+
+from tslibs.nattype cimport is_null_datetimelike


you added this back in the rebase, please remove

jreback · 2018-11-03T14:13:11Z

pandas/_libs/missing.pxd

+from numpy cimport ndarray, uint8_t
+
+from tslibs.nattype cimport is_null_datetimelike
+
 cpdef bint checknull(object val)
 cpdef bint checknull_old(object val)



no extra blank lines

jreback · 2018-11-04T15:53:32Z

thanks

API: fix corner case of lib.infer_dtype

8e1e97e

Do not prioritize 'empty' over dtype

3f988e8

Update whatsnew

08d67e8

h-vetinari added a commit to h-vetinari/pandas that referenced this pull request Oct 30, 2018

API: fix corner case of lib.infer_dtype (pandas-dev#23422)

c8bd2b9

h-vetinari mentioned this pull request Oct 30, 2018

API: Series.str-accessor infers dtype (and Index.str does not raise on all-NA) #23167 (merged)

gfyoung added the labels Dtype Conversions (Unexpected or buggy dtype conversions) and Internals (Related to non-user accessible pandas implementation) Oct 30, 2018

gfyoung reviewed Oct 30, 2018

jreback requested changes Oct 30, 2018

h-vetinari mentioned this pull request Oct 30, 2018

API: fix corner cases of lib.infer_dtype #23421 (closed)

Retrigger CI due to hypothesis timeout

bcc481b

jreback requested changes Oct 31, 2018

h-vetinari commented Oct 31, 2018

Review (jreback)

c0ded96

jreback requested changes Nov 1, 2018

jreback added this to the 0.24.0 milestone Nov 1, 2018

Review (jreback)

7e5d453

h-vetinari added a commit to h-vetinari/pandas that referenced this pull request Nov 1, 2018

API: fix corner case of lib.infer_dtype (pandas-dev#23422)

a190910

Merge remote-tracking branch 'upstream/master' into infer_empty

8ace6c7

h-vetinari added a commit to h-vetinari/pandas that referenced this pull request Nov 2, 2018

API: fix corner case of lib.infer_dtype (pandas-dev#23422)

3cc2fae

jreback requested changes Nov 3, 2018

Merge remote-tracking branch 'upstream/master' into infer_empty

3ce5776

h-vetinari added 2 commits November 3, 2018 16:42

Fix rebase oversight

a533ed8

Retrigger CircleCI

19cf2dd

jreback approved these changes Nov 4, 2018

jreback merged commit aaaac86 into pandas-dev:master Nov 4, 2018

h-vetinari deleted the infer_empty branch November 5, 2018 00:14

JustinZhengBC pushed a commit to JustinZhengBC/pandas that referenced this pull request Nov 14, 2018

API: fix corner case of lib.infer_dtype (pandas-dev#23422)

e938ed4

tm9k1 pushed a commit to tm9k1/pandas that referenced this pull request Nov 19, 2018

API: fix corner case of lib.infer_dtype (pandas-dev#23422)

1347658

h-vetinari mentioned this pull request Dec 4, 2018

DEPR: deprecate default of skipna=False in infer_dtype #24050 (merged)

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

API: fix corner case of lib.infer_dtype (pandas-dev#23422)

eb7e80f

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

API: fix corner case of lib.infer_dtype (pandas-dev#23422)

aa44d3e

groutr mentioned this pull request Dec 11, 2019

infer_dtype() function slower in latest version #28814 (closed)
