API: fix corner cases of lib.infer_dtype #23421

Closed
h-vetinari opened this issue Oct 30, 2018 · 4 comments · Fixed by #23422
Labels: Dtype Conversions (Unexpected or buggy dtype conversions), Internals (Related to non-user accessible pandas implementation)

h-vetinari commented Oct 30, 2018

Encountering this while working on #23167

There are a few inconsistencies in pandas._libs.lib.infer_dtype, e.g.

>>> import pandas as pd
>>> import numpy as np
>>> import pandas._libs.lib as lib
>>>
>>> lib.infer_dtype(pd.Series([], dtype=object))
'empty'
>>> lib.infer_dtype(pd.Index([], dtype=object))
'empty'
>>> lib.infer_dtype(pd.Index([]))
'empty'
>>> lib.infer_dtype(pd.Series([]))
'floating'  <--- why not empty?

and similarly for

>>> lib.infer_dtype(pd.Series([np.nan, np.nan], dtype=object), skipna=True)
'floating'  <-- wrong
>>> lib.infer_dtype(pd.Index([np.nan, np.nan], dtype=object), skipna=True)
'floating'  <-- wrong
>>> lib.infer_dtype(pd.Series([np.nan, np.nan]), skipna=True)
'floating'  <-- debatable
>>> lib.infer_dtype(pd.Index([np.nan, np.nan]), skipna=True)
'floating'  <-- debatable

In the context of object columns, an all-NA column with skipna=True should definitely not return 'floating' (imagine a column of strings where all values happen to be missing for a given selection, after a join, or for whatever other reason). I'd argue that an all-NA column of float dtype should also be inferred as 'empty' when skipna=True.
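
To make the proposed semantics concrete, here is a minimal sketch. The helper name infer_dtype_or_empty is hypothetical (it is not part of pandas), and it goes through the public re-export pandas.api.types.infer_dtype rather than touching lib directly. It only illustrates the idea "if nothing is left after skipping NAs, report 'empty'"; it is not the fix that was eventually implemented.

```python
import numpy as np
import pandas as pd
from pandas.api.types import infer_dtype  # public re-export of lib.infer_dtype


def infer_dtype_or_empty(values, skipna=True):
    """Hypothetical helper (not pandas API): 'empty' if nothing remains to inspect."""
    # pd.isna works on lists, Series and Index alike; normalise to a bool ndarray.
    mask = np.asarray(pd.isna(values))
    # Zero-length input, or every value is NA and NAs are being skipped.
    if len(mask) == 0 or (skipna and mask.all()):
        return "empty"
    # Otherwise defer to the regular inference.
    return infer_dtype(values, skipna=skipna)


all_na_object = pd.Series([np.nan, np.nan], dtype=object)
print(infer_dtype_or_empty(all_na_object))                           # all-NA object column
print(infer_dtype_or_empty(pd.Index([np.nan, np.nan])))              # all-NA float index
print(infer_dtype_or_empty(pd.Series(["a", np.nan], dtype=object)))  # falls through to infer_dtype
```

With this wrapper, the all-NA cases no longer depend on whether the container happens to be float-typed or object-typed, which is the consistency being asked for above.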

The skipna parameter was introduced in #17066 (v0.21). As a side note, that change also promised that the default would be switched from False to True. I wonder whether this even needs a deprecation cycle, since the function is explicitly private by living in _libs.lib.

jreback commented Oct 30, 2018

there is an issue about this

empty Series are by definition floating currently
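
The dependence on the container's dtype can be checked by pinning the dtype explicitly instead of relying on the constructor default. This is only a sketch: it assumes the public pandas.api.types.infer_dtype behaves like lib.infer_dtype, and it passes explicit dtypes because the default dtype of an empty Series may differ between pandas versions.

```python
import pandas as pd
from pandas.api.types import infer_dtype

results = {}
for dtype in (object, float):
    empty_series = pd.Series([], dtype=dtype)
    empty_index = pd.Index([], dtype=dtype)
    # Record what is inferred for each container at the same dtype.
    results[dtype.__name__] = (
        infer_dtype(empty_series, skipna=False),
        infer_dtype(empty_index, skipna=False),
    )

for name, (series_result, index_result) in results.items():
    print(f"dtype={name}: Series -> {series_result!r}, Index -> {index_result!r}")
```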

@gfyoung added the Dtype Conversions (Unexpected or buggy dtype conversions) and Internals (Related to non-user accessible pandas implementation) labels on Oct 30, 2018
@h-vetinari

@jreback
I checked, but didn't find anything that looked relevant:
https://github.com/pandas-dev/pandas/issues?q=is%3Aissue+infer+empty+is%3Aopen

jreback commented Oct 30, 2018

#17261

@h-vetinari

@jreback, from #23422:

> @h-vetinari you need to respond on the issue
> this is as expected behavior

Also responding here as requested. Even if this is expected, it is inconsistent; see

>>> lib.infer_dtype(pd.Index([]))
'empty'
>>> lib.infer_dtype(pd.Series([]))
'floating'

It's related to, but different from, the issue you linked: #17261

The second aspect is an oversight in #17066. An all-NA object column with skipna=True should clearly not return 'floating' (and this is causing errors in #23167).

@jreback added this to the 0.24.0 milestone on Nov 1, 2018