Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: lib.infer_type broken for IntervalArray / PeriodArray #23553

Closed
h-vetinari opened this issue Nov 7, 2018 · 8 comments · Fixed by #37367
Closed

BUG: lib.infer_type broken for IntervalArray / PeriodArray #23553

h-vetinari opened this issue Nov 7, 2018 · 8 comments · Fixed by #37367
Labels
Bug Dtype Conversions Unexpected or buggy dtype conversions ExtensionArray Extending pandas with custom dtypes or arrays. Regression Functionality that used to work in a prior pandas version
Milestone

Comments

@h-vetinari
Copy link
Contributor

While working on #23167, I found a regression in IntervalArray / PeriodArray - I'm guessing it's related to the whole EA effort. @jorisvandenbossche @TomAugspurger

>>> import pandas as pd
>>> import pandas._libs.lib as lib
>>> s = pd.Series([pd.Period(2013), pd.Period(2018)], dtype='category')
>>> lib.infer_dtype(s.values.categories)
Traceback (most recent call last):
  File "pandas\_libs\lib.pyx", line 1166, in pandas._libs.lib.infer_dtype
    values = getattr(value, '_values', getattr(value, 'values', value))
TypeError: Cannot convert PeriodArray to numpy.ndarray
>>>
>>> t = pd.Series([pd.Interval(0, 1), pd.Interval(0, 2)], dtype='category')
>>> lib.infer_dtype(t.values.categories)
Traceback (most recent call last):
  File "pandas\_libs\lib.pyx", line 1166, in pandas._libs.lib.infer_dtype
    values = getattr(value, '_values', getattr(value, 'values', value))
TypeError: Cannot convert IntervalArray to numpy.ndarray

This was still working in v0.23.4:

>>> import pandas as pd
>>> import pandas._libs.lib as lib
>>> pd.__version__
'0.23.4'
>>> s = pd.Series([pd.Period(2013), pd.Period(2018)], dtype='category')
>>> lib.infer_dtype(s.values.categories)
'period'
>>>
>>> t = pd.Series([pd.Interval(0, 1), pd.Interval(0, 2)], dtype='category')
>>> lib.infer_dtype(t.values.categories)
'interval'

This is due to the fact that lib.infer_dtype tries
values = getattr(value, '_values', getattr(value, 'values', value))
and both IntervalArray / PeriodArray do not return a numpy array for _values anymore. For PeriodArray, it would work if _values and values were switched, but for IntervalArray, both _values and values return the same (non-numpy-array) result.

@gfyoung gfyoung added Dtype Conversions Unexpected or buggy dtype conversions Regression Functionality that used to work in a prior pandas version ExtensionArray Extending pandas with custom dtypes or arrays. and removed Regression Functionality that used to work in a prior pandas version labels Nov 8, 2018
@gfyoung gfyoung added this to the 0.24.0 milestone Nov 8, 2018
@gfyoung
Copy link
Member

gfyoung commented Nov 8, 2018

While technically not a regression for us (regression means that it broke in the most recent release but was working in the one before that), this should be "restored" to what it was before we release 0.24.0.

A git bisect would be good to determine the cause.

@gfyoung gfyoung added the Bug label Nov 8, 2018
@jorisvandenbossche
Copy link
Member

Not that I want to discuss semantics too much, but this is for sure a regression :)
Just one that does not need to end up in an actual release if we fix it now, thanks to @h-vetinari discovering it on master.

@jorisvandenbossche jorisvandenbossche added Regression Functionality that used to work in a prior pandas version and removed Bug labels Nov 8, 2018
@jorisvandenbossche
Copy link
Member

I am not very familiar with the infer_dtype code, but I would think we should add 'period' (and the names of our other internal dtypes as well) to _TYPE_MAP, so the ExtensionArray will be properly recognized, just as Categorical is already recognized.

@TomAugspurger TomAugspurger mentioned this issue Nov 9, 2018
2 tasks
@TomAugspurger
Copy link
Contributor

TomAugspurger commented Nov 10, 2018

What's the expected dtype for

>>> lib.infer_dtype(integer_array([1, np.nan]))

Should we basically expect infer_dtype to throw away .dtype information, and treat it the same as [1, np.nan] (so mixed-integer-float)? Or do we add a new return value for integer-na?

cc @jreback I suspect you're the most familiar with this.

@jreback
Copy link
Contributor

jreback commented Nov 10, 2018

so for this example we would return Int64 (because it’s an actual array) but for a list i would expect the same as now (mixed integer float)

@TomAugspurger
Copy link
Contributor

Would you say 'integer-na' or 'nullable-integer', rather than Int64 to be more consistent with the other return values, which aren't dtypes (no itemsize)?

@jreback
Copy link
Contributor

jreback commented Nov 10, 2018

i think we could eventually add those
but the inference is used in indexing so i am not sure if changing it now would work

@jreback jreback modified the milestones: 0.24.0, Contributions Welcome Nov 25, 2018
@jorisvandenbossche jorisvandenbossche modified the milestones: Contributions Welcome, 0.24.0 Dec 10, 2018
@jreback jreback modified the milestones: 0.24.0, Contributions Welcome Jan 5, 2019
@h-vetinari
Copy link
Contributor Author

milestone ping pong, haha. ;-)

Would this still be acceptable during RC phase? I might be able to make some time.

@jreback jreback added this to the 1.2 milestone Nov 15, 2020
@jreback jreback modified the milestones: 1.2, Contributions Welcome Nov 25, 2020
@jreback jreback modified the milestones: Contributions Welcome, 1.3 Jan 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Dtype Conversions Unexpected or buggy dtype conversions ExtensionArray Extending pandas with custom dtypes or arrays. Regression Functionality that used to work in a prior pandas version
Projects
None yet
6 participants