
All-NaN MultiIndex level has different dtype than all-NaN flat Index #17929

Closed
toobaz opened this issue Oct 20, 2017 · 5 comments · Fixed by #17934
Labels: Missing-data, MultiIndex


@toobaz
Member

toobaz commented Oct 20, 2017

Code Sample, a copy-pastable example if possible

In [3]: values = [np.nan, np.nan]

In [4]: pd.Index(values).dtype
Out[4]: dtype('float64')

In [5]: pd.MultiIndex.from_arrays([values]).levels[0].dtype
Out[5]: dtype('O')

In [6]: pd.MultiIndex.from_arrays([values, [2, 3]]).levels[0].dtype
Out[6]: dtype('O')

Problem description

Yes, I know, "who cares?". But this is biting me while fixing #17924.

Expected Output

The same dtype in both cases. I tend to think dtype('float64') is both preferable and more backwards-compatible.

Output of pd.show_versions()

INSTALLED VERSIONS

commit: 51c5f4d
python: 3.5.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.0-3-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: it_IT.UTF-8
LOCALE: it_IT.UTF-8

pandas: 0.21.0rc1+26.g51c5f4d2a.dirty
pytest: 3.0.6
pip: 9.0.1
setuptools: None
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
pyarrow: None
xarray: None
IPython: 5.1.0.dev
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.3.0
numexpr: 2.6.1
feather: 0.3.1
matplotlib: 2.0.0
openpyxl: None
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.6
lxml: None
bs4: 4.5.3
html5lib: 0.999999999
sqlalchemy: 1.0.15
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.2.1

@jorisvandenbossche
Member

I am not sure this is possible to solve. NaNs are not really first-class citizens in MIs, which means the level is actually empty, and the dtype of something empty is object.
(at least this is the case for Index, not yet for Series #17261)
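Joris's explanation can be illustrated with a minimal pure-Python sketch of factorization with a missing-value sentinel. This is not pandas' code: `is_missing` and `factorize` are hypothetical helpers, and the sketch only assumes, as pandas does for MultiIndex codes, that a missing entry is recorded as code -1 instead of being stored in the level.

```python
import math
from typing import Dict, Hashable, List, Sequence, Tuple


def is_missing(value: object) -> bool:
    """Treat None and float NaN as missing (simplified stand-in for isna)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def factorize(values: Sequence[Hashable]) -> Tuple[List[int], List[Hashable]]:
    """Map each value to an integer code; missing values get code -1 and
    are never added to the uniques, which play the role of the level."""
    codes: List[int] = []
    uniques: List[Hashable] = []
    positions: Dict[Hashable, int] = {}
    for value in values:
        if is_missing(value):
            codes.append(-1)
            continue
        if value not in positions:
            positions[value] = len(uniques)
            uniques.append(value)
        codes.append(positions[value])
    return codes, uniques


nan = float("nan")
print(factorize([nan, nan]))  # ([-1, -1], [])
print(factorize([]))          # ([], [])
print(factorize([2, 3, 2]))   # ([0, 1, 0], [2, 3])
```

For both the all-NaN input and the truly empty input the level (`uniques`) comes out empty, so the level alone cannot tell the two apart; only the codes differ in length.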

@toobaz
Member Author

toobaz commented Oct 21, 2017

NaNs are not really first class citizens in MIs, which means the level is actually empty,

Here I follow you

and the dtype of something empty is object.

... here I don't: I think we should be able to have pd.MultiIndex.from_product([[], []]) (0-length level) stored as an empty object array, but my example above (>0-length level with only missing values) stored as an empty float array.

@jorisvandenbossche
Member

But how do you distinguish a 'real' empty MI from an MI with only NaNs? The actual level is the same for both: an empty Index.

toobaz added a commit to toobaz/pandas that referenced this issue Oct 21, 2017
@toobaz
Member Author

toobaz commented Oct 21, 2017

But how do you distinguish a 'real' empty MI or an MI with only NaNs ?

I would do it at initialization; see #17934.
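One way to read "do it at initialization" is to infer the level dtype from the original values before the missing entries are dropped. The sketch below is hypothetical and is not the patch in #17934: `infer_level_dtype` and `build_level` are illustrative names, dtypes are plain strings, and the inference rule (numeric input with any float gives float64, all-int input gives int64, everything else, including empty and None-only input, gives object) is an assumption chosen to match the expectations in this thread.

```python
import math
from typing import Dict, List, Sequence, Tuple


def is_missing(value: object) -> bool:
    """Treat None and float NaN as missing (simplified stand-in for isna)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def infer_level_dtype(values: Sequence[object]) -> str:
    """Hypothetical rule: pick the level dtype from the *original* values."""
    if len(values) == 0:
        # Truly empty input: nothing to infer from, keep the object default.
        return "object"
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    if numeric:
        # Float NaN is a float, so all-NaN input lands here as float64.
        return "float64" if any(isinstance(v, float) for v in values) else "int64"
    # Strings, None, mixed types, ...: fall back to object.
    return "object"


def build_level(values: Sequence[object]) -> Tuple[List[int], List[object], str]:
    """Return (codes, level_values, level_dtype) for a single level."""
    dtype = infer_level_dtype(values)  # decided before NaNs are dropped
    codes: List[int] = []
    level: List[object] = []
    positions: Dict[object, int] = {}
    for v in values:
        if is_missing(v):
            codes.append(-1)  # missing values never enter the level
            continue
        if v not in positions:
            positions[v] = len(level)
            level.append(v)
        codes.append(positions[v])
    return codes, level, dtype


nan = float("nan")
print(build_level([nan, nan]))    # ([-1, -1], [], 'float64')
print(build_level([]))            # ([], [], 'object')
print(build_level([None, None]))  # ([-1, -1], [], 'object')
```

Because the dtype is captured before the NaNs are turned into -1 codes, an all-NaN float input keeps float64 while an empty input keeps the object default, even though both levels end up empty.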

@jreback
Contributor

jreback commented Oct 21, 2017

In [13]: pd.MultiIndex.from_arrays([[pd.NaT, pd.NaT]]).levels[0]
Out[13]: DatetimeIndex([], dtype='datetime64[ns]', freq=None)

In [14]: pd.MultiIndex.from_arrays([[np.nan, np.nan]]).levels[0]
Out[14]: Index([], dtype='object')

In [15]: pd.MultiIndex.from_arrays([[None, None]]).levels[0]
Out[15]: Index([], dtype='object')

There is some ambiguity here, e.g. whether [14] and [15] should match (since we use np.nan generically). I would be OK with making [14] float.

sinhrks added the Missing-data and MultiIndex labels Oct 23, 2017
jreback added this to the 0.22.0 milestone Oct 28, 2017