Index.get_indexer() throws when None and at least one of np.nan, pd.NaT are present in input #22332

Closed
realead opened this issue Aug 14, 2018 · 1 comment · Fixed by #22296
Labels
Bug Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate

realead commented Aug 14, 2018

Code Sample

import numpy as np
import pandas as pd
arr = pd.unique(np.array([pd.NaT, None], dtype=object))
indexer = pd.Index(arr, dtype=object).get_indexer([])  # raises on pandas 0.23.4

Problem description

The call to get_indexer() raises "Reindexing only valid with uniquely valued Index objects".

The same happens for [np.nan, None].
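A minimal sketch for checking both input pairs mentioned above on whatever pandas version is installed. The helper name `try_get_indexer` is hypothetical, and the broad `except` is an assumption made so the script reports the failure instead of stopping; the outcome depends on the pandas version.

```python
import numpy as np
import pandas as pd


def try_get_indexer(values):
    """Return (True, indexer) on success or (False, error message) on failure."""
    # Same steps as the code sample: deduplicate, build an object Index, look up nothing.
    arr = pd.unique(np.array(values, dtype=object))
    try:
        indexer = pd.Index(arr, dtype=object).get_indexer([])
    except Exception as exc:  # assumption: catch broadly, exception type may vary by version
        return False, str(exc)
    return True, indexer


for values in ([pd.NaT, None], [np.nan, None]):
    ok, detail = try_get_indexer(values)
    print(values, "->", "ok" if ok else f"raised: {detail}")
```

On an affected version both pairs should report the "Reindexing only valid with uniquely valued Index objects" message; on a version containing the fix they should report "ok".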

Expected Output

It should not raise. One would also expect that the array returned by pd.unique() is actually treated as consisting of unique elements.

That is not the only problem in , but one blocker for #22305.

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-53-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.4
pytest: 3.2.1
pip: 10.0.1
setuptools: 36.5.0.post20170921
Cython: 0.28.3
numpy: 1.13.1
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.1.0
sphinx: 1.6.3
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.8
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 0.9.8
lxml: 3.8.0
bs4: 4.6.0
html5lib: 0.9999999
sqlalchemy: 1.1.13
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: 0.1.3
fastparquet: None
pandas_gbq: None
pandas_datareader: None

realead changed the title from "Index.get_index() throws when at least two of np.nan,None,pd.NaT are present" to "Index.get_index() throws when None and at least one of np.nan, pd.NaT are present in input" on Aug 14, 2018

realead commented Aug 14, 2018

The inconsistency is in which NA values get mangled. In unique, None is kept as its own value while the other NA values are collapsed into a single nan:

            # `val is None` below is exception to prevent mangling of None and
            # other NA values; note however that other NA values (ex: pd.NaT
            # and np.nan) will still get mangled, so may not be a permanent
            # solution; see GH 20866
            if not checknull(val) or val is None:
                k = kh_get_pymap(self.table, <PyObject*>val)
                if k == self.table.n_buckets:
                    kh_put_pymap(self.table, <PyObject*>val, &ret)
                    uniques.append(val)
            elif not seen_na:
                seen_na = 1
                uniques.append(nan)

Elsewhere in the class, in particular in map_locations, None is mangled together with the other NA values:

if val != val or val is None:
    val = na_sentinel
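To make the mismatch concrete, here is a pure-Python model of the two rules quoted above. It is only a sketch, not the Cython hashtable code: `is_null` is a simplified stand-in for checknull, and `NA_SENTINEL`, `unique_like` and `map_locations_keys` are hypothetical names introduced for illustration.

```python
NA_SENTINEL = object()  # hypothetical stand-in for na_sentinel
NAN = float("nan")


def is_null(val):
    # Simplified stand-in for checknull: None or a NaN-like value.
    return val is None or val != val


def unique_like(values):
    """Model the quoted branch of unique: None is kept, other NAs collapse to nan."""
    seen = set()
    uniques = []
    seen_na = False
    for val in values:
        if not is_null(val) or val is None:
            if val not in seen:
                seen.add(val)
                uniques.append(val)
        elif not seen_na:
            seen_na = True
            uniques.append(NAN)
    return uniques


def map_locations_keys(values):
    """Model the quoted map_locations rule: None and NaN-likes share one key."""
    keys = []
    for val in values:
        if val != val or val is None:
            val = NA_SENTINEL
        keys.append(val)
    return keys


uniques = unique_like([float("nan"), None])
keys = map_locations_keys(uniques)
print(len(uniques), "unique values, but", len(set(map(id, keys))), "distinct key(s)")
```

In this model the first rule reports two unique values (nan and None), while the second rule maps both to the same key, which is consistent with the "uniquely valued" check failing on the output of pd.unique().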

realead changed the title from "Index.get_index() throws when None and at least one of np.nan, pd.NaT are present in input" to "Index.get_indexer() throws when None and at least one of np.nan, pd.NaT are present in input" on Aug 16, 2018
jreback added the Bug and Missing-data labels on Sep 5, 2018
jreback added this to the 0.24.0 milestone on Sep 5, 2018