pd.to_datetime() throws if caching is on with Null-like arguments #22305
Comments
Thanks for the report! Looking into it / a fix is certainly welcome.
We have this test:
Where currently this test with
In the end it comes down to the following difference between While unique mangles
I ask myself, whether this This issue kind of blocks PR #22296, because PR #22296 fixes the mangling of
This test should probably pass once it is fixed:
Hello guys,
It looks like this bug is back in business in the latest version, but a bit harder to trigger:
pandas versions
>>> pd.show_versions()
INSTALLED VERSIONS
commit : f2ca0a2
pandas : 1.1.1
How to reproduce:
import pandas as pd
s = pd.Series([pd.NaT] * 2000 + [None] * 2000, dtype='object')
pd.to_datetime(s)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/sbrochet/venvs/tmp-fa372ee62ee9bef/lib64/python3.8/site-packages/pandas/core/tools/datetimes.py", line 801, in to_datetime
result = arg.map(cache_array)
File "/home/sbrochet/venvs/tmp-fa372ee62ee9bef/lib64/python3.8/site-packages/pandas/core/series.py", line 3970, in map
new_values = super()._map_values(arg, na_action=na_action)
File "/home/sbrochet/venvs/tmp-fa372ee62ee9bef/lib64/python3.8/site-packages/pandas/core/base.py", line 1131, in _map_values
indexer = mapper.index.get_indexer(values)
File "/home/sbrochet/venvs/tmp-fa372ee62ee9bef/lib64/python3.8/site-packages/pandas/core/indexes/base.py", line 2980, in get_indexer
raise InvalidIndexError(
pandas.errors.InvalidIndexError: Reindexing only valid with uniquely valued Index objects
The key here is to have enough entries in the Series to trigger the caching system.
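The last frame of the traceback is a get_indexer call on the index of the cache mapping. As a minimal sketch of that failure mode in isolation (not the to_datetime code path itself), the snippet below shows that Index.get_indexer works on a unique index and raises InvalidIndexError once the index holds duplicate labels. The helper name lookup_positions is hypothetical and only used for illustration.

```python
import pandas as pd
from pandas.errors import InvalidIndexError


def lookup_positions(index_values, targets):
    """Return the positions of `targets` in an Index built from `index_values`.

    Returns None when the lookup is rejected because the index is not unique.
    (Hypothetical helper, for illustration only.)
    """
    index = pd.Index(index_values)
    try:
        return index.get_indexer(targets).tolist()
    except InvalidIndexError:
        # Same error class as in the traceback: the index has duplicate labels.
        return None


# Unique labels: the lookup succeeds.
print(lookup_positions([1, 2], [2]))

# Duplicate labels: get_indexer refuses to reindex.
print(lookup_positions([1, 1], [1]))
```

If the cache mapping ends up with two null-like keys that compare as the same label, the lookup would fail in the same way, which is consistent with the error message reported here.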
Please raise a new issue with the example.
Code Sample, a copy-pastable example if possible
Problem description
It results in error:
Expected Output
The same as with caching disabled:
result = pd.to_datetime([pd.NaT, None], cache=False)
DatetimeIndex(['NaT', 'NaT'], dtype='datetime64[ns]', freq=None)
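Since the expected output is defined in terms of cache=False, passing that argument explicitly may serve as a workaround on affected versions. The sketch below assumes only the public pd.to_datetime signature. The wrapper name to_datetime_nulls is hypothetical, and the larger input is borrowed from the pandas 1.1.1 reproduction in the comments.

```python
import pandas as pd


def to_datetime_nulls(values):
    """Convert null-like values with the cache disabled.

    Hypothetical wrapper, for illustration only: cache=False skips the
    unique-value cache whose lookup fails in the traceback.
    """
    return pd.to_datetime(values, cache=False)


# The two-element case from the issue body.
small = to_datetime_nulls([pd.NaT, None])
print(small)

# The larger Series from the follow-up comment.
large = to_datetime_nulls(
    pd.Series([pd.NaT] * 2000 + [None] * 2000, dtype="object")
)
print(large.isna().all())
```

Disabling the cache gives up the speed-up for inputs with many repeated values, so this is a stopgap rather than a fix.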
Output of pd.show_versions()
pandas: 0.23.4
pytest: 3.2.1
pip: 10.0.1
setuptools: 36.5.0.post20170921
Cython: 0.28.3
numpy: 1.13.1
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.1.0
sphinx: 1.6.3
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.8
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 0.9.8
lxml: 3.8.0
bs4: 4.6.0
html5lib: 0.9999999
sqlalchemy: 1.1.13
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: 0.1.3
fastparquet: None
pandas_gbq: None
pandas_datareader: None