BUG: DataFrame to_dict method raises Out of bounds nanosecond timestamp #39389

Closed
3 tasks done
aniaan opened this issue Jan 25, 2021 · 11 comments
Labels
Bug, Datetime (Datetime data dtype), Dtype Conversions (Unexpected or buggy dtype conversions), Needs Discussion (Requires discussion from core team before further action)

Comments

@aniaan
Contributor

aniaan commented Jan 25, 2021

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.



Code Sample, a copy-pastable example

import pandas as pd
from datetime import datetime

series = pd.Series([datetime(year=4172, month=12, day=31)])
series.to_dict()

# The following raises pandas._libs.tslibs.np_datetime.OutOfBoundsDatetime:
# Out of bounds nanosecond timestamp: 4172-12-31 00:00:00
d = {'d1': [datetime(year=4172, month=12, day=31)]}
df = pd.DataFrame(data=d)
df.to_dict(orient='records')

Problem description

The default maximum date value in our Oracle database is 4172-12-31 00:00:00, so a lot of our data contains it. We read the data with pandas + sqlalchemy and then call to_dict() to return the result to the front end for display, but an exception is raised at that point.

I think to_dict does not need to check the date range. The datetime value is valid at the language level, and to_dict returns Python objects that Python itself allows, so we should not add restrictions here.

into_c((k, maybe_box_datetimelike(v)) for k, v in row.items())

#37648 and #37571 may have something to do with it

Expected Output

Output of pd.show_versions()

INSTALLED VERSIONS

commit : 37b5800
python : 3.8.6.final.0
python-bits : 64
OS : Darwin
OS-release : 20.2.0
Version : Darwin Kernel Version 20.2.0: Wed Dec 2 20:39:59 PST 2020; root:xnu-7195.60.75~1/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : en_US.UTF-8
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.3.0.dev0+529.g37b5800af
numpy : 1.19.5
pytz : 2020.5
dateutil : 2.8.1
pip : 21.0
setuptools : 49.2.1
Cython : 0.29.21
pytest : 6.2.1
hypothesis : 6.0.3
sphinx : 3.4.3
blosc : 1.10.2
feather : None
xlsxwriter : 1.3.7
lxml.etree : 4.6.2
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.19.0
pandas_datareader: None
bs4 : 4.9.3
bottleneck : 1.3.2
fsspec : 0.8.5
fastparquet : 0.5.0
gcsfs : 0.7.1
matplotlib : 3.3.3
numexpr : 2.7.2
odfpy : None
openpyxl : 3.0.6
pandas_gbq : None
pyarrow : 2.0.0
pyxlsb : None
s3fs : 0.5.2
scipy : 1.6.0
sqlalchemy : 1.3.22
tables : 3.6.1
tabulate : 0.8.7
xarray : 0.16.2
xlrd : 2.0.1
xlwt : 1.3.0
numba : 0.52.0

@aniaan aniaan added the Bug and Needs Triage labels Jan 25, 2021
@arw2019
Member

arw2019 commented Jan 25, 2021

This isn't to do with to_dict per se. You're getting the error because to_dict will return pd.Timestamp and pd.Timestamp overflows for the example you gave:

In [10]: pd.Timestamp(datetime.datetime(year=4172, month=12, day=31))
---------------------------------------------------------------------------
OutOfBoundsDatetime                       Traceback (most recent call last)
<ipython-input-10-3ca54af082ef> in <module>
----> 1 pd.Timestamp(datetime.datetime(year=4172, month=12, day=31))

pandas/_libs/tslibs/timestamps.pyx in pandas._libs.tslibs.timestamps.Timestamp.__new__()

pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.convert_to_tsobject()

pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.convert_datetime_to_tsobject()

pandas/_libs/tslibs/np_datetime.pyx in pandas._libs.tslibs.np_datetime.check_dts_bounds()

OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 4172-12-31 00:00:00
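
For context, the nanosecond bound in the error above can be approximated with the standard library alone. This is a minimal sketch, assuming (as the error message suggests) that the timestamp is stored as a signed 64-bit count of nanoseconds since 1970-01-01. `fits_in_ns_range` is a hypothetical helper, not a pandas function, and the bounds are truncated to microseconds because `datetime` has no nanosecond field.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
INT64_MAX = 2**63 - 1  # largest signed 64-bit integer

# Assumption: nanoseconds since the epoch in an int64.
# Dividing by 1000 converts nanoseconds to whole microseconds.
NS_MAX = EPOCH + timedelta(microseconds=INT64_MAX // 1000)
NS_MIN = EPOCH - timedelta(microseconds=INT64_MAX // 1000)


def fits_in_ns_range(value: datetime) -> bool:
    """Hypothetical helper: does a naive datetime fit the approximate range?"""
    return NS_MIN <= value <= NS_MAX


print(NS_MIN, NS_MAX)
print(fits_in_ns_range(datetime(2021, 1, 25)))
print(fits_in_ns_range(datetime(4172, 12, 31)))
```

Under that assumption the upper bound falls in the year 2262, far short of 4172, which is why the date from the report cannot be represented at nanosecond resolution.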

@arw2019 arw2019 added the Needs Discussion, Datetime, and Dtype Conversions labels and removed the Needs Triage label Jan 25, 2021
@arw2019
Member

arw2019 commented Jan 25, 2021

we decided it's ok to return Timestamp in #37571 (comment)

cc @jreback do we plan to change behavior here? Otherwise IMO no action on this

@aniaan
Contributor Author

aniaan commented Jan 25, 2021

This isn't to do with to_dict per se. You're getting the error because to_dict will return pd.Timestamp and pd.Timestamp overflows for the example you gave:

In [10]: pd.Timestamp(datetime.datetime(year=4172, month=12, day=31))
---------------------------------------------------------------------------
OutOfBoundsDatetime                       Traceback (most recent call last)
<ipython-input-10-3ca54af082ef> in <module>
----> 1 pd.Timestamp(datetime.datetime(year=4172, month=12, day=31))

pandas/_libs/tslibs/timestamps.pyx in pandas._libs.tslibs.timestamps.Timestamp.__new__()

pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.convert_to_tsobject()

pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.convert_datetime_to_tsobject()

pandas/_libs/tslibs/np_datetime.pyx in pandas._libs.tslibs.np_datetime.check_dts_bounds()

OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 4172-12-31 00:00:00

Yes, it's essentially an overflow of pd.Timestamp because of the layer of processing done inside maybe_box_datetimelike. But what I'm trying to say is that the operation to_dict itself shouldn't have an exception here.

@asishm
Contributor

asishm commented Jan 25, 2021

we decided it's ok to return Timestamp in #37571 (comment)

IMO that decision may not have considered the limitations of pandas Timestamp (with respect to out-of-bounds errors).

It would be good if Python native datetimes were returned. The OP of the original issue also added a comment explicitly requesting that (#21256 (comment))

@arw2019
Member

arw2019 commented Jan 25, 2021

Happy to do that within the open PR that addresses to_dict return types pending opinions from @jreback and other core devs

@arw2019
Member

arw2019 commented Jan 25, 2021

I wonder if one reason to keep timestamp is for timezone handling

Anyways will wait for others to chime in

@simonjayhawkins
Member

we decided it's ok to return Timestamp in #37571 (comment)

cc @jreback do we plan to change behavior here? Otherwise IMO no action on this

I think #29824 covers this. so maybe can close this as duplicate and continue discussion there.

@arw2019
Member

arw2019 commented Jan 31, 2021

I think #29824 covers this. so maybe can close this as duplicate and continue discussion there.

Agreed. I'll reference this discussion there - closing this

@BEANNAN @asishm feel free to ping us on #29824 if we're slow to respond

@arw2019
Member

arw2019 commented Feb 7, 2021

FYI @BEANNAN @asishm from #37648 (comment)

Timestamps are Python types (as they are a subclass of datetime.datetime).

@asishm
Contributor

asishm commented Feb 8, 2021

They may well be subclasses of datetime.datetime, but datetime.datetime isn't limited to the range between pd.Timestamp.min and pd.Timestamp.max. So this would be a bug whenever the datetime falls outside that range.
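
The two points being weighed here can be checked side by side. The snippet below is only an illustration and uses nothing beyond `issubclass`, `datetime.max` and `pd.Timestamp.max`. It shows that the subclass relationship holds while the supported ranges differ.

```python
from datetime import datetime

import pandas as pd

# Timestamp is indeed a subclass of the standard-library datetime...
print(issubclass(pd.Timestamp, datetime))

# ...but the two types advertise different upper bounds.
print("datetime.max:    ", datetime.max)
print("pd.Timestamp.max:", pd.Timestamp.max)

# The date from the report is valid for datetime and lies past Timestamp.max.
reported = datetime(4172, 12, 31)
print(reported.year > pd.Timestamp.max.year)
```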

@jrounds

jrounds commented Oct 1, 2021

+1, I was going to file a bug report but it was already reported. The only workaround at the moment is to cast out of datetime, correct?
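
One possible workaround, sketched here without confirmation from the maintainers, is to build the records by iterating over the rows directly instead of calling to_dict. It assumes the out-of-range dates sit in an object-dtype column, as in the original report, so iteration hands back the stored datetime objects. `to_records_unboxed` is a hypothetical helper name.

```python
from datetime import datetime

import pandas as pd


def to_records_unboxed(frame: pd.DataFrame) -> list:
    """Hypothetical helper: one dict per row, built from the iterated row values."""
    columns = list(frame.columns)
    return [
        dict(zip(columns, row))
        for row in frame.itertuples(index=False, name=None)
    ]


df = pd.DataFrame({"d1": [datetime(4172, 12, 31)], "n": [1]})
records = to_records_unboxed(df)
print(records)
```

Casting the column to strings before calling to_dict would be another option if the consumer only needs to display the values.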
