BUG: DataFrame to_dict method raise Out of bounds nanosecond timestamp #39389

aniaan · 2021-01-25T03:52:35Z

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

import pandas as pd
from datetime import datetime

series = pd.Series([datetime(year=4172, month=12, day=31)])
series.to_dict()

# raises pandas._libs.tslibs.np_datetime.OutOfBoundsDatetime:
# Out of bounds nanosecond timestamp: 4172-12-31 00:00:00
d = {'d1': [datetime(year=4172, month=12, day=31)]}
df = pd.DataFrame(data=d)
df.to_dict(orient='records')

Problem description

The maximum default date value in our Oracle database is 4172-12-31 00:00:00, so there is a lot of data with this value. We read the data with pandas + sqlalchemy and then call to_dict() to return the result to the front end for display. At that point an exception is raised.

I think to_dict does not need to check the date range. The datetime value is valid at the Python language level, and to_dict only hands back Python objects, which Python itself allows, so there is no need to add restrictions here.

pandas/pandas/core/frame.py

Line 1632 in 37b5800

into_c((k, maybe_box_datetimelike(v)) for k, v in row.items())

#37648 and #37571 may have something to do with it

Expected Output

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit : 37b5800
python : 3.8.6.final.0
python-bits : 64
OS : Darwin
OS-release : 20.2.0
Version : Darwin Kernel Version 20.2.0: Wed Dec 2 20:39:59 PST 2020; root:xnu-7195.60.75~1/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : en_US.UTF-8
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.3.0.dev0+529.g37b5800af
numpy : 1.19.5
pytz : 2020.5
dateutil : 2.8.1
pip : 21.0
setuptools : 49.2.1
Cython : 0.29.21
pytest : 6.2.1
hypothesis : 6.0.3
sphinx : 3.4.3
blosc : 1.10.2
feather : None
xlsxwriter : 1.3.7
lxml.etree : 4.6.2
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.19.0
pandas_datareader: None
bs4 : 4.9.3
bottleneck : 1.3.2
fsspec : 0.8.5
fastparquet : 0.5.0
gcsfs : 0.7.1
matplotlib : 3.3.3
numexpr : 2.7.2
odfpy : None
openpyxl : 3.0.6
pandas_gbq : None
pyarrow : 2.0.0
pyxlsb : None
s3fs : 0.5.2
scipy : 1.6.0
sqlalchemy : 1.3.22
tables : 3.6.1
tabulate : 0.8.7
xarray : 0.16.2
xlrd : 2.0.1
xlwt : 1.3.0
numba : 0.52.0


arw2019 · 2021-01-25T04:36:22Z

This isn't to do with to_dict per se. You're getting the error because to_dict will return pd.Timestamp and pd.Timestamp overflows for the example you gave:

In [10]: pd.Timestamp(datetime.datetime(year=4172, month=12, day=31))
---------------------------------------------------------------------------
OutOfBoundsDatetime                       Traceback (most recent call last)
<ipython-input-10-3ca54af082ef> in <module>
----> 1 pd.Timestamp(datetime.datetime(year=4172, month=12, day=31))

pandas/_libs/tslibs/timestamps.pyx in pandas._libs.tslibs.timestamps.Timestamp.__new__()

pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.convert_to_tsobject()

pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.convert_datetime_to_tsobject()

pandas/_libs/tslibs/np_datetime.pyx in pandas._libs.tslibs.np_datetime.check_dts_bounds()

OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 4172-12-31 00:00:00
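For context on where the bound comes from: a nanosecond-resolution timestamp stored as a signed 64-bit count from the Unix epoch can only reach the year 2262. The sketch below reproduces that limit with the standard library alone. The names `NS_MAX`, `NS_MIN` and `fits_in_int64_ns` are hypothetical helpers for illustration, not pandas API, and the bounds are truncated to microseconds because `datetime` has no nanosecond field.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
INT64_MAX = 2**63 - 1  # largest signed 64-bit integer

# Largest / smallest naive datetimes whose offset from the epoch still fits
# in an int64 nanosecond count (truncated to whole microseconds).
NS_MAX = EPOCH + timedelta(microseconds=INT64_MAX // 1000)
NS_MIN = EPOCH - timedelta(microseconds=INT64_MAX // 1000)


def fits_in_int64_ns(dt):
    """Return True if a naive datetime lies inside the int64-nanosecond range."""
    return NS_MIN <= dt <= NS_MAX


print(NS_MAX)                                      # upper bound, in the year 2262
print(fits_in_int64_ns(datetime(2021, 1, 25)))     # inside the range
print(fits_in_int64_ns(datetime(4172, 12, 31)))    # the value from this issue
```

A check like this could be run on values before they are boxed, to decide whether they need special handling.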

arw2019 · 2021-01-25T04:40:36Z

we decided it's ok to return Timestamp in #37571 (comment)

cc @jreback do we plan to change behavior here? Otherwise IMO no action on this

aniaan · 2021-01-25T05:26:34Z

This isn't to do with to_dict per se. You're getting the error because to_dict will return pd.Timestamp and pd.Timestamp overflows for the example you gave:

In [10]: pd.Timestamp(datetime.datetime(year=4172, month=12, day=31))
---------------------------------------------------------------------------
OutOfBoundsDatetime                       Traceback (most recent call last)
<ipython-input-10-3ca54af082ef> in <module>
----> 1 pd.Timestamp(datetime.datetime(year=4172, month=12, day=31))

pandas/_libs/tslibs/timestamps.pyx in pandas._libs.tslibs.timestamps.Timestamp.__new__()

pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.convert_to_tsobject()

pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.convert_datetime_to_tsobject()

pandas/_libs/tslibs/np_datetime.pyx in pandas._libs.tslibs.np_datetime.check_dts_bounds()

OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 4172-12-31 00:00:00

Yes, it's essentially a pd.Timestamp overflow caused by the processing done inside maybe_box_datetimelike. But my point is that the to_dict operation itself shouldn't raise an exception here.

asishm · 2021-01-25T05:50:14Z

we decided it's ok to return Timestamp in #37571 (comment)

IMO that decision may not have considered the limits of pandas Timestamp (with respect to OutOfBoundsDatetime errors).

It would be good if native Python datetimes were returned. The OP of the original issue also added a comment explicitly requesting that (#21256 (comment))

arw2019 · 2021-01-25T05:55:31Z

Happy to do that within the open PR that addresses to_dict return types pending opinions from @jreback and other core devs

arw2019 · 2021-01-25T06:01:44Z

I wonder if one reason to keep timestamp is for timezone handling

Anyways will wait for others to chime in

simonjayhawkins · 2021-01-25T11:51:37Z

we decided it's ok to return Timestamp in #37571 (comment)

cc @jreback do we plan to change behavior here? Otherwise IMO no action on this

I think #29824 covers this. so maybe can close this as duplicate and continue discussion there.

arw2019 · 2021-01-31T16:45:21Z

I think #29824 covers this. so maybe can close this as duplicate and continue discussion there.

Agreed. I'll reference this discussion there - closing this

@BEANNAN @asishm feel free to ping us on #29824 if we're slow to respond

arw2019 · 2021-02-07T20:02:17Z

FYI @BEANNAN @asishm from #37648 (comment)

Timestamp are python types. (as they are a subclass of datetime.datetime)

asishm · 2021-02-08T04:45:45Z

They very well might be subclasses of datetime.datetime, but datetime.datetime isn't limited to the range between pd.Timestamp.min and pd.Timestamp.max. So this would be a bug whenever the datetime falls outside of that range.

jrounds · 2021-10-01T17:31:00Z

+1, I was going to file a bug report but it was already reported. Is the only workaround at the moment to cast out of datetime?
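One way to read "cast out of datetime" is to turn the values into ISO 8601 strings before the data reaches to_dict, which also suits the front-end display use case from the original report. This is a minimal sketch using only the standard library; `jsonable` is a hypothetical helper, not part of pandas. It is an assumption, not something confirmed in this thread, that applying it to the affected column (for example with `Series.map`) before calling to_dict avoids the Timestamp boxing.

```python
from datetime import date, datetime


def jsonable(value):
    """Return an ISO 8601 string for date/datetime values, anything else unchanged."""
    if isinstance(value, (datetime, date)):
        return value.isoformat()
    return value


# Same data as in the report, kept as plain Python objects.
d = {"d1": [datetime(year=4172, month=12, day=31)]}

# Convert every value column by column; strings have no timestamp bounds.
converted = {name: [jsonable(v) for v in values] for name, values in d.items()}
print(converted)
```

The trade-off is that the consumer receives strings rather than datetime objects, so any date arithmetic has to happen before the conversion or after parsing on the other side.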

aniaan added the Bug and Needs Triage labels Jan 25, 2021

arw2019 added the Needs Discussion, Datetime, and Dtype Conversions labels and removed the Needs Triage label Jan 25, 2021

arw2019 mentioned this issue Jan 31, 2021

BUG: Series.to_dict does not return native Python types #37648

Merged

5 tasks

arw2019 closed this as completed Jan 31, 2021

arw2019 mentioned this issue Jan 31, 2021

using to_dict with the 'records' orient produces different results from the default one #29824

Open
