```python
In [1]: import numpy as np; import pandas as pd; pd.__version__
Out[1]: '0.25.0.dev0+180.g016141d'

In [2]: iv = pd.Interval(pd.Timestamp.min, pd.Timestamp.max)

In [3]: ii = pd.IntervalIndex([iv])

In [4]: ii.length
---------------------------------------------------------------------------
OverflowError: Overflow in int64 addition

In [5]: ii.mid
---------------------------------------------------------------------------
OverflowError: Overflow in int64 addition
```
Note that these operations work on the `Interval` object itself, albeit with the length being cast to `datetime.timedelta`:
```python
In [6]: iv.length
Out[6]: datetime.timedelta(213503, 84873, 709550)

In [7]: iv.mid
Out[7]: Timestamp('1970-01-01 00:00:00')
```
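For context, the length in `Out[6]` is longer than any span a nanosecond-resolution `int64` timedelta can hold. The sketch below uses only the standard library and NumPy (not pandas internals) and compares the value copied from `Out[6]` with the largest such span:

```python
import datetime

import numpy as np

# Largest span representable as int64 nanoseconds (the timedelta64[ns] range),
# converted to whole microseconds so datetime.timedelta can hold it.
max_ns = np.iinfo(np.int64).max
max_td = datetime.timedelta(microseconds=max_ns // 1000)
print(max_td.days)  # 106751

# iv.length as reported in Out[6]
observed = datetime.timedelta(213503, 84873, 709550)
print(observed > max_td)  # True
```

The observed length is roughly twice the representable maximum, so no `timedelta64[ns]` result can be correct for `ii.length` here.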
More generally, the overflow can also occur silently with integer endpoints. For example, with `mid`:
```python
In [8]: ii64 = np.iinfo(np.int64)

In [9]: iv2 = pd.Interval(ii64.max - 1, ii64.max)

In [10]: ii2 = pd.IntervalIndex([iv2])

In [11]: ii2.mid
Out[11]: Float64Index([-1.5], dtype='float64')

In [12]: iv2.mid
Out[12]: 9.223372036854776e+18
```
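The negative midpoint is what two's-complement wraparound produces. Assuming the index computes the midpoint as `(left + right) / 2` on `int64` arrays (consistent with `Out[11]`, but not checked against the pandas source), plain NumPy arrays and Python integers reproduce both results:

```python
import numpy as np

INT64_MAX = np.iinfo(np.int64).max  # a Python int

# Endpoints of iv2 as one-element int64 arrays, as an index would store them
left = np.array([INT64_MAX - 1], dtype=np.int64)
right = np.array([INT64_MAX], dtype=np.int64)

# Assumed formula: vectorized int64 addition wraps silently,
# (2**63 - 2) + (2**63 - 1) -> -3, so the "midpoint" is -1.5
wrapped_mid = (left + right) / 2
print(wrapped_mid)  # [-1.5]

# Python ints have arbitrary precision, so the scalar computation stays correct
exact_mid = ((INT64_MAX - 1) + INT64_MAX) / 2
print(exact_mid)  # 9.223372036854776e+18
```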
And with `length`:
```python
In [13]: iv3 = pd.Interval(ii64.min, ii64.max)

In [14]: ii3 = pd.IntervalIndex([iv3])

In [15]: ii3.length
Out[15]: Int64Index([-1], dtype='int64')

In [16]: iv3.length
Out[16]: 18446744073709551615
```
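Likewise, assuming the length is computed as `right - left` on `int64` arrays, `-1` is the wrapped value of `2**63 - 1 - (-2**63)`. The wrapped result has exactly the same 64-bit pattern as the correct answer stored as an unsigned integer, which reinterpreting the buffer with `ndarray.view` shows:

```python
import numpy as np

info = np.iinfo(np.int64)
left = np.array([info.min], dtype=np.int64)
right = np.array([info.max], dtype=np.int64)

# int64 subtraction wraps: (2**63 - 1) - (-2**63) = 2**64 - 1, stored as -1
wrapped_length = right - left
print(wrapped_length)  # [-1]

# Reinterpreting the same bits as uint64 recovers the true length
print(wrapped_length.view(np.uint64))  # [18446744073709551615]

# Python ints give the same value as the scalar Interval.length in Out[16]
print(info.max - info.min)  # 18446744073709551615
```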
Note that all of the examples above behave the same way with `IntervalArray` in place of `IntervalIndex`.
Problem description
The `IntervalIndex` and `IntervalArray` implementations of `length` and `mid` can raise an `OverflowError`, and in some cases the overflow occurs silently instead.
Expected Output
It might be possible for `Out[5]` to return the correct value with a different implementation, since the true midpoint (`1970-01-01`, as `iv.mid` shows) is well within the `Timestamp` bounds.
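One possible approach, sketched under the assumption that datetime-like endpoints are handled as `int64` nanoseconds: halve each endpoint before adding, so no intermediate value leaves the `int64` range. `overflow_safe_midpoint` is a hypothetical helper, not a pandas function, and it rounds the midpoint down to a whole nanosecond:

```python
import numpy as np

def overflow_safe_midpoint(left, right):
    """Hypothetical helper: floor((left + right) / 2) for int64 arrays without overflow."""
    left = np.asarray(left, dtype=np.int64)
    right = np.asarray(right, dtype=np.int64)
    # Halving first keeps the sum within int64; NumPy's % takes the sign of
    # the divisor, so each remainder is 0 or 1 and the carry restores the floor.
    halves = left // 2 + right // 2
    carry = (left % 2 + right % 2) // 2
    return halves + carry

# Nanosecond values at the extremes of the datetime64[ns] range
# (int64 min itself is excluded because it encodes NaT).
info = np.iinfo(np.int64)
left = np.array([info.min + 1], dtype=np.int64)
right = np.array([info.max], dtype=np.int64)

mid_ns = overflow_safe_midpoint(left, right)
print(mid_ns)  # [0]
print(mid_ns.view("M8[ns]"))
```

Viewing the result as `datetime64[ns]` gives the Unix epoch, matching `iv.mid`.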
I'm not sure how much can be done for the other errors, given that the correct values are out of bounds for their dtypes. For the integer examples, it may be possible to get the correct answer by returning an array of `uint64` dtype, but I'm not sure offhand how that would be implemented.
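A minimal sketch of the `uint64` idea for integer endpoints, assuming `left <= right` as for any valid interval: reinterpret both endpoints as `uint64` and subtract. Unsigned subtraction is modulo 2**64, and since a valid length is at most 2**64 - 1 the result is exact. A float midpoint can then be taken as `left + length / 2` without an overflowing sum. `uint64_length` and `float_mid` are hypothetical helper names, not pandas APIs:

```python
import numpy as np

def uint64_length(left, right):
    """Hypothetical helper: exact right - left for int64 arrays, returned as uint64."""
    left = np.asarray(left, dtype=np.int64)
    right = np.asarray(right, dtype=np.int64)
    # Reinterpret the bits as unsigned; uint64 subtraction wraps modulo 2**64,
    # which equals the true difference whenever 0 <= right - left < 2**64.
    return right.view(np.uint64) - left.view(np.uint64)

def float_mid(left, right):
    """Hypothetical helper: float64 midpoint computed as left + length / 2."""
    length = uint64_length(left, right)
    left_f = np.asarray(left, dtype=np.int64).astype(np.float64)
    return left_f + length.astype(np.float64) / 2

info = np.iinfo(np.int64)
print(uint64_length([info.min], [info.max]))  # [18446744073709551615]
print(float_mid([info.max - 1], [info.max]))
```

With these inputs the length matches `iv3.length` and the midpoint matches `iv2.mid` (up to float64 rounding), rather than the wrapped `-1` and `-1.5`.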
xref #25485

Output of `pd.show_versions()`
INSTALLED VERSIONS
commit: 016141d
python: 3.6.8.final.0
python-bits: 64
OS: Linux
OS-release: 4.14.29-galliumos
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.25.0.dev0+180.g016141d
pytest: 4.0.2
pip: 18.1
setuptools: 40.6.3
Cython: 0.28.3
numpy: 1.14.5
scipy: None
pyarrow: None
xarray: None
IPython: 7.2.0
sphinx: 1.8.2
patsy: None
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 3.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None