BUG: Overflow in IntervalIndex/IntervalArray mid/length #25499

Open
jschendel opened this issue Mar 1, 2019 · 0 comments
Labels
Bug, Interval (Interval data type)

Comments

@jschendel
Member

Code Sample, a copy-pastable example if possible

xref #25485

In [1]: import numpy as np; import pandas as pd; pd.__version__
Out[1]: '0.25.0.dev0+180.g016141d'

In [2]: iv = pd.Interval(pd.Timestamp.min, pd.Timestamp.max)

In [3]: ii = pd.IntervalIndex([iv])

In [4]: ii.length
---------------------------------------------------------------------------
OverflowError: Overflow in int64 addition

In [5]: ii.mid
---------------------------------------------------------------------------
OverflowError: Overflow in int64 addition

Note that these operations work on the Interval object itself, albeit with the length being cast to datetime.timedelta:

In [6]: iv.length
Out[6]: datetime.timedelta(213503, 84873, 709550)

In [7]: iv.mid
Out[7]: Timestamp('1970-01-01 00:00:00')

More generally, the overflow can occur silently with integer endpoints. For example, with mid:

In [8]: ii64 = np.iinfo(np.int64)

In [9]: iv2 = pd.Interval(ii64.max - 1, ii64.max)

In [10]: ii2 = pd.IntervalIndex([iv2])

In [11]: ii2.mid
Out[11]: Float64Index([-1.5], dtype='float64')

In [12]: iv2.mid
Out[12]: 9.223372036854776e+18
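
The -1.5 is consistent with the sum of the endpoints wrapping around in int64 before being halved. The sketch below reproduces that with plain NumPy arrays. It assumes, without having checked the pandas source, that the array-level midpoint is computed as 0.5 * (left + right); the names naive_mid and offset_mid are purely illustrative.

```python
import numpy as np

ii64 = np.iinfo(np.int64)

# Same endpoints as iv2 above, held in int64 arrays.
left = np.array([ii64.max - 1], dtype=np.int64)
right = np.array([ii64.max], dtype=np.int64)

# Assumed naive formula: the int64 sum wraps around to -3 before the
# multiplication by 0.5, and NumPy does not raise for array arithmetic.
naive_mid = 0.5 * (left + right)

# Alternative: add half of the (small) difference to the left endpoint,
# so the sum of two large values is never formed.
offset_mid = left + 0.5 * (right - left)

print(float(naive_mid[0]))   # -1.5
print(float(offset_mid[0]))  # 9.223372036854776e+18
```

This alternative is not a complete fix: right - left can itself wrap when an interval spans more than the int64 range, as with the length example that follows.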

And with length:

In [13]: iv3 = pd.Interval(ii64.min, ii64.max)

In [14]: ii3 = pd.IntervalIndex([iv3])

In [15]: ii3.length
Out[15]: Int64Index([-1], dtype='int64')

In [16]: iv3.length
Out[16]: 18446744073709551615

Note that all examples above have the same behavior with IntervalArray in place of IntervalIndex.

Problem description

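The IntervalIndex and IntervalArray implementations of mid and length can overflow. With datetime endpoints they raise an OverflowError; with integer endpoints the overflow occurs silently and an incorrect value is returned.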

Expected Output

It might be possible for ii.mid (In [5]) to return the correct value with a different implementation.
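
As a rough illustration of what a different implementation could look like, the sketch below computes a midpoint on int64 values without ever forming left + right or right - left. It is only a sketch under assumptions: floor_mid_int64 is a hypothetical helper, not a pandas function, and the endpoints are made-up nanosecond values standing in for a very early and a very late timestamp rather than the actual values of pd.Timestamp.min and pd.Timestamp.max.

```python
import numpy as np


def floor_mid_int64(left, right):
    """Floor of (left + right) / 2 for int64 arrays, without forming the sum."""
    left = np.asarray(left, dtype=np.int64)
    right = np.asarray(right, dtype=np.int64)
    # Halve each endpoint first (the arithmetic shift floors toward -inf),
    # then add back the 1 that is lost when both endpoints are odd.
    return (left >> 1) + (right >> 1) + (left & right & 1)


ii64 = np.iinfo(np.int64)

# Hypothetical endpoints spanning almost the whole int64 range.
left = np.array([-ii64.max], dtype=np.int64)
right = np.array([ii64.max], dtype=np.int64)

mid = floor_mid_int64(left, right)

# Reinterpret the int64 result as nanoseconds since the Unix epoch.
print(mid.view("datetime64[ns]"))
```

Here the midpoint is nanosecond 0, i.e. the Unix epoch. The result is floored to a whole number, which suits nanosecond timestamps but would differ from the float midpoint that mid returns for integer endpoints.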

I'm not sure how much can be done for the other errors, given that the correct values are out of bounds for their given dtype. For the integer examples it may be possible to get the correct answer by having the output be an array of uint64 dtype, but I'm not sure off the top of my head how that would be implemented.
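
One possible direction for the uint64 idea, sketched with plain NumPy and not taken from pandas: reinterpret the int64 endpoints as uint64 and subtract there. Unsigned subtraction wraps modulo 2**64, which yields the true length whenever right >= left. The variable names are illustrative only.

```python
import numpy as np

ii64 = np.iinfo(np.int64)

# Same endpoints as iv3 above, held in int64 arrays.
left = np.array([ii64.min], dtype=np.int64)
right = np.array([ii64.max], dtype=np.int64)

# Plain int64 subtraction wraps around silently.
wrapped = right - left

# Reinterpret the same bits as uint64 (no copy, no value conversion) and
# subtract modulo 2**64. This assumes right >= left for every element.
length = right.view(np.uint64) - left.view(np.uint64)

print(int(wrapped[0]))  # -1
print(int(length[0]))   # 18446744073709551615
```

The result has uint64 dtype, so callers that expect int64 output would see a dtype change; whether that is acceptable is a design question this sketch does not settle.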

Output of pd.show_versions()

INSTALLED VERSIONS

commit: 016141d
python: 3.6.8.final.0
python-bits: 64
OS: Linux
OS-release: 4.14.29-galliumos
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.25.0.dev0+180.g016141d
pytest: 4.0.2
pip: 18.1
setuptools: 40.6.3
Cython: 0.28.3
numpy: 1.14.5
scipy: None
pyarrow: None
xarray: None
IPython: 7.2.0
sphinx: 1.8.2
patsy: None
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 3.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

@jschendel added the Numeric Operations (Arithmetic, Comparison, and Logical operations) and Interval (Interval data type) labels on Mar 1, 2019
@mroeschke added the Bug label on Apr 27, 2020
@mroeschke removed the Numeric Operations (Arithmetic, Comparison, and Logical operations) label on Jun 27, 2021