Fix overflow error in cartesian_product #15265
When the inputs in `X` are large (so that the product of their lengths exceeds 2**31 - 1), `cartesian_product` can hit an overflow error on Windows machines, where the native `int` is 32 bits. Switching to `np.intp` alleviates this problem. Other fixes would include switching to `np.uint32` or `np.uint64`.
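A minimal sketch of the arithmetic behind the overflow, assuming input lengths of 65536 and 65535 (the sizes used in the test discussed in this thread). Forcing the cumulative product into `np.int32` stands in for the 32-bit native `int` on Windows; `np.intp` is pointer-sized, so on a 64-bit build the product fits. Variable names here are illustrative only.

```python
import numpy as np

# Input lengths whose product (4294901760) exceeds 2**31 - 1
lens = [65536, 65535]

# Accumulating in a 32-bit integer mimics a 32-bit native int;
# NumPy integer arrays wrap around silently on overflow
wrapped = np.cumprod(np.array(lens, dtype=np.int32), dtype=np.int32)

# np.intp is pointer-sized (64 bits on a 64-bit build), so the product fits there
fixed = np.cumprod(np.array(lens, dtype=np.intp), dtype=np.intp)

print(wrapped[-1])  # -65536
print(fixed[-1])    # 4294901760 on a 64-bit build
```

On a 32-bit build `np.intp` is itself 32 bits, so this change only helps where the pointer size is 64 bits.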
Codecov Report

```diff
@@           Coverage Diff           @@
##           0.19.x   #15265   +/-   ##
=======================================
  Coverage   85.27%   85.27%
=======================================
  Files         144      144
  Lines       50946    50946
=======================================
  Hits        43444    43444
  Misses       7502     7502
```

Continue to review the full report at Codecov.
Need a test for this (in other words, something simple that fails without the fix and passes with it). The best way is to step through your code until you hit this, see what the inputs to `cartesian_product` are, and use those as a test. Need a whatsnew entry (bug fix) as well.
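```python
import numpy as np
# cartesian_product is pandas-internal; its import path varies by version
X = np.arange(65536)
Y = np.arange(65535)
result1, result2 = cartesian_product([X, Y])
expected_size = X.size * Y.size  # 65536 * 65535 = 4294901760 > 2**31 - 1
assert len(result1) == len(result2) == expected_size
```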
Hmm, this test input is still quite large.
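A rough estimate of why the test above is impractical on a typical machine, assuming 8-byte (int64) output elements; with 32-bit integers the figure halves. The estimate counts only the two output arrays that `cartesian_product` returns, each with one element per pair of inputs.

```python
# Sizes used in the proposed test
nx, ny = 65536, 65535
n_out = nx * ny  # Python ints are exact, so this cannot overflow

# Assumption: 8-byte integer elements in each output array
bytes_per_item = 8
total_bytes = 2 * n_out * bytes_per_item  # two output arrays

print(n_out)                # 4294901760
print(total_bytes / 2**30)  # 63.9990234375, i.e. about 64 GiB
```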
I couldn't think of another way to trigger the error without adding checking code to the `cartesian_product` function itself, e.g.

```python
if np.any(cumprodX < 0):
    raise RuntimeError(...)
```
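A sketch of why a sign check alone may miss an overflow, plus one alternative guard. `product_fits_intp` is a hypothetical helper, not part of pandas: it computes the product with Python's arbitrary-precision integers and compares it with the `np.intp` maximum before anything is allocated.

```python
import math

import numpy as np


def product_fits_intp(lengths):
    """Hypothetical pre-check: True if the product of `lengths` fits in np.intp.

    math.prod on Python ints is exact, so unlike a fixed-width NumPy
    cumprod it cannot wrap around.
    """
    total = math.prod(int(n) for n in lengths)
    return total <= np.iinfo(np.intp).max


# 65536 * 65536 == 2**32, which wraps to exactly 0 in int32 (not negative)
wrapped = np.cumprod(np.array([65536, 65536], dtype=np.int32), dtype=np.int32)
print(np.any(wrapped < 0))  # False: a "< 0" check does not catch this overflow

print(product_fits_intp([3, 4]))          # True
print(product_fits_intp([2**40, 2**40]))  # False: 2**80 exceeds any intp
```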
So, let's take the test out (just verify that what you were doing works on Windows).
Please add a whatsnew note and we can merge.
Note that this is still buggy, because we compute the Cartesian product in the first place (this is #14942), so I think this will still break for you (just in a different place now).
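To make the point about the product itself concrete, here is a hypothetical re-implementation of a Cartesian product built from `np.repeat` and `np.tile`, with the lengths and their cumulative product held in `np.intp`. It is an illustration under that assumption, not the pandas source; `cartesian_product_sketch` is an invented name.

```python
import numpy as np


def cartesian_product_sketch(arrays):
    """Hypothetical repeat/tile Cartesian product (illustration only)."""
    lens = np.array([len(a) for a in arrays], dtype=np.intp)
    cumprod = np.cumprod(lens, dtype=np.intp)
    total = cumprod[-1]
    # b[i]: how many times each element of arrays[i] is repeated in a row
    if total != 0:
        b = total // cumprod
    else:
        b = np.zeros_like(cumprod)
    # a[i]: how many times the repeated block for arrays[i] is tiled
    a = np.roll(cumprod, 1)
    a[0] = 1
    return [np.tile(np.repeat(np.asarray(x), b[i]), a[i]) for i, x in enumerate(arrays)]


x, y = cartesian_product_sketch([[1, 2], [3, 4, 5]])
print(x)  # [1 1 1 2 2 2]
print(y)  # [3 4 5 3 4 5]
```

Every output array has as many elements as the product of the input lengths, so even once that length no longer overflows, the arrays themselves can exhaust memory for large inputs.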
Yeah, I switched to an altered version of NumPy's `histogramdd` for this problem.
Ping me when CI is green.
Thanks @david-hoffman. FYI, you had actually branched off the 0.19.x branch rather than master, but I cherry-picked the commit.
When the numbers in `X` are large it can cause an overflow error on windows machine where the native `int` is 32 bit. Switching to np.intp alleviates this problem. Other fixes would include switching to np.uint32 or np.uint64.

closes pandas-dev#15234

Author: David Hoffman <dave.p.hoffman@gmail.com>

Closes pandas-dev#15265 from david-hoffman/patch-1 and squashes the following commits:

c9c8d5e [David Hoffman] Update v0.19.2.txt
d54583e [David Hoffman] Remove `test_large_input` because it's too big
47a6c6c [David Hoffman] Update test so that it will actually run on "normal" machine
7aeee85 [David Hoffman] Added tests for large numbers
b196878 [David Hoffman] Fix overflow error in cartesian_product