Fix overflow error in cartesian_product #15265

Closed

david-hoffman
Contributor

When the numbers in `X` are large, they can cause an overflow error on Windows machines, where the native `int` is 32-bit. Switching to `np.intp` alleviates this problem.

Other fixes would include switching to `np.uint32` or `np.uint64`.

#15234
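
The wraparound described in this PR can be reproduced on any platform by forcing a 32-bit accumulator. This is only a sketch: it assumes the overflow arises in a cumulative product of the input lengths (as the `cumprodX` name used in the review discussion suggests), and it uses explicit `np.int32` and `np.int64` to stand in for a 32-bit native `int` and a 64-bit `np.intp`.

```python
import numpy as np

# Lengths of the two inputs used in the review discussion of this PR.
lengths = [65536, 65535]

# Accumulate in 32 bits, as a 32-bit native int would.
# dtype is passed explicitly because np.cumprod otherwise promotes
# narrow integer inputs to the platform default integer.
narrow = np.cumprod(lengths, dtype=np.int32)

# Accumulate in 64 bits, as np.intp would on a 64-bit build.
wide = np.cumprod(lengths, dtype=np.int64)

print(narrow.tolist())  # [65536, -65536]: the product wrapped around
print(wide.tolist())    # [65536, 4294901760]
```

Note that `np.intp` is pointer-sized, so on a 32-bit Python build it would still be 32 bits wide.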


codecov-io commented Jan 30, 2017

Codecov Report

Merging #15265 into 0.19.x will not impact coverage.

@@           Coverage Diff           @@
##           0.19.x   #15265   +/-   ##
=======================================
  Coverage   85.27%   85.27%           
=======================================
  Files         144      144           
  Lines       50946    50946           
=======================================
  Hits        43444    43444           
  Misses       7502     7502

| Impacted Files | Coverage Δ |
| --- | --- |
| pandas/tools/util.py | 96.77% <100%> (ø) |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update ead784f...c9c8d5e.

Contributor

jreback commented Jan 30, 2017

need a test for this (in other words, something simple that fails without the fix and passes with it).

best way is to step through your code until you hit this, see what the inputs to cartesian_product are, and use those as a test.

need a whatsnew note (bug fix) as well.

jreback added the Compat (pandas objects compatibility with NumPy or Python functions) and Windows (Windows OS) labels Jan 30, 2017

X = np.arange(65536)
Y = np.arange(65535)
result1, result2 = cartesian_product([X, Y])
expected_size = X.size * Y.size

Contributor

hmm, this is still quite large.
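
For a rough sense of scale, the size of the full product can be estimated by hand. The following is a back-of-the-envelope sketch, assuming each output array holds one integer per row and that elements are 4 or 8 bytes wide (the actual dtype depends on the platform and NumPy version):

```python
import numpy as np

# Input lengths from the proposed test.
n_x, n_y = 65536, 65535

# Python integers never overflow, so this row count is exact.
n_rows = n_x * n_y

# Estimated size of ONE output array for two candidate element types.
bytes_per_array = {
    np.dtype(dt).name: n_rows * np.dtype(dt).itemsize
    for dt in (np.int32, np.int64)
}

for name, size in bytes_per_array.items():
    print(f"{name}: {n_rows} rows, about {size / 2**30:.1f} GiB per array")
```

The function returns one such array per input, so the total is twice the per-array figure for this two-input case.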

Contributor Author

I couldn't think of another way to cause the error without adding checking code to the cartesian_product function itself, e.g.

if np.any(cumprodX < 0):
    raise RuntimeError(...)
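
A sign check like this would catch the lengths used in the proposed test, but a wrapped product is not always negative. The hypothetical helper below (`checked_cumprod` is an illustrative name, not part of pandas) sketches a stricter guard that compares against the exact product computed with Python integers. It raises `OverflowError` rather than the `RuntimeError` suggested above.

```python
import numpy as np


def checked_cumprod(lengths, dtype=np.intp):
    """Cumulative product of lengths that raises instead of wrapping.

    Hypothetical helper for illustration only; not a pandas function.
    """
    limit = np.iinfo(dtype).max
    exact = 1  # Python int: arbitrary precision, never wraps
    for n in lengths:
        exact *= int(n)
        if exact > limit:
            raise OverflowError(
                f"product {exact} does not fit in {np.dtype(dtype).name}"
            )
    return np.cumprod(np.asarray(lengths, dtype=dtype), dtype=dtype)


# A wrapped result that a sign check alone would miss:
# 65536 * 65536 == 2**32, which wraps to 0 in 32 bits.
wrapped = np.cumprod([65536, 65536], dtype=np.int32)
print(wrapped.tolist(), bool(np.any(wrapped < 0)))

try:
    checked_cumprod([65536, 65536], dtype=np.int32)
except OverflowError as exc:
    print("caught:", exc)
```

Such a guard could be exercised with small dtypes in a test, which avoids having to allocate the full product.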

Contributor

so, let's take the test out (just verify what you were doing works on windows).

pls add a whatsnew note and can merge

note that this is still buggy, because we do the cartesian product in the first place (this is #14942), so I think this will still break for you (just in a different place now).

Contributor Author

Yeah, I switched to an altered version of numpy's histogramdd for this problem.

jreback added this to the 0.20.0 milestone Jan 30, 2017

Contributor

jreback commented Jan 30, 2017

ping on green.

jreback closed this in 48fc9d6 Feb 1, 2017

Contributor

jreback commented Feb 1, 2017

thanks @david-hoffman

FYI, you were actually branched off of the 0.19.x branch, rather than master, but I picked the commit.

AnkurDedania pushed a commit to AnkurDedania/pandas that referenced this pull request Mar 21, 2017
When the numbers in `X` are large it can cause an overflow error on
windows machine where the native `int` is 32 bit. Switching to np.intp
alleviates this problem.

Other fixes would include switching to np.uint32 or np.uint64.

closes pandas-dev#15234

Author: David Hoffman <dave.p.hoffman@gmail.com>

Closes pandas-dev#15265 from david-hoffman/patch-1 and squashes the following commits:

c9c8d5e [David Hoffman] Update v0.19.2.txt
d54583e [David Hoffman] Remove `test_large_input` because it's too big
47a6c6c [David Hoffman] Update test so that it will actually run on "normal" machine
7aeee85 [David Hoffman] Added tests for large numbers
b196878 [David Hoffman] Fix overflow error in cartesian_product