COMPAT: clarify Index integer conversions when dtype is specified in construction #15187
cc @gfyoung |
For reference, here are the same inputs with
>>> Series([np.nan],dtype='int64') # NO MATCH
...
ValueError: cannot convert float NaN to integer
>>>
>>> Series([np.nan],dtype='uint64') # NO MATCH
...
ValueError: cannot convert float NaN to integer
>>>
>>> Series([np.iinfo(np.int64).max-1],dtype='int64') # MATCH
0 9223372036854775806
dtype: int64
>>>
>>> Series([np.iinfo(np.int64).max-1],dtype='uint64') # MATCH
0 9223372036854775806
dtype: uint64
>>>
>>> Series([np.iinfo(np.uint64).max-1],dtype='int64') # MATCH
...
OverflowError: Python int too large to convert to C long
>>>
>>> Series([-1], dtype='int64') # MATCH
0 -1
dtype: int64
>>>
>>> Series([-1], dtype='uint64') # MATCH
0 18446744073709551615
dtype: uint64
>>>
>>> Series([np.iinfo(np.int32).max+1], dtype='int64') # MATCH
0 2147483648
dtype: int64
If we are going to resolve this, the behavior should be consistent across the board. Also, I'm not entirely sure now if I want to change the behavior when we pass in negative numbers and specify |
On the NaN case: I would be in favor of raising an error when passing |
@jorisvandenbossche : What do you mean by "converted" ? Casting modulo the maximum value of the integer data-type could be argued to be a conversion. Or are you saying that we just go with |
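To make the "casting modulo the maximum value" reading concrete, here is a minimal plain-Python sketch of what an unchecked cast to an unsigned 64-bit integer amounts to. The helper name wrap_to_uint64 is hypothetical and is not part of pandas or NumPy; it only mimics the arithmetic.

```python
# Number of distinct values a 64-bit unsigned integer can hold.
UINT64_MODULUS = 2 ** 64


def wrap_to_uint64(value: int) -> int:
    """Reduce a Python int modulo 2**64, mimicking an unchecked cast to uint64.

    Hypothetical helper for illustration only; not a pandas or NumPy function.
    """
    return value % UINT64_MODULUS


# -1 wraps to the largest uint64 value, the same number shown in the
# Series([-1], dtype='uint64') output earlier in the thread.
print(wrap_to_uint64(-1))  # 18446744073709551615
```

Under this reading, -1 is "converted" without error, which is exactly the behavior being debated.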
Yes, I know, the exact interpretation of that is the debatable part. |
@jorisvandenbossche : Yes, I agree with you regarding |
I think the general philosophy is to raise when presented with a dtype that is incompat with the input, essentially this is going to do an
I also find this error message not very informative (we are passing thru the numpy message). |
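As a rough illustration of the "raise, but with a better message" idea, the sketch below first shows the message plain Python produces for NaN, then wraps the conversion in a hypothetical helper (to_int_strict, not a pandas function) that names the position and the requested dtype. It assumes the input is a simple list of Python numbers.

```python
import math

# The message seen in the thread is the one Python itself produces:
try:
    int(float("nan"))
except ValueError as exc:
    print(exc)  # cannot convert float NaN to integer


def to_int_strict(values, dtype_name="int64"):
    """Convert values to ints, raising a descriptive error on NaN.

    Hypothetical helper for illustration; dtype_name is only used in the message.
    """
    result = []
    for position, value in enumerate(values):
        if isinstance(value, float) and math.isnan(value):
            raise ValueError(
                f"cannot convert NaN at position {position} to dtype "
                f"{dtype_name!r}: integer dtypes cannot hold missing values"
            )
        result.append(int(value))
    return result


print(to_int_strict([1, 2, 3]))  # [1, 2, 3]
```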
Yes, we can agree on raising when Again, I'm not too sure right now about raising on negative integers specified with an unsigned integer |
It's true that the above raises, but this is not very consistent:
At least, we could make |
Maybe we can start with a PR to raise with |
@gfyoung sure partially closing this would be great. |
sure when numpy makes sense. Given a wrap-around with negative numbers for |
@jreback : That wouldn't be the only case you would have to consider of "wrap-around" BTW. Look at all of the examples @jorisvandenbossche provided above. Essentially casting to |
@gfyoung I don't think this is very complicated. This is only when the dtype is actually specified. You check for negative values then cast to This is the point of having a Everything is already divorced from numpy and wrapped in the blocks, which have a convenient API for values (and casting). We do exactly these types of things, for example, when trying to set NaN in an integer block. The only part of this which is not wrapped up is some construction validation routines (which happen way before block creation). We do lots of inference to figure out what the user is passing, see https://github.com/pandas-dev/pandas/blob/master/pandas/core/series.py#L2843 |
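The "check for negative values, then cast" step could look roughly like the following. This is a plain-Python sketch under the assumption that the values are already integers; cast_to_unsigned_checked is a hypothetical name, and the real check would live in pandas' construction code rather than in a standalone function.

```python
def cast_to_unsigned_checked(values, bits=64):
    """Validate that every value fits an unsigned integer of the given width.

    Hypothetical helper: raises instead of silently wrapping around.
    """
    upper = 2 ** bits - 1
    for value in values:
        if value < 0:
            raise ValueError(
                f"negative value {value} cannot be stored as uint{bits}"
            )
        if value > upper:
            raise OverflowError(f"value {value} is too large for uint{bits}")
    return list(values)


print(cast_to_unsigned_checked([0, 1, 2 ** 64 - 1]))  # accepted unchanged

try:
    cast_to_unsigned_checked([-1])
except ValueError as exc:
    print(exc)  # negative value -1 cannot be stored as uint64
```

The check only needs to run when a dtype is explicitly requested, which matches the scope described in the comment above.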
Here's a possible way to patch the
All of these tests incorporate |
I think you need to do the 2nd part first. IOW, if the floats == integers when cast, it is ok, but then we need to raise on the ValueError (rather than converting to a float index)
can you show the test that is problematic? IOW an example of it. |
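One way to read the "floats == integers when cast" rule is sketched below in plain Python: a float is accepted only if converting it to an integer loses nothing, and anything else raises rather than falling back to floats. The function name floats_to_ints_lossless is hypothetical, and this is an interpretation of the comment, not the code that was eventually merged.

```python
import math


def floats_to_ints_lossless(values):
    """Convert floats to ints only when no information is lost.

    Hypothetical helper: NaN, infinities, and fractional values raise
    ValueError instead of triggering a fallback to floats.
    """
    result = []
    for value in values:
        # NaN and infinity are checked first because int() cannot handle them.
        if math.isnan(value) or math.isinf(value) or int(value) != value:
            raise ValueError(f"cannot losslessly convert {value!r} to an integer")
        result.append(int(value))
    return result


print(floats_to_ints_lossless([1.0, 2.0, 3.0]))  # [1, 2, 3]
```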
I'm not sure I 100% followed what you said there. Also, all the tests I mentioned that failed fail with my |
I said that your change is actually not necessary, rather, https://github.com/pandas-dev/pandas/blob/master/pandas/indexes/base.py#L260 I think all you need to do is remove the |
No, that's definitely the wrong place. Note that |
This is getting more difficult to change, as I realized that
>>> from pandas import Index
>>> Index([1, 2, 3]).where([False, True, True])
...
ValueError: cannot convert float NaN to integer |
Partially addresses pandas-devgh-15187.
thanks! we can make a checklist in the issue if that helps? (pls put it up and ill copy to the top if you can) |
@jreback : Can you edit the original issue to add this PR to the checklist? |
yes can u enumerate the open issues (as i see them) and will make checkboxes (even better is for you to post below and i'll update the top) |
so I think that fixed [10] and [11], but [12] and [15] remain |
@jreback : So the behavior of |
@gfyoung yes you are correct. Ok then. I will close this one. Let's however open a new one for Index/Series fixes for [12], [15] (with checkboxes). |
xref pandas-dev#15187.
Author: gfyoung <gfyoung17@gmail.com>
Closes pandas-dev#15616 from gfyoung/nan-int-index and squashes the following commits:
195b830 [gfyoung] BUG: Error when specifying int index containing NaN
Yep, done: #15832 |
xref #15162
so [8], and [9] are our current model, IOW, we make an effort to convert to the specified type, but will coerce to an available type if the data is not valid for that dtype.
ideally we would also be consistent w.r.t. #15145, IOW Series construction with a specified dtype (note that we upcast to the available Index types but don't do this for Series, e.g. [18])
So we should be consistent on this.
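The "try the specified type, otherwise coerce to an available type" model can be sketched in plain Python as follows. This is only a loose stand-in for the behavior described, since the referenced examples are not reproduced here; construct_with_fallback and the returned dtype labels are hypothetical.

```python
def construct_with_fallback(values):
    """Return (converted_values, dtype_label), preferring integers.

    Hypothetical helper: if every value is integral, integers are returned;
    otherwise the data is coerced to floats instead of raising.
    """
    # float.is_integer() is False for NaN and for fractional values.
    if all(float(value).is_integer() for value in values):
        return [int(value) for value in values], "int64"
    return [float(value) for value in values], "float64"


print(construct_with_fallback([1, 2.0, 3]))  # ([1, 2, 3], 'int64')
print(construct_with_fallback([1, 2.5]))     # ([1.0, 2.5], 'float64')
```

The alternative discussed in the comments is to raise in the second case rather than coerce, so that Index and Series construction behave the same way.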