.loc on DataFrame returning coerced dtype for single rows #11617
Comments
hmm, that is a bit inconsistent. I would expect all of these to give the same result (and not coerce), adding
If you'd like to dig in, that would be great!
Wow ... I have been deep down in the 5k LOC internals.py... I don't think I wanna go there again :-) So, indeed a creation of a In the following, I used the latest release for tracing but I do point into the master codebase. Perhaps if you have installed pandas master you could try if this still applies (I think yes). I have traced it so far as first a Series is created for the first key in the tuple In the creation of the Series, the blocks are still correct: But then in the dtype is determined by I think this is how pandas Series are defined (they must contain just one type). But the question is if the creation of the series should perhaps better be done after the second key (in this example the column Not sure if this is still
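The snippets from the comment above did not survive, so the following is only a minimal sketch of the single-dtype constraint it describes. It uses NumPy's `np.result_type` as a stand-in for the promotion rule. That pandas' internal interleaving agrees with it for plain `int64`/`float64` columns is an assumption, and the frame and column names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one float64 column and one integer column.
df = pd.DataFrame({"a": [1.23, 4.56], "b": [666, 777]})

# A Series holds exactly one dtype, so a row spanning both columns
# needs a dtype that can represent every value in that row.
common = np.result_type(df["a"].dtype, df["b"].dtype)

# Looking up a single row by label returns such a Series.
row = df.loc[0]
row_dtype = row.dtype

print(common, row_dtype)
```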
@samueljohn haha, indexing is pretty complex! We don't distinguish between all scalar keys upfront, hence the serial conversions. Easiest thing to do is try changing and see if your tests for this behavior (and the original tests) pass. That is the part about indexing, preserving the API when making changes.
Hi @jreback , @samueljohn. I also encountered this problem today. After a little digging around, the following may help: Firstly, the dataframe behaves correctly if there is a non-numeric object in the dataframe:
Secondly, this may be fixed by simply changing how the Series constructor is called:
Any thoughts on possible performance hits if Cheers,
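The code from this comment was lost, so here is a hedged sketch of the first observation only (a non-numeric column changes the outcome). The frame and its column names are hypothetical, and this is not the commenter's original example.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a non-numeric column next to the numeric ones.
df_obj = pd.DataFrame({"a": [1.23], "b": [666], "c": ["text"]})

# No numeric dtype can hold all three values, so the row Series
# falls back to dtype object and each element keeps its own type.
row_obj = df_obj.loc[0]
value_b = row_obj["b"]

print(row_obj.dtype, type(value_b))
```

With `object` as the common dtype nothing is upcast, which is why the integer value comes back as an integer here.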
returning as
I propose a fix in _interleaved_dtype(blocks). I think there are use cases for both scenarios:
Maybe it could be added as a pandas option, perhaps 'mode.coerce_numerical_dtypes'?
Try to address the specific change of verifying that all of the cases above return the same dtype. Doing something more complicated like returning an Doing what you are suggesting above is not going to be back-compat and will likely break lots of things. Start small. Adding an option is also a non-starter.
I think I will submit a separate issue. I currently require a way of retrieving a row of a DataFrame that preserves numerical dtypes, which is separate to this issue, but very related.
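For readers with the same need, two existing pandas calls can retrieve a row without collapsing it into a single-dtype Series. This is a sketch of possible workarounds, not something proposed in the thread, and the frame is hypothetical.

```python
import pandas as pd

# Hypothetical frame: one float64 column and one integer column.
df = pd.DataFrame({"a": [1.23, 4.56], "b": [666, 777]})

# Option 1: select the row with a list-like key. The result stays a
# one-row DataFrame, so every column keeps its own dtype.
one_row = df.loc[[0]]

# Option 2: iterate rows as namedtuples. Each field is taken from
# its own column, so integers are not turned into floats.
first = next(df.itertuples(index=False))

print(one_row.dtypes.to_dict(), first)
```

The first option keeps pandas semantics (dtypes, index) at the cost of carrying a DataFrame around. The second gives plain Python values and is usually more convenient for read-only access.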
I suppose a:
That could work, but would require more thought into how it plugs in to other data frame methods (e.g. df.apply(..., axis=1, coerce=True)). I will have a go at it, but it might take me some time.
handle iterator handle NamedTuple .loc retuns scalar selection dtypes correctly, closes pandas-dev#11617 xref pandas-dev#15113
handle iterator handle NamedTuple .loc retuns scalar selection dtypes correctly, closes pandas-dev#11617 xref pandas-dev#15113 Author: Jeff Reback <jeff@reback.net> Closes pandas-dev#15120 from jreback/indexing and squashes the following commits: 801c8d9 [Jeff Reback] BUG: indexing changes to .loc for compat to .ix for several situations
xref #14205
The `.loc` method of `DataFrame` with different dtypes yields a coerced type even if the resulting slice contains only elements of one type. This happens only when selecting a single row.
I can guess that this might be intended, because the implementation of `loc` seems to first look up the row as a single Series, doing the coercion, and then apply the second (column) indexer.
However, when the column indexer narrows down the selection such that the upcasting would not have been necessary in the first place, it can be very surprising and may even cause bugs (on the user side) if it goes unnoticed. (Like, "I was sure that column was `int64`".)
Feel free to close if the behavior is intended. Maybe this is a "bug" or a suggested API change. I dunno.
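The original reproduction was lost from this report. The sketch below, on a made-up frame, only shows the row-then-column order of operations described above. Which exact `.loc[row, column]` spellings were affected depends on the pandas version, so that form is deliberately left out.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one float64 column and one integer column.
df = pd.DataFrame({"a": [1.23, 4.56], "b": [666, 777]})

# Row first, then column: the intermediate row Series is float64,
# so the integer from column "b" comes back as a float.
via_row = df.loc[0]["b"]

# Column first, then row: the intermediate Series is the integer
# column itself, so no upcasting happens.
via_col = df["b"].loc[0]

print(type(via_row), type(via_col))
```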
Perhaps related to #10503, #9519, #9269, #11594 ?