BUG: Series / DataFrame reductions convert string data to numeric #34671

Closed
3 tasks done
TomAugspurger opened this issue Jun 9, 2020 · 6 comments · Fixed by #52281
Labels
Bug, Needs Discussion, Reduction Operations

Comments

@TomAugspurger
Contributor

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

In [34]: pd.Series(['1', '2', '3']).median()
Out[34]: 2.0

In [36]: pd.DataFrame({"A": ['1', '2', '3']}).median()
Out[36]:
A    2.0
dtype: float64

Problem description

median should not convert the input types. We seem to explicitly convert all non-float dtypes in nanmedian. Do we want to do that?
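To make the problem concrete, here is a minimal plain-Python sketch of what "convert to float, then reduce" does to string input. The helper name `median_with_float_coercion` is hypothetical and this is not the actual `nanmedian` code; it only mimics the unconditional cast described above.

```python
from statistics import median


def median_with_float_coercion(values):
    """Hypothetical stand-in: cast every element to float, then take the median."""
    # float("1") succeeds, so numeric-looking strings are silently accepted.
    return median(float(v) for v in values)


result = median_with_float_coercion(["1", "2", "3"])
print(result)  # 2.0
```

The cast is what lets the strings through: nothing in the reduction itself ever sees that the input was text.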

Expected Output

TypeError or ValueError. Not sure

@TomAugspurger added the Bug, Numeric Operations, and Needs Discussion labels on Jun 9, 2020
@jorisvandenbossche
Member

I think we clearly don't want to do this, for the above case.
But I suppose this is what enables numeric values in object dtype in general? And there might be use cases for that (e.g. decimals) which would break if we remove this conversion to float.

@TomAugspurger
Contributor Author

> But I suppose this way it enables numeric values in object dtype in general?

Agreed. At the very least, we'll want to ensure we have a test that something like pd.Series([1, 2], dtype=object).median() continues to work.

@jreback
Contributor

jreback commented Jun 10, 2020

Isn't this the same treatment as for min/max?

Median is an ordering lookup that does not depend on the type (except for ties).
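A small plain-Python sketch of the "ordering lookup" view, for illustration only (the helper `ordering_median` is hypothetical and is not how pandas computes a median). It picks the middle element by sort order without converting anything, which also shows why the result for strings can differ from the numeric answer: strings sort lexicographically.

```python
def ordering_median(values):
    """Hypothetical helper: middle element by sort order, no type conversion.

    Only handles odd-length input; an even-length input has a tie between
    the two middle elements, which would need arithmetic to resolve.
    """
    ordered = sorted(values)
    return ordered[len(ordered) // 2]


strings = ["10", "9", "2"]
as_strings = ordering_median(strings)                    # lexicographic order
as_numbers = ordering_median([int(s) for s in strings])  # numeric order

print(as_strings)  # '2', because "10" < "2" < "9" as strings
print(as_numbers)  # 9
```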

@TomAugspurger
Contributor Author

min / max don't first convert to float:

In [14]: pd.Series(['1', '2']).min()
Out[14]: '1'

mean does convert to float.

In [15]: pd.Series(['1', '2']).mean()
Out[15]: 6.0

(how does it get 6.0 there though? 😄)

@TomAugspurger changed the title from "BUG: Series / DataFrame.median converts string data to numeric" to "BUG: Series / DataFrame reductions convert string data to numeric" on Jun 10, 2020
@jorisvandenbossche
Member

> (how does it get 6.0 there though? 😄)

Oh boy... so it's not only median.

BTW, I just checked: the 6 comes from "1" + "2" being "12", which we then convert to the numeric 12 and divide by the count :)
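The arithmetic behind the 6.0 can be replayed in plain Python. This is only a sketch of the sequence described above (sum, convert, divide), not the pandas code path itself.

```python
values = ["1", "2"]

# "Summing" strings with + concatenates them instead of adding numbers.
total = values[0]
for v in values[1:]:
    total = total + v
print(total)  # '12'

# The concatenated string is then converted to a number and divided by the count.
mean = float(total) / len(values)
print(mean)  # 6.0
```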

@jreback
Contributor

jreback commented Jun 10, 2020

I believe we handle object dtype like this so that these ops work if we happen to have numeric values in object dtype.

We should infer a more specific numeric type from the object-dtype data before performing these ops.
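One way to picture "infer first, then reduce" is the plain-Python sketch below. The helper `strict_median` is hypothetical and is not a pandas API; it assumes that "numeric" means anything registered as `numbers.Number` (which includes `int`, `float`, and `Decimal`), and that raising `TypeError` is the desired failure mode, which the thread leaves open.

```python
from decimal import Decimal
from numbers import Number
from statistics import median


def strict_median(values):
    """Hypothetical helper: check that every element is numeric before casting."""
    values = list(values)
    for v in values:
        # Strings are rejected up front instead of being parsed by float().
        if not isinstance(v, Number):
            raise TypeError(f"cannot reduce non-numeric value {v!r}")
    return median(float(v) for v in values)


print(strict_median([1, 2]))                            # 1.5
print(strict_median([Decimal("1.5"), Decimal("2.5")]))  # 2.0

try:
    strict_median(["1", "2", "3"])
except TypeError as exc:
    print("rejected:", exc)
```

With a check like this, numeric values stored as generic objects keep working (the `[1, 2]` case raised earlier in the thread), while numeric-looking strings are no longer silently converted.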

@jbrockmendel added the Reduction Operations label on Sep 21, 2020
@mroeschke removed the Numeric Operations label on Aug 7, 2021
@jreback added this to the Contributions Welcome milestone on Dec 23, 2021
@mroeschke removed this from the Contributions Welcome milestone on Oct 13, 2022