Unexpected results for the mean of a DataFrame of ufloat from the uncertainties package. #14162

bgatessucks · 2016-09-06T07:31:50Z

Related to #6898.

I find it very convenient to use a DataFrame of ufloat from the uncertainties package. Each entry consists of (value, error) and could represent the result of Monte Carlo simulations or an experiment.

At present taking sums along both axes gives the expected result, but taking the mean does not.

import pandas as pd
import numpy as np
from uncertainties import unumpy

value = np.arange(12).reshape(3,4)
err = 0.01 * np.arange(12).reshape(3,4) + 0.005

data = unumpy.uarray(value, err)

df = pd.DataFrame(data, index=['r1', 'r2', 'r3'], columns=['c1', 'c2', 'c3', 'c4'])

Examples:

print(df)
               c1             c2             c3             c4
r1  0.000+/-0.005  1.000+/-0.015  2.000+/-0.025  3.000+/-0.035
r2    4.00+/-0.04    5.00+/-0.06    6.00+/-0.07    7.00+/-0.08
r3    8.00+/-0.09    9.00+/-0.10   10.00+/-0.11   11.00+/-0.12

df.sum(axis=0) # This works

c1    12.00+/-0.10
c2    15.00+/-0.11
c3    18.00+/-0.13
c4    21.00+/-0.14
dtype: object

df.sum(axis=1) # This works

r1     6.00+/-0.05
r2    22.00+/-0.12
r3    38.00+/-0.20
dtype: object

df.mean(axis=0) # This does not work

Series([], dtype: float64)

Expected (`df.apply(lambda x: x.sum() / x.size)`)

c1    4.000+/-0.032
c2      5.00+/-0.04
c3      6.00+/-0.04
c4      7.00+/-0.05
dtype: object

df.mean(axis=1) # This does not work

r1   NaN
r2   NaN
r3   NaN
dtype: float64

Expected (`df.T.apply(lambda x: x.sum() / x.size)`)

r1    1.500+/-0.011
r2    5.500+/-0.031
r3      9.50+/-0.05
dtype: object
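As a cross-check of the expected values above, here is a minimal sketch in plain Python, assuming the entries are independent (so their errors add in quadrature) and not relying on the uncertainties package at all. The helper name `mean_with_uncertainty` is hypothetical and exists only for this illustration.

```python
import math

def mean_with_uncertainty(values, errors):
    """Mean of independent measurements and its propagated standard error.

    Assumes the inputs are uncorrelated, so the error of the sum is the
    square root of the summed squared errors, and the error of the mean
    is that quantity divided by the number of entries.
    """
    n = len(values)
    mean = sum(values) / n
    err = math.sqrt(sum(e * e for e in errors)) / n
    return mean, err

# Column c1 of the example: values 0, 4, 8 with err = 0.01 * value + 0.005
c1_mean, c1_err = mean_with_uncertainty([0, 4, 8], [0.005, 0.045, 0.085])

# Row r1 of the example: values 0, 1, 2, 3 with the same error rule
r1_mean, r1_err = mean_with_uncertainty([0, 1, 2, 3], [0.005, 0.015, 0.025, 0.035])

print(f"c1: {c1_mean:.3f}+/-{c1_err:.3f}")  # agrees with 4.000+/-0.032
print(f"r1: {r1_mean:.3f}+/-{r1_err:.3f}")  # agrees with 1.500+/-0.011
```

Both results agree with the `Expected` output for `c1` and `r1`, which is what `x.sum() / x.size` produces when the elements carry their own error propagation.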

jreback · 2016-09-06T10:36:09Z

This is very much like #13446. Since pandas doesn't know that an uncertainty is numeric, it cannot deal with it, similar to Decimal.

Without a custom dtype, or special support baked into object dtypes, this is not supported.

If someone wanted to contribute this functionality, that would be great. Conceptually this is very easy, but there are lots of implementation details.

lebigot · 2016-09-06T14:51:47Z

@jreback Do I understand correctly that there is nothing that the uncertainties module can do to solve this issue?

jreback · 2016-09-06T15:01:20Z

I have no idea. If you want to dig in and see, that would be great.

shoyer · 2016-09-06T16:21:04Z

A useful first step would be to see if you can reproduce the issue with numpy alone (not using pandas).

bgatessucks · 2016-09-06T18:50:13Z

@shoyer No issue with numpy alone:

import pandas as pd
import numpy as np
from uncertainties import unumpy

value = np.arange(12).reshape(3,4)
err = 0.01 * np.arange(12).reshape(3,4) + 0.005

data = unumpy.uarray(value, err)

df = pd.DataFrame(data, index=['r1', 'r2', 'r3'], columns=['c1', 'c2', 'c3', 'c4'])

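# Column means: pandas workaround vs. NumPy on the raw object array
print(df.apply(lambda x: x.sum() / x.size).values, "\n")

print(data.mean(axis=0), "\n")

# Row means: pandas workaround vs. NumPy on the raw object array
print(df.T.apply(lambda x: x.sum() / x.size).values, "\n")

print(data.mean(axis=1))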

shoyer · 2016-09-06T18:55:09Z

@bgatessucks what is the type/dtype of unumpy.uarray? Is it a numpy array with dtype=object?

bgatessucks · 2016-09-06T18:59:41Z

@shoyer

type(data) is <type 'numpy.ndarray'>.

shoyer · 2016-09-06T19:00:42Z

And data.dtype?

shoyer · 2016-09-06T19:04:36Z

I just wanted to be sure that you're not using subclassing or something else like that.

In any case, I think this is probably a pandas bug (but would need someone to work through/figure out). We should have a fallback implementation of mean (like NumPy's mean) that works on object arrays.
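A minimal sketch of what such a fallback could look like, assuming the elements only need to support `+` and `/`. The name `object_mean` is hypothetical (it is not a pandas or NumPy function), and `fractions.Fraction` is used here purely as a stand-in for `ufloat` so that the example runs without the uncertainties package.

```python
from fractions import Fraction

import numpy as np

def object_mean(values, axis=0):
    """Mean of an object-dtype array using only the elements' own arithmetic.

    Hypothetical helper: sums along `axis` via the elements' __add__,
    then divides by the number of entries along that axis.
    """
    arr = np.asarray(values, dtype=object)
    total = arr.sum(axis=axis)      # object arrays reduce with Python-level +
    return total / arr.shape[axis]  # elementwise division by the count

# Fraction stands in for any numeric-like object that NumPy stores as object
data = np.array(
    [[Fraction(1, 2), Fraction(3, 2)],
     [Fraction(5, 2), Fraction(7, 2)]],
    dtype=object,
)

col_means = object_mean(data, axis=0)  # [Fraction(3, 2), Fraction(5, 2)]
row_means = object_mean(data, axis=1)  # [Fraction(1, 1), Fraction(3, 1)]
```

The result stays object dtype, so nothing is coerced to float64 along the way. A real implementation would also have to decide how to treat missing values and empty inputs, which this sketch ignores.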

bgatessucks · 2016-09-06T19:07:10Z

@shoyer Sorry I had missed that:

data.dtype is object.

rth · 2016-09-07T14:45:26Z

For what it's worth, the same example as above works with a DataFrame initialized with a numpy array of dtype='object' containing floats.

import pandas as pd
import numpy as np
from IPython.display import display

data = np.arange(12).reshape(3,4).astype('object')

df = pd.DataFrame(data, index=['r1', 'r2', 'r3'],
                 columns=['c1', 'c2', 'c3', 'c4'], dtype='object')

display(df.sum(axis=0))
display(df.sum(axis=1))
display(df.mean(axis=0))
display(df.mean(axis=1))

So I guess pandas is able to correctly infer in this case that an array of dtype="object" contains numbers (floats), unlike with the array containing ufloat elements from the uncertainties package.

lebigot · 2016-09-07T15:36:39Z

Seen from the outside, it looks like in both cases Pandas decrees that the result of mean() should be of type float64: in @rth's example above the NumPy array actually contains integers, which are converted to float64 (which is doable); in the case of uncertainties.UFloat numbers with uncertainty, forcing the result to float64 is mostly meaningless (as this would get rid of the uncertainty) and mean() does not produce the expected result.

In contrast, as the original post shows, Pandas is more open on the data type of sum(), which is, correctly, object, for uncertainties.UFloat objects.

I think that it is desirable that since Pandas is able to sum(), it be able to get the mean() too (since the mean is not much more than a sum).

chicolucio · 2020-02-28T20:36:13Z

Is there any news on this subject? Same problem here, with pandas version 1.0.1.

marcus-r-kelly · 2020-05-18T17:44:11Z

I have the same issue with pandas version 1.0.3

MichaelTiemannOSC · 2022-10-17T08:48:40Z

Was this removed from the Someday milestone because it's more definitive than that now? I've just done a bunch of work to make uncertainties work with Pint and Pint-Pandas, and am seeing that some work needs to be done in Pandas as well. Just taking the temperature on how open that door might be.

hgrecco/pint#1615
hgrecco/pint-pandas#140

jbrockmendel · 2023-04-22T14:34:02Z

> Was this removed from the Someday milestone because it's more definitive than that now

We stopped using the "Someday" label entirely.

I'm getting the same behavior on main as in the OP. Looks like the data is an object-type np.ndarray. As jreback said in 2016, this would need some special handling (probably in core.nanops). A PR would be welcome.

Something like pint-pandas would probably be a better user experience than an object-dtype.

jbrockmendel · 2023-06-13T15:22:09Z

@topper-123 this might be closed by your reduce_wrap PR?

topper-123 · 2023-06-19T06:32:35Z

Sorry for the slow reply, I had a big project before going on a family vacation (which will last until the end of this week). But yes, #52788 will allow extension arrays like pint-pandas to use _reduce_wrap to control the dtype of reduction results.

mroeschke · 2023-07-13T16:41:01Z

Closed by #52788

jreback added the Dtype Conversions (Unexpected or buggy dtype conversions) and Difficulty Advanced labels Sep 6, 2016

jreback added this to the Someday milestone Sep 6, 2016

jreback added the Enhancement label Sep 6, 2016

bgatessucks mentioned this issue Sep 6, 2016

Support for mean in pandas DataFrame of ufloat lmfit/uncertainties#58

Closed

jbrockmendel removed the Difficulty Advanced label Oct 21, 2019

jbrockmendel added the Numeric Operations (Arithmetic, Comparison, and Logical operations), Reduction Operations (sum, mean, min, max, etc.), and Nuisance Columns (Identifying/Dropping nuisance columns in reductions, groupby.add, DataFrame.apply) labels Sep 20, 2020

mroeschke added the ExtensionArray (Extending pandas with custom dtypes or arrays.) label and removed the Dtype Conversions (Unexpected or buggy dtype conversions) and Nuisance Columns (Identifying/Dropping nuisance columns in reductions, groupby.add, DataFrame.apply) labels May 1, 2021

mroeschke removed this from the Someday milestone Oct 13, 2022

topper-123 mentioned this issue Jun 19, 2023

ENH: better dtype inference when doing DataFrame reductions #52788

Merged

1 task

mroeschke closed this as completed Jul 13, 2023

MichaelTiemannOSC mentioned this issue Nov 14, 2023

Add support for UFloat in PintArray (#139) hgrecco/pint-pandas#140

Open

5 tasks
