resample with numpy.product gives unexpected results #5586

ruidc · 2013-11-25T17:04:46Z

import datetime
import numpy
import pandas

index = pandas.DatetimeIndex(start=datetime.date(2012, 1, 31), freq='M', periods=12)

ts = pandas.Series(range(12), index=index)
df = pandas.DataFrame(dict(A=ts, B=ts+2))

print df

# Correct
print df.resample('M', how=numpy.sum)
# Incorrect
print df.resample('M', how=numpy.product)
# Correct
print df.resample('Q', how=numpy.sum)
# Incorrect
print df.resample('Q', how=numpy.product)

for print df.resample('Q', how=numpy.product), was expecting:

2012-03-31   2*1*0   4*3*2
2012-06-30   5*4*3   7*6*5
2012-09-30   8*7*6  10*9*8
2012-12-31  11*10*9  13*12*11

The text was updated successfully, but these errors were encountered:

jreback · 2013-11-25T17:15:43Z

I think this just needs a mapping like np.prod has (which works correcly). These numpy functions are mapped internally to cython functions.

dsm054 · 2014-06-21T18:33:36Z

I had just fixed this when I realized there are lots of other numpy functions which will behave in the same way, e.g.

>>> df.groupby("a").agg(np.max)
    b
a    
10  2
20  3
30  5

but

>>> df.groupby("a").agg(np.nanmax)
     b
a     
10  10
20  20
30  30

Is whitelisting them one by one really our best option?

jreback · 2014-06-21T18:37:31Z

the nan ones are somewhat newer

I think there are only a few more to add
not really sure of a better way l

maybe could try some sort of regex matching on the name of the function?
(and see what comes up and just map them)

FYI since for example nanmean doesn't exist in older numpies have to conditionally add them

dsm054 · 2014-06-21T18:55:25Z

Maybe we could go the other way, and blacklist the ones we haven't done. Admittedly that wouldn't catch .agg(lambda x: np.some_new_function(x)), but it would protect against .agg(np.some_new_function) if we hadn't handled it yet.

But couldn't we just catch anything else in numpy and pass it np.asarray(x), though?

First, check to see if it's been special cased. If so, use it.
Next, check to see if it's a numpy function we haven't whitelisted. If so, call it with np.asarray(x); more likely to give the expected answer.
If not, pass it x.

Vaguely like

>>> np_funcs = {f for n, f in inspect.getmembers(np) if callable(f)}
>>> f = np.product
>>> df.groupby("a").agg(np.prod)
     b
a     
10   2
20   3
30  20
>>> df.groupby("a").agg(np.product)
        b
a        
10    200
20     60
30  18000
>>> f = np.product
>>> df.groupby("a").agg(f if f not in np_funcs else (lambda x: f(np.asarray(x))))
     b
a     
10   2
20   3
30  20

jreback · 2014-06-21T19:06:21Z

yes I think that how could definitly be more strict in that you could evaluate on cases:

if its in our white list map to a pandas fast function and go
if its on black list raise
if its a lamba pass thru
if its a numpy function that is not the above, the use np.asarray (and maybe show a warning?)

dsm054 · 2014-06-21T19:29:12Z

I like the warning idea, though I'm not sure what the appropriate label would be: SemiImplementedWarning, maybe? :^)

SemiImplementedWarning: numpy function {} does not have a fast native pandas
equivalent yet and is not fully supported; coercing the groups to ndarrays

Whatever we do, I think the current approach of special-casing a few and letting the rest silently give incorrect results -- even though they're called in exactly the same way the ones which work do -- should be replaced with something noisier.

jreback · 2014-06-21T19:32:50Z

you can say that you should use lambda x: np.function_that_we_do_fast as this will pass thru and just let the warning happen (until we put an explicity put it on the whitelist with either a fast version or a simple passthru)

mroeschke · 2019-02-20T07:08:17Z

Looks fixed in master. Could use a test.

jbrockmendel · 2019-12-12T00:23:17Z

@mroeschke what would we be testing? the df.resample("M", np.product) usage from the OP is no longer supported

mroeschke · 2019-12-12T00:46:13Z

df.resample('M').apply(np.product) produces the correct result IIRC.

baevpetr · 2019-12-18T21:56:07Z

So are tests required here now? if so, could I try to implement them?

jbrockmendel · 2019-12-18T23:37:44Z

@baevpetr yes, that applies to any issue with the "Needs Tests" label

jreback modified the milestones: 0.15.0, 0.14.0 Apr 6, 2014

jreback modified the milestones: 0.16.0, Next Major Release Mar 6, 2015

mroeschke added Needs Tests Unit test(s) needed to prevent regressions good first issue and removed API Design Bug labels Sep 29, 2019

mroeschke mentioned this issue Dec 24, 2019

TST: Add tests for fixed issues #30444

Merged

6 tasks

jreback modified the milestones: Contributions Welcome, 1.0 Dec 26, 2019

jreback closed this as completed in #30444 Dec 26, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

resample with numpy.product gives unexpected results #5586

resample with numpy.product gives unexpected results #5586

ruidc commented Nov 25, 2013

jreback commented Nov 25, 2013

dsm054 commented Jun 21, 2014

jreback commented Jun 21, 2014

dsm054 commented Jun 21, 2014

jreback commented Jun 21, 2014

dsm054 commented Jun 21, 2014

jreback commented Jun 21, 2014

mroeschke commented Feb 20, 2019

jbrockmendel commented Dec 12, 2019

mroeschke commented Dec 12, 2019

baevpetr commented Dec 18, 2019

jbrockmendel commented Dec 18, 2019 •

edited

Loading

resample with numpy.product gives unexpected results #5586

resample with numpy.product gives unexpected results #5586

Comments

ruidc commented Nov 25, 2013

jreback commented Nov 25, 2013

dsm054 commented Jun 21, 2014

jreback commented Jun 21, 2014

dsm054 commented Jun 21, 2014

jreback commented Jun 21, 2014

dsm054 commented Jun 21, 2014

jreback commented Jun 21, 2014

mroeschke commented Feb 20, 2019

jbrockmendel commented Dec 12, 2019

mroeschke commented Dec 12, 2019

baevpetr commented Dec 18, 2019

jbrockmendel commented Dec 18, 2019 • edited Loading

jbrockmendel commented Dec 18, 2019 •

edited

Loading