Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rolling.mean vs rolling.construct.mean #2010

Closed
mathause opened this issue Mar 23, 2018 · 8 comments
Closed

rolling.mean vs rolling.construct.mean #2010

mathause opened this issue Mar 23, 2018 · 8 comments

Comments

@mathause
Copy link
Collaborator

Code Sample, a copy-pastable example if possible

import xarray as xr
import numpy as np
x = np.arange(4.)
ds = xr.DataArray(x, dims=('dim', ))
ds.rolling(dim=3, center=True).mean()
# RESULT: array([nan,  1.,  2., nan])
ds.rolling(dim=3, center=True).construct('window').mean('window')
# RESULT: array([0.5, 1. , 2. , 2.5])

Problem description

ds.rolling(...).mean() and ds.rolling(...).construct().mean() yields different results. Because mean does skipna=True per default.

Expected Output

I would expect both ways to yield the same result.

Output of xr.show_versions()

INSTALLED VERSIONS ------------------ commit: None python: 3.6.4.final.0 python-bits: 64 OS: Linux OS-release: 4.4.114-42-default machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_GB.UTF-8 LOCALE: en_GB.UTF-8

xarray: 0.10.2+dev6.g9261601
pandas: 0.22.0
numpy: 1.14.2
scipy: 1.0.0
netCDF4: 1.3.1
h5netcdf: None
h5py: None
Nio: None
zarr: None
bottleneck: 1.2.1
cyordereddict: 1.0.0
dask: 0.17.1
distributed: 1.21.3
matplotlib: 2.2.2
cartopy: None
seaborn: None
setuptools: 39.0.1
pip: 9.0.2
conda: None
pytest: 3.4.2
IPython: 6.2.1
sphinx: None

@fujiisoup
Copy link
Member

Hi, @mathause. Thanks for reporting.

I agree that this behavior is a little surprising, but this is something expected.
The difference is not skipna=True (both use this), but actually in min_periods.
If you specify min_periods=1, then the results become identical.

In [7]: ds.rolling(dim=3, center=True, min_periods=1).mean()
Out[7]: 
<xarray.DataArray (dim: 4)>
array([0.5, 1. , 2. , 2.5])
Dimensions without coordinates: dim

I am not sure how we can make both the behaviors the same with keeping the generality of construct method.

@mathause mathause mentioned this issue Mar 23, 2018
4 tasks
@fujiisoup
Copy link
Member

I think we need to document more clearly that min_periods option is just neglected if using construct method.

@mathause
Copy link
Collaborator Author

Ok, so ds.rolling.mean allows for min_periods while ds.rolling.construct.mean allows for fill_value?

On the other hand, using skipna=False resores the default behaviour or ds.rolling

ds.rolling(dim=3, center=True).construct('window').mean('window', skipna=False)
# RESULT: array([nan,  1.,  2., nan])

Actually I'm sure that min_periods and fill_value are not the only differences between the two - but that's how far my understanding got at the moment.

@fujiisoup
Copy link
Member

Ok, so ds.rolling.mean allows for min_periods while ds.rolling.construct.mean allows for fill_value?

Yes. Originally, construct method is designed for more advanced use cases, such as strided moving average, weighted mean, short time FFT, etc.

On the other hand, using skipna=False resores the default behaviour or ds.rolling

It is not strictly true. min_periods also considers nan values that already exist in the original array.
This behavior is the same with pandas rolling.

@max-sixty
Copy link
Collaborator

I think we need to document more clearly that min_periods option is just neglected if using construct method.

I also think we could label construct as advanced / experimental. It's extremely cool, but for traditional cases, it's probably more cool than required, and less well tested than the standard methods.

At the moment we say "You can use this for more advanced rolling operations, such as strided rolling, windowed rolling, convolution, short-time FFT, etc.";
I'm imagining adding something like "For standard operations above, there is no need to use construct". Or adding an Experimental marker

@shoyer
Copy link
Member

shoyer commented Mar 26, 2018

Would it make sense to support min_periods in construct by masking the result? That would arguably be consistent with how we do iteration over rolling objects, though to be honest I'm not sure how useful it really is.

@fujiisoup
Copy link
Member

I think that masking the construct-ed array would make the full copy of this, while originally construct returns a view of the original array.
Another option would be to support an additional method (get_mask?), but as this can be realized quite easily (rolling.count() > rolling.min_periods) I am not sure how worth it is.

@stale
Copy link

stale bot commented Feb 24, 2020

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity

If this issue remains relevant, please comment here or remove the stale label; otherwise it will be marked as closed automatically

@stale stale bot added the stale label Feb 24, 2020
@stale stale bot closed this as completed Mar 26, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants