DataArray.rolling().mean() is way slower than it should be #1993
Comments
Quick discovery: it looks like it's spending all the time creating 9m
self.window_indices = [slice(start, stop)
                       for start, stop in zip(starts, stops)]
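For context, here is a rough pure-Python sketch of what that eager setup does. It assumes a trailing window in which each label's window ends at that label and is clipped at the start of the array. `eager_window_slices` is a hypothetical helper, not xarray's API. One slice object is materialised per label up front, so the constructor's cost grows with the array length even when a reduction never touches the slices.

```python
def eager_window_slices(n_labels: int, window: int) -> list:
    """Build one slice per label up front (hypothetical helper, for illustration).

    Assumes a trailing window: label i covers positions
    [max(0, i + 1 - window), i + 1).
    """
    stops = range(1, n_labels + 1)
    # Clip window starts at 0 for the first (window - 1) labels.
    starts = (max(0, stop - window) for stop in stops)
    # Every slice object is allocated here, at construction time.
    return [slice(start, stop) for start, stop in zip(starts, stops)]


print(eager_window_slices(5, 3))
# [slice(0, 1, None), slice(0, 2, None), slice(0, 3, None), slice(1, 4, None), slice(2, 5, None)]
```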
+1 from me on making the window creation lazy. (I don't know who wrote this code but woof!)
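A minimal sketch of the lazy approach, under the same trailing-window assumption: the constructor stores only two integers, and slices are yielded one at a time from `__iter__`. `LazyWindows` is a hypothetical class for illustration, not xarray's actual rolling implementation. A reduction that goes straight to bottleneck never iterates, so it would never pay for the slices.

```python
from itertools import islice
from typing import Iterator


class LazyWindows:
    """Hypothetical container that defers slice creation to iteration time."""

    def __init__(self, n_labels: int, window: int) -> None:
        if window < 1:
            raise ValueError("window must be >= 1")
        # O(1) construction: nothing is allocated per label here.
        self.n_labels = n_labels
        self.window = window

    def __len__(self) -> int:
        return self.n_labels

    def __iter__(self) -> Iterator[slice]:
        # Each slice is built only when a caller actually asks for it.
        for stop in range(1, self.n_labels + 1):
            yield slice(max(0, stop - self.window), stop)


windows = LazyWindows(10_000_000, 3)  # cheap: no slices created yet
first_three = list(islice(windows, 3))
print(first_three)  # [slice(0, 1, None), slice(0, 2, None), slice(0, 3, None)]
```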
I don't think we use anything created in [...]. We could:
🙈 Edit: Whoever wrote that code is a hero for contributing so much code; the likeliest sources of bad code are the most prolific and valuable contributors.
I had to improve this in #1837, but I did not notice that just creating slices takes so long >_<
Code Sample, a copy-pastable example if possible
From @RayPalmerTech in pydata/bottleneck#186:
This results in:
Somehow xarray is way slower than pandas and bottleneck, even though it's using bottleneck under the hood!
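For comparison, a moving-window reducer does not need per-window slice objects at all. The sketch below shows the general running-sum technique for a trailing moving mean in O(n) time. It is a simplified pure-Python illustration that assumes NaN-free input, not bottleneck's C implementation (bottleneck's `move_mean` also handles NaNs and a `min_count` argument). `moving_mean` is a hypothetical name.

```python
import math
from typing import List, Sequence


def moving_mean(values: Sequence[float], window: int) -> List[float]:
    """Trailing moving mean via a running sum (simplified sketch, no NaN handling).

    Emits NaN until a full window of observations is available.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    out: List[float] = []
    running = 0.0
    for i, x in enumerate(values):
        running += x  # value entering the window
        if i >= window:
            running -= values[i - window]  # value leaving the window
        out.append(running / window if i >= window - 1 else math.nan)
    return out


print(moving_mean([1, 2, 3, 4, 5], 3))  # [nan, nan, 2.0, 3.0, 4.0]
```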
Problem description
Profiling shows that the majority of time is spent in xarray.core.rolling.DataArrayRolling._setup_windows. Monkey-patching that method with a dummy rectifies the issue:

Now we obtain:
The solution is to set up the windows lazily (in __iter__) instead of in the constructor.

Output of xr.show_versions()
xarray: 0.10.2
pandas: 0.22.0
numpy: 1.14.2
scipy: 0.19.1
netCDF4: None
h5netcdf: None
h5py: 2.7.1
Nio: None
zarr: None
bottleneck: 1.2.1
cyordereddict: None
dask: None
distributed: None
matplotlib: 2.1.2
cartopy: None
seaborn: 0.7.1
setuptools: 36.2.7
pip: 9.0.1
conda: None
pytest: None
IPython: 5.5.0
sphinx: None