
DataArray.rolling().mean() is way slower than it should be #1993

Closed
shoyer opened this issue Mar 15, 2018 · 5 comments · Fixed by #1994
@shoyer (Member) commented Mar 15, 2018

Code Sample, a copy-pastable example if possible

From @RayPalmerTech in pydata/bottleneck#186:

import numpy as np
import pandas as pd
import time
import bottleneck as bn
import xarray
import matplotlib.pyplot as plt

N = 30000200 # Number of datapoints
Fs = 30000 # sample rate
T=1/Fs # sample period
duration = N/Fs # duration in s
t = np.arange(0,duration,T) # time vector
DATA = np.random.randn(N,) + 5*np.sin(2*np.pi*0.01*t) # example noisy sine data
w = 330000 # window size

def using_bottleneck_mean(data,width):
  return bn.move_mean(a=data,window=width,min_count = 1)

def using_pandas_rolling_mean(data,width):
  return np.asarray(pd.DataFrame(data).rolling(window=width,center=True,min_periods=1).mean()).ravel()

def using_xarray_mean(data,width):
  return xarray.DataArray(data,dims='x').rolling(x=width,min_periods=1, center=True).mean()

start=time.time()
A = using_bottleneck_mean(DATA,w)
print('Bottleneck: ', time.time()-start, 's')
start=time.time()
B = using_pandas_rolling_mean(DATA,w)
print('Pandas: ',time.time()-start,'s')
start=time.time()
C = using_xarray_mean(DATA,w)
print('Xarray: ',time.time()-start,'s')

This results in:

Bottleneck:  0.0867006778717041 s
Pandas:  0.563546895980835 s
Xarray:  25.133142709732056 s

Somehow xarray is way slower than pandas and bottleneck, even though it's using bottleneck under the hood!

Problem description

Profiling shows that the majority of time is spent in xarray.core.rolling.DataArrayRolling._setup_windows. Monkey-patching that method with a dummy rectifies the issue:

xarray.core.rolling.DataArrayRolling._setup_windows = lambda *args: None

Now we obtain:

Bottleneck:  0.06775331497192383 s
Pandas:  0.48262882232666016 s
Xarray:  0.1723031997680664 s

The solution is to set up the windows lazily (in __iter__) instead of in the constructor.
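A minimal sketch of what that lazy setup could look like. This is a toy class, not xarray's actual implementation; the names `_setup_windows` and `window_indices` mirror the issue's description, and the trailing-window `mean` via cumulative sums stands in for the bottleneck-backed vectorized path:

```python
import numpy as np

class LazyRolling:
    """Toy model of the proposed fix: window slices are built on demand."""

    def __init__(self, data, window):
        self.data = np.asarray(data)
        self.window = window
        self._windows = None  # NOT built in the constructor

    def _setup_windows(self):
        # The expensive part: one Python slice object per data point.
        stops = range(1, len(self.data) + 1)
        self._windows = [slice(max(0, stop - self.window), stop)
                         for stop in stops]

    def __iter__(self):
        # Lazy: pay the setup cost only when iteration is actually requested.
        if self._windows is None:
            self._setup_windows()
        for s in self._windows:
            yield self.data[s]

    def mean(self):
        # The vectorized path never needs the slices: a trailing-window
        # mean (min_count=1 behaviour) computed from cumulative sums.
        csum = np.concatenate(([0.0], np.cumsum(self.data)))
        stops = np.arange(1, len(self.data) + 1)
        starts = np.maximum(0, stops - self.window)
        return (csum[stops] - csum[starts]) / (stops - starts)
```

With this structure, `LazyRolling(data, w).mean()` never allocates a single slice object, which is exactly the cost the profile shows dominating.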

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.96+
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

xarray: 0.10.2
pandas: 0.22.0
numpy: 1.14.2
scipy: 0.19.1
netCDF4: None
h5netcdf: None
h5py: 2.7.1
Nio: None
zarr: None
bottleneck: 1.2.1
cyordereddict: None
dask: None
distributed: None
matplotlib: 2.1.2
cartopy: None
seaborn: 0.7.1
setuptools: 36.2.7
pip: 9.0.1
conda: None
pytest: None
IPython: 5.5.0
sphinx: None

@max-sixty (Collaborator)
Quick discovery: it looks like it's spending all the time creating ~9 million slice objects here:

        self.window_indices = [slice(start, stop)
                               for start, stop in zip(starts, stops)]
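To see the pattern in isolation, here is a standalone illustration (plain Python, not xarray code; the function name is made up). In the issue's example this list would hold roughly 30 million heap-allocated slice objects, one per data point:

```python
def build_window_indices(n, window):
    # The same pattern as the quoted xarray code: one Python slice
    # object per element, all materialized up front in a list.
    return [slice(max(0, stop - window), stop) for stop in range(1, n + 1)]

# Even at 1 million elements (30x smaller than the issue's N), this
# allocates a million slice objects before any rolling math happens.
indices = build_window_indices(1_000_000, 330_000)
print(len(indices))
```

None of these objects are needed by the vectorized `mean()` path, which delegates directly to bottleneck.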

@jhamman (Member) commented Mar 15, 2018

+1 from me on making the window creation lazy. (I don't know who wrote this code but woof!)

@max-sixty (Collaborator)
I don't think we use anything created in _setup_windows() until we call __iter__, which IIUC is only called on non-vectorized methods.

We could:

  • Combine _setup_windows and __iter__: only runs the code when you need it, but would run each time __iter__ was called
  • Combine _setup_windows and __iter__ and cache the result: more code but only need to run the setup code once
  • Convert it to a generator: lower up-front cost, but more engineering and doesn't make it cheaper assuming you consumed the whole iterator anyway
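For reference, the generator option could look roughly like this (a hypothetical sketch; `iter_windows` is an illustrative name, not xarray API):

```python
def iter_windows(data, window):
    # Generator variant: each slice is created on demand and becomes
    # garbage right after use, so there is no up-front cost and no
    # multi-million-element list of slice objects is ever materialized.
    for stop in range(1, len(data) + 1):
        yield data[max(0, stop - window):stop]
```

As the bullet notes, this only shifts the cost: consuming the whole iterator still creates every slice, but callers who never iterate (the vectorized bottleneck path) pay nothing.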

@max-sixty (Collaborator) commented Mar 15, 2018

(I don't know who wrote this code but woof!)

🙈

Edit: Whoever wrote that code is a hero for contributing so much code - the likeliest sources of bad code are the most prolific and valuable contributors

@fujiisoup (Member)

I tried to improve this in #1837, but I did not notice that just creating the slices takes so long >_<
Sent a PR.
