
DataArray.rolling().mean() is way slower than it should be #1993

Closed
shoyer opened this issue Mar 15, 2018 · 5 comments · Fixed by #1994
@shoyer (Member) commented Mar 15, 2018

Code Sample, a copy-pastable example if possible

From @RayPalmerTech in pydata/bottleneck#186:

import numpy as np
import pandas as pd
import time
import bottleneck as bn
import xarray
import matplotlib.pyplot as plt

N = 30000200 # Number of datapoints
Fs = 30000 # sample rate
T=1/Fs # sample period
duration = N/Fs # duration in s
t = np.arange(0,duration,T) # time vector
DATA = np.random.randn(N,) + 5*np.sin(2*np.pi*0.01*t) # example noisy sine data
w = 330000 # window size

def using_bottleneck_mean(data,width):
  return bn.move_mean(a=data,window=width,min_count = 1)

def using_pandas_rolling_mean(data,width):
  return np.asarray(pd.DataFrame(data).rolling(window=width,center=True,min_periods=1).mean()).ravel()

def using_xarray_mean(data,width):
  return xarray.DataArray(data,dims='x').rolling(x=width,min_periods=1, center=True).mean()

start=time.time()
A = using_bottleneck_mean(DATA,w)
print('Bottleneck: ', time.time()-start, 's')
start=time.time()
B = using_pandas_rolling_mean(DATA,w)
print('Pandas: ',time.time()-start,'s')
start=time.time()
C = using_xarray_mean(DATA,w)
print('Xarray: ',time.time()-start,'s')

This results in:

Bottleneck:  0.0867006778717041 s
Pandas:  0.563546895980835 s
Xarray:  25.133142709732056 s

Somehow xarray is way slower than pandas and bottleneck, even though it's using bottleneck under the hood!

Problem description

Profiling shows that the majority of time is spent in xarray.core.rolling.DataArrayRolling._setup_windows. Monkey-patching that method with a dummy rectifies the issue:

xarray.core.rolling.DataArrayRolling._setup_windows = lambda *args: None

Now we obtain:

Bottleneck:  0.06775331497192383 s
Pandas:  0.48262882232666016 s
Xarray:  0.1723031997680664 s

The solution is to set up the windows lazily (in __iter__) instead of in the constructor.
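A minimal sketch of what that lazy setup could look like. This is a toy class, not xarray's actual implementation; the names `_setup_windows` and `window_indices` mirror the issue's description, and the trailing-window `mean` via cumulative sums stands in for the bottleneck-backed vectorized path:

```python
import numpy as np

class LazyRolling:
    """Toy model of the proposed fix: window slices are built on demand."""

    def __init__(self, data, window):
        self.data = np.asarray(data)
        self.window = window
        self._windows = None  # NOT built in the constructor

    def _setup_windows(self):
        # The expensive part: one Python slice object per data point.
        stops = range(1, len(self.data) + 1)
        self._windows = [slice(max(0, stop - self.window), stop)
                         for stop in stops]

    def __iter__(self):
        # Lazy: pay the setup cost only when iteration is actually requested.
        if self._windows is None:
            self._setup_windows()
        for s in self._windows:
            yield self.data[s]

    def mean(self):
        # The vectorized path never needs the slices: a trailing-window
        # mean (min_count=1 behaviour) computed from cumulative sums.
        csum = np.concatenate(([0.0], np.cumsum(self.data)))
        stops = np.arange(1, len(self.data) + 1)
        starts = np.maximum(0, stops - self.window)
        return (csum[stops] - csum[starts]) / (stops - starts)
```

With this structure, `LazyRolling(data, w).mean()` never allocates a single slice object, which is exactly the cost the profile shows dominating.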

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.96+
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

xarray: 0.10.2
pandas: 0.22.0
numpy: 1.14.2
scipy: 0.19.1
netCDF4: None
h5netcdf: None
h5py: 2.7.1
Nio: None
zarr: None
bottleneck: 1.2.1
cyordereddict: None
dask: None
distributed: None
matplotlib: 2.1.2
cartopy: None
seaborn: 0.7.1
setuptools: 36.2.7
pip: 9.0.1
conda: None
pytest: None
IPython: 5.5.0
sphinx: None

@max-sixty (Collaborator)
Quick discovery: it looks like it's spending all the time creating ~9 million slice objects here:

        self.window_indices = [slice(start, stop)
                               for start, stop in zip(starts, stops)]
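To see the pattern in isolation, here is a standalone illustration (plain Python, not xarray code; the function name is made up). In the issue's example this list would hold roughly 30 million heap-allocated slice objects, one per data point:

```python
def build_window_indices(n, window):
    # The same pattern as the quoted xarray code: one Python slice
    # object per element, all materialized up front in a list.
    return [slice(max(0, stop - window), stop) for stop in range(1, n + 1)]

# Even at 1 million elements (30x smaller than the issue's N), this
# allocates a million slice objects before any rolling math happens.
indices = build_window_indices(1_000_000, 330_000)
print(len(indices))
```

None of these objects are needed by the vectorized `mean()` path, which delegates directly to bottleneck.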

@jhamman (Member) commented Mar 15, 2018

+1 from me on making the window creation lazy. (I don't know who wrote this code but woof!)

@max-sixty (Collaborator)
I don't think we use anything created in _setup_windows() until we call __iter__, which IIUC is only called on non-vectorized methods.

We could:

  • Combine _setup_windows and __iter__: only runs the code when you need it, but would run each time __iter__ was called
  • Combine _setup_windows and __iter__ and cache the result: more code but only need to run the setup code once
  • Convert it to a generator: lower up-front cost, but more engineering and doesn't make it cheaper assuming you consumed the whole iterator anyway
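For reference, the generator option could look roughly like this (a hypothetical sketch; `iter_windows` is an illustrative name, not xarray API):

```python
def iter_windows(data, window):
    # Generator variant: each slice is created on demand and becomes
    # garbage right after use, so there is no up-front cost and no
    # multi-million-element list of slice objects is ever materialized.
    for stop in range(1, len(data) + 1):
        yield data[max(0, stop - window):stop]
```

As the bullet notes, this only shifts the cost: consuming the whole iterator still creates every slice, but callers who never iterate (the vectorized bottleneck path) pay nothing.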

@max-sixty (Collaborator) commented Mar 15, 2018

(I don't know who wrote this code but woof!)

🙈

Edit: Whoever wrote that code is a hero for contributing so much code - the likeliest sources of bad code are the most prolific and valuable contributors

@fujiisoup (Member)

I tried to improve this in #1837, but I did not notice that just creating the slices takes so long >_<
Sent a PR.
