Efficient rolling 'trick' #1978

max-sixty · 2018-03-10T00:29:33Z

Based off http://www.rigtorp.se/2011/01/01/rolling-statistics-numpy.html, we wrote up a function that 'tricks' numpy into presenting an array that looks rolling, but without the O^2 memory requirements

Would people be interested in this going into xarray?

It seems to work really well on a few use-cases, but I imagine it's enough trickery that we might not want to support it in xarray.
And, to be clear, it's strictly worse where we have rolling algos. But where we don't, you get a rolling apply without the python loops.

def rolling_window_numpy(a, window):
    """ Make an array appear to be rolling, but using only a view
    http://www.rigtorp.se/2011/01/01/rolling-statistics-numpy.html
    """
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)


def rolling_window(da, span, dim=None, new_dim='dim_0'):
    """ Adds a rolling dimension to a DataArray using only a view """
    original_dims = da.dims
    da = da.transpose(*tuple(d for d in da.dims if d != dim) + (dim,))

    result = apply_ufunc(
        rolling_window_numpy,
        da,
        output_core_dims=((new_dim,),),
        kwargs=(dict(window=span)))

    return result.transpose(*(original_dims + (new_dim,)))

# tests

import numpy as np
import pandas as pd
import pytest
import xarray as xr


@pytest.fixture
def da(dims):
    return xr.DataArray(
        np.random.rand(5, 10, 15), dims=(list('abc'))).transpose(*dims)


@pytest.fixture(params=[
    list('abc'),
    list('bac'),
    list('cab'),
])
def dims(request):
    return request.param


def test_iterate_imputation_fills_missing(sample_data):
    sample_data.iloc[2, 2] = pd.np.nan
    result = iterate_imputation(sample_data)
    assert result.shape == sample_data.shape
    assert result.notnull().values.all()


def test_rolling_window(da, dims):

    result = rolling_window(da, 3, dim='c', new_dim='x')

    assert result.transpose(*list('abcx')).shape == (5, 10, 13, 3)

    # should be a view, so doesn't have any larger strides
    assert np.max(result.values.strides) == 10 * 15 * 8


def test_rolling_window_values():

    da = xr.DataArray(np.arange(12).reshape(2, 6), dims=('item', 'date'))

    rolling = rolling_window(da, 3, dim='date', new_dim='rolling_date')

    expected = sum([11, 10, 9])
    result = rolling.sum('rolling_date').isel(item=1, date=-1)
    assert result == expected

The text was updated successfully, but these errors were encountered:

jhamman · 2018-03-10T00:45:03Z

I'll let @fujiisoup / @shoyer confirm but this looks eerily similar to #1837.

shoyer · 2018-03-10T00:51:42Z

This is exactly what @fujiisoup just implemented in #1837.

It's a nice trick. If the window is large, it's not quite as fast using a clever algorithm for rolling sums like bottleneck, but it still gives something like a 100x performance boost over the naive loop we used to use.

fujiisoup · 2018-03-10T01:18:11Z

Yes.

In [7]: da.rolling(date=3).construct('rolling_date')
Out[7]: 
<xarray.DataArray (item: 2, date: 6, rolling_date: 3)>
array([[[nan, nan,  0.],
        [nan,  0.,  1.],
        [ 0.,  1.,  2.],
        [ 1.,  2.,  3.],
        [ 2.,  3.,  4.],
        [ 3.,  4.,  5.]],

       [[nan, nan,  6.],
        [nan,  6.,  7.],
        [ 6.,  7.,  8.],
        [ 7.,  8.,  9.],
        [ 8.,  9., 10.],
        [ 9., 10., 11.]]])
Dimensions without coordinates: item, date, rolling_date

does the similar thing (rolling_window in your example).

FYI, using sum without skipna option for such a strided DataArray (in your test_rolling_window_values) is not a good idea.
We internally use np.nansum and this copies the entire array once. It throws away the advantage of the strided trick.
sum(skipna=False) is memory efficient.

max-sixty · 2018-03-10T01:23:06Z

this looks eerily similar to #1837

🤦‍♂️ I think my memory is getting worse every day! I glanced at this myself back in Jan.

Closing!

max-sixty closed this as completed Mar 10, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Efficient rolling 'trick' #1978

Efficient rolling 'trick' #1978

max-sixty commented Mar 10, 2018

jhamman commented Mar 10, 2018

shoyer commented Mar 10, 2018

fujiisoup commented Mar 10, 2018

max-sixty commented Mar 10, 2018

Efficient rolling 'trick' #1978

Efficient rolling 'trick' #1978

Comments

max-sixty commented Mar 10, 2018

jhamman commented Mar 10, 2018

shoyer commented Mar 10, 2018

fujiisoup commented Mar 10, 2018

max-sixty commented Mar 10, 2018