Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rolling window with as_strided #1837

Merged
merged 82 commits into from
Mar 1, 2018
Merged
Show file tree
Hide file tree
Changes from 12 commits
Commits
Show all changes
82 commits
Select commit Hold shift + click to select a range
789134c
Rolling_window for np.ndarray
fujiisoup Jan 16, 2018
fa4e857
Add pad method to Variable
fujiisoup Jan 17, 2018
52915f3
Added rolling_window to DataArray and Dataset
fujiisoup Jan 17, 2018
b622007
remove pad_value option. Support dask.rolling_window
fujiisoup Jan 18, 2018
36a1fe9
Refactor rolling.reduce
fujiisoup Jan 18, 2018
71fed0f
add as_strided to npcompat. Tests added for reduce(np.nanmean)
fujiisoup Jan 18, 2018
3960134
Support boolean in maybe_promote
fujiisoup Jan 18, 2018
4bd38f3
move rolling_window into duck_array_op. Make DataArray.rolling_window…
fujiisoup Jan 19, 2018
af8362e
Added to_dataarray and to_dataset to rolling object.
fujiisoup Jan 19, 2018
76db6b5
Use pad in rolling to make compatible to pandas. Expose pad_with_fill…
fujiisoup Jan 20, 2018
87f53af
Refactor rolling
fujiisoup Jan 20, 2018
c23cedb
flake8
fujiisoup Jan 20, 2018
9547c57
Added a comment for dask's pad.
fujiisoup Jan 20, 2018
1f71cff
Use fastpath in rolling.to_dataarray
fujiisoup Jan 20, 2018
724776f
Merge branch 'master' into rolling_window
fujiisoup Jan 20, 2018
73862eb
Doc added.
fujiisoup Jan 20, 2018
859bb5c
Revert not to use fastpath
fujiisoup Jan 20, 2018
d5fc24e
Merge branch 'master' into rolling_window
fujiisoup Jan 21, 2018
05c72f0
Remove maybe_prompt for Boolean. Some improvements based on @shoyer's…
fujiisoup Jan 21, 2018
d55e498
Update test.
fujiisoup Jan 21, 2018
9393eb2
Bug fix in test_rolling_count_correct
fujiisoup Jan 21, 2018
9c71a50
fill_value for boolean array
fujiisoup Jan 21, 2018
54975b4
rolling_window(array, axis, window) -> rolling_window(array, window, …
fujiisoup Jan 21, 2018
e907fdf
support stride in rolling.to_dataarray
fujiisoup Jan 21, 2018
6482536
flake8
fujiisoup Jan 21, 2018
b8def4f
Improve doc. Add DataArrayRolling to api.rst
fujiisoup Jan 21, 2018
ff31589
Improve docs in common.rolling.
fujiisoup Jan 21, 2018
6c011cb
Expose groupby docs to public
fujiisoup Jan 21, 2018
684145a
Default fill_value=dtypes.NA, stride=1. Add comment for DataArrayRollig.
fujiisoup Jan 21, 2018
3a7526e
Default fill_value=dtypes.NA, stride=1. Add comment for DataArrayRollig.
fujiisoup Jan 21, 2018
a0968d6
Add fill_value option to rolling.to_dataarray
fujiisoup Jan 22, 2018
ac4f00e
Convert non-numeric array in reduce.
fujiisoup Jan 22, 2018
fbfc262
Fill_value = False for boolean array in rolling.reduce
fujiisoup Jan 22, 2018
c757986
Support old numpy plus bottleneck combination. Suppress warning for a…
fujiisoup Jan 22, 2018
8fd5fa3
flake8
fujiisoup Jan 22, 2018
ade5ba2
Add benchmark
fujiisoup Jan 22, 2018
2d6897f
Dataset.count. Benchmark
fujiisoup Jan 23, 2018
6461f84
Classize benchmark
fujiisoup Jan 23, 2018
aece1c4
Decoratorize for asv benchmark
fujiisoup Jan 24, 2018
d5ad4a0
Merge branch 'master' into rolling_window
fujiisoup Jan 24, 2018
4189d71
Classize benchmarks/indexing.py
fujiisoup Jan 24, 2018
081c928
Working with nanreduce
fujiisoup Jan 27, 2018
75c1d7d
Support .sum for object dtype.
fujiisoup Jan 30, 2018
452b219
Remove unused if-statements.
fujiisoup Jan 30, 2018
c5490c4
Default skipna for rolling.reduce
fujiisoup Jan 30, 2018
ab91394
Pass tests. Test added to make sure the consistency to pandas' behavior.
fujiisoup Jan 30, 2018
9fa0812
Delete duplicate file. flake8
fujiisoup Jan 30, 2018
0c1d49a
flake8 again
fujiisoup Jan 30, 2018
9463937
Working with numpy<1.13
fujiisoup Jan 30, 2018
dce4e37
Revert "Classize benchmarks/indexing.py"
fujiisoup Feb 10, 2018
b3050cb
rolling_window with dask.ghost
fujiisoup Feb 10, 2018
22f6d4a
Merge branch 'rolling_window_dask' into rolling_window
fujiisoup Feb 10, 2018
19e0fca
Merge branch 'master' into rolling_window
fujiisoup Feb 15, 2018
d3b1e2b
Optimize rolling.count.
fujiisoup Feb 15, 2018
2d06ec9
Merge branch 'master' into rolling_window
fujiisoup Feb 15, 2018
734da93
Fixing style errors.
stickler-ci Feb 15, 2018
1a000b8
Remove unused npcompat.nansum etc
fujiisoup Feb 15, 2018
27ff67c
flake8
fujiisoup Feb 16, 2018
a2c7141
require_dask -> has_dask
fujiisoup Feb 16, 2018
35dee9d
npcompat -> np
fujiisoup Feb 16, 2018
137709f
flake8
fujiisoup Feb 16, 2018
cc82cdc
Skip tests for old numpy.
fujiisoup Feb 16, 2018
b246411
Improve doc. Optmize missing._get_valid_fill_mask
fujiisoup Feb 17, 2018
b3a2105
to_dataarray -> construct
fujiisoup Feb 18, 2018
b80fbfd
remove assert_allclose_with_nan
fujiisoup Feb 18, 2018
3c010ae
Fixing style errors.
stickler-ci Feb 18, 2018
ab82f75
typo
fujiisoup Feb 18, 2018
b9f10cd
`to_dataset` -> `construct`
fujiisoup Feb 18, 2018
cc9c3d6
Update doc
fujiisoup Feb 18, 2018
52cc48d
Merge branch 'master' into rolling_window
fujiisoup Feb 18, 2018
2954cdf
Change boundary and add comments for dask_rolling_window.
fujiisoup Feb 18, 2018
f19e531
Refactor dask_array_ops.rolling_window and np_utils.rolling_window
fujiisoup Feb 24, 2018
a074df3
flake8
fujiisoup Feb 24, 2018
f6f78a5
Simplify tests
fujiisoup Feb 24, 2018
0ec8aba
flake8 again.
fujiisoup Feb 25, 2018
0261cfe
cleanup roling_window for dask.
fujiisoup Feb 25, 2018
a91c27f
Merge branch 'master' into rolling_window
fujiisoup Feb 26, 2018
c83d588
remove duplicates
fujiisoup Feb 26, 2018
3bb4668
remvove duplicate
fujiisoup Feb 26, 2018
d0d89ce
flake8
fujiisoup Feb 26, 2018
eaba563
delete unnecessary file.
fujiisoup Feb 26, 2018
aeabdf5
Merge branch 'master' into rolling_window
fujiisoup Feb 28, 2018
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions xarray/core/dtypes.py
Original file line number Diff line number Diff line change
Expand Up @@ -34,6 +34,8 @@ def maybe_promote(dtype):
fill_value = np.datetime64('NaT')
elif np.issubdtype(dtype, np.timedelta64):
fill_value = np.timedelta64('NaT')
elif dtype.kind == 'b':
fill_value = False
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is convenient for me, but it is not very clear whether False is equivalent to nan for boolean arrays.
If anyone has objections, I will consider different approach.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, let's consider other options here. This is used for the default value when reindexing/aligning.

else:
dtype = object
fill_value = np.nan
Expand Down
22 changes: 22 additions & 0 deletions xarray/core/duck_array_ops.py
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@
import pandas as pd

from . import npcompat
from . import nputils
from . import dtypes
from .pycompat import dask_array_type
from .nputils import nanfirst, nanlast
Expand Down Expand Up @@ -263,3 +264,24 @@ def last(values, axis, skipna=None):
_fail_on_dask_array_input_skipna(values)
return nanlast(values, axis)
return take(values, -1, axis=axis)


def rolling_window(array, axis, window):
"""
Make an ndarray with a rolling window of axis-th dimension.
The rolling dimension will be placed at the last dimension.
"""
if isinstance(array, dask_array_type):
if window < 1:
raise ValueError(
"`window` must be at least 1. Given : {}".format(window))
if window > array.shape[axis]:
raise ValueError("`window` is too long. Given : {}".format(window))

axis = nputils._validate_axis(array, axis)
size = array.shape[axis] - window + 1
arrays = [array[(slice(None), ) * axis + (slice(w, size + w), )]
for w in range(window)]
return da.stack(arrays, axis=-1)
else: # np.ndarray
return nputils.rolling_window(array, axis, window)
12 changes: 12 additions & 0 deletions xarray/core/npcompat.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,18 @@
from __future__ import division
from __future__ import print_function
import numpy as np
from distutils.version import LooseVersion


if LooseVersion(np.__version__) < LooseVersion('1.12'):
def as_strided(x, shape=None, strides=None, subok=False, writeable=True):
array = np.lib.stride_tricks.as_strided(x, shape, strides, subok)
array.setflags(write=writeable)
return array

else:
as_strided = np.lib.stride_tricks.as_strided


try:
from numpy import nancumsum, nancumprod, flip
Expand Down
50 changes: 50 additions & 0 deletions xarray/core/nputils.py
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@
import numpy as np
import pandas as pd
import warnings
from . import npcompat


def _validate_axis(data, axis):
Expand Down Expand Up @@ -133,3 +134,52 @@ def __setitem__(self, key, value):
mixed_positions, vindex_positions = _advanced_indexer_subspaces(key)
self._array[key] = np.moveaxis(value, vindex_positions,
mixed_positions)


def rolling_window(a, axis, window):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a small point, but can you swap the arguments for this function? That would let you set a default axis.

Bottleneck uses default arguments like move_sum(array, window, axis=-1) which I think is a nice convention:
https://kwgoodman.github.io/bottleneck-doc/reference.html#moving-window-functions

"""
Make an ndarray with a rolling window along axis.

Parameters
----------
a : array_like
Array to add rolling window to
axis: int
axis position along which rolling window will be applied.
window : int
Size of rolling window

Returns
-------
Array that is a view of the original array with a added dimension
of size w.

Examples
--------
>>> x=np.arange(10).reshape((2,5))
>>> np.rolling_window(x, 3, axis=-1)
array([[[0, 1, 2], [1, 2, 3], [2, 3, 4]],
[[5, 6, 7], [6, 7, 8], [7, 8, 9]]])

Calculate rolling mean of last dimension:
>>> np.mean(np.rolling_window(x, 3, axis=-1), -1)
array([[ 1., 2., 3.],
[ 6., 7., 8.]])

This function is taken from https://github.com/numpy/numpy/pull/31
but slightly modified to accept axis option.
"""
axis = _validate_axis(a, axis)
a = np.swapaxes(a, axis, -1)

if window < 1:
raise ValueError(
"`window` must be at least 1. Given : {}".format(window))
if window > a.shape[-1]:
raise ValueError("`window` is too long. Given : {}".format(window))

shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
strides = a.strides + (a.strides[-1],)
rolling = npcompat.as_strided(a, shape=shape, strides=strides,
writeable=False)
return np.swapaxes(rolling, -2, axis)
145 changes: 89 additions & 56 deletions xarray/core/rolling.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,8 +6,6 @@
from distutils.version import LooseVersion

from .pycompat import OrderedDict, zip, dask_array_type
from .common import full_like
from .combine import concat
from .ops import (inject_bottleneck_rolling_methods,
inject_datasetrolling_methods, has_bottleneck, bn)
from .dask_array_ops import dask_rolling_wrapper
Expand Down Expand Up @@ -127,62 +125,75 @@ class DataArrayRolling(Rolling):
def __init__(self, obj, min_periods=None, center=False, **windows):
super(DataArrayRolling, self).__init__(obj, min_periods=min_periods,
center=center, **windows)
self._windows = None
self._valid_windows = None
self.window_indices = None
self.window_labels = None

self._setup_windows()

@property
def windows(self):
if self._windows is None:
self._windows = OrderedDict(zip(self.window_labels,
self.window_indices))
return self._windows

def __iter__(self):
for (label, indices, valid) in zip(self.window_labels,
self.window_indices,
self._valid_windows):

for (label, indices) in zip(self.window_labels, self.window_indices):
window = self.obj.isel(**{self.dim: indices})

if not valid:
window = full_like(window, fill_value=True, dtype=bool)
counts = window.count(dim=self.dim)
window = window.where(counts >= self._min_periods)

yield (label, window)

def _setup_windows(self):
"""
Find the indices and labels for each window
"""
from .dataarray import DataArray

self.window_labels = self.obj[self.dim]

window = int(self.window)

dim_size = self.obj[self.dim].size

stops = np.arange(dim_size) + 1
starts = np.maximum(stops - window, 0)

if self._min_periods > 1:
valid_windows = (stops - starts) >= self._min_periods
else:
# No invalid windows
valid_windows = np.ones(dim_size, dtype=bool)
self._valid_windows = DataArray(valid_windows, dims=(self.dim, ),
coords=self.obj[self.dim].coords)

self.window_indices = [slice(start, stop)
for start, stop in zip(starts, stops)]

def _center_result(self, result):
"""center result"""
shift = (-self.window // 2) + 1
return result.shift(**{self.dim: shift})
def to_dataarray(self, window_dim):
"""
Convert this rolling object to xr.DataArray,
where the window dimension is stacked as a new dimension

Parameters
----------
window_dim: str
New name of the window dimension.

Returns
-------
DataArray that is a view of the original array.

Note
----
The return array is not writeable.

Examples
--------
>>> da = DataArray(np.arange(8).reshape(2, 4), dims=('a', 'b'))

>>> da.rolling_window(x, 'b', 4, 'window_dim')
<xarray.DataArray (a: 2, b: 4, window_dim: 3)>
array([[[np.nan, np.nan, 0], [np.nan, 0, 1], [0, 1, 2], [1, 2, 3]],
[[np.nan, np.nan, 4], [np.nan, 4, 5], [4, 5, 6], [5, 6, 7]]])
Dimensions without coordinates: a, b, window_dim

>>> da.rolling_window(x, 'b', 4, 'window_dim', center=True)
<xarray.DataArray (a: 2, b: 4, window_dim: 3)>
array([[[np.nan, 0, 1], [0, 1, 2], [1, 2, 3], [2, 3, np.nan]],
[[np.nan, 4, 5], [4, 5, 6], [5, 6, 7], [6, 7, np.nan]]])
Dimensions without coordinates: a, b, window_dim
"""

from .dataarray import DataArray

window = self.obj.variable.rolling_window(self.dim, self.window,
window_dim, self.center)
return DataArray(window, dims=self.obj.dims + (window_dim,),
coords=self.obj.coords)

def reduce(self, func, **kwargs):
"""Reduce the items in this group by applying `func` along some
Expand All @@ -203,26 +214,18 @@ def reduce(self, func, **kwargs):
Array with summarized data.
"""

windows = [window.reduce(func, dim=self.dim, **kwargs)
for _, window in self]

# Find valid windows based on count
if self.dim in self.obj.coords:
concat_dim = self.window_labels
else:
concat_dim = self.dim
counts = concat([window.count(dim=self.dim) for _, window in self],
dim=concat_dim)
result = concat(windows, dim=concat_dim)
# restore dim order
result = result.transpose(*self.obj.dims)
windows = self.to_dataarray('_rolling_window_dim')
result = windows.reduce(func, dim='_rolling_window_dim', **kwargs)

# Find valid windows based on count.
# We do not use `reduced.count()` because it constructs a larger array
# (notice that `windows` is just a view)
counts = (~self.obj.isnull()).rolling(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For formatting long chains of method calls, I like to add extra parentheses and break every operation at the start of the line, e.g.,

counts = ((~self.obj.isnull())
          .rolling(center=self.center, **{self.dim: self.window})
          .to_dataarray('_rolling_window_dim')
          .sum(dim='_rolling_window_dim'))

I find this makes it easier to read

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should add a short-cut here that doesn't bother to compute counts if the array's dtype cannot hold NaN? I think that would solve the issue with changing maybe_promote for booleans.

You could add a utility function to determine this based on whether the result of maybe_promote() has the same dtype as the input.

center=self.center, **{self.dim: self.window}).to_dataarray(
'_rolling_window_dim').sum(dim='_rolling_window_dim')
result = result.where(counts >= self._min_periods)

if self.center:
result = self._center_result(result)

return result
# restore dim order
return result.transpose(*self.obj.dims)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to restore dimension order any more. The result should already be calculated correctly.


@classmethod
def _reduce_method(cls, func):
Expand Down Expand Up @@ -254,19 +257,24 @@ def wrapped_func(self, **kwargs):

axis = self.obj.get_axis_num(self.dim)

if isinstance(self.obj.data, dask_array_type):
padded = self.obj.variable
if self.center:
shift = (-self.window // 2) + 1
padded = padded.pad_with_fill_value(**{self.dim: (0, -shift)})
valid = (slice(None), ) * axis + (slice(-shift, None), )

if isinstance(padded.data, dask_array_type):
values = dask_rolling_wrapper(func, self.obj.data,
window=self.window,
min_count=min_count,
axis=axis)
else:
values = func(self.obj.data, window=self.window,
values = func(padded.data, window=self.window,
min_count=min_count, axis=axis)

result = DataArray(values, self.obj.coords)

if self.center:
result = self._center_result(result)
values = values[valid]
result = DataArray(values, self.obj.coords)

return result
return wrapped_func
Expand Down Expand Up @@ -373,6 +381,31 @@ def wrapped_func(self, **kwargs):
return Dataset(reduced, coords=self.obj.coords)
return wrapped_func

def to_dataset(self, window_dim):
"""
Convert this rolling object to xr.Dataset,
where the window dimension is stacked as a new dimension

Parameters
----------
window_dim: str
New name of the window dimension.

Returns
-------
Dataset with variables converted from rolling object.
"""

from .dataset import Dataset

dataset = OrderedDict()
for key, da in self.obj.data_vars.items():
if self.dim in da.dims:
dataset[key] = self.rollings[key].to_dataarray(window_dim)
else:
dataset[key] = da
return Dataset(dataset, coords=self.obj.coords)


inject_bottleneck_rolling_methods(DataArrayRolling)
inject_datasetrolling_methods(DatasetRolling)
Loading