Rolling window with as_strided #1837

Merged
merged 82 commits into from
Mar 1, 2018
Changes from 22 commits
Commits
789134c
Rolling_window for np.ndarray
fujiisoup Jan 16, 2018
fa4e857
Add pad method to Variable
fujiisoup Jan 17, 2018
52915f3
Added rolling_window to DataArray and Dataset
fujiisoup Jan 17, 2018
b622007
remove pad_value option. Support dask.rolling_window
fujiisoup Jan 18, 2018
36a1fe9
Refactor rolling.reduce
fujiisoup Jan 18, 2018
71fed0f
add as_strided to npcompat. Tests added for reduce(np.nanmean)
fujiisoup Jan 18, 2018
3960134
Support boolean in maybe_promote
fujiisoup Jan 18, 2018
4bd38f3
move rolling_window into duck_array_op. Make DataArray.rolling_window…
fujiisoup Jan 19, 2018
af8362e
Added to_dataarray and to_dataset to rolling object.
fujiisoup Jan 19, 2018
76db6b5
Use pad in rolling to make compatible to pandas. Expose pad_with_fill…
fujiisoup Jan 20, 2018
87f53af
Refactor rolling
fujiisoup Jan 20, 2018
c23cedb
flake8
fujiisoup Jan 20, 2018
9547c57
Added a comment for dask's pad.
fujiisoup Jan 20, 2018
1f71cff
Use fastpath in rolling.to_dataarray
fujiisoup Jan 20, 2018
724776f
Merge branch 'master' into rolling_window
fujiisoup Jan 20, 2018
73862eb
Doc added.
fujiisoup Jan 20, 2018
859bb5c
Revert not to use fastpath
fujiisoup Jan 20, 2018
d5fc24e
Merge branch 'master' into rolling_window
fujiisoup Jan 21, 2018
05c72f0
Remove maybe_prompt for Boolean. Some improvements based on @shoyer's…
fujiisoup Jan 21, 2018
d55e498
Update test.
fujiisoup Jan 21, 2018
9393eb2
Bug fix in test_rolling_count_correct
fujiisoup Jan 21, 2018
9c71a50
fill_value for boolean array
fujiisoup Jan 21, 2018
54975b4
rolling_window(array, axis, window) -> rolling_window(array, window, …
fujiisoup Jan 21, 2018
e907fdf
support stride in rolling.to_dataarray
fujiisoup Jan 21, 2018
6482536
flake8
fujiisoup Jan 21, 2018
b8def4f
Improve doc. Add DataArrayRolling to api.rst
fujiisoup Jan 21, 2018
ff31589
Improve docs in common.rolling.
fujiisoup Jan 21, 2018
6c011cb
Expose groupby docs to public
fujiisoup Jan 21, 2018
684145a
Default fill_value=dtypes.NA, stride=1. Add comment for DataArrayRollig.
fujiisoup Jan 21, 2018
3a7526e
Default fill_value=dtypes.NA, stride=1. Add comment for DataArrayRollig.
fujiisoup Jan 21, 2018
a0968d6
Add fill_value option to rolling.to_dataarray
fujiisoup Jan 22, 2018
ac4f00e
Convert non-numeric array in reduce.
fujiisoup Jan 22, 2018
fbfc262
Fill_value = False for boolean array in rolling.reduce
fujiisoup Jan 22, 2018
c757986
Support old numpy plus bottleneck combination. Suppress warning for a…
fujiisoup Jan 22, 2018
8fd5fa3
flake8
fujiisoup Jan 22, 2018
ade5ba2
Add benchmark
fujiisoup Jan 22, 2018
2d6897f
Dataset.count. Benchmark
fujiisoup Jan 23, 2018
6461f84
Classize benchmark
fujiisoup Jan 23, 2018
aece1c4
Decoratorize for asv benchmark
fujiisoup Jan 24, 2018
d5ad4a0
Merge branch 'master' into rolling_window
fujiisoup Jan 24, 2018
4189d71
Classize benchmarks/indexing.py
fujiisoup Jan 24, 2018
081c928
Working with nanreduce
fujiisoup Jan 27, 2018
75c1d7d
Support .sum for object dtype.
fujiisoup Jan 30, 2018
452b219
Remove unused if-statements.
fujiisoup Jan 30, 2018
c5490c4
Default skipna for rolling.reduce
fujiisoup Jan 30, 2018
ab91394
Pass tests. Test added to make sure the consistency to pandas' behavior.
fujiisoup Jan 30, 2018
9fa0812
Delete duplicate file. flake8
fujiisoup Jan 30, 2018
0c1d49a
flake8 again
fujiisoup Jan 30, 2018
9463937
Working with numpy<1.13
fujiisoup Jan 30, 2018
dce4e37
Revert "Classize benchmarks/indexing.py"
fujiisoup Feb 10, 2018
b3050cb
rolling_window with dask.ghost
fujiisoup Feb 10, 2018
22f6d4a
Merge branch 'rolling_window_dask' into rolling_window
fujiisoup Feb 10, 2018
19e0fca
Merge branch 'master' into rolling_window
fujiisoup Feb 15, 2018
d3b1e2b
Optimize rolling.count.
fujiisoup Feb 15, 2018
2d06ec9
Merge branch 'master' into rolling_window
fujiisoup Feb 15, 2018
734da93
Fixing style errors.
stickler-ci Feb 15, 2018
1a000b8
Remove unused npcompat.nansum etc
fujiisoup Feb 15, 2018
27ff67c
flake8
fujiisoup Feb 16, 2018
a2c7141
require_dask -> has_dask
fujiisoup Feb 16, 2018
35dee9d
npcompat -> np
fujiisoup Feb 16, 2018
137709f
flake8
fujiisoup Feb 16, 2018
cc82cdc
Skip tests for old numpy.
fujiisoup Feb 16, 2018
b246411
Improve doc. Optmize missing._get_valid_fill_mask
fujiisoup Feb 17, 2018
b3a2105
to_dataarray -> construct
fujiisoup Feb 18, 2018
b80fbfd
remove assert_allclose_with_nan
fujiisoup Feb 18, 2018
3c010ae
Fixing style errors.
stickler-ci Feb 18, 2018
ab82f75
typo
fujiisoup Feb 18, 2018
b9f10cd
`to_dataset` -> `construct`
fujiisoup Feb 18, 2018
cc9c3d6
Update doc
fujiisoup Feb 18, 2018
52cc48d
Merge branch 'master' into rolling_window
fujiisoup Feb 18, 2018
2954cdf
Change boundary and add comments for dask_rolling_window.
fujiisoup Feb 18, 2018
f19e531
Refactor dask_array_ops.rolling_window and np_utils.rolling_window
fujiisoup Feb 24, 2018
a074df3
flake8
fujiisoup Feb 24, 2018
f6f78a5
Simplify tests
fujiisoup Feb 24, 2018
0ec8aba
flake8 again.
fujiisoup Feb 25, 2018
0261cfe
cleanup roling_window for dask.
fujiisoup Feb 25, 2018
a91c27f
Merge branch 'master' into rolling_window
fujiisoup Feb 26, 2018
c83d588
remove duplicates
fujiisoup Feb 26, 2018
3bb4668
remvove duplicate
fujiisoup Feb 26, 2018
d0d89ce
flake8
fujiisoup Feb 26, 2018
eaba563
delete unnecessary file.
fujiisoup Feb 26, 2018
aeabdf5
Merge branch 'master' into rolling_window
fujiisoup Feb 28, 2018
25 changes: 21 additions & 4 deletions doc/computation.rst
@@ -158,20 +158,37 @@ Aggregation and summary methods can be applied directly to the ``Rolling`` objec
    r.mean()
    r.reduce(np.std)

Note that rolling window aggregations are much faster (both asymptotically and
because they avoid a loop in Python) when bottleneck_ is installed. Otherwise,
we fall back to a slower, pure Python implementation.
Note that rolling window aggregations are faster when bottleneck_ is installed.

.. _bottleneck: https://github.com/kwgoodman/bottleneck/

Finally, we can manually iterate through ``Rolling`` objects:
We can also manually iterate through ``Rolling`` objects:

.. ipython:: python

    @verbatim
    for label, arr_window in r:
        # arr_window is a view of x

Finally, the rolling object has a ``to_dataarray`` method
(``to_dataset`` for ``Rolling`` objects created from a ``Dataset``), which
returns a view of the original ``DataArray`` with the window dimension
appended as the last dimension.
You can use this for more advanced rolling operations, such as strided rolling,
windowed rolling, convolution, and short-time FFT.

.. ipython:: python

    rolling_da = r.to_dataarray('window_dim')
    rolling_da
    # rolling mean with 2-point stride
Member

Can we show the example by calling construct('window_dim', stride=2) instead?

    rolling_da.isel(y=slice(None, None, 2)).mean('window_dim')

Note that although the ``DataArray`` obtained by
``r.to_dataarray('window_dim')`` has an additional dimension,
it uses little additional memory because it is only a view of
the original array.
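As a rough illustration of the idea in plain NumPy (not the xarray implementation itself), the sketch below builds a windowed view with ``as_strided``, keeps every second window, and averages over the window dimension. The variable names are made up for this example, and unlike the ``Rolling`` object it does not pad the edges, so only full windows are produced.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

x = np.arange(8, dtype=float)
window = 3

# One row per full window; both axes step through x one element at a time.
n_windows = x.shape[0] - window + 1
windows = as_strided(x, shape=(n_windows, window),
                     strides=(x.strides[0], x.strides[0]),
                     writeable=False)

# "Strided rolling": keep every second window, then reduce over the window axis.
strided_mean = windows[::2].mean(axis=-1)

# The windowed array is a view: it shares its buffer with x.
is_view = np.shares_memory(windows, x)
```

Because ``windows`` only reinterprets the existing buffer, building it does not copy the data; a copy happens only when a reduction such as ``mean`` materializes its result.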

.. _compute.broadcasting:

Broadcasting by dimension name
12 changes: 11 additions & 1 deletion doc/whats-new.rst
@@ -27,7 +27,13 @@ Documentation

Enhancements
~~~~~~~~~~~~
- reduce methods such as :py:func:`DataArray.sum()` now accepts ``dtype``
- Improve :py:func:`~xarray.DataArray.rolling` logic for speed.
  :py:func:`~xarray.DataArrayRolling` objects now support a ``to_dataarray``
  method that returns a view of the DataArray object with the rolling-window
  dimension added to the last position. This enables more flexible operations,
  such as strided rolling, windowed rolling, ND-rolling, and convolution.
  (:issue:`1831`, :issue:`1142`, :issue:`819`)
- reduce methods such as :py:func:`DataArray.sum()` now accept ``dtype``
arguments. (:issue:`1838`)
By `Keisuke Fujii <https://github.com/fujiisoup>`_.
- Added nodatavals attribute to DataArray when using :py:func:`~xarray.open_rasterio`. (:issue:`1736`).
Member

Add a bug fix note for the aggregations of the last element with center=True?

@@ -68,6 +74,10 @@ Enhancements

Bug fixes
~~~~~~~~~
- Rolling aggregation with the ``center=True`` option now gives the same
  result as pandas, including for the last element (:issue:`1046`).
By `Keisuke Fujii <https://github.com/fujiisoup>`_.

- Added warning in api.py of a netCDF4 bug that occurs when
the filepath has 88 characters (:issue:`1745`).
By `Liam Brannigan <https://github.com/braaannigan>`_.
22 changes: 22 additions & 0 deletions xarray/core/duck_array_ops.py
@@ -16,6 +16,7 @@
import pandas as pd

from . import npcompat
from . import nputils
from . import dtypes
from .pycompat import dask_array_type
from .nputils import nanfirst, nanlast
@@ -265,3 +266,24 @@ def last(values, axis, skipna=None):
_fail_on_dask_array_input_skipna(values)
return nanlast(values, axis)
return take(values, -1, axis=axis)


def rolling_window(array, axis, window):
    """
    Make an ndarray with a rolling window of axis-th dimension.
    The rolling dimension will be placed at the last dimension.
    """
    if isinstance(array, dask_array_type):
        if window < 1:
            raise ValueError(
                "`window` must be at least 1. Given : {}".format(window))
        if window > array.shape[axis]:
            raise ValueError("`window` is too long. Given : {}".format(window))

        axis = nputils._validate_axis(array, axis)
        size = array.shape[axis] - window + 1
        arrays = [array[(slice(None), ) * axis + (slice(w, size + w), )]
                  for w in range(window)]
        return da.stack(arrays, axis=-1)
    else:  # np.ndarray
        return nputils.rolling_window(array, axis, window)
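The dask branch above builds the window dimension by stacking shifted slices rather than by manipulating strides. A minimal NumPy sketch of that slicing-and-stacking step is shown below; the helper name ``rolling_window_by_stacking`` is hypothetical, and it uses ``np.stack`` in place of ``da.stack`` so that it runs without dask.

```python
import numpy as np


def rolling_window_by_stacking(array, axis, window):
    """Hypothetical helper: windows built from shifted slices (copies data)."""
    axis = axis % array.ndim  # normalize a negative axis
    size = array.shape[axis] - window + 1
    # Slice w covers positions w .. w + size - 1 along `axis`.
    slices = [array[(slice(None),) * axis + (slice(w, size + w),)]
              for w in range(window)]
    # Stacking on a new last axis gives out[..., j, w] == array[..., j + w].
    return np.stack(slices, axis=-1)


x = np.arange(10).reshape(2, 5)
out = rolling_window_by_stacking(x, axis=-1, window=3)
```

Unlike the strided version, this approach allocates a new array that is ``window`` times larger along the new dimension, which is the price of not relying on stride tricks.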
12 changes: 12 additions & 0 deletions xarray/core/npcompat.py
@@ -2,6 +2,18 @@
from __future__ import division
from __future__ import print_function
import numpy as np
from distutils.version import LooseVersion


if LooseVersion(np.__version__) < LooseVersion('1.12'):
    def as_strided(x, shape=None, strides=None, subok=False, writeable=True):
        array = np.lib.stride_tricks.as_strided(x, shape, strides, subok)
        array.setflags(write=writeable)
        return array

else:
    as_strided = np.lib.stride_tricks.as_strided
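The ``writeable`` flag matters because several elements of a strided view can alias the same memory location. The following sketch, which assumes a NumPy version whose ``as_strided`` accepts ``writeable`` (1.12 or later), shows that a read-only view rejects assignment and that overlapping windows see the same underlying element.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

x = np.arange(5)
view = as_strided(x, shape=(3, 3),
                  strides=(x.strides[0], x.strides[0]),
                  writeable=False)

# Assigning through the read-only view is refused.
try:
    view[0, 1] = 100
    write_failed = False
except ValueError:
    write_failed = True

# view[0, 1] and view[1, 0] both point at x[1], so a change to x shows up twice.
x[1] = 100
aliased = (view[0, 1], view[1, 0])
```

If the view were writeable, a single assignment could silently change several windows at once, which is why the rolling code requests a read-only view.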


try:
from numpy import nancumsum, nancumprod, flip
50 changes: 50 additions & 0 deletions xarray/core/nputils.py
@@ -4,6 +4,7 @@
import numpy as np
import pandas as pd
import warnings
from . import npcompat


def _validate_axis(data, axis):
@@ -133,3 +134,52 @@ def __setitem__(self, key, value):
mixed_positions, vindex_positions = _advanced_indexer_subspaces(key)
self._array[key] = np.moveaxis(value, vindex_positions,
mixed_positions)


def rolling_window(a, axis, window):
Member

This is a small point, but can you swap the arguments for this function? That would let you set a default axis.

Bottleneck uses default arguments like move_sum(array, window, axis=-1) which I think is a nice convention:
https://kwgoodman.github.io/bottleneck-doc/reference.html#moving-window-functions

    """
    Make an ndarray with a rolling window along axis.

    Parameters
    ----------
    a : array_like
        Array to add rolling window to
    axis : int
        axis position along which rolling window will be applied.
    window : int
        Size of rolling window

    Returns
    -------
    Array that is a view of the original array with an added dimension
    of size ``window``.

    Examples
    --------
    >>> x=np.arange(10).reshape((2,5))
    >>> rolling_window(x, axis=-1, window=3)
    array([[[0, 1, 2], [1, 2, 3], [2, 3, 4]],
           [[5, 6, 7], [6, 7, 8], [7, 8, 9]]])

    Calculate rolling mean of last dimension:
    >>> np.mean(rolling_window(x, axis=-1, window=3), -1)
    array([[ 1., 2., 3.],
           [ 6., 7., 8.]])

    This function is taken from https://github.com/numpy/numpy/pull/31
    but slightly modified to accept axis option.
    """
    axis = _validate_axis(a, axis)
    a = np.swapaxes(a, axis, -1)

    if window < 1:
        raise ValueError(
            "`window` must be at least 1. Given : {}".format(window))
    if window > a.shape[-1]:
        raise ValueError("`window` is too long. Given : {}".format(window))

    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    rolling = npcompat.as_strided(a, shape=shape, strides=strides,
                                  writeable=False)
    return np.swapaxes(rolling, -2, axis)
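For readers who want to try the technique outside xarray, here is a self-contained sketch of the same swap-axes-then-``as_strided`` approach. The name ``rolling_window_view`` and its ``(a, window, axis=-1)`` argument order are assumptions made for this example (the order follows the reviewer's suggestion above); it is not the function as committed.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided


def rolling_window_view(a, window, axis=-1):
    """Hypothetical helper: read-only windowed view, window dimension last."""
    a = np.asarray(a)
    if not -a.ndim <= axis < a.ndim:
        raise ValueError("axis {} is out of bounds".format(axis))
    axis = axis % a.ndim
    if window < 1 or window > a.shape[axis]:
        raise ValueError("invalid window: {}".format(window))

    # Move the rolling axis to the end so the window axis can follow it.
    a = np.swapaxes(a, axis, -1)
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    # The window axis advances by the same stride as the rolling axis.
    strides = a.strides + (a.strides[-1],)
    rolling = as_strided(a, shape=shape, strides=strides, writeable=False)
    # Put the shortened rolling axis back; the window axis stays last.
    return np.swapaxes(rolling, -2, axis)


x = np.arange(10).reshape((2, 5))
windows = rolling_window_view(x, 3, axis=-1)
rolling_mean = windows.mean(axis=-1)
along_rows = rolling_window_view(x, 2, axis=0)
```

Here ``windows`` reproduces the nested array from the docstring example, and ``along_rows`` shows that rolling along the first axis still leaves the window dimension in the last position.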
16 changes: 2 additions & 14 deletions xarray/core/ops.py
@@ -227,20 +227,8 @@ def func(self, *args, **kwargs):

def rolling_count(rolling):

not_null = rolling.obj.notnull()
instance_attr_dict = {'center': rolling.center,
'min_periods': rolling.min_periods,
rolling.dim: rolling.window}
rolling_count = not_null.rolling(**instance_attr_dict).sum()

if rolling.min_periods is None:
return rolling_count

# otherwise we need to filter out points where there aren't enough periods
# but not_null is False, and so the NaNs don't flow through
# array with points where there are enough values given min_periods
enough_periods = rolling_count >= rolling.min_periods

rolling_count = rolling._counts()
enough_periods = rolling_count > rolling._min_periods - 0.5
Member

0.5 is a little strange to see when the other veggies are integers.

Member

Other variables, not veggies (I blame autocorrect!)

Member Author

Oops. I will fix.

return rolling_count.where(enough_periods)
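The count-then-mask logic can be illustrated without xarray. The sketch below is a plain NumPy approximation under stated assumptions: trailing (non-centered) windows, left padding so that every position has a window, and made-up variable names. It also checks that the integer comparison ``>= min_periods`` selects the same positions as the ``> min_periods - 0.5`` form questioned in the review.

```python
import numpy as np

values = np.array([1.0, np.nan, 3.0, np.nan, np.nan, 6.0])
window, min_periods = 3, 2

not_null = ~np.isnan(values)
# Left-pad with False so each position has a full trailing window.
padded = np.concatenate([np.zeros(window - 1, dtype=bool), not_null])
# Number of valid (non-NaN) entries in each trailing window.
counts = np.array([padded[i:i + window].sum() for i in range(values.size)])

# Keep the count only where enough valid entries were seen; NaN elsewhere.
enough_periods = counts >= min_periods
masked_counts = np.where(enough_periods, counts, np.nan)

# For integer counts the two comparisons pick the same positions.
same_mask = bool(((counts > min_periods - 0.5) == enough_periods).all())
```

Since the counts are integers, the plain ``>=`` comparison is enough, which is the point of the reviewer's remark about the ``0.5`` offset.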

