Add support for cftime.datetime coordinates with coarsen #2778

Merged
merged 1 commit into from
Mar 6, 2019

spencerkclark
Member

spencerkclark commented Feb 19, 2019

  • Tests added
  • Fully documented, including whats-new.rst for all changes and api.rst for new API

For now I've held off on making these changes dask-compatible (I could do it, but I'm not sure it is worth the extra complexity).

@jbusecke
Contributor

jbusecke commented Mar 6, 2019

Oh sweet, I just encountered this problem. Would this work on a large dask array with a non-dask time dimension?

@spencerkclark
Member Author

spencerkclark commented Mar 6, 2019

Oh, I should have been a little clearer!

> For now I've held off on making these changes dask-compatible (I could do it, but I'm not sure it is worth the extra complexity)

This comment only applies to the changes regarding duck_array_ops.mean, which is used by default on the coordinates involved in coarsen. Since indexes are always loaded into memory, i.e. backed by NumPy arrays, we don't really need to worry about dask compatibility there. In other words, with this PR a DataArray can hold dask array data indexed by a cftime time coordinate, and coarsen will work just fine:

In [1]: import xarray as xr

In [2]: import numpy as np

In [3]: data = np.random.random((10, 5))

In [4]: times = xr.cftime_range('2000', periods=10)

In [5]: da = xr.DataArray(data, coords={'time': times}, dims=['time', 'x'])

In [6]: da = da.chunk({'time': 1, 'x': 1})

In [7]: da
Out[7]:
<xarray.DataArray (time: 10, x: 5)>
dask.array<shape=(10, 5), dtype=float64, chunksize=(1, 1)>
Coordinates:
  * time     (time) object 2000-01-01 00:00:00 ... 2000-01-10 00:00:00
Dimensions without coordinates: x

In [8]: da.coarsen(time=2).mean()
Out[8]:
<xarray.DataArray (time: 5, x: 5)>
dask.array<shape=(5, 5), dtype=float64, chunksize=(1, 1)>
Coordinates:
  * time     (time) object 2000-01-01 12:00:00 ... 2000-01-09 12:00:00
Dimensions without coordinates: x
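For intuition, here is a minimal sketch of what `coarsen(time=2).mean()` does in the session above: the data are averaged in non-overlapping blocks of two along `time`, and each block of dates is averaged by taking offsets from the earliest date, averaging those timedeltas, and adding the result back. This is an illustration under stated assumptions, not xarray's actual code: the helpers `mean_datetimes` and `coarsen_mean` are hypothetical, and the standard-library `datetime.datetime` stands in for `cftime.datetime` (both support subtracting two dates to get a `timedelta` and adding a `timedelta` back). Averaging offsets rather than the dates themselves is what produces the midday timestamps in `Out[8]`.

```python
from datetime import datetime, timedelta

import numpy as np


def mean_datetimes(values):
    """Hypothetical helper: average datetime-like objects via offsets from the earliest one."""
    offset = min(values)
    deltas = [v - offset for v in values]  # timedelta objects, all >= 0
    return offset + sum(deltas, timedelta(0)) / len(deltas)


def coarsen_mean(data, times, window):
    """Hypothetical helper: block-average `data` along axis 0 and `times` alongside it."""
    n = len(times)
    if n % window != 0:
        # Similar in spirit to coarsen's boundary='exact': refuse a ragged final block.
        raise ValueError(f"length {n} is not a multiple of window {window}")
    n_blocks = n // window
    # Reshape (time, ...) -> (block, window, ...) and average within each block.
    coarse_data = data.reshape(n_blocks, window, *data.shape[1:]).mean(axis=1)
    coarse_times = [
        mean_datetimes(times[i * window:(i + 1) * window]) for i in range(n_blocks)
    ]
    return coarse_data, coarse_times


times = [datetime(2000, 1, 1) + timedelta(days=i) for i in range(10)]
data = np.random.random((10, 5))
coarse_data, coarse_times = coarsen_mean(data, times, window=2)
print(coarse_data.shape)                  # (5, 5)
print(coarse_times[0], coarse_times[-1])  # 2000-01-01 12:00:00 2000-01-09 12:00:00
```

Because the coordinate is a small in-memory list of dates, this step never touches dask; only the data reduction would be lazy in the real case.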

This would only come up as a possible issue if you tried to lazily take the mean of a DataArray of cftime objects, e.g.:

In [9]: da = xr.DataArray(times, dims=['t']).chunk()

In [10]: da
Out[10]:
<xarray.DataArray (t: 10)>
dask.array<shape=(10,), dtype=object, chunksize=(10,)>
Coordinates:
  * t        (t) object 2000-01-01 00:00:00 ... 2000-01-10 00:00:00

In [11]: da.mean()
---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
<ipython-input-19-c02402258881> in <module>
----> 1 da.mean()

~/xarray-dev/xarray/xarray/core/common.py in wrapped_func(self, dim, axis, skipna, **kwargs)
     23                              **kwargs):
     24                 return self.reduce(func, dim, axis,
---> 25                                    skipna=skipna, allow_lazy=True, **kwargs)
     26         else:
     27             def wrapped_func(self, dim=None, axis=None,  # type: ignore

~/xarray-dev/xarray/xarray/core/dataarray.py in reduce(self, func, dim, axis, keep_attrs, **kwargs)
   1603         """
   1604
-> 1605         var = self.variable.reduce(func, dim, axis, keep_attrs, **kwargs)
   1606         return self._replace_maybe_drop_dims(var)
   1607

~/xarray-dev/xarray/xarray/core/variable.py in reduce(self, func, dim, axis, keep_attrs, allow_lazy, **kwargs)
   1366             data = func(input_data, axis=axis, **kwargs)
   1367         else:
-> 1368             data = func(input_data, **kwargs)
   1369
   1370         if getattr(data, 'shape', ()) == self.shape:

~/xarray-dev/xarray/xarray/core/duck_array_ops.py in mean(array, axis, skipna, **kwargs)
    348         if isinstance(array, dask_array_type):
    349             raise NotImplementedError(
--> 350                 'Computing the mean of an array containing '
    351                 'cftime.datetime objects is not yet implemented on '
    352                 'dask arrays.')

NotImplementedError: Computing the mean of an array containing cftime.datetime objects is not yet implemented on dask arrays.

but I think that's a pretty rare use case, which is why I've held off on adding that support for now.

@jbusecke
Contributor

jbusecke commented Mar 6, 2019

Oh yeah, that seems totally fair to me. Thanks for clarifying. Can't wait to have this functionality!
Thanks @spencerkclark

@fujiisoup
Member

Thanks for the follow-up PR. Merging.

@fujiisoup fujiisoup merged commit c770eec into pydata:master Mar 6, 2019
@spencerkclark spencerkclark deleted the cftime-coarsen branch March 6, 2019 19:48
dcherian added a commit to yohai/xarray that referenced this pull request Mar 18, 2019
* upstream/master:
  Rework whats-new for 0.12
  Add whats-new for 0.12.1
  Release 0.12.0
  enable loading remote hdf5 files (pydata#2782)
  Push back finalizing deprecations for 0.12 (pydata#2809)
  Drop failing tests writing multi-dimensional arrays as attributes (pydata#2810)
  some docs updates (pydata#2746)
  Add support for cftime.datetime coordinates with coarsen (pydata#2778)
  Don't use deprecated np.asscalar() (pydata#2800)
  Improve name concat (pydata#2792)
  Add `Dataset.drop_dims` (pydata#2767)
  Quarter offset implemented (base is now latest pydata-master). (pydata#2721)
  Add use_cftime option to open_dataset (pydata#2759)
  Bugfix/reduce no axis (pydata#2769)
  'standard' now refers to 'gregorian' in cftime_range (pydata#2771)
pletchm pushed a commit to pletchm/xarray that referenced this pull request Mar 21, 2019
pletchm pushed a commit to pletchm/xarray that referenced this pull request Mar 21, 2019
shoyer pushed a commit that referenced this pull request Mar 26, 2019
…ns with size>1 (#2757)

* Quarter offset implemented (base is now latest pydata-master). (#2721)

* Quarter offset implemented (base is now latest pydata-master).

* Fixed issues raised in review (#2721 (review))

* Updated whats-new.rst with info on quarter offset support.

* Update doc/whats-new.rst

Co-Authored-By: jwenfai <jwenfai@gmail.com>

* Added support for quarter frequencies when resampling CFTimeIndex. Less redundancy in CFTimeIndex resampling tests.

* Removed normalization code (unnecessary for cftime_range) in cftime_offsets.py. Removed redundant lines in whats-new.rst.

* Removed invalid option from _get_day_of_month docstring. Added tests back in that raise ValueError when resampling (base=24 when resampling to daily freq, e.g., '8D').

* Minor edits to docstrings/comments

* lint

* Add `Dataset.drop_dims` (#2767)

* ENH: Add Dataset.drop_dims()

* Drops full dimensions and any corresponding variables in a
  Dataset
* Fixes GH1949

* DOC: Add Dataset.drop_dims() documentation

* Improve name concat (#2792)

* Added tests of desired name inferring behaviour

* Infers names

* updated what's new

* Don't use deprecated np.asscalar() (#2800)

It was deprecated in NumPy 1.16 and throws a ton of warnings as a
result.
All the function does is return .item() anyway, which is why it was
deprecated.

* Add support for cftime.datetime coordinates with coarsen (#2778)

* some docs updates (#2746)

* Friendlier io title.

* Fix lists.

* Fix *args, **kwargs

"inline emphasis..."

* misc

* Reference xarray_extras for csv writing. Closes #2289

* Add metpy accessor. Closes #461

* fix transpose docstring. Closes #2576

* Revert "Fix lists."

This reverts commit 39983a5.

* Revert "Fix *args, **kwargs"

This reverts commit 1b9da35.

* Add MetPy to related projects.

* Add Weather and Climate specific page.

* Add hvplot.

* Note open_dataset, mfdataset open files as read-only (closes #2345).

* Update metpy 1

Co-Authored-By: dcherian <dcherian@users.noreply.github.com>

* Update doc/weather-climate.rst

Co-Authored-By: dcherian <dcherian@users.noreply.github.com>

* Drop failing tests writing multi-dimensional arrays as attributes (#2810)

These aren't valid for netCDF files.

Fixes GH2803

* Push back finalizing deprecations for 0.12 (#2809)

0.12 will already have a big change in dropping Python 2.7 support. I'd rather
wait a bit longer to finalize these deprecations to minimize the impact on
users.

* enable loading remote hdf5 files (#2782)

* attempt at loading remote hdf5

* added a couple tests

* rewind bytes after reading header

* addressed comments for tests and error message

* fixed pep8 formatting

* created _get_engine_from_magic_number function, new tests

* added description in whats-new

* fixed test failure on windows

* same error on windows and nix

* Release 0.12.0

* Add whats-new for 0.12.1

* Rework whats-new for 0.12

* DOC: Update donation links

* DOC: remove outdated warning (#2818)

* Allow expand_dims() method to support inserting/broadcasting dimensions with size>1 (#2757)

 * Make using dim_kwargs for python 3.5 illegal -- a ValueError is thrown

 * dataset.expand_dims() method takes a dict-like object where values represent length of dimensions or coordinates of dimensions

 * dataarray.expand_dims() method takes a dict-like object where values represent length of dimensions or coordinates of dimensions

 * Add alternative option to passing a dict to the dim argument, which is now an optional kwarg, passing in each new dimension as its own kwarg

 * Add expand_dims enhancement from issue 2710 to whats-new.rst

 * Fix test_dataarray.TestDataArray.test_expand_dims_with_greater_dim_size tests to pass in python 3.5 using ordered dicts instead of regular dicts. This was needed because python 3.5 and earlier did not maintain insertion order for dicts

 * Restrict core logic to use 'dim' as a dict--it will be converted into a dict on entry if it is a str or a sequence of str

 * Don't cast dim values (coords) as a list since IndexVariable/Variable will internally convert it into a numpy.ndarray. So just use IndexVariable((k,), v)

 * TypeErrors should be raised for invalid input types, rather than ValueErrors.

 * Force 'dim' to be OrderedDict for python 3.5

 * use .size attribute to determine the size of a dimension, rather than converting to a list, which can be slow for large iterables

 * Move enhancement description up to 0.12.1
4 participants