apply_ufunc: Add meta kwarg + bump dask to 2.2 #3660

Merged: 18 commits into pydata:master on Jan 22, 2020

Conversation

dcherian (Contributor) commented Jan 2, 2020

This makes vectorize=True work with functions that add new dimensions.

cc @smartass101 @jbusecke
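
For context, a minimal sketch of the kind of call this enables, assuming numpy and dask are installed. The function, data, and dimension names are illustrative, and the call matches the apply_ufunc signature as it stands after this PR (output_sizes and the new meta kwarg passed directly):

    import numpy as np
    import xarray as xr

    def add_band(x):
        # illustrative function that adds a new trailing dimension of length 3
        return np.stack([x, x + 1, x + 2], axis=-1)

    da = xr.DataArray(np.arange(10.0), dims="x").chunk({"x": 5})

    out = xr.apply_ufunc(
        add_band,
        da,
        output_core_dims=[["band"]],
        vectorize=True,            # wraps add_band with np.vectorize
        dask="parallelized",
        output_dtypes=[da.dtype],
        output_sizes={"band": 3},  # size of the new dimension
        meta=np.empty((0, 0), dtype=da.dtype),  # new kwarg from this PR, forwarded to dask
                                                # (optional: a default is set when vectorize=True)
    )
    print(out.compute())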

dcherian (Contributor, Author) commented Jan 9, 2020

The test failed because meta is not a valid argument in dask==1.2.

Our minimum versions policy says we can bump dask to 2.1.0:

    dask: pinned minimum 1.2 (2019-04-13) is older than the policy minimum 2.1 (2019-07-08)

So I've bumped the min dask version and added meta as a kwarg to apply_ufunc.

@dcherian dcherian changed the title apply_func: Set meta=np.ndarray when vectorize=True and dask="parallelized" apply_func: Add meta kwarg Jan 9, 2020
dcherian (Contributor, Author) commented Jan 9, 2020

min-all-deps fails with this weird error:


    def test_append_with_new_variable(self):
    
        ds, ds_to_append, ds_with_new_var = create_append_test_data()
    
        # check append mode for new variable
        with self.create_zarr_target() as store_target:
            xr.concat([ds, ds_to_append], dim="time").to_zarr(store_target, mode="w")
            ds_with_new_var.to_zarr(store_target, mode="a")
            combined = xr.concat([ds, ds_to_append], dim="time")
            combined["new_var"] = ds_with_new_var["new_var"]
>           assert_identical(combined, xr.open_zarr(store_target))

xarray/tests/test_backends.py:1876: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
xarray/core/dataset.py:1354: in identical
    other, "identical"
xarray/core/dataset.py:1302: in _all_compat
    self._variables, other._variables, compat=compat
xarray/core/utils.py:338: in dict_equiv
    if k not in second or not compat(first[k], second[k]):
xarray/core/dataset.py:1299: in compat
    return getattr(x, compat_str)(y)
xarray/core/variable.py:1670: in identical
    other, equiv=equiv
xarray/core/variable.py:1647: in equals
    self._data is other._data or equiv(self.data, other.data)
xarray/core/duck_array_ops.py:224: in array_equiv
    flag_array = (arr1 == arr2) | (isnull(arr1) & isnull(arr2))
/usr/share/miniconda/envs/xarray-tests/lib/python3.6/site-packages/dask/array/core.py:1739: in __eq__
    return elemwise(operator.eq, self, other)
/usr/share/miniconda/envs/xarray-tests/lib/python3.6/site-packages/dask/array/core.py:3749: in elemwise
    **blockwise_kwargs
/usr/share/miniconda/envs/xarray-tests/lib/python3.6/site-packages/dask/array/blockwise.py:145: in blockwise
    chunkss, arrays = unify_chunks(*args)
/usr/share/miniconda/envs/xarray-tests/lib/python3.6/site-packages/dask/array/core.py:3020: in unify_chunks
    (asanyarray(a) if ind is not None else a, ind) for a, ind in partition(2, args)
/usr/share/miniconda/envs/xarray-tests/lib/python3.6/site-packages/dask/array/core.py:3020: in <listcomp>
    (asanyarray(a) if ind is not None else a, ind) for a, ind in partition(2, args)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

a = array(['2019-01-01T00:00:00.000000000', '2019-01-02T00:00:00.000000000',
       '2019-01-03T00:00:00.000000000', '2019-01-04T00:00:00.000000000',
       '2019-01-05T00:00:00.000000000'], dtype='datetime64[ns]')

    def asanyarray(a):
        """Convert the input to a dask array.
    
        Subclasses of ``np.ndarray`` will be passed through as chunks unchanged.
    
        Parameters
        ----------
        a : array-like
            Input data, in any form that can be converted to a dask array.
    
        Returns
        -------
        out : dask array
            Dask array interpretation of a.
    
        Examples
        --------
        >>> import dask.array as da
        >>> import numpy as np
        >>> x = np.arange(3)
        >>> da.asanyarray(x)
        dask.array<array, shape=(3,), dtype=int64, chunksize=(3,)>
    
        >>> y = [[1, 2, 3], [4, 5, 6]]
        >>> da.asanyarray(y)
        dask.array<array, shape=(2, 3), dtype=int64, chunksize=(2, 3)>
        """
        if isinstance(a, Array):
            return a
        elif hasattr(a, "to_dask_array"):
            return a.to_dask_array()
>       elif hasattr(a, "data") and type(a).__module__.startswith("xarray."):
E       ValueError: cannot include dtype 'M' in a buffer

@dcherian dcherian changed the title apply_func: Add meta kwarg apply_ufunc: Add meta kwarg + bump dask to 2.2 Jan 15, 2020
* upstream/master:
  Add an example notebook using apply_ufunc to vectorize 1D functions (pydata#3629)
  Use encoding['dtype'] over data.dtype when possible within CFMaskCoder.encode (pydata#3652)
  allow passing any iterable to drop when dropping variables (pydata#3693)
  Typo on DataSet/DataArray.to_dict documentation (pydata#3692)
  Fix mypy type checking tests failure in ds.merge (pydata#3690)
  Explicitly convert result of pd.to_datetime to a timezone-naive type (pydata#3688)
  ds.merge(da) bugfix (pydata#3677)
  fix docstring for combine_first: returns a Dataset (pydata#3683)
  Add option to choose mfdataset attributes source. (pydata#3498)
  How do I add a new variable to dataset. (pydata#3679)
  Add map_blocks example to whats-new (pydata#3682)
  Make dask names change when chunking Variables by different amounts. (pydata#3584)
  raise an error when renaming dimensions to existing names (pydata#3645)
  Support swap_dims to dimension names that are not existing variables (pydata#3636)
  Add map_blocks example to docs. (pydata#3667)
  add multiindex level name checking to .rename() (pydata#3658)
@dcherian dcherian mentioned this pull request Jan 17, 2020
dcherian (Contributor, Author) commented:

Does anyone have thoughts on what might be happening here?

shoyer (Member) commented Jan 20, 2020

This looks like a dask bug that we fixed somewhat recently (6 months ago?).

I wouldn't worry too much about it. Supporting old versions of dependencies is a best effort, not a mandate. I would just skip it on old versions of dask and move on.
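
A sketch of the kind of version-gated skip suggested here, assuming it lives next to the failing test; the marker name and reason string are illustrative, not the exact change merged in this PR:

    from distutils.version import LooseVersion

    import dask
    import pytest

    # Skip on dask versions older than 2.2, where the datetime64 comparison fails
    requires_dask_2_2 = pytest.mark.skipif(
        LooseVersion(dask.__version__) < LooseVersion("2.2"),
        reason="old dask fails comparing datetime64 arrays (see traceback above)",
    )

    @requires_dask_2_2
    def test_append_with_new_variable():
        ...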

This reverts commit 2b22470.
@keewis keewis mentioned this pull request Jan 20, 2020
…e-meta

* 'master' of github.com:pydata/xarray:
  Feature/align in dot (pydata#3699)
  ENH: enable `H5NetCDFStore` to work with already open h5netcdf.File a… (pydata#3618)
  One-off isort run (pydata#3705)
  hardcoded xarray.__all__ (pydata#3703)
  Bump mypy to v0.761 (pydata#3704)
  remove DataArray and Dataset constructor deprecations for 0.15  (pydata#3560)
  Tests for variables with units (pydata#3654)
dcherian (Contributor, Author) commented:

Ah found it: dask/dask#5334. Missed it when I looked through the changelog earlier.
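
For readers following along, a minimal sketch of what appears to trigger the traceback above: numpy datetime64 arrays do not support the buffer protocol, so evaluating their .data attribute (which old dask's asanyarray does via hasattr(a, "data")) raises the ValueError. This reproduction is illustrative; dask/dask#5334 has the actual fix.

    import numpy as np

    times = np.array(["2019-01-01", "2019-01-02"], dtype="datetime64[ns]")

    try:
        times.data  # what hasattr(a, "data") ends up evaluating in old dask
    except ValueError as err:
        print(err)  # cannot include dtype 'M' in a buffer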

dcherian (Contributor, Author) commented:

ready for review and merge.

shoyer (Member) commented Jan 21, 2020 via email

dcherian (Contributor, Author) commented:

Thanks for the review. I'll merge when tests pass.

@dcherian dcherian merged commit 17b70ca into pydata:master Jan 22, 2020
@dcherian dcherian deleted the ufunc-vectorize-meta branch January 22, 2020 15:43
dcherian added a commit to dcherian/xarray that referenced this pull request Jan 22, 2020
* 'master' of github.com:pydata/xarray:
  apply_ufunc: Add meta kwarg + bump dask to 2.2 (pydata#3660)
dcherian added a commit to fujiisoup/xarray that referenced this pull request Jan 25, 2020
* 'master' of github.com:pydata/xarray: (27 commits)
  bump min deps for 0.15 (pydata#3713)
  setuptools-scm and isort tweaks (pydata#3720)
  Allow binned coordinates on 1D plots y-axis. (pydata#3685)
  apply_ufunc: Add meta kwarg + bump dask to 2.2 (pydata#3660)
  setuptools-scm and one-liner setup.py (pydata#3714)
  Feature/align in dot (pydata#3699)
  ENH: enable `H5NetCDFStore` to work with already open h5netcdf.File a… (pydata#3618)
  One-off isort run (pydata#3705)
  hardcoded xarray.__all__ (pydata#3703)
  Bump mypy to v0.761 (pydata#3704)
  remove DataArray and Dataset constructor deprecations for 0.15  (pydata#3560)
  Tests for variables with units (pydata#3654)
  Add an example notebook using apply_ufunc to vectorize 1D functions (pydata#3629)
  Use encoding['dtype'] over data.dtype when possible within CFMaskCoder.encode (pydata#3652)
  allow passing any iterable to drop when dropping variables (pydata#3693)
  Typo on DataSet/DataArray.to_dict documentation (pydata#3692)
  Fix mypy type checking tests failure in ds.merge (pydata#3690)
  Explicitly convert result of pd.to_datetime to a timezone-naive type (pydata#3688)
  ds.merge(da) bugfix (pydata#3677)
  fix docstring for combine_first: returns a Dataset (pydata#3683)
  ...
dcherian added a commit to dcherian/xarray that referenced this pull request Jan 27, 2020
* master:
  Add support for CFTimeIndex in get_clean_interp_index (pydata#3631)
  sel with categorical index (pydata#3670)
  bump min deps for 0.15 (pydata#3713)
  setuptools-scm and isort tweaks (pydata#3720)
  Allow binned coordinates on 1D plots y-axis. (pydata#3685)
  apply_ufunc: Add meta kwarg + bump dask to 2.2 (pydata#3660)
  setuptools-scm and one-liner setup.py (pydata#3714)
  Feature/align in dot (pydata#3699)
  ENH: enable `H5NetCDFStore` to work with already open h5netcdf.File a… (pydata#3618)
  One-off isort run (pydata#3705)
  hardcoded xarray.__all__ (pydata#3703)
  Bump mypy to v0.761 (pydata#3704)
  remove DataArray and Dataset constructor deprecations for 0.15  (pydata#3560)
  Tests for variables with units (pydata#3654)
  Add an example notebook using apply_ufunc to vectorize 1D functions (pydata#3629)
  Use encoding['dtype'] over data.dtype when possible within CFMaskCoder.encode (pydata#3652)

Successfully merging this pull request may close these issues.

apply_ufunc with dask='parallelized' and vectorize=True fails on compute_meta