apply_ufunc with dask='parallelized' and vectorize=True fails on compute_meta #3574
Comments
Another approach would be to bypass `compute_meta`. Perhaps this is an oversight in dask?
I am having a similar problem. This impacts some of my frequently used code to compute correlations. Here is a simplified example that used to work with older dependencies:
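The code for this example did not survive the copy, so here is a minimal sketch of what such a wrapper might have looked like, reconstructed from the traceback below. The original computed correlations; this sketch substitutes a simple linear fit via `np.polyfit` as a stand-in, and the `_fit` helper, the `parameter` dimension of size 2, and the default `dim='time'` are assumptions, not the original code. The keyword arguments match the `apply_ufunc` API of the xarray 0.14 era listed in the version output below.

```python
import numpy as np
import xarray as xr

def _fit(x, y):
    # Hypothetical inner function: linear fit of y against x along the core
    # dimension, returning [slope, intercept] as a new 'parameter' dimension.
    return np.polyfit(x, y, 1)

def wrapper(a, b, dim='time'):
    return xr.apply_ufunc(
        _fit, a, b,
        input_core_dims=[[dim], [dim]],
        output_core_dims=[['parameter']],
        vectorize=True,           # loops over the non-core dims with np.vectorize
        dask='parallelized',
        output_dtypes=[float],
        output_sizes={'parameter': 2},
    )

a = xr.DataArray(np.random.rand(3, 13, 5), dims=['x', 'time', 'y'])
b = xr.DataArray(np.random.rand(3, 5, 13), dims=['x', 'y', 'time'])
wrapper(a, b)  # fine with NumPy inputs; chunked inputs fail as shown below
```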
This works when passing numpy arrays:
```
array([[[ 0.09958247,  0.36831431],
        [-0.54445474,  0.66997513],
        [-0.22894182,  0.65433402],
        [ 0.38536482,  0.20656073],
        [ 0.25083224,  0.46955618]],
       ...
Dimensions without coordinates: x, y, parameter
```

But when I convert both arrays to dask arrays, I get the same error as @smartass101.
```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-...> in <module>
      1 a = xr.DataArray(np.random.rand(3, 13, 5), dims=['x', 'time', 'y'])
      2 b = xr.DataArray(np.random.rand(3, 5, 13), dims=['x','y', 'time'])
----> 3 wrapper(a.chunk({'x':2, 'time':-1}), b.chunk({'x':2, 'time':-1}))

<ipython-input-...> in wrapper(a, b, dim)

~/miniconda/envs/euc_dynamics/lib/python3.7/site-packages/xarray/core/computation.py in apply_ufunc(func, input_core_dims, output_core_dims, exclude_dims, vectorize, join, dataset_join, dataset_fill_value, keep_attrs, kwargs, dask, output_dtypes, output_sizes, *args)
~/miniconda/envs/euc_dynamics/lib/python3.7/site-packages/xarray/core/computation.py in apply_dataarray_vfunc(func, signature, join, exclude_dims, keep_attrs, *args)
~/miniconda/envs/euc_dynamics/lib/python3.7/site-packages/xarray/core/computation.py in apply_variable_ufunc(func, signature, exclude_dims, dask, output_dtypes, output_sizes, keep_attrs, *args)
~/miniconda/envs/euc_dynamics/lib/python3.7/site-packages/xarray/core/computation.py in func(*arrays)
~/miniconda/envs/euc_dynamics/lib/python3.7/site-packages/xarray/core/computation.py in _apply_blockwise(func, args, input_dims, output_dims, signature, output_dtypes, output_sizes)
~/miniconda/envs/euc_dynamics/lib/python3.7/site-packages/dask/array/blockwise.py in blockwise(func, out_ind, name, token, dtype, adjust_chunks, new_axes, align_arrays, concatenate, meta, *args, **kwargs)
~/miniconda/envs/euc_dynamics/lib/python3.7/site-packages/dask/array/utils.py in compute_meta(func, _dtype, *args, **kwargs)
~/miniconda/envs/euc_dynamics/lib/python3.7/site-packages/numpy/lib/function_base.py in __call__(self, *args, **kwargs)
~/miniconda/envs/euc_dynamics/lib/python3.7/site-packages/numpy/lib/function_base.py in _vectorize_call(self, func, args)
~/miniconda/envs/euc_dynamics/lib/python3.7/site-packages/numpy/lib/function_base.py in _vectorize_call_with_signature(self, func, args)

ValueError: cannot call `vectorize` with a signature including new output dimensions on size 0 inputs
```

This used to work like a charm... I, however, was sloppy in testing this functionality (a good reminder to always write tests immediately 🙄), and I have not been able to determine a combination of dependencies that works. I am still experimenting and will report back. Could this behaviour be a bug introduced in dask at some point (as indicated by @smartass101 above)? cc'ing @dcherian @shoyer @mrocklin

EDIT: I can confirm that it seems to be a dask issue. If I restrict my dask version to an older release, the example works again.
Sounds similar. But I'm not sure why you get the size-0 issue, since your chunks don't (from a quick reading) seem to have size 0 in any of the dimensions. Could you please show us what the resulting chunk setup is?
The problem is that dask, as of version 2.0, calls functions applied to dask arrays with size-zero inputs in order to figure out the output array type, e.g., is the output a dense numpy.ndarray or a sparse array? Unfortunately, `np.vectorize` cannot be called on size-0 inputs when its signature introduces new output dimensions, which is exactly what `apply_ufunc(..., vectorize=True)` sets up. For xarray, we have a couple of options:

1. pass `meta` explicitly into dask's `blockwise` call, so that dask skips the size-0 trial call for `dask='parallelized'`;
2. rework our vectorize handling so that it copes with size-0 inputs.
(1) is probably easiest here.
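To make option (1) concrete, here is a small stand-alone sketch, not xarray code; the lambda, index letters, chunk sizes, and the new `p` dimension are made up. It shows how supplying `meta` to `dask.array.blockwise` sidesteps the size-0 `compute_meta` probe. This is essentially what the merged PR listed further down does, with `meta=np.ndarray`.

```python
import numpy as np
import dask.array as da

# A np.vectorize'd function whose signature introduces a new output dimension,
# just like apply_ufunc(..., vectorize=True) builds under the hood.
vfunc = np.vectorize(lambda v: np.array([v.min(), v.max()]), signature='(t)->(p)')

x = da.random.random((4, 6), chunks=(2, 6))

# With dask versions around 2.0-2.5 and no meta, blockwise calls compute_meta(),
# which invokes vfunc on 0-sized arrays and raises the ValueError from this issue:
# da.blockwise(vfunc, 'ip', x, 'it', new_axes={'p': 2},
#              dtype=x.dtype, concatenate=True)

# Passing meta explicitly tells dask the output array type up front, so it
# never calls the function on 0-sized inputs.
out = da.blockwise(vfunc, 'ip', x, 'it', new_axes={'p': 2},
                   dtype=x.dtype, concatenate=True,
                   meta=np.empty((0, 0), dtype=x.dtype))
print(out.compute().shape)  # (4, 2)
```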
Yes, now I recall that this was the issue. It doesn't even depend on your actual data, really. A possible option 3 is to address dask/dask#5642 directly (I haven't found time to do a PR yet); essentially, from the code described in that issue, I have the feeling that `compute_meta` itself could be taught to handle this case.
@shoyer's option 1 should be a relatively simple xarray PR if one of you is up for it.
I can give it a shot if you could point me to the appropriate place, since I have never messed with the dask internals of xarray.
xarray/xarray/core/computation.py, lines 579 to 593 in 6ad59b9
I'm afraid that passing `meta` there would require foreseeing what `meta` should be for the output.
Right, the xarray solution is to set `meta=np.ndarray` in that call when `vectorize=True` and `dask='parallelized'`, since in that case the blocks are plain NumPy arrays anyway.
Yes, sorry; written this way I now see what you meant, and that will likely work indeed.
* apply_func: Set meta=np.ndarray when vectorize=True and dask="parallelized". Closes #3574
* Add meta kwarg to apply_ufunc.
* Bump minimum dask to 2.1.0
* Update distributed too
* bump minimum dask, distributed to 2.2
* Update whats-new
* minor.
* fix whats-new
* Attempt numpy=1.15
* Revert "Attempt numpy=1.15". This reverts commit 2b22470.
* xfail test.
* More xfailed tests.
* Update xfail reason.
* fix whats-new
* Add test to ensure meta is passed on to dask.
* Use skipif instead of xfail.
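For reference, a hedged sketch of how the `meta` keyword added by this PR can be used; the `quartiles` function, the array, and the `quantile` dimension are made up, and the keyword arguments match the `apply_ufunc` API around xarray 0.15, when this fix landed.

```python
import numpy as np
import xarray as xr

arr = xr.DataArray(np.random.rand(8, 200), dims=['x', 'time']).chunk({'x': 4})

def quartiles(v):
    # Hypothetical per-slice function creating a new output dimension of size 3.
    return np.quantile(v, [0.25, 0.5, 0.75])

out = xr.apply_ufunc(
    quartiles, arr,
    input_core_dims=[['time']],
    output_core_dims=[['quantile']],
    vectorize=True,
    dask='parallelized',
    output_dtypes=[float],
    output_sizes={'quantile': 3},
    meta=np.ndarray,  # the keyword from the "Add meta kwarg" commit above; with
                      # the fix, apply_ufunc also sets this itself for vectorize=True
)
print(out.compute().shape)  # (8, 3)
```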
I experienced problems with a computation due to a bug that was recently solved in xarray (pydata/xarray#3574 (comment)). This PR adds the most recent release of xarray.
MCVE Code Sample
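The original code sample was lost in this copy of the issue; a minimal reproducer consistent with the problem description below might look like the following, where `find_peaks`, the dimension names, and the array sizes are stand-ins rather than the original code.

```python
import numpy as np
import xarray as xr

def find_peaks(arr):
    # Stand-in for the real peak finder: indices of the two largest values
    # along the core dimension, i.e. a result with a *new* output dimension.
    return np.argsort(arr)[-2:]

ds = xr.DataArray(np.random.rand(100, 1000), dims=('x', 'y')).chunk({'x': 10})

# With the versions listed below (xarray 0.14, dask 2.5) this call itself raises
# the ValueError shown underneath, because dask probes the vectorized function
# with size-0 arrays while building the graph:
out = xr.apply_ufunc(
    find_peaks, ds,
    input_core_dims=[['y']],
    output_core_dims=[['peak']],
    vectorize=True,
    dask='parallelized',
    output_dtypes=[int],
    output_sizes={'peak': 2},
)
```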
fails with

```
ValueError: cannot call `vectorize` with a signature including new output dimensions on size 0 inputs
```

because `dask.array.utils.compute_meta()` passes it 0-sized arrays.

Expected Output
This should work, and it works well on the non-chunked `ds` without `dask='parallelized'` and the associated `output*` parameters.

Problem Description
I'm trying to parallelize a peak-finding routine with dask (it works well without it), and I hoped that `dask='parallelized'` would make that simple. However, the peak finding needs to be vectorized; it works well with `vectorize=True`, but `np.vectorize` appears to have issues in `compute_meta`, which is issued internally by dask in blockwise application, as indicated in the source code: https://github.com/dask/dask/blob/e6ba8f5de1c56afeaed05c39c2384cd473d7c893/dask/array/utils.py#L118

A possible solution might be for `apply_ufunc` to pass `meta` directly to dask, if it were possible to foresee what `meta` should be. I suppose we are aiming for `np.ndarray` most of the time, though `sparse` might change that in the future.

I know I could use groupby-apply as an alternative, but there are several issues that made us use `apply_ufunc` instead.

Output of `xr.show_versions()`
commit: None
python: 3.7.4 (default, Aug 13 2019, 20:35:49)
[GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 4.9.0-11-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.1
xarray: 0.14.0
pandas: 0.25.1
numpy: 1.17.2
scipy: 1.3.1
netCDF4: 1.4.2
pydap: None
h5netcdf: 0.7.4
h5py: 2.9.0
Nio: None
zarr: None
cftime: 1.0.4.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.2.1
dask: 2.5.2
distributed: 2.5.2
matplotlib: 3.1.1
cartopy: None
seaborn: 0.9.0
numbagg: None
setuptools: 41.4.0
pip: 19.2.3
conda: 4.7.12
pytest: 5.2.1
IPython: 7.8.0
sphinx: 2.2.0