Fix/apply ufunc meta dtype #4022

mathause · 2020-05-03T15:21:09Z

Closes apply_ufunc gives wrong dtype with dask=parallelized and vectorized=True #4015
Tests added
Passes isort -rc . && black . && mypy . && flake8
Fully documented, including whats-new.rst for all changes and api.rst for new API

xarray/core/computation.py

ulijh · 2020-05-04T17:45:40Z

Thanks @mathause, this works for me. It also works when func changes the dtype, as long as "output_dtypes" is set correctly.

xarray/tests/test_computation.py

dcherian

Thanks @mathause. I just have one minor suggestion. LGTM otherwise.

xarray/core/computation.py

shoyer · 2020-05-13T15:53:22Z

xarray/core/computation.py

            # set meta=np.ndarray by default for numpy vectorized functions
            # work around dask bug computing meta with vectorized functions: GH5642
-            meta = np.ndarray
+            # defer raising errors to _apply_blockwise (e.g. if output_dtypes is None)
+            meta = np.ndarray((0, 0), dtype=output_dtypes[0])
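
The difference between the two `meta` values in this diff can be seen with NumPy alone. The following is a minimal sketch, not the xarray implementation: `output_dtypes` is given a hypothetical value here, and the variable names are made up for illustration. It shows that the class `np.ndarray` carries no dtype of its own, while an empty instance does.

```python
import numpy as np

# Old value: the ndarray *class*. It identifies the array type only;
# there is no concrete dtype attached to it.
meta_type_only = np.ndarray

# New value: an empty ndarray *instance* that also records the dtype.
# `output_dtypes` is a hypothetical user-supplied list for this sketch.
output_dtypes = [np.int32]
meta_with_dtype = np.ndarray((0, 0), dtype=output_dtypes[0])

print(isinstance(meta_type_only, type))              # True
print(meta_with_dtype.shape, meta_with_dtype.dtype)  # (0, 0) int32
```

Because the array has zero elements, the uninitialised memory that `np.ndarray(...)` normally exposes is never read.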


Shouldn't we still set meta = np.ndarray if no output dtype is specified?

output_dtypes is required for dask="parallelized" and will error if it is missing:

xarray/xarray/core/computation.py

Lines 670 to 674 in 8051c47

    if output_dtypes is None:
        raise ValueError(
            "output dtypes (output_dtypes) must be supplied to "
            "apply_func when using dask='parallelized'"
        )

so this won't take effect. I am also not very happy with my approach, but I didn't want to copy the checks from _apply_blockwise up here. Suggestions?

Maybe the cleaner workaround is to move this down into _apply_blockwise? Would it be enough to pass vectorize down to that level and then set meta as you are doing here?

Also, it seems like we should raise that error about output_dtypes only if meta.dtype has not been set?
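
The suggestion to raise the `output_dtypes` error only when `meta` carries no dtype could look roughly like the sketch below. This is an illustration under stated assumptions, not code from xarray: `resolve_output_dtype` is a hypothetical helper, and the real check lives inside `_apply_blockwise` with a different surrounding structure. One detail worth noting is that the class `np.ndarray` exposes a `dtype` descriptor, so a class passed as `meta` has to be excluded explicitly.

```python
import numpy as np


def resolve_output_dtype(output_dtypes, meta=None):
    """Hypothetical helper: prefer the dtype of `meta`, else `output_dtypes`."""
    # A class such as np.ndarray has a `dtype` descriptor but no concrete
    # dtype, so only array-like *instances* are inspected.
    if meta is not None and not isinstance(meta, type):
        meta_dtype = getattr(meta, "dtype", None)
        if meta_dtype is not None:
            return np.dtype(meta_dtype)

    # No usable dtype on meta: fall back to the existing requirement.
    if output_dtypes is None:
        raise ValueError(
            "output dtypes (output_dtypes) must be supplied to "
            "apply_func when using dask='parallelized'"
        )
    return np.dtype(output_dtypes[0])


# meta with a dtype: no error even though output_dtypes is missing
print(resolve_output_dtype(None, np.ndarray((0, 0), dtype=np.float64)))  # float64
```

With `meta=np.ndarray` (the class) and `output_dtypes=None`, the helper still raises, which matches the existing behaviour quoted above.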

Yes, I agree it would be cleaner to thread vectorize through to _apply_blockwise.

> Also, it seems like we should raise that error about output_dtypes only if meta.dtype has not been set?

Depends how important output_dtypes is for np.vectorize.

I am happy to work more on this, but I think it would be good to discuss #4060 first, which might make this obsolete.
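
How much an explicit dtype matters for `np.vectorize` can be checked with NumPy alone. The sketch below is an illustration, with `add_one` as a hypothetical function: without `otypes`, `np.vectorize` infers the output dtype by calling the function on the first element, so it cannot handle size-0 input, which is presumably the kind of input involved when `meta` is inferred from an empty array.

```python
import numpy as np


def add_one(x):
    # Hypothetical scalar function used only for this illustration.
    return x + 1


empty = np.empty((0,), dtype=np.float64)

# Without otypes the dtype is inferred from the first element,
# which does not exist for size-0 input.
try:
    np.vectorize(add_one)(empty)
    raised = False
except ValueError:
    raised = True
print(raised)  # True

# With otypes the output dtype is known up front, so empty input works.
result = np.vectorize(add_one, otypes=[np.float64])(empty)
print(result.shape, result.dtype)  # (0,) float64

# On non-empty input the dtype is inferred from the first result.
inferred = np.vectorize(add_one)(np.array([1.0, 2.0]))
print(inferred.dtype)  # float64
```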

mathause · 2020-08-14T08:32:43Z

I'll close this in favor of #4060

@mathause

* ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc` for multiple outputs when `dask='parallelized'`, add/fix tests
* DOC: Update docstring and whats-new.rst
* WIP: apply_gufunc
* WIP: apply_gufunc -> reinstate dask='allowed' as per @mathause, adapting tests
* WIP: apply_gufunc -> add test for GH #4015, fix test for sparse meta checking
* WIP: apply_gufunc -> remove unused `input_dims`
* Update xarray/core/computation.py Co-authored-by: Mathias Hauser <mathause@users.noreply.github.com>
* Update xarray/core/computation.py Co-authored-by: Mathias Hauser <mathause@users.noreply.github.com>
* Update xarray/core/computation.py Co-authored-by: Mathias Hauser <mathause@users.noreply.github.com>
* WIP: use dask_gufunc_kwargs, keep vectorize first but only for non-dask-gufunc, rework docstrings, adapt tests
* DOC: add reference to internal changes in whats-new.rst
* FIX: mypy
* FIX: vectorize inside `apply_variable_ufunc`
* TST: add tests from #4022 from @mathause
* FIX: address black issue
* FIX: xfail test for dask < 2.3
* WIP: apply changes in response to @mathause's review comments
* WIP: remove line
* WIP: catch different chunksize error and allow_rechunk, docstring fixes
* WIP: remove comment
* WIP: style issues
* WIP: revert catch, revert test, add tests without output_dtypes
* WIP: fix signature in apply_ufunc->apply_gufunc, handle output_sizes, handle dask version, fix tests
* WIP: fix tuple
* WIP: add dims_map to _UFuncSignature, adapt output_sizes to fit for apply_gufunc
* WIP: black
* WIP: raise ValueError if output_sizes dimension mismatch
* WIP: raise ValueError if output_sizes is missing for given output_core_dims
* WIP: simplify if/else
* FIX: resolve conflicts prior merge with master
* FIX: combine if's as per review
* FIX: pass `vectorize` and `output_dtypes` kwargs explicitely into `apply_variable_ufunc` as per review suggestion
* FIX: pass `vectorize` and `output_dtypes` kwargs explicitely into `da.apply_gufunc`
* FIX: address review comments of @keewis and @mathause
* FIX: black
* FIX: `vectorize` not needed in if-clause
* FIX: set DeprecationWarning and stacklevel=2
* FIX: use FutureWarning for user visibility
* FIX: remove comment as suggested

Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>
Co-authored-by: Mathias Hauser <mathause@users.noreply.github.com>
Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>

mathause added 3 commits May 3, 2020 16:23

add tests

ad5a3f4

update apply_ufunc

ba73f16

whats new

f154d1c

mathause commented May 3, 2020

xarray/core/computation.py (outdated)

ulijh reviewed May 4, 2020

xarray/core/computation.py (outdated)

dcherian reviewed May 5, 2020

xarray/tests/test_computation.py (outdated)

mathause added 2 commits May 5, 2020 11:51

Merge branch 'master' into fix/apply_ufunc_meta_dtype

9514f56

add warning

303cef1

dcherian approved these changes May 5, 2020

xarray/core/computation.py (outdated)

dcherian mentioned this pull request May 5, 2020

0.16.0 release #4031

Closed

23 tasks

mathause added 2 commits May 5, 2020 21:12

combine if statements

db5913a

Merge branch 'master' into fix/apply_ufunc_meta_dtype

1e7256a

shoyer reviewed May 13, 2020

mathause mentioned this pull request May 15, 2020

ENH: use dask.array.apply_gufunc in xr.apply_ufunc #4060

Merged

7 tasks

kmuehlbauer added a commit to kmuehlbauer/xarray that referenced this pull request May 28, 2020

TST: add tests from pydata#4022 from @mathause

ddeb1ea

mathause closed this Aug 14, 2020

mathause deleted the fix/apply_ufunc_meta_dtype branch August 19, 2020 13:11


mathause commented May 3, 2020

ulijh commented May 4, 2020

dcherian left a comment

shoyer May 13, 2020

mathause May 13, 2020

dcherian May 15, 2020

mathause May 15, 2020

mathause commented Aug 14, 2020
