-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix/apply ufunc meta dtype #4022
Conversation
Thanks @mathause , this works for me. Also with a change of dtype in func, as long as "output_dtypes" is set correctly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @mathause. I just have one minor suggestion. LGTM otherwise.
# set meta=np.ndarray by default for numpy vectorized functions | ||
# work around dask bug computing meta with vectorized functions: GH5642 | ||
meta = np.ndarray | ||
# defer raising errors to _apply_blockwise (e.g. if output_dtypes is None) | ||
meta = np.ndarray((0, 0), dtype=output_dtypes[0]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't we still set meta = np.ndarray
if no output dtype is specified?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
output_dtypes
is required for dask="parallelized"
and will error if it is missing:
xarray/xarray/core/computation.py
Lines 670 to 674 in 8051c47
if output_dtypes is None: | |
raise ValueError( | |
"output dtypes (output_dtypes) must be supplied to " | |
"apply_func when using dask='parallelized'" | |
) |
so this wont take effect. I am also not very happy with my approach, but didn't want to copy the checks from apply_blockwise
up here - suggestions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe the cleaner workaround is to move this down in to _apply_blockwise
? Would it be enough to pass vectorize
down to that level and then set meta
as you are doing here?
Also, it seems like we should raise that error about output_dtypes
only if meta.dtype
has not been set?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree it would be cleaner to thread vectorize
through to _apply_blockwise
.
Also, it seems like we should raise that error about output_dtypes only if meta.dtype has not been set?
Depends how important output_dtypes
is for np.vectorize
.
I am happy to work more on this, but I think it would be good to discuss #4060 first, which might make this obsolete.
I'll close this in favor of #4060 |
* ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc` for multiple outputs when `dask='parallelized'`, add/fix tests * DOC: Update docstring and whats-new.rst * WIP: apply_gufunc * WIP: apply_gufunc -> reinstate dask='allowed' as per @mathause, adapting tests * WIP: apply_gufunc -> add test for GH #4015, fix test for sparse meta checking * WIP: apply_gufunc -> remove unused `input_dims` * Update xarray/core/computation.py Co-authored-by: Mathias Hauser <mathause@users.noreply.github.com> * Update xarray/core/computation.py Co-authored-by: Mathias Hauser <mathause@users.noreply.github.com> * Update xarray/core/computation.py Co-authored-by: Mathias Hauser <mathause@users.noreply.github.com> * WIP: use dask_gufunc_kwargs, keep vectorize first but only for non-dask-gufunc, rework docstrings, adapt tests * DOC: add reference to internal changes in whats-new.rst * FIX: mypy * FIX: vectorize inside `apply_variable_ufunc` * TST: add tests from #4022 from @mathause * FIX: address black issue * FIX: xfail test for dask < 2.3 * WIP: apply changes in response to @mathause's review comments * WIP: remove line * WIP: catch different chunksize error and allow_rechunk, docstring fixes * WIP: remove comment * WIP: style issues * WIP: revert catch, revert test, add tests without output_dtypes * WIP: fix signature in apply_ufunc->apply_gufunc, handle output_sizes, handle dask version, fix tests * WIP: fix tuple * WIP: add dims_map to _UFuncSignature, adapt output_sizes to fit for apply_gufunc * WIP: black * WIP: raise ValueError if output_sizes dimension mismatch * WIP: raise ValueError if output_sizes is missing for given output_core_dims * WIP: simplify if/else * FIX: resolve conflicts prior merge with master * FIX: combine if's as per review * FIX: pass `vectorize` and `output_dtypes` kwargs explicitely into `apply_variable_ufunc` as per review suggestion * FIX: pass `vectorize` and `output_dtypes` kwargs explicitely into `da.apply_gufunc` * FIX: address review comments of @keewis and @mathause * FIX: black * FIX: `vectorize` not needed in if-clause * FIX: set DeprecationWarning and stacklevel=2 * FIX: use FutureWarning for user visibility * FIX: remove comment as suggested Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com> Co-authored-by: Mathias Hauser <mathause@users.noreply.github.com> Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>
isort -rc . && black . && mypy . && flake8
whats-new.rst
for all changes andapi.rst
for new API