Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix/apply ufunc meta dtype #4022

Closed
wants to merge 7 commits into from

Conversation

mathause
Copy link
Collaborator

@mathause mathause commented May 3, 2020

xarray/core/computation.py Outdated Show resolved Hide resolved
xarray/core/computation.py Outdated Show resolved Hide resolved
@ulijh
Copy link
Contributor

ulijh commented May 4, 2020

Thanks @mathause , this works for me. Also with a change of dtype in func, as long as "output_dtypes" is set correctly.

Copy link
Contributor

@dcherian dcherian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mathause. I just have one minor suggestion. LGTM otherwise.

xarray/core/computation.py Outdated Show resolved Hide resolved
@dcherian dcherian mentioned this pull request May 5, 2020
23 tasks
# set meta=np.ndarray by default for numpy vectorized functions
# work around dask bug computing meta with vectorized functions: GH5642
meta = np.ndarray
# defer raising errors to _apply_blockwise (e.g. if output_dtypes is None)
meta = np.ndarray((0, 0), dtype=output_dtypes[0])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't we still set meta = np.ndarray if no output dtype is specified?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

output_dtypes is required for dask="parallelized" and will error if it is missing:

if output_dtypes is None:
raise ValueError(
"output dtypes (output_dtypes) must be supplied to "
"apply_func when using dask='parallelized'"
)

so this wont take effect. I am also not very happy with my approach, but didn't want to copy the checks from apply_blockwise up here - suggestions?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the cleaner workaround is to move this down in to _apply_blockwise? Would it be enough to pass vectorize down to that level and then set meta as you are doing here?

Also, it seems like we should raise that error about output_dtypes only if meta.dtype has not been set?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree it would be cleaner to thread vectorize through to _apply_blockwise.

Also, it seems like we should raise that error about output_dtypes only if meta.dtype has not been set?

Depends how important output_dtypes is for np.vectorize.

I am happy to work more on this, but I think it would be good to discuss #4060 first, which might make this obsolete.

kmuehlbauer added a commit to kmuehlbauer/xarray that referenced this pull request May 28, 2020
@mathause
Copy link
Collaborator Author

I'll close this in favor of #4060

@mathause mathause closed this Aug 14, 2020
mathause added a commit that referenced this pull request Aug 19, 2020
* ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc` for multiple outputs when `dask='parallelized'`, add/fix tests

* DOC: Update docstring and whats-new.rst

* WIP: apply_gufunc

* WIP: apply_gufunc -> reinstate dask='allowed' as per @mathause, adapting tests

* WIP: apply_gufunc -> add test for GH #4015, fix test for sparse meta checking

* WIP: apply_gufunc -> remove unused `input_dims`

* Update xarray/core/computation.py

Co-authored-by: Mathias Hauser <mathause@users.noreply.github.com>

* Update xarray/core/computation.py

Co-authored-by: Mathias Hauser <mathause@users.noreply.github.com>

* Update xarray/core/computation.py

Co-authored-by: Mathias Hauser <mathause@users.noreply.github.com>

* WIP: use dask_gufunc_kwargs, keep vectorize first but only for non-dask-gufunc, rework docstrings, adapt tests

* DOC: add reference to internal changes in whats-new.rst

* FIX: mypy

* FIX: vectorize inside `apply_variable_ufunc`

* TST: add tests from #4022 from @mathause

* FIX: address black issue

* FIX: xfail test for dask < 2.3

* WIP: apply changes in response to @mathause's review comments

* WIP: remove line

* WIP: catch different chunksize error and allow_rechunk, docstring fixes

* WIP: remove comment

* WIP: style issues

* WIP: revert catch, revert test, add tests without output_dtypes

* WIP: fix signature in apply_ufunc->apply_gufunc, handle output_sizes, handle dask version, fix tests

* WIP: fix tuple

* WIP: add dims_map to _UFuncSignature, adapt output_sizes to fit for apply_gufunc

* WIP: black

* WIP: raise ValueError if output_sizes dimension mismatch

* WIP: raise ValueError if output_sizes is missing for given output_core_dims

* WIP: simplify if/else

* FIX: resolve conflicts prior merge with master

* FIX: combine if's as per review

* FIX: pass `vectorize` and `output_dtypes` kwargs explicitely into `apply_variable_ufunc` as per review suggestion

* FIX: pass `vectorize` and `output_dtypes` kwargs explicitely into `da.apply_gufunc`

* FIX: address review comments of @keewis and @mathause

* FIX: black

* FIX: `vectorize` not needed in if-clause

* FIX: set DeprecationWarning and stacklevel=2

* FIX: use FutureWarning for user visibility

* FIX: remove comment as suggested

Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>

Co-authored-by: Mathias Hauser <mathause@users.noreply.github.com>
Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>
@mathause mathause deleted the fix/apply_ufunc_meta_dtype branch August 19, 2020 13:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

apply_ufunc gives wrong dtype with dask=parallelized and vectorized=True
4 participants