
Rewrite interp to use apply_ufunc #9881

Open · wants to merge 19 commits into main from redo-blockwise-interp
Conversation

@dcherian dcherian commented Dec 13, 2024

  1. Removes a bunch of complexity around interpolating dask arrays by using apply_ufunc instead of blockwise directly.
  2. A major improvement: we can now use vectorize=True to get sane dask graphs for vectorized interpolation to chunked arrays (interp performance with chunked dimensions, #6799 (comment)); see the sketch below.
  3. Added a bunch of typing.
  4. Happily, this fixes #4463 (Interpolation with multiple multidimensional arrays sharing dims fails).

cc @ks905383 your vectorized interpolation example now has this graph:

[image: compact task graph]

instead of this quadratic monstrosity:

[image: dense, quadratic task graph]
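For readers who want the shape of the change, here is a minimal sketch of the apply_ufunc + vectorize=True pattern this PR leans on; the function and variable names (interp1d_np, qi) and all shapes are invented for illustration, not taken from the PR:

```python
import numpy as np
import xarray as xr

def interp1d_np(data, x, xi):
    # 1-D interpolation along the core dimension; vectorize=True loops
    # this function over every remaining (broadcast) dimension.
    return np.interp(xi, x, data)

da = xr.DataArray(
    np.random.rand(4, 10),
    dims=("time", "q"),
    coords={"q": np.linspace(0, 1, 10)},
).chunk({"time": 1})

qi = xr.DataArray(np.linspace(0.1, 0.9, 5), dims="qnew")  # target points

out = xr.apply_ufunc(
    interp1d_np,
    da,
    da["q"],
    qi,
    input_core_dims=[["q"], ["q"], ["qnew"]],
    output_core_dims=[["qnew"]],
    vectorize=True,  # np.vectorize over the non-core "time" dimension
    dask="parallelized",
    output_dtypes=[da.dtype],
)
```

Each chunk maps to one call of the core function, which is what keeps the graph linear in the number of chunks rather than quadratic.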

@dcherian dcherian added needs review run-benchmark Run the ASV benchmark workflow labels Dec 13, 2024
@dcherian dcherian requested a review from Illviljan December 13, 2024 06:32
@@ -4127,18 +4119,6 @@ def interp(

coords = either_dict_or_kwargs(coords, coords_kwargs, "interp")
indexers = dict(self._validate_interp_indexers(coords))

if coords:
@dcherian commented:
Handled by vectorize=True. This is possibly a perf regression with numpy arrays, but a massive improvement with chunked arrays.

@dcherian commented:
For posterity: the downside of this approach is that it can greatly expand the number of core dimensions for the problem, limiting the potential for parallelism.

Consider the problem in #6799 (comment). In the following, dimension names are listed in [].

da[time, q, lat, lon].interp(q=bar[lat, lon]) gets rewritten to da[time, q, lat, lon].interp(q=bar[lat, lon], lat=lat[lat], lon=lon[lon]), which, thanks to our automatic rechunking, makes dask merge the chunks along lat and lon too, for no benefit.
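To make that concrete, a small sketch that reproduces the shape of the problem (dimension sizes are made up; interp requires scipy):

```python
import numpy as np
import xarray as xr

# Source array da[time, q, lat, lon], chunked along lat/lon.
da = xr.DataArray(
    np.random.rand(3, 4, 5, 6),
    dims=("time", "q", "lat", "lon"),
    coords={"q": np.arange(4.0), "lat": np.arange(5), "lon": np.arange(6)},
).chunk({"lat": 2, "lon": 2})

# Target coordinate bar[lat, lon]: the desired q varies per grid point.
bar = xr.DataArray(
    np.random.uniform(0, 3, size=(5, 6)),
    dims=("lat", "lon"),
    coords={"lat": np.arange(5), "lon": np.arange(6)},
)

# Internally rewritten to interp(q=bar, lat=lat, lon=lon), so lat/lon
# become core dimensions and their chunks are merged into one block.
out = da.interp(q=bar)
print(out.chunks)
```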

def _chunked_aware_interpnd(var, *coords, interp_func, interp_kwargs, localize=True):
"""Wrapper for `_interpnd` through `blockwise` for chunked arrays.
def _interpnd(
@dcherian commented:
I merged the two functions to reduce indirection and make the code easier to read.

xarray/tests/test_interp.py (outdated review thread, resolved)
exclude_dims=all_in_core_dims,
dask="parallelized",
kwargs=dict(interp_func=func, interp_kwargs=kwargs),
dask_gufunc_kwargs=dict(output_sizes=output_sizes, allow_rechunk=True),
@dcherian commented:
allow_rechunk=True matches the current behaviour where we rechunk along all core dimensions to a single chunk.
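As an illustration of the flag (a toy reduction, not the interp code): with dask="parallelized", apply_ufunc forwards dask_gufunc_kwargs to dask.array.apply_gufunc, where allow_rechunk=True permits core dimensions to be chunked and rechunks them instead of raising:

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.random.rand(6, 8), dims=("x", "y")).chunk({"x": 2, "y": 4})

# "y" is a core dimension but arrives in two chunks; allow_rechunk=True
# lets dask rechunk it to a single block before applying the function.
out = xr.apply_ufunc(
    lambda arr: arr.sum(axis=-1),
    da,
    input_core_dims=[["y"]],
    dask="parallelized",
    output_dtypes=[da.dtype],
    dask_gufunc_kwargs=dict(allow_rechunk=True),
)
print(out.compute())
```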

@dcherian dcherian force-pushed the redo-blockwise-interp branch from 245697e to a5e1854 on December 14, 2024 00:06
@dcherian dcherian force-pushed the redo-blockwise-interp branch from 652a239 to 586f638 on December 14, 2024 04:02
@Illviljan Illviljan mentioned this pull request on Dec 14, 2024
@dcherian commented:
Merging on Thursday if there are no comments.

IMO this is a big win for maintainability.

@dcherian dcherian added plan to merge Final call for comments and removed needs review labels Dec 17, 2024
@@ -566,29 +577,30 @@ def _get_valid_fill_mask(arr, dim, limit):
) <= limit


def _localize(var, indexes_coords):
def _localize(obj: T, indexes_coords: SourceDest) -> tuple[T, SourceDest]:
A collaborator commented:
Probably should use T_Xarray instead of a plain T to get rid of the type: ignore at the return.

@dcherian replied:
That doesn't have Variable, so I'd have to make a new T_DatasetOrVariable or a protocol with .isel perhaps?
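A hypothetical sketch of the protocol option (SupportsIsel is an invented name, not an existing xarray type):

```python
from typing import Protocol, TypeVar

class SupportsIsel(Protocol):
    """Structural type covering Dataset, DataArray, and Variable."""

    def isel(self, indexers=None, **indexers_kwargs): ...

T = TypeVar("T", bound=SupportsIsel)

# _localize(obj: T, indexes_coords: SourceDest) -> tuple[T, SourceDest]
# would then round-trip the input type without a `type: ignore`.
```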

xarray/core/missing.py (outdated review thread, resolved)
xarray/core/missing.py (outdated review thread, resolved)
xarray/tests/test_interp.py (outdated review thread, resolved)
xarray/tests/test_interp.py (outdated review thread, resolved)
dcherian and others added 2 commits December 17, 2024 13:20
Labels: plan to merge (Final call for comments), run-benchmark (Run the ASV benchmark workflow), topic-interpolation
Development

Successfully merging this pull request may close these issues.

Interpolation with multiple multidimensional arrays sharing dims fails
3 participants