2x~5x speed up for isel() in most cases #3533

crusaderky · 2019-11-14T15:34:24Z

Yet another major improvement for #2799.

Achieve a 2x to 5x boost in isel performance when slicing small arrays by int, slice, list of int, scalar ndarray, or 1-dimensional ndarray.

import xarray

da = xarray.DataArray([[1, 2], [3, 4]], dims=['x', 'y'])
v = da.variable
a = da.variable.values
ds = da.to_dataset(name="d")

ds_with_idx = xarray.Dataset({
    'x': [10, 20],
    'y': [100, 200],
    'd': (('x', 'y'), [[1, 2], [3, 4]])
})
da_with_idx = ds_with_idx.d

# before -> after
%timeit a[0] # 121 ns
%timeit v[0] # 7 µs
%timeit v.isel(x=0) # 10 µs
%timeit da[0]  # 65 µs -> 15 µs
%timeit da.isel(x=0)  # 63 µs -> 13 µs
%timeit ds.isel(x=0)  # 48 µs -> 24 µs
%timeit da_with_idx[0]  # 209 µs -> 82 µs
%timeit da_with_idx.isel(x=0, drop=False)  # 135 µs -> 34 µs
%timeit da_with_idx.isel(x=0, drop=True)  # 101 µs -> 34 µs
%timeit ds_with_idx.isel(x=0, drop=False)  # 90 µs -> 49 µs
%timeit ds_with_idx.isel(x=0, drop=True)  # 65 µs -> 49 µs

Marked as WIP because this commands running the asv suite to verify there are no regressions for large arrays.
(on a separate note, we really need to add the small size cases to asv - as discussed in #3382).

This profoundly alters one of the most important methods in xarray and I must confess it makes me nervous, particularly as I am unsure if the test coverage of DataArray.isel() is as through as that for Dataset.isel().

crusaderky · 2019-11-14T15:35:10Z

@shoyer I struggle to understand the meaning of Dataset.indexes. How does it differ from

{
    k: v.to_index()
    for k, v in self._variables.items()
    if isinstance(v, IndexVariable)
}

?

xarray/core/dataarray.py

max-sixty

Amazing! Great idea!

shoyer · 2019-11-14T23:01:41Z

Dataset.indexes is currently equivalent to just calling to_index() on each appropriate variable. But eventually we want to separate the notion of an index from variables with names matching dimensions, and this is a prerequisite for that.

crusaderky · 2019-11-15T12:30:22Z

I would very much appreciate if everybody could play around with this PR in their projects and confirm that nothing breaks!

crusaderky · 2019-12-03T16:12:30Z

Merging soon if nobody speaks up!

shoyer

I haven't tested this but the logic here looks sound.

xarray/core/dataset.py

crusaderky · 2019-12-04T17:30:33Z

Implemented @shoyer 's suggestions and aligned to master.
Merging into master tomorrow!

crusaderky · 2019-12-05T16:39:29Z

In it goes *ducks for cover*

dcherian · 2019-12-05T16:45:40Z

=) thanks @crusaderky

* upstream/master: Fix map_blocks HLG layering (pydata#3598) Silence sphinx warnings: Round 2 (pydata#3592) 2x~5x speed up for isel() in most cases (pydata#3533) remove xarray again (pydata#3591) fix plotting with transposed nondim coords. (pydata#3441) make coarsen reductions consistent with reductions on other classes (pydata#3500) Resolve the version issues on RTD (pydata#3589) Add bottleneck & rasterio git tip to upstream-dev CI (pydata#3585)

…equiv * 'master' of github.com:pydata/xarray: (28 commits) Add nanmedian for dask arrays (pydata#3604) added pyinterp to related projects (pydata#3655) Allow incomplete hypercubes in combine_by_coords (pydata#3649) concat keeps attrs from first variable. (pydata#3637) Extend DatetimeAccessor properties and support `.dt` accessor for Timedelta (pydata#3612) update readthedocs.yml (pydata#3639) silence sphinx warnings round 3 (pydata#3602) Fix/quantile wrong errmsg (pydata#3635) Provide shape info in shape mismatch error. (pydata#3619) Minor doc fixes (pydata#3615) Respect user-specified coordinates attribute. (pydata#3487) Add Facetgrid.row_labels & Facetgrid.col_labels (pydata#3597) Fix pint integration tests (pydata#3600) Minor fix to combine_by_coords to allow for the combination of CFTimeIndexes separated by large time intervals (pydata#3543) Fix map_blocks HLG layering (pydata#3598) Silence sphinx warnings: Round 2 (pydata#3592) 2x~5x speed up for isel() in most cases (pydata#3533) remove xarray again (pydata#3591) fix plotting with transposed nondim coords. (pydata#3441) make coarsen reductions consistent with reductions on other classes (pydata#3500) ...

Speed up isel in most cases

13a80c0

crusaderky requested a review from shoyer November 14, 2019 15:34

crusaderky added the topic-performance label Nov 14, 2019

crusaderky changed the title ~~WIP: Speed up isel in most cases~~ WIP: 2x~5x speed up for isel() in most cases Nov 14, 2019

What's New

dd19da7

crusaderky mentioned this pull request Nov 14, 2019

Performance: numpy indexes small amounts of data 1000 faster than xarray #2799

Open

Trivial

a12641f

max-sixty reviewed Nov 14, 2019

View reviewed changes

xarray/core/dataarray.py Outdated Show resolved Hide resolved

max-sixty approved these changes Nov 14, 2019

View reviewed changes

Use _replace

79fea4d

crusaderky changed the title ~~WIP: 2x~5x speed up for isel() in most cases~~ [needs live testing] 2x~5x speed up for isel() in most cases Nov 15, 2019

shoyer approved these changes Dec 3, 2019

View reviewed changes

xarray/core/dataset.py Outdated Show resolved Hide resolved

xarray/core/dataset.py Outdated Show resolved Hide resolved

crusaderky added 4 commits December 4, 2019 17:08

Merge branch 'master' into fast_isel

3d62776

isort

2a58062

Code review

066cf55

What's New

df22f5d

mypy

1b814e9

crusaderky changed the title ~~[needs live testing] 2x~5x speed up for isel() in most cases~~ 2x~5x speed up for isel() in most cases Dec 5, 2019

crusaderky merged commit 87a25b6 into pydata:master Dec 5, 2019

crusaderky deleted the fast_isel branch December 5, 2019 16:39

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

2x~5x speed up for isel() in most cases #3533

2x~5x speed up for isel() in most cases #3533

crusaderky commented Nov 14, 2019 •

edited

Loading

crusaderky commented Nov 14, 2019 •

edited

Loading

max-sixty left a comment

shoyer commented Nov 14, 2019

crusaderky commented Nov 15, 2019

crusaderky commented Dec 3, 2019

shoyer left a comment

crusaderky commented Dec 4, 2019

crusaderky commented Dec 5, 2019

dcherian commented Dec 5, 2019

2x~5x speed up for isel() in most cases #3533

2x~5x speed up for isel() in most cases #3533

Conversation

crusaderky commented Nov 14, 2019 • edited Loading

crusaderky commented Nov 14, 2019 • edited Loading

max-sixty left a comment

Choose a reason for hiding this comment

shoyer commented Nov 14, 2019

crusaderky commented Nov 15, 2019

crusaderky commented Dec 3, 2019

shoyer left a comment

Choose a reason for hiding this comment

crusaderky commented Dec 4, 2019

crusaderky commented Dec 5, 2019

dcherian commented Dec 5, 2019

crusaderky commented Nov 14, 2019 •

edited

Loading

crusaderky commented Nov 14, 2019 •

edited

Loading