Some alignment optimizations #7382

benbovy · 2022-12-15T12:54:56Z

Benchmark added
User visible changes (including notable bug fixes) are documented in whats-new.rst

May fix some performance regressions, e.g., see #7376 (comment).

@ravwojdyla with this PR `ds.assign(foo=~ds["d3"])` in your example should be much faster (on par with version 2022.3.0).

Compare indexes: return early if all of them are the same objects. This may happen in some (rare?) cases where the objects to align share the same indexes.
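A minimal sketch of the early-return idea, not the actual xarray code: the `Index` class and the `indexes_all_equal` helper below are hypothetical stand-ins, and the `equals_calls` counter exists only to show when the costly comparison runs.

```python
from typing import Sequence


class Index:
    """Hypothetical stand-in for an index with a costly equality check."""

    def __init__(self, values):
        self.values = list(values)
        self.equals_calls = 0  # counts how often the slow comparison runs

    def equals(self, other: "Index") -> bool:
        self.equals_calls += 1
        return self.values == other.values


def indexes_all_equal(indexes: Sequence[Index]) -> bool:
    first = indexes[0]
    # Fast path: every entry is the very same object, so no comparison is needed.
    if all(first is other for other in indexes[1:]):
        return True
    # Slow path: compare the contents of each index against the first one.
    return all(first.equals(other) for other in indexes[1:])


shared = Index(range(1_000))
duplicate = Index(range(1_000))

print(indexes_all_equal([shared, shared, shared]))  # identity fast path
print(indexes_all_equal([shared, duplicate]))       # falls back to equals()
```

The identity check only short-circuits when the objects really share one index object; equal but distinct indexes still go through the full comparison.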

Avoid re-indexing when it is not needed: if all unindexed dimension sizes match the indexed dimension sizes in the objects to align, we don't need re-indexing.
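The size rule can be sketched roughly as follows. This is a simplified illustration under assumptions, not the implementation in this PR: `need_reindex` and its plain-dict arguments are hypothetical names chosen for the example.

```python
from typing import Dict, Set


def need_reindex(
    indexes_equal: bool,
    indexed_dim_sizes: Dict[str, int],
    unindexed_dim_sizes: Dict[str, Set[int]],
) -> bool:
    """Return True when re-indexing is required (simplified, hypothetical rule).

    indexed_dim_sizes: size of each dimension that has an index.
    unindexed_dim_sizes: for each dimension, the set of sizes found in
        objects that have no index along that dimension.
    """
    if not indexes_equal:
        return True
    for dim, sizes in unindexed_dim_sizes.items():
        if dim not in indexed_dim_sizes:
            continue  # no index along this dimension, nothing to reindex against
        # Reindex if the unindexed objects disagree on the size, or if their
        # size differs from the size of the indexed dimension.
        if len(sizes) > 1 or indexed_dim_sizes[dim] not in sizes:
            return True
    return False


# Equal indexes and matching sizes: re-indexing can be skipped.
print(need_reindex(True, {"x": 5}, {"x": {5}}))
# Size mismatch along "x": re-indexing is still needed.
print(need_reindex(True, {"x": 5}, {"x": {4}}))
```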

benbovy · 2022-12-15T13:05:55Z

Quick benchmark using the example from #7376 (this PR even seems to be much faster than version 2022.3.0!)

# version 2022.3.0
%timeit ds.assign(foo=~ds["d3"])
# 22.5 ms ± 1.96 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)

# main branch
%timeit ds.assign(foo=~ds["d3"])
# 193 ms ± 1.35 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# this PR
%timeit ds.assign(foo=~ds["d3"])
# 1.01 ms ± 10.7 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
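To put the three timings side by side, here is a small calculation on the mean values reported above. The `slowdown` helper is only an illustrative name, and the ratios ignore the reported standard deviations.

```python
# Mean timings in milliseconds, copied from the %timeit output above.
timings_ms = {"2022.3.0": 22.5, "main": 193.0, "this PR": 1.01}


def slowdown(version: str, reference: str = "this PR") -> float:
    """How many times slower `version` is than `reference`."""
    return timings_ms[version] / timings_ms[reference]


for version in ("2022.3.0", "main"):
    print(f"{version}: {slowdown(version):.0f}x slower than this PR")
```

On these numbers the main branch is roughly 191x slower than this PR, and version 2022.3.0 roughly 22x slower.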

Illviljan · 2022-12-15T17:43:29Z

No benchmark is catching this? Maybe we can add a small one in https://github.com/pydata/xarray/blob/main/asv_bench/benchmarks/indexing.py ?

Illviljan · 2022-12-15T20:19:34Z

xarray/core/indexes.py

@@ -1419,6 +1419,11 @@ def check_variables():
        )

    indexes = [e[0] for e in elements]
+
+    same_objects = all(indexes[0] is other_idx for other_idx in indexes[1:])


Suggested change

-    same_objects = all(indexes[0] is other_idx for other_idx in indexes[1:])
+    indexes_0 = indexes[0]
+    same_objects = all(indexes_0 is other_idx for other_idx in indexes[1:])

No need to use getitem several times for the same value. A similar thing can be done in other places in the function as well.

headtr1ck · 2022-12-18

List indexing is O(1), so this should give only a marginal speedup :P

benbovy · 2022-12-19

Yeah, I think the gain in performance would be negligible.
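For anyone who wants to measure the difference, a quick micro-benchmark along these lines should do. It is a sketch using only the standard library; the list of 1,000 references to one placeholder object is an assumption made for illustration, not data from this PR.

```python
import timeit

# Hypothetical input: many references to one shared object.
indexes = [object()] * 1_000


def repeated_getitem(indexes):
    # Looks up indexes[0] again on every iteration of the generator.
    return all(indexes[0] is other for other in indexes[1:])


def hoisted_getitem(indexes):
    # Looks up indexes[0] once and reuses the local variable.
    first = indexes[0]
    return all(first is other for other in indexes[1:])


for func in (repeated_getitem, hoisted_getitem):
    seconds = timeit.timeit(lambda: func(indexes), number=1_000)
    print(f"{func.__name__}: {seconds:.4f} s for 1,000 calls")
```

Both functions return the same result; only the number of list lookups differs, so any gap in the timings reflects the cost of the repeated `indexes[0]` access.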

benbovy · 2022-12-19T14:03:56Z

I don't know if the optimizations added here will benefit a large set of use cases (it took 6 months before an issue was reported), but they are worth it for at least a few of them. I think this is ready (I added some benchmarks).

Illviljan · 2023-01-05T21:26:13Z

Thanks, @benbovy !

* main: (41 commits)
  v2023.01.0 whats-new (pydata#7440)
  explain keep_attrs in docstring of apply_ufunc (pydata#7445)
  Add sentence to open_dataset docstring (pydata#7438)
  pin scipy version in doc environment (pydata#7436)
  Improve performance for backend datetime handling (pydata#7374)
  fix typo (pydata#7433)
  Add lazy backend ASV test (pydata#7426)
  Pull Request Labeler - Workaround sync-labels bug (pydata#7431)
  see also : groupby in resample doc and vice-versa (pydata#7425)
  Some alignment optimizations (pydata#7382)
  Make `broadcast` and `concat` work with the Array API (pydata#7387)
  remove `numbagg` and `numba` from the upstream-dev CI (pydata#7416)
  [pre-commit.ci] pre-commit autoupdate (pydata#7402)
  Preserve original dtype when accessing MultiIndex levels (pydata#7393)
  [pre-commit.ci] pre-commit autoupdate (pydata#7389)
  [pre-commit.ci] pre-commit autoupdate (pydata#7360)
  COMPAT: Adjust CFTimeIndex.get_loc for pandas 2.0 deprecation enforcement (pydata#7361)
  Avoid loading entire dataset by getting the nbytes in an array (pydata#7356)
  `keep_attrs` for pad (pydata#7267)
  Bump pypa/gh-action-pypi-publish from 1.5.1 to 1.6.4 (pydata#7375)
  ...

benbovy added 2 commits December 15, 2022 13:45

compare indexes: return early if all same objects

b54bb3b

This may happen in some (rare?) cases where the objects to align share the same indexes.

avoid re-indexing when not needed

689e5dd

If all unindexed dimension sizes match the indexed dimension sizes in the objects to align, we don't need re-indexing.

github-actions bot added the topic-indexing label Dec 15, 2022

ravwojdyla mentioned this pull request Dec 15, 2022

groupby+map performance regression on MultiIndex dataset #7376

Closed

4 tasks

Illviljan added the run-benchmark label (Run the ASV benchmark workflow) Dec 15, 2022

Illviljan reviewed Dec 15, 2022


benbovy added 2 commits December 19, 2022 14:23

add benchmark

94f2d99

update what's new

30d7295

github-actions bot added the topic-performance label Dec 19, 2022

Merge branch 'main' into some-align-optimizations

43ed521

benbovy added the plan to merge label (Final call for comments) Dec 20, 2022

Merge branch 'main' into pr/7382

95be2d0

Illviljan enabled auto-merge (squash) January 5, 2023 20:45

Illviljan disabled auto-merge January 5, 2023 21:25

Illviljan merged commit d6d2450 into pydata:main Jan 5, 2023

benbovy deleted the some-align-optimizations branch August 30, 2023 09:05
