Optimize some copying #7209

headtr1ck · 2022-10-24T21:00:21Z

Potentially closes xarray 2022.10.0 much slower then 2022.6.0 #7181
Tests added
User visible changes (including notable bug fixes) are documented in whats-new.rst

I have passed along some more memo dicts, which could prevent some double deep-copying of the same data (don't know how exactly, but who knows :P)
Also, I have found some copy calls that did not pass along the deep argument (I am not sure if that breaks things, lets find out).
And finally I have found some places where shallow copies are enough.

All together it should improve the performance a lot when copying things around.

dcherian · 2022-10-28T16:28:12Z

Thanks @headtr1ck do we have a benchmark for this, if not can we add one please?

headtr1ck · 2022-10-29T15:56:23Z

Thanks @headtr1ck do we have a benchmark for this, if not can we add one please?

Since the benchmark didn't change we either don't have one or my change doesn't matter much, haha.

I think the most important change is the shallow copy for Variable.to_index_variable, but I will have to check how to test this in a useful way, probably better testing some functions that use it indirectly.

dschwoerer · 2022-11-02T11:51:50Z

The change does matter - but deep copies are still much more expensive than they used to be (as to be expected, I guess)

headtr1ck · 2022-11-02T16:05:13Z

The change does matter - but deep copies are still much more expensive than they used to be (as to be expected, I guess)

Do you by any chance know which parts have improved, so we can add them as a benchmark here?

headtr1ck · 2022-11-05T22:23:41Z

I added a benchmark for swap_dims which uses Variable.to_index_variable when swapping into an existing coord, but I can not see any significant improvement...

Any ideas what else to test?
Maybe Indexes.copy_indexes, but I have not found a more high-level method that can take advantage of the memo dict...

benbovy · 2022-11-07T13:09:05Z

The change in Variable.to_index_variable seems sensible (not sure when one wants a deep copy of an IndexVariable or an Xarray / Pandas index).

to_index_variable may be called in some core functions of Xarray internals (e.g., in as_variable()) so it might be tricky to benchmark its effect Xarray-wise. Perhaps it would be good to track it down in the original issue #7181?

headtr1ck · 2022-11-29T19:47:08Z

Are we merging this anyway, or should we try harder to find a benchmark that shows some improvement?

dcherian · 2022-11-29T19:51:28Z

I don't think we need a benchmark to merge. Sorry that wasn't clear, it mostly for information purposes.

The change in Variable.to_index_variable seems sensible (not sure when one wants a deep copy of an IndexVariable or an Xarray / Pandas index).

Great! let's merge

* upstream/main: (39 commits) Support the new compression argument in netCDF4 > 1.6.0 (pydata#6981) Remove setuptools-scm-git-archive, require setuptools-scm>=7 (pydata#7253) Fix mypy failures (pydata#7343) Docs: add example of writing and reading groups to netcdf (pydata#7338) Reset file pointer to 0 when reading file stream (pydata#7304) Enable mypy warn unused ignores (pydata#7335) Optimize some copying (pydata#7209) Add parse_dims func (pydata#7051) Fix coordinate attr handling in `xr.where(..., keep_attrs=True)` (pydata#7229) Remove code used to support h5py<2.10.0 (pydata#7334) [pre-commit.ci] pre-commit autoupdate (pydata#7330) Fix PR number in what’s new (pydata#7331) Enable `origin` and `offset` arguments in `resample` (pydata#7284) fix doctests: supress urllib3 warning (pydata#7326) fix flake8 config (pydata#7321) implement Zarr v3 spec support (pydata#6475) Fix polyval overloads (pydata#7315) deprecate pynio backend (pydata#7301) mypy - Remove some ignored packages and modules (pydata#7319) Switch to T_DataArray in .coords (pydata#7285) ...

pass some deep copy args along

4a6742b

github-actions bot added the topic-indexing label Oct 24, 2022

headtr1ck mentioned this pull request Oct 24, 2022

xarray 2022.10.0 much slower then 2022.6.0 #7181

Closed

fix mypy

f098fea

TomNicholas added topic-performance and removed topic-indexing labels Oct 25, 2022

Illviljan added the run-benchmark Run the ASV benchmark workflow label Oct 25, 2022

dcherian requested a review from benbovy November 2, 2022 14:51

headtr1ck added 2 commits November 5, 2022 21:21

Merge branch 'main' into copy

0f4e836

add swap_dims benchmark

4efecde

github-actions bot added the topic-indexing label Nov 5, 2022

hopefully fix asv

d0f74c9

headtr1ck added the plan to merge Final call for comments label Nov 29, 2022

dcherian merged commit 0aee4fe into pydata:main Nov 30, 2022

headtr1ck deleted the copy branch December 8, 2022 20:09

benbovy mentioned this pull request Feb 10, 2023

Coordinates not deep copy #7463

Open

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize some copying #7209

Optimize some copying #7209

headtr1ck commented Oct 24, 2022 •

edited

Loading

dcherian commented Oct 28, 2022

headtr1ck commented Oct 29, 2022

dschwoerer commented Nov 2, 2022

headtr1ck commented Nov 2, 2022

headtr1ck commented Nov 5, 2022 •

edited

Loading

benbovy commented Nov 7, 2022

headtr1ck commented Nov 29, 2022

dcherian commented Nov 29, 2022

Optimize some copying #7209

Optimize some copying #7209

Conversation

headtr1ck commented Oct 24, 2022 • edited Loading

dcherian commented Oct 28, 2022

headtr1ck commented Oct 29, 2022

dschwoerer commented Nov 2, 2022

headtr1ck commented Nov 2, 2022

headtr1ck commented Nov 5, 2022 • edited Loading

benbovy commented Nov 7, 2022

headtr1ck commented Nov 29, 2022

dcherian commented Nov 29, 2022

headtr1ck commented Oct 24, 2022 •

edited

Loading

headtr1ck commented Nov 5, 2022 •

edited

Loading