Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize some copying #7209

Merged
merged 5 commits into from
Nov 30, 2022
Merged

Optimize some copying #7209

merged 5 commits into from
Nov 30, 2022

Conversation

headtr1ck
Copy link
Collaborator

@headtr1ck headtr1ck commented Oct 24, 2022

I have passed along some more memo dicts, which could prevent some double deep-copying of the same data (don't know how exactly, but who knows :P)
Also, I have found some copy calls that did not pass along the deep argument (I am not sure if that breaks things, lets find out).
And finally I have found some places where shallow copies are enough.

All together it should improve the performance a lot when copying things around.

@Illviljan Illviljan added the run-benchmark Run the ASV benchmark workflow label Oct 25, 2022
@dcherian
Copy link
Contributor

Thanks @headtr1ck do we have a benchmark for this, if not can we add one please?

@headtr1ck
Copy link
Collaborator Author

Thanks @headtr1ck do we have a benchmark for this, if not can we add one please?

Since the benchmark didn't change we either don't have one or my change doesn't matter much, haha.

I think the most important change is the shallow copy for Variable.to_index_variable, but I will have to check how to test this in a useful way, probably better testing some functions that use it indirectly.

@dschwoerer
Copy link
Contributor

The change does matter - but deep copies are still much more expensive than they used to be (as to be expected, I guess)

@dcherian dcherian requested a review from benbovy November 2, 2022 14:51
@headtr1ck
Copy link
Collaborator Author

The change does matter - but deep copies are still much more expensive than they used to be (as to be expected, I guess)

Do you by any chance know which parts have improved, so we can add them as a benchmark here?

@headtr1ck
Copy link
Collaborator Author

headtr1ck commented Nov 5, 2022

I added a benchmark for swap_dims which uses Variable.to_index_variable when swapping into an existing coord, but I can not see any significant improvement...

Any ideas what else to test?
Maybe Indexes.copy_indexes, but I have not found a more high-level method that can take advantage of the memo dict...

@benbovy
Copy link
Member

benbovy commented Nov 7, 2022

The change in Variable.to_index_variable seems sensible (not sure when one wants a deep copy of an IndexVariable or an Xarray / Pandas index).

to_index_variable may be called in some core functions of Xarray internals (e.g., in as_variable()) so it might be tricky to benchmark its effect Xarray-wise. Perhaps it would be good to track it down in the original issue #7181?

@headtr1ck
Copy link
Collaborator Author

Are we merging this anyway, or should we try harder to find a benchmark that shows some improvement?

@dcherian
Copy link
Contributor

I don't think we need a benchmark to merge. Sorry that wasn't clear, it mostly for information purposes.

The change in Variable.to_index_variable seems sensible (not sure when one wants a deep copy of an IndexVariable or an Xarray / Pandas index).

Great! let's merge

@headtr1ck headtr1ck added the plan to merge Final call for comments label Nov 29, 2022
@dcherian dcherian merged commit 0aee4fe into pydata:main Nov 30, 2022
dcherian added a commit to dcherian/xarray that referenced this pull request Dec 2, 2022
* upstream/main: (39 commits)
  Support the new compression argument in netCDF4 > 1.6.0 (pydata#6981)
  Remove setuptools-scm-git-archive, require setuptools-scm>=7 (pydata#7253)
  Fix mypy failures (pydata#7343)
  Docs: add example of writing and reading groups to netcdf (pydata#7338)
  Reset file pointer to 0 when reading file stream (pydata#7304)
  Enable mypy warn unused ignores (pydata#7335)
  Optimize some copying (pydata#7209)
  Add parse_dims func (pydata#7051)
  Fix coordinate attr handling in `xr.where(..., keep_attrs=True)` (pydata#7229)
  Remove code used to support h5py<2.10.0 (pydata#7334)
  [pre-commit.ci] pre-commit autoupdate (pydata#7330)
  Fix PR number in what’s new (pydata#7331)
  Enable `origin` and `offset` arguments in `resample` (pydata#7284)
  fix doctests: supress urllib3 warning (pydata#7326)
  fix flake8 config (pydata#7321)
  implement Zarr v3 spec support (pydata#6475)
  Fix polyval overloads (pydata#7315)
  deprecate pynio backend (pydata#7301)
  mypy - Remove some ignored packages and modules (pydata#7319)
  Switch to T_DataArray in .coords (pydata#7285)
  ...
@headtr1ck headtr1ck deleted the copy branch December 8, 2022 20:09
@benbovy benbovy mentioned this pull request Feb 10, 2023
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
plan to merge Final call for comments run-benchmark Run the ASV benchmark workflow topic-indexing topic-performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

xarray 2022.10.0 much slower then 2022.6.0
6 participants