Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Actually make the fast code path return early for Aligner.align #7222

Merged
merged 5 commits into from
Oct 28, 2022

Conversation

hmaarrfk
Copy link
Contributor

@hmaarrfk hmaarrfk commented Oct 26, 2022

In relation to my other PR.

Without this PR
image

With the early return
image

Removing the frivolous copy (does not pass tests)

image

Code for benchmark
from tqdm import tqdm
import xarray as xr
from time import perf_counter
import numpy as np

N = 1000

# Everybody is lazy loading now, so lets force modules to get instantiated
dummy_dataset = xr.Dataset()
dummy_dataset['a'] = 1
dummy_dataset['b'] = 1
del dummy_dataset

time_elapsed = np.zeros(N)
dataset = xr.Dataset()

# tqdm = iter
for i in tqdm(range(N)):
    time_start = perf_counter()
    dataset[f"var{i}"] = i
    time_end = perf_counter()
    time_elapsed[i] = time_end - time_start
    
    
# %%
from matplotlib import pyplot as plt

plt.plot(np.arange(N), time_elapsed * 1E3, label='Time to add one variable')
plt.xlabel("Number of existing variables")
plt.ylabel("Time to add a variables (ms)")
plt.ylim([0, 10])
plt.grid(True)

xref: #7221

  • Closes #xxxx
  • Tests added
  • User visible changes (including notable bug fixes) are documented in whats-new.rst
  • New functions/methods are listed in api.rst

@hmaarrfk
Copy link
Contributor Author

The reason this is a separate merge request, is that I agree that this is more contentious as a change.

However, I will argue that Aligner should really not be a class.

Using ripgrep you find that the only instances of Aligner exist internally:

xarray/core/dataset.py
2775:        aligner: alignment.Aligner,
2783:        """Callback called from ``Aligner`` to create a new reindexed Dataset."""

xarray/core/alignment.py
107:class Aligner(Generic[DataAlignable]):
114:    aligner = Aligner(*objects, **kwargs)           <------- Example
767:    aligner = Aligner(                                          <----------- Used and consumed for the method `align`
881:    aligner = Aligner(                                          <----------- Used and consumed for the method `reindex`
909:        # This check is not performed in Aligner.

xarray/core/dataarray.py
1752:        aligner: alignment.Aligner,
1760:        """Callback called from ``Aligner`` to create a new reindexed DataArray."""

@hmaarrfk
Copy link
Contributor Author

hmm ok. it seems i can't blatently avoid the copy like that.

@hmaarrfk
Copy link
Contributor Author

I think the rapid return, helps by about 40% is still pretty good.

@benbovy
Copy link
Member

benbovy commented Oct 27, 2022

Thanks @hmaarrfk!

I think the rapid return, helps by about 40% is still pretty good.

Yes definitely. I think we just forgot to add it.

However, I will argue that Aligner should really not be a class.

The reason of using a class is mainly for better code readability and also so that it is easier to refactor later. The alignment logic is really complex with lots of intermediate objects that are created and/or used at various stages. Probably using functions with some custom containers would have achieved the same goal, to be fair. This part of Xarray internals still deserves to be improved, but that would be a lot of work especially for such a critical piece of code in Xarray.

@hmaarrfk hmaarrfk mentioned this pull request Oct 27, 2022
4 tasks
@hmaarrfk
Copy link
Contributor Author

but that would be a lot of work especially for such a critical piece of code in Xarray.

Agreed. I'll take the small wins where I can :D.

Great! I think this will be a good addition with: #7223 (comment)

@dcherian dcherian added the run-benchmark Run the ASV benchmark workflow label Oct 27, 2022
@github-actions github-actions bot removed the run-benchmark Run the ASV benchmark workflow label Oct 27, 2022
@dcherian dcherian added the plan to merge Final call for comments label Oct 27, 2022
@Illviljan Illviljan added the run-benchmark Run the ASV benchmark workflow label Oct 27, 2022
@github-actions github-actions bot removed the run-benchmark Run the ASV benchmark workflow label Oct 28, 2022
@dcherian
Copy link
Contributor

       before           after         ratio
     [040816a6]       [cd40c8a0]
-         486±6μs          311±5μs     0.64  merge.DatasetAddVariable.time_variable_insertion(10)
-        19.1±2ms       12.0±0.2ms     0.63  merge.DatasetAddVariable.time_variable_insertion(1000)
-     2.18±0.02ms      1.34±0.03ms     0.62  merge.DatasetAddVariable.time_variable_insertion(100)

nice work @hmaarrfk !

@dcherian dcherian merged commit 65bfa4d into pydata:main Oct 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
plan to merge Final call for comments
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants