Skip dask rolling #9909

Merged (4 commits) on Dec 18, 2024

@Illviljan (Contributor) commented on Dec 18, 2024:

Skip the rolling tests that use dask so that CI becomes usable again.
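As a hypothetical sketch (not necessarily how this PR does it), one common way to skip only the dask case of a parametrized pytest test is to wrap that parameter in `pytest.param` with a `skip` mark, so the NumPy case keeps running. `BACKENDS` and `test_rolling` are illustrative names, not xarray's.

```python
import pytest

# Hypothetical parametrization: pytest.param attaches a skip mark to the
# "dask" case only, so the "numpy" case still runs in CI.
BACKENDS = [
    "numpy",
    pytest.param(
        "dask",
        marks=pytest.mark.skip(reason="rolling with dask is too slow for CI"),
    ),
]


@pytest.mark.parametrize("backend", BACKENDS)
def test_rolling(backend):  # hypothetical test name
    # Placeholder body: a real test would build a Dataset (chunked when
    # backend == "dask") and compare ds.rolling(...).reduce(np.nansum) results.
    assert backend in {"numpy", "dask"}
```

Running `pytest -rs` lists skipped tests together with their reasons, which makes the dask cases easy to find again when re-enabling them.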

Reproducer for the test with terrible performance (feel free to fix it and reactivate the test):

import numpy as np
import pandas as pd

import xarray as xr


def randn(shape, frac_nan=None, chunks=None, seed=0):
    rng = np.random.default_rng(seed)
    if chunks is None:
        x = rng.standard_normal(shape)
    else:
        import dask.array as da

        rng = da.random.default_rng(seed)
        x = rng.standard_normal(shape, chunks=chunks)

    if frac_nan is not None:
        # Assumes a NumPy array (`.flat` assignment); this script never passes `chunks`.
        inds = rng.choice(range(x.size), int(x.size * frac_nan))
        x.flat[inds] = np.nan

    return x


nx = 3000
long_nx = 30000
ny = 200
nt = 1000
window = 20

randn_xy = randn((nx, ny), frac_nan=0.1)
randn_xt = randn((nx, nt))
randn_t = randn((nt,))
randn_long = randn((long_nx,), frac_nan=0.1)


ds = xr.Dataset(
    {
        "var1": (("x", "y"), randn_xy),
        "var2": (("x", "t"), randn_xt),
        "var3": (("t",), randn_t),
    },
    coords={
        "x": np.arange(nx),
        "y": np.linspace(0, 1, ny),
        "t": pd.date_range("1970-01-01", periods=nt, freq="D"),
        "x_coords": ("x", np.linspace(1.1, 2.1, nx)),
    },
)
window_ = 20
min_periods = 5
use_bottleneck = False  # not applied below; could be set with xr.set_options(use_bottleneck=...)
%timeit ds.rolling(x=window_, center=False, min_periods=min_periods).reduce(np.nansum).load()
# 601 ms ± 43.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


ds = ds.chunk({"x": 100, "y": 50, "t": 50})
%timeit ds.rolling(x=window_, center=False, min_periods=min_periods).reduce(np.nansum).load()
# 1min 9s ± 1.31 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
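For anyone picking this up, the operation being timed is a trailing (`center=False`) rolling sum that ignores NaNs and returns NaN wherever fewer than `min_periods` non-NaN values fall in the window. Below is a NumPy-only sketch of those semantics on a 1-D array, assuming `min_periods` counts non-NaN values per window. The hypothetical `rolling_nansum_windows` materializes every window (roughly O(n·window) work), while the hypothetical `rolling_nansum_cumsum` gets the same result from cumulative sums in O(n). Neither is xarray's implementation; they are only reference points for checking a faster dask path.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def rolling_nansum_windows(x, window, min_periods):
    """Trailing (center=False) rolling nansum that materializes every window."""
    # Left-pad with NaN so output position i sees x[i - window + 1 : i + 1].
    padded = np.concatenate([np.full(window - 1, np.nan), x])
    windows = sliding_window_view(padded, window)  # shape (len(x), window), a view
    counts = np.count_nonzero(~np.isnan(windows), axis=-1)  # non-NaN values per window
    sums = np.nansum(windows, axis=-1)
    # Positions with too few valid values become NaN, as min_periods requires.
    return np.where(counts >= min_periods, sums, np.nan)


def rolling_nansum_cumsum(x, window, min_periods):
    """Same result in O(n): differences of cumulative sums, no window axis."""
    valid = ~np.isnan(x)
    csum = np.concatenate([[0.0], np.cumsum(np.where(valid, x, 0.0))])
    ccount = np.concatenate([[0], np.cumsum(valid)])
    end = np.arange(1, x.size + 1)       # exclusive end of each window
    start = np.maximum(end - window, 0)  # inclusive start, clipped at 0
    sums = csum[end] - csum[start]
    counts = ccount[end] - ccount[start]
    return np.where(counts >= min_periods, sums, np.nan)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(30000)
    x[rng.choice(x.size, 3000)] = np.nan  # up to 10% NaN, similar to randn_long above
    a = rolling_nansum_windows(x, window=20, min_periods=5)
    b = rolling_nansum_cumsum(x, window=20, min_periods=5)
    print(np.allclose(a, b, equal_nan=True))
```

The cumulative-sum variant can accumulate floating-point error on very long series, so compare the two with a tolerance (`np.allclose(..., equal_nan=True)`) rather than exact equality.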

@Illviljan added the run-benchmark label ("Run the ASV benchmark workflow") on Dec 18, 2024.
@Illviljan merged commit a90fff9 into pydata:main on Dec 18, 2024.
29 checks passed