Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add glob to create_ensembles and mention flox in doc. #1081

Merged
merged 12 commits into from
May 20, 2022
Merged

Conversation

aulemahal
Copy link
Collaborator

@aulemahal aulemahal commented May 16, 2022

Pull Request Checklist:

  • This PR addresses an already opened issue (for bug fixes / features)
  • Tests for the changes have been added (for bug fixes / features)
    • (If applicable) Documentation has been added / updated (for bug fixes / features)
  • HISTORY.rst has been updated (with summary of main changes)
    • Link to issue (:issue:number) and pull request (:pull:number) has been added
  • bumpversion patch has been called on this branch
  • The relevant author information has been added to .zenodo.json

What kind of change does this PR introduce?

  • This adds the possibility to pass a glob string to create_ensemble instead of a list of path.
  • Added flox to the upstream test suite.
  • Added mentions of flox in the installation instructions.
  • Removed deprecated extra deps example in installation instructions and move clisops to the extra section.

Does this PR introduce a breaking change?

No.

Other information:

This is a very minor change, more like a reason to trigger the github actions and run all tests, including the one against xarray's main branch.

@aulemahal aulemahal requested a review from tlogan2000 May 16, 2022 15:42
@aulemahal aulemahal changed the title Add glob to create_ensembles Add glob to create_ensembles and mention flox in doc. May 16, 2022
@aulemahal aulemahal requested a review from Zeitsperre May 16, 2022 16:00
@aulemahal
Copy link
Collaborator Author

aulemahal commented May 16, 2022

Oh! Well @dcherian, it looks like flox and the main of xarray breaks xclim! It seems the error are related to a different treatment of nans and errors in the results precision. Could the new algorithm modify the results slightly?

I'll test in more details on my side.

@coveralls
Copy link

coveralls commented May 16, 2022

Coverage Status

Coverage decreased (-0.005%) to 91.656% when pulling daa1c8c on ensemble_glob into b08fa84 on master.

@dcherian
Copy link

Yeah I had to make some tests a little looser w.r.t precision.

The other failures look real :( NaNs in the wrong place; grouping by cftime;

@dcherian
Copy link

I noticed in some places you're using np.testing to compare xarray objects. Should be nicer to use xarray.testing instead (it will compare coordinate locations too)

    def test_season(self, tasmin_series, calendar):
        ts = tasmin_series(np.zeros(360))
        ts = convert_calendar(ts, calendar, missing=0, align_on="date")
    
        miss = missing.missing_any(ts, freq="YS", season="MAM")
>       np.testing.assert_equal(miss, [False])
E       AssertionError: 
E       Items are not equal:
E        ACTUAL: <xarray.DataArray (time: 1)>
E       array([ True])
E       Coordinates:
E         * time     (time) datetime64[ns] 2001-01-01
E        DESIRED: [False]

docs/installation.rst Show resolved Hide resolved
docs/installation.rst Outdated Show resolved Hide resolved
docs/installation.rst Outdated Show resolved Hide resolved
HISTORY.rst Outdated Show resolved Hide resolved
Co-authored-by: Trevor James Smith <10819524+Zeitsperre@users.noreply.github.com>
@aulemahal
Copy link
Collaborator Author

Some errors are caused by the resampling of cftime arrays, I opened an issue : pydata/xarray#6613, But for now, we can simply call those with a xr.set_options(use_flox=False) context.

@aulemahal
Copy link
Collaborator Author

aulemahal commented May 16, 2022

Most other errors seem to be because of pydata/xarray#6615:
Flox uses np.add for the sumoperation. But np.add acts as a OR on boolean values, while np.sum implicitly converts booleans to integers.

Xclim extensively resamples and sums over boolean values. The older xclim code usually explicitly casted those boolean arrays to int by multiplying by 1. But the newer code assumes the implicit behaviour from numpy/xarray and these parts are now broken.

xclim/core/missing.py Outdated Show resolved Hide resolved
xclim/ensembles/_base.py Outdated Show resolved Hide resolved
@aulemahal
Copy link
Collaborator Author

@tlogan2000 This is ready for review.

@aulemahal aulemahal requested a review from Zeitsperre May 18, 2022 16:19
Copy link
Collaborator

@tlogan2000 tlogan2000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me! @dcherian thank-you for this addition to xarray I think xclim peformance will benefit a lot from this change.

@aulemahal @dcherian just so it's clear in my mind : what is expected behavior going forward on something like (xr.DataArray() > thresh).resample(time='YS').sum(dim='time') ? Will the booleans be cast to ints in the end with flox?

@dcherian
Copy link

Will the booleans be cast to ints in the end with flox?

Yes it will promote bool to int before aggregating. (xarray-contrib/flox@01b1afe). You could require flox >=0.5.3 to be extra sure.

@Zeitsperre Zeitsperre merged commit 1d8014c into master May 20, 2022
@Zeitsperre Zeitsperre deleted the ensemble_glob branch May 20, 2022 13:40
@Zeitsperre
Copy link
Collaborator

@dcherian Thanks again for your work and considerations!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

run test against xarray main
5 participants