adapt RPS #277

aaronspring · 2021-02-26T09:37:26Z

Description

fix RPS formula in math
remove wrong dirty fix limit RPS to 1
allow more flexible category_edges: this was a request from @judithberner asking how to work with climatological terciles, which are lon, lat dependent
recheck fair

gist: https://gist.github.com/aaronspring/3c3db6b7d5f39c08643e818b0964ee6c

Closes #275, #266

Type of change

Please delete options that are not relevant.

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Performance (if you touched existing code run asv to detect performance changes)
refactoring

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. This could point to a cell in the updated notebooks. Or a snippet of code with accompanying figures here.

Checklist (while developing)

I have added docstrings to all new functions.
I have commented my code, particularly in hard-to-understand areas
Tests added for pytest, if necessary.

Pre-Merge Checklist (final steps)

I have rebased onto master or develop (wherever I am merging) and dealt with any conflicts.
I have squashed commits to a reasonable amount, and force-pushed the squashed commits.

References

Please add any references to manuscripts, textbooks, etc.

xskillscore/core/probabilistic.py

aaronspring · 2021-02-27T09:02:52Z

Here in https://www.cawcr.gov.au/projects/verification/verif_web_page.html#RPS RPS is defined as the mean over all categories. In wilks 2006 it’s the sum over all categories. Have to add a note to clearly specify which one we use.

xskillscore/core/probabilistic.py

judithberner · 2021-02-28T23:37:33Z

I believe the formula on the webpage is wrong.
E.g. https://journals.ametsoc.org/view/journals/mwre/135/1/mwr3280.1.xml
is highly cited and define the RPS as sum over the categories. It discuss a correction for the use of finite ensemble members which might be interesting in the future.
The original citation is Epstein (1969), J. Appl. Meteor. and can be found as scan here:
https://www.jstor.org/stable/26174707?seq=1

aaronspring · 2021-03-06T20:20:22Z

In the previous version of rps, no member dimension was needed.

requirements.txt

dougiesquire

Thanks again for doing this @aaronspring! And thanks for the motivation @judithberner.

I agree that there seems to be a discrepancy across definitions of RPS. Most uses and implementations do seem to use the sum (though some don't, for example, the verification package in R takes the mean: http://finzi.psych.upenn.edu/R/library/verification/html/rps.html). I think this change makes us consistent with the majority of the literature and with the other rps-type functions in xskillscore. Also the ability to have multi-dimensional edges is really great!

Some comments below from just reading the code. Going to play about with actually using the function now and may add some additional comments.

xskillscore/core/contingency.py

xskillscore/core/probabilistic.py

aaronspring · 2021-03-08T11:16:27Z

@dougiesquire thanks for the review.

what about we add -np.inf, np.inf as the most-left and most-right category_edge by internally default? then for terciles, a user would use need to pass [.33,.66], i.e. the inner category edges but not the outer edges. This would work for probabilities and absolute values.

aaronspring · 2021-03-08T15:52:06Z

I think we need to be explicit about this. If I understand correctly, the current explanation about being similar to np.histogram is not correct. What about:
"Bins include the left edge and are defined to always include the full range of the observations and forecasts distributions. That is, the left-most bin contains data between -infinity and the first element of category_edges, and the right-most bin contains data between the last element of category edges and +infinity."

The new way yields identical results to the histogram algorithm before. The only difference is how the upper edge is treated. xhistogram excludes members which are out of category_edges, the new version doesnt, which is why I added the tests that category_edges cover all observations and forecasts.

I see a few ways how to make this more straightforward:

category_edges defines only the edges in between -np.inf and np.inf, add docstring accordingly
category_edges have to cover observations and forecasts as is now, therefore have check functions

dougiesquire · 2021-03-08T23:01:23Z

The new way yields identical results to the histogram algorithm before. The only difference is how the upper edge is treated. xhistogram excludes members which are out of category_edges, the new version doesnt, which is why I added the tests that category_edges cover all observations and forecasts.

I see a few ways how to make this more straightforward:

category_edges defines only the edges in between -np.inf and np.inf, add docstring accordingly

category_edges have to cover observations and forecasts as is now, therefore have check functions

Doesn't the current approach already implicitly cover -inf to inf? Because the binning is done with <, the first bin includes everything less than the first bin edge (ie everything between -inf and category_edges[0]). Additionally as you mention above, there is an effective right-most edge at inf. This is actually a really nice feature of your current approach. Does that make sense or am I misunderstanding?

So we could just leave as is and be clear in the docstring that this is the case. Alternatively, if you'd prefer to explicitly add -np.inf and np.inf, I think this is also a good solution

aaronspring · 2021-03-08T23:13:08Z

Hm. So for now I throw away the first Fc category edges bin because this corresponds to smaller than the first category_edges bin.

The question is what to expect the user to put in as argument. As category_edges has to span the full distribution, I think going from - inf to +inf is anyways required. I tend to just let the user add the edges in between. Will be an easier interface. And the topmost edge is not needed for rps calculation anyways, as Fc and Oc have to be 1 and hence cancel.

xskillscore/core/probabilistic.py

dougiesquire · 2021-03-08T23:18:42Z

Hm. So for now I throw away the first Fc category edges bin because this corresponds to smaller than the first category_edges bin.

The question is what to expect the user to put in as argument. As category_edges has to span the full distribution, I think going from - inf to +inf is anyways required. I tend to just let the user add the edges in between. Will be an easier interface. And the topmost edge is not needed for rps calculation anyways, as Fc and Oc have to be 1 and hence cancel.

Yes sorry, my above comment is only true if we don't throw away the first category_edge. Then, I think the user could input, for example, [1/3, 2/3] for terciles.

dougiesquire · 2021-03-08T23:46:27Z

Something like this for the docstring:
"Edges (left-edge inclusive) of the bins used to calculate the cumulative density function (cdf). Note that here the bins are defined from category_edges so as to always include the full range of observations and forecasts data. Effectively, negative infinity is appended to the left side of category_edges, and positive infinity is appended to the right side. Thus, N category edges produces N+1 bins. For example, specifying category_edges = [0,1] will compute the cdfs for bins [-inf, 0), [0, 1) and [1, inf)"

dougiesquire · 2021-03-08T23:55:52Z

Something like this for the docstring:
"Edges (left-edge inclusive) of the bins used to calculate the cumulative density function (cdf). Note that here the bins are defined from category_edges so as to always include the full range of observations and forecasts data. Effectively, negative infinity is appended to the left side of category_edges, and positive infinity is appended to the right side. Thus, N category edges produces N+1 bins. For example, specifying category_edges = [0,1] will compute the cdfs for bins [-inf, 0), [0, 1) and [1, inf)"

Sorry, the last sentence here is misleading. The cdf is really computed for [-inf, 0), [-inf, 1) and [-inf, inf). Maybe change this or remove the last sentence. Also, it's probably more important to emphasise that the edges are right-edge exclusive rather than left-edge inclusive (given that the left edge is always effectively -inf).

dougiesquire

Looks great @aaronspring. One more very minor question/comment.

dougiesquire · 2021-03-10T21:17:53Z

xskillscore/core/probabilistic.py

-        forecasts_category_edge     <U38 '[0.0, 0.33), [0.33, 0.66), [0.66, 1.0]'
-        observations_category_edge  <U38 '[0.0, 0.33), [0.33, 0.66), [0.66, 1.0]'
+        forecasts_category_edge     <U38 '[-np.inf, 0.33), [0.33, 0.66), [0.66, np.inf]'
+        observations_category_edge  <U38 '[-np.inf, 0.33), [0.33, 0.66), [0.66, np.inf]'


I'm wondering now if these might be confusing. As in the docstring, the cdf bins are actually all bounded on the left by -np.inf, but these are correct if one is thinking about the pdf. I'm not sure what's clearest for the user:

'[-np.inf, 0.33), [0.33, 0.66), [0.66, np.inf]'; or

'[-np.inf, 0.33), [-np.inf, 0.66), [-np.inf, np.inf]'

Happy for you to decide what you think is the clearest, but we should probably be consistent between the docstring and _assign_rps_category_bounds

Dos the question is whether we think in cdf or pdf? Cumulative bins or single size bins? Although rps is computed based on cdfs, I would still call it category_edges and therefore use the first choice of category edges as coords.

Sounds good, but then we should maybe also change the docstring to be consistent. That is, back to "For example, specifying category_edges = [0,1] will compute the rps for bins [-inf, 0), [0, 1) and [1, inf)"

Then I happy to merge, thanks!

CHANGELOG.rst

fix rps formula, rm clip, rm limit tests, allow many category_edges

3fff094

aaronspring added bug Something isn't working feature request labels Feb 26, 2021

aaronspring self-assigned this Feb 26, 2021

AS added 5 commits February 26, 2021 15:35

add tests

70e1e9b

add tests

072f50f

rtd

00d24b2

rtd

9eea7c3

rtd

ddd51ca

aaronspring marked this pull request as ready for review February 26, 2021 15:58

aaronspring requested a review from dougiesquire February 26, 2021 16:00

aaronspring commented Feb 26, 2021

View reviewed changes

xskillscore/core/probabilistic.py Outdated Show resolved Hide resolved

AS added 3 commits February 26, 2021 17:04

rtd

e0b177a

rtd

eca1d7e

test category_edges np.array or xr.DataArray same results

09b1e86

aaronspring requested a review from raybellwaves February 26, 2021 16:37

aaronspring mentioned this pull request Feb 27, 2021

0.0.19 release? #238

Closed

4 tasks

raybellwaves reviewed Feb 28, 2021

View reviewed changes

xskillscore/core/probabilistic.py Outdated Show resolved Hide resolved

raybellwaves reviewed Feb 28, 2021

View reviewed changes

xskillscore/core/probabilistic.py Outdated Show resolved Hide resolved

raybellwaves reviewed Feb 28, 2021

View reviewed changes

xskillscore/core/probabilistic.py Outdated Show resolved Hide resolved

AS added 3 commits February 28, 2021 14:32

mask all nans

51cb1b0

add Weigel ref

35396ed

move helper functions out of rps

724853d

aaronspring removed the request for review from dougiesquire February 28, 2021 17:04

raybellwaves mentioned this pull request Feb 28, 2021

Check that a and b are xarray objects #29

Open

aaronspring mentioned this pull request Feb 28, 2021

multi-dimensional category_edges for contingency, reliability, etc`` #278

Open

AS added 2 commits March 1, 2021 15:06

working version

0e62ec5

refactor rps_xhist to tests

b2352b2

aaronspring added 2 commits March 6, 2021 21:16

Update probabilistic.py

022bbdf

Update probabilistic.py

b875ddf

aaronspring commented Mar 6, 2021

View reviewed changes

requirements.txt Outdated Show resolved Hide resolved

dougiesquire reviewed Mar 7, 2021

View reviewed changes

aaronspring mentioned this pull request Mar 8, 2021

contribution guide judithberner/climpred_CESM1_S2S#10

Merged

7 tasks

aaronspring commented Mar 8, 2021

View reviewed changes

xskillscore/core/probabilistic.py Outdated Show resolved Hide resolved

aaronspring commented Mar 8, 2021

View reviewed changes

xskillscore/core/probabilistic.py Outdated Show resolved Hide resolved

AS added 2 commits March 9, 2021 14:58

set +/- np.inf as category label, less checks

3dd943d

quick-start rps now equals brier

86176cc

aaronspring mentioned this pull request Mar 9, 2021

unpin dask #280

Closed

aaronspring requested a review from dougiesquire March 9, 2021 14:39

dougiesquire approved these changes Mar 10, 2021

View reviewed changes

aaronspring commented Mar 10, 2021

View reviewed changes

CHANGELOG.rst Outdated Show resolved Hide resolved

aaronspring added 7 commits March 10, 2021 22:40

Update CHANGELOG.rst

20610bb

Update probabilistic.py

1b3e36b

Update contingency.py

15923cc

Update requirements.txt

6991802

Update requirements.txt

904a0f3

Update probabilistic.py

0267df1

Update CHANGELOG.rst

efffb79

aaronspring merged commit 493f9af into xarray-contrib:master Mar 10, 2021

aaronspring deleted the AS_adapt_RPS branch March 10, 2021 22:20

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

adapt RPS #277

adapt RPS #277

aaronspring commented Feb 26, 2021 •

edited

Loading

aaronspring commented Feb 27, 2021

judithberner commented Feb 28, 2021

aaronspring commented Mar 6, 2021

dougiesquire left a comment

aaronspring commented Mar 8, 2021

aaronspring commented Mar 8, 2021

dougiesquire commented Mar 8, 2021

aaronspring commented Mar 8, 2021

dougiesquire commented Mar 8, 2021

dougiesquire commented Mar 8, 2021

dougiesquire commented Mar 8, 2021

dougiesquire left a comment

dougiesquire Mar 10, 2021

aaronspring Mar 10, 2021

dougiesquire Mar 10, 2021

adapt RPS #277

adapt RPS #277

Conversation

aaronspring commented Feb 26, 2021 • edited Loading

Description

Type of change

How Has This Been Tested?

Checklist (while developing)

Pre-Merge Checklist (final steps)

References

aaronspring commented Feb 27, 2021

judithberner commented Feb 28, 2021

aaronspring commented Mar 6, 2021

dougiesquire left a comment

Choose a reason for hiding this comment

aaronspring commented Mar 8, 2021

aaronspring commented Mar 8, 2021

dougiesquire commented Mar 8, 2021

aaronspring commented Mar 8, 2021

dougiesquire commented Mar 8, 2021

dougiesquire commented Mar 8, 2021

dougiesquire commented Mar 8, 2021

dougiesquire left a comment

Choose a reason for hiding this comment

dougiesquire Mar 10, 2021

Choose a reason for hiding this comment

aaronspring Mar 10, 2021

Choose a reason for hiding this comment

dougiesquire Mar 10, 2021

Choose a reason for hiding this comment

aaronspring commented Feb 26, 2021 •

edited

Loading