Implementation of calibration error metrics #394
Conversation
Hello @edwardclem! Thanks for updating this PR. There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2021-08-03 07:34:08 UTC
It looks like pep8speaks doesn't like the math sections of docstrings - is this expected?
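One plausible cause, offered here as an assumption rather than something confirmed in this thread: backslash-heavy LaTeX in docstrings trips pycodestyle's W605 invalid-escape-sequence warning unless the docstring is a raw string, e.g.:

```python
# Hypothetical illustration (not code from this PR): sequences such as "\s" in
# "\sum" or "\l" in "\lvert" are invalid escape sequences (pycodestyle W605)
# inside a normal docstring; prefixing the docstring with r avoids the warning.
def expected_calibration_error(preds, target):
    r"""Compute the expected calibration error.

    .. math::
        \text{ECE} = \sum_{i} b_i \,\lvert \mathrm{acc}_i - \mathrm{conf}_i \rvert

    where :math:`b_i` is the fraction of samples falling in bin :math:`i`.
    """
```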
@edwardclem how is it going, still draft? 🐰
Codecov Report

```diff
@@            Coverage Diff             @@
##           master     #394       +/-   ##
===========================================
+ Coverage   75.56%   96.00%   +20.44%
===========================================
  Files         124      126        +2
  Lines        4002     4083       +81
===========================================
+ Hits         3024     3920      +896
+ Misses        978      163      -815
```
@edwardclem really great job with this one. Even though I have some comments, they are all minor and we can probably get this merged pretty fast.
Note that the comments on the docstring of the functional implementation also apply to the modular implementation.
Please also add a changelog entry :]
@SkafteNicki I've fixed the DDP features and I believe I've resolved all of your comments! Let me know if there's anything else I should take a look at. All tests pass on my MacBook; I'll wait until the Ubuntu tests run before moving this out of draft.
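For context, a rough sketch of the kind of state handling that makes a binned metric DDP-safe in torchmetrics; the class name, state names, and binning details here are illustrative assumptions, not the PR's actual code:

```python
import torch
from torchmetrics import Metric
from torchmetrics.utilities.data import dim_zero_cat


class BinnedCalibrationSketch(Metric):
    """Illustrative only: per-sample confidences/accuracies are kept as list
    states with dist_reduce_fx="cat", so they concatenate across DDP processes."""

    def __init__(self, n_bins: int = 15):
        super().__init__()
        self.n_bins = n_bins
        self.add_state("confidences", default=[], dist_reduce_fx="cat")
        self.add_state("accuracies", default=[], dist_reduce_fx="cat")

    def update(self, preds: torch.Tensor, target: torch.Tensor) -> None:
        # preds: (N, C) probabilities, target: (N,) integer labels
        conf, pred = preds.max(dim=1)
        self.confidences.append(conf)
        self.accuracies.append(pred.eq(target).float())

    def compute(self) -> torch.Tensor:
        conf = dim_zero_cat(self.confidences)
        acc = dim_zero_cat(self.accuracies)
        edges = torch.linspace(0, 1, self.n_bins + 1, device=conf.device)
        ece = torch.zeros((), device=conf.device)
        for lo, hi in zip(edges[:-1], edges[1:]):
            in_bin = (conf > lo) & (conf <= hi)
            if in_bin.any():
                # weight the |accuracy - confidence| gap by the bin's sample fraction
                ece += in_bin.float().mean() * (acc[in_bin].mean() - conf[in_bin].mean()).abs()
        return ece
```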
@SkafteNicki it seems most test cases are failing...
Head branch was pushed to by a user without write access
I think I fixed it; it was just a small issue with the type signatures in the CalibrationError class.
What does this PR do?
Adds the metrics described in #218.
Implements L1, L2, and max-norm classification calibration errors as described here and here. Calibration errors are computed by binning predictions by confidence and comparing the empirical probability of correctness (i.e. accuracy) in each bin to the average confidence: in a frequentist sense, a model is "calibrated" if predictions made with 60% confidence are correct 60% of the time. Note that these probabilities are currently computed only for the top-1 prediction (as in the traditional CE definition). Some variants take all predictions into account; those are worth including in a future PR but are out of scope for this one. A rough sketch of the binning computation is shown below.
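A minimal functional sketch of the binning computation described above; the function name, argument names, and defaults are illustrative assumptions, not necessarily the signature added by this PR:

```python
import torch


def calibration_error_sketch(
    probs: torch.Tensor, target: torch.Tensor, n_bins: int = 15, norm: str = "l1"
) -> torch.Tensor:
    """Binned calibration error from (N, C) probabilities and (N,) integer labels."""
    confidences, predictions = probs.max(dim=1)   # top-1 confidence and predicted class
    accuracies = predictions.eq(target).float()   # 1.0 where the top-1 prediction is correct

    edges = torch.linspace(0, 1, n_bins + 1)
    ce = torch.zeros(())
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        prop = in_bin.float().mean()              # fraction of samples falling in this bin
        if prop > 0:
            gap = (accuracies[in_bin].mean() - confidences[in_bin].mean()).abs()
            if norm == "l1":                      # expected calibration error
                ce = ce + prop * gap
            elif norm == "l2":                    # root of the bin-weighted squared gaps
                ce = ce + prop * gap ** 2
            elif norm == "max":                   # maximum calibration error
                ce = torch.maximum(ce, gap)
    return torch.sqrt(ce) if norm == "l2" else ce
```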
Tests are written against a local copy of the calibration code in this scikit-learn pull request and should be rewritten to use the master branch once that PR is merged. The debiasing term described in Verified Uncertainty Calibration is not currently supported by this PR; I am checking with the sklearn developers in the linked PR to find out what the correct implementation should be.
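For completeness, a hedged usage example of the new modular metric; the CalibrationError class name is mentioned in this thread, but the n_bins/norm argument names and their values below are assumptions:

```python
import torch
from torchmetrics import CalibrationError

preds = torch.softmax(torch.randn(100, 10), dim=1)   # (N, C) predicted probabilities
target = torch.randint(0, 10, (100,))                # (N,) integer class labels

metric = CalibrationError(n_bins=15, norm="l1")      # "l1" corresponds to the expected calibration error
metric.update(preds, target)
print(metric.compute())
```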
NOTE: DDP is currently broken in this PR; working on a fix.