Implementation of calibration error metrics #394
Conversation
Hello @edwardclem! Thanks for updating this PR. There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2021-08-03 07:34:08 UTC
It looks like pep8speaks doesn't like the math sections of docstrings - is this expected?
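One plausible cause, offered here as an assumption rather than something confirmed in this thread: backslash-heavy LaTeX in docstrings trips pycodestyle's W605 invalid-escape-sequence warning unless the docstring is a raw string, e.g.:

```python
# Hypothetical illustration (not code from this PR): sequences such as "\s" in
# "\sum" or "\l" in "\lvert" are invalid escape sequences (pycodestyle W605)
# inside a normal docstring; prefixing the docstring with r avoids the warning.
def expected_calibration_error(preds, target):
    r"""Compute the expected calibration error.

    .. math::
        \text{ECE} = \sum_{i} b_i \,\lvert \mathrm{acc}_i - \mathrm{conf}_i \rvert

    where :math:`b_i` is the fraction of samples falling in bin :math:`i`.
    """
```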
@edwardclem how is it going, still draft? 🐰
Codecov Report

```diff
@@            Coverage Diff             @@
##           master     #394       +/-   ##
===========================================
+ Coverage   75.56%   96.00%   +20.44%
===========================================
  Files         124      126        +2
  Lines        4002     4083       +81
===========================================
+ Hits         3024     3920      +896
+ Misses        978      163      -815
```
@edwardclem really great job with this one. Even though I have some comments, they are all minor and we can probably get this merged pretty fast.
Note that the comments on the docstring of the functional implementation also apply to the modular implementation.
Please also add a changelog entry :]
@SkafteNicki I've fixed the DDP features and I believe I've resolved all of your comments! Let me know if there's anything else I should take a look at. All tests pass on my MacBook; I'll wait until the Ubuntu tests run before moving this out of draft.
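For context, a rough sketch of the kind of state handling that makes a binned metric DDP-safe in torchmetrics; the class name, state names, and binning details here are illustrative assumptions, not the PR's actual code:

```python
import torch
from torchmetrics import Metric
from torchmetrics.utilities.data import dim_zero_cat


class BinnedCalibrationSketch(Metric):
    """Illustrative only: per-sample confidences/accuracies are kept as list
    states with dist_reduce_fx="cat", so they concatenate across DDP processes."""

    def __init__(self, n_bins: int = 15):
        super().__init__()
        self.n_bins = n_bins
        self.add_state("confidences", default=[], dist_reduce_fx="cat")
        self.add_state("accuracies", default=[], dist_reduce_fx="cat")

    def update(self, preds: torch.Tensor, target: torch.Tensor) -> None:
        # preds: (N, C) probabilities, target: (N,) integer labels
        conf, pred = preds.max(dim=1)
        self.confidences.append(conf)
        self.accuracies.append(pred.eq(target).float())

    def compute(self) -> torch.Tensor:
        conf = dim_zero_cat(self.confidences)
        acc = dim_zero_cat(self.accuracies)
        edges = torch.linspace(0, 1, self.n_bins + 1, device=conf.device)
        ece = torch.zeros((), device=conf.device)
        for lo, hi in zip(edges[:-1], edges[1:]):
            in_bin = (conf > lo) & (conf <= hi)
            if in_bin.any():
                # weight the |accuracy - confidence| gap by the bin's sample fraction
                ece += in_bin.float().mean() * (acc[in_bin].mean() - conf[in_bin].mean()).abs()
        return ece
```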
@SkafteNicki it seems most test cases are failing...
Head branch was pushed to by a user without write access
I think I fixed it; it was just a small issue with the type signatures in the CalibrationError class.
What does this PR do?
Adds the metrics described in #218.
Implements L1, L2, and max-norm classification calibration errors as described here and here. Calibration errors are computed by binning predictions by confidence and comparing the empirical probability of correctness (i.e. accuracy) in each bin to the average confidence: in a frequentist sense, a model is "calibrated" if predictions made with 60% confidence are correct 60% of the time. Note that these probabilities are currently computed only for the top-1 prediction (as in the traditional CE definition). Some variants take all predictions into account; those are worth including in a future PR but are out of scope for this one. A rough sketch of the binning computation is shown below.
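A minimal functional sketch of the binning computation described above; the function name, argument names, and defaults are illustrative assumptions, not necessarily the signature added by this PR:

```python
import torch


def calibration_error_sketch(
    probs: torch.Tensor, target: torch.Tensor, n_bins: int = 15, norm: str = "l1"
) -> torch.Tensor:
    """Binned calibration error from (N, C) probabilities and (N,) integer labels."""
    confidences, predictions = probs.max(dim=1)   # top-1 confidence and predicted class
    accuracies = predictions.eq(target).float()   # 1.0 where the top-1 prediction is correct

    edges = torch.linspace(0, 1, n_bins + 1)
    ce = torch.zeros(())
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        prop = in_bin.float().mean()              # fraction of samples falling in this bin
        if prop > 0:
            gap = (accuracies[in_bin].mean() - confidences[in_bin].mean()).abs()
            if norm == "l1":                      # expected calibration error
                ce = ce + prop * gap
            elif norm == "l2":                    # root of the bin-weighted squared gaps
                ce = ce + prop * gap ** 2
            elif norm == "max":                   # maximum calibration error
                ce = torch.maximum(ce, gap)
    return torch.sqrt(ce) if norm == "l2" else ce
```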
Tests are written against a local copy of the calibration code in this scikit-learn pull request and should be rewritten to use the master branch once that PR is merged. The debiasing term described in Verified Uncertainty Calibration is not currently supported by this PR; I am checking with the sklearn developers in the linked PR to find out what the correct implementation should be.
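For completeness, a hedged usage example of the new modular metric; the CalibrationError class name is mentioned in this thread, but the n_bins/norm argument names and their values below are assumptions:

```python
import torch
from torchmetrics import CalibrationError

preds = torch.softmax(torch.randn(100, 10), dim=1)   # (N, C) predicted probabilities
target = torch.randint(0, 10, (100,))                # (N,) integer class labels

metric = CalibrationError(n_bins=15, norm="l1")      # "l1" corresponds to the expected calibration error
metric.update(preds, target)
print(metric.compute())
```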
NOTE: DDP is currently broken in this PR; working on a fix.