added calibration metric #253
Conversation
Thanks, I left some comments.
bofire/surrogates/diagnostics.py
Outdated
"""Calculates the Spearman correlation coefficient between the models absolute error
and the uncertainty - non-linear correlation.

This implementation is taken from Ax: https://github.com/aspuru-guzik-group/dionysus/blob/main/dionysus/uncertainty_metrics.py
From Ax?
Don't know. I copied it from the Fisher format; I'll just remove it.
bofire/surrogates/diagnostics.py
Outdated
"""Calculates the Pearson correlation coefficient between the models absolute error
and the uncertainty - linear correlation.

This implementation is taken from Ax: https://github.com/aspuru-guzik-group/dionysus/blob/main/dionysus/uncertainty_metrics.py
Ax?
bofire/surrogates/diagnostics.py
Outdated
"""Calculates the Kendall correlation coefficient between the models absolute error
and the uncertainty - linear correlation.

This implementation is taken from Ax: https://github.com/aspuru-guzik-group/dionysus/blob/main/dionysus/uncertainty_metrics.py
Ax?
bofire/surrogates/diagnostics.py
Outdated
standard_deviation: Optional[np.ndarray] = None,
) -> float:
"""Calculates the Negative log-likelihood of predicted data
This implementation is taken from Ax: https://github.com/aspuru-guzik-group/dionysus/blob/main/dionysus/uncertainty_metrics.py
Ax?
@@ -419,7 +431,12 @@ def test_CvResults2CrossValidationValues(cv_results):
else:
assert transformed["a"][i].standardDeviation is None
for m in metrics.columns:
assert metrics.loc[i, m] == transformed["a"][i].metrics[m]
if cv_results.results[i].standard_deviation is not None:
Why this?
Otherwise I get an `assert nan == nan` failure.
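Background on that failure: under IEEE 754 semantics, NaN never compares equal to itself, so an equality assertion between two NaN metric values always fails. A quick demonstration:

```python
import math

a = float("nan")
print(a == a)          # False: NaN != NaN by IEEE 754 rules
print(math.isnan(a))   # True: test for NaN with isnan, not ==
```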
cv = generate_cvresult(key="a", n_samples=10, include_standard_deviation=False)
assert cv.n_samples == 10
assert np.isnan(cv.get_metric(RegressionMetricsEnum.CALIBRATION))
Can you please also add tests for all the newly added metrics starting with an underscore? Maybe there were also test cases for them in the original implementation.
Please also check that the warning is raised and NaN is returned when no uncertainty is provided, for all the underscore methods you added.
Should I make a separate dictionary for UQ_metrics, like metrics?
bofire/surrogates/diagnostics.py
Outdated
np.ndarray: calibration score for each quantile.
"""
if standard_deviation is None:
warnings.warn("Calibration metric without standard deviation is not possible")
Raise a ValueError here if no standard deviation is provided; in the derived metrics, catch this error with a try/except, raise a warning, and return NaN.
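The suggested error-handling split could look roughly like this. This is a sketch with hypothetical names and a placeholder metric body, not the PR's code; only the raise-then-catch structure is the point:

```python
import warnings
import numpy as np

def _base_uq_metric(observed, predicted, standard_deviation=None):
    # Base implementation: fail loudly if uncertainty is missing.
    if standard_deviation is None:
        raise ValueError("UQ metric requires a standard deviation.")
    # Placeholder metric body; the real metric computation goes here.
    return float(np.mean(np.abs(observed - predicted) / standard_deviation))

def derived_uq_metric(observed, predicted, standard_deviation=None):
    # Derived metric: degrade gracefully to NaN with a warning.
    try:
        return _base_uq_metric(observed, predicted, standard_deviation)
    except ValueError:
        warnings.warn("No standard deviation provided, returning NaN.")
        return np.nan
```

This keeps the base functions strict (easy to test, no silent NaN propagation) while the user-facing layer stays forgiving.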
bofire/surrogates/diagnostics.py
Outdated
1.0 / (2.0 * observed.shape[0])
+ observed.shape[0] * np.log(2 * np.pi)
+ np.sum(np.log(predicted))
+ np.sum(np.square(predicted - observed) / standard_deviation)  # type: ignore
Are you sure this is correct? It should be standard_deviation to the power of two; this looks a bit wrong to me.
But it could also be that I'm overlooking something ;)
I just copied it; maybe they missed a bracket. I will double-check and correct accordingly.
I think you are right.
You are right, we should have separate dictionaries and enums. Thanks for the hint.
@@ -40,6 +40,13 @@ def permutation_importance(
RegressionMetricsEnum.MSD: -1.0,
RegressionMetricsEnum.PEARSON: 1.0,
RegressionMetricsEnum.SPEARMAN: 1.0,
RegressionMetricsEnum.PEARSON_UQ: 1.0,
Remove them here; they make no sense for permutation importance. You are right, we need a second enum and a second dictionary.
We would then iterate here only over the standard regression metrics.
@@ -43,3 +43,10 @@ class RegressionMetricsEnum(Enum):
PEARSON = "PEARSON"
SPEARMAN = "SPEARMAN"
FISHER = "FISHER"
PEARSON_UQ = "PEARSON_UQ"
Create a new enum named UQRegressionMetricsEnum and move the UQ metrics into it.
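The requested split could look roughly like this. It is only a sketch: the member names are taken from the diffs above, but the dispatch values shown are placeholders for the real `_..._UQ` callables in diagnostics.py:

```python
from enum import Enum

class UQRegressionMetricsEnum(Enum):
    PEARSON_UQ = "PEARSON_UQ"
    SPEARMAN_UQ = "SPEARMAN_UQ"
    KENDALL_UQ = "KENDALL_UQ"
    MAXIMUMCALIBRATION = "MAXIMUMCALIBRATION"
    MISCALIBRATIONAREA = "MISCALIBRATIONAREA"
    ABSOLUTEMISCALIBRATIONAREA = "ABSOLUTEMISCALIBRATIONAREA"

# A second dispatch table, parallel to the existing `metrics` dict, so that
# permutation importance and the standard CV loops touch only the plain
# regression metrics. The values here are name placeholders; in the real
# module they would be the metric functions themselves.
UQ_metrics = {
    member: member.value.lower() for member in UQRegressionMetricsEnum
}
```

Keeping two enums means type hints like `metric: RegressionMetricsEnum` stay precise, and no caller can accidentally request a UQ metric where no uncertainty is available.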
bofire/data_models/enum.py
Outdated
MAXIMUMCALIBRATION = "MAXIMUMCALIBRATION"
MISCALIBRATIONAREA = "MISCALIBRATIONAREA"
ABSOLUTEMISCALIBRATIONAREA = "ABSOLUTEMISCALIBRATIONAREA"
NLL = "NLL"
Just remove the complete NLL metric.
@@ -19,6 +19,13 @@
RegressionMetricsEnum.PEARSON: MaximizeObjective,
RegressionMetricsEnum.SPEARMAN: MaximizeObjective,
RegressionMetricsEnum.FISHER: MaximizeObjective,
RegressionMetricsEnum.PEARSON_UQ: MaximizeObjective,
This then has to be updated with the new type.
bofire/surrogates/diagnostics.py
Outdated
@@ -181,6 +453,13 @@ def _fisher_exact_test_p(
RegressionMetricsEnum.PEARSON: _pearson,
RegressionMetricsEnum.SPEARMAN: _spearman,
RegressionMetricsEnum.FISHER: _fisher_exact_test_p,
RegressionMetricsEnum.PEARSON_UQ: _pearson_UQ,
We also need to update here then, with UQRegressionMetricsEnum.
def test_cvresult_get_UQ_metric_invalid():
    cv = generate_cvresult(key="a", n_samples=10, include_standard_deviation=False)
    assert cv.n_samples == 10
    for metric in metrics.keys():
You have to iterate here only over the UQ metrics, as the warning is raised only for those.
cv = generate_cvresult(key="a", n_samples=10, include_standard_deviation=True)
assert cv.n_samples == 10
for metric in UQ_metrics.keys():
    cv.get_UQ_metric(metric=metric)
Suggested change:
-    cv.get_UQ_metric(metric=metric)
+    m = cv.get_UQ_metric(metric=metric)
+    assert type(m) == float
@jduerholt,
Added the calibration metrics from https://github.com/aspuru-guzik-group/dionysus/blob/main/dionysus/uncertainty_metrics.py