Incorrect value of AUROC when plotting a PrecisionRecallCurve metric with score=True #2405

Closed
tanhevg opened this issue Feb 26, 2024 · 2 comments · Fixed by #2437
Labels: bug / fix, help wanted, v1.2.x


tanhevg commented Feb 26, 2024

🐛 Bug

Attachments: test.csv.gz (the dataset used below) and mc_prec_rec (the plot produced by the code sample).

Unfortunately, the documentation does not give details of how AUROC is computed for a PrecisionRecallCurve. I would expect it to match the value of BinaryAveragePrecision for each class. At least for non-overlapping curves, I would expect the AUROC values to agree with a visual inspection of the curves: the outer curve should have a larger value than the inner one. This is not what the attached image shows. I would expect the value for class 1 to be close to 1 and the value for class 0 to be close to 0; in fact, this is what BinaryAveragePrecision gives me.

It would also be nice to label the axes of the MulticlassPrecisionRecallCurve plot, similarly to BinaryPrecisionRecallCurve, especially given that the axes appear to have been flipped between versions (1.2.1 vs 1.3.1).

In the documentation for the latest version (1.3.1), the plot shows a negative value for AUROC (-0.639).

To Reproduce

Compute a MulticlassPrecisionRecallCurve metric on the attached dataset and plot it:

Code sample
import pandas as pd
import torch
from torchmetrics.classification import MulticlassPrecisionRecallCurve, BinaryAveragePrecision

# Multiclass PR curve over both classes; plot with score=True to show the per-class AUC
df = pd.read_csv('test.csv.gz')
m = MulticlassPrecisionRecallCurve(2)
logits = torch.tensor(df[['logit_0', 'logit_1']].values)
y = torch.tensor(df['y'].values).to(dtype=torch.long)
m.update(logits, y)
(prec, rec, thresholds) = m.compute()
fig, ax = m.plot(score=True)

# Average precision for class 0, treating y == 0 as the positive class
m0 = BinaryAveragePrecision()
logits = torch.tensor(df['logit_0'].values)
y = torch.tensor(df['y'].values == 0).to(dtype=torch.long)
print("BinaryAveragePrecision for class 0:", m0(logits, y).item())

# Average precision for class 1, treating y == 1 as the positive class
m1 = BinaryAveragePrecision()
logits = torch.tensor(df['logit_1'].values)
y = torch.tensor(df['y'].values == 1).to(dtype=torch.long)
print("BinaryAveragePrecision for class 1:", m1(logits, y).item())

fig  # display the figure (in a notebook)

Expected behavior

The AUCs on the plot should match the printed BinaryAveragePrecision values (0.18 and 0.96), and the axes should be labelled.

The documentation should be clearer about how the AUC is computed, and should not contain obvious errors (such as negative values).

Environment

  • TorchMetrics version (and how you installed TM, e.g. conda, pip, build from source):
    • conda, v1.2.1
  • Python & PyTorch Version (e.g., 1.0):
    • Python 3.11
    • PyTorch 2.2.0
  • Any other relevant information such as OS (e.g., Linux):
    • macOS Sonoma 14.2.1, M2 Pro

Additional context


SkafteNicki commented Mar 7, 2024

Hi @tanhevg, thanks for raising this issue and sorry for the slow response from my side.
The overall problem was that, when calculating the AUC in the plot method of PrecisionRecallCurve, there was an assumption that the x input of the curve is in ascending order, but for a PR curve it is in descending order. This should be taken care of now, so the values should be similar regardless of which way you compute the area.
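
For illustration, here is a minimal sketch of that kind of direction-aware trapezoidal area computation; it is not the actual TorchMetrics implementation, and the helper name is made up:

import torch

def trapezoidal_auc(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Area under y(x) via the trapezoidal rule.

    A naive torch.trapz call returns a negative area when x is descending,
    which is exactly the situation for recall on a PR curve; flipping the
    inputs first keeps the result non-negative either way.
    """
    dx = torch.diff(x)
    if (dx <= 0).all():  # descending x, e.g. recall from a PR curve
        x, y = x.flip(0), y.flip(0)
    elif not (dx >= 0).all():
        raise ValueError("x must be monotonic for the trapezoidal rule")
    return torch.trapz(y, x)

# recall descending from 1 to 0 at constant precision 0.5 -> area 0.5, not -0.5
print(trapezoidal_auc(torch.tensor([1.0, 0.5, 0.0]), torch.tensor([0.5, 0.5, 0.5])))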

However, there will still be a difference in the values compared to BinaryAveragePrecision. The reason is that BinaryAveragePrecision uses one type of curve interpolation (as stated in the docs), whereas the PrecisionRecallCurve plot uses the trapezoidal rule, which gives a slightly different value.
For the example provided, the updated code gives:

                                 Class 0   Class 1
BinaryAveragePrecision             0.18      0.96
MulticlassPrecisionRecallCurve     0.16      0.96
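
To see where the remaining gap comes from, here is a minimal sketch comparing the two area estimates on the same (recall, precision) points; it assumes the usual step-sum formula AP = sum_n (R_n - R_{n-1}) * P_n and is not the exact TorchMetrics code:

import torch

def step_ap(precision: torch.Tensor, recall: torch.Tensor) -> torch.Tensor:
    """Step-wise (rectangular) sum AP = sum_n (R_n - R_{n-1}) * P_n."""
    # recall from the curve is descending; flip both so the differences are positive
    p, r = precision.flip(0), recall.flip(0)
    return torch.sum((r[1:] - r[:-1]) * p[1:])

def trapz_area(precision: torch.Tensor, recall: torch.Tensor) -> torch.Tensor:
    """Trapezoidal rule over the same points, as used for the plot score."""
    p, r = precision.flip(0), recall.flip(0)
    return torch.trapz(p, r)

# toy PR curve: both estimates use the same points but interpolate between them differently
precision = torch.tensor([0.5, 0.6, 1.0, 1.0])
recall = torch.tensor([1.0, 0.5, 0.25, 0.0])
print(step_ap(precision, recall))     # 0.65
print(trapz_area(precision, recall))  # 0.725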

I added some notes about this to the documentation in PR #2437, which also contains the fix described above.


tanhevg commented Mar 14, 2024

Thanks for the update, @SkafteNicki. Minor discrepancies between AUROC and BinaryAveragePrecision should not be a problem.

Does the PR also add labels to the axes? I had a quick look at the changes and I don't think it does. It is a bit confusing now, especially after the axes were swapped between v1.2.1 and v1.3.1.

Also, there are no changes to the tests in the PR, meaning nothing prevents this or similar issues from reappearing in the future.
