Incorrect value of AUROC when plotting a PrecisionRecallCurve metric with score=True #2405

Closed
tanhevg opened this issue Feb 26, 2024 · 2 comments · Fixed by #2437
Labels: bug / fix, help wanted, v1.2.x


tanhevg commented Feb 26, 2024

🐛 Bug

Attachments: test.csv.gz (the dataset used below) and mc_prec_rec (the plot produced by the code sample).

Unfortunately, the documentation does not give details of how AUROC is computed for a PrecisionRecallCurve. I would expect it to match the value of BinaryAveragePrecision for each class. At least for non-overlapping curves, I would expect the AUROC values to agree with a visual inspection of the curves: the outer curve should have a larger value than the inner one. This is not what the attached image shows. I would expect the value for class 1 to be close to 1 and the value for class 0 to be close to 0; in fact, this is what BinaryAveragePrecision gives me.

It would also be nice to label the axes of the MulticlassPrecisionRecallCurve plot, similarly to BinaryPrecisionRecallCurve, especially given that the axes appear to have been flipped between versions (1.2.1 vs 1.3.1).

In the documentation for the latest version (1.3.1), the plot shows a negative value for AUROC (-0.639).

To Reproduce

Compute a MulticlassPrecisionRecallCurve metric on the attached dataset and plot it:

Code sample
import pandas as pd
import torch
from torchmetrics.classification import MulticlassPrecisionRecallCurve, BinaryAveragePrecision

# Multiclass PR curve over both classes; plot with score=True to show the per-class AUC
df = pd.read_csv('test.csv.gz')
m = MulticlassPrecisionRecallCurve(2)
logits = torch.tensor(df[['logit_0', 'logit_1']].values)
y = torch.tensor(df['y'].values).to(dtype=torch.long)
m.update(logits, y)
(prec, rec, thresholds) = m.compute()
fig, ax = m.plot(score=True)

# Average precision for class 0, treating y == 0 as the positive class
m0 = BinaryAveragePrecision()
logits = torch.tensor(df['logit_0'].values)
y = torch.tensor(df['y'].values == 0).to(dtype=torch.long)
print("BinaryAveragePrecision for class 0:", m0(logits, y).item())

# Average precision for class 1, treating y == 1 as the positive class
m1 = BinaryAveragePrecision()
logits = torch.tensor(df['logit_1'].values)
y = torch.tensor(df['y'].values == 1).to(dtype=torch.long)
print("BinaryAveragePrecision for class 1:", m1(logits, y).item())

fig  # display the figure (in a notebook)

Expected behavior

The AUCs on the plot should match the printed BinaryAveragePrecision values (0.18 and 0.96), and the axes should be labelled.

The documentation should be clearer about how the AUC is computed, and should not contain obvious errors (such as negative values).

Environment

  • TorchMetrics version (and how you installed TM, e.g. conda, pip, build from source):
    • conda, v1.2.1
  • Python & PyTorch Version (e.g., 1.0):
    • Python 3.11
    • PyTorch 2.2.0
  • Any other relevant information such as OS (e.g., Linux):
    • macOS Sonoma 14.2.1, M2 Pro

Additional context


SkafteNicki commented Mar 7, 2024

Hi @tanhevg, thanks for raising this issue and sorry for the slow response from my side.
The overall problem was that, when calculating the AUC in the plot method of PrecisionRecallCurve, there was an assumption that the x input of the curve is in ascending order, but for a PR curve it is in descending order. This should be taken care of now, so the values should be similar regardless of which way you compute the area.
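
For illustration, here is a minimal sketch of that kind of direction-aware trapezoidal area computation; it is not the actual TorchMetrics implementation, and the helper name is made up:

import torch

def trapezoidal_auc(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Area under y(x) via the trapezoidal rule.

    A naive torch.trapz call returns a negative area when x is descending,
    which is exactly the situation for recall on a PR curve; flipping the
    inputs first keeps the result non-negative either way.
    """
    dx = torch.diff(x)
    if (dx <= 0).all():  # descending x, e.g. recall from a PR curve
        x, y = x.flip(0), y.flip(0)
    elif not (dx >= 0).all():
        raise ValueError("x must be monotonic for the trapezoidal rule")
    return torch.trapz(y, x)

# recall descending from 1 to 0 at constant precision 0.5 -> area 0.5, not -0.5
print(trapezoidal_auc(torch.tensor([1.0, 0.5, 0.0]), torch.tensor([0.5, 0.5, 0.5])))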

However, there will still be a difference in the values compared to BinaryAveragePrecision. The reason is that BinaryAveragePrecision uses one type of curve interpolation (as stated in the docs), whereas the PrecisionRecallCurve plot uses the trapezoidal rule, which gives a slightly different value.
For the example provided, the updated code gives:

                                 Class 0   Class 1
BinaryAveragePrecision             0.18      0.96
MulticlassPrecisionRecallCurve     0.16      0.96
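
To see where the remaining gap comes from, here is a minimal sketch comparing the two area estimates on the same (recall, precision) points; it assumes the usual step-sum formula AP = sum_n (R_n - R_{n-1}) * P_n and is not the exact TorchMetrics code:

import torch

def step_ap(precision: torch.Tensor, recall: torch.Tensor) -> torch.Tensor:
    """Step-wise (rectangular) sum AP = sum_n (R_n - R_{n-1}) * P_n."""
    # recall from the curve is descending; flip both so the differences are positive
    p, r = precision.flip(0), recall.flip(0)
    return torch.sum((r[1:] - r[:-1]) * p[1:])

def trapz_area(precision: torch.Tensor, recall: torch.Tensor) -> torch.Tensor:
    """Trapezoidal rule over the same points, as used for the plot score."""
    p, r = precision.flip(0), recall.flip(0)
    return torch.trapz(p, r)

# toy PR curve: both estimates use the same points but interpolate between them differently
precision = torch.tensor([0.5, 0.6, 1.0, 1.0])
recall = torch.tensor([1.0, 0.5, 0.25, 0.0])
print(step_ap(precision, recall))     # 0.65
print(trapz_area(precision, recall))  # 0.725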

I added some notes about this to the documentation in PR #2437, which also contains the fix described above.


tanhevg commented Mar 14, 2024

Thanks for the update, @SkafteNicki. Minor discrepancies between AUROC and BinaryAveragePrecision should not be a problem.

Does the PR also add labels to the axes? I had a quick look at the changes and I don't think it does. It is a bit confusing now, especially after the axes were swapped between v1.2.1 and v1.3.1.

Also, there are no changes to the tests in the PR, meaning nothing prevents this or similar issues from reappearing in the future.
