MAP metric, fix metric for CUDA execution #673

tkupek · 2021-12-10T12:17:58Z

What does this PR do?

Fixes an issue where the new MAP implementation cannot be executed on CUDA devices.
The tensors have to be initialized/moved to the correct CUDA device.
Fixes #671
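
A minimal sketch of the pattern this fix relies on, assuming a metric that keeps its state in tensors: any tensor created during the computation is allocated on the device of the existing state instead of the default CPU device. The helper names below are hypothetical and are not taken from torchmetrics/detection/map.py.

```python
import torch
from torch import Tensor


def placeholder_like(state: Tensor) -> Tensor:
    """Return a ``[-1.]`` placeholder allocated on the same device as ``state``."""
    # Passing device= allocates directly on the target device, whereas a bare
    # Tensor([-1]) lands on the default device (normally the CPU).
    return torch.tensor([-1.0], device=state.device)


def empty_matches(num_thresholds: int, boxes: Tensor) -> Tensor:
    """Boolean match matrix with one row per threshold and one column per box."""
    return torch.zeros((num_thresholds, boxes.shape[0]), dtype=torch.bool, device=boxes.device)


# Falls back to the CPU so the sketch also runs on machines without a GPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
boxes = torch.zeros((3, 4), device=device)
placeholder = placeholder_like(boxes)
matches = empty_matches(2, boxes)
```

Deriving the device from an existing tensor keeps the code device-agnostic, so the same path works on CPU, a single GPU, or under DDP.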

Before submitting

Was this discussed/approved via a GitHub issue? (no need for typos and docs improvements)
Did you read the contributor guideline, Pull Request section?
Did you make sure to update the docs?
Did you write any new necessary tests?

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

Did you have fun?

🎉

codecov · 2021-12-10T12:22:02Z

Codecov Report

Merging #673 (b0a798e) into master (01f88fe) will increase coverage by 0%.
The diff coverage is 100%.

@@          Coverage Diff          @@
##           master   #673   +/-   ##
=====================================
  Coverage      95%    95%           
=====================================
  Files         166    166           
  Lines        6377   6379    +2     
=====================================
+ Hits         6070   6074    +4     
+ Misses        307    305    -2

justusschock

Why wasn't this caught by our tests? We do have CUDA tests.

tkupek · 2021-12-10T12:30:36Z

@justusschock Good question. I don't have a special setup. Just DDP mode with a single GPU.

justusschock · 2021-12-10T13:57:58Z

@tkupek could you try to include the necessary changes to the tests in this PR?

tkupek · 2021-12-10T14:01:50Z

We already discussed this in Slack. @twsl, you found the issue with the tests, didn't you?

torchmetrics/detection/map.py

Bunoviske · 2021-12-12T17:43:28Z

When executing MAP.compute() on CUDA device, my error was different from #671. I got the following with torchmetrics==0.6.1 and torch==1.10.0:

/usr/local/lib/python3.7/dist-packages/torchmetrics/metric.py in wrapped_func(*args, **kwargs)
    370                 dist_sync_fn=self.dist_sync_fn, should_sync=self._to_sync, should_unsync=self._should_unsync
    371             ):
--> 372                 self._computed = compute(*args, **kwargs)
    373 
    374             return self._computed

/usr/local/lib/python3.7/dist-packages/torchmetrics/detection/map.py in compute(self)
    666             - mar_100_per_class: ``torch.Tensor`` (-1 if class metrics are disabled)
    667         """
--> 668         overall, map, mar = self._calculate(self._get_classes())
    669 
    670         map_per_class_values: Tensor = Tensor([-1])

/usr/local/lib/python3.7/dist-packages/torchmetrics/detection/map.py in _calculate(self, class_ids)
    513         eval_imgs = [
    514             self._evaluate_image(img_id, class_id, area, max_detections, ious)
--> 515             for class_id in class_ids
    516             for area in area_ranges
    517             for img_id in img_ids

/usr/local/lib/python3.7/dist-packages/torchmetrics/detection/map.py in <listcomp>(.0)
    515             for class_id in class_ids
    516             for area in area_ranges
--> 517             for img_id in img_ids
    518         ]
    519 

/usr/local/lib/python3.7/dist-packages/torchmetrics/detection/map.py in _evaluate_image(self, id, class_id, area_range, max_det, ious)
    372 
    373         # sort dt highest score first, sort gt ignore last
--> 374         ignore_area_sorted, gtind = torch.sort(ignore_area)
    375         gt = gt[gtind]
    376         scores = self.detection_scores[id]

RuntimeError: Sort currently does not support bool dtype on CUDA.
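
For readers who hit the same RuntimeError, one common workaround is to sort an integer copy of the boolean mask and cast the result back afterwards. The sketch below assumes that approach; it is not necessarily the change that was made in the library, and the variable names only mirror the traceback above.

```python
import torch

# Runs on the GPU when one is available, otherwise on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
ignore_area = torch.tensor([True, False, True, False], device=device)

# Sort a uint8 copy instead of the bool tensor, then restore the bool dtype.
sorted_as_int, gtind = torch.sort(ignore_area.to(torch.uint8))
ignore_area_sorted = sorted_as_int.to(torch.bool)

# gtind can be used to reorder other per-ground-truth tensors in the same way.
reordered = ignore_area[gtind]
```

The cast is cheap because the mask has one entry per ground-truth box, and the returned indices keep the same meaning as for the bool sort.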

tkupek · 2021-12-12T22:00:16Z

@Bunoviske I'm pretty sure this one was fixed in a previous PR. If you install the library from the main branch, the error above should show up.

Borda · 2021-12-12T23:47:15Z

pretty sure this one was fixed in a previous PR. If you install the lib from main branch the error above should show up.

You can simply install the upcoming bugfix release from its branch:

pip install https://github.com/PyTorchLightning/metrics/archive/refs/heads/release/0.6.x.zip

Borda · 2021-12-13T14:54:27Z

@tkupek could you please add a test for this case so we can make a quick bug-fix release? 🐰
cc: @SkafteNicki

tkupek · 2021-12-13T15:00:21Z

Sorry, I don't know why the CUDA test is not working.

twsl · 2021-12-13T16:25:55Z

I think the issue is that in https://github.com/PyTorchLightning/metrics/blob/d071eb2f1245112c599b7fe0d165b8ac55663083/tests/helpers/testers.py#L168 the passed predictions don't satisfy the condition, because a list of lists is passed. Therefore the tensors are never moved to the device.

SkafteNicki · 2021-12-15T07:39:19Z

I think the tests should be fixed now. I had to change the tester function to move not only tensors but also collections of tensors to the right device.
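
The idea of moving whole collections, not just bare tensors, can be sketched as a small recursive helper. This is an illustration only, not the actual tester code: `move_to_device` and `FakeTensor` are hypothetical names, and `FakeTensor` merely stands in for `torch.Tensor` so the example runs without PyTorch or a GPU.

```python
from typing import Any


def move_to_device(data: Any, device: str) -> Any:
    """Recursively move every object that has a ``.to()`` method to ``device``."""
    if callable(getattr(data, "to", None)):
        return data.to(device)
    if isinstance(data, dict):
        return {key: move_to_device(value, device) for key, value in data.items()}
    if isinstance(data, (list, tuple)):
        # Rebuild the same container type (plain lists and tuples only).
        return type(data)(move_to_device(item, device) for item in data)
    # Anything else (ints, strings, None, ...) is returned unchanged.
    return data


class FakeTensor:
    """Hypothetical stand-in for torch.Tensor: it only tracks a device name."""

    def __init__(self, device: str = "cpu") -> None:
        self.device = device

    def to(self, device: str) -> "FakeTensor":
        return FakeTensor(device)


# A list of lists of dicts, similar in shape to batched detection predictions.
preds = [[{"boxes": FakeTensor(), "scores": FakeTensor(), "image_id": 7}]]
moved = move_to_device(preds, "cuda:0")
```

A check that only tests `isinstance(preds, Tensor)` skips this input entirely, which matches the failure mode described above: the nested tensors stay on the CPU and the CUDA test never exercises the GPU path.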

* move and initialize tensors on the correct device (fix cuda)
* remove condition for moving tensors, its done in .to()
* fix gpu test
* docs

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: SkafteNicki <skaftenicki@gmail.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>

(cherry picked from commit 07b5dc5)

move and initialize tensors on the correct device (fix cuda)

b7205db

tkupek requested review from ananyahjha93, Borda, ethanwharris, justusschock, SeanNaren, SkafteNicki and tchaton as code owners December 10, 2021 12:17

Borda added the bug / fix Something isn't working label Dec 10, 2021

justusschock reviewed Dec 10, 2021

Borda added the Priority Critical task/issue label Dec 10, 2021

Borda added this to the v0.6 milestone Dec 10, 2021

awaelchli reviewed Dec 10, 2021

remove condition for moving tensors, its done in .to()

81fe48a

Merge branch 'master' into bug/map-devices

97e4327

changelog

777b72e

SkafteNicki approved these changes Dec 13, 2021

Merge branch 'master' into bug/map-devices

41abc33

mergify bot added the ready label Dec 13, 2021

Borda and others added 2 commits December 13, 2021 19:10

Merge branch 'master' into bug/map-devices

2972ef0

fix gpu test

dd5680a

docs

b0a798e

Borda approved these changes Dec 15, 2021

Borda requested review from justusschock and awaelchli December 15, 2021 09:45

Borda merged commit 07b5dc5 into Lightning-AI:master Dec 15, 2021

Borda added the topic: Image label Aug 25, 2023
