Fix CI after torchmetrics update #567

danthe3rd · 2022-12-08T10:09:24Z

Stack from ghstack (oldest at bottom):

The torchmetrics `Accuracy` metric now takes an argument: https://torchmetrics.readthedocs.io/en/stable/classification/accuracy.html

Change in torchmetrics (Lightning-AI):
Lightning-AI/torchmetrics@20eab43

Somehow this is failing with a SEGFAULT on my A100 (in a triton kernel):

```
#0  0x00007fffc0f62e10 in ?? () from /lib/x86_64-linux-gnu/libcuda.so
#1  0x00007fffc0f9303c in ?? () from /lib/x86_64-linux-gnu/libcuda.so
#2  0x00007fffc0f2ea13 in ?? () from /lib/x86_64-linux-gnu/libcuda.so
#3  0x00007fffc0f94603 in ?? () from /lib/x86_64-linux-gnu/libcuda.so
#4  0x00007fffc119e4a0 in ?? () from /lib/x86_64-linux-gnu/libcuda.so
#5  0x00007fffc0f3728f in ?? () from /lib/x86_64-linux-gnu/libcuda.so
#6  0x00007fffc0f3999f in ?? () from /lib/x86_64-linux-gnu/libcuda.so
#7  0x00007fffc0fdb1c2 in ?? () from /lib/x86_64-linux-gnu/libcuda.so
#8  0x00007fff502234c0 in _launch ()
   from /data/home/XXXXX/.triton/cache/704a3e6949e60326bc68d18a620bee50/layer_norm_fw.so
#9  0x00007fff3c0eea25 in launch ()
   from /data/home/XXXXX/.triton/cache/2cebb5590a024a2e06fe9de08c6b7079/k_dropout_bw.so
#10 0x0000555555698422 in cfunction_call (func=0x7fff3c6e5760, args=<optimized out>, kwargs=<optimized out>)
    at /usr/local/src/conda/python-3.10.6/Objects/methodobject.c:552
```
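
For readers adapting their own code to the `Accuracy` change described above, here is a minimal sketch of a compatibility shim. It is not taken from this PR's diff: `build_accuracy` is a hypothetical helper, and passing `task="multiclass"` together with `num_classes` is an assumption about the updated constructor that should be checked against the linked torchmetrics documentation. The exception types caught in the fallback are also an assumption about how older releases reject the new keyword.

```python
# Hypothetical helper for illustration only; not part of xformers or torchmetrics.
def build_accuracy(metric_cls, num_classes):
    """Instantiate an accuracy metric under either constructor style.

    metric_cls is the metric class to instantiate (for example
    torchmetrics.Accuracy); it is passed in so the helper itself has
    no hard dependency on a particular torchmetrics version.
    """
    try:
        # Assumed new-style call: the task is stated explicitly.
        return metric_cls(task="multiclass", num_classes=num_classes)
    except (TypeError, ValueError):
        # Assumed old-style call: the constructor rejects `task`,
        # so fall back to the argument-free form.
        return metric_cls()
```

With torchmetrics installed, this would be called as `build_accuracy(torchmetrics.Accuracy, num_classes=10)`. Pinning the torchmetrics version in the CI requirements is the simpler alternative when supporting both styles is not needed.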

fmassa

Thanks!

It now takes an argument: https://torchmetrics.readthedocs.io/en/stable/classification/accuracy.html
ghstack-source-id: 124e71e10d9e2f513512cdc08e158a0e1f485239
Pull Request resolved: #567

Fix CI after torchmetrics update

7e40466

danthe3rd mentioned this pull request Dec 8, 2022

Refactor1: Move files to fmha/ folder #555

Merged

facebook-github-bot added the CLA Signed label Dec 8, 2022

danthe3rd requested a review from blefaudeux December 8, 2022 10:20

fmassa approved these changes Dec 8, 2022

danthe3rd added 3 commits December 8, 2022 10:28

This was referenced Dec 9, 2022

Refactor7: Restore dropout #570

Merged

Refactor8: Reapply changes from Artem #571

Merged

danthe3rd merged commit 8a6130d into gh/danthe3rd/63/base Dec 9, 2022

danthe3rd deleted the gh/danthe3rd/63/head branch December 9, 2022 15:17
