[ROOFLINE] Add CUDA support to roofline analysis #12205

tkonolige · 2022-07-27T18:39:53Z

Add functions to estimate peak flops and bandwidth for CUDA. Add a new registration mechanism to the roofline analysis to support adding any target. This mechanism uses generic functions with overrides. New targets only need to add estimate_peak_bandwidth and estimate_peak_flops functions.

Also fix the CUDA codegen and tensorcore_infer_fragment.cc to support filling matrix_a and matrix_b fragments.
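The registration mechanism described above (generic functions with per-target overrides, where a new target supplies only `estimate_peak_bandwidth` and `estimate_peak_flops`) can be sketched in plain Python. This is an illustration of the dispatch pattern, not TVM's actual code. `GenericFunc`, `generic_func`, and the string-based `target_kind` argument are hypothetical names. The CUDA overrides return placeholder constants, whereas a real override would run a benchmark kernel.

```python
from typing import Callable, Dict


class GenericFunc:
    """Hypothetical minimal generic function: a default implementation plus
    per-target-kind overrides (an illustration, not TVM's own class)."""

    def __init__(self, default: Callable):
        self._default = default
        self._overrides: Dict[str, Callable] = {}
        self.__name__ = default.__name__

    def register(self, target_kind: str):
        """Decorator that installs an override for one target kind."""
        def decorator(func: Callable) -> Callable:
            self._overrides[target_kind] = func
            return func
        return decorator

    def __call__(self, target_kind: str, *args, **kwargs):
        # Dispatch on the target kind, falling back to the default implementation.
        impl = self._overrides.get(target_kind, self._default)
        return impl(target_kind, *args, **kwargs)


def generic_func(func: Callable) -> GenericFunc:
    return GenericFunc(func)


@generic_func
def estimate_peak_bandwidth(target_kind: str) -> float:
    raise NotImplementedError(f"No peak bandwidth estimate for target '{target_kind}'")


@generic_func
def estimate_peak_flops(target_kind: str) -> float:
    raise NotImplementedError(f"No peak flops estimate for target '{target_kind}'")


# Supporting a new target only requires registering these two overrides.
# The constants below are hypothetical placeholders, not measured values.
@estimate_peak_bandwidth.register("cuda")
def _cuda_peak_bandwidth(target_kind: str) -> float:
    return 420e9  # bytes/s (placeholder)


@estimate_peak_flops.register("cuda")
def _cuda_peak_flops(target_kind: str) -> float:
    return 41.2e12  # FLOP/s (placeholder)


print(estimate_peak_flops("cuda"), estimate_peak_bandwidth("cuda"))
```

Calling either function with a target kind that has no override (for example `"llvm"` in this sketch) falls through to the default, which raises `NotImplementedError`. This keeps the generic analysis code unaware of which targets exist.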

@AndrewZhaoLuo

AndrewZhaoLuo · 2022-07-28T00:02:51Z

Will take a look tomorrow

AndrewZhaoLuo

Need to grok the tensor core changes a bit, but this looks good so far. On my RTX 3070 I get 420 GB/s of bandwidth vs. the 448 GB/s advertised. For FLOPS I actually get slightly more than the advertised 40.6 TFLOPS (41.2 TFLOPS, which seems close enough).
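To put the reported RTX 3070 numbers in roofline terms, here is a small worked sketch of the standard roofline bound, attainable FLOP/s = min(peak FLOP/s, arithmetic intensity × bandwidth). The helper names `ridge_point` and `attainable_flops` are hypothetical and are not part of the roofline module. Only the four numbers come from the comment above.

```python
def ridge_point(peak_flops: float, peak_bandwidth: float) -> float:
    """Arithmetic intensity (FLOP/byte) at which the roofline switches
    from memory-bound to compute-bound."""
    return peak_flops / peak_bandwidth


def attainable_flops(intensity: float, peak_flops: float, peak_bandwidth: float) -> float:
    """Roofline bound: min(peak compute, intensity * memory bandwidth)."""
    return min(peak_flops, intensity * peak_bandwidth)


# Measured vs. advertised values reported for the RTX 3070.
measured_bw, advertised_bw = 420e9, 448e9            # bytes/s
measured_flops, advertised_flops = 41.2e12, 40.6e12  # FLOP/s

print(f"bandwidth efficiency: {measured_bw / advertised_bw:.1%}")
print(f"flops vs. advertised: {measured_flops / advertised_flops:.1%}")
print(f"ridge point: {ridge_point(measured_flops, measured_bw):.1f} FLOP/byte")
print(f"bound at 10 FLOP/byte: {attainable_flops(10, measured_flops, measured_bw):.3e} FLOP/s")
```

With these measurements the ridge point is roughly 98 FLOP/byte. Any operator with a lower arithmetic intensity is bounded by the measured 420 GB/s rather than by peak compute.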

Inline review comments on `python/tvm/utils/roofline/cuda.py` and `tests/python/unittest/test_roofline.py`.


AndrewZhaoLuo self-requested a review July 27, 2022 20:20

AndrewZhaoLuo reviewed Jul 28, 2022


Tristan Konolige added 6 commits July 29, 2022 08:56

* formatting (f853173)
* move statement back inside loops (68d0a92)
* print out report for debugging (08e4cc6)
* default to avx2 (42ebae0)
* review comments (8da98e9)

tkonolige force-pushed the cuda_roofline branch from 0f64454 to 8da98e9 on July 29, 2022 15:56

AndrewZhaoLuo approved these changes Jul 30, 2022


AndrewZhaoLuo merged commit 961a7c7 into apache:main Jul 30, 2022

AndrewZhaoLuo mentioned this pull request on Oct 4, 2022 in TVM v0.10.0.rc0 Release Candidate Notes #12979 (closed).
