
[TensorIR][ROCm] AMD Matrix Core Support #15106

Merged: 22 commits, Jun 28, 2023

Conversation

LeiWang1999 (Contributor)

This Pull Request adds support for AMD Matrix Core in TVM.

Changes Made

The following changes have been made to enable AMD Matrix Core support in TVM:

  • Added ROCm tensor intrins for the AMD Matrix Core architecture.
  • Added a test case of a 1024x1024x1024 dense GEMM for each of these computations.
  • Implemented the required tile sizes for Matrix FMA (MFMA) computations. The available tile sizes for MFMA are as follows:
    • Integer computation: i8xi8
    • Half-precision computation: f16xf16
    • Single-precision computation: f32xf32

Per the AMD Matrix Core README, the available tiles for the given computations are:

| A/B Data Format | C/D Data Format | M  | N  | K  | Blocks | Cycles | Flops/cycle/CU |
| --------------- | --------------- | -- | -- | -- | ------ | ------ | -------------- |
| FP32            | FP32            | 32 | 32 | 2  | 1      | 64     | 256            |
| FP32            | FP32            | 16 | 16 | 4  | 1      | 32     | 256            |
| FP16            | FP32            | 32 | 32 | 8  | 1      | 64     | 1024           |
| FP16            | FP32            | 16 | 16 | 16 | 1      | 32     | 1024           |
| INT8            | INT32           | 32 | 32 | 8  | 1      | 64     | 1024           |
| INT8            | INT32           | 16 | 16 | 16 | 1      | 32     | 1024           |

For each of these computations, only one intrinsic has been chosen for implementation, since the alternatives deliver identical TFLOPS. Considering real-world system requirements, we selected a small 'm' tile and a large 'k' tile to optimize performance.
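As a sanity check, the Flops/cycle/CU column in the table above is consistent with 2·M·N·K·Blocks / Cycles per matrix core, multiplied by the number of matrix cores per CU. The factor of 4 matrix cores per CU is an assumption here (CDNA's four SIMD units per compute unit), not something stated in this PR:

```python
# Cross-check the MFMA throughput table: Flops/cycle/CU should equal
# 2*M*N*K*Blocks / Cycles per matrix core, times matrix cores per CU.
# The factor of 4 matrix cores per CU is an assumption (CDNA has four
# SIMD units per compute unit).
MATRIX_CORES_PER_CU = 4

rows = [
    # (M, N, K, blocks, cycles, expected Flops/cycle/CU)
    (32, 32, 2, 1, 64, 256),    # FP32 x FP32
    (16, 16, 4, 1, 32, 256),    # FP32 x FP32
    (32, 32, 8, 1, 64, 1024),   # FP16 x FP16
    (16, 16, 16, 1, 32, 1024),  # FP16 x FP16
    (32, 32, 8, 1, 64, 1024),   # INT8 x INT8
    (16, 16, 16, 1, 32, 1024),  # INT8 x INT8
]

for m, n, k, blocks, cycles, expected in rows:
    # An MxNxK matmul tile performs 2*M*N*K flops per block.
    flops_per_cycle = 2 * m * n * k * blocks * MATRIX_CORES_PER_CU // cycles
    assert flops_per_cycle == expected, (m, n, k)

print("table is self-consistent")
```

This also makes the selection rationale concrete: the 16x16 and 32x32 variants of each precision achieve the same peak throughput, so either one is a valid choice.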

Please review the changes and provide any feedback or suggestions for improvement; see more discussion here.

@tvm-bot
Collaborator

tvm-bot commented Jun 15, 2023

Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.

  • No users to tag found in teams: tensorir, rocm See #10317 for details

Generated by tvm-bot

@LeiWang1999
Contributor Author

cc @junrushao

@junrushao
Member

This is awesome to have! @vinx13 @tqchen and I had some experience with MatrixCore a couple of years ago but not upstreamed. Thanks for taking the initiative!

@yzh119 yzh119 changed the title [TensorIR][ROCm] AMD Matrix Core Support. [TensorIR][ROCm] AMD Matrix Core Support Jun 15, 2023
Member

@yzh119 yzh119 left a comment


LGTM, some suggestions.

tests/python/unittest/test_tir_schedule_tensorize_mfma.py (outdated; resolved)
@yzh119
Member

yzh119 commented Jun 18, 2023

Hi @LeiWang1999 would you mind fixing the lint issues so that we can merge this?

@LeiWang1999
Contributor Author

Yeah, but I found a performance issue. I think I know where the problem is; I will fix the lint after I handle it.

@LeiWang1999
Contributor Author

The performance problem was due to using local memory scope instead of warp scope in the tensor intrins. To address this, we need to switch to warp scope during tensorization and run the "lower_warp_storage" optimization pass to convert warp memory to register files.

Using local memory resulted in excessive redundant register file usage, leading to register spills and decreased performance. This issue is hard to analyze in LLVM IR, by the way. I wrote another HIP source codegen to address the bug more effectively, and it can offer performance similar to the LLVM IR path; maybe we can open another thread: LeiWang1999/tvm/lei/feat-hip.
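To make the register-pressure point concrete, here is some illustrative arithmetic (not TVM code; the wavefront size of 64 and the 16x16 accumulator tile are assumptions based on the MFMA shapes discussed above):

```python
# Illustrative arithmetic for the warp-scope fix described above.
# Assumes a 64-lane wavefront and the 16x16 MFMA accumulator tile.
WAVEFRONT_SIZE = 64
M, N = 16, 16

# Warp scope: the 16x16 accumulator is distributed across the wavefront,
# so each lane holds only its 1/64th share of the tile.
regs_per_lane_warp = (M * N) // WAVEFRONT_SIZE   # 4 values per lane

# Local scope without warp-memory lowering: each lane keeps its own
# copy of the full tile, which quickly exhausts the register file and
# forces spills.
regs_per_lane_local = M * N                       # 256 values per lane

print(regs_per_lane_warp, regs_per_lane_local)
```

A 64x blowup per lane is consistent with the spills and performance drop reported in the comment above.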

@LeiWang1999
Contributor Author

Also, all of this codebase worked fine in my workspace (an old TVM release), but it failed on current TVM upstream. I found there are some ROCm/LLVM backend issues; I have tried to fix some of them, see more at this comment: #14901 (comment)

Please also cc @Lunderberg :)

@Hzfengsy
Member

ping @vinx13 @masahi if you can help review :)

Member

@yzh119 yzh119 left a comment


LGTM

from tvm.tir.expr import Cast, IntImm
from tvm.tir.function import TensorIntrin

lift = convert
Member


Is this alias used anywhere? If not, we can delete it.

Contributor Author


Sure, sorry I didn't see the conversation. I think this alias can be deleted. Given that this PR has been merged and I will be introducing some new features later (e.g. other data layouts), we can address the issue then.

@yzh119 yzh119 merged commit 588d1f2 into apache:main Jun 28, 2023
@tqchen
Member

tqchen commented Jun 28, 2023

Thanks @LeiWang1999! @LeiWang1999 @Lunderberg it would be great to follow up on the LLVM ROCm issues and get things rolling!
