
[TensorIR][ROCm] AMD Matrix Core Support #15106

Merged: 22 commits, Jun 28, 2023

Conversation

LeiWang1999 (Contributor)

This Pull Request adds support for AMD Matrix Core in TVM.

Changes Made

The following changes have been made to enable AMD Matrix Core support in TVM:

  • Added ROCm tensor intrins for the AMD Matrix Core architecture.
  • Added a test case of a 1024x1024x1024 dense GEMM for each of these computations.
  • Implemented the required tile sizes for Matrix FMA (MFMA) computations. The available tile sizes for MFMA are as follows:
    • Integer computation: i8xi8
    • Half-precision computation: f16xf16
    • Single-precision computation: f32xf32

Per the AMD Matrix Core README, the available tiles for the given computations are:

| A/B Data Format | C/D Data Format | M  | N  | K  | Blocks | Cycles | Flops/cycle/CU |
| --------------- | --------------- | -- | -- | -- | ------ | ------ | -------------- |
| FP32            | FP32            | 32 | 32 | 2  | 1      | 64     | 256            |
| FP32            | FP32            | 16 | 16 | 4  | 1      | 32     | 256            |
| FP16            | FP32            | 32 | 32 | 8  | 1      | 64     | 1024           |
| FP16            | FP32            | 16 | 16 | 16 | 1      | 32     | 1024           |
| INT8            | INT32           | 32 | 32 | 8  | 1      | 64     | 1024           |
| INT8            | INT32           | 16 | 16 | 16 | 1      | 32     | 1024           |

For each of these computations, only one intrinsic has been chosen for implementation, since the alternatives deliver identical TFLOPS. Considering real-world system requirements, we selected a small 'm' tile and a large 'k' tile to optimize performance.
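As a sanity check, the Flops/cycle/CU column in the table above is consistent with 2·M·N·K·Blocks / Cycles per matrix core, multiplied by the number of matrix cores per CU. The factor of 4 matrix cores per CU is an assumption here (CDNA's four SIMD units per compute unit), not something stated in this PR:

```python
# Cross-check the MFMA throughput table: Flops/cycle/CU should equal
# 2*M*N*K*Blocks / Cycles per matrix core, times matrix cores per CU.
# The factor of 4 matrix cores per CU is an assumption (CDNA has four
# SIMD units per compute unit).
MATRIX_CORES_PER_CU = 4

rows = [
    # (M, N, K, blocks, cycles, expected Flops/cycle/CU)
    (32, 32, 2, 1, 64, 256),    # FP32 x FP32
    (16, 16, 4, 1, 32, 256),    # FP32 x FP32
    (32, 32, 8, 1, 64, 1024),   # FP16 x FP16
    (16, 16, 16, 1, 32, 1024),  # FP16 x FP16
    (32, 32, 8, 1, 64, 1024),   # INT8 x INT8
    (16, 16, 16, 1, 32, 1024),  # INT8 x INT8
]

for m, n, k, blocks, cycles, expected in rows:
    # An MxNxK matmul tile performs 2*M*N*K flops per block.
    flops_per_cycle = 2 * m * n * k * blocks * MATRIX_CORES_PER_CU // cycles
    assert flops_per_cycle == expected, (m, n, k)

print("table is self-consistent")
```

This also makes the selection rationale concrete: the 16x16 and 32x32 variants of each precision achieve the same peak throughput, so either one is a valid choice.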

Please review the changes and provide any feedback or suggestions for improvement; see more discussion here.

@tvm-bot
Collaborator

tvm-bot commented Jun 15, 2023

Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.

  • No users to tag found in teams: tensorir, rocm See #10317 for details

Generated by tvm-bot

@LeiWang1999
Contributor Author

cc @junrushao

@junrushao
Member

This is awesome to have! @vinx13 @tqchen and I had some experience with MatrixCore a couple of years ago but not upstreamed. Thanks for taking the initiative!

@yzh119 yzh119 changed the title [TensorIR][ROCm] AMD Matrix Core Support. [TensorIR][ROCm] AMD Matrix Core Support Jun 15, 2023
Member

@yzh119 yzh119 left a comment


LGTM, some suggestions.

tests/python/unittest/test_tir_schedule_tensorize_mfma.py (outdated; resolved)
@yzh119
Member

yzh119 commented Jun 18, 2023

Hi @LeiWang1999 would you mind fixing the lint issues so that we can merge this?

@LeiWang1999
Contributor Author

Yeah, but I found a performance issue. I think I know where the problem is; I will fix the lint after I handle it.

@LeiWang1999
Contributor Author

The performance problem was due to using local memory scope instead of warp scope in the tensor intrins. To address this, we need to switch to warp scope during tensorization and run the "lower_warp_storage" optimization pass to convert warp memory to register files.

Using local memory resulted in excessive redundant register file usage, leading to register spills and decreased performance. This issue is hard to analyze in LLVM IR, by the way. I wrote another HIP source codegen to address the bug more effectively, and it can offer performance similar to the LLVM IR path; maybe we can open another thread: LeiWang1999/tvm/lei/feat-hip.
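To make the register-pressure point concrete, here is some illustrative arithmetic (not TVM code; the wavefront size of 64 and the 16x16 accumulator tile are assumptions based on the MFMA shapes discussed above):

```python
# Illustrative arithmetic for the warp-scope fix described above.
# Assumes a 64-lane wavefront and the 16x16 MFMA accumulator tile.
WAVEFRONT_SIZE = 64
M, N = 16, 16

# Warp scope: the 16x16 accumulator is distributed across the wavefront,
# so each lane holds only its 1/64th share of the tile.
regs_per_lane_warp = (M * N) // WAVEFRONT_SIZE   # 4 values per lane

# Local scope without warp-memory lowering: each lane keeps its own
# copy of the full tile, which quickly exhausts the register file and
# forces spills.
regs_per_lane_local = M * N                       # 256 values per lane

print(regs_per_lane_warp, regs_per_lane_local)
```

A 64x blowup per lane is consistent with the spills and performance drop reported in the comment above.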

@LeiWang1999
Contributor Author

Also, all of this codebase worked fine in my workspace (an old TVM release), but it failed on current TVM upstream. I found there are some ROCm/LLVM backend issues; I have tried to fix some of them, see more at this comment: #14901 (comment)

Please also cc @Lunderberg :)

@Hzfengsy
Member

ping @vinx13 @masahi if you can help review :)

Member

@yzh119 yzh119 left a comment


LGTM

from tvm.tir.expr import Cast, IntImm
from tvm.tir.function import TensorIntrin

lift = convert
Member


Is this alias used anywhere? If not, we can delete it.

Contributor Author


Sure, sorry I didn't see the conversation. I think this alias can be deleted. Given that this PR has been merged and I will be introducing some new features later (e.g. other data layouts), we can address the issue then.

@yzh119 yzh119 merged commit 588d1f2 into apache:main Jun 28, 2023
@tqchen
Member

tqchen commented Jun 28, 2023

Thanks @LeiWang1999! @LeiWang1999 @Lunderberg it would be great to follow up on the LLVM ROCm issues and get things rolling!
