[TensorIR][ROCm] AMD Matrix Core Support #15106
Conversation
Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.
Generated by tvm-bot
Please cc @junrushao.
LGTM, some suggestions.
Hi @LeiWang1999, would you mind fixing the lint issues so that we can merge this?
Yeah, but I found a performance issue. I think I know where the problem is; I will fix the lint after I handle it.
The performance problem was due to using the local memory scope instead of the warp scope in the tensor intrins. To address this, we need to switch to warp scope in tensorization and apply the "lower_warp_storage" optimization pass to convert warp memory into register files. Using local memory resulted in excessive redundant register file usage, leading to register spills and decreased performance. By the way, this issue is hard to analyze in LLVM IR. I wrote another HIP source codegen that addresses the bug more effectively and offers performance similar to the LLVM IR path; maybe we can open another thread: LeiWang1999/tvm/lei/feat-hip.
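For context, here is a minimal sketch of what the warp-scope change means at the TensorIR level. It is not the exact intrin from this PR: the name, shape, and fill pattern are illustrative. The key point is the `scope="warp"` buffer, which the upstream pass `tir.transform.LowerWarpMemory` (the pass referred to above) can rewrite into per-thread registers.

```python
# Sketch only: an accumulator-fill tensor intrin whose buffer lives in
# "warp" scope rather than "local". Name and shapes are hypothetical.
from tvm.script import tir as T
from tvm.tir.function import TensorIntrin

@T.prim_func
def mfma_fill_desc(c: T.handle) -> None:
    # "warp" scope marks the buffer as a per-wavefront fragment;
    # tir.transform.LowerWarpMemory later lowers it onto registers,
    # avoiding the redundant "local" allocations that caused spills.
    C = T.match_buffer(c, (16, 16), "float32", scope="warp")
    with T.block("root"):
        T.reads()
        T.writes(C[0:16, 0:16])
        for i, j in T.grid(16, 16):
            with T.block("init"):
                vi, vj = T.axis.remap("SS", [i, j])
                C[vi, vj] = T.float32(0)

# A real registration would pair the description with a hardware-specific
# implementation; reusing the desc here just keeps the sketch self-contained.
TensorIntrin.register("mfma_fill_16x16_f32", mfma_fill_desc, mfma_fill_desc)
```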
Also, all of this codebase worked fine in my workspace (an old TVM release), but it failed on the current TVM upstream. I found there are some ROCm/LLVM backend issues; I have tried to fix some of them, see more in this comment: #14901 (comment). Please also cc @Lunderberg :)
LGTM
from tvm.runtime import convert  # provides the value-lifting helper aliased below
from tvm.tir.expr import Cast, IntImm
from tvm.tir.function import TensorIntrin

lift = convert
Is this alias used anywhere? If not, we can delete it.
Sure, sorry that I didn't see the conversation. I think this alias can be deleted. Given this PR has been merged and I will be introducing some new features later on (e.g., other data layouts), we can address the issue then; marking this for later.
Thanks @LeiWang1999! @LeiWang1999 @Lunderberg it would be great to follow up on the LLVM ROCm issues and get things rolling!
This Pull Request adds support for AMD Matrix Core in TVM.
Changes Made
The following changes have been made to enable AMD Matrix Core support in TVM:
Per the AMD matrix core README, multiple tile shapes are available for each of the given computations.
For each of these computations, only one intrinsic has been chosen for implementation, since the candidate tiles deliver identical TFLOPS. Considering real-world workload requirements, we selected a small 'm' tile and a large 'k' tile to optimize performance. A usage sketch follows below.
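As a hedged sketch of how one of these intrinsics would be consumed, the snippet below tiles a GEMM down to a 16x16x16 fragment and tensorizes it. The intrin name "mfma_f16f16f32_16x16x16" is hypothetical (the actual names registered by this PR may differ), and the full schedule (thread binding, shared-memory staging, decompose_reduction) is omitted.

```python
# Sketch only: assumes an MFMA intrin was registered under the
# hypothetical name "mfma_f16f16f32_16x16x16".
from tvm import tir
from tvm.script import tir as T

@T.prim_func
def matmul(a: T.handle, b: T.handle, c: T.handle) -> None:
    A = T.match_buffer(a, (1024, 1024), "float16")
    B = T.match_buffer(b, (1024, 1024), "float16")
    C = T.match_buffer(c, (1024, 1024), "float32")
    for i, j, k in T.grid(1024, 1024, 1024):
        with T.block("C"):
            vi, vj, vk = T.axis.remap("SSR", [i, j, k])
            with T.init():
                C[vi, vj] = T.float32(0)
            C[vi, vj] += T.cast(A[vi, vk], "float32") * T.cast(B[vk, vj], "float32")

sch = tir.Schedule(matmul)
i, j, k = sch.get_loops(sch.get_block("C"))
# Tile the loops down to the 16x16x16 intrinsic shape...
i0, i1 = sch.split(i, factors=[None, 16])
j0, j1 = sch.split(j, factors=[None, 16])
k0, k1 = sch.split(k, factors=[None, 16])
sch.reorder(i0, j0, k0, i1, j1, k1)
# ...then map the inner 16x16x16 fragment onto the MFMA intrin.
sch.tensorize(i1, "mfma_f16f16f32_16x16x16")  # hypothetical intrin name
```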
Please review the changes and provide any feedback or suggestions for improvement; see more discussion here.