
[AMD][Navi31] Support WMMA transformations in AccelerateAMDMatmul pass #3309

Merged: 2 commits into triton-lang:main from wmma-accelerate-amd-matmul, Mar 15, 2024

Conversation

joviliast (Contributor)

- Transform dot
- Transform dot operands to required layouts
- Support datatype conversion
- Add lit test for the AccelerateAMDMatmul pass for the WMMA case

@joviliast (Contributor Author)

Can be merged after
#3170
#3171
#3308

This PR enables the whole pipeline for WMMA lowering; tested on the WIP branch: https://github.com/joviliast/triton/tree/wmma-upstream-wip

@joviliast joviliast force-pushed the wmma-accelerate-amd-matmul branch 5 times, most recently from 51b1bf2 to 287d227 Compare March 11, 2024 11:07

MatrixCoreVersion getMatrixCoreVersion(StringRef archGen) {
  if (archGen.contains("gfx11"))
    return MatrixCoreVersion::RDNA_WMMA;
  // ... (excerpt from the PR diff; remaining cases elided)
Contributor

Is there a reason why we want to pass archGen all the way here to decide on the matrix core version? I thought it's enough to use matrix_core_version from the frontend. (cc @zhanglx13)

Contributor Author

The RDNA3 arch introduces WMMA instructions that are not technically a new version of the matrix core (that concept applies only to CDNA), so it can't be described by the matrix_core_version parameter.
Reserving some value for WMMA could be dangerous for future versions of CDNA; that's why I'm passing archGen.
If you have any suggestions, please share.

Collaborator @zhanglx13 (Mar 12, 2024)

One way to unify WMMA and MFMA in terms of version is to use the gfx number directly, which corresponds to how AMD tracks hardware versions today and in the future.
And since we don't use capability on the AMD path anyway, we can just use the gfx number as the version here:

  • 908 <-- matrix_core_version 1
  • 90a <-- matrix_core_version 2
  • 940, 941, 942 <-- matrix_core_version 3
  • 1100 <-- wmma

@zahimoud what do you think?
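
A minimal sketch of that mapping (the function name and the sentinel chosen for WMMA are illustrative assumptions, not code from this PR):

#include "llvm/ADT/StringRef.h"

// Map the arch string to a version number, following the gfx-number idea
// above: CDNA parts keep their matrix_core_version, RDNA3 is marked as WMMA.
int getVersionFromArch(llvm::StringRef archGen) {
  if (archGen.contains("gfx908"))
    return 1;    // matrix_core_version 1
  if (archGen.contains("gfx90a"))
    return 2;    // matrix_core_version 2
  if (archGen.contains("gfx940") || archGen.contains("gfx941") ||
      archGen.contains("gfx942"))
    return 3;    // matrix_core_version 3
  if (archGen.contains("gfx11"))
    return 1100; // RDNA3: WMMA rather than MFMA
  return 0;      // no matrix core support
}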

Collaborator

If we want an int, we can use 910 for gfx90a.

Contributor

Yeah, I'm fine with passing arch to the backend. I would just pass it to the TargetInfo object; in the constructor we can map it to an enum class of the arch (probably MatrixCoreVersion), and we would have an API like targetInfo.getMfmaVersion() to get the MFMA version directly.
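
A hedged sketch of that TargetInfo shape (all names here are illustrative assumptions, not Triton's actual API):

#include <string>

// Illustrative arch enum, built once in the TargetInfo constructor.
enum class MatrixCoreVersion { NONE, CDNA_MFMA1, CDNA_MFMA2, CDNA_MFMA3, RDNA_WMMA };

class TargetInfo {
public:
  explicit TargetInfo(const std::string &arch) : version(mapArch(arch)) {}

  // Returns 0 for targets without MFMA (e.g. RDNA3, which uses WMMA).
  int getMfmaVersion() const {
    switch (version) {
    case MatrixCoreVersion::CDNA_MFMA1: return 1;
    case MatrixCoreVersion::CDNA_MFMA2: return 2;
    case MatrixCoreVersion::CDNA_MFMA3: return 3;
    default: return 0;
    }
  }

private:
  static MatrixCoreVersion mapArch(const std::string &arch) {
    if (arch.find("gfx908") != std::string::npos) return MatrixCoreVersion::CDNA_MFMA1;
    if (arch.find("gfx90a") != std::string::npos) return MatrixCoreVersion::CDNA_MFMA2;
    if (arch.find("gfx94") != std::string::npos)  return MatrixCoreVersion::CDNA_MFMA3;
    if (arch.find("gfx11") != std::string::npos)  return MatrixCoreVersion::RDNA_WMMA;
    return MatrixCoreVersion::NONE;
  }

  MatrixCoreVersion version;
};

Parsing the arch string once in the constructor and switching on the enum afterwards keeps the string matching in a single place.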

Contributor Author @joviliast (Mar 14, 2024)

Well, right now it is part of the TritonAMDGPUToLLVM CMake target.
I think it would be better to create a new target, AMDTargetInfo, instead of adding a dependency between TritonAMDGPUTransforms and TritonAMDGPUToLLVM... WDYT?

Contributor

Yeah, let's move it to amd/include; would that work?

Contributor Author

As far as I can see, we would also need to move Utility to a separate target, and maybe there are other NVIDIA-specific pitfalls.
Would it be OK to do that in a separate PR? I can create an issue for it.

Contributor

Which utility are you referring to, and why do you think we should move it?

Contributor Author

https://github.com/openai/triton/blob/main/third_party/amd/lib/TritonAMDGPUToLLVM/Utility.cpp: this one should be moved to a separate target to avoid a cyclic dependency.

I think it really needs to be done, but this PR is not about that.

@joviliast joviliast force-pushed the wmma-accelerate-amd-matmul branch from 5964c3a to 5603f47 Compare March 14, 2024 12:05
@joviliast joviliast marked this pull request as ready for review March 14, 2024 12:05
@joviliast joviliast requested a review from zahimoud March 14, 2024 14:21
@joviliast joviliast requested a review from zhanglx13 March 15, 2024 11:04
# Resolve -1 (unset) defaults from the target arch; object.__setattr__ is
# used because the options object is a frozen dataclass.
if self.matrix_core_version == -1:
    object.__setattr__(self, 'matrix_core_version', self.get_matrix_core_version(self.arch))
if self.mfma_version == -1:
    object.__setattr__(self, 'mfma_version', self.get_mfma_version(self.arch))
Collaborator

So on the Navi arch this is set to mfma_version = 0, and the stream-pipeline pass is then disabled.
Is this the expected behavior on Navi?

Contributor Author

Good point, thanks.
Just checked again: the stream pipeline works fine.
Please check the updated https://github.com/openai/triton/pull/3309/files#diff-33c9a103282c05c9d9d213b94450ae7481b6db8c3c6d810f54f175b4735a3c72R39-R44

@joviliast joviliast requested review from zhanglx13 and zahimoud March 15, 2024 17:14
- Change the input option for the AccelerateAMDMatmul pass from matrix-core-version to arch-generation-name
- Transform dot
- Transform dot operands to required layouts
- Support datatype conversion
- Enable the stream pipeline pass for WMMA
- Add lit test for the AccelerateAMDMatmul pass for the WMMA case

Signed-off-by: joviliast <iveselov.nn@gmail.com>
@joviliast joviliast force-pushed the wmma-accelerate-amd-matmul branch from 1ce9a46 to 2646616 Compare March 15, 2024 17:25
@zhanglx13 zhanglx13 merged commit ce74d42 into triton-lang:main Mar 15, 2024
5 checks passed
karupayun pushed a commit to openxla/triton that referenced this pull request Apr 3, 2024
[AMD][Navi31] Support WMMA transformations in AccelerateAMDMatmul pass (triton-lang#3309)

- Transform dot
- Transform dot operands to required layouts
- Support datatype conversion
- Add lit test for the AccelerateAMDMatmul pass for the WMMA case

Signed-off-by: joviliast <iveselov.nn@gmail.com>