
[AMD] [MFMA] Support dot3d in MFMA layout #3600

Merged
merged 8 commits from enable_dot3d_mfma into triton-lang:main on Apr 9, 2024

Conversation

binarman
Contributor

@binarman binarman commented Apr 8, 2024

  • Support 3d tensor when emitting offsets for mfma layouts
  • Support 3d tensors in Shared to dot operand conversion
  • Support dot3d in Dialect.cpp
  • Replace amd::DecomposeConversion with common::ReduceDataDuplication
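As a rough illustration of the first bullet (hypothetical helper name, not the actual Triton code): emitting offsets for a 3D MFMA layout can be thought of as taking the 2D per-tile offsets and replicating them across a leading batch dimension.

```python
def emit_offsets_3d(batch_size, offsets_2d):
    """Sketch: replicate 2D MFMA offsets across a leading batch dimension.

    offsets_2d is a list of (row, col) offsets as produced by the existing
    2D layout logic; the 3D case prepends the batch index to each offset.
    Hypothetical helper -- not the actual Triton implementation.
    """
    return [(b, i, j) for b in range(batch_size) for (i, j) in offsets_2d]
```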

@binarman binarman force-pushed the enable_dot3d_mfma branch from 352331c to 8ca5404 Compare April 8, 2024 15:18
@binarman binarman marked this pull request as ready for review April 8, 2024 21:02
@binarman binarman requested review from Jokeren and ptillet as code owners April 8, 2024 21:02
@binarman binarman force-pushed the enable_dot3d_mfma branch from 8f989af to 0466f46 Compare April 8, 2024 21:05
@binarman
Contributor Author

binarman commented Apr 8, 2024

Notes:

This PR is a continuation of #3298

This PR fixes dot3d for the MFMA layout only; I am going to prepare an additional patch for the WMMA (Navi) layout.

@@ -23,7 +23,7 @@ class HIPOptions:
arch: str = None
allow_fp8e4nv: bool = False
default_dot_input_precision: str = "ieee"
-allowed_dot_input_precisions: Tuple[str] = ("ieee", )
+allowed_dot_input_precisions: Tuple[str] = ("tf32", "ieee")
Contributor Author

@zhanglx13
I have a question about this part.
I see two possibilities here:

  1. assume that TF32 is an optional low-precision mode of float32, so we can fall back to ordinary float32 even when TF32 is requested (this is what happens in this PR)
  2. treat TF32 as unsupported by the AMD backend and simply skip the related tests

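Option 1 could be sketched like this (hypothetical helper, not the actual HIP backend logic): the backend accepts "tf32" as an allowed dot input precision but lowers it to plain IEEE float32.

```python
def resolve_dot_input_precision(requested, allowed=("tf32", "ieee")):
    """Sketch of option 1: accept tf32 but compute in ordinary fp32.

    Hypothetical helper -- not the actual HIPOptions handling. The AMD
    hardware targeted here has no tf32 mode, so tf32 silently falls back
    to ieee float32 instead of being rejected.
    """
    if requested not in allowed:
        raise ValueError(f"unsupported dot input precision: {requested}")
    return "ieee" if requested == "tf32" else requested
```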
Collaborator

From what I saw in test_dot, tf32 is skipped, so I assume it's ok to just skip them on the AMD backend.

@@ -3181,8 +3181,8 @@ def kernel(X, stride_xm, stride_xk, Y, stride_yk, stride_yn, W, stride_wn, strid
@pytest.mark.parametrize("in_dtype_str, out_dtype_str", [('int8', 'int8'), ('float16', 'float16'),
('float16', 'float32'), ('float32', 'float32')])
def test_dot3d(B, num_warps, M, N, K, in_dtype_str, out_dtype_str, device):
if is_hip():
pytest.skip('TODO test_dot3d not supported on HIP.')
if in_dtype_str == 'int8' and is_interpreter():
Collaborator

@zhanglx13 zhanglx13 Apr 8, 2024

Is this also true for nv path? Then why was it not caught before?

Contributor Author

This is a new change: #3566

This part leaked in here during a rebase; I will remove it.

const int uniqueValuesPerWarp = 4;
effectiveWarpSize = i32_val(uniqueValuesPerWarp);
}
Value laneId = urem(threadId, effectiveWarpSize);

// Note: here we assume warpId goes along the M dim first
Collaborator

This is not the case anymore. We should remove it.
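The lane-id computation in the snippet above can be sketched in plain Python (assuming a linear thread numbering within the block; names are hypothetical):

```python
def lane_id(thread_id, warp_size=64, unique_values_per_warp=None):
    """Sketch of the snippet above: the lane id is the thread id modulo
    the effective warp size. When only a few values per warp are unique
    (e.g. 4 in the quoted code), the effective warp size shrinks so that
    threads holding duplicate values map onto the same lane id.

    Mirrors `laneId = urem(threadId, effectiveWarpSize)`; not the actual
    Triton lowering code.
    """
    effective_warp_size = unique_values_per_warp or warp_size
    return thread_id % effective_warp_size
```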

zhanglx13 and others added 7 commits April 9, 2024 18:51
- Support 3d tensor when emitting offsets for mfma layouts
- Support 3d tensors in Shared to dot operand conversion
- Support dot3d in Dialect.cpp
- Replace amd::DecomposeConversion with common::ReduceDataDuplication
@binarman binarman force-pushed the enable_dot3d_mfma branch from 955c17d to fae2f6e Compare April 9, 2024 18:51
@@ -1424,7 +1428,12 @@ void SharedEncodingAttr::print(AsmPrinter &printer) const {
SmallVector<unsigned>
AMDMfmaEncodingAttr::getShapePerCTATile(ArrayRef<int64_t> tensorShape) const {
auto nonKDim = getMDim();
Contributor

Should we be using mDim and nDim here ?

Contributor Author

Good idea!
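A minimal sketch of what getShapePerCTATile could look like with mDim and nDim used separately instead of a single nonKDim (hypothetical Python translation; the warp-layout assumptions are mine, not the actual C++ code):

```python
def shape_per_cta_tile(m_dim, n_dim, warps_per_cta):
    """Sketch: per-CTA tile shape for an MFMA layout.

    Assumes warps_per_cta lists the warps along each tensor dim, with the
    last two dims being M and N; a leading batch dim, if present, is
    tiled by warps only (one MFMA tile never spans the batch dim).
    """
    rank = len(warps_per_cta)
    shape = [m_dim * warps_per_cta[-2], n_dim * warps_per_cta[-1]]
    if rank == 3:
        shape.insert(0, warps_per_cta[0])
    return shape
```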

int64_t mfmaInstrK;
// TODO(Lixun): make it simpler
// getMFMAInstrShapeForOperands always returns a 2D vector
if (rank == 3) {
Contributor

I thought we were going to do this: #3298 (comment)

Contributor Author

Let's leave this for later.
I am going to enable dot3d for the WMMA layout as well; after that I'll try to refactor them uniformly.
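Since getMFMAInstrShapeForOperands always returns a 2D vector, the rank-3 handling in the quoted snippet amounts to prepending a unit batch dimension. A sketch (hypothetical helper name):

```python
def instr_shape_for_rank(instr_shape_2d, rank):
    """Sketch: pad the 2D MFMA instruction shape with a unit batch dim
    when the dot operates on rank-3 tensors, since a single MFMA
    instruction never spans the batch dimension. Not the actual
    Dialect.cpp code.
    """
    if rank == 3:
        return [1] + list(instr_shape_2d)
    return list(instr_shape_2d)
```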

Contributor

Are you enabling dot3d for wmma layout in this PR ?

Contributor Author

No, this will be a separate PR.

In my opinion it is easier to review and fix things step by step.

@zahimoud zahimoud merged commit 3c2f88b into triton-lang:main Apr 9, 2024
5 checks passed