
[backend][amd] Reorder and adjust optimization passes #3516

Merged
merged 1 commit into triton-lang:main from amd-opt on Mar 30, 2024

Conversation

@antiagainst (Collaborator) commented Mar 30, 2024

This commit adjusts optimization passes a bit:

  • Invoke canonicalization and CSE after the initial conversion to the LLVM dialect in MLIR. This makes it possible to pick up canonicalization patterns from other dialects (e.g., scf) before fully going down to the LLVM dialect (which by design has little canonicalization of its own) or to LLVM IR proper. A pipeline sketch follows this list.
  • First attach kernel attributes, then link in external libs and run general LLVM optimization. Those kernel attributes may affect how optimizations are done; a sketch of this ordering also follows.
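A minimal sketch of the first change, using upstream MLIR's pass-manager API. The conversion-pass factory name here is hypothetical; the canonicalizer and CSE factories are real upstream MLIR APIs:

    #include "mlir/Pass/PassManager.h"
    #include "mlir/Transforms/Passes.h"

    void buildLLVMDialectPipeline(mlir::PassManager &pm) {
      // Initial lowering into the LLVM dialect (hypothetical factory name).
      pm.addPass(createConvertTritonAMDGPUToLLVMPass());
      // New: clean up immediately after the conversion, while leftover
      // higher-level ops (e.g. scf.if) still carry their dialects'
      // canonicalization patterns.
      pm.addPass(mlir::createCanonicalizerPass());
      pm.addPass(mlir::createCSEPass());
    }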

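And a sketch of the second change, assuming plain LLVM C++ APIs: attach kernel attributes first, then link, then optimize. The attribute shown (amdgpu-flat-work-group-size) is a real AMDGPU function attribute but only an illustrative stand-in for whatever the backend actually sets, and runLLVMOptimizationPipeline is a hypothetical helper:

    #include <memory>
    #include "llvm/IR/Function.h"
    #include "llvm/IR/Module.h"
    #include "llvm/Linker/Linker.h"

    void prepareAndOptimize(llvm::Module &mod,
                            std::unique_ptr<llvm::Module> libs) {
      // New order: attach kernel attributes first...
      for (llvm::Function &fn : mod)
        if (fn.getCallingConv() == llvm::CallingConv::AMDGPU_KERNEL)
          fn.addFnAttr("amdgpu-flat-work-group-size", "1,256"); // illustrative
      // ...then link in the external device libraries...
      llvm::Linker::linkModules(mod, std::move(libs));
      // ...and only then run general LLVM optimization, so the optimizer
      // sees the attributes (hypothetical helper).
      runLLVMOptimizationPipeline(mod);
    }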
@jlebar (Collaborator) commented Mar 30, 2024

> This makes it possible to pick up canonicalization in other dialects (e.g., scf) before fully going down to LLVM dialect

What's an example of this?

@antiagainst (Collaborator, Author) commented
> This makes it possible to pick up canonicalization in other dialects (e.g., scf) before fully going down to LLVM dialect
>
> What's an example of this?

When debugging print op lowering, I found the following after convert-triton-amdgpu-to-llvm:

    %46 = llvm.mlir.constant(true) : i1 loc(#loc)
    %47 = llvm.addrspacecast %45 : !llvm.ptr<1> to !llvm.ptr loc(#loc)
    %48 = scf.if %46 -> (i32) {
      %141 = llvm.load %47 : !llvm.ptr -> i32 loc(#loc)
      scf.yield %141 : i32 loc(#loc)
    } else {
      %141 = llvm.mlir.constant(0 : i32) : i32 loc(#loc)
      scf.yield %141 : i32 loc(#loc)
    } loc(#loc)

This scf.if carries all the way down to the conversion to LLVM IR proper. It's not a major issue, but still good to have it cleaned up a bit: since the condition %46 is a constant true, scf.if canonicalization folds the construct away, so running canonicalize at this point leaves just the llvm.load. A sketch of applying those patterns is below.
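For reference, a minimal sketch (upstream MLIR APIs; module and ctx assumed in scope) of the folding the reordered canonicalizer performs on IR like the above:

    #include "mlir/Dialect/SCF/IR/SCF.h"
    #include "mlir/Transforms/GreedyPatternRewriteDriver.h"

    // Collect scf.if's canonicalization patterns and apply them greedily.
    // With the constant-true %46 as its condition, the scf.if above folds
    // away, leaving the llvm.load to define %48 directly.
    mlir::RewritePatternSet patterns(ctx);
    mlir::scf::IfOp::getCanonicalizationPatterns(patterns, ctx);
    (void)mlir::applyPatternsAndFoldGreedily(module, std::move(patterns));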

@jlebar jlebar enabled auto-merge (squash) March 30, 2024 19:02
@jlebar jlebar merged commit 45fff31 into triton-lang:main Mar 30, 2024
5 checks passed
@antiagainst antiagainst deleted the amd-opt branch March 30, 2024 19:04
ptillet pushed a commit that referenced this pull request Apr 1, 2024