
[backend][amd] Reorder and adjust optimization passes #3516

Merged
merged 1 commit into triton-lang:main from amd-opt on Mar 30, 2024

Conversation

@antiagainst (Collaborator) commented Mar 30, 2024

This commit adjusts optimization passes a bit:

  • Invoke canonicalization and CSE after the initial conversion to the LLVM dialect in MLIR. This makes it possible to pick up canonicalization patterns from other dialects (e.g., scf) before fully going down to the LLVM dialect (which by design has little canonicalization of its own) or to LLVM IR proper. A pipeline sketch follows this list.
  • First attach kernel attributes, then link in external libs and run general LLVM optimization. Those kernel attributes may affect how optimizations are done; a sketch of this ordering also follows.
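A minimal sketch of the first change, using upstream MLIR's pass-manager API. The conversion-pass factory name here is hypothetical; the canonicalizer and CSE factories are real upstream MLIR APIs:

    #include "mlir/Pass/PassManager.h"
    #include "mlir/Transforms/Passes.h"

    void buildLLVMDialectPipeline(mlir::PassManager &pm) {
      // Initial lowering into the LLVM dialect (hypothetical factory name).
      pm.addPass(createConvertTritonAMDGPUToLLVMPass());
      // New: clean up immediately after the conversion, while leftover
      // higher-level ops (e.g. scf.if) still carry their dialects'
      // canonicalization patterns.
      pm.addPass(mlir::createCanonicalizerPass());
      pm.addPass(mlir::createCSEPass());
    }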

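And a sketch of the second change, assuming plain LLVM C++ APIs: attach kernel attributes first, then link, then optimize. The attribute shown (amdgpu-flat-work-group-size) is a real AMDGPU function attribute but only an illustrative stand-in for whatever the backend actually sets, and runLLVMOptimizationPipeline is a hypothetical helper:

    #include <memory>
    #include "llvm/IR/Function.h"
    #include "llvm/IR/Module.h"
    #include "llvm/Linker/Linker.h"

    void prepareAndOptimize(llvm::Module &mod,
                            std::unique_ptr<llvm::Module> libs) {
      // New order: attach kernel attributes first...
      for (llvm::Function &fn : mod)
        if (fn.getCallingConv() == llvm::CallingConv::AMDGPU_KERNEL)
          fn.addFnAttr("amdgpu-flat-work-group-size", "1,256"); // illustrative
      // ...then link in the external device libraries...
      llvm::Linker::linkModules(mod, std::move(libs));
      // ...and only then run general LLVM optimization, so the optimizer
      // sees the attributes (hypothetical helper).
      runLLVMOptimizationPipeline(mod);
    }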
@jlebar (Collaborator) commented Mar 30, 2024

> This makes it possible to pick up canonicalization in other dialects (e.g., scf) before fully going down to LLVM dialect

What's an example of this?

@antiagainst (Collaborator, Author) commented
> This makes it possible to pick up canonicalization in other dialects (e.g., scf) before fully going down to LLVM dialect
>
> What's an example of this?

When debugging print op lowering, I found the following after convert-triton-amdgpu-to-llvm:

    %46 = llvm.mlir.constant(true) : i1 loc(#loc)
    %47 = llvm.addrspacecast %45 : !llvm.ptr<1> to !llvm.ptr loc(#loc)
    %48 = scf.if %46 -> (i32) {
      %141 = llvm.load %47 : !llvm.ptr -> i32 loc(#loc)
      scf.yield %141 : i32 loc(#loc)
    } else {
      %141 = llvm.mlir.constant(0 : i32) : i32 loc(#loc)
      scf.yield %141 : i32 loc(#loc)
    } loc(#loc)

This scf.if carries all the way down to the conversion to LLVM IR proper. It's not a major issue, but still good to have it cleaned up a bit: since the condition %46 is a constant true, scf.if canonicalization folds the construct away, so running canonicalize at this point leaves just the llvm.load. A sketch of applying those patterns is below.
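For reference, a minimal sketch (upstream MLIR APIs; module and ctx assumed in scope) of the folding the reordered canonicalizer performs on IR like the above:

    #include "mlir/Dialect/SCF/IR/SCF.h"
    #include "mlir/Transforms/GreedyPatternRewriteDriver.h"

    // Collect scf.if's canonicalization patterns and apply them greedily.
    // With the constant-true %46 as its condition, the scf.if above folds
    // away, leaving the llvm.load to define %48 directly.
    mlir::RewritePatternSet patterns(ctx);
    mlir::scf::IfOp::getCanonicalizationPatterns(patterns, ctx);
    (void)mlir::applyPatternsAndFoldGreedily(module, std::move(patterns));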

@jlebar jlebar enabled auto-merge (squash) March 30, 2024 19:02
@jlebar jlebar merged commit 45fff31 into triton-lang:main Mar 30, 2024
5 checks passed
@antiagainst antiagainst deleted the amd-opt branch March 30, 2024 19:04
ptillet pushed a commit that referenced this pull request Apr 1, 2024