[Backend] Allow layout propagation through TransOp. #3316

Merged
jlebar merged 1 commit into main from dev-jlebar/trans-op
Mar 8, 2024

jlebar
Collaborator

@jlebar jlebar commented Mar 7, 2024

[NVGPU] Allow layout propagation through TransOp.

This is a revival of #3097, which I now
realize I need for some int4 matmul work I'm doing.

Previously this caused performance problems / running out of shmem on A100, but
it seems that #3261 (or perhaps something
else) fixed it. This now has no performance delta on our internal performance
tests on A100 or H100.

PR chain

  1. 👉 [Backend] Allow layout propagation through TransOp. #3316 👈 YOU ARE HERE

@jlebar jlebar requested a review from ptillet as a code owner March 7, 2024 23:58
@ThomasRaoux
Collaborator

can we add a simple lit test that does forward propagation:

  %c = convert...
  %t = trans %c
  tt.return %t

that should move the convert after the trans
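As a rough illustration of what "move the convert after" means, here is a toy Python model of the rewrite. It is only a sketch, not Triton's actual pass: the `Op` class and `sink_convert_below_trans` function are hypothetical names invented for this example, the ops carry no layout information, and it assumes the convert's only user is the trans. The real pass works on MLIR and would also have to work out which layout the transposed value ends up with.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """Hypothetical stand-in for a single-operand IR op (not a Triton class)."""
    name: str     # "convert" or "trans" in this toy model
    result: str   # SSA-style result name, e.g. "%c"
    operand: str  # name of the single input value


def sink_convert_below_trans(ops):
    """Rewrite `convert -> trans` into `trans -> convert`.

    Assumes the convert result is used only by the trans that follows it.
    """
    ops = list(ops)
    for i in range(len(ops) - 1):
        first, second = ops[i], ops[i + 1]
        if (first.name == "convert" and second.name == "trans"
                and second.operand == first.result):
            # The transpose now reads the convert's input directly.
            new_trans = Op("trans", first.result, first.operand)
            # The convert is applied to the transposed value instead. It keeps
            # the old trans result name so later users (e.g. the return) are
            # unaffected.
            new_convert = Op("convert", second.result, new_trans.result)
            ops[i], ops[i + 1] = new_trans, new_convert
    return ops


before = [Op("convert", "%c", "%x"), Op("trans", "%t", "%c")]
after = sink_convert_below_trans(before)
for op in after:
    print(f"{op.result} = {op.name} {op.operand}")
```

In this model the function's returned value is still `%t`, but it is now produced by the convert, which sits after the trans.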

@jlebar
Collaborator Author

jlebar commented Mar 8, 2024

Absolutely, I intended to add a test but wanted to check that the tests were passing first. Looks like they are, so yay. :)

@jlebar jlebar force-pushed the dev-jlebar/trans-op branch from b0be26e to 47d1dca Compare March 8, 2024 07:35
@jlebar jlebar changed the base branch from main to dev-jlebar/propagate-dot-wait March 8, 2024 07:35
@jlebar jlebar force-pushed the dev-jlebar/trans-op branch from 47d1dca to 663662a Compare March 8, 2024 07:38
@jlebar jlebar force-pushed the dev-jlebar/propagate-dot-wait branch 2 times, most recently from b2cae8a to 5fd0d0c Compare March 8, 2024 07:42
@jlebar jlebar force-pushed the dev-jlebar/trans-op branch 2 times, most recently from ff2cbb6 to 4462f94 Compare March 8, 2024 07:50
@jlebar jlebar requested a review from ThomasRaoux March 8, 2024 07:50
@jlebar
Collaborator Author

jlebar commented Mar 8, 2024

Added a test; phal, Thomas.

Base automatically changed from dev-jlebar/propagate-dot-wait to main March 8, 2024 07:51
@jlebar jlebar force-pushed the dev-jlebar/trans-op branch from 4462f94 to 6aeae0a Compare March 8, 2024 19:12
@jlebar
Collaborator Author

jlebar commented Mar 8, 2024

Thanks, Thomas. PHAL.

Collaborator

@ThomasRaoux ThomasRaoux left a comment

LGTM

@jlebar jlebar merged commit 3d2a2c2 into main Mar 8, 2024
4 checks passed
@jlebar jlebar deleted the dev-jlebar/trans-op branch March 8, 2024 19:42
htyu pushed a commit to htyu/triton that referenced this pull request Mar 20, 2024
karupayun pushed a commit to openxla/triton that referenced this pull request Apr 3, 2024