Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Relax check for synchronous dots in matmul loop pipeliner. #3353

Merged
merged 1 commit into from
Mar 12, 2024

Conversation

jlebar
Copy link
Collaborator

@jlebar jlebar commented Mar 12, 2024

Relax check for synchronous dots in matmul loop pipeliner.

Previously we were checking for a sync dot at the beginning or end of the loop;
if we found one, we could omit the final async wait in the loop. But actually
this is too strict: A sync dot anywhere inside the loop is sufficient (sketch
of the proof in the code).

PR chain

  1. 👉 Relax check for synchronous dots in matmul loop pipeliner. #3353 👈 YOU ARE HERE

@jlebar jlebar requested a review from ptillet as a code owner March 12, 2024 17:47
@jlebar jlebar requested a review from ThomasRaoux March 12, 2024 17:48
Previously we were checking for a sync dot at the beginning or end of the loop;
if we found one, we could omit the final async wait in the loop.  But actually
this is too strict: A sync dot *anywhere* inside the loop is sufficient (sketch
of the proof in the code).

GPC: relax-sync-dot-check
@jlebar jlebar force-pushed the dev-jlebar/relax-sync-dot-check branch from 620168a to 88c3965 Compare March 12, 2024 17:50
@@ -1080,80 +1080,54 @@ static std::optional<int> dotCanBeProperlyAsync(ttng::DotAsyncOp dotOp,
}

// If necessary, insert a dot-wait inside the loop, waiting for the results of
// the async dots from iteration i-1 to complete. (We pipeline to depth 2, so
// there are at most 2 copies of each dot_async in flight at a time.)
// the properly-async dots from iteration i-1 to complete. (We pipeline to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have never saw this comment. This does not seem to be correct, right? @ThomasRaoux
(It is not really related to this PR)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's not correct about this?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is fine, @pawelszczerbuk note that this is done as a post-processing on the pipelining

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, OK, I got confused by it saying that we pipeline to the depth of 2, while in the main pipeliner I think this really depends on the num_stages

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, for the post processing we hard code to 2 stages for the async mma copy (that's why we need to add an extra buffer for mmav3)

@jlebar jlebar enabled auto-merge (squash) March 12, 2024 17:59
@jlebar
Copy link
Collaborator Author

jlebar commented Mar 12, 2024

Thank you for the review!

@jlebar jlebar merged commit 076c80c into main Mar 12, 2024
4 checks passed
@jlebar jlebar deleted the dev-jlebar/relax-sync-dot-check branch March 12, 2024 18:00
htyu pushed a commit to htyu/triton that referenced this pull request Mar 20, 2024
…ng#3353)

Previously we were checking for a sync dot at the beginning or end of
the loop; if we found one, we could omit the final async wait in the loop. But
actually this is too strict: A sync dot *anywhere* inside the loop is sufficient
(sketch of the proof in the code).
karupayun pushed a commit to openxla/triton that referenced this pull request Apr 3, 2024
…ng#3353)

Previously we were checking for a sync dot at the beginning or end of
the loop; if we found one, we could omit the final async wait in the loop. But
actually this is too strict: A sync dot *anywhere* inside the loop is sufficient
(sketch of the proof in the code).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants