Relax check for synchronous dots in matmul loop pipeliner. #3353

jlebar · 2024-03-12T17:47:17Z

Relax check for synchronous dots in matmul loop pipeliner.

Previously we were checking for a sync dot at the beginning or end of the loop;
if we found one, we could omit the final async wait in the loop. But actually
this is too strict: A sync dot anywhere inside the loop is sufficient (sketch
of the proof in the code).
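
As a rough illustration of the difference between the old and the relaxed condition, here is a minimal sketch over a toy model of a loop body. `OpKind`, `hasSyncDotAtBoundary`, and `hasSyncDotAnywhere` are hypothetical names invented for this example; they are not the identifiers used in `MatmulLoopPipeline.cpp`, and the real pass operates on MLIR ops rather than a vector of enums.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for the ops inside a loop body (not the real MLIR ops).
enum class OpKind { SyncDot, AsyncDot, Other };

// Old, stricter condition as described above: only a sync dot at the very
// beginning or very end of the loop body counts.
bool hasSyncDotAtBoundary(const std::vector<OpKind>& body) {
  if (body.empty()) return false;
  return body.front() == OpKind::SyncDot || body.back() == OpKind::SyncDot;
}

// Relaxed condition: a sync dot anywhere inside the loop body counts.
bool hasSyncDotAnywhere(const std::vector<OpKind>& body) {
  return std::any_of(body.begin(), body.end(),
                     [](OpKind k) { return k == OpKind::SyncDot; });
}
```

In this model, a body such as `{AsyncDot, SyncDot, Other}` fails the boundary check but passes the relaxed one, which is the kind of loop where the final async wait can now be omitted.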

PR chain

👉 Relax check for synchronous dots in matmul loop pipeliner. #3353 👈 YOU ARE HERE

GPC: relax-sync-dot-check

pawelszczerbuk · 2024-03-12T17:50:55Z

lib/Dialect/TritonGPU/Transforms/Pipeliner/MatmulLoopPipeline.cpp

@@ -1080,80 +1080,54 @@ static std::optional<int> dotCanBeProperlyAsync(ttng::DotAsyncOp dotOp,
 }

 // If necessary, insert a dot-wait inside the loop, waiting for the results of
-// the async dots from iteration i-1 to complete.  (We pipeline to depth 2, so
-// there are at most 2 copies of each dot_async in flight at a time.)
+// the properly-async dots from iteration i-1 to complete.  (We pipeline to


I have never seen this comment. This does not seem to be correct, right? @ThomasRaoux
(It is not really related to this PR)

what's not correct about this?

I think it is fine. @pawelszczerbuk, note that this is done as a post-processing step on the pipelining.

Oh, OK. I got confused by it saying that we pipeline to a depth of 2, while in the main pipeliner I think this really depends on num_stages.

Right, for the post-processing we hard-code 2 stages for the async mma copy (that's why we need to add an extra buffer for mmav3).
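
To make the "depth 2" remark in the code comment concrete, here is a small toy simulation. It assumes a simplified model in which each loop iteration issues one async dot and then waits for the dots issued by iteration i-1. `maxDotsInFlight` is a hypothetical helper written for this illustration, not part of the pipeliner.

```cpp
#include <algorithm>
#include <cassert>
#include <deque>

// Toy model (an assumption, not the real pass): iteration i issues one async
// dot, then a dot-wait retires every dot issued before iteration i.
// Returns the largest number of dots that were in flight at the same time.
int maxDotsInFlight(int numIterations) {
  std::deque<int> inFlight;  // iteration index of each outstanding dot
  int maxInFlight = 0;
  for (int i = 0; i < numIterations; ++i) {
    inFlight.push_back(i);  // issue the async dot for iteration i
    maxInFlight = std::max(maxInFlight, static_cast<int>(inFlight.size()));
    // Dot-wait: retire the dots from iteration i-1 and earlier.
    while (!inFlight.empty() && inFlight.front() < i) inFlight.pop_front();
  }
  return maxInFlight;
}
```

Under these assumptions the count never exceeds 2 regardless of the trip count, which matches the comment's claim that at most 2 copies of each async dot are in flight at a time.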

jlebar · 2024-03-12T17:59:45Z

Thank you for the review!

jlebar requested a review from ptillet as a code owner March 12, 2024 17:47

jlebar requested a review from ThomasRaoux March 12, 2024 17:48

jlebar force-pushed the dev-jlebar/relax-sync-dot-check branch from 620168a to 88c3965 Compare March 12, 2024 17:50

pawelszczerbuk reviewed Mar 12, 2024

ThomasRaoux approved these changes Mar 12, 2024

jlebar enabled auto-merge (squash) March 12, 2024 17:59

jlebar merged commit 076c80c into main Mar 12, 2024
4 checks passed

jlebar deleted the dev-jlebar/relax-sync-dot-check branch March 12, 2024 18:00
