Relax check for synchronous dots in matmul loop pipeliner. #3353
Previously we were checking for a sync dot at the beginning or end of the loop; if we found one, we could omit the final async wait in the loop. But actually this is too strict: A sync dot *anywhere* inside the loop is sufficient (sketch of the proof in the code). GPC: relax-sync-dot-check
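The difference between the two checks can be illustrated with a toy model. This is only a sketch under stated assumptions: `OpKind`, `hasSyncDotAtLoopBoundary`, and `hasSyncDotAnywhere` are hypothetical names invented for illustration, the loop body is modeled as a flat list of ops, and none of this is the pipeliner's actual code or IR.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical toy model of the ops in a loop body (not Triton's IR).
enum class OpKind { SyncDot, AsyncDot, Other };

// Simplified model of the old, strict rule: a sync dot only counts if it is
// the first or the last op in the loop body.
bool hasSyncDotAtLoopBoundary(const std::vector<OpKind>& body) {
  if (body.empty()) return false;
  return body.front() == OpKind::SyncDot || body.back() == OpKind::SyncDot;
}

// Simplified model of the relaxed rule: a sync dot anywhere in the loop body
// is enough.
bool hasSyncDotAnywhere(const std::vector<OpKind>& body) {
  return std::any_of(body.begin(), body.end(),
                     [](OpKind kind) { return kind == OpKind::SyncDot; });
}
```

For a body such as `{AsyncDot, SyncDot, Other}`, the strict check in this model returns false while the relaxed check returns true, which is the case the PR description says was previously handled too conservatively.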
620168a
to
88c3965
@@ -1080,80 +1080,54 @@ static std::optional<int> dotCanBeProperlyAsync(ttng::DotAsyncOp dotOp,
}

// If necessary, insert a dot-wait inside the loop, waiting for the results of
// the async dots from iteration i-1 to complete. (We pipeline to depth 2, so
// there are at most 2 copies of each dot_async in flight at a time.)
// the properly-async dots from iteration i-1 to complete. (We pipeline to
I have never seen this comment. This does not seem to be correct, right? @ThomasRaoux
(It is not really related to this PR.)
what's not correct about this?
I think it is fine. @pawelszczerbuk, note that this is done as a post-processing step on the pipelining.
Oh, OK, I got confused by it saying that we pipeline to a depth of 2, while in the main pipeliner I think this really depends on num_stages.
Right, for the post-processing we hard-code 2 stages for the async mma copy (that's why we need to add an extra buffer for mmav3).
Thank you for the review!
…ng#3353) Previously we were checking for a sync dot at the beginning or end of the loop; if we found one, we could omit the final async wait in the loop. But actually this is too strict: A sync dot *anywhere* inside the loop is sufficient (sketch of the proof in the code).