Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Backend] Only consider valid LoadOps for pipelining #3517

Merged
merged 2 commits into from
Apr 3, 2024

Conversation

chenyang78
Copy link
Contributor

@chenyang78 chenyang78 commented Mar 31, 2024

This PR fixes #3454

Previously, we included all LoadOp(s) into opInfo, whose contents are used for pipelining. However, this approach causes issues when we decide to not pipeline some LoadOps. Particularly, those LoadOps that are excluded will end up with an inconsistent state which violates the relevant integrity checks, and cause ICEs.

We fix the issue by only including LoadOps that are valid for pipelining.

@chenyang78 chenyang78 requested a review from ptillet as a code owner March 31, 2024 09:57
@@ -403,6 +407,17 @@ collectOpsToPipeline(scf::ForOp forOp,
// fixed.
if (!loadInfo.sharedEncoding)
continue;
} else if (isa<tt::LoadOp>(distAndUse.second)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is still one problem left even after this fix:
We calculate stagesBetweenLoads based on the longest chain that is present in loadOpToDistAndUse. If we can "break the chain" after loadOpToDistAndUse is filled, this value may be incorrect in some cases.
This being said, this fix will help most cases, and does not really make the situation worse, we already can have improper loadOpToDistAndUse because we are dropping the loads here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Thanks for the explanation, @pawelszczerbuk !

@pawelszczerbuk
Copy link
Contributor

Thanks for the change @chenyang78 ! This will resolve some of the issues, but as I have mentioned in the inline comment, we still may hit some other problems in this code. I think we need to have single place where we drop the loads and can know the length of the "longest chain" of loads. Let's go with this change for now, and clean this code a bit soon.

@pawelszczerbuk pawelszczerbuk enabled auto-merge (squash) April 2, 2024 17:18
This PR fixed triton-lang#3454

Previously, we include all LoadOp(s) into opInfo, which content is
used for pipelining. However, this approach may cause issues when
we decide to not pipeline some LoadOps. Particularly, those
LoadOps being excluded will end up with an inconsistent state which
violates the relevant integraty checks, and hence cause ICEs.

We fixed the issue by only including LoadOps that are valid
for pipelining.
auto-merge was automatically disabled April 2, 2024 19:03

Head branch was pushed to by a user without write access

@chenyang78
Copy link
Contributor Author

Thanks for the change @chenyang78 ! This will resolve some of the issues, but as I have mentioned in the inline comment, we still may hit some other problems in this code. I think we need to have single place where we drop the loads and can know the length of the "longest chain" of loads. Let's go with this change for now, and clean this code a bit soon.

Make sense. Thanks for taking time reviewing the PR, @pawelszczerbuk! Seems the CI got stuck somehow. Do we need any approval to run the CI jobs? If so, could you help? Thanks!

@htyu htyu merged commit a23ee7f into triton-lang:main Apr 3, 2024
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

MatmulLoopPipeline assertion failure: Assertion `!isa<tt::LoadOp>(iter->first)' failed
3 participants