-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Backend] hoist ttg.local_alloc only if its valid #3449
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for catching this. Can you change the condition as suggested.
if (auto allocOp = dyn_cast<ttg::LocalAllocOp>(op)) { | ||
// We cannot hoist the allocOp if any of its operands | ||
// is declared in the forOp's body. | ||
if (llvm::any_of(allocOp->getOperands(), | ||
[&](Value operand) { | ||
return operand.getParentBlock() == forOp.getBody(); | ||
})) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually can we skip any allocation that has an init value? I thought I did that but looks like I missed it.
Basically:
if (!allocOp.getInit())
continue;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the quick review, Thomas! Would it be possible that the init
argument of the allocOp
comes from the parent scope of the forOp
? If that was the case, we might want something like the following?
auto init = allocOp.getInit();
if (!init || init.getParentBlock() != forOp.getBody())
toHoist.push_back(&op);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it would work but I don't think that's what we want to do. The idea here is to hoist the alloc created by the inner loop pipelining, we shouldn't try to move the other alloc ops.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Make sense. Updated the PR. Thanks!
We should only hoist triton_gpu.local_alloc created by the inner loop pipelining. This PR fixed issue triton-lang#3431
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
We cannot hoist a triton_gpu.local_alloc op if its init argument is declared in the same scope as local_alloc's containing for loop.
We cannot hoist a triton_gpu.local_alloc op if its init argument is declared in the same scope as local_alloc's containing for loop.