[DietCode] Local Padding #11793
Also cc @Hzfengsy @vinx13 @spectrometerHBH @masahi
Thanks for sending out the PR! We might want to deliberate on the implementation to ensure its correctness. To be clear, running a pass inside a Schedule class will invalidate all the scheduling states and thus lead to incorrect results.
I'd love to clarify some concepts so that we can be on the same page:
- postproc is one of the stages in meta-schedule, which also means it is part of Schedule
- During Scheduling, we can only mutate mod with schedule primitives. IRMutator is not allowed.
- Schedule transformations with primitives can be traced by printing the tracing path.
However, this PR tries to directly mutate mod with IRMutator in postproc (also at schedule stages). I recommend doing the following steps and we can move on:
- Add a schedule primitive called "padding", which can pad local buffers during schedule
- Call the padding primitive at postproc
Please let me know if you have any questions @ArmageddonKnight
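To illustrate why mutations should go through primitives, here is a minimal, self-contained sketch. It does not use TVM's API: `ToyModule`, `ToySchedule`, `PadLocalBuffer`, `MutateUntraced`, and `Replay` are hypothetical names made up for this illustration. It only shows that a change made through a primitive is recorded in the trace and can be replayed, while a direct rewrite is not.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for an IR module: just a list of statement strings.
// (Hypothetical; not TVM's IRModule.)
struct ToyModule {
  std::vector<std::string> stmts;
};

// Toy schedule: every primitive mutates the module AND appends to the trace,
// so the sequence of transformations can be printed or replayed later.
class ToySchedule {
 public:
  explicit ToySchedule(ToyModule mod) : mod_(std::move(mod)) {}

  // Hypothetical "padding" primitive, named after the suggestion above.
  void PadLocalBuffer(const std::string& buffer) {
    mod_.stmts.push_back("pad(" + buffer + ")");
    trace_.push_back("PadLocalBuffer(" + buffer + ")");
  }

  // Models an untraced IRMutator-style rewrite:
  // the module changes but the trace does not.
  void MutateUntraced(const std::string& stmt) { mod_.stmts.push_back(stmt); }

  const ToyModule& mod() const { return mod_; }
  const std::vector<std::string>& trace() const { return trace_; }

 private:
  ToyModule mod_;
  std::vector<std::string> trace_;
};

// Replays a trace on the original module. The result matches the scheduled
// module only if every mutation went through a primitive.
inline ToyModule Replay(const ToyModule& original,
                        const std::vector<std::string>& trace) {
  ToySchedule sch(original);
  const std::string prefix = "PadLocalBuffer(";
  for (const std::string& step : trace) {
    // Only one primitive exists in this sketch.
    assert(step.rfind(prefix, 0) == 0 && step.back() == ')');
    sch.PadLocalBuffer(
        step.substr(prefix.size(), step.size() - prefix.size() - 1));
  }
  return sch.mod();
}
```

In this sketch, replaying the trace of a schedule that only called `PadLocalBuffer` reproduces its module, whereas a schedule changed through `MutateUntraced` has an empty trace and cannot be reproduced from it.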
Per offline discussion with @junrushao1994 and @ArmageddonKnight, here are the current action items:
e4d5ee8 to c27bd64
@junrushao1994 @Hzfengsy I have finished the revision. Please have a second look when you have time. Also cc @comaniac
It seems that the CI build has stopped for some reason (I am unable to query the current CI status). Would it be possible to re-trigger the CI?
@tvm-bot rerun
src/tir/transforms/local_pad.cc (Outdated)
    PrimExpr predicate_lhs_;

    friend class LocalPadder;
  };
Generally, I would love to propose that we restructure the logic of this class a little bit.
Looks like the analyzer is interested in the following patterns:
    if A </<=/>=/> X:
        B[Y] = ...
If so, there isn't much reason to use a visitor pattern, because no recursion actually happens. Instead, let's go with a plainer and more readable style, for example:
    // inputs:
    IfThenElse if_then_else;
    // extract the lhs & rhs of the if-condition
    PrimExpr predicate_lhs{nullptr};
    PrimExpr predicate_rhs{nullptr};
    if (const auto* op = if_then_else->condition.as<LENode>()) {
      predicate_lhs = op->a;
      predicate_rhs = op->b;  // rhs is the second operand, not `a` again
    } else if (...) {
      // use a macro or something to deal with LT, GE, GT
    }
    // then let's analyze the body statement
    const BufferStoreNode* buffer_store = if_then_else->then_case.as<BufferStoreNode>();
    ICHECK(buffer_store);
    if (StructuralEqual()(buffer_store->indices[0], predicate_lhs)) {
      ...  // some logic here
    } else {
      ...  // some logic here
    }
I am afraid we cannot do it directly like this. The predicates are usually combined into a single condition, so we need some way of splitting them. The implementation you propose might not be able to handle situations like the following:
    // before transformation
    if (inlineable_predicate1 && non_inlineable_predicate2 && inlineable_predicate3)
      A_shared[...] = A[...];
    // after transformation
    if (non_inlineable_predicate2)
      A_shared[...] = inlineable_predicate1 && inlineable_predicate3 ? A[...] : padded_value;
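The splitting step in the case above can be sketched roughly as follows. This is a toy illustration rather than the code in this PR: `ToyPredicate`, `SplitResult`, `SplitConjunction`, `Join`, and `Rewrite` are hypothetical names, predicates are plain strings with a precomputed `inlineable` flag instead of TIR expressions, and how a predicate is classified as inlineable is assumed to be decided elsewhere.

```cpp
#include <string>
#include <vector>

// Toy predicate: a name plus a flag saying whether it may be folded into the
// stored value. (Hypothetical; a real pass would analyze a TIR expression.)
struct ToyPredicate {
  std::string name;
  bool inlineable;
};

struct SplitResult {
  std::vector<ToyPredicate> inlined;   // move into the select on the right-hand side
  std::vector<ToyPredicate> residual;  // stay in the surrounding if-condition
};

// Partitions the conjuncts of `p1 && p2 && ...`, preserving their order.
inline SplitResult SplitConjunction(const std::vector<ToyPredicate>& conjuncts) {
  SplitResult result;
  for (const ToyPredicate& p : conjuncts) {
    (p.inlineable ? result.inlined : result.residual).push_back(p);
  }
  return result;
}

// Joins predicate names with " && ".
inline std::string Join(const std::vector<ToyPredicate>& preds) {
  std::string out;
  for (const ToyPredicate& p : preds) {
    if (!out.empty()) out += " && ";
    out += p.name;
  }
  return out;
}

// Renders the rewritten statement in the same pseudo-syntax as the example.
inline std::string Rewrite(const std::vector<ToyPredicate>& conjuncts,
                           const std::string& store, const std::string& load) {
  const SplitResult s = SplitConjunction(conjuncts);
  // With nothing to inline, the store keeps its original right-hand side.
  std::string stmt =
      s.inlined.empty()
          ? store + " = " + load + ";"
          : store + " = " + Join(s.inlined) + " ? " + load + " : padded_value;";
  if (s.residual.empty()) return stmt;
  return "if (" + Join(s.residual) + ") " + stmt;
}
```

Keeping the non-inlineable conjuncts in the guard and moving only the inlineable ones into the select mirrors the before/after pair above; a single-comparison pattern match would not cover this three-conjunct case.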
The PR has been going in a positive direction
hi, @ArmageddonKnight
@renfeier The reason is that we are refactoring the implementation, so the pass itself is temporarily commented out. Sorry, I was quite busy with university business; I will finish the refactoring soon.
@junrushao1994 As was discussed, I have fixed the implementation. Please review it again.
Hmm ... seems that the Cortex CI pipelines are always interrupted for some reason, and this is happening on the main branch as well.
20c9054 to 80bb3ee
@junrushao1994 The refactored implementation has passed the CI tests. Please review it when you have time available. Thanks.
0ce3787 to 5fa292a
Hi @junrushao, it has been some time since this PR was submitted. May I know whether there are any updates, and whether further changes are required?
@ArmageddonKnight @junrushao What is the status of this PR or DietCode upstreaming in general? I'm interested in dynamic shape tuning, and I can help this effort.
@masahi PadEinsum can achieve something similar since the padding is in the shared memory
This PR is for the code generation changes required for dynamic MetaScheduler (see apache/tvm-rfcs#72 for the RFC, #11516 for the tracking issue describing the changes). Any feedback or comments are welcome.
FYI, @comaniac @junrushao1994