
[DietCode] Local Padding #11793

Open
wants to merge 1 commit into base: main
Conversation

ArmageddonKnight
Contributor

This PR is for the code generation changes required for dynamic MetaScheduler (see apache/tvm-rfcs#72 for the RFC, #11516 for the tracking issue describing the changes). Any feedback or comments are welcome.

FYI, @comaniac @junrushao1994

@comaniac
Contributor

Also cc @Hzfengsy @vinx13 @spectrometerHBH @masahi

Member

@junrushao junrushao left a comment


Thanks for sending out the PR! We might want to deliberate on the implementation to ensure its correctness. To be clear, running a pass inside a Schedule class will invalidate all the scheduling states and thus lead to incorrect results.

Member

@Hzfengsy Hzfengsy left a comment


I'd love to clarify some concepts so that we can be on the same page:

  1. postproc is one of the stages in meta-schedule, which also means it is part of Schedule
  2. During Scheduling, we can only mutate mod with schedule primitives. IRMutator is not allowed.
  3. Schedule transformations with primitives can be traced by printing the tracing path.

However, this PR tries to directly mutate mod with an IRMutator in postproc (which is itself part of the scheduling stage). I recommend the following steps so that we can move on (a rough sketch follows below):

  1. Add a schedule primitive called "padding", which can pad local buffers during schedule
  2. Call the padding primitive at postproc

Please let me know if you have any questions @ArmageddonKnight
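For illustration only, here is a minimal sketch of what the two steps above could look like from the schedule side. The primitive name pad_local is a hypothetical placeholder (no such primitive exists in TVM at the time of writing), and the block names are assumptions:

# Hypothetical sketch: apply a padding *schedule primitive* (step 1) from a
# postproc-style function (step 2), so the change is recorded on the schedule's
# trace instead of mutating the IRModule with an IRMutator.
from tvm import tir

def local_padding_postproc(sch: tir.Schedule) -> bool:
    root = sch.get_block("root")            # assumed root block name
    for block in sch.get_child_blocks(root):
        sch.pad_local(block)                # hypothetical primitive from step 1
    return True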

@comaniac
Contributor

Per offline discussion with @junrushao1994 and @ArmageddonKnight, here are the current action items:

  1. The local padding pass will be moved to a TIR transformation, meaning that local padding becomes an implicit transformation similar to loop partitioning. A config will be exposed to control whether to turn it on or off (default off) so that all current workloads remain unchanged (see the usage sketch after this list).
  2. In the local padding implementation, the logic related to var node name hints will be improved to leverage a more reliable factor (e.g., pointer reference).
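To make item 1 concrete, here is a short sketch of toggling such a pass through PassContext. The config key tir.enable_local_pad is the one used later in this thread; sch stands for an already-scheduled tir.Schedule:

# Sketch: enable the (off-by-default) local padding pass for a single build.
import tvm

with tvm.transform.PassContext(config={"tir.enable_local_pad": True}):
    kernel = tvm.build(sch.mod["main"], target="cuda")  # `sch` is a scheduled tir.Schedule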

@ArmageddonKnight ArmageddonKnight force-pushed the bojian/DietCode_Upstreaming branch from e4d5ee8 to c27bd64 on June 23, 2022 at 20:50
@ArmageddonKnight
Contributor Author

@junrushao1994 @Hzfengsy I have finished the revision. Please have a second look when you have time.

Also cc @comaniac

@ArmageddonKnight
Contributor Author

ArmageddonKnight commented Jun 26, 2022

It seems that the CI build has stopped for some reason (I am unable to query the current CI status). Would it be possible to re-trigger the CI?

@ArmageddonKnight
Contributor Author

@tvm-bot rerun

src/tir/transforms/local_pad.cc — several outdated review threads, all resolved
PrimExpr predicate_lhs_;

friend class LocalPadder;
};
Member


Generally, I would love to propose that we restructure the logic of this class a little bit.

Looks like the analyzer is interested in the following patterns:

if A </<=/>=/> X:
  B[Y] = ...

If so, there isn't much reason to use a visitor pattern, because no recursion actually happens. Instead, let's go with a plainer and more readable style, for example:

// inputs:
IfThenElse if_then_else;
// extract the lhs & rhs of the if-condition
PrimExpr predicate_lhs{nullptr};
PrimExpr predicate_rhs{nullptr};
if (const auto *op = if_then_else->condition.as<LENode>()) {
  predicate_lhs = op->a;
  predicate_rhs = op->b;
} else if (...) {
  // use a macro or something to deal with LT, GE, GT
}
// then let's analyze the body statement
const BufferStoreNode* buffer_store = if_then_else->then_case.as<BufferStoreNode>();
ICHECK(buffer_store);
if (StructuralEqual()(buffer_store->indices[0], predicate_lhs)) {
  ... // some logic here
} else {
  ... // some logic here
}

Contributor Author

@ArmageddonKnight ArmageddonKnight Jul 5, 2022


I am afraid we cannot do it quite like this. The reason is that the predicates are usually combined into a single one, so we need some way of splitting them. The implementation you propose might not handle situations like the following:

// before transformation
if (inlineable_predicate1 && non_inlineable_predicate2 && inlineable_predicate3)
  A_shared[...] = A[...];

// after transformation
if (non_inlineable_predicate2)
  A_shared[...] = inlineable_predicate1 && inlineable_predicate3 ? A[...] : padded_value;
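As an editorial aside, here is a minimal sketch (not the PR's actual code) of how such a combined predicate could be split into conjuncts before deciding which ones are inlineable, assuming the condition is a nest of tir.And nodes:

from tvm import tir

def collect_conjuncts(pred):
    """Recursively flatten an `And` chain into a list of individual predicates."""
    if isinstance(pred, tir.And):
        return collect_conjuncts(pred.a) + collect_conjuncts(pred.b)
    return [pred]

# The inlineable conjuncts can then be recombined with tir.all(...) and folded
# into the store's value, while the remaining ones stay on the outer if-condition.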

@junrushao junrushao dismissed their stale review July 4, 2022 01:46

The PR has been going in a positive direction

@renfeier

Hi @ArmageddonKnight,
It seems the TVM transform config "tir.enable_local_pad" does not work: the kernel source code built from the same schedule is identical whether the config is set to true or false. I am using the test example you uploaded before; the example code is shown below:

# Excerpt from the test example referenced above; `Dense_960x770x2304` and
# `sample_dense_sched` are defined in that file.
import numpy as np
import tvm
import tvm.testing
from tvm.tir import Schedule

def save_kernel_source(kernel, log_kernel_filename):
    kernel_src = kernel.imported_modules[0].get_source()
    if log_kernel_filename is not None:
        with open(log_kernel_filename, 'w') as fout:
            fout.write("{}".format(kernel_src))
    else:
        print("{}".format(kernel_src))

@tvm.testing.requires_gpu
@tvm.testing.requires_cuda
def test_dense_local_padding():
    """
    Test that local padding is delivering the correct compute outcome.
    """
    x_np = np.random.uniform(-0.1, 0.1, size=(960, 770)).astype(np.float32)
    w_np = np.random.uniform(-0.1, 0.1, size=(770, 2304)).astype(np.float32)
    y_np = np.matmul(x_np, w_np)
    y_empty = np.empty(shape=y_np.shape, dtype=y_np.dtype)
    tir_sched = Schedule(Dense_960x770x2304)
    sample_dense_sched(tir_sched)
    with tvm.transform.PassContext(config={"tir.enable_local_pad": False}):
        nopad_cuda_kernel = tvm.build(tir_sched.mod["main"], [], target="cuda")
        save_kernel_source(nopad_cuda_kernel, "nolocalpad_kernel.cu")
    with tvm.transform.PassContext(config={"tir.enable_local_pad": True}):
        cuda_kernel = tvm.build(tir_sched.mod["main"], [], target="cuda")
        save_kernel_source(cuda_kernel, "localpad_kernel.cu")

    cuda_ctx = tvm.cuda()
    module_data = [x_np, w_np, y_empty]
    module_data = [tvm.nd.array(d, device=cuda_ctx) for d in module_data]
    cuda_kernel(*module_data)
    np.testing.assert_allclose(module_data[-1].numpy(), y_np, atol=1e-3, rtol=1e-3)

The generated localpad_kernel.cu is identical to nolocalpad_kernel.cu.

@ArmageddonKnight
Contributor Author

@renfeier The reason is that we are refactoring the implementation, so the pass itself is temporarily commented out. Sorry, I have been quite busy with university business; I will finish the refactoring soon.

@renfeier

@ArmageddonKnight
Thank you for the prompt reply. Looking forward to your update

@ArmageddonKnight
Contributor Author

ArmageddonKnight commented Aug 6, 2022

@junrushao1994 As discussed, I have fixed the implementation. Please review it again.

@ArmageddonKnight
Contributor Author

ArmageddonKnight commented Aug 8, 2022

Hmm... it seems that the Cortex CI pipelines keep getting interrupted for some reason, and this is happening on the main branch as well.

@ArmageddonKnight ArmageddonKnight force-pushed the bojian/DietCode_Upstreaming branch from 20c9054 to 80bb3ee on August 8, 2022 at 23:38
@ArmageddonKnight
Contributor Author

@junrushao1994 The refactored implementation has passed the CI tests. Please review it when you have time. Thanks.

@ArmageddonKnight ArmageddonKnight force-pushed the bojian/DietCode_Upstreaming branch from 0ce3787 to 5fa292a on August 26, 2022 at 05:10
@ArmageddonKnight
Contributor Author

ArmageddonKnight commented Aug 30, 2022

Hi @junrushao, it has been some time since this PR was submitted. May I know whether there are any updates on it, and whether further changes are required?

@areusch areusch added and subsequently removed the needs-triage label (PRs or issues that need to be investigated by maintainers to find the right assignees) on Oct 19, 2022
@masahi
Member

masahi commented Dec 12, 2022

@ArmageddonKnight @junrushao What is the status of this PR or DietCode upstreaming in general? I'm interested in dynamic shape tuning, and I can help this effort.

@masahi
Member

masahi commented Dec 13, 2022

This looks similar to #12750, maybe we don't need this? cc @vinx13

@vinx13
Member

vinx13 commented Dec 13, 2022

@masahi PadEinsum can achieve something similar, since the padding is done in shared memory.
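For context, here is a rough sketch of the PadEinsum alternative from #12750 that vinx13 refers to; the block name and padding factors below are illustrative assumptions, not values taken from this PR:

from tvm import tir

def pad_dense_block(sch: tir.Schedule) -> None:
    block = sch.get_block("dense")       # assumed block name
    # Pad the block's einsum iteration domain according to the given factors
    # (see #12750 for the exact semantics of the padding argument).
    sch.pad_einsum(block, [32, 32, 32])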
