
[DietCode] Local Padding #11793

Open
wants to merge 1 commit into base: main
Conversation

ArmageddonKnight
Contributor

This PR is for the code generation changes required for dynamic MetaScheduler (see apache/tvm-rfcs#72 for the RFC, #11516 for the tracking issue describing the changes). Any feedback or comments are welcome.

FYI, @comaniac @junrushao1994

@comaniac
Contributor

Also cc @Hzfengsy @vinx13 @spectrometerHBH @masahi

Member

@junrushao junrushao left a comment


Thanks for sending out the PR! We might want to deliberate on the implementation to ensure its correctness. To be clear, running a pass inside a Schedule class will invalidate all the scheduling states and thus lead to incorrect results.

Member

@Hzfengsy Hzfengsy left a comment


I'd love to clarify some concepts so that we can be on the same page:

  1. postproc is one of the stages in meta-schedule, which also means it is part of Schedule
  2. During Scheduling, we can only mutate mod with schedule primitives. IRMutator is not allowed.
  3. Schedule transformations with primitives can be traced by printing the tracing path.

However, this PR tries to directly mutate mod with an IRMutator in postproc (which is itself part of the scheduling stage). I recommend the following steps so that we can move on (a rough sketch follows below):

  1. Add a schedule primitive called "padding", which can pad local buffers during schedule
  2. Call the padding primitive at postproc

Please let me know if you have any questions @ArmageddonKnight
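For illustration only, here is a minimal sketch of what the two steps above could look like from the schedule side. The primitive name pad_local is a hypothetical placeholder (no such primitive exists in TVM at the time of writing), and the block names are assumptions:

# Hypothetical sketch: apply a padding *schedule primitive* (step 1) from a
# postproc-style function (step 2), so the change is recorded on the schedule's
# trace instead of mutating the IRModule with an IRMutator.
from tvm import tir

def local_padding_postproc(sch: tir.Schedule) -> bool:
    root = sch.get_block("root")            # assumed root block name
    for block in sch.get_child_blocks(root):
        sch.pad_local(block)                # hypothetical primitive from step 1
    return True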

@comaniac
Contributor

Per offline discussion with @junrushao1994 and @ArmageddonKnight, here are the current action items:

  1. The local padding pass will be moved to a TIR transformation, meaning that local padding becomes an implicit transformation similar to loop partitioning. A config will be exposed to control whether to turn it on or off (default off) so that all current workloads remain unchanged (see the usage sketch after this list).
  2. In the local padding implementation, the logic related to var node name hints will be improved to leverage a more reliable factor (e.g., pointer reference).
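To make item 1 concrete, here is a short sketch of toggling such a pass through PassContext. The config key tir.enable_local_pad is the one used later in this thread; sch stands for an already-scheduled tir.Schedule:

# Sketch: enable the (off-by-default) local padding pass for a single build.
import tvm

with tvm.transform.PassContext(config={"tir.enable_local_pad": True}):
    kernel = tvm.build(sch.mod["main"], target="cuda")  # `sch` is a scheduled tir.Schedule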

@ArmageddonKnight ArmageddonKnight force-pushed the bojian/DietCode_Upstreaming branch from e4d5ee8 to c27bd64 on June 23, 2022 at 20:50
@ArmageddonKnight
Contributor Author

@junrushao1994 @Hzfengsy I have finished the revision. Please have a second look when you have time.

Also cc @comaniac

@ArmageddonKnight
Contributor Author

ArmageddonKnight commented Jun 26, 2022

It seems that the CI build has stopped for some reason (I am unable to query the current CI status). Would it be possible to re-trigger the CI?

@ArmageddonKnight
Contributor Author

@tvm-bot rerun

src/tir/transforms/local_pad.cc — several outdated review threads, all resolved
PrimExpr predicate_lhs_;

friend class LocalPadder;
};
Member


Generally, I would love to propose that we restructure the logic of this class a little bit.

Looks like the analyzer is interested in the following patterns:

if A </<=/>=/> X:
  B[Y] = ...

If so, there isn't much reason to use a visitor pattern, because no recursion actually happens. Instead, let's go with a plainer and more readable style, for example:

// inputs:
IfThenElse if_then_else;
// extract the lhs & rhs of the if-condition
PrimExpr predicate_lhs{nullptr};
PrimExpr predicate_rhs{nullptr};
if (const auto *op = if_then_else->condition.as<LENode>()) {
  predicate_lhs = op->a;
  predicate_rhs = op->b;
} else if (...) {
  // use a macro or something to deal with LT, GE, GT
}
// then let's analyze the body statement
const BufferStoreNode* buffer_store = if_then_else->then_case.as<BufferStoreNode>();
ICHECK(buffer_store);
if (StructuralEqual()(buffer_store->indices[0], predicate_lhs)) {
  ... // some logic here
} else {
  ... // some logic here
}

Contributor Author

@ArmageddonKnight ArmageddonKnight Jul 5, 2022


I am afraid we cannot do it quite like this. The reason is that the predicates are usually combined into a single one, so we need some way of splitting them. The implementation you propose might not handle situations like the following:

// before transformation
if (inlineable_predicate1 && non_inlineable_predicate2 && inlineable_predicate3)
  A_shared[...] = A[...];

// after transformation
if (non_inlineable_predicate2)
  A_shared[...] = inlineable_predicate1 && inlineable_predicate3 ? A[...] : padded_value;
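As an editorial aside, here is a minimal sketch (not the PR's actual code) of how such a combined predicate could be split into conjuncts before deciding which ones are inlineable, assuming the condition is a nest of tir.And nodes:

from tvm import tir

def collect_conjuncts(pred):
    """Recursively flatten an `And` chain into a list of individual predicates."""
    if isinstance(pred, tir.And):
        return collect_conjuncts(pred.a) + collect_conjuncts(pred.b)
    return [pred]

# The inlineable conjuncts can then be recombined with tir.all(...) and folded
# into the store's value, while the remaining ones stay on the outer if-condition.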

@junrushao junrushao dismissed their stale review July 4, 2022 01:46

The PR has been going in a positive direction

@renfeier

Hi @ArmageddonKnight,
It seems the TVM transform config "tir.enable_local_pad" does not work: the kernel source code built from the same schedule is identical whether the config is set to true or false. I am using the test example you uploaded before; the example code is shown below:

# Excerpt from the test example referenced above; `Dense_960x770x2304` and
# `sample_dense_sched` are defined in that file.
import numpy as np
import tvm
import tvm.testing
from tvm.tir import Schedule

def save_kernel_source(kernel, log_kernel_filename):
    kernel_src = kernel.imported_modules[0].get_source()
    if log_kernel_filename is not None:
        with open(log_kernel_filename, 'w') as fout:
            fout.write("{}".format(kernel_src))
    else:
        print("{}".format(kernel_src))

@tvm.testing.requires_gpu
@tvm.testing.requires_cuda
def test_dense_local_padding():
    """
    Test that local padding is delivering the correct compute outcome.
    """
    x_np = np.random.uniform(-0.1, 0.1, size=(960, 770)).astype(np.float32)
    w_np = np.random.uniform(-0.1, 0.1, size=(770, 2304)).astype(np.float32)
    y_np = np.matmul(x_np, w_np)
    y_empty = np.empty(shape=y_np.shape, dtype=y_np.dtype)
    tir_sched = Schedule(Dense_960x770x2304)
    sample_dense_sched(tir_sched)
    with tvm.transform.PassContext(config={"tir.enable_local_pad": False}):
        nopad_cuda_kernel = tvm.build(tir_sched.mod["main"], [], target="cuda")
        save_kernel_source(nopad_cuda_kernel, "nolocalpad_kernel.cu")
    with tvm.transform.PassContext(config={"tir.enable_local_pad": True}):
        cuda_kernel = tvm.build(tir_sched.mod["main"], [], target="cuda")
        save_kernel_source(cuda_kernel, "localpad_kernel.cu")

    cuda_ctx = tvm.cuda()
    module_data = [x_np, w_np, y_empty]
    module_data = [tvm.nd.array(d, device=cuda_ctx) for d in module_data]
    cuda_kernel(*module_data)
    np.testing.assert_allclose(module_data[-1].numpy(), y_np, atol=1e-3, rtol=1e-3)

The generated localpad_kernel.cu is identical to nolocalpad_kernel.cu.

@ArmageddonKnight
Contributor Author

@renfeier The reason is that we are refactoring the implementation, so the pass itself is temporarily commented out. Sorry, I have been quite busy with university business; I will finish the refactoring soon.

@renfeier

@ArmageddonKnight
Thank you for the prompt reply. Looking forward to your update

@ArmageddonKnight
Contributor Author

ArmageddonKnight commented Aug 6, 2022

@junrushao1994 As discussed, I have fixed the implementation. Please review it again.

@ArmageddonKnight
Contributor Author

ArmageddonKnight commented Aug 8, 2022

Hmm... it seems that the Cortex CI pipelines keep getting interrupted for some reason, and this is happening on the main branch as well.

@ArmageddonKnight ArmageddonKnight force-pushed the bojian/DietCode_Upstreaming branch from 20c9054 to 80bb3ee on August 8, 2022 at 23:38
@ArmageddonKnight
Contributor Author

@junrushao1994 The refactored implementation has passed the CI tests. Please review it when you have time. Thanks.

@ArmageddonKnight ArmageddonKnight force-pushed the bojian/DietCode_Upstreaming branch from 0ce3787 to 5fa292a on August 26, 2022 at 05:10
@ArmageddonKnight
Contributor Author

ArmageddonKnight commented Aug 30, 2022

Hi @junrushao, it has been some time since this PR was submitted. May I know whether there are any updates on it, and whether further changes are required?

@areusch areusch added and subsequently removed the needs-triage label (PRs or issues that need to be investigated by maintainers to find the right assignees) on Oct 19, 2022
@masahi
Member

masahi commented Dec 12, 2022

@ArmageddonKnight @junrushao What is the status of this PR or DietCode upstreaming in general? I'm interested in dynamic shape tuning, and I can help this effort.

@masahi
Member

masahi commented Dec 13, 2022

This looks similar to #12750, maybe we don't need this? cc @vinx13

@vinx13
Member

vinx13 commented Dec 13, 2022

@masahi PadEinsum can achieve something similar, since the padding is done in shared memory.
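For context, here is a rough sketch of the PadEinsum alternative from #12750 that vinx13 refers to; the block name and padding factors below are illustrative assumptions, not values taken from this PR:

from tvm import tir

def pad_dense_block(sch: tir.Schedule) -> None:
    block = sch.get_block("dense")       # assumed block name
    # Pad the block's einsum iteration domain according to the given factors
    # (see #12750 for the exact semantics of the padding argument).
    sch.pad_einsum(block, [32, 32, 32])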
