[BugFix][TIR] Fix dynamic smem merge leaf alloc #16216

nox-410 · 2023-12-08T07:24:17Z

The MergeDynamicSharedMemoryAllocations currently will first free and then allocate for each scope. However, when a buffer is allocated and freed within a leaf scope, it will run into a free before alloc bug.

This commit solves this bug by delaying the leaf free after the alloc is done.

A simple test case is also added

import tvm
import tvm.testing
from tvm.script import tir as T

class TestLeafAllocFree(tvm.testing.CompareBeforeAfter):
    transform = tvm.tir.transform.MergeDynamicSharedMemoryAllocations()

    def before(self):
        @T.prim_func
        def func():
            threadIdx_x = T.launch_thread("threadIdx.x", 128)
            A_sh_data = T.allocate([128], "float32", "shared.dyn")
            B_sh_data = T.allocate([128], "float32", "shared.dyn")
            A_sh = T.decl_buffer([128], "float32", data=A_sh_data, scope="shared.dyn")
            B_sh = T.decl_buffer([128], "float32", data=B_sh_data, scope="shared.dyn")
            B_sh[threadIdx_x] = A_sh[threadIdx_x]
        return func

    def expected(self):
        @T.prim_func
        def func():
            threadIdx_x = T.launch_thread("threadIdx.x", 128)
            buf_dyn_shmem = T.allocate([1024], "uint8", "shared.dyn")
            A_sh = T.decl_buffer((128,), data=buf_dyn_shmem, scope="shared.dyn")
            B_sh = T.decl_buffer((128,), data=buf_dyn_shmem, scope="shared.dyn")
            B_sh[threadIdx_x + 128] = A_sh[threadIdx_x]
        return func


if __name__ == "__main__":
    tvm.testing.main()

Hzfengsy · 2023-12-12T04:28:19Z

cc @Lunderberg @jinhongyii. Would be great if you can help review.

Lunderberg

Good catch! It looks like this edge case occurs when (1) there's a dynamic memory allocation and (2) there is only a single statement that ever accesses that dynamic memory allocation. This is definitely a case that we should handle, and the implementation looks good.

That said, the existence of this edge case looks like we could provide even better. If a buffer is only ever accessed in a single location, it suggests that there could be additional opportunities to improve the behavior.

BufferLoad: The only access of the buffer is a read from it. This is a read to uninitialized memory, which is already undefined behavior. If the only access to a buffer is a BufferLoad, we can raise a compile-time error.
BufferStore: The only access of the buffer is a store into it. This is a write to memory that is never accessed, which has no impact on observable behavior. If the only access to a buffer is a BufferStore, we can replace the BufferStore(buf, indices, value) with Evaluate(value).
builtin::tvm_access_ptr(): This is the only case when a buffer accessed exactly once can be well-defined and have an observable effect. For example, making a data pointer to pass to an external function for use as a scratch space. We probably should track whether this access pointer is assigned to a variable, as the linear_seq_ is incorrect if the access pointer is defined once, but used multiple times.

(These aren't improvements that should be blocking for the current PR, as this PR is already an improvement on the current implementation, but pointing out potential further improvements. If you'd like to handle any of the follow-ups, ping me and I can review them.)

nox-410 · 2023-12-14T02:14:39Z

Thank you for the detailed comments @Lunderberg , I found this bug when I try to allocate a workspace for an extern function (your 3rd case). I have not met the linear_seq_ problem you mentioned. I alloc a new workspace once I make an func call and the pass seems to reuse them correctly for now.

Lunderberg · 2023-12-14T21:13:12Z

I have not met the linear_seq_ problem you mentioned. I alloc a new workspace once I make an func call and the pass seems to reuse them correctly for now.

Sounds good, and so long as you're aware of the possibility. I wasn't able to make a test case that would trigger the issue, as attempting to make it resulted in other errors occurring while collecting the shared allocations. It's enough of an edge case that I'm not too worried about it.

nox-410 added 3 commits December 8, 2023 07:23

[BugFix][TIR] Fix dynamic smem merge leaf alloc

22706a6

lint

a80abf8

lint

ae06f22

nox-410 marked this pull request as ready for review December 9, 2023 11:28

nox-410 marked this pull request as draft December 10, 2023 06:27

fix allocs not correctly updated.

3170c6e

nox-410 marked this pull request as ready for review December 10, 2023 14:48

Lunderberg approved these changes Dec 13, 2023

View reviewed changes

tqchen merged commit c8bfdb2 into apache:main Dec 14, 2023
19 checks passed

nox-410 deleted the merge_dyn_leaf_alloc_free branch December 16, 2023 10:33

ysh329 mentioned this pull request Jan 11, 2024

[Release] v0.15.0 Release Candidate Notes #16391

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[BugFix][TIR] Fix dynamic smem merge leaf alloc #16216

[BugFix][TIR] Fix dynamic smem merge leaf alloc #16216

nox-410 commented Dec 8, 2023

Hzfengsy commented Dec 12, 2023

Lunderberg left a comment

nox-410 commented Dec 14, 2023

Lunderberg commented Dec 14, 2023

[BugFix][TIR] Fix dynamic smem merge leaf alloc #16216

[BugFix][TIR] Fix dynamic smem merge leaf alloc #16216

Conversation

nox-410 commented Dec 8, 2023

Hzfengsy commented Dec 12, 2023

Lunderberg left a comment

Choose a reason for hiding this comment

nox-410 commented Dec 14, 2023

Lunderberg commented Dec 14, 2023