-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BugFix][TIR] Fix dynamic smem merge leaf alloc #16216
Conversation
cc @Lunderberg @jinhongyii. Would be great if you can help review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! It looks like this edge case occurs when (1) there's a dynamic memory allocation and (2) there is only a single statement that ever accesses that dynamic memory allocation. This is definitely a case that we should handle, and the implementation looks good.
That said, the existence of this edge case looks like we could provide even better. If a buffer is only ever accessed in a single location, it suggests that there could be additional opportunities to improve the behavior.
-
BufferLoad
: The only access of the buffer is a read from it. This is a read to uninitialized memory, which is already undefined behavior. If the only access to a buffer is aBufferLoad
, we can raise a compile-time error. -
BufferStore
: The only access of the buffer is a store into it. This is a write to memory that is never accessed, which has no impact on observable behavior. If the only access to a buffer is aBufferStore
, we can replace theBufferStore(buf, indices, value)
withEvaluate(value)
. -
builtin::tvm_access_ptr()
: This is the only case when a buffer accessed exactly once can be well-defined and have an observable effect. For example, making a data pointer to pass to an external function for use as a scratch space. We probably should track whether this access pointer is assigned to a variable, as thelinear_seq_
is incorrect if the access pointer is defined once, but used multiple times.
(These aren't improvements that should be blocking for the current PR, as this PR is already an improvement on the current implementation, but pointing out potential further improvements. If you'd like to handle any of the follow-ups, ping me and I can review them.)
Thank you for the detailed comments @Lunderberg , I found this bug when I try to allocate a workspace for an extern function (your 3rd case). I have not met the linear_seq_ problem you mentioned. I alloc a new workspace once I make an func call and the pass seems to reuse them correctly for now. |
Sounds good, and so long as you're aware of the possibility. I wasn't able to make a test case that would trigger the issue, as attempting to make it resulted in other errors occurring while collecting the shared allocations. It's enough of an edge case that I'm not too worried about it. |
The MergeDynamicSharedMemoryAllocations currently will first free and then allocate for each scope. However, when a buffer is allocated and freed within a leaf scope, it will run into a free before alloc bug.
This commit solves this bug by delaying the leaf free after the alloc is done.
A simple test case is also added