-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[INTERPRETER] Support unary ops #3279
Conversation
@@ -244,7 +244,7 @@ def kernel(Z, X, SIZE: tl.constexpr): | |||
z_ref = eval(expr if numpy_expr is None else numpy_expr) | |||
# triton result | |||
x_tri = to_triton(x, device=device, dst_type=dtype_x) | |||
z_tri = to_triton(np.empty_like(z_ref), device=device, dst_type=dtype_x) | |||
z_tri = to_triton(np.empty_like(x), device=device, dst_type=dtype_x) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI. There was a memory leak in our tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For my info, how does this leak memory?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let z_ref
be a scalar, np.empty_like
is also a scalar, and so is z_tri
. to_triton
doesn't splat a scalar, but only reinterpret a value to the given dst_type
. However, the output of the kernel on the next line will be an array of 128 elements.
I'm wondering whether we have similar problems at other places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, it's an out-of-bounds memory access (I usually reserve "memory leak" for memory that's allocated and never freed, but maybe you use a different definition). btw .contiguous()
is another way to fix this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably to_triton should just call contiguous().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice :)
No description provided.