[INTERPRETER] Support padding_option of tl.load #3599
Conversation
Please also add the interpreter annotation to test_block_ptr_matmul_no_scf
Hello, love block pointers. Why is the padding value forced to an enum in padding_option rather than supplied through the `other` argument? For example, when using block pointers to write a softmax kernel, I want to load with masking that generates -inf. Can we support that?
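To illustrate why -inf padding matters for softmax, here is a minimal NumPy sketch of what a masked load with `other=-inf` achieves (the function name `masked_softmax` and the emulation itself are illustrative, not Triton's API): exp(-inf) is 0, so out-of-range lanes drop out of the reduction.

```python
import numpy as np

def masked_softmax(row, n_valid):
    # Emulate loading a block where lanes >= n_valid are padded with -inf,
    # as a masked tl.load with other=-inf would do. (Hypothetical helper.)
    padded = np.where(np.arange(row.shape[0]) < n_valid, row, -np.inf)
    e = np.exp(padded - padded.max())  # subtract max for numerical stability
    return e / e.sum()

row = np.array([1.0, 2.0, 3.0, 0.0], dtype=np.float32)  # last lane is garbage
p = masked_softmax(row, 3)  # padded lane contributes exactly 0
```

Padding with zero instead would be wrong here, since exp(0) = 1 would add spurious mass to the denominator.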
We tried to mimic the tensor map functionality in CUDA.
I see, so this is a fast path with only two options, much faster than the explicit checking and comparison that masking would require. Would you be open to a version where you can specify either padding_option or `other` (but not both)? Or could you infer zero padding later on and specialize it to the backend, such as with this NVIDIA op?
On GPU, yes.
TMA is temporarily disabled in Triton, so there's only a subtle difference between those two options. Using the padding mode will be slightly faster in practice at the moment, but once TMA is back, please use the padding mode.
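For reference, a minimal NumPy sketch of the semantics this PR adds to the interpreter: out-of-bounds lanes of a block-pointer load are filled according to padding_option ("zero" or "nan"). The helper `load_block` is hypothetical, written only to show the fill behavior, not Triton's implementation.

```python
import numpy as np

def load_block(tensor, offset, block, padding_option="zero"):
    # Emulate tl.load on a 1-D block pointer: in-bounds lanes come from
    # the tensor; out-of-bounds lanes are filled per padding_option.
    # (Hypothetical helper for illustration, not Triton's API.)
    idx = offset + np.arange(block)
    in_bounds = idx < tensor.shape[0]
    pad = {"zero": 0.0, "nan": np.nan}[padding_option]
    out = np.full(block, pad, dtype=tensor.dtype)
    out[in_bounds] = tensor[idx[in_bounds]]
    return out

x = np.arange(6, dtype=np.float32)
zeros = load_block(x, 4, 4, "zero")  # lanes 6 and 7 are out of bounds -> 0
nans = load_block(x, 4, 4, "nan")    # same lanes -> nan
```

With this, the interpreter matches the GPU path's behavior for both padding modes, which is what makes debugging block-pointer kernels on CPU feasible.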
This would make debugging kernels using block pointer easier.