[INTERPRETER] Support padding_option of tl.load #3599

Merged (3 commits, Apr 8, 2024)

Conversation

tongyuantongyu (Contributor)

This would make debugging kernels that use block pointers easier.
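For context, here is a minimal sketch (mine, not code from this PR; kernel and variable names are hypothetical) of the feature being exercised: a block-pointer load whose out-of-bounds lanes are filled by padding_option, runnable on CPU under the interpreter with TRITON_INTERPRET=1.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def pad_copy_kernel(in_ptr, out_ptr, M, N, BLOCK: tl.constexpr):
    src = tl.make_block_ptr(base=in_ptr, shape=(M, N), strides=(N, 1),
                            offsets=(0, 0), block_shape=(BLOCK, BLOCK),
                            order=(1, 0))
    # boundary_check guards both dims; padding_option ("zero" or "nan")
    # fills the out-of-bounds lanes instead of reading garbage.
    x = tl.load(src, boundary_check=(0, 1), padding_option="zero")
    dst = tl.make_block_ptr(base=out_ptr, shape=(BLOCK, BLOCK),
                            strides=(BLOCK, 1), offsets=(0, 0),
                            block_shape=(BLOCK, BLOCK), order=(1, 0))
    tl.store(dst, x)

# Under TRITON_INTERPRET=1 this runs on CPU tensors, so the kernel can be
# stepped through in an ordinary Python debugger.
M, N, BLOCK = 5, 3, 8
a = torch.arange(M * N, dtype=torch.float32).reshape(M, N)
b = torch.empty((BLOCK, BLOCK), dtype=torch.float32)
pad_copy_kernel[(1,)](a, b, M, N, BLOCK=BLOCK)
# b[:M, :N] matches a; everything past (M, N) is zero-filled.
```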

@tongyuantongyu tongyuantongyu requested a review from ptillet as a code owner April 8, 2024 13:53

@Jokeren Jokeren (Contributor) left a comment

Please also add the interpreter annotation to test_block_ptr_matmul_no_scf

@Jokeren Jokeren merged commit 7641ac1 into triton-lang:main Apr 8, 2024
5 checks passed
@tongyuantongyu tongyuantongyu deleted the interpreter_block_ptr_pad branch April 9, 2024 13:56

@oliverdutton commented Apr 11, 2024

Hello, love block pointers.

Why is the mask value forced to be an enum in padding_option rather than specified through the `other` argument?

For example, when using block pointers to write a softmax kernel, I want masked loads to produce -inf.

Could we support `other`, even if only for scalar values?

Similarly, for a tl.min you'd want inf; for a cumprod (not a great idea, I know) the identity fill would be 1.
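For concreteness, the plain-pointer path already allows this through `other`; a minimal sketch (my own, hypothetical names) of a row-max load padded with -inf:

```python
import triton
import triton.language as tl

@triton.jit
def row_max_kernel(x_ptr, out_ptr, N, BLOCK: tl.constexpr):
    row = tl.program_id(0)
    cols = tl.arange(0, BLOCK)
    # other=-inf is the identity for max, so padded lanes never win.
    x = tl.load(x_ptr + row * N + cols, mask=cols < N, other=float("-inf"))
    tl.store(out_ptr + row, tl.max(x, axis=0))
```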

@Jokeren (Contributor) commented Apr 11, 2024

We tried to mimic the tensor map functionality in CUDA. See CUtensorMapFloatOOBfill.
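For illustration, the NaN fill mode corresponds to padding_option="nan"; a small sketch (not from the PR, names hypothetical):

```python
import triton
import triton.language as tl

@triton.jit
def nan_pad_kernel(x_ptr, out_ptr, N, BLOCK: tl.constexpr):
    bp = tl.make_block_ptr(base=x_ptr, shape=(N,), strides=(1,),
                           offsets=(0,), block_shape=(BLOCK,), order=(0,))
    # "zero" and "nan" are the two accepted padding_option values,
    # mirroring the two out-of-bounds fill modes of the CUDA tensor map.
    x = tl.load(bp, boundary_check=(0,), padding_option="nan")
    tl.store(out_ptr + tl.arange(0, BLOCK), x)
```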

@oliverdutton commented Apr 11, 2024

I see, so this is a fast path with only two options. Is it much faster than the explicit check-and-select that `other` would require?

Would you be open to a version where you can specify either padding_option or `other` (but not both)? Or could you infer a zero fill later and specialize it to the backend, e.g. to this NVIDIA op?

@Jokeren (Contributor) commented Apr 11, 2024

> I see, so this is a fast path with only two options. Is it much faster than the explicit check-and-select that `other` would require?

On GPU, yes.

> Would you be open to a version where you can specify either padding_option or `other` (but not both)? Or could you infer a zero fill later and specialize it to the backend, e.g. to this NVIDIA op?

TMA is temporarily disabled in Triton, so right now there is only a subtle difference between the two options; using the padding mode will be slightly faster in practice.

But once TMA is back, please use the padding mode.
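To make the comparison concrete, a hedged sketch (mine, not from the thread) of the two spellings of a zero-filled load:

```python
import triton
import triton.language as tl

@triton.jit
def zero_fill_two_ways(x_ptr, a_ptr, b_ptr, N, BLOCK: tl.constexpr):
    cols = tl.arange(0, BLOCK)
    # 1) Padding mode via a block pointer: can lower to the hardware
    #    out-of-bounds fill path once TMA is re-enabled.
    bp = tl.make_block_ptr(base=x_ptr, shape=(N,), strides=(1,),
                           offsets=(0,), block_shape=(BLOCK,), order=(0,))
    a = tl.load(bp, boundary_check=(0,), padding_option="zero")
    tl.store(a_ptr + cols, a)
    # 2) Explicit mask + other: same result, but always materializes the
    #    compare-and-select.
    b = tl.load(x_ptr + cols, mask=cols < N, other=0.0)
    tl.store(b_ptr + cols, b)
```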
