[INTERPRETER] Support padding_option of tl.load #3599

Merged (3 commits, Apr 8, 2024)

Conversation

tongyuantongyu (Contributor)

This would make debugging kernels that use block pointers easier.
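For context, here is a minimal sketch (mine, not code from this PR; kernel and variable names are hypothetical) of the feature being exercised: a block-pointer load whose out-of-bounds lanes are filled by padding_option, runnable on CPU under the interpreter with TRITON_INTERPRET=1.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def pad_copy_kernel(in_ptr, out_ptr, M, N, BLOCK: tl.constexpr):
    src = tl.make_block_ptr(base=in_ptr, shape=(M, N), strides=(N, 1),
                            offsets=(0, 0), block_shape=(BLOCK, BLOCK),
                            order=(1, 0))
    # boundary_check guards both dims; padding_option ("zero" or "nan")
    # fills the out-of-bounds lanes instead of reading garbage.
    x = tl.load(src, boundary_check=(0, 1), padding_option="zero")
    dst = tl.make_block_ptr(base=out_ptr, shape=(BLOCK, BLOCK),
                            strides=(BLOCK, 1), offsets=(0, 0),
                            block_shape=(BLOCK, BLOCK), order=(1, 0))
    tl.store(dst, x)

# Under TRITON_INTERPRET=1 this runs on CPU tensors, so the kernel can be
# stepped through in an ordinary Python debugger.
M, N, BLOCK = 5, 3, 8
a = torch.arange(M * N, dtype=torch.float32).reshape(M, N)
b = torch.empty((BLOCK, BLOCK), dtype=torch.float32)
pad_copy_kernel[(1,)](a, b, M, N, BLOCK=BLOCK)
# b[:M, :N] matches a; everything past (M, N) is zero-filled.
```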

@tongyuantongyu tongyuantongyu requested a review from ptillet as a code owner April 8, 2024 13:53

@Jokeren Jokeren (Contributor) left a comment

Please also add the interpreter annotation to test_block_ptr_matmul_no_scf

@Jokeren Jokeren merged commit 7641ac1 into triton-lang:main Apr 8, 2024
5 checks passed
@tongyuantongyu tongyuantongyu deleted the interpreter_block_ptr_pad branch April 9, 2024 13:56

@oliverdutton commented Apr 11, 2024

Hello, love block pointers.

Why is the mask value forced to be an enum in padding_option rather than specified through the `other` argument?

For example, when using block pointers to write a softmax kernel, I want masked loads to produce -inf.

Could we support `other`, even if only for scalar values?

Similarly, for a tl.min you'd want inf; for a cumprod (not a great idea, I know) the identity fill would be 1.
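For concreteness, the plain-pointer path already allows this through `other`; a minimal sketch (my own, hypothetical names) of a row-max load padded with -inf:

```python
import triton
import triton.language as tl

@triton.jit
def row_max_kernel(x_ptr, out_ptr, N, BLOCK: tl.constexpr):
    row = tl.program_id(0)
    cols = tl.arange(0, BLOCK)
    # other=-inf is the identity for max, so padded lanes never win.
    x = tl.load(x_ptr + row * N + cols, mask=cols < N, other=float("-inf"))
    tl.store(out_ptr + row, tl.max(x, axis=0))
```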

@Jokeren (Contributor) commented Apr 11, 2024

We tried to mimic the tensor map functionality in CUDA. See CUtensorMapFloatOOBfill.
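For illustration, the NaN fill mode corresponds to padding_option="nan"; a small sketch (not from the PR, names hypothetical):

```python
import triton
import triton.language as tl

@triton.jit
def nan_pad_kernel(x_ptr, out_ptr, N, BLOCK: tl.constexpr):
    bp = tl.make_block_ptr(base=x_ptr, shape=(N,), strides=(1,),
                           offsets=(0,), block_shape=(BLOCK,), order=(0,))
    # "zero" and "nan" are the two accepted padding_option values,
    # mirroring the two out-of-bounds fill modes of the CUDA tensor map.
    x = tl.load(bp, boundary_check=(0,), padding_option="nan")
    tl.store(out_ptr + tl.arange(0, BLOCK), x)
```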

@oliverdutton commented Apr 11, 2024

I see, so this is a fast path with only two options. Is it much faster than the explicit check-and-select that `other` would require?

Would you be open to a version where you can specify either padding_option or `other` (but not both)? Or could you infer a zero fill later and specialize it to the backend, e.g. to this NVIDIA op?

@Jokeren (Contributor) commented Apr 11, 2024

> I see, so this is a fast path with only two options. Is it much faster than the explicit check-and-select that `other` would require?

On GPU, yes.

> Would you be open to a version where you can specify either padding_option or `other` (but not both)? Or could you infer a zero fill later and specialize it to the backend, e.g. to this NVIDIA op?

TMA is temporarily disabled in Triton, so right now there is only a subtle difference between the two options; using the padding mode will be slightly faster in practice.

But once TMA is back, please use the padding mode.
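To make the comparison concrete, a hedged sketch (mine, not from the thread) of the two spellings of a zero-filled load:

```python
import triton
import triton.language as tl

@triton.jit
def zero_fill_two_ways(x_ptr, a_ptr, b_ptr, N, BLOCK: tl.constexpr):
    cols = tl.arange(0, BLOCK)
    # 1) Padding mode via a block pointer: can lower to the hardware
    #    out-of-bounds fill path once TMA is re-enabled.
    bp = tl.make_block_ptr(base=x_ptr, shape=(N,), strides=(1,),
                           offsets=(0,), block_shape=(BLOCK,), order=(0,))
    a = tl.load(bp, boundary_check=(0,), padding_option="zero")
    tl.store(a_ptr + cols, a)
    # 2) Explicit mask + other: same result, but always materializes the
    #    compare-and-select.
    b = tl.load(x_ptr + cols, mask=cols < N, other=0.0)
    tl.store(b_ptr + cols, b)
```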
