Allow pipelining loads that don't feed directly into tl.dot. #3415

Merged: 1 commit merged into main from dev-jlebar/more-pipelining on Mar 25, 2024

Conversation

@jlebar (Collaborator) commented Mar 19, 2024

Allow pipelining more loads indirectly used by a dot.

Previously if a load was used indirectly by a dot but there were non-unary
operations between the load and the dot, we couldn't pipeline it.

Also clean up some TODOs.
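
For concreteness, here is a minimal, hypothetical Triton kernel (not taken from this PR or its tests) showing the pattern being enabled: the results of the `a` and `b` loads reach `tl.dot` only through a binary add, so before this change the pipeliner would not pipeline those loads.

```python
import triton
import triton.language as tl

# Hypothetical kernel illustrating the pattern discussed in this PR: the `a` and `b`
# loads feed tl.dot only indirectly, through a non-unary (binary) add.
@triton.jit
def add_then_dot(a_ptr, b_ptr, c_ptr, d_ptr,
                 M: tl.constexpr, N: tl.constexpr, K: tl.constexpr,
                 BLOCK_K: tl.constexpr):
    offs_m = tl.arange(0, M)
    offs_n = tl.arange(0, N)
    acc = tl.zeros((M, N), dtype=tl.float32)
    for k in range(0, K, BLOCK_K):
        offs_k = k + tl.arange(0, BLOCK_K)
        a = tl.load(a_ptr + offs_m[:, None] * K + offs_k[None, :])
        b = tl.load(b_ptr + offs_m[:, None] * K + offs_k[None, :])
        c = tl.load(c_ptr + offs_k[:, None] * N + offs_n[None, :])
        # `a` and `b` are used only *indirectly* by the dot, via the add below.
        acc += tl.dot(a + b, c)
    tl.store(d_ptr + offs_m[:, None] * N + offs_n[None, :], acc)
```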

PR chain

  1. 👉 Allow pipelining loads that don't feed directly into tl.dot. #3415 👈 YOU ARE HERE

@jlebar jlebar requested a review from ptillet as a code owner March 19, 2024 19:55
@jlebar jlebar marked this pull request as draft March 19, 2024 19:55
@ThomasRaoux (Collaborator) commented Mar 19, 2024
I didn't look at all the changes, but there are already annotations that can be set on the loop to force pipelining. I think this is what we would want to use for this case, instead of adding to the existing logic that detects the load + dot pattern.
Here is the PR that enabled it:
#3092
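
For reference, a short sketch of that per-loop annotation, using the tl.range/num_stages mechanism mentioned later in this thread; the kernel and the value 3 are illustrative, not taken from #3092.

```python
import triton
import triton.language as tl

@triton.jit
def sum_kernel(x_ptr, out_ptr, N, BLOCK: tl.constexpr):
    acc = tl.zeros((BLOCK,), dtype=tl.float32)
    # num_stages on tl.range asks the compiler to pipeline this specific loop,
    # independent of whether its loads feed a tl.dot.
    for i in tl.range(0, N, BLOCK, num_stages=3):
        offs = i + tl.arange(0, BLOCK)
        acc += tl.load(x_ptr + offs, mask=offs < N, other=0.0)
    tl.store(out_ptr + tl.arange(0, BLOCK), acc)
```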

@jlebar (Collaborator, Author) commented Mar 19, 2024

We are already passing num_stages as an argument to the kernel. It appears this does the same thing as passing num_stages to the for loop's tl.range op. https://github.com/openai/triton/blob/8fed348280c32ef22c0e44b69bcd145420f78710/python/triton/compiler/code_generator.py#L933

To be clear, the issue isn't that the loop is not pipelined, it's that this particular load within the loop is not pipelined, because it does not feed into the dot directly.

Am I misunderstanding something?
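
As a usage sketch of the kernel-level mechanism described above: num_stages can be passed as a launch-time meta-parameter for the whole kernel. The values and the `add_then_dot` kernel are the hypothetical example from earlier in this thread, not from the PR.

```python
import torch

M, N, K, BLOCK_K = 16, 16, 64, 32
a = torch.randn((M, K), device="cuda", dtype=torch.float16)
b = torch.randn((M, K), device="cuda", dtype=torch.float16)
c = torch.randn((K, N), device="cuda", dtype=torch.float16)
d = torch.empty((M, N), device="cuda", dtype=torch.float32)

# Kernel-level num_stages: a launch-time meta-parameter covering the whole kernel.
# Even so, before this PR the `a` and `b` loads in the kernel above were not
# pipelined, because they reach tl.dot only through the add.
add_then_dot[(1,)](a, b, c, d, M=M, N=N, K=K, BLOCK_K=BLOCK_K, num_stages=4)
```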

@ThomasRaoux (Collaborator) commented:

> We are already passing num_stages as an argument to the kernel. It appears this does the same thing as passing num_stages to the for loop's tl.range op.
> https://github.com/openai/triton/blob/8fed348280c32ef22c0e44b69bcd145420f78710/python/triton/compiler/code_generator.py#L933
>
> To be clear, the issue isn't that the loop is not pipelined, it's that this particular load within the loop is not pipelined, because it does not feed into the dot directly.
>
> Am I misunderstanding something?

In addition to passing a per-loop num_stages, the idea was also to bypass the heuristic that only pipelines matmul loops, so I would expect every load that can be pipelined to be pipelined in this case. It's possible the PR I pointed out behaves differently from what I thought. I do think it would be nice not to have to rely on the load feeding into a dot here (of course, unless we mark ops explicitly, there will always be some heuristic involved), but I thought the "general" case would do what you want.
I could be wrong and the code may be missing something. If possible, I think we should avoid changing the heuristic.

For context, the heuristic exists for both historical and user-simplicity reasons: historical because we used to have only the global control, so the compiler had to pick which loop to pipeline, and simplicity because the pattern the existing checks detect almost always benefits from being pipelined.

@jlebar jlebar force-pushed the dev-jlebar/more-pipelining branch 2 times, most recently from 80296e1 to ab4bd63, on March 21, 2024 19:04
@jlebar jlebar marked this pull request as ready for review March 21, 2024 20:43
@jlebar jlebar requested a review from ThomasRaoux March 21, 2024 20:43
@jlebar (Collaborator, Author) commented Mar 21, 2024

Updated the PR per our discussion.

@jlebar (Collaborator, Author) commented Mar 25, 2024

Thank you for the review. :)

Previously if a load was used indirectly by a dot but there were non-unary
operations between the load and the dot, we couldn't pipeline it.

Also clean up some TODOs.

GPC: more-pipelining
@jlebar jlebar force-pushed the dev-jlebar/more-pipelining branch from ab4bd63 to 40cc26c on March 25, 2024 20:30
@jlebar jlebar enabled auto-merge (squash) March 25, 2024 20:31
@jlebar jlebar merged commit 15770f3 into main Mar 25, 2024
5 checks passed
@jlebar jlebar deleted the dev-jlebar/more-pipelining branch March 25, 2024 20:41
jlebar added a commit that referenced this pull request Mar 26, 2024
Effectively roll back #3415 due to
internal test failures.

GPC: rollback-3415
jlebar added a commit that referenced this pull request Mar 27, 2024
[Backend] Disable pipelining change from
#3415.

Effectively roll back #3415 due to
internal test failures.
ThomasRaoux pushed a commit that referenced this pull request Mar 27, 2024
[Backend] Disable pipelining change from
#3415.

Effectively roll back #3415 due to
internal test failures.
ptillet pushed a commit that referenced this pull request Apr 1, 2024
[Backend] Disable pipelining change from
#3415.

Effectively roll back #3415 due to
internal test failures.
karupayun pushed a commit to openxla/triton that referenced this pull request Apr 3, 2024
Allow pipelining loads that don't feed directly into tl.dot. (triton-lang#3415)

Allow pipelining more loads indirectly used by a dot.

Previously if a load was used indirectly by a dot but there were non-unary
operations between the load and the dot, we couldn't pipeline it.

Also clean up some TODOs.
jlebar added a commit that referenced this pull request Apr 3, 2024
PR #3472 partially rolled back PR #3415 due to internal test failures.  This PR
rolls forward the change as much as we currently can, allowing *most* but not
all relevant loads to be pipelined.

There is still a TritonGPU -> LLVM codegen bug in Triton that we have not been
able to fix, but now we catch it with asserts, PR #3549.

GPC: dot-pipelining
minjang pushed a commit to minjang/triton that referenced this pull request May 14, 2024
Effectively roll back triton-lang/triton#3415 due to
internal test failures.

GPC: rollback-3415