Allow pipelining loads that don't feed directly into tl.dot. #3415
Conversation
I didn't look at all the changes, but there are already annotations that can be set on the loop to force pipelining. I think this is what we would want to use for this case instead of adding to the existing logic that detects the load + dot pattern.
We are already passing a per-loop num_stages. To be clear, the issue isn't that the loop is not pipelined; it's that this particular load within the loop is not pipelined, because it does not feed into the dot directly. Am I misunderstanding something?
In addition to passing a per-loop num_stages, the idea was to also bypass the heuristic that only pipelines matmul loops. So I would think we would pipeline every load that can be pipelined in this case. It's possible the PR I pointed out behaves differently from what I thought. I do think it would be nice not to have to rely on the load feeding into a dot for this case (of course, unless we mark ops explicitly there will always be some heuristic involved), but I thought the "general" case would do what you want here. For context, the heuristic exists for both historic and user-simplicity reasons: historic because we used to only have the global control, so the compiler had to pick the loop to pipeline, and simplicity because the pattern the existing check detects almost always benefits from being pipelined.
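For readers unfamiliar with the per-loop annotation mentioned above, the sketch below shows how a loop-level num_stages hint is typically expressed in newer Triton releases via tl.range. This is an illustrative standard matmul loop, not code from this PR; the availability of the num_stages keyword depends on the Triton version.

```python
import triton
import triton.language as tl


@triton.jit
def matmul_kernel(a_ptr, b_ptr, c_ptr, M, N, K,
                  stride_am, stride_ak, stride_bk, stride_bn,
                  stride_cm, stride_cn,
                  BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr, BLOCK_K: tl.constexpr):
    pid_m = tl.program_id(0)
    pid_n = tl.program_id(1)
    offs_m = pid_m * BLOCK_M + tl.arange(0, BLOCK_M)
    offs_n = pid_n * BLOCK_N + tl.arange(0, BLOCK_N)
    offs_k = tl.arange(0, BLOCK_K)
    a_ptrs = a_ptr + offs_m[:, None] * stride_am + offs_k[None, :] * stride_ak
    b_ptrs = b_ptr + offs_k[:, None] * stride_bk + offs_n[None, :] * stride_bn
    acc = tl.zeros((BLOCK_M, BLOCK_N), dtype=tl.float32)
    # Per-loop pipelining hint: num_stages on tl.range asks the compiler to
    # pipeline this particular loop, independent of the kernel-wide num_stages option.
    for _ in tl.range(0, K, BLOCK_K, num_stages=3):
        a = tl.load(a_ptrs)
        b = tl.load(b_ptrs)
        acc += tl.dot(a, b)
        a_ptrs += BLOCK_K * stride_ak
        b_ptrs += BLOCK_K * stride_bk
    c_ptrs = c_ptr + offs_m[:, None] * stride_cm + offs_n[None, :] * stride_cn
    tl.store(c_ptrs, acc)
```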
Force-pushed from 80296e1 to ab4bd63 (compare)
Updated the PR per our discussion.
lib/Dialect/TritonGPU/Transforms/Pipeliner/MatmulLoopPipeline.cpp
Thank you for the review. :)
Previously if a load was used indirectly by a dot but there were non-unary operations between the load and the dot, we couldn't pipeline it. Also clean up some TODOs. GPC: more-pipelining
Force-pushed from ab4bd63 to 40cc26c (compare)
Effectively roll back #3415 due to internal test failures. GPC: rollback-3415
(triton-lang#3415) Allow pipelining more loads indirectly used by a dot. Previously if a load was used indirectly by a dot but there were non-unary operations between the load and the dot, we couldn't pipeline it. Also clean up some TODOs.
PR #3472 partially rolled back PR #3415 due to internal test failures. This PR rolls forward the change as much as we currently can, allowing *most* but not all relevant loads to be pipelined. There is still a TritonGPU -> LLVM codegen bug in Triton that we have not been able to fix, but now we catch it with asserts, PR #3549. GPC: dot-pipelining
Allow pipelining more loads indirectly used by a dot.
Previously if a load was used indirectly by a dot but there were non-unary
operations between the load and the dot, we couldn't pipeline it.
Also clean up some TODOs.
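As a minimal sketch of the kind of loop this change targets (kernel name, shapes, and addressing are illustrative, not taken from this PR): both A-operands reach tl.dot only through a binary add, so neither load feeds the dot directly, which is the pattern the pipeliner previously skipped.

```python
import triton
import triton.language as tl


@triton.jit
def add_then_dot_kernel(a_ptr, b_ptr, w_ptr, out_ptr, K,
                        BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr,
                        BLOCK_K: tl.constexpr):
    offs_m = tl.arange(0, BLOCK_M)
    offs_n = tl.arange(0, BLOCK_N)
    offs_k = tl.arange(0, BLOCK_K)
    acc = tl.zeros((BLOCK_M, BLOCK_N), dtype=tl.float32)
    for k in range(0, K, BLOCK_K):
        # These two loads reach tl.dot only through the binary add below, i.e.
        # they do not feed the dot directly; previously they were not pipelined.
        a = tl.load(a_ptr + offs_m[:, None] * K + (k + offs_k)[None, :])
        b = tl.load(b_ptr + offs_m[:, None] * K + (k + offs_k)[None, :])
        w = tl.load(w_ptr + (k + offs_k)[:, None] * BLOCK_N + offs_n[None, :])
        acc += tl.dot(a + b, w)  # non-unary op (add) sits between the loads and the dot
    tl.store(out_ptr + offs_m[:, None] * BLOCK_N + offs_n[None, :], acc)
```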
PR chain