Unaligned matmul work #13104
Comments
@qcolombet Adding this as a new issue to track unaligned matmul work; @mattwalsh for visibility.
In progress. We have a reasonable approximation of performance: it is similar to the aligned case right now (alignment of 1, prime-number sizes).
@manishucsd do we have the unaligned cases already tracked in your perf framework and in CI?
Synced up with @manishucsd: the perf framework works out of the box for unaligned cases.
Quick update here: we are getting closer to landing the perf improvements @nicolasvasilache grabbed with his new transform dialect strategy. (See https://github.com/openxla/iree/blob/c7925912b2f76b34335ab3d6949cd87a0c4f6071/compiler/src/iree/compiler/Codegen/TransformDialectStrategies/GPU/Common.cpp#L465) When we turn this on we should be "only" 2-3x slower than cuBLAS (instead of ~20x). I.e., we're not out of the woods, but we're getting there. To close the gap:
What we are missing to already get this part of the improvements:
Quick update on that front: Also, instead of doing smarter mask checking in each loop iteration, I believe we could peel the loop so that only the iteration with the masking (hence the last one) has to do these checks. On a different front, I found two issues related to the tensor core strategy:
I'll file an issue for that too.
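The loop-peeling idea mentioned above can be illustrated with a small, plain-Python sketch. This is not IREE code and not the transform dialect strategy itself; the function names `dot_masked` and `dot_peeled` and the `tile` parameter are hypothetical, chosen only to show the difference between checking a mask in every iteration and confining the check to the one partial (last) tile of a reduction dimension whose size is not a multiple of the tile size.

```python
def dot_masked(a, b, tile):
    """Tiled reduction where every iteration pays for a bounds (mask) check."""
    k = len(a)
    acc = 0
    for start in range(0, k, tile):
        for i in range(tile):
            idx = start + i
            if idx < k:  # mask check executed in every tile, even full ones
                acc += a[idx] * b[idx]
    return acc


def dot_peeled(a, b, tile):
    """Same reduction with the last, partial tile peeled out of the main loop."""
    k = len(a)
    full = (k // tile) * tile  # extent covered by full tiles only
    acc = 0
    # Main loop: all tiles are full, so no mask check is needed.
    for start in range(0, full, tile):
        for i in range(tile):
            acc += a[start + i] * b[start + i]
    # Peeled iteration: the only place where the unaligned remainder is handled.
    for idx in range(full, k):
        acc += a[idx] * b[idx]
    return acc


# Example with an unaligned (prime) reduction size of 13 and a tile size of 4.
a = list(range(13))
b = list(range(13))
result_masked = dot_masked(a, b, 4)
result_peeled = dot_peeled(a, b, 4)
```

Both functions accumulate in the same order, so they return the same value; the peeled version simply keeps the main loop free of per-iteration checks.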
Here is the issue for the "pad failed to apply": #13448
Filed #13451 for the miscompile.
PR is up for landing unaligned - @nicolasvasilache
@qcolombet Can this be considered closed with the 'soft' landing of unaligned matmuls? Let us know what work remains here.
Let's wait for #13492 to land.
#13492 landed, let's close this. |