Unaligned matmul work #13104

Closed
Tracked by #13242
allieculp opened this issue Apr 14, 2023 · 12 comments

@allieculp

No description provided.

allieculp changed the title from "Unaligned matmul" to "Unaligned matmul work" on Apr 14, 2023
@allieculp (Author)

@qcolombet Adding this as a new issue to track unaligned matmul work; @mattwalsh for visibility

@allieculp (Author)

In progress: performance is now a reasonable approximation of the aligned case (alignment 1, prime-number sizes).
@manishucsd for visibility

@qcolombet (Contributor)

@manishucsd do we have the unaligned cases already tracked in your perf framework and in CI?

@qcolombet (Contributor)

Synced up with @manishucsd; the perf framework works out of the box for unaligned cases.
However, hooking it up in CI is still being discussed.

@qcolombet (Contributor)

Quick update here: we are getting closer to landing the perf improvements @nicolasvasilache grabbed with his new transform dialect strategy. (See https://github.com/openxla/iree/blob/c7925912b2f76b34335ab3d6949cd87a0c4f6071/compiler/src/iree/compiler/Codegen/TransformDialectStrategies/GPU/Common.cpp#L465)

When we turn this on we should be "only" 2-3x slower than cuBLAS (instead of ~20x). I.e., we're not out of the woods yet, but we're getting there.

To close the gap:

  • @manishucsd will look at the schedule we generate with the new strategy (we're likely missing some async copies for the padding; see the sketch after this list)
  • I'll take a look at the generated code to see if there is anything we can improve
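
As a concrete (and purely illustrative) picture of what "async copies for the padding" could look like, here is a minimal hand-written CUDA sketch. This is not IREE's generated code: the tile size, names, and layout are hypothetical, and it assumes an Ampere-class GPU where cp.async is available through cuda_pipeline.h.

```cuda
// Hypothetical sketch: stage one K-tile of A into shared memory with async
// copies, zero-filling the tail when K is unaligned (e.g. K = 2044, TILE_K = 32).
#include <cuda_pipeline.h>

template <int TILE_K>
__device__ void load_a_tile(const float* A, float* smemA,
                            int row, int k0, int K, int lda) {
  for (int k = threadIdx.x; k < TILE_K; k += blockDim.x) {
    int gk = k0 + k;
    if (gk < K) {
      // 4-byte asynchronous global->shared copy (Ampere cp.async).
      __pipeline_memcpy_async(&smemA[k], &A[row * lda + gk], sizeof(float));
    } else {
      smemA[k] = 0.0f;  // zero-fill the padded tail instead of loading it
    }
  }
  __pipeline_commit();       // close this batch of async copies
  __pipeline_wait_prior(0);  // block until the batch is visible in shared memory
}
```

Zero-filling the tail keeps the inner loop free of data-dependent branches while the in-bounds portion still gets the asynchronous copy/compute overlap.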

What we are still missing to get this part of the improvements landed:

@qcolombet (Contributor)

> To close the gap:
>
>   • @manishucsd will look at the schedule we generate with the new strategy (we're likely missing some async copies for the padding)
>   • I'll take a look at the generated code to see if there is anything we can improve

Quick update on that front:
For the generated code, I believe we have some redundant computations in the main loop: when masking for the padded dimensions, we check each row individually. If I'm not mistaken, we should only need to check one of the padded rows, because the padded rows are either all present or all masked. I.e., we should be able to do only one check instead of N (4 in this case).
Now, I don't believe this will help performance that much, since the bottleneck is on the memory side if I'm reading the profile correctly.
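
To make the "one check instead of N" idea concrete, here is a hedged sketch under the all-or-nothing assumption above (a 4-row group never straddles the M boundary); the names and group shape are hypothetical, not what IREE emits today.

```cuda
// Before: one bounds check per padded row (N checks in the hot loop).
// After: a single predicate for the whole group, valid only because the
// padded rows of a group are all in-bounds or all masked.
__device__ void accumulate_group(const float a[4], float bk, float acc[4],
                                 int row0, int M) {
  if (row0 < M) {            // one check covers rows row0 .. row0+3
    #pragma unroll
    for (int r = 0; r < 4; ++r)
      acc[r] += a[r] * bk;   // no per-row mask needed
  }
}
```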

Also, instead of doing smarter mask checking in each loop iteration, I believe we could peel the loop so that only the iteration that needs masking (hence the last one) has to do these checks.
Finally, given the gymnastics we do here, I think it is probably more valuable to do the padding at the graph level.
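
For reference, a rough sketch of the peeling idea (the helpers are placeholders, not the strategy's actual output):

```cuda
// Hypothetical sketch: peel the ragged last K-iteration out of the main loop
// so that only the epilogue pays for mask checks.
template <int TILE_K>
__device__ void k_loop(int K) {
  int kAligned = (K / TILE_K) * TILE_K;     // last full-tile boundary
  for (int k0 = 0; k0 < kAligned; k0 += TILE_K) {
    // main loop: every tile is full, so no masking at all
    // process_full_tile(k0);
  }
  if (kAligned < K) {
    // peeled epilogue: the only iteration that masks / zero-fills
    // process_masked_tile(kAligned, K - kAligned);
  }
}
```

Padding at the graph level would go one step further and materialize the zero-filled tail once, ahead of the dispatch, so the kernel only ever sees aligned shapes.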

On a different front, I found two issues related to the tensor core strategy:

  • It miscompiles when we use --td-matmul-strategy-use-mma-sync (e.g., with 2044 in the K dimension we get 2012 instead of 2044). I'll file an issue for that
  • It crashes when I use '2048x1024x2044' (MxNxK):
error.mlir:6:10: error: transform.structured.pad failed to apply
    %2 = linalg.matmul ins(%arg0, %arg1 : tensor<2048x2044xf32>, tensor<2044x1024xf32>) outs(%1 : tensor<2048x1024xf32>) -> tensor<2048x1024xf32>

I'll file an issue for that too.

@qcolombet (Contributor)

Here is the issue for the "pad failed to apply": #13448

@qcolombet (Contributor)

Filed #13451 for the miscompile.

@allieculp (Author)

PR is up for landing unaligned matmul support - @nicolasvasilache

@allieculp (Author)

@qcolombet Can this be considered closed with the 'soft' landing of unaligned matmuls? Let us know what work remains here.

@qcolombet (Contributor)

Let's wait for #13492 to land.

@qcolombet (Contributor)

#13492 landed, let's close this.
