Simplest gemm example with 3.x APIs #1742

dukallis · 2024-08-23T19:25:24Z


Hello everyone.
I have access to a V100 and wanted to try the GEMM example from quickstart.md, the one that uses the 3.x APIs. I copied the code block and fixed the include paths so that clangd would not complain. It seems a 4th template parameter is missing when aliasing CollectiveEpilogue to cutlass::epilogue::collective::DefaultEpilogue. I found something called

cutlass::epilogue::NoSmemWarpSpecialized

and passed it as the 4th template parameter. But after I changed OperatorClass to arch::OpClassSimt, everything broke. The documentation lacks a list of the possible values for each parameter.

I've been trying really hard to wrap my head around gemm_api_3x.md, but I'm stuck. I haven't been able to adjust the parameters so that the template magic works out.

Is it even possible to use the new APIs with older cards like Volta?
Can someone provide an example of the simplest possible GEMM using the modern CUTLASS 3.x APIs?

Two matrices with float elements.
Need only their product. Epilogue does nothing with output matrix and just writes the result into global memory.
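
For reference, the requested computation can be pinned down with a host-side sketch. This is plain C++, not CUTLASS code. It assumes row-major storage and models the epilogue as the usual linear combination D = alpha * acc + beta * C, so "epilogue does nothing" corresponds to alpha = 1 and beta = 0. The name reference_sgemm and its signature are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side reference for D = alpha * (A * B) + beta * C on row-major floats.
// A is M x K, B is K x N, C and D are M x N.
// With alpha = 1 and beta = 0 the epilogue is a plain store of the accumulator.
// (Hypothetical helper for illustration; not part of CUTLASS.)
std::vector<float> reference_sgemm(const std::vector<float>& A,
                                   const std::vector<float>& B,
                                   const std::vector<float>& C,
                                   std::size_t M, std::size_t N, std::size_t K,
                                   float alpha = 1.0f, float beta = 0.0f) {
  assert(A.size() == M * K);
  assert(B.size() == K * N);
  assert(C.size() == M * N);

  std::vector<float> D(M * N);
  for (std::size_t m = 0; m < M; ++m) {
    for (std::size_t n = 0; n < N; ++n) {
      float acc = 0.0f;  // "mainloop": accumulate over the K dimension
      for (std::size_t k = 0; k < K; ++k) {
        acc += A[m * K + k] * B[k * N + n];
      }
      // "epilogue": linear combination, then write to the output
      D[m * N + n] = alpha * acc + beta * C[m * N + n];
    }
  }
  return D;
}
```

A reference like this is mainly useful for checking the output of whatever device kernel ends up compiling, by comparing the two results element by element on small problem sizes.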


thakkarV (Collaborator) · 2024-08-23T22:09:26Z

We only support Hopper with the 3.x API in full, as documented in our readme file. Support for Volta and Ampere is via the 2.x API. That said, the CuTe atoms for every architecture starting with Pascal are provided, so you can still build custom kernels with them if you want. You can use the CuTe tutorials as a reference for building those.

4 replies

dukallis Sep 3, 2024
Author

I'm a bit lost on how to link CuTe atoms with the higher-level API. From the tutorial it seems I need to specify the atoms correctly and then call make_tiled_mma and make_tiled_copy. Once I have them, will they fit both CUTLASS APIs, or are they intended for the 3.x API only?

I understand that CollectiveBuilder is not supported for generations prior to Hopper. Generally, I am interested in whether it's possible to construct an sgemm or convolution using the new 3.x Collective, Kernel and Device APIs, provided that I have specified the underlying CuTe atoms correctly and applied make_tiled_mma and make_tiled_copy to them.
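
One way to picture what tiling a scalar FMA atom across a block of threads means is the plain C++ model below. It is not CuTe code and does not call make_tiled_mma. The 128 x 128 tile, the 16 x 16 thread layout, the column-major thread numbering, and the name owned_coords are all assumptions chosen for illustration; the ownership pattern in a real kernel depends on the atom and layouts actually passed to CuTe.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Illustrative constants (assumptions, not taken from the thread).
constexpr int kTileM = 128, kTileN = 128;  // output tile handled by one thread block
constexpr int kThrM = 16, kThrN = 16;      // thread layout: 16 x 16 = 256 threads

// Returns the (m, n) coordinates of the output tile that thread `tid`
// accumulates in this model. Threads are numbered column-major
// (tid = i + kThrM * j), and the 16 x 16 thread layout is repeated
// across the tile, so each thread owns a strided 8 x 8 set of elements.
std::vector<std::pair<int, int>> owned_coords(int tid) {
  assert(tid >= 0 && tid < kThrM * kThrN);
  const int i = tid % kThrM;  // thread position along M
  const int j = tid / kThrM;  // thread position along N

  std::vector<std::pair<int, int>> coords;
  for (int n = j; n < kTileN; n += kThrN) {
    for (int m = i; m < kTileM; m += kThrM) {
      coords.emplace_back(m, n);
    }
  }
  return coords;
}
```

In this model every output element is owned by exactly one thread, and each thread performs one scalar multiply-add per owned element for every step along K.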

WhoisZihan Sep 4, 2024

What is still missing to support Ampere with the 3.x API?

thakkarV Sep 4, 2024
Collaborator

Well, functionally everything works on all GPUs starting with Maxwell in the 3.x API. Ampere has the mainloops etc. too, and the swizzle epilogue. However, three key things are missing that keep it from being a full-fledged first-class citizen of 3.x:

1. Builder API support for the Ampere arch
2. Performance tuning for peak utilization to bring it on par with 2.x (this is a minor one; the perf should already be near SOL)
3. Feature parity with the Hopper 3.x feature set

thakkarV Sep 4, 2024
Collaborator

> Generally, I am interested whether it's possible to construct sgemm or convolution using new 3.x Collective, Kernel and Device APIs provided that I have underlying CuTe atoms specified correctly and then applied make_tiled_mma and make_tiled_copy to them?

Yes. Please see https://github.com/NVIDIA/cutlass/blob/main/test/unit/gemm/device/default_gemm_configuration.hpp for inspiration. A similar template config can be used for Volta/Turing, and it should just work out of the box. We have some of these kernels internally that maybe @ccecka and I can work on upstreaming as single-file examples in the future.

Answer selected by dukallis
