Support unaligned input for conv2d fprop stage=2 (fix for issue #242) #246

Merged · 6 commits · Sep 8, 2021

Conversation

mengchihe

Hi, I added some support for conv2d fprop when the input shape is unaligned, for issue #242.
I don't use AlignmentA/B like GEMM does to tell the data iterator the granularity of each load, since that would change all the interfaces in default_conv2d_fprop.h. Instead I just distinguish whether the input data is aligned or not, and read one element at a time when it is not. I enlarge the mask in the data iterator and try not to affect the original performance when the shape is aligned.
I have only changed 2d fprop so far, to see if there are any comments. If this patch is okay, I'll go on to support backward and conv3d, thanks.
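To make the mechanism concrete, the aligned/unaligned dispatch described here amounts to something like the following CUDA C++ sketch (FragmentLoader and AccessType are hypothetical names for illustration, not the PR's actual iterator code):

#include <cstdint>

// Hypothetical aligned vector type standing in for CUTLASS's AlignedArray;
// assumes kElements is a power of two so the alignas value is valid.
template <typename Element, int kElements>
struct alignas(sizeof(Element) * kElements) AccessType {
  Element storage[kElements];
};

// Aligned case: one vectorized (up to 128-bit) access covers the whole
// fragment and is guarded by a single predicate bit.
template <typename Element, int kElements, bool kAligned = true>
struct FragmentLoader {
  __device__ static void load(Element *frag, Element const *ptr, uint32_t mask) {
    if (mask & 1u) {
      *reinterpret_cast<AccessType<Element, kElements> *>(frag) =
          *reinterpret_cast<AccessType<Element, kElements> const *>(ptr);
    }
  }
};

// Unaligned case: read one element per access; the predicate mask is
// enlarged to carry one bit per element instead of one bit per vector.
template <typename Element, int kElements>
struct FragmentLoader<Element, kElements, false> {
  __device__ static void load(Element *frag, Element const *ptr, uint32_t mask) {
    for (int i = 0; i < kElements; ++i) {
      if (mask & (1u << i)) {
        frag[i] = ptr[i];
      }
    }
  }
};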

@hwu36
Collaborator

hwu36 commented Apr 21, 2021

Thank you very much!!! You are really fast.

This is very important. Important enough to have its own slide in a tier-1 GTC talk when it is done. So we need to do it perfectly, which means we may have multiple iterations between you and us.

Back to the code. It needs to be done almost the same as GEMM. If an interface needs to be changed, change it to match the GEMM counterpart. Don't create new kernel-level structs. Alignment needs to be an int rather than a bool: Alignment = 1 means the data is aligned to 1 element. For example, if the input is fp16, it should support alignments 1, 2, 4, and 8. The maximum is 128-bit aligned.
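To make the constraint concrete, here is a minimal sketch (AlignmentCheck is a hypothetical name, not a CUTLASS type): the legal values are the divisors of the 128-bit access width, measured in elements.

#include "cutlass/numeric_types.h"

// kMaxAlignment is the element count of a single 128-bit access; a legal
// Alignment must divide it, which for fp16 (16-bit elements) yields 1, 2, 4, 8.
template <typename Element, int Alignment>
struct AlignmentCheck {
  static constexpr int kMaxAlignment = 128 / cutlass::sizeof_bits<Element>::value;
  static_assert(Alignment >= 1 && (kMaxAlignment % Alignment) == 0,
                "Alignment must be a power-of-two element count no wider than 128 bits");
};

// fp16: 128 / 16 = 8 elements per 128-bit access.
static_assert(AlignmentCheck<cutlass::half_t, 4>::kMaxAlignment == 8, "");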

@mengchihe
Author

Okay, I'll change it.
Does this mean I shouldn't add more partial specializations in default_conv2d_fprop.h, but instead add AlignmentA/B to all existing partial specializations? Since partial specializations cannot have default template arguments, we would need to pass these two parameters in all example/test .cu files that call DefaultConv2dFprop.
Or maybe I should do it like GEMM, which has Gemm and DefaultGemm, and add a new struct named Conv2dFprop that calls into default_conv2d_fprop.h. In that case I would have to change DefaultConv2dFprop to Conv2dFprop in all .cu files.
Both ways require changing existing .cu code, and the second way is more similar to GEMM. Which way do you prefer, or is there a way to avoid changing the .cu files? Thanks.

@hwu36
Collaborator

hwu36 commented Apr 22, 2021

Can we just change https://github.com/NVIDIA/cutlass/blob/master/include/cutlass/conv/kernel/default_conv2d_fprop.h#L50-L70
to

/// Defines a kernel for Conv2dFprop
template <
  typename ElementA,
  typename LayoutA,
  typename ElementB,
  typename LayoutB,
  typename ElementC,
  typename LayoutC,
  typename ElementAccumulator,
  typename OperatorClass,
  typename ArchTag,
  typename ThreadblockShape,
  typename WarpShape,
  typename InstructionShape,
  typename EpilogueOutputOp,
  typename ThreadblockSwizzle,
  int Stages,
  typename MathOperatorTag,
  int AlignmentA = 128 / cutlass::sizeof_bits<ElementA>::value,
  int AlignmentB = 128 / cutlass::sizeof_bits<ElementB>::value,
  conv::IteratorAlgorithm IteratorAlgorithm = IteratorAlgorithm::kAnalytic,
  conv::StrideSupport StrideSupport = StrideSupport::kStrided
> struct DefaultConv2dFprop;

Then, add AlignmentA and AlignmentB to all the partial specializations below. I think you won't need to change any .cu file then, unless you would like to use a non-default, smaller alignment.
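A minimal standalone illustration of the C++ rule this relies on (Iterator is a toy type for illustration, not CUTLASS code): default template arguments are declared on the primary template only, and they still apply when a partial specialization is ultimately selected.

#include <cstdio>

// The default lives on the primary template.
template <typename T, int Alignment = 8>
struct Iterator;

// Partial specialization: no defaults may be written here, but the primary
// template's default still applies at the point of use.
template <int Alignment>
struct Iterator<float, Alignment> {
  static constexpr int kAlignment = Alignment;
};

int main() {
  // Iterator<float> becomes Iterator<float, 8>, which then selects the
  // partial specialization, so existing call sites need no changes.
  std::printf("%d\n", Iterator<float>::kAlignment);  // prints 8
  return 0;
}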

BTW, we only need this feature for fp16/bf16/tf32 tensor core kernels so far.

@mengchihe
Author

Oops, exposing my limited C++ knowledge: I didn't know that partial specializations also pick up the primary template's default arguments.

I'll change it like that, thank you.

@hwu36
Collaborator

hwu36 commented Apr 22, 2021

> Oops, exposing my limited C++ knowledge: I didn't know that partial specializations also pick up the primary template's default arguments.

Everyone starts from somewhere. You start with a high profile project. 😄

Don't restrict your change to stage=2. The change is almost the same for stage>2

@hwu36
Collaborator

hwu36 left a comment

I think you need to add unit tests in https://github.com/NVIDIA/cutlass/tree/master/test/unit/conv/device to test your changes.
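A sketch of what such a test instantiation might look like, following the DefaultConv2dFprop signature proposed earlier in this thread; the tile shapes are illustrative, and TestAllConv2d refers to the harness used by the existing tests in that directory:

#include "cutlass/conv/kernel/default_conv2d_fprop.h"
#include "cutlass/conv/device/implicit_gemm_convolution.h"
#include "cutlass/epilogue/thread/linear_combination.h"
#include "cutlass/gemm/threadblock/threadblock_swizzle.h"

// A 2-stage SM75 fp16 tensor-op fprop kernel instantiated with
// AlignmentA/B = 1, so channel counts that are not multiples of
// 8 fp16 elements (128 bits) are accepted.
using Conv2dFpropKernel = typename cutlass::conv::kernel::DefaultConv2dFprop<
    cutlass::half_t, cutlass::layout::TensorNHWC,   // A: activations
    cutlass::half_t, cutlass::layout::TensorNHWC,   // B: filters
    cutlass::half_t, cutlass::layout::TensorNHWC,   // C: output
    float,                                          // accumulator
    cutlass::arch::OpClassTensorOp,
    cutlass::arch::Sm75,
    cutlass::gemm::GemmShape<128, 128, 32>,         // threadblock tile
    cutlass::gemm::GemmShape<64, 64, 32>,           // warp tile
    cutlass::gemm::GemmShape<16, 8, 8>,             // tensor core instruction
    cutlass::epilogue::thread::LinearCombination<
        cutlass::half_t, 8, float, float>,
    cutlass::gemm::threadblock::GemmIdentityThreadblockSwizzle<>,
    2,                                              // Stages
    cutlass::arch::OpMultiplyAdd,
    1,                                              // AlignmentA = 1 element
    1                                               // AlignmentB = 1 element
>::Kernel;

using Conv2dFprop =
    cutlass::conv::device::ImplicitGemmConvolution<Conv2dFpropKernel>;

// The existing device-level tests then run a battery of problem sizes,
// along the lines of:
//   EXPECT_TRUE(test::conv::device::TestAllConv2d<Conv2dFprop>());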

@hwu36
Collaborator

hwu36 left a comment

Thanks. We will run more internal tests before we merge this. As I said in the other pull request, our test system has a backlog. It may take a while.

@mengchihe
Author

Okay, thanks.

@hwu36
Collaborator

hwu36 commented Aug 16, 2021

Sorry for the very late response. I tried your changes and they are great. I just need to make some minor changes, and I will push to your branch directly.

Thank you very much. It is a great new feature for everyone!

@mengchihe
Author

> Sorry for the very late response. I tried your changes and they are great. I just need to make some minor changes, and I will push to your branch directly.
>
> Thank you very much. It is a great new feature for everyone!

Okay, I'll upload the backward support then.
