Support unaligned input for conv2d fprop stage=2 (fix for issue #242) #246

Merged · 6 commits · Sep 8, 2021

Conversation

mengchihe

Hi, I added some support for conv2d fprop when the input shape is unaligned, for issue #242.
I don't use AlignmentA/B like GEMM does to tell the data iterator the granularity of each load, since that would change all the interfaces in default_conv2d_fprop.h. Instead I just distinguish whether the input data is aligned or not, and read one element at a time when it is not. I enlarge the mask in the data iterator and try not to affect the original performance when the shape is aligned.
I have only changed 2d fprop so far, to see if there are any comments. If this patch is okay, I'll go on to support backward and conv3d, thanks.
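To make the mechanism concrete, the aligned/unaligned dispatch described here amounts to something like the following CUDA C++ sketch (FragmentLoader and AccessType are hypothetical names for illustration, not the PR's actual iterator code):

#include <cstdint>

// Hypothetical aligned vector type standing in for CUTLASS's AlignedArray;
// assumes kElements is a power of two so the alignas value is valid.
template <typename Element, int kElements>
struct alignas(sizeof(Element) * kElements) AccessType {
  Element storage[kElements];
};

// Aligned case: one vectorized (up to 128-bit) access covers the whole
// fragment and is guarded by a single predicate bit.
template <typename Element, int kElements, bool kAligned = true>
struct FragmentLoader {
  __device__ static void load(Element *frag, Element const *ptr, uint32_t mask) {
    if (mask & 1u) {
      *reinterpret_cast<AccessType<Element, kElements> *>(frag) =
          *reinterpret_cast<AccessType<Element, kElements> const *>(ptr);
    }
  }
};

// Unaligned case: read one element per access; the predicate mask is
// enlarged to carry one bit per element instead of one bit per vector.
template <typename Element, int kElements>
struct FragmentLoader<Element, kElements, false> {
  __device__ static void load(Element *frag, Element const *ptr, uint32_t mask) {
    for (int i = 0; i < kElements; ++i) {
      if (mask & (1u << i)) {
        frag[i] = ptr[i];
      }
    }
  }
};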

@hwu36
Collaborator

hwu36 commented Apr 21, 2021

Thank you very much!!! You are really fast.

This is very important. Important enough to have its own slide in a tier-1 GTC talk when it is done. So we need to do it perfectly, which means we may have multiple iterations between you and us.

Back to the code. It needs to be done almost the same as GEMM. If an interface needs to be changed, change it to match the GEMM counterpart. Don't create new kernel-level structs. Alignment needs to be an int rather than a bool: Alignment = 1 means the data is aligned to 1 element. For example, if the input is fp16, it should support alignments 1, 2, 4, and 8. The maximum is 128-bit aligned.
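To make the constraint concrete, here is a minimal sketch (AlignmentCheck is a hypothetical name, not a CUTLASS type): the legal values are the divisors of the 128-bit access width, measured in elements.

#include "cutlass/numeric_types.h"

// kMaxAlignment is the element count of a single 128-bit access; a legal
// Alignment must divide it, which for fp16 (16-bit elements) yields 1, 2, 4, 8.
template <typename Element, int Alignment>
struct AlignmentCheck {
  static constexpr int kMaxAlignment = 128 / cutlass::sizeof_bits<Element>::value;
  static_assert(Alignment >= 1 && (kMaxAlignment % Alignment) == 0,
                "Alignment must be a power-of-two element count no wider than 128 bits");
};

// fp16: 128 / 16 = 8 elements per 128-bit access.
static_assert(AlignmentCheck<cutlass::half_t, 4>::kMaxAlignment == 8, "");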

@mengchihe
Author

Okay, I'll change it.
Does this mean I shouldn't add more partial specializations in default_conv2d_fprop.h, but instead add AlignmentA/B to all existing partial specializations? Since partial specializations cannot have default template arguments, we would need to pass these two parameters in all example/test .cu files that call DefaultConv2dFprop.
Or maybe I should do it like GEMM, which has Gemm and DefaultGemm, and add a new struct named Conv2dFprop that calls into default_conv2d_fprop.h. In that case I would have to change DefaultConv2dFprop to Conv2dFprop in all .cu files.
Both ways require changing existing .cu code, and the second way is more similar to GEMM. Which way do you prefer, or is there a way to avoid changing the .cu files? Thanks.

@hwu36
Collaborator

hwu36 commented Apr 22, 2021

Can we just change https://github.com/NVIDIA/cutlass/blob/master/include/cutlass/conv/kernel/default_conv2d_fprop.h#L50-L70
to

/// Defines a kernel for Conv2dFprop
template <
  typename ElementA,
  typename LayoutA,
  typename ElementB,
  typename LayoutB,
  typename ElementC,
  typename LayoutC,
  typename ElementAccumulator,
  typename OperatorClass,
  typename ArchTag,
  typename ThreadblockShape,
  typename WarpShape,
  typename InstructionShape,
  typename EpilogueOutputOp,
  typename ThreadblockSwizzle,
  int Stages,
  typename MathOperatorTag,
  int AlignmentA = 128 / cutlass::sizeof_bits<ElementA>::value,
  int AlignmentB = 128 / cutlass::sizeof_bits<ElementB>::value,
  conv::IteratorAlgorithm IteratorAlgorithm = IteratorAlgorithm::kAnalytic,
  conv::StrideSupport StrideSupport = StrideSupport::kStrided
> struct DefaultConv2dFprop;

Then, add AlignmentA and AlignmentB to all the partial specializations below. I think you won't need to change any .cu file then, unless you would like to use a non-default, smaller alignment.
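A minimal standalone illustration of the C++ rule this relies on (Iterator is a toy type for illustration, not CUTLASS code): default template arguments are declared on the primary template only, and they still apply when a partial specialization is ultimately selected.

#include <cstdio>

// The default lives on the primary template.
template <typename T, int Alignment = 8>
struct Iterator;

// Partial specialization: no defaults may be written here, but the primary
// template's default still applies at the point of use.
template <int Alignment>
struct Iterator<float, Alignment> {
  static constexpr int kAlignment = Alignment;
};

int main() {
  // Iterator<float> becomes Iterator<float, 8>, which then selects the
  // partial specialization, so existing call sites need no changes.
  std::printf("%d\n", Iterator<float>::kAlignment);  // prints 8
  return 0;
}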

BTW, we only need this feature for fp16/bf16/tf32 tensor core kernels so far.

@mengchihe
Author

Oops, exposing my limited C++ knowledge: I didn't know that partial specializations also pick up the primary template's default arguments.

I'll change it like that, thank you.

@hwu36
Collaborator

hwu36 commented Apr 22, 2021

> Oops, exposing my limited C++ knowledge: I didn't know that partial specializations also pick up the primary template's default arguments.

Everyone starts from somewhere. You start with a high profile project. 😄

Don't restrict your change to stage=2. The change is almost the same for stage>2

@hwu36
Collaborator

hwu36 left a comment

I think you need to add unit tests in https://github.com/NVIDIA/cutlass/tree/master/test/unit/conv/device to test your changes.
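A sketch of what such a test instantiation might look like, following the DefaultConv2dFprop signature proposed earlier in this thread; the tile shapes are illustrative, and TestAllConv2d refers to the harness used by the existing tests in that directory:

#include "cutlass/conv/kernel/default_conv2d_fprop.h"
#include "cutlass/conv/device/implicit_gemm_convolution.h"
#include "cutlass/epilogue/thread/linear_combination.h"
#include "cutlass/gemm/threadblock/threadblock_swizzle.h"

// A 2-stage SM75 fp16 tensor-op fprop kernel instantiated with
// AlignmentA/B = 1, so channel counts that are not multiples of
// 8 fp16 elements (128 bits) are accepted.
using Conv2dFpropKernel = typename cutlass::conv::kernel::DefaultConv2dFprop<
    cutlass::half_t, cutlass::layout::TensorNHWC,   // A: activations
    cutlass::half_t, cutlass::layout::TensorNHWC,   // B: filters
    cutlass::half_t, cutlass::layout::TensorNHWC,   // C: output
    float,                                          // accumulator
    cutlass::arch::OpClassTensorOp,
    cutlass::arch::Sm75,
    cutlass::gemm::GemmShape<128, 128, 32>,         // threadblock tile
    cutlass::gemm::GemmShape<64, 64, 32>,           // warp tile
    cutlass::gemm::GemmShape<16, 8, 8>,             // tensor core instruction
    cutlass::epilogue::thread::LinearCombination<
        cutlass::half_t, 8, float, float>,
    cutlass::gemm::threadblock::GemmIdentityThreadblockSwizzle<>,
    2,                                              // Stages
    cutlass::arch::OpMultiplyAdd,
    1,                                              // AlignmentA = 1 element
    1                                               // AlignmentB = 1 element
>::Kernel;

using Conv2dFprop =
    cutlass::conv::device::ImplicitGemmConvolution<Conv2dFpropKernel>;

// The existing device-level tests then run a battery of problem sizes,
// along the lines of:
//   EXPECT_TRUE(test::conv::device::TestAllConv2d<Conv2dFprop>());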

@hwu36
Collaborator

hwu36 left a comment

Thanks. We will run more internal tests before we merge this. As I said in the other pull request, our test system has a backlog. It may take a while.

@mengchihe
Author

Okay, thanks.

@hwu36
Collaborator

hwu36 commented Aug 16, 2021

Sorry for the very late response. I tried your changes and they are great. I just need to make some minor changes, and I will push to your branch directly.

Thank you very much. It is a great new feature for everyone!

@mengchihe
Author

> Sorry for the very late response. I tried your changes and they are great. I just need to make some minor changes, and I will push to your branch directly.
>
> Thank you very much. It is a great new feature for everyone!

Okay, I'll upload the backward support then.
