Support unaligned input for conv2d fprop stage=2 (fix for issue #242) #246
Thank you very much!!! You are really fast. This is very important: important enough to have its own slide in a tier-1 GTC talk once it is done. So we need to do it perfectly, which means there may be multiple iterations between you and us. Back to the code. It needs to be done almost the same way as GEMM. If an interface needs to be changed, then change it to match its GEMM counterpart. Don't create new kernel-level structs.
Okay, I'll change it.
Can we just change https://github.com/NVIDIA/cutlass/blob/master/include/cutlass/conv/kernel/default_conv2d_fprop.h#L50-L70
Then, add
BTW, we only need this feature for fp16/bf16/tf32 tensor core kernels so far.
Oops, that exposes my poor C++ level. I didn't know that partial specializations also get the same default configuration. I'll change it like that, thank you.
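The point about partial specializations and defaults can be sketched as follows. This is a minimal illustration, not CUTLASS code: `DefaultKernel`, `OpClassSimt`, `OpClassTensorOp`, and `kAlignment` are hypothetical names chosen for this example. It shows that a default template argument declared on the primary template still applies when a partial specialization is the one selected.

```cpp
// Hypothetical tags and names, for illustration only; these are not CUTLASS types.
struct OpClassSimt {};
struct OpClassTensorOp {};

// Primary template: the default for kAlignment is declared here, once.
template <typename Element, typename OpClass, int kAlignment = 8>
struct DefaultKernel {
  static constexpr int kAccessElements = kAlignment;
  static constexpr bool kIsTensorOp = false;
};

// Partial specialization: it cannot declare default arguments of its own,
// but the primary template's default still fills in kAlignment when omitted.
template <typename Element, int kAlignment>
struct DefaultKernel<Element, OpClassTensorOp, kAlignment> {
  static constexpr int kAccessElements = kAlignment;
  static constexpr bool kIsTensorOp = true;
};

// kAlignment is omitted, so the primary template's default (8) is used.
using AlignedKernel = DefaultKernel<float, OpClassTensorOp>;
// kAlignment is given explicitly: one element per access.
using UnalignedKernel = DefaultKernel<float, OpClassTensorOp, 1>;

static_assert(AlignedKernel::kIsTensorOp, "partial specialization is selected");
static_assert(AlignedKernel::kAccessElements == 8, "default comes from the primary template");
static_assert(UnalignedKernel::kAccessElements == 1, "explicit override");
```

Under this pattern, adding a defaulted parameter to the primary template leaves existing instantiations unchanged, which is presumably why it avoids touching every call site.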
Everyone starts somewhere. You are starting with a high-profile project. 😄 Don't restrict your change to stage=2. The change is almost the same for stage>2.
I think you need to add unit tests in https://github.com/NVIDIA/cutlass/tree/master/test/unit/conv/device to test your changes.
include/cutlass/conv/threadblock/conv2d_fprop_filter_tile_access_iterator_optimized.h
Thanks. We will run more internal tests before we merge this. As I said in the other pull request, our test system has a backlog. It may take a while.
Okay, thanks.
Sorry for the very late response. I tried your changes and they are great. I just need to make some minor changes, and I will push them to your branch directly. Thank you very much. It is a great new feature for everyone!
Okay, I'll upload the backward pass then.
Hi, I added support for conv2d fprop when the input shape is unaligned, for issue #242.
I don't use AlignmentA/B, as GEMM does, to tell the data iterator the granularity of each load, since that would change every interface in default_conv2d_fprop.h. Instead, I just distinguish whether the input data is aligned or not, and read one element at a time when it is not aligned. I enlarged the mask in the data iterator and tried not to affect the original performance when the shape is aligned.
So far I have only changed 2d fprop, to see if there are any comments. If this patch is okay, I'll go on to support backward and conv3d, thanks.
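The aligned/unaligned split described above can be sketched in plain C++ as follows. This is only an illustration under assumptions, not the iterator code in this patch: `elements_per_access` and `build_access_mask` are hypothetical helpers, and the 32-bit mask width is chosen for the example. It shows why falling back to one element per access needs a larger predicate mask: the same tile row takes more accesses, and each access needs its own bit.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of CUTLASS: how many elements one access loads.
// Uses the full vector width when the channel count is a multiple of it,
// and falls back to one element per access otherwise.
inline int elements_per_access(int channels, int vector_elements) {
  assert(channels > 0 && vector_elements > 0);
  return (channels % vector_elements == 0) ? vector_elements : 1;
}

// Hypothetical helper: builds a predicate mask for one tile row.
// Bit i is set when access i starts inside the valid channel range.
// tile_elements is assumed to be a multiple of per_access.
inline std::uint32_t build_access_mask(int channels, int tile_elements, int per_access) {
  int accesses = tile_elements / per_access;
  assert(accesses <= 32);  // the mask in this sketch has 32 bits
  std::uint32_t mask = 0;
  for (int i = 0; i < accesses; ++i) {
    if (i * per_access < channels) {
      mask |= (std::uint32_t{1} << i);
    }
  }
  return mask;
}
```

For example, with 64 channels and an 8-element vector, a tile row of 8 elements needs a single access and one mask bit. With 3 channels, the same row is read one element at a time, so it needs 8 mask bits, of which only the first three are set.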