
Add Conv Transpose BF16 #30877

Merged
merged 6 commits into from
Feb 18, 2021
Conversation

wozna
Contributor

@wozna wozna commented Feb 3, 2021

PR types

Others

PR changes

OPs

Describe

This PR:

  • changes the conv_transpose op's mkldnn kernel to use MKLDNNHandlerT
  • adds BF16 support for the conv_transpose op

@paddle-bot-old

paddle-bot-old bot commented Feb 3, 2021

Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

: platform::MKLDNNHandlerT<T, mkldnn::deconvolution_forward>(
dev_ctx, mkldnn_engine, cpu_place,
platform::CreateKey(dev_ctx, framework::vectorize(input->dims()),
unique_name)) {
const bool is_test = ctx.Attr<bool>("is_test");
Contributor

In general I like this PR very much. The only thing missing is that inside ConvTransposeMKLDNNHandler you should call the isCached() method so that the MD is not created again. Please look at other ops implemented with MKLDNNHandlerT, like pool, softmax, etc.

Contributor

@arogowie-intel arogowie-intel left a comment

In general there is a huge overlap with the oneDNN conv kernel (not surprisingly), so it would be good to have this common part in one place. But such refactoring is better left for another PR.

paddle/fluid/platform/mkldnn_reuse.h Outdated Show resolved Hide resolved
@@ -253,6 +255,11 @@ class MKLDNNHandlerT {
std::static_pointer_cast<dnnl::memory>(dev_ctx_.GetBlob(target_key));

if (target_memory_p == nullptr) {
if (custom_func) {
Contributor

If I understand correctly, even after this custom reorder the condition user_md != target_md may still be true, which would result in a second reorder. Is that intentional?

Contributor Author

Yes, it is intentional. The custom reorder function is used to set an appropriate reorder for data_format. However, user_md != target_md may still hold because the two differ in data type. In the case of bf16, the original weights (user_md) are float, while for computation (target_md) we need them in bf16, and they are converted in this reorder.

Contributor

Ok, then if the additional thing you want to do is a data-type conversion, I'd suggest creating a helper function for this task, explicitly named something like ConvertMemDataType. With it, the intention of the control flow would be clear: first reorder the data, then convert the memory data type. This AcquireMemoryWithReorder function is already very complicated; its control flow is hard to understand and, I suppose, hard to debug, not to mention maintain and test. IMHO this function is doing too many things.

Contributor

@arogowie-intel The custom function was introduced because there was no NCWH format enum in oneDNN, so we needed to do the reorder ourselves. This is outdated: after we implemented the custom reorder, the relevant enum was added. So the actual task is to remove the custom reorder, and there is an issue for that in our tracker.

paddle/fluid/platform/mkldnn_reuse.h Outdated Show resolved Hide resolved
paddle/fluid/operators/mkldnn/conv_transpose_mkldnn_op.cc Outdated Show resolved Hide resolved
paddle/fluid/operators/mkldnn/conv_transpose_mkldnn_op.cc Outdated Show resolved Hide resolved
paddle/fluid/operators/mkldnn/conv_transpose_mkldnn_op.cc Outdated Show resolved Hide resolved
Comment on lines 166 to 167
auto fwd_prop_kind = is_test ? mkldnn::prop_kind::forward_inference
: mkldnn::prop_kind::forward_training;
Contributor

There is already a check at the beginning forcing the is_test attribute to be true. So is this ternary operator needed?

Contributor Author

We will be changing these operators to support training anyway, so I think it's worth leaving it in.

paddle/fluid/operators/mkldnn/conv_transpose_mkldnn_op.cc Outdated Show resolved Hide resolved
weights_tz, platform::MKLDNNGetDataType<K>(),
(g == 1) ? filter->format() : MKLDNNMemoryFormat::goihw);

// Custom Reorder from IOHW to OIHW
Contributor

Doesn't oneDNN support such a reorder?

Contributor Author

This is also related to group convolution, so we have to specify that format explicitly.

Contributor

@arogowie-intel This code reflects the difference between oneDNN and PaddlePaddle in how groups are implemented. In oneDNN, groups are an extra dimension: e.g., the weights shape without groups is OIHW, and when there is more than one group it becomes GOIHW (5-dimensional). In PaddlePaddle, both cases are expressed as 4D data: the weights of the second group are simply glued (concatenated) to the end of the weights of the first group. That is why, when groups are present, we cannot rely on the format stored inside the tensor and we need to change from OIHW to GOIHW.

@lidanqing-intel
Contributor

@wozna

2021-02-10 02:15:11 ****************
2021-02-10 02:15:11 0. Unittest is not allowed to be disabled.
2021-02-10 02:15:11 You must have one RD (kolinwei(Recommend), or luotao1) approval for the usage of @unittest.skip or @unittest.skipIf.
2021-02-10 02:15:11 +@unittest.skipIf(not core.supports_bfloat16(),
2021-02-10 02:15:11 1. The error message you wrote in PADDLE_ENFORCE{_**} or PADDLE_THROW does not meet our error message writing specification. Possible errors include 1. the error message is empty / 2. the error message is too short / 3. the error type is not specified. Please read the specification [ https://github.com/PaddlePaddle/Paddle/wiki/Paddle-Error-Message-Writing-Specification ], then refine the error message. If it is a mismatch, please request chenwhql (Recommend), luotao1 or lanxianghit review and approve.
2021-02-10 02:15:11 The PADDLE_ENFORCE{_**} or PADDLE_THROW entries that do not meet the specification are as follows:
2021-02-10 02:15:11 PADDLE_ENFORCE_NE(input->format(), MKLDNNMemoryFormat::undef, + "Got wrong format for Input tensor.")); 
2021-02-10 02:15:11 2. Developers are not allowed to set the check_dygraph field directly, which is set to True by default. If you need to change the check_dygraph field, you must have one RD (phlrain (Recommend), fuyinno4 (Recommend for kunlun) or lanxianghit) review and approve. 
2021-02-10 02:15:11 The code that do not meet the specification are as follows:
2021-02-10 02:15:11  python/paddle/fluid/tests/unittests/mkldnn/test_conv2d_transpose_bf16_mkldnn_op.py : 
2021-02-10 02:15:11 +        self.check_output(check_dygraph=(self.use_mkldnn == False)) 
2021-02-10 02:15:11 There are 3 approved errors.
2021-02-10 02:15:11 ****************

jczaja
jczaja previously approved these changes Feb 11, 2021
Contributor

@jczaja jczaja left a comment

LGTM

@arogowie-intel
Contributor

Looks good. Just a general note on auto: it deduces a plain (non-reference) type even when you initialize it from a reference, so it entails a copy. Please pay attention to using auto& or const auto& wherever possible to avoid those copies.

@luotao1 luotao1 merged commit caf9d39 into PaddlePaddle:develop Feb 18, 2021
@wozna wozna deleted the bf16_conv_trans branch February 24, 2023 16:08
5 participants