
[DO NOT MERGE] MKLDNN layouts #10291

Closed

Conversation

pzelazko-intel
Contributor

DO NOT MERGE IT

Please take a look at this MKLDNN layout proof of concept and assess whether we can stay with this design. The code is not finished - there are debug VLOG prints, some parts are missing, and the code needs cleanup.

We need MKLDNN layouts for MKLDNN kernels to run efficiently - especially for the convolution and fully-connected OPs.

In #8305 it was recommended to create an MKLDNNLoDTensor class deriving from LoDTensor. But that approach caused too many problems: in many places the Tensor class is used explicitly, and I would have to differentiate between LoDTensor and MKLDNNLoDTensor.

I came up with a different approach: I've added a Tensor::ExtendedData interface and a pointer to it in the Tensor class. It is a placeholder for additional data, and I use it for the MKLDNN layout case - mkldnn::memory::format and mkldnn::engine are held in MKLDNNTensorData, which derives from Tensor::ExtendedData. If the tensor's layout is set to kMKLDNN, I can treat the tensor as holding data in an MKLDNN format. To make use of such a tensor, I've created the decorator classes MKLDNNTensor and MutableMKLDNNTensor. The data itself is still kept in the Placeholder within Tensor.

Reorders to MKLDNN formats happen within the convolution and fully-connected OPs, because:

  • For these two OPs, the MKLDNN library can decide which layout is best based on input parameters like the input size, stride, padding, etc.
  • Other MKLDNN OPs don't gain a significant performance boost from MKLDNN layouts

When the next OP's expected kernel is not an MKLDNN one, we transform the tensor to the NCHW layout.

@luotao1
Contributor

luotao1 commented May 7, 2018

@pzelazko-intel Sorry for the late reply.
It is better to create an MKLDNNLoDTensor class deriving from LoDTensor. Some reasons are in #8305 (comment).

For example, in fluid_op_A->mkldnn_op_B->fluid_op_C, if mkldnn_op_B doesn't use LoDTensor, how should fluid_op_C get the LoD information?

@pzelazko-intel
Contributor Author

@luotao1 LoDTensor is still used, because it inherits from Tensor, which I modified to be capable of keeping data in MKLDNN format.
The plain Tensor is used in many places, like https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/framework/data_transform.cc#L29.
In those places I cannot perform a data transformation from MKLDNN format to, let's say, NCHW, because the MKLDNN-specific data would be kept in MKLDNNLoDTensor.

Please let me know what you think.

@jacquesqiao
Member

jacquesqiao commented May 7, 2018

I am not sure what the problem is with creating an MKLDNNLoDTensor class deriving from LoDTensor. In fluid, most Tensors are actually LoDTensors; if an operator does not need LoD information, it can treat a LoDTensor as a Tensor. I think we can also make MKLDNNLoDTensor derive from LoDTensor, so that we keep the LoD information.

Plain Tensor is used in many places, like https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/framework/data_transform.cc#L29.

Data transform uses a plain Tensor here because it only needs the data inside the tensor; most of the inputs are actually LoDTensors.

@pzelazko-intel
Contributor Author

Data transform uses a plain Tensor here because it only needs the data inside the tensor; most of the inputs are actually LoDTensors.

The problem is that DataTransform does not have information about MKLDNN format (which would be in MKLDNNLoDTensor).

@jacquesqiao
Member

The problem is that DataTransform does not have information about MKLDNN format (which would be in MKLDNNLoDTensor).

Can we use tensor->layout to judge whether a tensor is an MKLDNNLoDTensor and then cast it to MKLDNNLoDTensor?

@pzelazko-intel
Contributor Author

Can we use tensor->layout to judge whether a tensor is an MKLDNNLoDTensor and then cast it to MKLDNNLoDTensor?

We could, but in the mentioned DataTransform and in OperatorWithKernel::RunImpl we instantiate the Tensor class, not LoDTensor: https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/framework/operator.cc#L575.

If we had to cast Tensor to MKLDNNLoDTensor, we would have to do it in many places, filling our code with if-else constructions.

@luotao1
Contributor

luotao1 commented May 9, 2018

If we had to cast Tensor to MKLDNNLoDTensor, we would have to do it in many places, filling our code with if-else constructions.

How about adding new files named mkldnn_lod_tensor.h and mkldnn_lod_tensor.cc and implementing the cast there, just like MKLDNNMatrix.h?

@pzelazko-intel
Contributor Author

If we had to cast Tensor to MKLDNNLoDTensor, we would have to do it in many places, filling our code with if-else constructions.

Where exactly would that cast happen? In any case, a cast won't solve my problem, because DataTransform operates on a Tensor instance: https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/framework/operator.cc#L575.

@jacquesqiao
Member

The relationship is MKLDNNLoDTensor -> LoDTensor -> Tensor, so DataTransform can still operate on a Tensor (MKLDNNLoDTensor or LoDTensor) instance and cast if needed.

@pzelazko-intel
Contributor Author

Yes, but the output of DataTransform is an instance of the plain Tensor: https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/framework/operator.cc#L575.

Moreover, please take a look at https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/framework/data_transform.cc#L33 - plain Tensor instances are used there again.

How would you overcome this obstacle?

@luotao1
Contributor

luotao1 commented May 9, 2018

@pzelazko-intel I have another question: could we use the MKLDNN OPs for inference even though the MKLDNN layout work is not finished?

@pzelazko-intel
Contributor Author

@luotao1 Yes, but it will have worse performance than the version with MKLDNN layouts.

@luotao1
Contributor

luotao1 commented May 9, 2018

Oh, could some PRs (running an inference model with the current MKLDNN OPs on the C++ end) be created first to validate the whole process?

@pzelazko-intel
Contributor Author

@luotao1 I'm not sure I understand - you mean a PR with an inference model test with MKLDNN OPs enabled?

@jacquesqiao
Member

For MKLDNNTensor, what is needed in DataTransform? Because for the current LoDTensor, DataTransform only needs the fields in Tensor.

@pzelazko-intel
Contributor Author

MKLDNNTensor additionally has information about the MKLDNN format used and the MKLDNN engine (the MKLDNNTensorData class), but the latter can be extracted from MKLDNNDeviceContext. So we have to know the MKLDNN format in which the data is organized. There are many such formats, and MKLDNN can choose which one to use, so I would not recommend simply adding every one of them to DataLayout. Instead, we could add a field like mkldnn_format to Tensor in place of extended_data, if you don't like the extended data concept.

@jacquesqiao
Member

I notice that in the current implementation extended_data is one field of Tensor. Does that mean we can get all the information we need for the data transform from the Tensor, and then construct an MKLDNNTensor with the data inside the transformed tensor, like https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/framework/operator.cc#L578?

@pzelazko-intel
Contributor Author

It may work with some workarounds, because in DataTransform we reorder only from MKLDNN formats to the common ones.

So eventually you don't like the extended_data concept and want me to reimplement it as MKLDNNLoDTensor, right? Could you please check whether the other changes are acceptable? That would save me from future reimplementations.

Thank you

@jacquesqiao
Member

Sorry, maybe my expression was not so clear. I think adding extended_data to Tensor is OK; what I want is to make clear what is needed for data_transform. If all we need is inside the tensor itself (such as layout, data, and extended_data), then deriving from Tensor or deriving from LoDTensor seems to make no difference, but we need the LoD information to pass along as the operators run.

@pzelazko-intel
Contributor Author

If I modify Tensor, and Tensor is a base class of LoDTensor, then the LoD information still passes along between operators, right? Moreover, I assumed that when doing reorders I don't need to modify the LoD information. Am I right?

@luotao1
Contributor

luotao1 commented May 10, 2018

If I modify Tensor, and Tensor is a base class of LoDTensor, then the LoD information still passes along between operators, right?

Yes, you are right.

Moreover, I assumed that when doing reorders I don't need to modify the LoD information. Am I right?

Yes, you are right. The LoD information is independent of Tensor.

@jacquesqiao
Member

@pzelazko-intel After the data transform, we copy the transformed data inside the tensor back to its original var type; the LoD information will also be copied.

auto* trans_var = new_scope.Var(var_name);
std::shared_ptr<Tensor> out(new Tensor);
DataTransform(expected_kernel_key, kernel_type_for_var, *tensor_in,
              out.get());
CopyVariableWithTensor(*var, *(out.get()), trans_var);

void CopyVariableWithTensor(const Variable& in_var, const Tensor& tensor,
                            Variable* out_var) {
  if (in_var.IsType<LoDTensor>()) {
    auto& in_lod_tensor = in_var.Get<LoDTensor>();
    auto* tran_lod_tensor = out_var->GetMutable<LoDTensor>();
    tran_lod_tensor->set_lod(in_lod_tensor.lod());
    tran_lod_tensor->set_layout(in_lod_tensor.layout());
    tran_lod_tensor->ShareDataWith(tensor);
  } else if (in_var.IsType<SelectedRows>()) {
    auto& in_selected_rows = in_var.Get<SelectedRows>();
    auto* trans_selected_rows = out_var->GetMutable<SelectedRows>();
    trans_selected_rows->set_height(in_selected_rows.height());
    trans_selected_rows->set_rows(in_selected_rows.rows());
    trans_selected_rows->mutable_value()->ShareDataWith(tensor);
  } else {
    PADDLE_THROW("unknown var type");
  }
}

If we add MKLDNNLoDTensor, we also need to do this, and the LoD information should be maintained in the MKLDNNLoDTensor. That is why we think MKLDNNLoDTensor should derive from LoDTensor: it needs to store the LoD information inside it.

@pzelazko-intel
Contributor Author

@jacquesqiao I agree, but please notice that in the proposed solution I don't create any class deriving from Tensor; MKLDNNTensor is only a decorator for the Tensor class.

If we are on the same page and you have no other questions, please let me know whether I should stay with the current design or reimplement it as MKLDNNLoDTensor deriving from LoDTensor.

Thank you

@jacquesqiao
Member

jacquesqiao commented May 14, 2018

@pzelazko-intel After discussion, we think using the decorator is OK. We should move on now, thanks!

@luotao1
Contributor

luotao1 commented May 14, 2018

@pzelazko-intel Discussed with @jacquesqiao and @Superjomn: if you can ensure that the LoD information transfers among the different OPs, we think you can stay with your current design.
