add cinn_launch_op for using CINN to optimize graph #36600

Merged
merged 15 commits on Nov 1, 2021

Conversation

CtfGo
Contributor

@CtfGo CtfGo commented Oct 21, 2021

PR types

New features

PR changes

OPs

Describe

Add CinnLaunchOp, which is responsible for executing the results of CINN subgraph compilation. Key points:

  1. In the subgraph-partitioning BuildCinnPass, each subgraph is replaced in the original graph by a CinnLaunchOp, which invokes CINN to compile and execute the subgraph.
  2. The inputs/outputs of CinnLaunchOp are the inputs and outputs of the subgraph. In addition, a compilation_key attribute is added; the subgraph object and its compilation result can be fetched from the global cache by this key. The attribute is set by BuildCinnPass when it creates the op.
  3. CinnLaunchOp works in the following steps:
    - Fetch the subgraph object from the global cache.
    - Fetch the subgraph's compilation result from the global cache, compiling just in time on a cache miss.
    - Initialize the runtime data according to the variable information (data type, shape) in the compilation result, allocating host/device memory.
    - Pack the runtime data into arguments and call CINN's executable object, the runtime program, to perform the computation.
    - Synchronize the subgraph's results back to Paddle-side tensors through the argument pointers.

@paddle-bot-old

Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

register_operators(EXCLUDES py_layer_op py_func_op warpctc_op dgc_op load_combine_op lstm_op run_program_op eye_op
recurrent_op save_combine_op sparse_attention_op sync_batch_norm_op spectral_op ${OP_MKL_DEPS} DEPS ${OP_HEADER_DEPS})
register_operators(EXCLUDES py_layer_op py_func_op warpctc_op dgc_op load_combine_op lstm_op run_program_op eye_op
recurrent_op save_combine_op sparse_attention_op sync_batch_norm_op spectral_op cinn_launch_op ${OP_MKL_DEPS} DEPS ${OP_HEADER_DEPS})
Member

Should cinn_launch_op also be guarded by the WITH_CINN flag?

Contributor Author

Not necessary; it is declared as an excluded op via the EXCLUDES keyword of register_operators, just as sync_batch_norm_op is.

&name2argument, &hold_buffers);

// prepare output variables
auto output_tensors = details::GetConstTensors(scope, Outputs(kOutputs));
Member

I would suggest replacing the auto on this line with the actual type. When I read the GetConstTensors call, I assumed it returned a tensor; I didn't realize it was a map<string, const LoDTensor*> until I read the code of GetConstTensors.

To increase readability, I suggest writing the actual type instead of auto when you want to emphasize the type or when the type may confuse readers.

Contributor Author

Agreed; I changed some auto usages to their actual types.

register_operators(EXCLUDES py_layer_op py_func_op warpctc_op dgc_op load_combine_op lstm_op run_program_op eye_op
recurrent_op save_combine_op sparse_attention_op sync_batch_norm_op spectral_op ${OP_MKL_DEPS} DEPS ${OP_HEADER_DEPS})
register_operators(EXCLUDES py_layer_op py_func_op warpctc_op dgc_op load_combine_op lstm_op run_program_op eye_op
recurrent_op save_combine_op sparse_attention_op sync_batch_norm_op spectral_op cinn_launch_op ${OP_MKL_DEPS} DEPS ${OP_HEADER_DEPS})
Member

Should we also guard cinn_launch_op by WITH_CINN?

Both input and output of this operator are a set of variables
which are input and output of the graph respectively that will be
compiled and executed in this operator.
In addition, there is a attribute named 'compilation_key' should be
Member

"there is a attribute" -> "there is an attribute"

Contributor Author

Fixed.

namespace paddle {
namespace operators {

using LoDTensor = framework::LoDTensor;
Member

I highly recommend not using `using` declarations in a header file, even inside a namespace. The same comment applies to other places.

See reference from:
https://abseil.io/tips/119
https://stackoverflow.com/questions/6175705/scope-of-using-declaration-within-a-namespace
or you can find a lot of references online about it.

Contributor Author

@CtfGo CtfGo Oct 27, 2021

Agreed; all `using` declarations were moved from header files into source files.

zhhsplendid
zhhsplendid previously approved these changes Oct 29, 2021
Member

@zhhsplendid zhhsplendid left a comment

LGTM

thisjiang
thisjiang previously approved these changes Oct 29, 2021
Contributor

@thisjiang thisjiang left a comment

LGTM

@CtfGo CtfGo dismissed stale reviews from thisjiang and zhhsplendid via 2daaa75 October 29, 2021 11:51
zhhsplendid
zhhsplendid previously approved these changes Oct 29, 2021
Member

@zhhsplendid zhhsplendid left a comment

LGTM


Contributor

@Xreki Xreki left a comment

LGTM for op benchmark CI. Would you consider putting the CINN-op-related code in a separate subdirectory?

@zhhsplendid zhhsplendid merged commit 0a963ee into PaddlePaddle:develop Nov 1, 2021
Comment on lines +173 to +175
if (WITH_GPU)
nv_test(cinn_launch_op_test SRCS cinn_launch_op_test.cc DEPS cinn_compiler cinn_launch_op elementwise_add_op)
endif()
Contributor

Is this test GPU-only? Isn't a CPU test needed?

set necessarily to get corresponding ir::Graph object of the graph
or its computation result.

It accomplishs the computation of graph following several steps:
Contributor

accomplishs --> accomplishes

Comment on lines +67 to +68
const auto& cinn_runtime_program = cinn_compiled_object.runtime_program;
const auto& compiled_scope = *(cinn_compiled_object.scope.get());
Contributor

Suggested change
const auto& cinn_runtime_program = cinn_compiled_object.runtime_program;
const auto& compiled_scope = *(cinn_compiled_object.scope.get());
const auto& cinn_runtime_program = *(cinn_compiled_object.runtime_program);
const auto& compiled_scope = *(cinn_compiled_object.scope);

"is not equivalent, paddle is [%s] "
"but compiled result is [%s].",
pd_name, paddle_tensor->dims(), compiled_dim));
// TODO(CtfGo): check the underlying data type is equivalent
Contributor

How about adding this check after all?

if (!paddle_tensor->IsInitialized()) {
// TODO(CtfGo): support mutable corresponding c++ type with the
// compilation type
paddle_tensor->mutable_data<float>(
Contributor

Have you thought about how to handle this part going forward?

const auto& cinn_name = paddle2cinn_varmap.count(pd_name)
? paddle2cinn_varmap.at(pd_name)
: pd_name;
std::unique_ptr<cinn_buffer_t> buffer_ptr(new cinn_buffer_t());
Contributor

Use std::make_unique instead.

const auto& cinn_name = paddle2cinn_varmap.count(pd_name)
? paddle2cinn_varmap.at(pd_name)
: pd_name;
std::unique_ptr<cinn_buffer_t> buffer_ptr(new cinn_buffer_t());
Contributor

Use std::make_unique instead.

const std::vector<std::string>& input_var_names,
const std::vector<std::string>& output_var_names) {
std::unordered_set<std::string> all_paddle_names, all_cinn_names;
for_each(paddle2cinn_varmap.begin(), paddle2cinn_varmap.end(),
Contributor

Shouldn't std:: be added here?

auto* paddle_tensor = var_ptr->GetMutable<LoDTensor>();
auto compiled_ddim = framework::make_ddim(cinn_tensor->shape().data());
// TODO(CtfGo): support mutable corresponding c++ type
paddle_tensor->mutable_data<float>(compiled_ddim, place);
Contributor

@wzzju wzzju Nov 1, 2021

Have you thought about how to handle other data types for this part? Consider using the CinnTensor::Type() method to get the corresponding C++ type.

&hold_buffers);
}
// Step 4. Launch CINN to execute the compilation runtime program
cinn_runtime_program->Execute(&name2argument);
Contributor

Please confirm that Execute blocks synchronously here and only returns after execution has finished. Otherwise, once this scope is exited, the memory pointed to by the unique_ptrs in hold_buffers will be reclaimed.

@CtfGo CtfGo deleted the cinn_lanuch_op branch November 1, 2021 10:48
ghost pushed a commit to piotrekobi/Paddle that referenced this pull request Nov 3, 2021