add cinn_launch_op for using CINN to optimize graph #36600
Conversation
Thanks for your contribution!
register_operators(EXCLUDES py_layer_op py_func_op warpctc_op dgc_op load_combine_op lstm_op run_program_op eye_op
recurrent_op save_combine_op sparse_attention_op sync_batch_norm_op spectral_op ${OP_MKL_DEPS} DEPS ${OP_HEADER_DEPS})
register_operators(EXCLUDES py_layer_op py_func_op warpctc_op dgc_op load_combine_op lstm_op run_program_op eye_op
recurrent_op save_combine_op sparse_attention_op sync_batch_norm_op spectral_op cinn_launch_op ${OP_MKL_DEPS} DEPS ${OP_HEADER_DEPS})
Should cinn_launch_op also be guarded by the WITH_CINN flag?
Not necessary. Here it is declared as an excluded op with the EXCLUDES keyword of register_operators; sync_batch_norm_op has a similar definition.
&name2argument, &hold_buffers);

// prepare output variables
auto output_tensors = details::GetConstTensors(scope, Outputs(kOutputs));
I would suggest replacing the auto on this line with the actual type. When I read GetConstTensors, I thought it returned a tensor, and I didn't realize it was a map<string, const LodTensor*> until I read the code of GetConstTensors. To improve readability, I suggest spelling out the actual type instead of auto whenever you want to emphasize the type or the type may confuse readers.
Agree, I changed some auto usages to their actual types.
register_operators(EXCLUDES py_layer_op py_func_op warpctc_op dgc_op load_combine_op lstm_op run_program_op eye_op
recurrent_op save_combine_op sparse_attention_op sync_batch_norm_op spectral_op ${OP_MKL_DEPS} DEPS ${OP_HEADER_DEPS})
register_operators(EXCLUDES py_layer_op py_func_op warpctc_op dgc_op load_combine_op lstm_op run_program_op eye_op
recurrent_op save_combine_op sparse_attention_op sync_batch_norm_op spectral_op cinn_launch_op ${OP_MKL_DEPS} DEPS ${OP_HEADER_DEPS})
Should we also guard cinn_launch_op by WITH_CINN?
Both input and output of this operator are a set of variables
which are input and output of the graph respectively that will be
compiled and executed in this operator.
In addition, there is a attribute named 'compilation_key' should be
"there is a attribute" -> "there is an attribute"
Fixed.
namespace paddle {
namespace operators {

using LoDTensor = framework::LoDTensor;
I highly recommend not using a using declaration in a header file, even inside a namespace. The same comment applies to other places. See these references:
https://abseil.io/tips/119
https://stackoverflow.com/questions/6175705/scope-of-using-declaration-within-a-namespace
or you can find many other references about it online.
Agree, all using declarations in header files were moved to the source files.
LGTM
LGTM
LGTM
LGTM
LGTM for op benchmark CI. Have you considered putting the CINN-op-related code in a separate subdirectory?
if (WITH_GPU)
nv_test(cinn_launch_op_test SRCS cinn_launch_op_test.cc DEPS cinn_compiler cinn_launch_op elementwise_add_op)
endif()
Is the test GPU-only? Isn't a CPU test needed?
set necessarily to get corresponding ir::Graph object of the graph
or its computation result.

It accomplishs the computation of graph following several steps:
accomplishs --> accomplishes
const auto& cinn_runtime_program = cinn_compiled_object.runtime_program;
const auto& compiled_scope = *(cinn_compiled_object.scope.get());
Suggested change:
- const auto& cinn_runtime_program = cinn_compiled_object.runtime_program;
- const auto& compiled_scope = *(cinn_compiled_object.scope.get());
+ const auto& cinn_runtime_program = *(cinn_compiled_object.runtime_program);
+ const auto& compiled_scope = *(cinn_compiled_object.scope);
"is not equivalent, paddle is [%s] "
"but compiled result is [%s].",
pd_name, paddle_tensor->dims(), compiled_dim));
// TODO(CtfGo): check the underlying data type is equivalent
Please add this check back.
if (!paddle_tensor->IsInitialized()) {
// TODO(CtfGo): support mutable corresponding c++ type with the
// compilation type
paddle_tensor->mutable_data<float>(
Have you thought about how to handle this later part?
const auto& cinn_name = paddle2cinn_varmap.count(pd_name)
                            ? paddle2cinn_varmap.at(pd_name)
                            : pd_name;
std::unique_ptr<cinn_buffer_t> buffer_ptr(new cinn_buffer_t());
Use std::make_unique instead.
const std::vector<std::string>& input_var_names,
const std::vector<std::string>& output_var_names) {
std::unordered_set<std::string> all_paddle_names, all_cinn_names;
for_each(paddle2cinn_varmap.begin(), paddle2cinn_varmap.end(),
Shouldn't this be qualified with std:: here?
auto* paddle_tensor = var_ptr->GetMutable<LoDTensor>();
auto compiled_ddim = framework::make_ddim(cinn_tensor->shape().data());
// TODO(CtfGo): support mutable corresponding c++ type
paddle_tensor->mutable_data<float>(compiled_ddim, place);
Have you thought about how to handle other data types in this later part? You could consider using the CinnTensor::Type() method to get the corresponding C++ type.
&hold_buffers);
}
// Step 4. Launch CINN to execute the compilation runtime program
cinn_runtime_program->Execute(&name2argument);
Please confirm that Execute blocks synchronously here and only returns after the execution has finished; otherwise, once this scope exits, the memory pointed to by the unique_ptrs in hold_buffers will be freed.
PR types
New features
PR changes
OPs
Describe
Add CinnLaunchOp, which is responsible for executing the compilation result of a CINN subgraph. Key points:
1. In the subgraph-partitioning BuildCinnPass, each subgraph is replaced in the original graph by a CinnLaunchOp, which invokes CINN to compile and execute the subgraph.
2. The inputs/outputs of CinnLaunchOp are the inputs and outputs of the subgraph. A compilation_key attribute is added; it is the key used to fetch the subgraph object and its compilation result from the global cache, and it is set by BuildCinnPass when the op is created.
3. CinnLaunchOp works through the following steps:
- Fetch the subgraph object from the global cache.
- Fetch the subgraph's compilation result from the global cache, compiling just-in-time on a cache miss.
- Initialize runtime data from the variable information (data type, shape) in the compilation result, allocating host/device memory.
- Pack the runtime data into arguments and call CINN's executable runtime program to perform the computation.
- Synchronize the subgraph's results back to Paddle-side tensors through the argument pointers.