[DRR] C++ DRR (Declarative Rewrite Rule) of Paddle #55859

Merged
yuanlehome merged 232 commits into PaddlePaddle:develop on Oct 18, 2023

@yuanlehome (Contributor) commented on Aug 1, 2023

PR types

New features

PR changes

Others

Description

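1. Background and Goals

Background
Paddle is currently upgrading its IR. The core goal is to design an IR infrastructure that can be shared by multiple parties and improves on the current one. The new IR will unify IR representation across the entire Paddle stack.
Passes, the key components that optimize the IR (constant folding, dead code elimination, operator fusion, etc.), also need to be redesigned on top of the new IR, fixing the various problems of the old Pass system. A survey classifying Paddle's existing Passes found that DAG->DAG PatternRewrite Passes account for more than half of them. To improve the experience of developing Passes on the new IR and to lower the cost of later migrating all existing Passes, we need to further optimize the design of PatternRewrite Passes so that users can implement a pattern match-and-replace Pass on the new IR with a few simple API calls.

Goal
For DAG->DAG PatternRewrite Passes, provide a concise, easy-to-use API with low development cost for implementing subgraph match-and-replace Passes on the new IR.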

The DRR (Declarative Rewrite Rule) Pass API is not an IR itself. It is a unified wrapper over the IR that lets users focus on the optimization logic without handling the underlying IR. A C++ example:

void SimpliedRemoveRedundentReshapePass(DrrPassContext* ctx) {
  // Source pattern: the subgraph to match
  SourcePattern pat = ctx->SourcePattern();
  const auto& reshape = pat.Op("pd_op.reshape");
  // Match two consecutive reshapes; bind the input and output tensors to
  // arg0 and ret so that the constraints below can refer to them
  pat.Tensor("ret") = reshape(reshape(pat.Tensor("arg0")));

  // Constraints
  // Require arg0 and ret to have the same shape. Tensor.shape() is an
  // abstract PatternShape, not a concrete Shape.
  // RequireEqual: a generic wrapped API; users need not care about the
  // underlying data structures
  pat.RequireEqual(pat.Tensor("ret").shape(), pat.Tensor("arg0").shape());
  // RequireNativeCall: for checks the basic APIs cannot express. The lambda
  // runs while the Pass executes, so it depends on the underlying data
  // structures
  pat.RequireNativeCall([](MatchContext* match_ctx) -> bool {
    // MatchContext holds actual values rather than PatternShape, so the
    // concrete type must be specified
    return match_ctx->Tensor("ret").shape<std::vector<int>>() ==
           match_ctx->Tensor("arg0").shape<std::vector<int>>();
  });

  // Result pattern: the subgraph to replace the match with
  ResultPattern res = pat.ResultPattern();
  // Replace ret with arg0.
  // ret is bound in the source pattern, so replacing ret amounts to
  // deleting the op(s) that produce ret
  res.Tensor("ret").Assign(res.Tensor("arg0"));
}
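To make the semantics of the rule above concrete without depending on Paddle, here is a minimal sketch on a hypothetical toy IR. `ToyOp`, `ToyProgram`, and `RemoveRedundantReshapeOnce` are invented for illustration and are not pir or DRR APIs. The function hand-codes what the declarative rule expresses: find `ret = reshape(reshape(arg0))`, check the shape constraint, redirect every use of `ret` to `arg0`, then erase producers that have become dead.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy IR (not Paddle's pir API): each value is an integer id
// with a shape, and every op has exactly one input and one output.
struct ToyOp {
  std::string type;  // e.g. "pd_op.reshape"
  int input;         // id of the value this op reads
  int output;        // id of the value this op produces
};

struct ToyProgram {
  std::vector<std::vector<int>> shapes;  // shapes[v] is the shape of value v
  std::vector<ToyOp> ops;                // ops in topological order
  std::vector<int> fetches;              // values read outside the program

  // Counts how many ops (plus external fetches) read value v.
  int NumUses(int v) const {
    int n = static_cast<int>(std::count(fetches.begin(), fetches.end(), v));
    for (const ToyOp& op : ops) n += (op.input == v) ? 1 : 0;
    return n;
  }
};

// Hand-written equivalent of the rule
//   ret = reshape(reshape(arg0)), require shape(ret) == shape(arg0)
//   => replace ret with arg0
// Rewrites the first match and returns true, or returns false if none.
bool RemoveRedundantReshapeOnce(ToyProgram& prog) {
  for (std::size_t i = 0; i < prog.ops.size(); ++i) {
    const ToyOp outer = prog.ops[i];  // copy: prog.ops is modified below
    if (outer.type != "pd_op.reshape") continue;
    // Source pattern: the outer reshape's input must come from a reshape.
    auto inner = std::find_if(prog.ops.begin(), prog.ops.end(),
                              [&](const ToyOp& op) { return op.output == outer.input; });
    if (inner == prog.ops.end() || inner->type != "pd_op.reshape") continue;
    const int arg0 = inner->input;
    const int ret = outer.output;
    assert(static_cast<std::size_t>(ret) < prog.shapes.size());
    // Constraint: shape(ret) == shape(arg0).
    if (prog.shapes[ret] != prog.shapes[arg0]) continue;
    // Result pattern: every use of ret now reads arg0 instead.
    for (ToyOp& op : prog.ops) if (op.input == ret) op.input = arg0;
    for (int& f : prog.fetches) if (f == ret) f = arg0;
    // ret has no users left, so its producer (the outer reshape) is erased.
    prog.ops.erase(prog.ops.begin() + static_cast<std::ptrdiff_t>(i));
    // The inner reshape is erased too, unless something else still reads it.
    const int mid = outer.input;
    if (prog.NumUses(mid) == 0) {
      prog.ops.erase(std::remove_if(prog.ops.begin(), prog.ops.end(),
                                    [&](const ToyOp& op) { return op.output == mid; }),
                     prog.ops.end());
    }
    return true;
  }
  return false;
}
```

Calling the function in a loop until it returns false applies the rule to a fixed point. A DRR-based Pass would instead derive the matching and rewriting from the declarative pattern, which is the point of the API.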

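2. Design

Overview
DRR Pass provides, on the new IR, a concise and easy-to-use API with low development cost for DAG->DAG PatternRewrite Passes. Its main target areas are Pass optimizations on the new IR for distributed training, the compiler frontend, and inference.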

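Core design

The design is layered: the module is divided into three layers, from top to bottom:

  1. The top layer is the user interface, which exposes the DRR API to users.
  2. The middle layer is the DAG subgraph representation, which converts code written with the DRR API into internal DAG data structures.
  3. The bottom layer performs matching and replacement on the new IR Program: it matches the intermediate DAG subgraph produced by the layer above against the new IR Program and, once a subgraph matches, replaces it.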
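As a rough illustration of how the top two layers connect, the following sketch assumes hypothetical classes `PatternGraph`, `OpCall`, `TensorRef`, and `Op` that only loosely mirror the DRR design. It shows how a call such as `reshape(reshape(arg0))` can record a DAG of op-call nodes instead of executing anything; the bottom layer would then match that DAG against the Program.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified pattern graph (layer 2). The names are
// illustrative and do not reproduce Paddle's actual DRR classes.
struct PatternTensor {
  std::string name;
  int producer;  // index of the producing OpCall, or -1 for pattern inputs
};

struct OpCall {
  std::string op_type;
  std::vector<std::size_t> inputs;  // indices into PatternGraph::tensors
  std::size_t output;               // index into PatternGraph::tensors
};

struct PatternGraph {
  std::vector<PatternTensor> tensors;
  std::vector<OpCall> op_calls;

  std::size_t AddTensor(const std::string& name, int producer) {
    tensors.push_back({name, producer});
    return tensors.size() - 1;
  }
};

// A tensor handle is just (graph, index) into that graph.
struct TensorRef {
  PatternGraph* graph;
  std::size_t index;
};

// Layer 1: calling an Op records an OpCall node plus a fresh temporary
// output tensor in the graph; nothing is computed.
class Op {
 public:
  Op(PatternGraph* graph, std::string type) : graph_(graph), type_(std::move(type)) {}

  TensorRef operator()(TensorRef in) const {
    assert(in.graph == graph_);  // handles must belong to this graph
    const int call_id = static_cast<int>(graph_->op_calls.size());
    const std::size_t out =
        graph_->AddTensor("tmp_" + type_ + "_" + std::to_string(call_id), call_id);
    graph_->op_calls.push_back({type_, {in.index}, out});
    return {graph_, out};
  }

 private:
  PatternGraph* graph_;
  std::string type_;
};
```

Because each call allocates a fresh temporary tensor with a generated name, the recorded DAG keeps the two reshape calls distinct even though they share one `Op` object.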

Others

Pcard-71500

paddle-bot (bot) commented on Aug 1, 2023

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

CLAassistant commented Aug 9, 2023

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
2 out of 3 committers have signed the CLA.

✅ gongshaotian
✅ yuanlehome
❌ zyfncg

Tensor& Op::operator()(const Tensor& arg1, const Tensor& arg2) const {
  std::vector<const Tensor*> inputs{&arg1, &arg2};
  auto& out = pattern_graph_->AddTmpTensor(std::shared_ptr<Tensor>(new Tensor(
      prefix + op_type_name_ + "_" + std::to_string(count++), pattern_graph_)));
Contributor

`count` is a static variable. Could accessing it from multiple threads cause a data race?

Contributor Author

Changed it to `thread_local`.
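The review point can be illustrated with a small sketch, assuming a hypothetical free function `NextTmpName` in place of the counter used inside `Op::operator()`. With `thread_local`, each thread increments its own copy of `count`, so there is no data race. Names are then unique only within a thread, which is presumably acceptable when each pattern graph is built on a single thread.

```cpp
#include <cassert>
#include <string>
#include <thread>
#include <vector>

// Hypothetical helper mirroring the reviewed code: the counter used to name
// temporary tensors is thread_local, so each thread increments its own copy
// and concurrent pattern construction does not race on it.
std::string NextTmpName(const std::string& prefix, const std::string& op_type) {
  thread_local int count = 0;
  return prefix + op_type + "_" + std::to_string(count++);
}

// Generates n temporary names on the calling thread.
std::vector<std::string> MakeNames(int n) {
  assert(n >= 0);
  std::vector<std::string> names;
  for (int i = 0; i < n; ++i) names.push_back(NextTmpName("tmp_", "reshape"));
  return names;
}

// Runs MakeNames(n) on a fresh thread, whose thread_local counter starts at 0.
std::vector<std::string> MakeNamesOnNewThread(int n) {
  std::vector<std::string> result;
  std::thread worker([&result, n] { result = MakeNames(n); });
  worker.join();
  return result;
}
```

If globally unique names were required instead, a `static std::atomic<int>` counter would be the usual alternative, at the cost of shared state across threads.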

${PADDLE_SOURCE_DIR}/paddle/fluid/pir/dialect/op_generator/op_creator_drr_gen.py
)
set(op_compat_yaml_file ${PADDLE_SOURCE_DIR}/paddle/phi/api/yaml/op_compat.yaml)
set(op_forward_yaml_file1
Contributor

There is also a yaml file in the following directory that defines operators specific to pir. Should it be included in this code-generation system? paddle/fluid/pir/dialect/operator/ir/ops.yaml

Contributor

[Screenshot]

paddle/fluid/pir/dialect/operator/ir/ops.yaml should already be included; it corresponds to these two parsed files, right?

Contributor Author

It may be needed; we will add it later.

cc_test_old(pattern_rewrite_test SRCS pattern_rewrite_test.cc DEPS
${PATTERN_REWRITE_TEST_DEPS})
if(NOT APPLE)
cc_test_old(pattern_rewrite_test SRCS pattern_rewrite_test.cc DEPS
Contributor

Suggest using paddle_test.

Contributor Author

Linking the unit test with paddle_test causes problems on Windows.

winter-wang previously approved these changes Oct 17, 2023

LGTM

@winter-wang winter-wang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@XiaoguangHu01 XiaoguangHu01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@yuanlehome yuanlehome merged commit cce81ea into PaddlePaddle:develop Oct 18, 2023
28 checks passed
hitywt pushed a commit to hitywt/Paddle that referenced this pull request Oct 24, 2023
* fix cudnn 8.7+ bug on cudnnConvolutionBiasActivationForward

* add drr_rewrite_pattern.h and remove_redundent_reshape demo

* add drr_context and pattern_graph class

* add test case

* fix cmake file

* fix compile bug

* fix runtime bug and refine code

* add MatchContext

* update code

* add impl of tensor_interface

* fix compile bug

* change smart ptr to pointer

* change smart to pointer

* change smart to pointer

* Replace 'weak_ptr' with pointer

* modify weak_ptr use count==0 judgment logic

* change smart to pointer

change smart to pointer

Replace 'weak_ptr' with pointer

modify weak_ptr use count==0 judgment logic

Replace the declaration and call of weakptr with pointer

* add match

* add match

* remove OperationInterface

* update

* Add Rewrite impl of DrrRewritePattern

* refine code

* rename ir_value to get in IrValue

* fix header include

* add CreateOperation template demo

* Add GraphTopo class in pattern_graph

* Reimplementing the GraphTopo class using queue

* Reimplementing the GraphTopo class using queue

* Optimize the access method of visited tensor

* Considering that the inputs of opcall may be empty

* Overloading the operator() method of Op, supporting dual tensor inputs

* support attr

* 1. Add Op class support for multi input and multi output function. 2. Add DRR duplicate TransposeOp merge testing code

* 1. Add transferOP in createOption func

* fix bug

* fix NotifyOperationRemoved

* refine code

* Fix axis bug in perm

* update shared_ptr

* update

* refine drr_test ut

* Modify according to review

* modify reshape_op

* format code

* support vector<int> for attr

* fix drr test

* refine code

* Resolve compilation loop dependencies

* add RequireNativeCall

* support native_call in drr api

* temp tensor prefix fix

* refine code

* support Tensor Assign API in ResultPattern

* refine test code

* refactor the drr_pattern class

* refine test case

* rename DrrPatternBuilder to DrrPatternBase

* fix compile bug

* adjust include

* Add log info in DrrRewritePattern

* use ir::get_type_name

* use ir::get_type_name

* support compute attribute in drr pattern

* refine code

* Add fusion testing code for fullOp and expandOp

* Standardize code format

* Replace IR_THROW() with PADDLE_THROW()

* refine code

* add attention fuse demo

* update

* fix compile error

* add multihead_matmul fuse pattern

* fix multihead_matmul

* Update drr_attention_fuse_test.cc

add buildprogram

* fix drr_attention_fuse_test compile

* add fused_gemm_epilogue in drr

* attr support std::vector<int64_t>

* add debug log

* update

* fix some bug

* fix conflict

* support subgraph replace in source pattern graph for drr

* Improve the implementation of Drr and multihead_matmul_fuse_pass

* add ReorderBlockOpsPass

* fix drr_attention_fuse_pass

* update

* update reorder_block_ops_pass

* revert fusedgemm

* update

* Add Bottom2UpMatch() func

* merge code

* fix bug

* add log & fix bug

* refine cpp type trait

* using oprand() & num_oprand() replace oprands()

* fix conflict

* fix compile

* fix pd.xxx to pd_op.xxx

* fix bug of delete op in drr

* add PatternGraphMatchV2 & FindOutputOp func

* refactor ir operation creator

* fix include pir

* fix ir

* merging

* Split out dfsvisitor func from FindOutputOp func

* fix bug

* fix output op in source pattern bug

* Debugging drr_test drr_attention_fuse_test passed!

* Debugging drr_fuse_linear_test passed!

* Optimize the PatternGraphMatchV2 function interface and overload the operator= method in MatchContextImpl

* Modify comments and function names

* auto code-gen for creating ir operation in drr

* delete debug log

* optimize the interface of MatchFromOutputToInput()

* Optimize SourcePatternGraph::OutputNodes judgment logic

* polish code

* using default operator=() in MatchContextImpl

* fix merge conflict

* create test case: drr_same_name_test

* fix duplicate binding of ir op bug

* Rename drr_same_name_test to drr_same_type_binding_test & Add graphical notes

* refactor logic of insert point for creating new operation in drr

* update

* fix compile error

* fix some bug

* fix codestyle

* fix bug

* Update anchor node judgment logic

* fix bug of link pir

* fix codestyle

* self review v1

* refine code format

* set thread_local for count in op class

* fix compile on mac

* remove unused .h in value.cc

* fix compile

---------

Co-authored-by: zyfncg <zhangyunfei07@baidu.com>
Co-authored-by: gongshaotian <gstian5555@outlook.com>
Co-authored-by: gongshaotian <>
Co-authored-by: gongshaotian <141618702+gongshaotian@users.noreply.github.com>
jiahy0825 pushed a commit to jiahy0825/Paddle that referenced this pull request Oct 26, 2023
danleifeng pushed a commit to danleifeng/Paddle that referenced this pull request Nov 14, 2023

8 participants