
【Hackathon 5th No.102】 move fused_embedding_eltwise_layernorm/fusion_transpose_flatten_concat/fused_fc_elementwise_layernorm to phi #57865

Merged (8 commits) on Oct 10, 2023

Conversation

@zeroRains (Contributor) commented Oct 3, 2023

PR types

Others

PR changes

Others

Description

move fused_embedding_eltwise_layernorm/fusion_transpose_flatten_concat/fused_fc_elementwise_layernorm to phi
#57262

paddle-bot bot commented Oct 3, 2023

Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@paddle-bot added the `contributor` (External developers) label on Oct 3, 2023
@CLAassistant commented Oct 3, 2023

CLA assistant check
All committers have signed the CLA.

auto dim_output = phi::make_ddim({batch, seq_len, hidden});
out->set_dims(dim_output);
// out->share_lod(ids);
// context->ShareLoD("Ids", /*->*/ "Out");
Contributor:
The commented-out lines here can be deleted in a separate follow-up PR. Also, why is the line `out->share_lod(ids);` commented out?

@zeroRains (Contributor, Author) commented Oct 9, 2023
Because `share_lod` in phi requires a tensor-typed argument, but `ids` in this op is a `vector<DenseTensor*>`, I don't yet know how to handle it, so I left the line commented out for now. How should this case be handled? @yuanlehome

Contributor:

Share the first one: `out->share_lod(*ids[0]);`

@zeroRains (Contributor, Author):
Done. This will be merged together once the illegal memory access bug is resolved.

@yuanlehome (Contributor) left a comment:
LGTM

@ZzSean (Contributor) left a comment:

LGTM for OP-Benchmark

@luotao1 merged commit 5ae0019 into PaddlePaddle:develop on Oct 10, 2023
@yuanlehome (Contributor):
Question: has the claim "with FLAGS_enable_new_ir_in_executor=1 enabled, the unit tests also pass" been verified?

@zeroRains (Contributor, Author) commented Oct 10, 2023

Hmm, the test_ir_embedding_eltwise_layernorm_fuse_pass unit test reports an illegal memory access error, but I can't see where the problem is. Could you please take a look? @yuanlehome

Verification for the other two ops passed.

[screenshot]

@yuanlehome (Contributor) commented Oct 10, 2023


You can set GLOG_v=5 and check the logs to locate where the error occurs.

@zeroRains (Contributor, Author):


I added VLOG output in the kernel to help locate the issue, but when I run the unit test with `GLOG_v=5 ctest -R embedding_eltwise_layernorm_fuse_pass -VV`, the VLOG output I added in the kernel never shows up. Why is that? @yuanlehome

[screenshot]

@yuanlehome (Contributor):


The corresponding implementation is this kernel: paddle/phi/kernels/fusion/gpu/fused_embedding_eltwise_layernorm_kernel.cu

@zeroRains (Contributor, Author):

Ah, I was looking at the wrong file... sorry.

@yuanlehome (Contributor):


[screenshot]

On my side I've traced it to the final copy of `out` at the end of the kernel, which crashes. You can keep investigating whether the inputs are wrong, causing the computation to produce bad results.

@yuanlehome (Contributor):


Add a synchronization: PADDLE_ENFORCE_GPU_SUCCESS(cudaDeviceSynchronize());

[screenshot]

It crashes right at the PADDLE_ENFORCE_GPU_SUCCESS(cudaDeviceSynchronize()) line, which means the computation before it is at fault, i.e. the kernel implementation has a problem.
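For context, the debugging trick used here generalizes: CUDA kernel launches are asynchronous, so an illegal memory access inside a kernel often only surfaces later, e.g. at the final copy of `out`. Forcing a device synchronization right after the launch pins the failure to the kernel itself. A minimal sketch using the plain CUDA runtime API rather than Paddle's PADDLE_ENFORCE_GPU_SUCCESS macro (`DummyKernel` is hypothetical, and this requires a GPU to run):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void DummyKernel(float* out, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) out[i] = 2.0f * i;
}

int main() {
  const int n = 256;
  float* d_out = nullptr;
  cudaMalloc(&d_out, n * sizeof(float));

  DummyKernel<<<(n + 127) / 128, 128>>>(d_out, n);

  // Launch errors (bad configuration) are visible immediately...
  cudaError_t err = cudaGetLastError();
  // ...but faults *inside* the kernel only surface once we synchronize.
  if (err == cudaSuccess) err = cudaDeviceSynchronize();

  if (err != cudaSuccess) {
    // With the sync in place, the failure is attributed to this kernel,
    // not to a later operation such as a device-to-host copy.
    std::printf("kernel failed: %s\n", cudaGetErrorString(err));
  }
  cudaFree(d_out);
  return 0;
}
```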

@zeroRains (Contributor, Author):


It's probably the EmbEltwiseLayernormKernel function that's at fault. It was copied (almost unchanged) from paddle/fluid/operators/math/bert_encoder_functor.cu, so it shouldn't be wrong; it feels like some indexing step is off. Looking at it from the input side: could it be that the asynchronous data copy hasn't finished before the kernel executes, causing the illegal CUDA address access?

[screenshot]

@zeroRains (Contributor, Author):


I tried using cudaStreamSynchronize() to synchronize the copy, but that didn't solve the problem either. (:з」∠)
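One hedged note on the hypothesis above: work issued to a single CUDA stream executes in issue order, so a cudaMemcpyAsync followed by a kernel launch on the same stream cannot race, which is consistent with the observation that adding cudaStreamSynchronize() did not help. A race would only be possible if the copy and the kernel ran on different streams (or if the host buffer were reused too early). A minimal sketch (hypothetical `Double` kernel; requires a GPU):

```cuda
#include <cassert>
#include <cuda_runtime.h>

__global__ void Double(const float* in, float* out, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) out[i] = 2.0f * in[i];
}

int main() {
  const int n = 4;
  float h_in[n] = {1.0f, 2.0f, 3.0f, 4.0f};
  float h_out[n] = {0.0f};
  float *d_in = nullptr, *d_out = nullptr;
  cudaMalloc(&d_in, n * sizeof(float));
  cudaMalloc(&d_out, n * sizeof(float));

  cudaStream_t stream;
  cudaStreamCreate(&stream);

  // All three operations go to the SAME stream, so they execute in issue
  // order: the kernel cannot start before the host-to-device copy finishes.
  cudaMemcpyAsync(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice, stream);
  Double<<<1, n, 0, stream>>>(d_in, d_out, n);
  cudaMemcpyAsync(h_out, d_out, n * sizeof(float), cudaMemcpyDeviceToHost, stream);
  cudaStreamSynchronize(stream);  // wait for the whole in-order pipeline

  assert(h_out[3] == 8.0f);
  cudaFree(d_in);
  cudaFree(d_out);
  cudaStreamDestroy(stream);
  return 0;
}
```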

Frida-a pushed a commit to Frida-a/Paddle that referenced this pull request Oct 14, 2023
…transpose_flatten_concat/fused_fc_elementwise_layernorm to phi (PaddlePaddle#57865)

* transplant fused_embedding_elt_wise_layer_norm_kernel

* fix the error

* fix some bug

* move the transpose to phi but new IR have a bug in output==nullptr. embedding_eltwise_op also have the bug in new IR. because the wrong memory accesss

* remove some useless code

* move fused_fc_elementwise_layernorm to phi, but have a bug in making

* fix the bug in build the fused_fc_elementwise_layernorm_kernel and pass the test with new IR

* try to fix the bug
@yuanlehome (Contributor):

Let me take a look at this; you don't need to worry about it for now.

@zeroRains (Contributor, Author):


OK, thank you.

jiahy0825 pushed a commit to jiahy0825/Paddle that referenced this pull request Oct 16, 2023
…transpose_flatten_concat/fused_fc_elementwise_layernorm to phi (PaddlePaddle#57865)

@yuanlehome (Contributor):
PR #58115 fixes the error that occurred when running the test/ir/test_ir_embedding_eltwise_layernorm_fuse_pass.py unit test with export FLAGS_enable_new_ir_in_executor=1.

@zeroRains deleted the hei branch on October 25, 2023
danleifeng pushed a commit to danleifeng/Paddle that referenced this pull request Nov 14, 2023
…transpose_flatten_concat/fused_fc_elementwise_layernorm to phi (PaddlePaddle#57865)

5 participants