
[XPU] update xdnn adamw_v2 #63108

Merged: 3 commits into PaddlePaddle:develop on May 27, 2024

Conversation

@houj04 (Contributor) commented on Mar 29, 2024

PR Category: Operator Mechanism

PR Types: Performance

Description

Changes:

  • Support calling the new adamw_v2 function in XDNN, which takes beta1_pow and beta2_pow as scalars instead of as pointers into XPU device memory.
  • Because of the change above, and based on a test I ran on a certain model, the following PR can be reverted with no accuracy problems observed: adam batax_pow not in cpu #48626. (A related PR: use full directly if device is CPU and in dygraph, for optimizer #48189)
  • Support the environment variable XPU_PADDLE_ADAMW_ROUND_BF16_OUTPUT to enable the round_bf16_output feature: when the data type is bfloat16, adamw_v2 writes its outputs using round-to-nearest-even instead of the traditional approach of simply truncating the low 16 bits (see the sketch below this description).

This PR has no noticeable effect on speed: although it removes two scale operations from the adamw operator, the measured runtime is on par with before. Considering that it also removes the "XPU-only special handling" from the optimizer, it is still worthwhile; in principle, the model-building code should not contain anything that is special-cased for XPU alone.
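As a rough illustration of the round_bf16_output idea, here is a minimal standalone sketch, not the XDNN implementation: the helper names and the way the environment variable is parsed are assumptions, and NaN handling is omitted. Truncation simply drops the low 16 bits of the float's bit pattern, while round-to-nearest-even adds a rounding bias derived from the lowest kept bit before shifting.

```cpp
// Hypothetical illustration only; not Paddle/XDNN code.
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Truncation: drop the low 16 bits of the float's bit pattern.
uint16_t FloatToBf16Truncate(float x) {
  uint32_t bits;
  std::memcpy(&bits, &x, sizeof(bits));
  return static_cast<uint16_t>(bits >> 16);
}

// Round to nearest even: add a bias of 0x7FFF plus the lowest kept bit,
// so exact ties round toward the even bfloat16 value.
uint16_t FloatToBf16RoundNearestEven(float x) {
  uint32_t bits;
  std::memcpy(&bits, &x, sizeof(bits));
  uint32_t lsb = (bits >> 16) & 1u;
  uint32_t rounding_bias = 0x7FFFu + lsb;
  return static_cast<uint16_t>((bits + rounding_bias) >> 16);
}

// Assumed way of reading the switch; the real parsing in the kernel may differ.
bool RoundBf16OutputEnabled() {
  const char* flag = std::getenv("XPU_PADDLE_ADAMW_ROUND_BF16_OUTPUT");
  return flag != nullptr && std::strcmp(flag, "1") == 0;
}
```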

paddle-bot bot commented Mar 29, 2024

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

paddle-bot bot commented Mar 29, 2024

❌ The PR was not created using the PR template. You can refer to this Demo.
Please use the PR template; it saves our maintainers' time so that more developers can get help.

@lj970926 (Contributor) left a comment

LGTM

@@ -386,22 +379,23 @@ void AdamwDenseKernelKL3(const Context& dev_ctx,
         reinterpret_cast<XPUType*>(dev_ctx.template Alloc<T>(param_out)),
         master_in_data,
         master_out_data,
-        param.numel());
+        param.numel(),
+        round_bf16_output);
     PADDLE_ENFORCE_XDNN_SUCCESS(r, "adamw_v2");
   }
   if (!use_global_beta_pow) {
Contributor commented on this line:

Is use_global_beta_pow true now? As far as I can see, llama7b also needs the scale.

@houj04 (Contributor, Author) replied on Apr 1, 2024:

No, use_global_beta_pow is false.
If PR #48626 is reverted, beta1_pow and beta2_pow will live on the CPU and, as on GPU, these two scalars will be updated on the CPU. We need to run an actual model to check whether the revert works.

@houj04 (Contributor, Author) replied:

Tested on a model: PR #48626 needs to be reverted so that beta1_pow and beta2_pow are placed on the CPU. The new adamw_v2 can then read these scalars directly for the computation.
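To make the bookkeeping concrete, here is a minimal self-contained sketch, not the actual kernel code: with #48626 reverted, the beta1_pow and beta2_pow scalars stay on the CPU and are advanced there each step, matching the GPU path, and would be passed to adamw_v2 by value instead of being maintained by extra scale kernels on the device. The loop structure and values below are purely illustrative.

```cpp
// Hypothetical illustration of host-side beta_pow bookkeeping.
#include <cstdio>

int main() {
  const float beta1 = 0.9f, beta2 = 0.999f;
  float beta1_pow = 1.0f, beta2_pow = 1.0f;  // host-side scalars, not XPU pointers
  for (int step = 1; step <= 3; ++step) {
    beta1_pow *= beta1;  // updated on the CPU, as in the GPU implementation
    beta2_pow *= beta2;
    // At this point the scalars would be passed to adamw_v2 by value;
    // no device-side scale operations are needed to keep them current.
    std::printf("step %d: beta1_pow=%f beta2_pow=%f\n", step, beta1_pow, beta2_pow);
  }
  return 0;
}
```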

paddle-ci-bot bot commented Apr 6, 2024

Sorry to inform you that the CIs for 13de79f passed more than 7 days ago. To prevent PR conflicts, you need to re-run all CIs manually.

@QingshuChen QingshuChen merged commit f3c7518 into PaddlePaddle:develop May 27, 2024
32 checks passed
@houj04 houj04 added the XPU label Sep 4, 2024