[bug fix] Fix sharding stage1 allgather overlap bug, which requires disabling pinned memory #8594

Merged

@iosmers iosmers (Contributor) commented Jun 13, 2024

PR types

Bug fixes

PR changes

Others

Description

1. Sharding stage overlap does not take effect correctly because pin memory was not disabled.
2. use_pinned_memory must be the one from paddle.io.reader; calling use_pinned_memory(False) from paddle.io.base has no effect, for reasons unknown.

paddle-bot bot commented Jun 13, 2024

Thanks for your contribution!

codecov bot commented Jun 13, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 54.40%. Comparing base (5bdf751) to head (09449fc).
Report is 242 commits behind head on develop.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #8594      +/-   ##
===========================================
- Coverage    54.42%   54.40%   -0.02%     
===========================================
  Files          632      632              
  Lines        99451    99495      +44     
===========================================
+ Hits         54129    54134       +5     
- Misses       45322    45361      +39     

Collaborator

@ZHUI ZHUI left a comment

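Hmm, how about adding a check so that pin memory is disabled only when overlap is enabled? For ordinary models, disabling it should cause a slowdown.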

@iosmers iosmers force-pushed the fix_stage1_overlap_forbiden_pin_mem branch from d1613ac to 45f6314 June 13, 2024 08:47
if (
    "enable_stage1_allgather_overlap" in training_args.sharding_parallel_config
    or "enable_stage1_broadcast_overlap" in training_args.sharding_parallel_config
):

from paddle.io.reader import use_pinned_memory

Move the import here as well.
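
The gating discussed in this review thread can be sketched roughly as follows. This is only an illustration, not the merged code: the helper names `overlap_enabled` and `maybe_disable_pinned_memory` are hypothetical, and the setter is passed in as a parameter so the sketch runs without Paddle installed. In the actual change the setter would presumably be `use_pinned_memory` imported from `paddle.io.reader`, called with `False` as in the PR description.

```python
# Flags taken from the condition under review; everything else is illustrative.
OVERLAP_FLAGS = (
    "enable_stage1_allgather_overlap",
    "enable_stage1_broadcast_overlap",
)


def overlap_enabled(sharding_parallel_config):
    """Return True if either stage1 overlap flag appears in the config.

    Uses the same `in` test as the reviewed condition, so it works whether
    the config is a string of option names or a collection of them.
    """
    return any(flag in sharding_parallel_config for flag in OVERLAP_FLAGS)


def maybe_disable_pinned_memory(sharding_parallel_config, set_pinned_memory):
    """Disable pinned memory only when a stage1 overlap option is on.

    `set_pinned_memory` is a stand-in (assumption) for the real setter,
    e.g. `use_pinned_memory` from `paddle.io.reader`. It is imported lazily
    by the caller so ordinary models never touch it and keep pinned memory.
    Returns True if pinned memory was disabled, False otherwise.
    """
    if overlap_enabled(sharding_parallel_config):
        set_pinned_memory(False)
        return True
    return False
```

Keeping the check separate from the setter follows the reviewer's two requests: ordinary models skip the call entirely and keep pinned memory, and the import can live inside the branch that needs it.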

Collaborator

@ZHUI ZHUI left a comment

LGTM

@ZHUI ZHUI merged commit 439f8f3 into PaddlePaddle:develop Jun 13, 2024
7 of 12 checks passed
@iosmers iosmers deleted the fix_stage1_overlap_forbiden_pin_mem branch June 13, 2024 12:51