[bug fix] fix sharding stage1 allgather overlap bug, which needs to forbid pin memory #8594
Thanks for your contribution!
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

@@            Coverage Diff             @@
##           develop    #8594     +/-   ##
===========================================
- Coverage    54.42%   54.40%   -0.02%
===========================================
  Files          632      632
  Lines        99451    99495     +44
===========================================
+ Hits         54129    54134      +5
- Misses       45322    45361     +39
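Hmm, could you add a check so that pinned memory is turned off only when overlap is enabled? Turning it off for ordinary models would probably slow them down.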
d1613ac to 45f6314
if (
    "enable_stage1_allgather_overlap" in training_args.sharding_parallel_config
    or "enable_stage1_broadcast_overlap" in training_args.sharding_parallel_config
):
from paddle.io.reader import use_pinned_memory
Move the import here as well.
LGTM
PR types
Bug fixes
PR changes
Others
Description
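1. Sharding stage overlap does not take effect correctly because pin memory is not disabled.
2. The `use_pinned_memory` from `paddle.io.reader` must be used; calling `use_pinned_memory(False)` from `paddle.io.base` has no effect, for reasons that are still unknown.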
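The two points above, together with the review feedback (only disable pinned memory when an overlap option is on, and keep the import inside that branch), can be sketched roughly as follows. This is an illustration, not the merged code: the helper names `overlap_enabled` and `maybe_disable_pinned_memory` and the `disable` parameter are hypothetical and exist only to make the sketch testable without Paddle installed. The import path and the `use_pinned_memory(False)` call are taken from the description above.

```python
# Flags quoted in the reviewed condition; both names come from this PR.
OVERLAP_FLAGS = (
    "enable_stage1_allgather_overlap",
    "enable_stage1_broadcast_overlap",
)


def overlap_enabled(sharding_parallel_config):
    """Return True if either stage1 overlap flag appears in the config.

    Works for a config string (substring match) or a collection of flags.
    """
    return any(flag in sharding_parallel_config for flag in OVERLAP_FLAGS)


def maybe_disable_pinned_memory(sharding_parallel_config, disable=None):
    """Disable pinned memory only when a stage1 overlap option is enabled.

    `disable` is a hypothetical hook for testing; when it is omitted, the
    Paddle function named in the PR description is imported lazily, so
    runs without overlap never touch it and keep pinned memory on.
    """
    if not overlap_enabled(sharding_parallel_config):
        return False
    if disable is None:
        # Per the description: the paddle.io.reader version is the one that
        # takes effect; the paddle.io.base version reportedly does not.
        from paddle.io.reader import use_pinned_memory

        def disable():
            use_pinned_memory(False)

    disable()
    return True
```

In a training script this would be called once with `training_args.sharding_parallel_config` before the data loaders are built, so that models not using overlap are left on the default pinned-memory path.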