dp main_grad #7293
Conversation
Thanks for your contribution!
Codecov Report

Attention: patch coverage decreased.

Additional details and impacted files:

@@            Coverage Diff             @@
##           develop    #7293      +/-   ##
===========================================
- Coverage    59.34%   58.00%   -1.35%
===========================================
  Files          567      579      +12
  Lines        83355    86266    +2911
===========================================
+ Hits         49466    50037     +571
- Misses       33889    36229    +2340

☔ View full report in Codecov by Sentry.
# Multi-gpu training
if self.args.world_size > 1 and not self.args.use_hybrid_parallel:
    model = paddle.DataParallel(model)
    # Distributed training (should be after fp16 initialization)

if self.args.amp_master_grad:
    mix_precision_utils.MixPrecisionLayer(model, dtype=self.amp_dtype)
Here the model has already been wrapped once by DP. Please confirm whether wrapping it again with MixPrecisionLayer causes any problem.
Wrapping it once more like this is fine here.
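The exchange above is about whether a model already wrapped by DataParallel can safely be wrapped a second time. A minimal sketch of the nested-wrapper pattern, using hypothetical stand-in classes (NOT the real `paddle.DataParallel` or `mix_precision_utils.MixPrecisionLayer` APIs), shows why double wrapping works when each wrapper simply delegates to the layer it wraps:

```python
# Hypothetical stand-ins illustrating nested model wrappers; these are NOT
# the real paddle.DataParallel / MixPrecisionLayer implementations.

class BaseModel:
    def forward(self, x):
        return x * 2

class DataParallelWrapper:
    """Stand-in for a data-parallel wrapper: delegates forward()."""
    def __init__(self, model):
        self._layers = model

    def forward(self, x):
        return self._layers.forward(x)

class MasterGradWrapper:
    """Stand-in for a mixed-precision wrapper. Because it only delegates
    forward(), it can wrap an already-wrapped model without issue."""
    def __init__(self, model):
        self._inner = model

    def forward(self, x):
        return self._inner.forward(x)

model = DataParallelWrapper(BaseModel())  # first wrap (data parallel)
model = MasterGradWrapper(model)          # second wrap (master grad)
print(model.forward(3))                   # delegation reaches the base model: 6
```

Each wrapper holds the previous one and forwards calls through, so the order of wrapping only determines which layer sees the call first.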
paddlenlp/trainer/training_args.py (outdated)

@@ -810,9 +810,10 @@ def __post_init__(self):
        self.pipeline_parallel_degree <= 1
        and self.tensor_parallel_degree <= 1
        and (not self.sharding or ShardingOption.FULL_SHARD in self.sharding)
        and self.use_hybrid_parallel
    ):
        raise ValueError(
Is it only stage3 that is unsupported? If so, this check could be simplified.
LGTM
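The condition discussed in the diff above can be mirrored as a small pure-Python predicate, which makes the reviewer's simplification suggestion easier to reason about. This is a sketch only: the function name and the `"stage3"` string stand in for PaddleNLP's actual `ShardingOption.FULL_SHARD` handling.

```python
# Pure-Python sketch mirroring the validation condition from the diff.
# The helper name and stage strings are illustrative, not PaddleNLP's API.

def rejects_amp_master_grad(pipeline_parallel_degree, tensor_parallel_degree,
                            sharding, use_hybrid_parallel):
    FULL_SHARD = "stage3"  # stand-in for ShardingOption.FULL_SHARD
    # Rejected when there is no pipeline or tensor parallelism, sharding is
    # absent or includes FULL_SHARD (stage3), and hybrid parallel is enabled.
    return (
        pipeline_parallel_degree <= 1
        and tensor_parallel_degree <= 1
        and (not sharding or FULL_SHARD in sharding)
        and use_hybrid_parallel
    )

print(rejects_amp_master_grad(1, 1, ["stage3"], True))   # True
print(rejects_amp_master_grad(1, 1, ["stage2"], True))   # False
print(rejects_amp_master_grad(2, 1, ["stage3"], True))   # False
```

Written this way, the reviewer's point is visible: if only stage3 is problematic, the `not sharding` branch and the degree checks might be collapsible into a single membership test.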
PR types: Others

PR changes: Others

Description: dp main_grad