Enable AdamW Optimizer #6153
@bilzard great, thank you! So the main difference between AdamW and Adam is that AdamW scales with batch size automatically? We automatically scale the loss by the batch size for SGD/Adam in YOLOv5 (loss.py, lines 165 to 167 at commit d95978a). Does this mean AdamW will be double-scaled, or does it compensate automatically so it's not a problem?
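Since the inline code embed didn't survive here, this is a paraphrased sketch of the scaling those lines perform (not the verbatim source; `scale_loss` is my name for illustration):

```python
import torch

def scale_loss(lbox, lobj, lcls, bs):
    # Paraphrased sketch of loss.py lines 165-167 (d95978a): the summed loss
    # is multiplied by the batch size bs, so the total loss grows with
    # --batch-size instead of being a per-image mean.
    return (lbox + lobj + lcls) * bs, torch.cat((lbox, lobj, lcls)).detach()
```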
Hi, thank you for the fast response.
That's what I initially thought, but since I couldn't find any articles to support it, I concluded it was my misunderstanding. My understanding is that AdamW is an improved version of Adam with respect to the weight-decay update algorithm [1]. However, I don't think it is necessary to change my PR; I think some people may still want to adopt AdamW for the reasons above. I didn't know about the loss scaling in SGD, thank you for letting me know. I found an article [3] in a web forum [2] that suggests scaling the learning rate proportionally to the batch size in the case of SGD. However, I could not find any evidence that the learning rate should be scaled by such a simple rule for adaptive optimizers such as Adam/AdamW. I don't think many people who use YOLOv5 will be aware of such scaling at first.
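To make the weight-decay difference concrete, here is a schematic single-step sketch (my own illustrative code, with bias correction omitted; not taken from [1] verbatim):

```python
import torch

def adam_step(w, grad, m, v, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8,
              wd=1e-2, decoupled=False):
    # decoupled=False: Adam with L2 regularization -- the decay term is folded
    # into the gradient, so it gets rescaled by the adaptive step size below.
    if not decoupled:
        grad = grad + wd * w
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    w = w - lr * m / (v.sqrt() + eps)
    # decoupled=True: AdamW -- the decay is applied directly to the weights
    # at the raw learning rate, independent of the adaptive update.
    if decoupled:
        w = w - lr * wd * w
    return w, m, v
```

In PyTorch this corresponds to choosing `torch.optim.Adam` versus `torch.optim.AdamW` with the same `weight_decay` argument.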
P.S. Looking at some of the recent issue responses, it seems that sometimes a PR is merged in the middle of a discussion. It's good to see a quick response, but isn't it a bit dangerous?
@bilzard yes, the idea is that users can change --batch-size without worrying about anything else. We've done a --batch-size study in #2452 to confirm that results are essentially independent of batch size with SGD, due to the scaling we have in place in loss.py above. With Adam and AdamW I'm not sure. Note I also applied the PR #6152 changes to the YOLOv5 classifier branch to maintain consistency. Thanks for the feedback! Usually we get criticism in the other direction, that PRs stay open too long.
@glenn-jocher Thank you for sharing the experiment results. I created a PR for this change on my forked repository. Would you mind merging it into the original repository? Alternatively, a user could control this behavior via a command-line option such as --optimizer. Note: the PR has been updated with the latter choice.
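For illustration, usage would look something like this (hypothetical invocation in the notebook style used elsewhere in this thread; the --optimizer flag and AdamW value match the study commands further down):

```python
!python train.py --weights yolov5s.pt --data VOC.yaml --optimizer AdamW
```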
@bilzard wait, there are two topics here: (1) the learning rate and (2) the loss scaling.
The LR not adjusting automatically may be an issue, as someone will need to pair --optimizer Adam with a hyp.yaml file that has a much lower learning rate to get similar results, i.e. if lr0=0.1 for SGD then they may want to start with lr0=0.01 for Adam. Regarding modifying the loss scaling, we'd need to repeat a few points of the batch-size study using Adam and AdamW to see their real-world results.
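A minimal sketch of that manual pairing, assuming the hyp file names used in the study below (illustrative only, not part of the PR):

```python
import yaml

# Derive an Adam/AdamW hyp file from the SGD one by lowering lr0 roughly 10x,
# per the rule of thumb above (e.g. lr0=0.1 -> lr0=0.01).
with open('hyp.finetune.yaml') as f:
    hyp = yaml.safe_load(f)
hyp['lr0'] *= 0.1
with open('hyp.finetuneAdam.yaml', 'w') as f:
    yaml.safe_dump(hyp, f)
```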
@glenn-jocher O.K. I agree with the idea of repeating the same study for Adam and AdamW. I will wait for that. |
@bilzard SGD/Adam batch-size study results are here: https://wandb.ai/glenn-jocher/study-Adam. I divided the SGD LR by 10 manually for the Adam/AdamW runs.

```python
# VOC
for b, m in zip([16, 64, 16, 64, 16, 64], ['SGD', 'SGD', 'Adam', 'Adam', 'AdamW', 'AdamW']):  # zip(batch_size, optimizer)
    hyp = 'hyp.finetune.yaml' if m.startswith('SGD') else 'hyp.finetuneAdam.yaml'
    !python train.py --batch {b} --weights yolov5s.pt --data VOC.yaml --epochs 50 --cache --img 512 --nosave --hyp {hyp} --project study-Adam --name {m}-{b} --optimizer {m}
```

Adam seems to handle batch-size changes without issue, so it seems no changes are required.
Search before asking
Description
When we use Adam, we have to tune the learning rate along with the batch size.
This is cumbersome; with AdamW, we don't have to re-tune the learning rate even if we change the batch size.
So it would be nice to be able to use this option.
I have created a PR to enable the AdamW optimizer. Please check it out:
#6152
Use case
No response
Additional
No response
Are you willing to submit a PR?