adjust local learning rate and decay according to gradient accumulation
Divide the local rate by `iter_size` to normalize the gradient by the full
minibatch size rather than only the computational batch size.

Multiply the local decay by `iter_size` to counter the division of the
local learning rate, since the decay is multiplied by the rate in the
update equation.
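
As a rough illustration of the rate scaling, a standalone sketch (not Caffe's solver code; the per-batch gradient values are made up):

#include <iostream>

// Standalone sketch of why the local rate is divided by iter_size when
// gradients are accumulated over several computational batches.
int main() {
  const int iter_size = 4;                     // computational batches per update
  const float per_batch_grad[] = {0.5f, 0.3f, 0.4f, 0.6f};  // illustrative gradients

  float accumulated_diff = 0.0f;
  for (int i = 0; i < iter_size; ++i) {
    accumulated_diff += per_batch_grad[i];     // gradients sum across the batches
  }

  const float rate = 0.01f;
  // Dividing the rate by iter_size turns the summed gradient into an average,
  // so the step matches one pass over the full minibatch.
  const float step = (rate / iter_size) * accumulated_diff;
  std::cout << "effective step: " << step << "\n";  // same as rate * mean gradient
  return 0;
}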
shelhamer committed May 27, 2015
1 parent 67b1ff3 commit 55585f5
Showing 1 changed file with 3 additions and 2 deletions.
5 changes: 3 additions & 2 deletions src/caffe/solver.cpp
@@ -488,7 +488,7 @@ void SGDSolver<Dtype>::ApplyUpdate() {
   ClipGradients();
   for (int param_id = 0; param_id < this->net_->params().size(); ++param_id) {
     Regularize(param_id);
-    ComputeUpdateValue(param_id, rate);
+    ComputeUpdateValue(param_id, rate / this->param_.iter_size());
   }
   this->net_->Update();
 }
@@ -500,7 +500,8 @@ void SGDSolver<Dtype>::Regularize(int param_id) {
       this->net_->params_weight_decay();
   Dtype weight_decay = this->param_.weight_decay();
   string regularization_type = this->param_.regularization_type();
-  Dtype local_decay = weight_decay * net_params_weight_decay[param_id];
+  Dtype local_decay = weight_decay * net_params_weight_decay[param_id]
+      * this->param_.iter_size();
   switch (Caffe::mode()) {
   case Caffe::CPU: {
     if (local_decay) {
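
Taken together, the two hunks leave the effective weight decay unchanged: Regularize() adds the decay term to the accumulated gradient before ApplyUpdate() applies the divided rate, so the two iter_size factors cancel. A rough sketch of the arithmetic, assuming plain L2 decay and ignoring momentum (decay_step is an illustrative helper, not a Caffe function):

// The patched code uses local_rate = rate / iter_size and
// local_decay = weight_decay * iter_size, so the decay contribution to the
// step is local_rate * local_decay * w == rate * weight_decay * w,
// the same effective decay as without gradient accumulation.
float decay_step(float rate, float weight_decay, float w, int iter_size) {
  const float local_rate  = rate / iter_size;          // as in ApplyUpdate()
  const float local_decay = weight_decay * iter_size;  // as in Regularize()
  return local_rate * local_decay * w;                 // == rate * weight_decay * w
}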
