From what I can see here, using a batch size of 16 should be numerically equivalent to using a batch size of 4 with `iter_size` 4? And other solver settings, such as the learning rate, should not affect this result?

What is the theoretical slowdown when we use batch accumulation?

How does it work internally? Does it store the result of each forward pass in GPU memory (`iter_size` times in total), then average/merge those results into one effective batch, and then do the backward pass and weight update using that one batch?
As I understand it, batch accumulation is just an alias for the `iter_size` parameter in the Caffe solver. See DIGITS/digits/model/tasks/caffe_train.py, line 741 at commit 6b8995d.
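To the internals question: as far as I understand Caffe, it does not keep `iter_size` batches of forward results in memory. Each sub-iteration runs its own forward and backward pass on one small batch, the resulting parameter gradients are summed into the existing diff buffers, and after `iter_size` sub-iterations the solver divides the accumulated diffs by `iter_size` and applies a single weight update. Here is a runnable Python sketch of that loop under those assumptions (the `Param` class, `forward_backward`, and `solver_step` are illustrative stand-ins, not the actual Caffe API):

```python
import numpy as np

class Param:
    """Stand-in for a Caffe blob: weights (data) plus accumulated gradient (diff)."""
    def __init__(self, shape, rng):
        self.data = rng.standard_normal(shape)
        self.diff = np.zeros(shape)

rng = np.random.default_rng(0)
w = Param(8, rng)

def forward_backward(X, y):
    """Toy linear least-squares layer: returns the loss and ADDS its gradient
    into w.diff, the way Caffe accumulates into the diff buffer."""
    err = X @ w.data - y
    w.diff += X.T @ err / len(y)
    return 0.5 * np.mean(err ** 2)

def solver_step(batches, base_lr=0.1):
    """One solver step with iter_size = len(batches) (plain SGD; momentum,
    weight decay, etc. omitted)."""
    w.diff[:] = 0.0                     # diffs accumulate, so zero once per step
    loss = sum(forward_backward(X, y) for X, y in batches)
    w.diff /= len(batches)              # normalize by iter_size
    w.data -= base_lr * w.diff          # single weight update
    return loss / len(batches)

# batch_size=4, iter_size=4: four small forward/backward passes, one update.
X = rng.standard_normal((16, 8)); y = rng.standard_normal(16)
batches = [(X[i:i + 4], y[i:i + 4]) for i in range(0, 16, 4)]
print(solver_step(batches))
```

So GPU memory stays at the small-batch footprint (plus the gradient buffers, which exist anyway), and the theoretical FLOP count is about the same as one big batch; in practice the slowdown comes from the smaller batches underutilizing the GPU.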
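On the equivalence question: for losses that are averaged over the batch, summing the gradients of four batches of 4 and dividing by `iter_size` gives the same gradient as one batch of 16, up to floating-point summation order (layers whose behavior depends on the batch contents, e.g. BatchNorm statistics, are the exception). A quick NumPy check with a linear least-squares layer, assuming the divide-by-`iter_size` normalization sketched above:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((16, 8))   # one "big" batch of 16 samples
y = rng.standard_normal(16)
w = rng.standard_normal(8)

def grad(Xb, yb, w):
    # Gradient of 0.5 * mean((Xb @ w - yb)**2) with respect to w.
    return Xb.T @ (Xb @ w - yb) / len(yb)

# Single pass with batch_size = 16.
g_big = grad(X, y, w)

# Accumulated passes with batch_size = 4 and iter_size = 4.
g_acc = np.zeros_like(w)
for i in range(0, 16, 4):
    g_acc += grad(X[i:i + 4], y[i:i + 4], w)
g_acc /= 4                          # solver divides the accumulated diff by iter_size

print(np.allclose(g_big, g_acc))    # True (up to float round-off)
```

Given that normalization, the same learning rate produces an equivalent update in both configurations, so solver settings like the learning rate should not need rescaling.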