
Mini-batch Size vs. Memory Limit #1929

Closed
jimmie33 opened this issue Feb 21, 2015 · 3 comments

@jimmie33

Currently the mini-batch size N is subject to the memory limit. For example, when training a large model I cannot use a large mini-batch size, because my GPU cannot fit N training samples at once.

Is it possible for Caffe to support a mini-batch size that is a multiple of the input data batch size? My understanding is that it only needs to accumulate the gradients over several batches of input data before doing a model update step. Right? Roughly something like the sketch below.
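A framework-agnostic Python sketch of the accumulation idea (not actual Caffe code; the `model` API here is hypothetical):

```python
# Hypothetical sketch of gradient accumulation: several small forward/backward
# passes are summed before a single weight update, so the effective mini-batch
# size is len(batches) * data_batch_size even though only one data batch has
# to fit in GPU memory at a time.
def accumulated_update(model, batches, lr):
    total_grads = None
    for batch in batches:
        grads = model.forward_backward(batch)          # gradients for one small batch
        if total_grads is None:
            total_grads = grads
        else:
            total_grads = [t + g for t, g in zip(total_grads, grads)]
    for param, grad in zip(model.params, total_grads):
        param -= lr * grad / len(batches)              # average, then one update step
```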

Will Caffe support this functionality, or does it already (I am new to Caffe, so I may have missed something)? Or is there some difficulty I overlooked in implementing it?

@shelhamer
Member

Already done in #1663! Now that the latest release is out it'll be merged once we double-check the details.

@jimmie33
Author

@shelhamer Thanks for the information. This is great!

Is there now a way to control the data batch size and the mini-batch size separately, based on the new gradient accumulation implementation? I think this needs an extra parameter in the proto files, and it also requires some changes in the solver, right? Have these been done already, or will they be done soon?

@rohrbach rohrbach added the ES label Feb 22, 2015
@shelhamer
Member

It's already there in #1663 -- now #1977. It's the iter_size setting in the solver config.
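A minimal solver.prototxt sketch (the net path and hyperparameter values are placeholders, not taken from this thread):

```
net: "models/train_val.prototxt"   # placeholder path to the net definition
base_lr: 0.01
momentum: 0.9
max_iter: 100000
solver_mode: GPU
# Accumulate gradients over 4 forward/backward passes before each weight
# update; with batch_size: 64 in the data layer this gives an effective
# mini-batch of 256 while only 64 samples are resident on the GPU at once.
iter_size: 4
```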
