[feature request] manual mini-batching and batch dimension scaling #1437
In models with mixed levels of nesting (e.g. global_plate > local_plate_1 > local_plate_2 > ...), mini-batching across different batch dimensions requires introducing a proper scale factor for each batch dimension. Pyro handles these scale factors automatically if mini-batching is achieved via `pyro.iarange(..., size=..., subsample_size=...)` or `pyro.iarange(..., size=..., subsample=...)`. The latter construct is flexible and allows arbitrary mini-batching schemes, including big-data situations where the full data tensor cannot be loaded all at once.
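For concreteness, a minimal sketch of the automatically scaled case (the model, sizes, and tensor layout are all hypothetical; `iarange` is the API used in this issue, later renamed `pyro.plate`):

```python
import torch
import pyro
import pyro.distributions as dist

N_GROUPS, N_ITEMS = 100, 1000  # hypothetical full-data sizes

def model(data):  # data: an (N_ITEMS, N_GROUPS) tensor
    # Outer batch dimension (dim=-1): subsample 10 of 100 groups.
    with pyro.iarange("global_plate", N_GROUPS, subsample_size=10) as g:
        mu = pyro.sample("mu", dist.Normal(torch.zeros(10), 1.0))
        # Nested batch dimension (dim=-2): subsample 50 of 1000 items.
        with pyro.iarange("local_plate", N_ITEMS, subsample_size=50, dim=-2) as i:
            # Pyro rescales log-probabilities per iarange: "mu" by 100/10,
            # "obs" by (100/10) * (1000/50), keeping the ELBO estimate unbiased.
            pyro.sample("obs", dist.Normal(mu, 1.0), obs=data[i][:, g])
```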
Mini-batching, however, is often done manually and externally, not via `pyro.iarange`. In such cases, the appropriate scale factors must also be applied manually via `poutine.scale`. We are being consistent here: manual mini-batching? Then manual scaling.
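A sketch of the manual counterpart, again with hypothetical sizes, using `poutine.scale` as a context manager:

```python
import torch
import pyro
import pyro.distributions as dist
from pyro import poutine

N = 10000     # full dataset size (hypothetical)
BATCH = 100   # externally chosen mini-batch size

def model(batch):
    loc = pyro.sample("loc", dist.Normal(0.0, 1.0))
    # The mini-batch was drawn outside of Pyro, so the likelihood must be
    # rescaled by N / BATCH by hand; global sites like "loc" stay unscaled.
    with poutine.scale(scale=N / BATCH):
        with pyro.iarange("data", len(batch)):
            pyro.sample("obs", dist.Normal(loc, 1.0), obs=batch)
```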
However, most of the examples (DMM, VAE, ...) place little to no emphasis on this issue and neglect scaling altogether. While convergence is not a big deal when working with adaptive optimizers, neglecting the scale factors leads to wrong ELBO estimates. Shall we update the examples to use `poutine.scale` when mini-batching manually, to set a good precedent for the new users?
Comments

Good points. Most of our examples don't do nested subsampling, but maybe we could add a manually-batched version of our LDA example or something similar? If you have another example where this is relevant, we'd definitely welcome a PR.

@mbabadi agreed, we could improve the docs about subsampling. I'm inclined to recommend users use …
I think one use case is when we are running inference on the GPU with a large dataset (i.e. calling data.cuda() all at once would take a lot of GPU memory), for which the torch data loaders work great, since they spin up a pool of workers that keep pulling batches of data and transferring them to the GPU incrementally. We are using data loaders in our examples, but many of our datasets are probably small enough that they can be transferred directly in one shot.
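Roughly, the streaming pattern described here (the dataset, sizes, and the commented-out SVI step are placeholders):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical dataset too large to move to the GPU in one shot.
dataset = TensorDataset(torch.randn(1000000, 10))
loader = DataLoader(dataset, batch_size=512, shuffle=True,
                    num_workers=4, pin_memory=True)

for (batch,) in loader:
    # Workers prefetch batches on the CPU; only the current
    # mini-batch is transferred to the GPU.
    batch = batch.cuda(non_blocking=True)
    # svi.step(batch)  # e.g. one pyro.infer.SVI step per mini-batch
```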
@neerajprad The use case you suggest cannot be accomplished via …

Ahh, my bad. In that case, we should probably just change our examples to use …

@fritzo @neerajprad I also cannot imagine what cannot be accomplished by …