
[feature request] manual mini-batching and batch dimension scaling #1437

Open

mbabadi opened this issue Oct 8, 2018 · 6 comments
Labels: documentation, help wanted

@mbabadi

mbabadi commented Oct 8, 2018

In models with mixed levels of nesting (e.g. global_plate > local_plate_1 > local_plate_2 > ...), mini-batching across different batch dimensions requires introducing a proper scale factor for each batch dimension. Pyro handles these scale factors automatically if mini-batching is done via pyro.iarange(..., size=..., subsample_size=...) or pyro.iarange(..., size=..., subsample=...). The latter construct is flexible and allows arbitrary mini-batching schemes, including big-data situations where the full data tensor cannot be loaded all at once.
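To make the automatic case concrete, here is a minimal sketch (the model, names, and sizes are all made up for illustration, using the iarange API of this era): each iarange with subsample_size rescales the log-probabilities inside it by size / subsample_size, so the scale factors for nested batch dimensions compose automatically:

```python
import torch
import pyro
import pyro.distributions as dist

def model(data):  # data: hypothetical tensor of shape [100, 1000]
    mu = pyro.sample("mu", dist.Normal(0., 1.))  # global latent
    # each iarange rescales the enclosed log-probs by size / subsample_size,
    # so obs is effectively scaled by (100 / 10) * (1000 / 50) = 200
    with pyro.iarange("outer", size=100, subsample_size=10, dim=-2) as i:
        with pyro.iarange("inner", size=1000, subsample_size=50, dim=-1) as j:
            pyro.sample("obs", dist.Normal(mu, 1.), obs=data[i][:, j])
```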

Mini-batching, however, is often done manually and externally, not via pyro.iarange. In such cases, the appropriate scale factors must also be applied manually via poutine.scale. This is consistent: manual mini-batching calls for manual scaling. However, most of the examples (DMM, VAE, ...) place little to no emphasis on this issue and neglect scaling altogether. While convergence is not a big deal when working with adaptive optimizers, neglecting the scale factors leads to wrong ELBO estimates.
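A minimal sketch of the manual pattern (the model and sizes are hypothetical, and this assumes poutine.scale can be used as a context manager, as in recent Pyro versions): the minibatch is drawn outside the model, so the observation log-prob must be rescaled by N / batch_size to keep the ELBO an unbiased estimate of the full-data objective:

```python
import torch
import pyro
import pyro.distributions as dist
import pyro.poutine as poutine

N, batch_size = 10000, 100  # full dataset size and minibatch size (assumed)

def model(minibatch):  # minibatch: tensor of shape [batch_size], drawn externally
    mu = pyro.sample("mu", dist.Normal(0., 1.))
    # without this scale factor, the ELBO treats the minibatch as the whole dataset
    with poutine.scale(scale=N / batch_size):
        with pyro.iarange("data", len(minibatch)):
            pyro.sample("obs", dist.Normal(mu, 1.), obs=minibatch)
```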

  • Adding a word of caution to the examples about scale factors, and/or throwing in poutine.scale when mini-batching manually, to set a good precedent for new users?
@eb8680
Member

eb8680 commented Oct 12, 2018

Good points. Most of our examples don't do nested subsampling, but maybe we could add a manually-batched version of our LDA example or something similar? If you have another example where this is relevant, we'd definitely welcome a PR.

@fritzo
Member

fritzo commented Oct 16, 2018

@mbabadi agreed, we could improve the docs about subsampling. I'm inclined to recommend that users use pyro.iarange(..., subsample=...) whenever minibatching is done, as that clarifies the intention of the code. Do you know of any cases where minibatching cannot be done through pyro.iarange(..., subsample=...)?

@neerajprad
Member

neerajprad commented Oct 16, 2018

Do you know of any cases where minibatching cannot be done through pyro.iarange(..., subsample=...)?

I think one use case is when we are running inference on the GPU with a large dataset (i.e. calling data.cuda() on the whole dataset at once would take a lot of GPU memory). The torch data loaders work great for this, since they spin up a pool of workers that keep pulling minibatches off the dataset and transferring them to the GPU incrementally. We are using data loaders in our examples, but many of our datasets are probably small enough that they could be transferred directly in one shot.
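A minimal sketch of that pattern (the dataset and sizes are made up): the full dataset stays in host memory, and each worker-produced batch is moved to the GPU only when it is consumed:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(1000000, 10))  # stays in host (CPU) memory
loader = DataLoader(dataset, batch_size=512, shuffle=True, num_workers=4)

for (batch,) in loader:
    batch = batch.cuda()  # only one 512-row batch lives on the GPU at a time
    # ... take one SVI step on `batch` here ...
```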

@fritzo
Member

fritzo commented Oct 16, 2018

@neerajprad The use case you suggest cannot be accomplished via iarange(..., subsample_size=...), but it can be accomplished via iarange(..., subsample=...) (that is the motivating use case behind the subsample kwarg).

@neerajprad
Member

The use case you suggest cannot be accomplished via iarange(..., subsample_size=...), but it can be accomplished via iarange(..., subsample=...) (that is the motivating use case behind the subsample kwarg).

Ahh, my bad. In that case, we should probably just change our examples to use subsample=, which will do the correct scaling.

@mbabadi
Author

mbabadi commented Oct 17, 2018

@fritzo @neerajprad I also cannot think of anything that cannot be accomplished with iarange(..., subsample=...)! A callable subsampler can take care of both incremental data loading and optionally sending the minibatch to CUDA. It would be great if you could encourage the use of this motif in the examples.
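A minimal sketch of that motif (load_rows is a hypothetical out-of-core loader; all sizes are made up): indices are drawn externally, only the corresponding slice of the data is loaded and sent to CUDA, and the indices are handed to iarange via subsample= so that Pyro applies the scale factor N / len(ind) automatically:

```python
import torch
import pyro
import pyro.distributions as dist

N = 1000000  # full dataset size (assumed)

def get_minibatch(batch_size):
    ind = torch.randperm(N)[:batch_size]
    return ind, load_rows(ind).cuda()  # load_rows: hypothetical incremental loader

def model(ind, minibatch):
    mu = pyro.sample("mu", dist.Normal(0., 1.))
    with pyro.iarange("data", size=N, subsample=ind):
        # log-probs are automatically rescaled by N / len(ind)
        pyro.sample("obs", dist.Normal(mu, 1.), obs=minibatch)
```

(In Pyro 0.3+, iarange was renamed pyro.plate, with the same subsample semantics.)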

@eb8680 added the help wanted label and removed the help wanted and good first issue labels on Oct 17, 2018