Adjusting the iarange dim allocation scheme #1115
Questions:

Since the batch shape is completely determined by …

I don't have full context into this, but you can always wrap the model internals inside an …
How does this affect model composability? For example, suppose I compose a model out of simpler component models:

```python
def temporal_model(times):
    with pyro.iarange("time"):
        ...

def spatial_model(pixels):
    with pyro.iarange("x", 320):
        with pyro.iarange("y", 200):
            ...

def model(videos):
    with pyro.iarange("batch", len(videos), dim=-3):
        temporal_noise = temporal_model(
            torch.arange(len(videos[0])).expand(len(videos), -1))
        spatial_noise = spatial_model(videos)
        ...
```

Here `temporal_model` and `spatial_model` are simple, reusable components, and `model` is a more complex aggregate built out of them. Would the proposed change preserve this assumption that model components are more stable than model aggregates, and that most bookkeeping should be at the aggregate level? Are any extra utilities or patterns needed to make it easy to compose models?
Since this is only exercised at the time of inference, I don't think it should need any change to the component models. Also assume that the user decides not to rely on implicit broadcasting and manually does an `.expand_by` inside the component model.
We currently need to specify this `dim` explicitly, but we can just use the automatically allocated dim instead.
It is currently nice that the outer `batch` dim can be allocated automatically, with a manually padded dim for alignment:

```python
def model(videos):
    with pyro.iarange("batch", len(videos)):  # <--- automatically allocated
        with pyro.iarange("padding", 1):      # <--- manually padded
            temporal_noise = temporal_model(
                torch.arange(len(videos[0])).expand(len(videos), -1))
            spatial_noise = spatial_model(videos)
            ...
```

Aside: This discussion seems better suited for a design doc than a GitHub issue.
Good idea, I will move this to a design doc, but I want to keep this issue open for some time because I am finding it useful for understanding the issues raised and for gathering some problematic examples that would need to be addressed. Will sync offline on the issue of composability that you mentioned above.
Agreed that this issue is metastasizing into a design doc. At any rate, I'm pretty sure I'm still misunderstanding everything, but here are a couple of thoughts I had after a discussion with @neerajprad:

First, in this proposal …

Second, here's a sketch of an alternative allocation strategy that favors composition and preserves the current indexing scheme in user-land (again, modulo ellipses to the left of every user-land indexing operation). Consider a slightly-crazy scenario in which we'd want to compose a bunch of different effects that exploit broadcasting in a model that itself has nested iaranges. We could give each effect its own identical, independent sandbox: allocate blocks of dimensions to each broadcasting effect, starting from the user-specified `max_iarange_nesting`. The order of dimension allocation can then be forward or reverse (or manual, via a `dim` argument). We'd need an extra construction to delimit blocks of effect dimensions and ensure that blocks belonging to different effects never collide.

This explicit delimiting construction should allow each broadcasting effect in the hypothetical scenario to be handled as cleanly and separately as in @neerajprad's original comment, as if there were no others; by not delimiting, the current indexing scheme is preserved. One potential problem I see with this proposal is incorporating dependent enumeration (#915), which in principle could require a dimension block of unbounded length. Two hacky solutions are to allocate a large but fixed number of enumeration dimensions (say, 1000 instead of the usual 32 or so), or to require enumeration to be below all other effect dimension blocks.
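As a rough illustration of the sandbox idea (a hypothetical helper, not Pyro's actual allocator; the class name, the per-effect block sizes, and the effect names are assumptions):

```python
# Hypothetical sketch: each broadcasting effect reserves a contiguous block of
# batch dims to the left of the model's own iarange dims, and indexes within
# its block as if it owned dims -1, -2, ...
class BlockDimAllocator:
    def __init__(self, max_iarange_nesting):
        # dims -1 .. -max_iarange_nesting are reserved for the model's iaranges
        self._used = max_iarange_nesting
        self._blocks = {}  # effect name -> (offset, block size)

    def allocate_block(self, effect, size):
        """Reserve `size` dims immediately to the left of everything so far."""
        self._blocks[effect] = (self._used, size)
        self._used += size

    def dim(self, effect, local_dim):
        """Translate an effect-local dim (-1, -2, ...) into a global batch dim."""
        offset, size = self._blocks[effect]
        assert -size <= local_dim <= -1
        return local_dim - offset


alloc = BlockDimAllocator(max_iarange_nesting=2)
alloc.allocate_block("num_particles", 1)       # e.g. vectorized ELBO particles
alloc.allocate_block("parallel_chains", 1)     # e.g. vectorized HMC chains
assert alloc.dim("num_particles", -1) == -3    # first dim left of the model's block
assert alloc.dim("parallel_chains", -1) == -4  # next block further to the left
```

Each effect then sees an identical local view of its block, regardless of which other effects are active.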
After some discussions with @fritzo, I have come to the conclusion that the best way forward is to implement automatic distribution broadcasting (#1119), so that users do not need to manually expand distribution instances inside `iarange` blocks. I believe that #1119 will solve the bulk of the inconvenience associated with using nested iaranges, and introducing the additional complexity of dim ordering doesn't seem worth it at this point. There are some great points here, which we could revisit in a design doc if we are not happy with the state of iarange broadcasting later.
Just to clarify: #1119 is completely manual; it addresses only the lowest-level issue. I think a good plan is to …
Currently, our `iarange` dimension allocator assigns dimensions to `iarange`s starting from the right - i.e. the outermost `iarange` starts at -1, the next one at -2, and so on. Consider the following code:
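A minimal illustrative sketch of such code (the site names and sizes are assumptions, using the `pyro.iarange` / `.expand_by` API):

```python
import pyro
import pyro.distributions as dist

def model():
    # Current scheme: dims are allocated right-to-left in order of entry,
    # so "outer" (entered first) gets dim -1 and "inner" gets dim -2.
    with pyro.iarange("outer", 10):
        x = pyro.sample("x", dist.Normal(0., 1.).expand_by([10]))
        with pyro.iarange("inner", 5):
            # batch shapes must align as (inner, outer) = (5, 10)
            y = pyro.sample("y", dist.Normal(0., 1.).expand_by([5, 10]))
```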
After discussing with @fritzo, @eb8680 and @martinjankowiak, I would like to propose the following change to the `iarange` dimension allocation: when the user specifies `max_iarange_nesting`, we allocate dims starting from `-max_iarange_nesting` and moving to the right, i.e. iarange dims follow their order of appearance in the model and guide. Notice how the code sample above would look under this scheme:
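Continuing the illustrative sketch above (imports as before), with `max_iarange_nesting=2`:

```python
def model():
    # Proposed scheme: dims follow order of appearance starting at
    # -max_iarange_nesting, so "outer" gets dim -2 and "inner" gets dim -1.
    with pyro.iarange("outer", 10):
        x = pyro.sample("x", dist.Normal(0., 1.).expand_by([10, 1]))
        with pyro.iarange("inner", 5):
            # batch shapes align as (outer, inner) = (10, 5)
            y = pyro.sample("y", dist.Normal(0., 1.).expand_by([10, 5]))
```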
What we would finally like to have with implicit broadcasting (no need to adjust the batch sizes manually):
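Again with the same illustrative sites:

```python
def model():
    # With implicit broadcasting, the iarange sizes alone determine the batch
    # shape; no manual .expand_by is needed at the sample sites.
    with pyro.iarange("outer", 10):       # dim -2
        x = pyro.sample("x", dist.Normal(0., 1.))      # broadcast to (10, 1)
        with pyro.iarange("inner", 5):    # dim -1
            y = pyro.sample("y", dist.Normal(0., 1.))  # broadcast to (10, 5)
```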
What does this give us?

1. Intuitive shape scoping. E.g. in the above case, assume that the `outer` dim was for parallelizing over `num_particles`; this would give samples where the outermost `iarange` dimension is also reflected in the outermost dimension of the sample shape and has `shape[0] = num_particles`.
2. More importantly, I think this will make it much easier to implement implicit broadcasting (using the iarange size param to automatically broadcast sample sites rather than relying on the user to `.expand_by`), as discussed in #791 (Parallelize ELBO computation over num_particles). For instance, in the second example, all this involves is looking at the `cond_indep_stack` and doing `.expand_by((iarange_sizes,) + (1,) * (max_iarange_nesting - len(cond_stack)))` (only for the topmost nodes in the graph). Doing this with our current dim allocation scheme would involve replicating this logic by hand through the `dim` arg to iarange to make the batch shapes align correctly, which kind of beats the point of doing implicit broadcasting.
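A rough sketch of that bookkeeping (a hypothetical helper, not Pyro's implementation; it assumes a scalar-batch distribution at the topmost node and that each `cond_indep_stack` frame exposes its `size`):

```python
def broadcast_topmost_site(d, cond_indep_stack, max_iarange_nesting):
    # Under left-to-right allocation, the enclosing iaranges occupy the leftmost
    # batch dims in order of appearance; unused dims to the right are padded
    # with 1 so batch shapes always have length max_iarange_nesting.
    iarange_sizes = tuple(frame.size for frame in cond_indep_stack)
    padding = (1,) * (max_iarange_nesting - len(cond_indep_stack))
    return d.expand_by(iarange_sizes + padding)
```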
Note that this will be more general than the particular issue of parallelizing over `num_particles`. E.g. we could have the following structure, where the outermost dimension corresponds to running parallel chains (for HMC) or parallel evolution candidates, and the inner one corresponds to the Monte Carlo estimator for ELBO:
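A minimal sketch of that structure (the iarange names are illustrative, and implicit broadcasting is assumed at the sample site):

```python
def model(num_chains, num_mc_samples):
    with pyro.iarange("chains", num_chains):              # dim -2: parallel chains / candidates
        with pyro.iarange("mc_samples", num_mc_samples):  # dim -1: Monte Carlo ELBO samples
            # Implicitly broadcast to batch shape (num_chains, num_mc_samples).
            z = pyro.sample("z", dist.Normal(0., 1.))
            ...
```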
Cons:

- Specifying `max_iarange_nesting` (as also mentioned in the design doc). (EDIT: This will not be an issue in practice, as even a manual `.expand_by` without `.unsqueeze` will be allowed with implicit broadcasting.) I believe that this will be completely obviated by implicit broadcasting, and it is not as much of a cognitive burden when the user is in any case aware of `max_iarange_nesting`.
.Let me know your thoughts, and if I may have overlooked something.
Related: #791