
Vectorize ELBO computation over num particles #1176

Merged
merged 7 commits, Jun 8, 2018

Conversation

@neerajprad (Member) commented Jun 5, 2018

Resolves #791.

My initial plan was to write a separate SVI class for this, but it turned out that modifying the base ELBO class was the simplest way to percolate this change to all the other loss classes.

I have modified only a few tests that were hand-parallelizing over num particles; we can do a broader sweep after this PR is merged. That is not strictly necessary, though, since those tests are fast anyway. I also modified the conjugate_gradient test, which was not parallelized.

@fritzo (Member) left a comment:

Looks good so far!

@@ -32,7 +45,63 @@ class ELBO(object):
def __init__(self,
num_particles=1,
max_iarange_nesting=float('inf'),
auto_vectorize=False,
Member:

nit: maybe call this vectorize_particles in case we want to vectorize some other aspect?

with pyro.iarange("num_particles_vectorized", self.num_particles, dim=-self.max_iarange_nesting):
return fn(*args, **kwargs)

return poutine.broadcast(wrapped_fn)
@fritzo (Member), Jun 5, 2018:

Is it worth checking whether fn is already wrapped in poutine.broadcast, to avoid wrapping it a second time? Alternatively, this avoidance check could be done inside poutine.broadcast.

Member Author:

Good idea. I'll check it in poutine.broadcast itself.

Member Author:

I had to add fn.__broadcasted__ as an attribute to mark a function that has already been broadcasted.

Member:

Shouldn't broadcast be idempotent?

Member:

Yes, broadcast should be idempotent; I'm asking only whether it is worth avoiding the double wrapping for efficiency purposes. If it costs more than 5 lines of code, then it's probably not worth avoiding the double wrap.

Member Author:

Agreed, I'll remove this.
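The guard discussed in this thread can be sketched as a plain decorator idempotence check (a minimal toy sketch, independent of Pyro internals; only the `__broadcasted__` attribute name comes from the discussion above):

```python
import functools

def broadcast(fn):
    """Idempotent wrapper: a function already marked with
    __broadcasted__ is returned unchanged instead of being
    wrapped a second time."""
    if getattr(fn, "__broadcasted__", False):
        return fn  # already wrapped: avoid double wrapping

    @functools.wraps(fn)
    def wrapped_fn(*args, **kwargs):
        # The real poutine.broadcast applies a BroadcastMessenger here;
        # this toy version just calls through.
        return fn(*args, **kwargs)

    wrapped_fn.__broadcasted__ = True
    return wrapped_fn
```

Applying `broadcast` twice then returns the same object, which is the efficiency question raised below.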

strict_enumeration_warning=True):
self.num_particles = num_particles
self.max_iarange_nesting = max_iarange_nesting
self.auto_vectorize = auto_vectorize
if self.auto_vectorize:
Member:

nit: if self.vectorize_particles and self.num_particles > 1: ...

Runs the guide and runs the model against the guide with
the result packaged as a trace generator.
"""
if self.auto_vectorize:
Member:

This may error if self.num_particles == 1 since max_iarange_nesting will not have been incremented. Instead gate on if self.vectorize_particles and self.num_particles > 1, as used elsewhere, or simplify the gate above.

Member Author:

The _vectorized_num_particles method does a special check for num_particles=1, so it should work fine. Note that vectorize_particles also calls poutine.broadcast (the two effects are hard to decouple), so even when the user has num_particles=1 with vectorize_particles=True, we would like to wrap the model and guide inside poutine.broadcast so that the behavior is consistent.
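A rough sketch of that dispatch, with toy stand-ins for `poutine.broadcast` and `pyro.iarange` (hypothetical names and structure, not the actual Pyro source):

```python
from contextlib import contextmanager

@contextmanager
def iarange(name, size, dim):
    # Toy stand-in for pyro.iarange: the real version declares a
    # tensor batch dimension used for broadcasting.
    yield

def broadcast(fn):
    # Toy stand-in for poutine.broadcast: the real version expands
    # sample-site batch shapes to match enclosing iarange dims.
    def wrapped(*args, **kwargs):
        return fn(*args, **kwargs)
    return wrapped

def vectorize_particles(fn, num_particles, max_iarange_nesting):
    """Always apply broadcast so that num_particles=1 behaves
    consistently; only add the particle iarange for >1 particles."""
    if num_particles == 1:
        return broadcast(fn)

    def wrapped_fn(*args, **kwargs):
        with iarange("num_particles_vectorized", num_particles,
                     dim=-max_iarange_nesting):
            return fn(*args, **kwargs)

    return broadcast(wrapped_fn)
```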

@jpchen (Member) left a comment:

looks good! do you have any basic speed comparisons, at least on the test examples?

@@ -15,6 +23,11 @@ class ELBO(object):
:func:`pyro.iarange` contexts. This is only required to enumerate over
sample sites in parallel, e.g. if a site sets
``infer={"enumerate": "parallel"}``.
:param bool auto_vectorize: Vectorize ELBO computation over `num_particles`.
Member:

can you add what it defaults to

check_site_shape(site, self.max_iarange_nesting)

yield model_trace, guide_trace
guide_trace = poutine.trace(guide).get_trace(*args, **kwargs)
Member:

is this method identical sans the for loop over num_particles? the diff shows the whole method has changed but from a cursory glance it looks the same

Member:

they are identical. have you tried the old "cross your eyes at the side-by-side diff" trick?

Member:

i do the "cmd-f a randomly sampled substring and see if it appears on both sides". 90% of the time, it works every time.

Member Author:

Yup, that's right! Trace_ELBO, TraceGraph_ELBO and TraceEnum_ELBO mostly differ only in how they generate a single trace, which is reused when we want to generate traces vectorized over num_particles.

@neerajprad (Member Author):

> looks good! do you have any basic speed comparisons, at least on the test examples?

@jpchen - This will be many orders of magnitude faster than sequentially sampling traces num particles times.

e.g. for test_conjugate_gradients.py, 103 ms for 10k particles when they are parallelized vs. more than 2 minutes otherwise.

I intend to do some comparisons against hand parallelization, which is what we should really be comparing against. I did not observe any significant difference for most models, but there are cases where we end up adding up to 20% more time compared to hand parallelization. It is still very fast though (milliseconds vs. seconds or minutes).
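For intuition, the vectorized path replaces a Python loop over particles with one batched computation carrying an extra leading particle dimension; a toy NumPy illustration (not the Pyro benchmark itself):

```python
import numpy as np

rng = np.random.default_rng(0)
num_particles = 10_000

# Sequential: one draw per particle; Python-loop overhead scales
# linearly with num_particles.
sequential = np.stack([rng.normal(size=3) for _ in range(num_particles)])

# Vectorized: a single batched draw with a leading particle dimension,
# analogous to wrapping the model in an outer iarange.
vectorized = rng.normal(size=(num_particles, 3))

assert sequential.shape == vectorized.shape == (num_particles, 3)
```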

@@ -248,7 +248,14 @@ def broadcast(fn=None):
... return sample
"""
msngr = BroadcastMessenger()
return msngr(fn) if fn is not None else msngr
if fn is not None:
Member:

I guess we can omit this logic until we see a real performance cost of double wrapping.

@martinjankowiak (Collaborator) left a comment:

this is super awesome!

lgtm

question: i guess as the code stands we expect the user to know when auto-vectorization will break (e.g. in cases with stochastic control flow etc.)?

@eb8680 added the Blocked label Jun 5, 2018
@neerajprad force-pushed the parallel-numparticles branch from b9c04e6 to 6e1ac43 on June 5, 2018, 23:21
@eb8680 (Member) commented Jun 5, 2018:

> question: i guess as the code stands we expect the user to know when auto-vectorization will break (e.g. in cases with stochastic control flow etc.)?

Yeah, we'd have to do a bit of static analysis for automatic validation.

@neerajprad (Member Author):

> question: i guess as the code stands we expect the user to know when auto-vectorization will break (e.g. in cases with stochastic control flow etc.)?

That's true. I think if an iarange does not specify a size arg, or if the size arg is not consistent with the sample site's observed batch shape, we should throw an error in the broadcast poutine itself (by having some kind of strict mode under which this operates). That will ensure that we are not doing something odd or silently failing in those situations.
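Such a strict-mode check might look roughly like this (a hypothetical sketch, not Pyro's implementation; the function name and signature are made up for illustration):

```python
def check_iarange_size(iarange_size, batch_shape, dim):
    """Raise if an observed batch dimension disagrees with the
    enclosing iarange's declared size (hypothetical strict-mode check).

    dim is a negative index into batch_shape, as in pyro.iarange.
    """
    if len(batch_shape) < -dim:
        # Site does not extend into this dim; broadcasting will expand it.
        return
    site_size = batch_shape[dim]
    if site_size not in (1, iarange_size):
        raise ValueError(
            "Site batch size {} at dim {} conflicts with iarange size {}"
            .format(site_size, dim, iarange_size))
```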

@fritzo (Member) left a comment:

Thanks for simplifying the enumeration tests!

pyro.sample("x3", dist.Bernoulli(q).expand_by([num_particles]), infer={"enumerate": enumerate3})
pyro.sample("x1", dist.Bernoulli(q), infer={"enumerate": enumerate1})
pyro.sample("x2", dist.Bernoulli(q), infer={"enumerate": enumerate2})
pyro.sample("x3", dist.Bernoulli(q), infer={"enumerate": enumerate3})

kl = sum(kl_divergence(dist.Bernoulli(q), dist.Bernoulli(p)) for p in [0.1, 0.2, 0.3])
expected_loss = kl.item()
expected_grad = grad(kl, [q])[0]

elbo = TraceEnum_ELBO(max_iarange_nesting=1,
Member:

You can now set max_iarange_nesting=0 since it is incremented internally.

Member Author:

Done.

pyro.sample("x1", dist.Bernoulli(0.1).expand_by([num_particles]))
pyro.sample("x2", dist.Bernoulli(0.2).expand_by([num_particles]))
pyro.sample("x3", dist.Bernoulli(0.3).expand_by([num_particles]))
pyro.sample("x1", dist.Bernoulli(0.1))
Member:

nit: outdent

Member Author:

My bad, fixed.

pyro.sample("y", d.expand_by([inner_dim, 1, num_particles]))
with context1, context2:
pyro.sample("z", d.expand_by([inner_dim, outer_dim, num_particles]))
context1 = pyro.iarange("outer", outer_dim, dim=-2)
Member:

Can we change these dims to -1,-2 now?

Member Author:

Done.

@@ -26,35 +26,35 @@ def test_subsample_gradient(Elbo, reparameterized, subsample):
pyro.clear_param_store()
data = torch.tensor([-0.5, 2.0])
subsample_size = 1 if subsample else len(data)
num_particles = 50000
precision = 0.06
Normal = dist.Normal if reparameterized else fakes.NonreparameterizedNormal

def model(subsample):
@fritzo (Member), Jun 6, 2018:

I think you should wrap this model and guide with @poutine.broadcast rather than rely on the implicit @poutine.broadcast implied by num_particles>1 and vectorize_particles=True. While it is correct without the explicit decorator, it is better style to be explicit, and users may look to our tests for idiomatic usage.

@neerajprad (Member Author), Jun 6, 2018:

vectorize_particles with num_particles=1 would still use @poutine.broadcast, so as not to have some special surprising behavior for num_particles=1. That was part of the reason why I thought of calling it auto_vectorize, to emphasize that it's a bit more general than adding an outer iarange and could be used as a replacement for @poutine.broadcast with or without num_particles. That said, vectorize_particles is a clearer arg name, but it makes the broadcasting side effect seem unexpected. I will change to using @poutine.broadcast, unless there is a better name we can use for this arg.


@pytest.mark.parametrize('enumerate_', [None, "sequential", "parallel"])
def test_enum_discrete_vectorized_num_particles(enumerate_):
num_particles = 50
Member:

Can you @pytest.mark.parametrize('num_particles', [1, 10]) to test the single-particle edge case?

Member Author:

Good idea, will do.
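The suggested parametrization would look like this (test body elided to a placeholder assertion):

```python
import pytest

@pytest.mark.parametrize('num_particles', [1, 10])
@pytest.mark.parametrize('enumerate_', [None, "sequential", "parallel"])
def test_enum_discrete_vectorized_num_particles(enumerate_, num_particles):
    # Body elided; the point is that num_particles=1 exercises the
    # single-particle edge case through the same vectorized code path.
    assert num_particles >= 1
```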

@neerajprad force-pushed the parallel-numparticles branch from 31371c5 to 76fa97c on June 6, 2018, 22:24
@neerajprad removed the Blocked label Jun 6, 2018
@fritzo previously approved these changes Jun 7, 2018
elbo.loss_and_grads(self.model, self.guide, reparameterized=reparameterized)

for i in range(1, N + 1):
for param_prefix in ["loc_q_%d", "log_sig_q_%d", "kappa_q_%d"]:
if i == N and param_prefix == 'kappa_q_%d':
continue
actual_grad = pyro.param(param_prefix % i).grad
assert_equal(actual_grad, 0.0 * actual_grad, prec=0.20, msg="".join([
assert_equal(actual_grad, 0.0 * actual_grad, prec=0.10, msg="".join([
Member:

Nice!

@@ -15,6 +23,11 @@ class ELBO(object):
:func:`pyro.iarange` contexts. This is only required to enumerate over
sample sites in parallel, e.g. if a site sets
``infer={"enumerate": "parallel"}``.
:param bool vectorize_particles: Vectorize ELBO computation over `num_particles`.
Member:

consider adding more info:
"Whether to vectorize the ELBO computation over num_particles. Defaults to False. This requires static structure in model and guide."

Member Author:

Thanks, will update.

@neerajprad (Member Author):

The failure is due to the observations library being unable to download MNIST, and is unrelated. I am fine with merging this PR as is, once tests pass. I will address the slowness in .expand and add more warnings separately.

@jpchen (Member) left a comment:

lgtm

@fritzo merged commit 865bff6 into pyro-ppl:dev Jun 8, 2018
breakdawn pushed a commit to breakdawn/pyro that referenced this pull request Aug 15, 2018