
Future areas of work / improvement for HMC and NUTS #1093

Closed
15 of 17 tasks
neerajprad opened this issue Apr 24, 2018 · 14 comments
Labels
enhancement help wanted Issues suitable for, and inviting external contributions

Comments

@neerajprad
Member

neerajprad commented Apr 24, 2018

Please create a separate issue if you are working on a major task (e.g. mass matrix adaptation, or parallel chaining), so that all task specific discussion is contained within that issue.

Enhancements

Minor:

  • Deciding on a suitable number of warmup iterations. These are currently taken as input arguments, but adapt_step_size=True should have a reasonable default number of warmup iterations when none is specified by the user. For example, we could default warmup to 50% of total iterations (as Stan does), in which case num_samples=100 would automatically run 100 warmup iterations if the user specifies none.
  • Ability to set the target acceptance probability during step size adaptation - currently fixed at 0.8. This will be especially useful for biasing the adaptation toward smaller step sizes, to explore problematic posteriors with regions of high curvature.
  • Ability to set max_tree_depth to trade off accuracy for speed.
  • Detect divergent transitions and log them for analysis. Throwing a NaN during sampling, as we might do now with validation checks enabled, is not very useful to the end user.
  • Implement a progress bar so that logging info does not clutter the screen (especially in notebooks).
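The warmup default discussed in the first bullet could look something like this sketch (default_warmup_steps is a hypothetical helper for illustration, not Pyro's API):

```python
def default_warmup_steps(num_samples, warmup_steps=None):
    """Pick a number of warmup iterations when the user gives none.

    Mirrors Stan's default of splitting total iterations 50/50 between
    warmup and sampling: 50% of the total equals num_samples warmup steps.
    """
    if warmup_steps is not None:
        return warmup_steps
    return num_samples

# With num_samples=100 and no explicit warmup, we run 100 warmup iterations.
print(default_warmup_steps(100))       # -> 100
print(default_warmup_steps(100, 200))  # -> 200
```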

Major:

  • Multiple parallel chains for HMC (and/or NUTS). For HMC, this could be as simple as using poutine.broadcast to run parallel chains, similar in spirit to parallelizing ELBO computation over num_particles (Vectorize ELBO computation over num particles, #1176), or (preferably) using torch.distributed to implement a more general (applicable to NUTS) and scalable solution.
  • Adapt the mass matrix during the warmup phase, alongside the step size parameter. Currently the momentum distribution is assumed to be a diagonal normal.
  • Better initialization strategies, e.g. generating the initial trace after running ADVI or MAP (EDIT: this can be done by the user independently, and the trace so generated can be specified via initial_trace). In addition, provide the option for the user to specify an initial trace to the NUTS/HMC kernel.
  • Use multinomial sampling instead of slice sampling in NUTS.
  • Enumerate over discrete latents (Automatically enumerate discrete variables in HMC, #1128).
  • (Low priority) Add support for other MCMC algorithms like Gibbs sampling, and use these in conjunction with the HMC/NUTS kernel to allow sampling from models with discrete latent variables.
  • Parallel chains on CUDA. NOTE: This feature is partially supported. We need to hold traces in the workers until a worker terminates or the main process no longer needs them, but doing so would undermine the main reason for using generators: resolving memory issues for large models. A better mechanism for deciding when to store traces and when to clear them in the workers should be implemented.
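For the mass matrix adaptation item, a minimal sketch of a diagonal estimator using Welford's online variance algorithm, with the shrinkage regularization Stan applies during its adaptation windows (the class name is illustrative; this is not Pyro's implementation):

```python
import numpy as np

class WelfordDiagonal:
    """Online estimate of a diagonal mass matrix from warmup samples."""

    def __init__(self, dim):
        self.count = 0
        self.mean = np.zeros(dim)
        self.m2 = np.zeros(dim)

    def update(self, sample):
        # Welford's single-pass mean/variance update.
        self.count += 1
        delta = sample - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (sample - self.mean)

    def variance(self):
        # Regularized toward a small constant, as in Stan's adaptation.
        n = self.count
        var = self.m2 / (n - 1)
        return (n / (n + 5.0)) * var + 1e-3 * (5.0 / (n + 5.0))

# Feed in draws with per-dimension std 1 and 2; the estimate should
# approach variances [1, 4].
rng = np.random.default_rng(0)
est = WelfordDiagonal(2)
for _ in range(5000):
    est.update(rng.normal(0.0, [1.0, 2.0]))
```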

Diagnostics / Results

JIT

@neerajprad
Member Author

@fehiepsi - Please feel free to edit / add to these.

@fehiepsi
Member

@neerajprad How about the ability to assign different MCMC algorithms to different variables? This will be helpful when we have a Metropolis kernel to deal with discrete variables. I haven't figured out how to achieve it yet, so I can't estimate how much time it would take.

@neerajprad
Member Author

@neerajprad How about the ability to assign different MCMC algorithms to different variables? This will be helpful when we have a Metropolis kernel to deal with discrete variables. I haven't figured out how to achieve it yet, so I can't estimate how much time it would take.

Added that. I think that will be quite useful, but we will need to implement other MCMC kernels first.
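The idea could be sketched as a compound kernel that sweeps over latent sites, applying a different update rule to each, e.g. an HMC-style move for continuous sites and a Metropolis move for discrete ones (this is a hypothetical API for illustration, not Pyro's):

```python
def compound_step(state, site_kernels):
    """One Gibbs-style sweep: update each site with its own kernel.

    state: dict mapping site name -> current value.
    site_kernels: dict mapping site name -> function from the full
        state dict to a new value for that site. Later kernels in the
        sweep see the already-updated values of earlier sites.
    """
    new_state = dict(state)
    for site, kernel in site_kernels.items():
        new_state[site] = kernel(new_state)
    return new_state

# Toy deterministic "kernels": a drift move for continuous x, a flip for
# discrete z.
state = {"x": 1.0, "z": 0}
kernels = {"x": lambda s: s["x"] + 1.0, "z": lambda s: 1 - s["z"]}
print(compound_step(state, kernels))  # -> {'x': 2.0, 'z': 1}
```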

@LoganWalls
Contributor

LoganWalls commented May 11, 2018

I'd love to contribute re: mass-matrix adaptation and I actually have some preliminary work on it already. But I've run into an issue and I'm not sure how you'd like to proceed in terms of the design. Should I discuss here, or open an issue specifically for the mass-matrix adaptation?

@neerajprad
Member Author

I'd love to contribute re: mass-matrix adaptation and I actually have some preliminary work on it already.

That's great to hear. I would suggest opening a separate issue which we can link to from here, so that it doesn't have to deal with all the noise from this master task.

@LoganWalls
Contributor

I'd like to suggest adding one or more stochastic gradient approaches (e.g. Stochastic Mini-Batch HMC, Stochastic Gradient Langevin Dynamics, etc.) to this list. There does seem to be some concern about the theoretical properties of these algorithms (as seen in this PyMC3 discussion), but I think their potential in applied settings at least merits consideration.
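For reference, the core SGLD update is a single gradient step plus Gaussian noise scaled by the step size. A toy sketch on a standard normal target, using the full gradient rather than a mini-batch estimate (names are illustrative; this is not a Pyro API):

```python
import numpy as np

def sgld_step(theta, grad_log_post, step_size, rng):
    """One SGLD update: half a gradient step plus injected noise.

    For small step sizes the iterates approximately sample from the
    posterior whose (stochastic) gradient is supplied.
    """
    noise = rng.normal(0.0, np.sqrt(step_size), size=theta.shape)
    return theta + 0.5 * step_size * grad_log_post(theta) + noise

# Target: standard normal, so grad log p(theta) = -theta.
rng = np.random.default_rng(0)
theta = np.zeros(1)
draws = []
for _ in range(5000):
    theta = sgld_step(theta, lambda t: -t, 0.1, rng)
    draws.append(theta[0])
```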

@fritzo
Member

fritzo commented May 23, 2018

@neerajprad @jpchen @rohitsingh0812 FYI PyMC devs are considering using xarray as a format for inference results of PyMC and PyStan. This seems like a good decision to me, and it would be nice if we could aim for an interchangeable format.

@eb8680
Member

eb8680 commented May 23, 2018

it would be nice if we could aim for an interchangeable format

What does this buy us? Is this meant to allow us to use arviz for visualization? I think it would be great if we wrote something to convert TracePosteriors into whatever summary format they have in mind, but I don't see a good reason to commit to that as the sole representation of inference output.

@neerajprad
Member Author

I think since xarray supports numpy, it should be relatively straightforward for us to convert the results of TracePosterior to that format and get any summary/plotting utilities. I like PyMC's traceplot, and it would be great to have access to that without having to develop all of that within Pyro.

@fritzo
Member

fritzo commented May 23, 2018

I think it would be great if we wrote something to convert TracePosteriors into [xarray]

Yeah, the idea is to leverage the work of other teams who are converting PyStan and PyMC output into a standard format built on xarray. This will enable comparison across PPL systems and algorithms.
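As a sketch of the conversion being discussed: assuming samples are already numpy arrays, the main step is stacking per-chain draws into the (chain, draw, ...) layout that the xarray-based format expects (stack_chains is a hypothetical helper, not a Pyro or ArviZ function):

```python
import numpy as np

def stack_chains(chains):
    """Stack per-chain sample dicts into (chain, draw, *shape) arrays.

    chains: list of dicts, one per chain, mapping site name -> array of
        shape (num_draws, *event_shape).
    """
    return {
        site: np.stack([c[site] for c in chains], axis=0)
        for site in chains[0]
    }

# Two chains of 100 draws each for a 3-dimensional site "mu".
chains = [{"mu": np.zeros((100, 3))}, {"mu": np.ones((100, 3))}]
stacked = stack_chains(chains)
print(stacked["mu"].shape)  # -> (2, 100, 3), i.e. (chain, draw, dim)
```

From here, wrapping each array in an xarray Dataset with named chain/draw dimensions would give the interchange format.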

@neerajprad
Member Author

@cfperez - I have updated the task list here, specific to HMC/NUTS. We don't have a separate issue for visualization, and something like traceplot will be useful for all inference algorithms, not just HMC. Some relevant discussion here. I think using arviz would require us to convert torch.Tensor to a pandas dataframe, or xarray. That needs a more involved discussion (on rolling out our own solution, vs. tying ourselves to an external data format / dependency), so please feel free to create a new issue if you would like to work on this!

@eb8680 eb8680 added the help wanted label (Issues suitable for, and inviting external contributions) and removed the good first issue label Oct 17, 2018
This was referenced Nov 6, 2018
@fehiepsi
Member

fehiepsi commented May 7, 2019

We already have a separate Gibbs sampling issue. Multiple chains on CUDA are possible now with PyTorch 1.1.0. Divergence info (the NUTS tree divergence flag) can be added easily, but it doesn't seem important. Feel free to make a separate FR if it is necessary.

@fehiepsi fehiepsi closed this as completed May 7, 2019
@riversdark
Contributor

Hi @fehiepsi, can you explain a little why you think the divergence diagnostics are not important? They seem critical for telling whether NUTS is working properly. Do we have any other means to check convergence in Pyro? Right now the only diagnostics I can find are the effective number of samples and R-hat, and neither is HMC-specific.

@fehiepsi
Member

@riversdark That's just my feeling. I haven't read much literature on divergence diagnostics. We can easily add it (we just need to decide where it should go: the progress bar, or stored alongside the samples). I'll open a FR for it.
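A minimal sketch of what such a divergence check could look like, following the Stan-style heuristic of flagging a transition whose energy error along the trajectory exceeds a large threshold (the names and threshold value here are illustrative, not Pyro's implementation):

```python
import numpy as np

# Stan flags a divergence when the Hamiltonian energy error exceeds a
# large constant (1000 by default).
DIVERGENCE_THRESHOLD = 1000.0

def count_divergences(energy_errors, threshold=DIVERGENCE_THRESHOLD):
    """Flag and count divergent transitions from per-transition energy errors.

    Returns (number of divergences, boolean flag per transition) so the
    flags can be stored and inspected after sampling.
    """
    flags = np.asarray(energy_errors) > threshold
    return int(flags.sum()), flags

# Three transitions; the second blows up and should be flagged.
n, flags = count_divergences([0.3, 1500.0, 2.1])
print(n)  # -> 1
```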
