
Future areas of work / improvement for HMC and NUTS #1093

Closed
15 of 17 tasks
neerajprad opened this issue Apr 24, 2018 · 14 comments
Labels
enhancement help wanted Issues suitable for, and inviting external contributions

Comments

@neerajprad
Member

neerajprad commented Apr 24, 2018

Please create a separate issue if you are working on a major task (e.g. mass matrix adaptation, or parallel chaining), so that all task specific discussion is contained within that issue.

Enhancements

Minor:

  • Deciding on a suitable number of warmup iterations. These are currently taken as input arguments, but adapt_step_size=True should have a reasonable default number of warmup iterations when none is specified by the user. For example, we could default warmup to 50% of total iterations (as Stan does), in which case num_samples=100 would automatically run 100 warmup iterations if the user specifies none.
  • Ability to set the target acceptance probability during step size adaptation - currently fixed at 0.8. This will be especially useful for biasing the adaptation toward smaller step sizes, to explore problematic posteriors with regions of high curvature.
  • Ability to set max_tree_depth to trade off accuracy for speed.
  • Detect divergent transitions and log them for analysis. Throwing a NaN during sampling, as we might do now with validation checks enabled, is not very useful to the end user.
  • Implement a progress bar so that logging info does not clutter the screen (especially in notebooks).
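The warmup default discussed in the first bullet could look something like this sketch (default_warmup_steps is a hypothetical helper for illustration, not Pyro's API):

```python
def default_warmup_steps(num_samples, warmup_steps=None):
    """Pick a number of warmup iterations when the user gives none.

    Mirrors Stan's default of splitting total iterations 50/50 between
    warmup and sampling: 50% of the total equals num_samples warmup steps.
    """
    if warmup_steps is not None:
        return warmup_steps
    return num_samples

# With num_samples=100 and no explicit warmup, we run 100 warmup iterations.
print(default_warmup_steps(100))       # -> 100
print(default_warmup_steps(100, 200))  # -> 200
```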

Major:

  • Multiple parallel chains for HMC (and/or NUTS). For HMC, this could be as simple as using poutine.broadcast to run parallel chains, similar in spirit to parallelizing ELBO computation over num_particles (Vectorize ELBO computation over num particles, #1176), or (preferably) using torch.distributed to implement a more general (applicable to NUTS) and scalable solution.
  • Adapt the mass matrix during the warmup phase, alongside the step size parameter. Currently the momentum distribution is assumed to be a diagonal normal.
  • Better initialization strategies, e.g. generating the initial trace after running ADVI or MAP (EDIT: this can be done by the user independently, and the trace so generated can be specified via initial_trace). In addition, provide the option for the user to specify an initial trace to the NUTS/HMC kernel.
  • Use multinomial sampling instead of slice sampling in NUTS.
  • Enumerate over discrete latents (Automatically enumerate discrete variables in HMC, #1128).
  • (Low priority) Add support for other MCMC algorithms like Gibbs sampling, and use these in conjunction with the HMC/NUTS kernel to allow sampling from models with discrete latent variables.
  • Parallel chains on CUDA. NOTE: This feature is partially supported. We need to hold traces in the workers until a worker terminates or the main process no longer needs them, but doing so would undermine the main reason for using generators: resolving memory issues for large models. A better mechanism for deciding when to store traces and when to clear them in the workers should be implemented.
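For the mass matrix adaptation item, a minimal sketch of a diagonal estimator using Welford's online variance algorithm, with the shrinkage regularization Stan applies during its adaptation windows (the class name is illustrative; this is not Pyro's implementation):

```python
import numpy as np

class WelfordDiagonal:
    """Online estimate of a diagonal mass matrix from warmup samples."""

    def __init__(self, dim):
        self.count = 0
        self.mean = np.zeros(dim)
        self.m2 = np.zeros(dim)

    def update(self, sample):
        # Welford's single-pass mean/variance update.
        self.count += 1
        delta = sample - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (sample - self.mean)

    def variance(self):
        # Regularized toward a small constant, as in Stan's adaptation.
        n = self.count
        var = self.m2 / (n - 1)
        return (n / (n + 5.0)) * var + 1e-3 * (5.0 / (n + 5.0))

# Feed in draws with per-dimension std 1 and 2; the estimate should
# approach variances [1, 4].
rng = np.random.default_rng(0)
est = WelfordDiagonal(2)
for _ in range(5000):
    est.update(rng.normal(0.0, [1.0, 2.0]))
```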

Diagnostics / Results

JIT

@neerajprad
Member Author

@fehiepsi - Please feel free to edit / add to these.

@fehiepsi
Member

@neerajprad How about the ability to assign different MCMC algorithms to different variables? This will be helpful when we have a Metropolis kernel to deal with discrete variables. I haven't figured out how to achieve it yet, so I can't estimate how much time it would take.

@neerajprad
Member Author

@neerajprad How about the ability to assign different MCMC algorithms to different variables? This will be helpful when we have a Metropolis kernel to deal with discrete variables. I haven't figured out how to achieve it yet, so I can't estimate how much time it would take.

Added that. I think that will be quite useful, but we will need to implement other MCMC kernels first.
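The idea could be sketched as a compound kernel that sweeps over latent sites, applying a different update rule to each, e.g. an HMC-style move for continuous sites and a Metropolis move for discrete ones (this is a hypothetical API for illustration, not Pyro's):

```python
def compound_step(state, site_kernels):
    """One Gibbs-style sweep: update each site with its own kernel.

    state: dict mapping site name -> current value.
    site_kernels: dict mapping site name -> function from the full
        state dict to a new value for that site. Later kernels in the
        sweep see the already-updated values of earlier sites.
    """
    new_state = dict(state)
    for site, kernel in site_kernels.items():
        new_state[site] = kernel(new_state)
    return new_state

# Toy deterministic "kernels": a drift move for continuous x, a flip for
# discrete z.
state = {"x": 1.0, "z": 0}
kernels = {"x": lambda s: s["x"] + 1.0, "z": lambda s: 1 - s["z"]}
print(compound_step(state, kernels))  # -> {'x': 2.0, 'z': 1}
```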

@LoganWalls
Contributor

LoganWalls commented May 11, 2018

I'd love to contribute re: mass-matrix adaptation and I actually have some preliminary work on it already. But I've run into an issue and I'm not sure how you'd like to proceed in terms of the design. Should I discuss here, or open an issue specifically for the mass-matrix adaptation?

@neerajprad
Member Author

I'd love to contribute re: mass-matrix adaptation and I actually have some preliminary work on it already.

That's great to hear. I would suggest opening a separate issue which we can link to from here, so that it doesn't have to deal with all the noise from this master task.

@LoganWalls
Contributor

I'd like to suggest adding one or more stochastic gradient approaches (e.g. Stochastic Mini-Batch HMC, Stochastic Gradient Langevin Dynamics, etc.) to this list. There does seem to be some concern about the theoretical properties of these algorithms (as seen in this PyMC3 discussion), but I think their potential in applied settings at least merits consideration.
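For reference, the core SGLD update is a single gradient step plus Gaussian noise scaled by the step size. A toy sketch on a standard normal target, using the full gradient rather than a mini-batch estimate (names are illustrative; this is not a Pyro API):

```python
import numpy as np

def sgld_step(theta, grad_log_post, step_size, rng):
    """One SGLD update: half a gradient step plus injected noise.

    For small step sizes the iterates approximately sample from the
    posterior whose (stochastic) gradient is supplied.
    """
    noise = rng.normal(0.0, np.sqrt(step_size), size=theta.shape)
    return theta + 0.5 * step_size * grad_log_post(theta) + noise

# Target: standard normal, so grad log p(theta) = -theta.
rng = np.random.default_rng(0)
theta = np.zeros(1)
draws = []
for _ in range(5000):
    theta = sgld_step(theta, lambda t: -t, 0.1, rng)
    draws.append(theta[0])
```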

@fritzo
Member

fritzo commented May 23, 2018

@neerajprad @jpchen @rohitsingh0812 FYI PyMC devs are considering using xarray as a format for inference results of PyMC and PyStan. This seems like a good decision to me, and it would be nice if we could aim for an interchangeable format.

@eb8680
Member

eb8680 commented May 23, 2018

it would be nice if we could aim for an interchangeable format

What does this buy us? Is this meant to allow us to use arviz for visualization? I think it would be great if we wrote something to convert TracePosteriors into whatever summary format they have in mind, but I don't see a good reason to commit to that as the sole representation of inference output.

@neerajprad
Member Author

I think since xarray supports numpy, it should be relatively straightforward for us to convert the results of TracePosterior to that format and get any summary/plotting utilities. I like PyMC's traceplot, and it would be great to have access to that without having to develop all of that within Pyro.

@fritzo
Member

fritzo commented May 23, 2018

I think it would be great if we wrote something to convert TracePosteriors into [xarray]

Yeah, the idea is to leverage the work of other teams who are converting PyStan and PyMC output into a standard format built on xarray. This will enable comparison across PPL systems and algorithms.
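As a sketch of the conversion being discussed: assuming samples are already numpy arrays, the main step is stacking per-chain draws into the (chain, draw, ...) layout that the xarray-based format expects (stack_chains is a hypothetical helper, not a Pyro or ArviZ function):

```python
import numpy as np

def stack_chains(chains):
    """Stack per-chain sample dicts into (chain, draw, *shape) arrays.

    chains: list of dicts, one per chain, mapping site name -> array of
        shape (num_draws, *event_shape).
    """
    return {
        site: np.stack([c[site] for c in chains], axis=0)
        for site in chains[0]
    }

# Two chains of 100 draws each for a 3-dimensional site "mu".
chains = [{"mu": np.zeros((100, 3))}, {"mu": np.ones((100, 3))}]
stacked = stack_chains(chains)
print(stacked["mu"].shape)  # -> (2, 100, 3), i.e. (chain, draw, dim)
```

From here, wrapping each array in an xarray Dataset with named chain/draw dimensions would give the interchange format.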

@neerajprad
Member Author

@cfperez - I have updated the task list here, specific to HMC/NUTS. We don't have a separate issue for visualization, and something like traceplot will be useful for all inference algorithms, not just HMC. Some relevant discussion here. I think using arviz would require us to convert torch.Tensor to a pandas dataframe, or xarray. That needs a more involved discussion (on rolling out our own solution, vs. tying ourselves to an external data format / dependency), so please feel free to create a new issue if you would like to work on this!

@eb8680 eb8680 added the help wanted label (Issues suitable for, and inviting external contributions) and removed the good first issue label Oct 17, 2018
This was referenced Nov 6, 2018
@fehiepsi
Member

fehiepsi commented May 7, 2019

We already have a separate Gibbs sampling issue. Multiple chains on CUDA are possible now with PyTorch 1.1.0. Divergence info (the NUTS tree divergence flag) can be added easily, but it doesn't seem important. Feel free to make a separate FR if it is necessary.

@fehiepsi fehiepsi closed this as completed May 7, 2019
@riversdark
Contributor

Hi @fehiepsi, can you explain a little why you think the divergence diagnostics are not important? They seem critical for telling whether NUTS is working properly. Do we have any other means to check convergence in Pyro? Right now the only diagnostics I can find are the effective number of samples and R-hat, and neither is HMC-specific.

@fehiepsi
Member

@riversdark That's just my feeling. I haven't read much literature on divergence diagnostics. We can easily add it (we just need to decide where it should go: the progress bar, or stored alongside the samples). I'll open a FR for it.
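A minimal sketch of what such a divergence check could look like, following the Stan-style heuristic of flagging a transition whose energy error along the trajectory exceeds a large threshold (the names and threshold value here are illustrative, not Pyro's implementation):

```python
import numpy as np

# Stan flags a divergence when the Hamiltonian energy error exceeds a
# large constant (1000 by default).
DIVERGENCE_THRESHOLD = 1000.0

def count_divergences(energy_errors, threshold=DIVERGENCE_THRESHOLD):
    """Flag and count divergent transitions from per-transition energy errors.

    Returns (number of divergences, boolean flag per transition) so the
    flags can be stored and inspected after sampling.
    """
    flags = np.asarray(energy_errors) > threshold
    return int(flags.sum()), flags

# Three transitions; the second blows up and should be flagged.
n, flags = count_divergences([0.3, 1500.0, 2.1])
print(n)  # -> 1
```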
