
TODOs #2

Closed
2 of 4 tasks
rossviljoen opened this issue Jun 19, 2021 · 26 comments

@rossviljoen
Collaborator

rossviljoen commented Jun 19, 2021

The current plan/potential things to do include:

  • Finish recreating this example from GPFlow, which requires:
    • Minibatching & ADAM (possibly with Flux)
    • GPU support?
    • Natural gradients [2]
  • Add support for non-conjugate likelihoods (this is done in GPFlow and [1] by quadrature)

[1] Hensman, James, Alexander Matthews, and Zoubin Ghahramani. "Scalable variational Gaussian process classification." Artificial Intelligence and Statistics. PMLR, 2015.
[2] Salimbeni, Hugh, Stefanos Eleftheriadis, and James Hensman. "Natural gradients in practice: Non-conjugate variational inference in Gaussian process models." International Conference on Artificial Intelligence and Statistics. PMLR, 2018.

@willtebbutt
Member

Minibatching & ADAM (possibly with Flux)

This is definitely something that we need, although possibly we don't want it to be Flux-specific? Does Flux have particular minibatching helpers or something?

GPU support?

This is a good idea, but I would probably put it below getting a basic implementation + abstractions sorted. Possibly the main issue here will be ensuring that we have at least one kernel that plays nicely with this, so you might need to make a small PR to kernel functions or something to sort that out.

Add support for non-conjugate likelihoods (this is done in GPFlow and [1] by quadrature)

I suspect we're going to need to support both quadrature and Monte Carlo approaches here. As @theogf mentioned at the last meeting, although you often have low-dimensional integrals in the reconstruction term, it's not uncommon to have to work with quite high dimensional integrals (e.g. multi-class classification). In those cases, you hit the curse of dimensionality and cubature will tend not to be a particularly fantastic option. That being said, if you can get away with quadrature in a particular problem, it's typically a very good idea.
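
For concreteness, here is a minimal sketch of the Monte Carlo estimator for a single reconstruction-term integral E_{q(f)}[log p(y | f)], with q(f) = N(m, s²) and a Bernoulli-logit likelihood; the names and the likelihood are purely illustrative, not taken from any package mentioned here. A quadrature analogue is sketched further down the thread.

```julia
using Statistics

# log p(y | f) for a logistic link with y ∈ {-1, +1} (illustrative choice).
loglik(f, y) = -log1p(exp(-y * f))

# Monte Carlo estimate of E_{q(f)}[log p(y | f)] with q(f) = N(m, s²).
# Works in any dimension, at the cost of sampling noise.
function mc_expected_loglik(m, s, y; nsamples=1_000)
    fs = m .+ s .* randn(nsamples)   # draws from q(f)
    return mean(loglik.(fs, y))
end

mc_expected_loglik(0.3, 0.5, 1)
```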

@theogf
Member

theogf commented Jun 21, 2021

This is definitely something that we need, although possibly we don't want it to be Flux-specific? Does Flux have particular minibatching helpers or something?

Regarding this, Flux does have minibatch helpers via the DataLoader structure and its optimisers are quite practical. That said, this is a VERY heavy dependency, and minibatch helpers can probably be found somewhere else. For optimisers, there is ongoing work to take them out of Flux, but it is taking forever: https://github.com/FluxML/Optimisers.jl
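
As a rough sketch of what that could look like with Flux's DataLoader and ADAM (Flux ~0.12 API at the time of writing; the parameters and the batch loss below are placeholders, not the actual SVGP objective):

```julia
using Flux

# Toy data: 100 one-dimensional inputs stored column-wise, plus targets.
X = rand(1, 100)
y = rand(100)

# Stand-ins for the kernel / variational parameters and the batch objective.
θ = [0.5, 1.0]
ps = Flux.params(θ)
batch_loss(xb, yb) = sum(abs2, θ[1] .* vec(xb) .+ θ[2] .- yb)  # placeholder, not an ELBO

# DataLoader handles the minibatching; ADAM comes from Flux's optimisers.
loader = Flux.Data.DataLoader((X, y); batchsize=10, shuffle=true)
opt = ADAM(0.01)

for (xb, yb) in loader
    gs = Flux.gradient(() -> batch_loss(xb, yb), ps)
    Flux.Optimise.update!(opt, ps, gs)
end
```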

Possibly the main issue here will be ensuring that we have at least one kernel that plays nicely with this, so you might need to make a small PR to kernel functions or something to sort that out.

I tried some things already, and on the kernel functions side the only issue is kernels without the right constructors; see JuliaGaussianProcesses/KernelFunctions.jl#299

I suspect we're going to need to support both quadrature and Monte Carlo approaches here.

That's true, but it's probably wiser to just start with quadrature for now, and let the API be general enough such that adding MC integration would not be a burden

@willtebbutt
Member

That's true, but it's probably wiser to just start with quadrature for now, and let the API be general enough such that adding MC integration would not be a burden

Why do you think that this is the case? (Not arguing against it, just curious to understand your reasoning -- I would have imagined that Monte Carlo would be more straightforward to implement)

@theogf
Member

theogf commented Jun 21, 2021

I think it's more performance-related: quadrature works much better than sampling for those 1-D or 2-D integrals.
Also, using packages like FastGaussQuadrature makes it super easy.
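
To illustrate (the function name and integrand are just placeholders): gausshermite(n) gives nodes and weights for ∫ f(t) exp(-t²) dt, so a change of variables turns it into a Gaussian expectation.

```julia
using FastGaussQuadrature

# E_{N(m, s²)}[f(x)] via Gauss–Hermite: substitute x = m + √2·s·t so the
# Gaussian weight becomes exp(-t²), up to the 1/√π normalisation.
function gauss_hermite_expectation(f, m, s; n=20)
    ts, ws = gausshermite(n)
    return sum(w * f(m + sqrt(2) * s * t) for (t, w) in zip(ts, ws)) / sqrt(π)
end

# Example: E[log σ(f)] under N(0.3, 0.5²), the kind of 1-D integral in the ELBO.
gauss_hermite_expectation(f -> -log1p(exp(-f)), 0.3, 0.5)
```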

@willtebbutt
Member

willtebbutt commented Jun 21, 2021

I'd be interested to know how you've gone about making an AD-friendly quadrature algorithm. I wound up writing an rrule directly here (which I should update to use the new ChainRules-style stuff now that it can call back into AD) in my ConjugateComputationVI package.

@theogf
Member

theogf commented Jun 21, 2021

@willtebbutt
Member

I used Opper and Archambeau 2009

As in the O(2N) variational parameters parametrisation, or the tricks to compute the gradient w.r.t. the parameters by re-writing them as expectations of gradients / hessians?
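
For reference, the re-writing tricks in question are the Gaussian identities below (Bonnet's and Price's theorems), stated for a univariate q(f) = N(f; m, v) and a sufficiently smooth integrand φ:

```latex
% Bonnet's theorem: derivative of a Gaussian expectation w.r.t. the mean.
\frac{\partial}{\partial m}\,\mathbb{E}_{\mathcal{N}(f;\,m,\,v)}\bigl[\varphi(f)\bigr]
  = \mathbb{E}_{\mathcal{N}(f;\,m,\,v)}\bigl[\varphi'(f)\bigr]

% Price's theorem: derivative of a Gaussian expectation w.r.t. the variance.
\frac{\partial}{\partial v}\,\mathbb{E}_{\mathcal{N}(f;\,m,\,v)}\bigl[\varphi(f)\bigr]
  = \tfrac{1}{2}\,\mathbb{E}_{\mathcal{N}(f;\,m,\,v)}\bigl[\varphi''(f)\bigr]
```

Opper & Archambeau (2009) use these to express the gradients of the expected log-likelihood w.r.t. the variational mean and variance as expectations of the likelihood's first and second derivatives.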

@theogf
Member

theogf commented Jun 21, 2021

Well both :)

@willtebbutt
Member

Cool. Regular gradients or natural?

@willtebbutt
Member

Haha nice. So am I correct in the understanding that my CVI implementation should be basically equivalent to your natural gradient implementation here? (Since CVI is just natural gradients)

@theogf
Member

theogf commented Jun 21, 2021

Hmm I am not completely sure, there might be some marginal differences...
I derived the natural scheme some time ago and forgot what I did exactly...

@willtebbutt
Member

Hmmm I'd be interested to know. Probably we should chat about this at some point.

@rossviljoen
Collaborator Author

That's true, but it's probably wiser to just start with quadrature for now, and let the API be general enough such that adding MC integration would not be a burden

For this, I imagine I'd want to use GPLikelihoods.jl? (although it doesn't seem to be registered yet unless I'm missing something).

This is definitely something that we need, although possibly we don't want it to be Flux-specific? Does Flux have particular minibatching helpers or something?

Sure - I wasn't intending to have Flux as a dependency (beyond Functors.jl perhaps), just to make sure it could integrate reasonably easily.

I was thinking of defining something like the Flux layer @devmotion was talking about in JuliaGaussianProcesses/KernelFunctions.jl#299, which just exposes the parameters and a function to build the model (i.e. something like what's currently in the example), but I don't know if it would be better to use ParameterHandling instead?

Regarding this, Flux does have minibatch helpers via the DataLoader structure and its optimisers are quite practical.

It looks like https://github.com/JuliaML/MLDataPattern.jl has some minibatch helpers which could work instead.

@devmotion
Member

Regarding quadrature algorithms: I'd recommend https://github.com/SciML/Quadrature.jl, it provides a unified interface for many different quadrature packages and is fully differentiable.

Regarding the Flux layer: I think one should be able to just specify a function that creates a kernel and a vector of parameters, i.e., ParamsKernel(f, params). Then this could be used with ParameterHandling (just call ParamsKernel(reverse(ParameterHandling.flatten(kernel))...)) or Functors (call ParamsKernel(reverse(Functors.functor(kernel))...)). If necessary, one could provide convenience functions that allow users to skip the reverse.
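
A minimal sketch of that idea (ParamsKernel is hypothetical, as in the comment above; the builder function here is written by hand rather than obtained from ParameterHandling.flatten or Functors.functor):

```julia
using KernelFunctions

# Hypothetical container: a function that rebuilds the kernel from a flat
# parameter vector, plus the current value of that vector.
struct ParamsKernel{F,V}
    f::F          # params -> kernel
    params::V     # flat parameter vector
end

build(k::ParamsKernel) = k.f(k.params)

# Hand-written builder purely for illustration; in the proposal above, the
# (function, params) pair would instead come from ParameterHandling.flatten or
# Functors.functor (modulo a reverse of the returned tuple).
make_kernel(θ) = θ[1] * (SqExponentialKernel() ∘ ScaleTransform(θ[2]))

pk = ParamsKernel(make_kernel, [2.0, 0.5])
k = build(pk)     # kernel with variance 2.0 and inverse length-scale 0.5
k(0.0, 1.0)       # evaluate it like any other KernelFunctions kernel
```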

@willtebbutt
Member

Regarding quadrature algorithms: I'd recommend https://github.com/SciML/Quadrature.jl, it provides a unified interface for many different quadrature packages and is fully differentiable.

IIRC I tried Quadrature.jl and couldn't get it to work in the GP use-case. Firstly, I don't think that it supports Gauss-Hermite quadrature (which is really what you want to use). Unfortunately I can't remember what my other issue with it was.

@theogf
Member

theogf commented Jun 22, 2021

TBH Quadrature.jl is a bit of an overkill for GPs... All you really need is FastGaussQuadrature.jl

@devmotion
Member

The main disadvantage is that FastGaussQuadrature does not provide any error estimates and is not adaptive. But maybe this does not matter here (much)?

@willtebbutt
Member

But maybe this does not matter here (much)?

My experience (with simple likelihoods) has been that this is indeed the case. Not sure where this starts to be an issue though.

@theogf
Member

theogf commented Jun 22, 2021

Firstly, I don't think that it supports Gauss-Hermite quadrature (which is really what you want to use).

They also have QuadGK as a backend: https://github.com/JuliaMath/QuadGK.jl

@willtebbutt
Member

Isn't that just Gaussian quadrature, rather than Gauss-Hermite?

@devmotion
Member

Yes, it uses adaptive Gauss–Kronrod quadrature.

@st--
Member

st-- commented Jun 23, 2021

Side note/suggestion: if the discussion continues any further, it might be easier to follow if it were separated into individual issues (e.g. one for quadrature, one for minibatching) :)

@theogf
Member

theogf commented Jun 23, 2021

Praise be the mono-issue!

@rossviljoen
Collaborator Author

Good point! I've opened #3 and #4 so far.

@rossviljoen
Collaborator Author

Everything discussed here is either done or in separate issues (#15), so I think it's safe to close?
