
basic manifold diff sandbox test #1697

Merged: 14 commits merged into master on Mar 27, 2023
Conversation

@dehann (Member) commented Mar 15, 2023

Hi @mateuszbaran,

Hope things are well. I was looking at the new ManifoldDiff and Manopt stuff you put up recently. Thanks for putting that up!

I was wondering if this quick example PR (see files changed tab) was in-line with how you'd recommend we implement?

Best,
Dehann

PS, We'd like to build towards Riemannian optimization even though the current IncrementalInference implementation still has a few remaining AbstractGroupManifold requirements.

PPS, Looks like this basic example has a very heavy recompilation load. We'll be looking to get down to 0% compile time first (most important), then move towards 0 heap allocations from there.

       # and solve
       @time m1 = gradient_descent(M, f, grad_f_FD, data[1])
       
       @info "Basic Manopt test" string(m1')
       @test isapprox(p, m1; atol=0.15)
  0.483038 seconds (421.98 k allocations: 30.606 MiB, 3.25% gc time, 90.31% compilation time: 96% of which was recompilation)
┌ Info: Basic Manopt test
└   string(m1') = "[0.7285775291286883 0.07535342266464667 0.6808058796319686]"
Test Passed
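
For reference, the quoted timing comes from a setup along these lines (a minimal sketch following the standard Manopt.jl mean-on-the-sphere example; the exact definitions are in this PR's files-changed tab, so the names here are assumptions):

    using Manifolds, Manopt, ManifoldDiff, FiniteDifferences, Random, Test

    Random.seed!(42)
    M = Sphere(2)
    p = [1.0, 0.0, 1.0] / sqrt(2)
    data = [exp(M, p, 0.2 * rand(M; vector_at=p)) for _ in 1:100]

    # Riemannian center-of-mass cost
    f(M, q) = sum(distance(M, q, x)^2 for x in data) / (2 * length(data))

    # finite-difference Riemannian gradient via ManifoldDiff.jl
    r_backend = ManifoldDiff.TangentDiffBackend(ManifoldDiff.FiniteDifferencesBackend())
    grad_f_FD(M, q) = ManifoldDiff.gradient(M, x -> f(M, x), q, r_backend)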

codecov bot commented Mar 15, 2023

Codecov Report

Merging #1697 (5791e0c) into master (78d5f54) will decrease coverage by 22.16%.
The diff coverage is 0.00%.

@@             Coverage Diff             @@
##           master    #1697       +/-   ##
===========================================
- Coverage   76.48%   54.33%   -22.16%     
===========================================
  Files          74       73        -1     
  Lines        5601     5595        -6     
===========================================
- Hits         4284     3040     -1244     
- Misses       1317     2555     +1238     
Impacted Files Coverage Δ
src/IncrementalInference.jl 87.50% <ø> (ø)
src/ManifoldsExtentions.jl 49.09% <0.00%> (-32.31%) ⬇️

... and 45 files with indirect coverage changes


@mateuszbaran

Hi!

Thank you, things are going mostly OK. When I was working on finishing the ManifoldDiff.jl AD rules it occurred to me that my code is quite far from being polished and well tested. There are a lot of edge cases that are quite difficult to work out. I haven't published some parts yet because I have too few good applications that would show which parts are actually important to polish.

Here are some of the conclusions from the work I've done:

  1. We haven't put much effort into limiting recompilation in Manifolds.jl yet. I've checked ManifoldsBase.jl and it looks quite clean regarding invalidation/recompilation issues. Manifolds.jl/Manopt.jl recompilation is something I'd like to tackle at some point so I can help with that.
  2. As far as I'm aware, all AD libraries are fairly (re-)compilation-heavy. ForwardDiff.jl adds type-level tags everywhere, Zygote.jl does lots of runtime stuff, ReverseDiff.jl is often quite slow (runtime) in my experience, Enzyme's rule system is still somewhat experimental... There are surely things that can be improved but 0% compilation time with AD doesn't look particularly realistic to me, unless you want that only for specific functions, which would turn it into AOT gradient compilation.
  3. 0 heap allocations with AD is very hard on nontrivial code. In reverse mode IIRC only Enzyme can do that and in forward mode it requires fairly advanced tricks. Have you considered writing gradients, Jacobians and Hessians by hand instead? Note that this also opens up additional optimisation opportunities. For example, currently Julia's AD ecosystem doesn't do any inter-procedural (non-inlined) optimisations AFAICT (as opposed to libraries like PyTorch or TensorFlow). This is a complicated topic. I'm open to an online chat, which would make this discussion easier 🙂. (A minimal hand-written gradient sketch follows after this list.)
  4. I have recently spent a decent amount of time comparing Optim.jl's and Manopt.jl's Riemannian optimisation performance on some problems from multivariate data analysis (things like ICA or Varimax). For most (but not all) objective functions Optim.jl converged faster than Manopt.jl. I have a small piece of code that can turn Manifolds.jl projections into Optim.jl retract and project_tangent so Optim.jl can work with any of our manifolds. It's reasonably easy to switch between Optim.jl and Manopt.jl, but setting things up for switching isn't particularly obvious. I think the setup I have is better than using Optimization.jl (which doesn't even have an interface for manifold constraints yet).
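
To make item 3 concrete with the example quoted at the top of this PR (a sketch reusing M and data from there): the Riemannian gradient of distance(M, q, x)^2 / 2 at q is -log(M, q, x), so the whole gradient of that mean-on-the-sphere cost can be written by hand as

    # hand-written Riemannian gradient of
    # f(M, q) = sum(distance(M, q, x)^2 for x in data) / (2 * length(data))
    grad_f(M, q) = -sum(log(M, q, x) for x in data) / length(data)
    # and passed straight to gradient_descent(M, f, grad_f, data[1]), no AD involved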

I hope this helps. Feel free to ask further questions; there is more that I can share, but I don't know what would be interesting for you 🙂.

@dehann (Member, Author) commented Mar 16, 2023

Oh great, thanks for the information. Johan and I spoke on a call earlier to work out the best steps to follow in the short term. We would also like to take you up on the offer to call and chat (thanks!), but we want to prepare and do some legwork first so we don't waste your time. We'll work through some examples and reach out to you when ready.

Two immediate steps for us (scope of 2-3 weeks of work) are: first, to get the IncrementalInference.jl computations onto StaticArrays through all the hot loops, or as far as possible; second, a stop-gap solution for gradients, which will be limited to a few GroupManifolds only. We would also need to do (what we call) "partial constraints" -- e.g. a camera sees a feature as an i,j pixel coordinate, hence a dim-2 measurement inside a dim-6 SpecialEuclidean(3) mechanization (rough sketch below).
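
To make "partial constraint" concrete, here is a rough hypothetical sketch (the names and the camera model are illustrative only, not IncrementalInference.jl API): a pinhole residual that constrains a 6-dof pose through just a 2-dimensional pixel measurement, written on StaticArrays so the hot loop stays free of heap allocations.

    using StaticArrays, LinearAlgebra

    # hypothetical "partial constraint": the pose (R, t) has 6 degrees of freedom,
    # but the measurement only pins down the 2 pixel coordinates of a known landmark
    function pixel_residual(R::SMatrix{3,3}, t::SVector{3}, landmark::SVector{3},
                            meas_ij::SVector{2}; focal=500.0, center=SVector(320.0, 240.0))
        l_cam = R' * (landmark - t)                                        # landmark in the camera frame
        pred = focal .* SVector(l_cam[1], l_cam[2]) ./ l_cam[3] .+ center  # pinhole projection
        return pred - meas_ij                                              # dim-2 residual, stack-allocated
    end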

I have a small code that can turn Manifolds.jl projections into Optim.jl retract and project_tangent so Optim.jl can work with any of our manifolds. It's reasonably easy to switch between Optim.jl and Manopt.jl but setting things up for switching isn't particularly obvious.

Would you perhaps be open to sharing that code snippet? Ultimately we would like to use Manopt with full support for Riemannian operations. We currently use Optim and are considering whether it's worth it to use Optim.manifold as a stop gap while we transition.

There are surely things that can be improved but 0% compilation time with AD doesn't look particularly realistic to me, unless you want that only for specific functions, which would turn it into AOT gradient compilation.

Our current best path would be to only do AD on a few specific functions, which get compiled once or AOT. When we build large cost functions, we'd assemble the cost via a for loop. The non-parametric case would leverage the same few AD functions (using a different kind of for-loop assembly).
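
As a hypothetical illustration of "AD on a few specific functions only": pre-building the ForwardDiff.jl machinery per residual so the chunk/tag setup and compilation happen once, outside the assembly loop.

    using ForwardDiff, DiffResults

    # one specific cost function we are willing to pay AD compilation for, once
    cost(x) = x[1]^2 + sin(x[2]) * x[1]

    x0 = zeros(2)
    cfg = ForwardDiff.GradientConfig(cost, x0)   # chunk/tag setup done once up front
    res = DiffResults.GradientResult(x0)

    # reuse the same config (and already-compiled gradient code) in the hot loop
    for x in ([0.5, 1.0], [1.0, 2.0])
        ForwardDiff.gradient!(res, cost, x, cfg)
        @show DiffResults.value(res) DiffResults.gradient(res)
    end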

I haven't published some parts yet because I have too few good applications that would show which parts are actually important to polish.

Perhaps this is an area where we can help. At this stage we are likely using only a few manifolds relating to 2D and 3D mechanics with a few sensor or derivative variations.

0 heap allocations with AD is very hard on nontrivial code. In reverse mode IIRC only Enzyme can do that and in forward mode it requires fairly advanced tricks. Have you considered writing gradients, Jacobians and Hessians by hand instead?

I'd rather write a few specializations needed by Manifolds to avoid allocations or type-stability issues. One of the ideas I'm trying to avoid is building LazyLieGroup types and then writing ChainRules.jl rules for these new "driven" types. I'd rather stay closer to Manifolds.jl native and just overload the three or four functions that really make AD or Manopt run fast -- from there we can upstream the specialized functions to JuliaManifolds (if they help / fit in).

@mateuszbaran

Would you perhaps be open to sharing that code snippet? Ultimately we would like to use Manopt with full support for Riemannian operations. We currently use Optim and are considering whether it's worth it to use Optim.manifold as a stop gap while we transition.

Sure! I've made a gist here: https://gist.github.com/mateuszbaran/0354c0edfb9cdf25e084a2b915816a09. It would be good to put it in some package, but I'm not sure which one. I think Optim.Manifold with my glue code is a perfectly fine way of doing Riemannian optimisation. Of course, it has a very limited set of ways of handling Riemannian constraints compared to Manopt.jl, but for some problems they are sufficient. It's definitely worth a try if you already use Optim.
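
The details are in the gist; the general shape of the glue (a hedged sketch here, not necessarily line-for-line what the gist does) is a thin wrapper type that forwards Optim.jl's manifold callbacks to ManifoldsBase.jl projections:

    using Optim, Manifolds, ManifoldsBase

    # sketch only -- see the gist for the actual code
    struct ManifoldWrapper{TM<:AbstractManifold} <: Optim.Manifold
        M::TM
    end
    function Optim.retract!(W::ManifoldWrapper, x)
        copyto!(x, project(W.M, x))      # map the iterate back onto the manifold
        return x
    end
    function Optim.project_tangent!(W::ManifoldWrapper, g, x)
        copyto!(g, project(W.M, x, g))   # project the gradient onto the tangent space at x
        return g
    end

    # usage: any first-order Optim.jl method that takes a `manifold` keyword
    f(x) = -x[3]
    g!(G, x) = copyto!(G, [0.0, 0.0, -1.0])
    optimize(f, g!, [1.0, 0.0, 0.0], ConjugateGradient(manifold=ManifoldWrapper(Manifolds.Sphere(2))))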

Our current best path would be to only do AD on a few specific functions, which get compiled once or AOT. When we build large cost functions, we'd assemble the cost via a for loop. The non-parametric case would leverage the same few AD functions (using a different kind of for-loop assembly).

I see. If you can write down which functions you want to do AD on, I may have some more specific thoughts on that.

Perhaps this is an area where we can help. At this stage we are likely using only a few manifolds relating to 2D and 3D mechanics with a few sensor or derivative variations.

My current examples are also limited to a few manifolds (hyperbolic space, sphere, tangent bundle), so any new example could help 🙂. I was exploring Stiefel and Grassmann constraints, but the issue there is that it is too easy to derive gradients using https://www.matrixcalculus.org/ and hand-optimise the code beyond what AD in Julia could do (a quick sketch below). Or maybe I'm just looking at this from the wrong side, and using https://www.matrixcalculus.org/ is actually too complex for some people interested in this kind of thing? Designing good interfaces is hard and usually underappreciated.
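
As a sketch of that "derive by hand" route for Stiefel (assuming the default embedded metric, where the Riemannian gradient is the tangent-space projection of the Euclidean one):

    using Manifolds, LinearAlgebra

    # for f(X) = tr(X'AX) the Euclidean gradient is (A + A')X
    # (exactly the kind of result matrixcalculus.org hands you)
    A = randn(5, 5)
    M = Stiefel(5, 2)
    f(M, X) = tr(X' * A * X)
    grad_f(M, X) = project(M, X, (A + A') * X)   # Riemannian gradient by projection
    # ready for e.g. Manopt.gradient_descent(M, f, grad_f, rand(M))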

Anyway, back to the main topic, it would be great to incorporate your needs into this work and have another point of view.

I'd rather write a few specializations needed by Manifolds to avoid allocations or type-stability issues. One of the ideas I'm trying to avoid is building LazyLieGroup types and then writing ChainRules.jl rules for these new "driven" types. I'd rather stay closer to Manifolds.jl native and just overload the three or four functions that really make AD or Manopt run fast -- from there we can upstream the specialized functions to JuliaManifolds (if they help / fit in).

That sounds like a good idea, I can help with that when I know which functions are needed.

@dehann dehann merged commit c855c31 into master Mar 27, 2023
@Affie Affie modified the milestones: v0.33.1, v0.34.0 Apr 28, 2023