Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test with Tapir #2289

Merged
merged 22 commits into from
Sep 3, 2024
Merged

Test with Tapir #2289

merged 22 commits into from
Sep 3, 2024

Conversation

mhauru
Copy link
Member

@mhauru mhauru commented Jul 17, 2024

Closes #2247

Copy link

codecov bot commented Jul 17, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 86.86%. Comparing base (a26ce11) to head (8325529).
Report is 1 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2289      +/-   ##
==========================================
+ Coverage   86.79%   86.86%   +0.07%     
==========================================
  Files          24       24              
  Lines        1598     1599       +1     
==========================================
+ Hits         1387     1389       +2     
+ Misses        211      210       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@mhauru
Copy link
Member Author

mhauru commented Jul 17, 2024

@willtebbutt, could you please take a look at the logs of the failing CI runs and comment on whether these are likely to be Tapir issues, or something we need to address? In one case zero_tangent goes into infinite recursion, in another we expect LogDensityProblemsAD.ADGradient to return a LogDensityProblemsAD.ADGradientWrapper but we seem to get a Tapir.CoDual instead.

@coveralls
Copy link

coveralls commented Jul 17, 2024

Pull Request Test Coverage Report for Build 10679664210

Details

  • 5 of 5 (100.0%) changed or added relevant lines in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.07%) to 87.085%

Totals Coverage Status
Change from base Build 10634847007: 0.07%
Covered Lines: 1389
Relevant Lines: 1595

💛 - Coveralls

mhauru and others added 2 commits July 18, 2024 09:28
@yebai
Copy link
Member

yebai commented Jul 23, 2024

In one case zero_tangent goes into infinite recursion,

@willtebbutt can you help take a look at the zero_tangent recursion issue and fix that?

@willtebbutt
Copy link
Member

willtebbutt commented Jul 23, 2024

Assuming that we're not considering the \ell field part of the public interface, this looks like an issue in Turing.jl -- it assumes here that the \ell field of the ADGradientWrapper is what is needed, rather than calling Base.parent.

(this is in addition to the recursion issue -- taking a look at that now)

@yebai
Copy link
Member

yebai commented Jul 23, 2024

Assuming that we're not considering the \ell field part of the public interface, this looks like an issue in Turing.jl -- it assumes here

@sunxd3 can this be replaced with the new setmodel interface introduced in TuringLang/DynamicPPL.jl#626?

@torfjelde
Copy link
Member

Aye the setmodel methods should be useful there:)

@sunxd3
Copy link
Member

sunxd3 commented Jul 31, 2024

Sorry for letting this go under my radar.

From the look of it, I don't think setmodel is directly useful, but we can modify setvarinfo to mirror the implementation of setmodel.

(model and varinfo are two fields of LogDensityFunction, so it's appropriate to have a setvarinfo that matches it)

Happy to make this modification, should I directly commit to this PR @yebai? It should end up in DynamicPPL, but may not be a good idea to experiment here.

@sunxd3
Copy link
Member

sunxd3 commented Jul 31, 2024

@willtebbutt @mhauru I think the setvarinfo related errors are gone (at least from the look of it)

@mhauru
Copy link
Member Author

mhauru commented Aug 7, 2024

Thanks @sunxd3.

In addition to the infinite recursion one, there's also this:

MethodError: no method matching translate(::Val{Core.Intrinsics.atomic_pointerset})
  
  Closest candidates are:
    translate(::Val{Core.Intrinsics.copysign_float})
     @ Tapir ~/.julia/packages/Tapir/vvmqf/src/rrules/builtins.jl:55
    translate(::Val{Core.Intrinsics.slt_int})
     @ Tapir ~/.julia/packages/Tapir/vvmqf/src/rrules/builtins.jl:64
    translate(::Val{Core.Intrinsics.sdiv_int})
     @ Tapir ~/.julia/packages/Tapir/vvmqf/src/rrules/builtins.jl:64
    ...
  
  Stacktrace:
    [1] lift_intrinsic(::Core.IntrinsicFunction, ::Core.SSAValue, ::GlobalRef, ::Vararg{Any})
      @ Tapir ~/.julia/packages/Tapir/vvmqf/src/interpreter/ir_normalisation.jl:147
    [2] intrinsic_to_function
      @ ~/.julia/packages/Tapir/vvmqf/src/interpreter/ir_normalisation.jl:136 [inlined]
    [3] normalise!(ir::Core.Compiler.IRCode, spnames::Vector{Symbol})
      @ Tapir ~/.julia/packages/Tapir/vvmqf/src/interpreter/ir_normalisation.jl:27
    [4] build_rrule(interp::Tapir.TapirInterpreter{Tapir.DefaultCtx}, sig_or_mi::Core.MethodInstance; safety_on::Bool, silence_safety_messages::Bool)
      @ Tapir ~/.julia/packages/Tapir/vvmqf/src/interpreter/s2s_reverse_mode_ad.jl:769
    [5] build_rrule
      @ ~/.julia/packages/Tapir/vvmqf/src/interpreter/s2s_reverse_mode_ad.jl:747 [inlined]
    [6] (::Tapir.LazyDerivedRule{Tapir.TapirInterpreter{Tapir.DefaultCtx}, Tapir.DerivedRule{MistyClosures.MistyClosure{Core.OpaqueClosure{Tuple{Tapir.CoDual{typeof(Base._unsafe_copyto!), Tapir.NoFData}, Tapir.CoDual{Matrix{Real}, Matrix{Any}}, Tapir.CoDual{Int64, Tapir.NoFData}, Tapir.CoDual{Matrix{Float64}, Matrix{Float64}}, Tapir.CoDual{Int64, Tapir.NoFData}, Tapir.CoDual{Int64, Tapir.NoFData}}, Tapir.CoDual{Matrix{Real}, Matrix{Any}}}}, MistyClosures.MistyClosure{Core.OpaqueClosure{Tuple{Tapir.NoRData}, NTuple{6, Tapir.NoRData}}}, Val{false}, Val{6}}})(::Tapir.CoDual{typeof(Base._unsafe_copyto!), Tapir.NoFData}, ::Tapir.CoDual{Matrix{Real}, Matrix{Any}}, ::Tapir.CoDual{Int64, Tapir.NoFData}, ::Tapir.CoDual{Matrix{Float64}, Matrix{Float64}}, ::Tapir.CoDual{Int64, Tapir.NoFData}, ::Tapir.CoDual{Int64, Tapir.NoFData})
      @ Tapir ~/.julia/packages/Tapir/vvmqf/src/interpreter/s2s_reverse_mode_ad.jl:1252
    [7] RRuleZeroWrapper
      @ ~/.julia/packages/Tapir/vvmqf/src/interpreter/s2s_reverse_mode_ad.jl:244 [inlined]

which you can see in the abstractmcmc.jl CI run. I'm guessing this is a Tapir bug, but @willtebbutt if you disagree let me know.

@willtebbutt
Copy link
Member

Yes, this is definitely a Tapir.jl limitation -- we're missing a rule for Core.Intrinsics.atomic_pointerset. I need to open an issue about this, and improve some of the error messages in Tapir.jl to make it clear what's going on here.

@willtebbutt
Copy link
Member

Getting there! Many more of the tests now appear to pass. Will be debugging the remaining ones tomorrow.

@willtebbutt
Copy link
Member

willtebbutt commented Aug 30, 2024

Having spent some time digging in to the remaining time-out issue, it looks to be the same problem as we've seen before with there being a discrepancy between the type inference results obtained by NativeInterpreter and custom abstract interpreters. I'm going to go and open some proper issues about this.

The other issues look quite tractable, and I'm hoping they're fairly straightforward to fix.

@yebai
Copy link
Member

yebai commented Aug 30, 2024

Having spent some time digging in to the remaining time-out issue, it looks to be the same problem as we've seen before with there being a discrepancy between the type inference results obtained by NativeInterpreter and custom abstract interpreters.

Why a type inference discrepancy would cause such a significant slowdown? Also, feel free to disable respective tests in this PR so we can merge it—this is not really a Tapir issue.

@willtebbutt
Copy link
Member

Why a type inference discrepancy would cause such a significant slowdown?

It causes a lot of downstream type instabilities in low-level functions, which causes a 100x-1000x slowdown (very approximately).

Also, feel free to disable respective tests in this PR so we can merge it—this is not really a Tapir issue.

Good idea. I'll open proper issues in Tapir / Cthulhu and, once I've done this, disable some of these tests.

Co-authored-by: Will Tebbutt <wct23@cam.ac.uk>
@yebai
Copy link
Member

yebai commented Sep 1, 2024

@mhauru, there seem to be some merge conflicts which prevent CI from running. Can you help resolve that?

The other issues look quite tractable, and I'm hoping they're fairly straightforward to fix.

@willtebbutt I am slightly surprised about the amount of issues arising here. How much more time do you think it will take?

@mhauru
Copy link
Member Author

mhauru commented Sep 2, 2024

Merge conflicts sorted.

test/test_utils/ad_utils.jl Outdated Show resolved Hide resolved
@yebai
Copy link
Member

yebai commented Sep 2, 2024

#2313 (hopefully) fixed Tapir AD backend tests introduced by #2291

@willtebbutt
Copy link
Member

Sadly it looks like the most recent PR has produced something which Tapir.jl doesn't like. Doesn't look like it should be too nasty.

@willtebbutt
Copy link
Member

Update: the problems look to be straightforward to resolve. Version 0.2.44 of Tapir will work, and should be available in the next hour or two.

@willtebbutt
Copy link
Member

For reasons unclear to me, 0.2.44 isn't getting installed in CI yet. Probably it'll be resolved by the morning, so I'll take a look then .

@yebai yebai marked this pull request as ready for review September 2, 2024 22:53
Copy link
Member Author

@mhauru mhauru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had one question about a change I didn't fully understand. Otherwise, I'm happy once tests pass and @willtebbutt is happy.

src/mcmc/abstractmcmc.jl Show resolved Hide resolved
@willtebbutt
Copy link
Member

I'm happy.

@yebai yebai merged commit f92c93f into master Sep 3, 2024
60 checks passed
@yebai yebai deleted the mhauru/tapir-tests branch September 3, 2024 11:36
@yebai
Copy link
Member

yebai commented Sep 3, 2024

Thanks @willtebbutt, @mhauru and @sunxd3. Perfect to see this merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add Tapir to Turing's AD test suite
7 participants