Turing MethodError with reversediff but not with forwarddiff #1673

Closed · fkrauer opened this issue Aug 2, 2021 · 6 comments

fkrauer commented Aug 2, 2021

Hi everyone

I have a small ODE model, which runs fine (albeit slowly) with Turing's forwarddiff AD. However, I cannot get it to run with reversediff (nor with Zygote). It throws a MethodError (see below). Is this a type problem, maybe related to the parameters being an array? What would I need to change? TIA

PS: I also posted this in the Julia Forum.

Error:

ERROR: MethodError: convert(::Type{ForwardDiff.Dual{ForwardDiff.Tag{SciMLBase.TimeGradientWrapper{ODEFunction{true, typeof(SEIRS2!), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Vector{ReverseDiff.TrackedReal{Float64, Float64, Nothing}}, LArray{ReverseDiff.TrackedReal{Float64, Float64, Nothing}, 1, Vector{ReverseDiff.TrackedReal{Float64, Float64, Nothing}}, (:β, :η, :ω, :φ, :σ, :μ, :δ2, :γ1, :g2)}}, Float64}, ReverseDiff.TrackedReal{Float64, Float64, Nothing}, 1}}, ::ReverseDiff.TrackedReal{Float64, Float64, Nothing}) is ambiguous. Candidates:
  convert(::Type{R}, t::T) where {R<:Real, T<:ReverseDiff.TrackedReal} in ReverseDiff at C:\Users\micky\.julia\packages\ReverseDiff\E4Tzn\src\tracked.jl:260
  convert(::Type{ForwardDiff.Dual{T, V, N}}, x::Number) where {T, V, N} in ForwardDiff at C:\Users\micky\.julia\packages\ForwardDiff\5gUap\src\dual.jl:380
  convert(::Type{T}, x::Number) where T<:Number in Base at number.jl:7
  convert(::Type{ForwardDiff.Dual{T, V, N}}, x) where {T, V, N} in ForwardDiff at C:\Users\micky\.julia\packages\ForwardDiff\5gUap\src\dual.jl:379
Possible fix, define
  convert(::Type{ForwardDiff.Dual{T, V, N}}, ::T) where {T, V, N, T<:ReverseDiff.TrackedReal}
Stacktrace:

setindex!(A::Vector{ForwardDiff.Dual{ForwardDiff.Tag{SciMLBase.TimeGradientWrapper{ODEFunction{true, typeof(SEIRS2!), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Vector{ReverseDiff.TrackedReal{Float64, Float64, Nothing}}, LArray{ReverseDiff.TrackedReal{Float64, Float64, Nothing}, 1, Vector{ReverseDiff.TrackedReal{Float64, Float64, Nothing}}, (:β, :η, :ω, :φ, :σ, :μ, :δ2, :γ1, :g2)}}, Float64}, ReverseDiff.TrackedReal{Float64, Float64, Nothing}, 1}}, x::ReverseDiff.TrackedReal{Float64, Float64, Nothing}, i1::Int64) at .\array.jl

SEIRS2!(du::Vector{ForwardDiff.Dual{ForwardDiff.Tag{SciMLBase.TimeGradientWrapper{ODEFunction{true, typeof(SEIRS2!), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Vector{ReverseDiff.TrackedReal{Float64, Float64, Nothing}}, LArray{ReverseDiff.TrackedReal{Float64, Float64, Nothing}, 1, Vector{ReverseDiff.TrackedReal{Float64, Float64, Nothing}}, (:β, :η, :ω, :φ, :σ, :μ, :δ2, :γ1, :g2)}}, Float64}, ReverseDiff.TrackedReal{Float64, Float64, Nothing}, 1}}, u::Vector{ReverseDiff.TrackedReal{Float64, Float64, Nothing}}, p::LArray{ReverseDiff.TrackedReal{Float64, Float64, Nothing}, 1, Vector{ReverseDiff.TrackedReal{Float64, Float64, Nothing}}, (:β, :η, :ω, :φ, :σ, :μ, :δ2, :γ1, :g2)}, t::ForwardDiff.Dual{ForwardDiff.Tag{SciMLBase.TimeGradientWrapper{ODEFunction{true, typeof(SEIRS2!), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Vector{ReverseDiff.TrackedReal{Float64, Float64, Nothing}}, LArray{ReverseDiff.TrackedReal{Float64, Float64, Nothing}, 1, Vector{ReverseDiff.TrackedReal{Float64, Float64, Nothing}}, (:β, :η, :ω, :φ, :σ, :μ, :δ2, :γ1, :g2)}}, Float64}, Float64, 1}) at h:\My Documents\SEIRS2_NUTS_toy.jl

(::ODEFunction{true, typeof(SEIRS2!), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing})(::Vector{ForwardDiff.Dual{ForwardDiff.Tag{SciMLBase.TimeGradientWrapper{ODEFunction{true, typeof(SEIRS2!), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Vector{ReverseDiff.TrackedReal{Float64, Float64, Nothing}}, LArray{ReverseDiff.TrackedReal{Float64, Float64, Nothing}, 1, Vector{ReverseDiff.TrackedReal{Float64, Float64, Nothing}}, (:β, :η, :ω, :φ, :σ, :μ, :δ2, :γ1, :g2)}}, Float64}, ReverseDiff.TrackedReal{Float64, Float64, Nothing}, 1}}, ::Vararg{Any, N} where N) at C:\Users\micky.julia\packages\SciMLBase\cU5k7\src\scimlfunctions.jl

(::SciMLBase.TimeGradientWrapper{ODEFunction{true, typeof(SEIRS2!), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Vector{ReverseDiff.TrackedReal{Float64, Float64, Nothing}}, LArray{ReverseDiff.TrackedReal{Float64, Float64, Nothing}, 1, Vector{ReverseDiff.TrackedReal{Float64, Float64, Nothing}}, (:β, :η, :ω, :φ, :σ, :μ, :δ2, :γ1, :g2)}})(du2::Vector{ForwardDiff.Dual{ForwardDiff.Tag{SciMLBase.TimeGradientWrapper{ODEFunction{true, typeof(SEIRS2!), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Vector{ReverseDiff.TrackedReal{Float64, Float64, Nothing}}, LArray{ReverseDiff.TrackedReal{Float64, Float64, Nothing}, 1, Vector{ReverseDiff.TrackedReal{Float64, Float64, Nothing}}, (:β, :η, :ω, :φ, :σ, :μ, :δ2, :γ1, :g2)}}, Float64}, ReverseDiff.TrackedReal{Float64, Float64, Nothing}, 1}}, t::ForwardDiff.Dual{ForwardDiff.Tag{SciMLBase.TimeGradientWrapper{ODEFunction{true, typeof(SEIRS2!), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Vector{ReverseDiff.TrackedReal{Float64, Float64, Nothing}}, LArray{ReverseDiff.TrackedReal{Float64, Float64, Nothing}, 1, Vector{ReverseDiff.TrackedReal{Float64, Float64, Nothing}}, (:β, :η, :ω, :φ, :σ, :μ, :δ2, :γ1, :g2)}}, Float64}, Float64, 1}) at C:\Users\micky.julia\packages\SciMLBase\cU5k7\src\function_wrappers.jl

[TRUNCATED]

Code:

# Household stuff
cd(@__DIR__)
using Pkg 
Pkg.activate(".") 

using DifferentialEquations
using StatsPlots
using StatsBase
using Turing
using Distributions
using Random
using LabelledArrays
using Serialization
using LazyArrays
using ReverseDiff
using DiffEqSensitivity
using Zygote
using Memoization

Random.seed!(422)

# ODE model
function SEIRS2!(du,u,p,t)

    # states
    (S1, E1, I1, R1, S2, E2, I2, R2) = u[1:8]
    N1 = S1 + E1 + I1 + R1
    N2 = S2 + E2 + I2 + R2
    N = N1 + N2
  
    # params
    β = p.β
    η = p.η
    φ = p.φ
    ω = 1.0/p.ω
    μ = p.μ
    σ = p.σ
    γ1 = p.γ1
    γ2 = γ1 / p.g2
    δ2 = p.δ2
      
    # FOI
    βeff = β * (1.0+η*cos(2.0*π*(t-φ)/365.0))
    λ1 = βeff*(I1/N1 + I2/N2)
    λ2 = λ1 * δ2

    # change in states
    du[1] = μ*N - λ1*S1 - μ*S1
    du[2] = λ1*S1 - σ*E1 - μ*E1
    du[3] = σ*E1 - γ1*I1 - μ*I1
    du[4] = γ1*I1 - ω*R1 - μ*R1

    du[5] = ω*(R1 + R2) - λ2*S2 - μ*S2
    du[6] = λ2*S2 - σ*E2 - μ*E2
    du[7] = σ*E2 - γ2*I2 - μ*I2
    du[8] = γ2*I2 - ω*R2 - μ*R2

    du[9] = (σ*(E1 + E2)) # cumulative incidence

end

# observation model
function NegativeBinomial2(ψ, incidence)
    p = 1.0/(1.0 + ψ*incidence)
    r = 1.0/ψ
    return NegativeBinomial(r, p)
end
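
# Sanity check for this parameterization: mean(NegativeBinomial(r, p)) = r*(1-p)/p,
# which simplifies to `incidence` here, so ψ acts purely as an overdispersion parameter,
# e.g. mean(NegativeBinomial2(0.15, 10.0)) ≈ 10.0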

# Solver settings
tmin = 0.0
tmax = 20.0*365.0
tspan = (tmin, tmax)
solvsettings = (abstol = 1.0e-3, 
                reltol = 1.0e-3, 
                saveat = 7.0,
                solver = AutoTsit5(Rosenbrock23()))

# Initiate ODE problem
theta_fix =  [1.0/4.98, 1.0/(80*365), 0.89, 1/6.16, 0.87]
theta_est =  [0.15, 0.001, 0.28, 0.07, 365.0, 180.0]
parnames = (:ψ, :ρ, :β, :η, :ω, :φ, :σ, :μ, :δ2, :γ1, :g2)
p = @LArray [theta_est; theta_fix] parnames
u0 = [200_000.0,1000.0,1000.0,300_000.0, 500_000.0, 1000.0,1000.0, 296_000, 2000.0]

# Initiate ODE problem
problem = ODEProblem(SEIRS2!,u0,tspan,p)

sol = solve(problem, 
            solvsettings.solver, 
            abstol=solvsettings.abstol, 
            reltol=solvsettings.reltol, 
            isoutofdomain=(u,p,t)->any(x->x<0.0,u),
            save_idxs=9, 
            saveat=solvsettings.saveat)

# Fake some data from model
foo = (sol[2:end] - sol[1:(end-1)]) .* p.ρ
data = rand.(NegativeBinomial2.(p.ψ, foo))
plot(foo, legend = false); scatter!(data,legend = false)

# Fit model to fake data

# Set up as Turing model
#Turing.setadbackend(:forwarddiff)
Turing.setadbackend(:reversediff)
Turing.setrdcache(true)

@model function turingmodel(data, theta_fix, u0, problem, parnames, solvsettings) 
    
    # Priors
    ψ ~ Beta(1,5)
    ρ ~ Uniform(0.0,1.0) 
    β ~ Uniform(0.0,1.0) 
    η ~ Uniform(0.0,1.0) 
    ω ~ Uniform(1.0, 3.0*365.0) 
    φ ~ Uniform(0.0,364.0)

    theta_est = [β,η,ω,φ]
    p_new = @LArray vcat(theta_est, theta_fix) parnames[3:end]

    # Update problem and solve ODEs
    problem_new = remake(problem, p=p_new, u0=eltype(p_new).(u0)) 

    sol_new = concrete_solve(problem_new, 
                    solvsettings.solver, 
                    abstol=solvsettings.abstol, 
                    reltol=solvsettings.reltol, 
                    isoutofdomain=(u,p,t)->any(x->x<0.0,u), 
                    save_idxs=9,
                    saveat=solvsettings.saveat)

    incidence = sol_new[2:end] - sol_new[1:(end-1)]
    incidence = max.(0.0, incidence) # avoid numerical instability issue
    increp = incidence .* ρ
    if size(increp,1)==size(data,1) 
        data ~ arraydist(LazyArray(@~ @. NegativeBinomial2(ψ, increp)))
    else 
        Turing.@addlogprob! -Inf
        return
    end 

end

model = turingmodel(data, theta_fix, u0, problem, parnames, solvsettings)

@time trace = mapreduce(c -> sample(model, NUTS(5000, 0.65), 1000, progress=true), chainscat, 1:3)   

torfjelde (Member) commented

I haven't read through the thread properly, but this might be related to SciML/DifferentialEquations.jl#610? I.e. try not using concrete_solve?

And other than that, when using reverse-mode AD, e.g. ReverseDiff, you should probably tell DifferentialEquations.jl to use the same for computing the sensitivity ("gradient through the ODE"): https://diffeq.sciml.ai/stable/analysis/sensitivity/#High-Level-Interface:-sensealg. Though I would think solve would figure that out automatically, so it might just be the usage of concrete_solve that causes the issue 😕
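
For reference, here's a rough sketch of what passing a matching sensealg could look like (illustrative settings only; the InterpolatingAdjoint/ReverseDiffVJP combination is the one that shows up later in this thread):

using DiffEqSensitivity  # provides InterpolatingAdjoint, ReverseDiffVJP, ZygoteVJP, ...

# When the Turing AD backend is ReverseDiff, also ask the ODE solve to use a
# reverse-mode adjoint for its sensitivities (kwargs/values here are illustrative):
sol = solve(problem, Tsit5();
            abstol = 1.0e-3, reltol = 1.0e-3, saveat = 7.0,
            sensealg = InterpolatingAdjoint(autojacvec = ReverseDiffVJP(true)))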

fkrauer (Author) commented Aug 2, 2021

Tusen takk @torfjelde, I have changed concrete_solve to solve, but that didn't solve the issue. I think the mutation of objects within the Turing model is not allowed for backwards AD. So I changed a few things and removed the if statements, and now it seems to work (see below). However, it is extremely slow, as is forwarddiff AD. Running 100 iters (which is far from convergence) can take 30 minutes or so. This strikes me as a bit odd given that Julia is so fast.

Has anyone ever successfully fitted an ODE model larger than the Lotka-Volterra (say 10 compartments) with Turing? I'd be very interested to see this code and learn from it.

...
theta = [ψ, ρ, β,η,ω,φ]
...
incidence2 = max.(0.0, incidence)
increp = incidence2 .* theta[2]
data ~ arraydist(NegativeBinomial2.(theta[1], increp))

fkrauer closed this as completed Aug 2, 2021
torfjelde (Member) commented

Tusen takk

Haha, that caught me off guard! Null problem:)

I think the mutation of objects within the Turing model is not allowed for backwards AD.

This is indeed the case for Zygote.jl, but ReverseDiff.jl should be able to handle it. But even if there is mutation within your ODE, I'm pretty certain that the adjoints/gradients defined for solve will handle this, e.g. https://turing.ml/dev/tutorials/10-bayesian-differential-equations/ involves mutation.

I recently helped someone with speeding up DiffEq + Turing, so I'll have a crack at this and get back to you. I'm curious myself:)

torfjelde (Member) commented

Okay, so after trying different optimizations I think I realized why it's taking you up to 30 mins to only get 100 samples: you're actually running 5000 + 100 iterations, haha 😅 It took me way longer than I'd like to admit to realize this. NUTS(5000, ...) means that we'll use 5000 iterations for adaptation/burnin and then draw the 100 samples on top of that. I was way too focused on the model itself and didn't think about this.
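
In other words (illustrative numbers): the first positional argument of NUTS is the number of adaptation steps, and those come on top of however many samples you ask sample for:

# 500 adaptation/burn-in iterations + 1000 retained samples = 1500 iterations in total
chain = sample(model, NUTS(500, 0.65), 1000)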

Anyways, the following uses the model from the "Using Tsit5()" section below with the out-of-domain handling added back in, as it seems to be the version that provides the most perf improvement without too much hassle, e.g. we don't have to reach for static arrays (though in fairness the static arrays did seem to help a bit).

As you can see, 1000 samples + 500 adaptation = 1500 iterations in total takes about 100s, which ain't too shabby if I might say so myself. Notice that the ESS and rhat values are looking pretty decent even with just 1500 iterations.

@model function turingmodel(data, theta_fix, u0, problem, solvsettings)

    # Priors
    ψ ~ Beta(1,5)
    ρ ~ Uniform(0.0,1.0) 
    β ~ Uniform(0.0,1.0) 
    η ~ Uniform(0.0,1.0) 
    ω ~ Uniform(1.0, 3.0*365.0) 
    φ ~ Uniform(0.0,364.0)

    theta_est = [β,η,ω,φ]
    p_new = @LArray vcat(theta_est, theta_fix) (:β, :η, :ω, :φ, :σ, :μ, :δ2, :γ1, :g2)

    # Update problem and solve ODEs
    problem_new = remake(problem, p=p_new, u0=eltype(p_new).(u0)) 

    sol_new = Array(solve(
        problem_new,
        solvsettings.solver,
        abstol=solvsettings.abstol,
        reltol=solvsettings.reltol,
        isoutofdomain=(u,p,t)->any(<(0),u),
        save_idxs=9,
        saveat=solvsettings.saveat,
        maxiters=solvsettings.maxiters
    ))

    # Early return if we terminated early due to out-of-domain.
    if length(sol_new) - 1 != length(data)
        Turing.@addlogprob! -Inf
        return nothing
    end

    incidence = sol_new[2:end] - sol_new[1:(end-1)]
    # avoid numerical instability issue
    incidence = max.(zero(eltype(incidence)), incidence)
    data ~ arraydist(@. NegativeBinomial2(ψ, incidence * ρ))
end

model = turingmodel(data, theta_fix, u0, problem, solvsettings);

# Execute once to ensure that it's working correctly.
results = model();
model = turingmodel(data, theta_fix, u0, problem, merge(solvsettings, (solver = Tsit5(), )));
chain = sample(model, NUTS(), 1_000);
┌ Warning: The current proposal will be rejected due to numerical error(s).
│   isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
└ @ AdvancedHMC /home/tor/.julia/packages/AdvancedHMC/bv9VV/src/hamiltonian.jl:47
┌ Warning: The current proposal will be rejected due to numerical error(s).
│   isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
└ @ AdvancedHMC /home/tor/.julia/packages/AdvancedHMC/bv9VV/src/hamiltonian.jl:47
┌ Info: Found initial step size
│   ϵ = 0.0044342041015625
└ @ Turing.Inference /home/tor/.julia/packages/Turing/YGtAo/src/inference/hmc.jl:188
┌ Warning: The current proposal will be rejected due to numerical error(s).
│   isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
└ @ AdvancedHMC /home/tor/.julia/packages/AdvancedHMC/bv9VV/src/hamiltonian.jl:47
Sampling: 100%|█████████████████████████████████████████| Time: 0:01:15
chain
Chains MCMC chain (1000×18×1 Array{Float64, 3}):

Iterations        = 501:1:1500
Number of chains  = 1
Samples per chain = 1000
Wall duration     = 103.05 seconds
Compute duration  = 103.05 seconds
parameters        = φ, ρ, ω, β, ψ, η
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
  parameters       mean       std   naive_se      mcse         ess      rhat   ⋯
      Symbol    Float64   Float64    Float64   Float64     Float64   Float64   ⋯

           ψ     0.1517    0.0103     0.0003    0.0003   1074.7326    0.9994   ⋯
           ρ     0.0010    0.0000     0.0000    0.0000    674.1901    0.9990   ⋯
           β     0.2890    0.0046     0.0001    0.0002    494.4507    1.0062   ⋯
           η     0.0705    0.0035     0.0001    0.0002    576.2473    1.0093   ⋯
           ω   379.0073    7.7127     0.2439    0.3320    537.2055    0.9995   ⋯
           φ   178.7325    2.7117     0.0858    0.1418    475.8661    1.0055   ⋯
                                                                1 column omitted

Quantiles
  parameters       2.5%      25.0%      50.0%      75.0%      97.5% 
      Symbol    Float64    Float64    Float64    Float64    Float64 

           ψ     0.1319     0.1447     0.1516     0.1584     0.1733
           ρ     0.0010     0.0010     0.0010     0.0010     0.0010
           β     0.2802     0.2857     0.2888     0.2921     0.2974
           η     0.0639     0.0680     0.0705     0.0729     0.0775
           ω   364.5389   373.7467   378.6863   384.1705   394.3270
           φ   173.4356   176.8529   178.7410   180.3889   183.9574
theta_est
6-element Vector{Float64}:
   0.15
   0.001
   0.28
   0.07
 365.0
 180.0

Recovered the true parameters nicely.

And just for reference, you can see an example run below + a bunch of versions of the model that I tried out in sequence.

Several sections showing different iterations of the model

Setup

using Pkg
Pkg.activate("/tmp/jl_RuWW8I/")
Activating environment at `/tmp/jl_RuWW8I/Project.toml`
pkgs = string.([
    :DifferentialEquations,
    :Turing,
    :Distributions,
    :Random,
    :LabelledArrays,
    :Serialization,
    :LazyArrays,
    :DiffEqSensitivity,
    :ModelingToolkit,
    :RecursiveArrayTools,
    :BenchmarkTools,
    :Zygote,
    :ReverseDiff,
    :ForwardDiff,
    :Tracker,
    :Memoization,
    :StaticArrays
])

Pkg.add(pkgs)
  Updating registry at `~/.julia/registries/General`
  Updating git-repo `https://github.com/JuliaRegistries/General.git`
 Resolving package versions...
No Changes to `/tmp/jl_RuWW8I/Project.toml`
No Changes to `/tmp/jl_RuWW8I/Manifest.toml`
using DifferentialEquations
using Turing
using Distributions
using Random
using LabelledArrays
using Serialization
using LazyArrays
using DiffEqSensitivity
using ModelingToolkit
using RecursiveArrayTools
using StaticArrays

# Benchmarking related stuff.
using BenchmarkTools

# Use packages to ensure that we trigger Requires.jl.
using Zygote: Zygote
using ReverseDiff: ReverseDiff
using ForwardDiff: ForwardDiff
using Tracker: Tracker
using Memoization: Memoization # used for ReverseDiff.jl cache.

using Turing.Core: ForwardDiffAD, ReverseDiffAD, TrackerAD, ZygoteAD, CHUNKSIZE

const DEFAULT_ADBACKENDS = [
    ForwardDiffAD{40}(),    # chunksize=40
    ForwardDiffAD{100}(),   # chunksize=100
    TrackerAD(),
    ZygoteAD(),
    ReverseDiffAD{false}(), # rdcache=false
    ReverseDiffAD{true}()   # rdcache=true
]

# This is a piece of code I often use to benchmark models. It'll likely make its
# way into Turing.jl soonish.
# https://gist.github.com/torfjelde/7794c384d82d03c36625cd25b702b8d7
"""
    make_turing_suite(model; kwargs...)

Create default benchmark suite for `model`.

# Keyword arguments
- `adbackends`: a collection of adbackends to use. Defaults to `$(DEFAULT_ADBACKENDS)`.
- `run_once=true`: if `true`, the body of each benchmark will be run once to avoid
  compilation being included in the timings (this may occur if compilation runs
  longer than the allowed time limit).
- `save_grads=false`: if `true` and `run_once` is `true`, the gradients from the initial
  execution will be saved and returned as the second return-value. This is useful if you
  want to check correctness of the gradients for different backends.
# Notes
- A separate "parameter" instance (`DynamicPPL.VarInfo`) will be created for _each test_.
  Hence if you have a particularly large model, you might want to only pass one `adbackend`
  at a time.
"""
function make_turing_suite(
    model, vi_orig=DynamicPPL.VarInfo(model);
    adbackends=DEFAULT_ADBACKENDS,
    run_once=true,
    save_grads=false
)
    suite = BenchmarkGroup()
    suite["not_linked"] = BenchmarkGroup()
    suite["linked"] = BenchmarkGroup()

    grads = Dict(:not_linked => Dict(), :linked => Dict())

    vi_orig = DynamicPPL.VarInfo(model)
    spl = DynamicPPL.SampleFromPrior()

    for adbackend in adbackends
        vi = DynamicPPL.VarInfo(model)
        vi[spl] = deepcopy(vi_orig[spl])

        if run_once
            ℓ, ∇ℓ = Turing.Core.gradient_logp(
                adbackend,
                vi[spl],
                vi,
                model,
                spl
            )

            if save_grads
                grads[:not_linked][adbackend] = (ℓ, ∇ℓ)
            end
        end
        suite["not_linked"]["$(adbackend)"] = @benchmarkable $(Turing.Core.gradient_logp)(
            $adbackend,
            $(vi[spl]),
            $vi,
            $model,
            $spl
        )

        # Need a separate `VarInfo` for the linked version since otherwise we risk the
        # `vi` from above being mutated.
        vi_linked = deepcopy(vi)
        DynamicPPL.link!(vi_linked, spl)
        if run_once
            ℓ, ∇ℓ = Turing.Core.gradient_logp(
                adbackend,
                vi_linked[spl],
                vi_linked,
                model,
                spl
            )

            if save_grads
                grads[:linked][adbackend] = (ℓ, ∇ℓ)
            end
        end
        suite["linked"]["$(adbackend)"] = @benchmarkable $(Turing.Core.gradient_logp)(
            $adbackend,
            $(vi_linked[spl]),
            $vi_linked,
            $model,
            $spl
        )

    end

    return save_grads ? (suite, grads) : suite
end

function test_gradient(model, adbackend, vi=DynamicPPL.VarInfo(model))
    spl = DynamicPPL.SampleFromPrior()

    return Turing.Core.gradient_logp(
        adbackend,
        vi[spl],
        vi,
        model,
        spl
    )
end
test_gradient (generic function with 2 methods)
# ODE model
function SEIRS2!(du,u,p,t)

    # states
    (S1, E1, I1, R1, S2, E2, I2, R2) = u[1:8]
    N1 = S1 + E1 + I1 + R1
    N2 = S2 + E2 + I2 + R2
    N = N1 + N2

    # params
    β = p.β
    η = p.η
    φ = p.φ
    ω = 1.0/p.ω
    μ = p.μ
    σ = p.σ
    γ1 = p.γ1
    γ2 = γ1 / p.g2
    δ2 = p.δ2

    # FOI
    βeff = β * (1.0+η*cos(2.0*π*(t-φ)/365.0))
    λ1 = βeff*(I1/N1 + I2/N2)
    λ2 = λ1 * δ2

    # change in states
    du[1] = μ*N - λ1*S1 - μ*S1
    du[2] = λ1*S1 - σ*E1 - μ*E1
    du[3] = σ*E1 - γ1*I1 - μ*I1
    du[4] = γ1*I1 - ω*R1 - μ*R1

    du[5] = ω*(R1 + R2) - λ2*S2 - μ*S2
    du[6] = λ2*S2 - σ*E2 - μ*E2
    du[7] = σ*E2 - γ2*I2 - μ*I2
    du[8] = γ2*I2 - ω*R2 - μ*R2

    du[9] = (σ*(E1 + E2)) # cumulative incidence

end

# observation model
function NegativeBinomial2(ψ, incidence; check_args=true)
    p = 1.0/(1.0 + ψ*incidence)
    r = 1.0/ψ
    return NegativeBinomial(r, p; check_args=check_args)
end

# Solver settings
tmin = 0.0; tmax = 20.0*365.0; tspan = (tmin, tmax)

solvsettings = (
    abstol = 1.0e-3, 
    reltol = 1.0e-3, 
    saveat = 7.0,
    solver = AutoTsit5(Rosenbrock23()),
    maxiters = 1e6
)

# Initiate ODE problem
theta_fix =  [1.0/4.98, 1.0/(80*365), 0.89, 1/6.16, 0.87]
theta_est =  [0.15, 0.001, 0.28, 0.07, 365.0, 180.0]
parnames = (:ψ, :ρ, :β, :η, :ω, :φ, :σ, :μ, :δ2, :γ1, :g2)
p = @LArray [theta_est; theta_fix] parnames
u0 = [200_000.0,1000.0,1000.0,300_000.0, 500_000.0, 1000.0,1000.0, 296_000, 2000.0]

# Initiate ODE problem
problem = ODEProblem(SEIRS2!,u0,tspan,p)

# Solve
sol = solve(
    problem, 
    solvsettings.solver, 
    abstol=solvsettings.abstol, 
    reltol=solvsettings.reltol, 
    isoutofdomain=(u,p,t)->any(x->x<0.0,u),
    save_idxs=9, 
    saveat=solvsettings.saveat
);

foo = (sol[2:end] - sol[1:(end-1)]) .* p.ρ;
data = rand.(NegativeBinomial2.(p.ψ, foo));

Initial run

Same model as in the original issue (with the exception of the if-statement at the end).

@model function turingmodel(data, theta_fix, u0, problem, parnames, solvsettings) 

    # Priors
    ψ ~ Beta(1,5)
    ρ ~ Uniform(0.0,1.0) 
    β ~ Uniform(0.0,1.0) 
    η ~ Uniform(0.0,1.0) 
    ω ~ Uniform(1.0, 3.0*365.0) 
    φ ~ Uniform(0.0,364.0)

    theta_est = [β,η,ω,φ]
    p_new = @LArray vcat(theta_est, theta_fix) parnames[3:end]

    # Update problem and solve ODEs
    problem_new = remake(problem, p=p_new, u0=eltype(p_new).(u0)) 

    sol_new = solve(
        problem_new,
        solvsettings.solver,
        abstol=solvsettings.abstol,
        reltol=solvsettings.reltol,
        isoutofdomain=(u,p,t)->any(<(0),u),
        save_idxs=9,
        saveat=solvsettings.saveat,
        maxiters=solvsettings.maxiters
    )

    incidence = sol_new[2:end] - sol_new[1:(end-1)]
    # avoid numerical instability issue
    incidence = max.(0.0, incidence)
    data ~ arraydist(LazyArray(@~ @. NegativeBinomial2(ψ, incidence * ρ)))
end

model = turingmodel(data, theta_fix, u0, problem, parnames, solvsettings);

# Execute once to ensure that it's working correctly.
results = model();

Since the execution is non-deterministic, we need to use the same set of parameters in the benchmarking of the different models.

var_info = DynamicPPL.VarInfo(model);
test_gradient(model, ForwardDiffAD{40}(), var_info)
(-51513.54325993552, [70168.75780995484, 12976.945884848457, 33573.366670560834, -131047.33181480742, -40.79691974039409, 75.67135749544357])
# ReverseDiff.jl without tape-compilation (i.e. `rdcache=false`).
test_gradient(model, ReverseDiffAD{false}(), var_info)
(-51520.918599805795, [70170.19934497547, 12977.209992820593, 33633.23016064762, -131178.3723725407, -40.82876406906327, 76.09277036009247])
test_gradient(model, ZygoteAD(), var_info)
MethodError: no method matching LazyArray(::Vector{NegativeBinomial{Float64}})
Closest candidates are:
  (::Type{S})(::Complex{Num}) where S<:Union{AbstractFloat, Integer, Complex{var"#s19"} where var"#s19"<:AbstractFloat, Complex{var"#s18"} where var"#s18"<:Integer, AbstractArray} at /home/tor/.julia/packages/Symbolics/H8dtg/src/Symbolics.jl:43
  (::Type{S})(::Num) where S<:Union{AbstractFloat, Integer, Complex{var"#s19"} where var"#s19"<:AbstractFloat, Complex{var"#s18"} where var"#s18"<:Integer, AbstractArray} at /home/tor/.julia/packages/Symbolics/H8dtg/src/Symbolics.jl:43
  LazyArray(::Base.Broadcast.Broadcasted) at /home/tor/.julia/packages/LazyArrays/zliW5/src/lazybroadcasting.jl:35
  ...

Stacktrace:
  [1] macro expansion
    @ ~/.julia/packages/Zygote/TaBlo/src/compiler/interface2.jl:0 [inlined]
  [2] _pullback(ctx::Zygote.Context, f::Type{LazyArray}, args::Vector{NegativeBinomial{Float64}})
    @ Zygote ~/.julia/packages/Zygote/TaBlo/src/compiler/interface2.jl:9
  [3] _pullback
    @ ./In[6]:31 [inlined]
....

All right, seems like Zygote isn't too happy about the usage of LazyArrays.jl. This could maybe be addressed by defining an adjoint for `LazyArray`, but let's leave that for now.

Even though ReverseDiff.jl works, we're not going to bother benchmarking it since we can't use the compiled tape due to the conditional statements in the model; i.e. we're only going to benchmark ForwardDiff.
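
As a small aside (my own toy illustration, not the model above), this is why compiled tapes and value-dependent branches don't mix: the tape records only the branch taken when it's compiled and silently replays it for every later input.

using ReverseDiff

f(x) = x[1] > 0 ? sum(abs2, x) : sum(x)

# The tape is compiled at x = [1.0, 2.0], so the abs2 branch gets baked in.
tape = ReverseDiff.compile(ReverseDiff.GradientTape(f, [1.0, 2.0]))

# The true gradient at [-1.0, 2.0] is [1.0, 1.0] (the sum branch),
# but the compiled tape silently returns [-2.0, 4.0].
ReverseDiff.gradient!(zeros(2), tape, [-1.0, 2.0])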

suite = make_turing_suite(
    model, var_info,
    adbackends=(ForwardDiffAD{40}(), )
);
benchmarks = run(suite, seconds=10);
benchmarks
2-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "linked" => 1-element BenchmarkTools.BenchmarkGroup:
	  tags: []
	  "ForwardDiffAD{40}()" => Trial(3.467 ms)
  "not_linked" => 1-element BenchmarkTools.BenchmarkGroup:
	  tags: []
	  "ForwardDiffAD{40}()" => Trial(3.491 ms)

The linked part is what we really care about when sampling using an AD-based sampler, e.g. NUTS, as it refers to the variables in the transformed/unconstrained space.
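
To make "linked" a bit more concrete, here's a small sketch (using Bijectors.jl, which Turing relies on under the hood) for one of the constrained parameters in this model:

using Bijectors

# ω has a Uniform(1, 3*365) prior; its bijector maps (1, 1095) onto the whole real line,
# which is the space NUTS actually operates in when the parameters are "linked".
b = bijector(Uniform(1.0, 3.0*365.0))
b(365.0)  # ω on the unconstrained ("linked") scale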

minimum(benchmarks["linked"]["ForwardDiffAD{40}()"])
BenchmarkTools.TrialEstimate: 
  time:             3.467 ms
  gctime:           0.000 ns (0.00%)
  memory:           5.65 MiB
  allocs:           19833

Fix type-instability in max

The current line

incidence = max.(0.0, incidence)

can lead to type-instability: for elements where incidence < 0 we return the literal 0.0, which is a Float64, and otherwise we get eltype(incidence), which can be ForwardDiff.Dual, ReverseDiff.TrackedReal, etc. We can eliminate this instability by simply doing

incidence = max.(zero(eltype(incidence)), incidence)

instead.

@model function turingmodel(data, theta_fix, u0, problem, parnames, solvsettings) 

    # Priors
    ψ ~ Beta(1,5)
    ρ ~ Uniform(0.0,1.0) 
    β ~ Uniform(0.0,1.0) 
    η ~ Uniform(0.0,1.0) 
    ω ~ Uniform(1.0, 3.0*365.0) 
    φ ~ Uniform(0.0,364.0)

    theta_est = [β,η,ω,φ]
    p_new = @LArray vcat(theta_est, theta_fix) parnames[3:end]

    # Update problem and solve ODEs
    problem_new = remake(problem, p=p_new, u0=eltype(p_new).(u0)) 

    sol_new = solve(
        problem_new,
        solvsettings.solver,
        abstol=solvsettings.abstol,
        reltol=solvsettings.reltol,
        isoutofdomain=(u,p,t)->any(<(0),u),
        save_idxs=9,
        saveat=solvsettings.saveat,
        maxiters=solvsettings.maxiters
    )

    incidence = sol_new[2:end] - sol_new[1:(end-1)]
    # avoid numerical instability issue
    incidence = max.(zero(eltype(incidence)), incidence)
    data ~ arraydist(LazyArray(@~ @. NegativeBinomial2(ψ, incidence * ρ)))
end

model = turingmodel(data, theta_fix, u0, problem, parnames, solvsettings);

# Execute once to ensure that it's working correctly.
results = model();
suite = make_turing_suite(
    model, var_info,
    adbackends=(ForwardDiffAD{40}(), )
);
benchmarks = run(suite, seconds=10);
benchmarks
2-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "linked" => 1-element BenchmarkTools.BenchmarkGroup:
	  tags: []
	  "ForwardDiffAD{40}()" => Trial(2.481 ms)
  "not_linked" => 1-element BenchmarkTools.BenchmarkGroup:
	  tags: []
	  "ForwardDiffAD{40}()" => Trial(2.484 ms)
minimum(benchmarks["linked"]["ForwardDiffAD{40}()"])
BenchmarkTools.TrialEstimate: 
  time:             2.481 ms
  gctime:           0.000 ns (0.00%)
  memory:           5.86 MiB
  allocs:           10292

That's almost 1 ms / ~30% improvement! That's pretty decent. Notice in particular how many allocations are now gone because the Julia compiler can properly infer the type used in the broadcasted expression arraydist(LazyArray(@~ @. NegativeBinomial2(ψ, incidence * ρ))).

Embed parnames in model

The line

p_new = @LArray vcat(theta_est, theta_fix) parnames[3:end]

leads to the type of p_new not being inferrable, e.g.

g(theta, parnames) = @LArray theta parnames[3:end]
@code_warntype g(vcat(theta_est, theta_fix), parnames)
Variables
  #self#::Core.Const(g)
  theta::Vector{Float64}
  parnames::NTuple{11, Symbol}

Body::LArray{Float64, 1, Vector{Float64}, _A} where _A
1 ─ %1 = Base.lastindex(parnames)::Core.Const(11)
│   %2 = (3:%1)::Core.Const(3:11)
│   %3 = Base.getindex(parnames, %2)::NTuple{9, Symbol}
│   %4 = Core.apply_type(LabelledArrays.LArray, %3)::Type{LArray{_A, N, D, Syms} where {N, D<:AbstractArray{_A, N}, Syms}} where _A
│   %5 = (%4)(theta)::LArray{Float64, 1, Vector{Float64}, _A} where _A
└──      return %5

Notice the LArray{Float64, 1, Vector{Float64}, _A} where _A. In all likelihood this doesn't matter too much, because the only place where we use this array is when it's passed to remake, and the eltype of the array is correctly inferred. The only downside of it not being inferred is that things like property access, e.g. p_new.a within the body of the model, would have to be checked at run-time rather than at compile-time. But here Julia will just perform a single dynamic lookup to figure out which remake to call, and once that's done there's no real cost to the type of p_new not being inferrable. But let's just remove the run-time slicing and see if it makes a difference.

@model function turingmodel(data, theta_fix, u0, problem, parnames, solvsettings) 

    # Priors
    ψ ~ Beta(1,5)
    ρ ~ Uniform(0.0,1.0) 
    β ~ Uniform(0.0,1.0) 
    η ~ Uniform(0.0,1.0) 
    ω ~ Uniform(1.0, 3.0*365.0) 
    φ ~ Uniform(0.0,364.0)

    theta_est = [β,η,ω,φ]
    # NOTE: This should be done in a nicer way, e.g. define a method which
    # takes in `problem.p` (an example `LArray` with the correct labels)
    # and the new values, and returns a `LArray` with the already known labels from `problem.p`.
    p_new = @LArray vcat(theta_est, theta_fix) (:β, :η, :ω, :φ, :σ, :μ, :δ2, :γ1, :g2)

    # Update problem and solve ODEs
    problem_new = remake(problem, p=p_new, u0=eltype(p_new).(u0)) 

    sol_new = solve(
        problem_new,
        solvsettings.solver,
        abstol=solvsettings.abstol,
        reltol=solvsettings.reltol,
        isoutofdomain=(u,p,t)->any(<(0),u),
        save_idxs=9,
        saveat=solvsettings.saveat,
        maxiters=solvsettings.maxiters
    )

    incidence = sol_new[2:end] - sol_new[1:(end-1)]
    # avoid numerical instability issue
    incidence = max.(zero(eltype(incidence)), incidence)
    data ~ arraydist(LazyArray(@~ @. NegativeBinomial2(ψ, incidence * ρ)))
end

model = turingmodel(data, theta_fix, u0, problem, parnames, solvsettings);

# Execute once to ensure that it's working correctly.
results = model();
suite = make_turing_suite(
    model, var_info,
    adbackends=(ForwardDiffAD{40}(), )
);
benchmarks = run(suite, seconds=10);
benchmarks
2-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "linked" => 1-element BenchmarkTools.BenchmarkGroup:
	  tags: []
	  "ForwardDiffAD{40}()" => Trial(2.699 ms)
  "not_linked" => 1-element BenchmarkTools.BenchmarkGroup:
	  tags: []
	  "ForwardDiffAD{40}()" => Trial(2.646 ms)
minimum(benchmarks["linked"]["ForwardDiffAD{40}()"])
BenchmarkTools.TrialEstimate: 
  time:             2.699 ms
  gctime:           0.000 ns (0.00%)
  memory:           6.24 MiB
  allocs:           11015

Nah, it didn't, as expected (it actually made it a tiny bit slower with more allocations; a bit uncertain why that is). But it's a point worth noting, because if we instead were using this p_new throughout the model, e.g. accessing fields, it would be nice to ensure that accessing its properties was compiled away. Anyhow, we move on.

Dropping usage of LazyArrays.jl

I'm curious whether LazyArrays.jl actually helps or not here, e.g. maybe it's better to materialize the broadcasted expression immediately?

arraydist won't be too happy with the VectorOfArray that solve returns, so we have to wrap solve in a call to Array.

And since LazyArrays.jl was causing issues for Zygote.jl earlier, we can check if it works now that we're no longer using LazyArrays.jl. I'll add a sensealg keyword argument to the model to allow us to specify that we want to use InterpolatingAdjoint(autojacvec=ZygoteVJP()) to obtain the sensitivities (i.e. the "gradient" of solve wrt. the parameters).

@model function turingmodel(data, theta_fix, u0, problem, parnames, solvsettings; sensealg=InterpolatingAdjoint())

    # Priors
    ψ ~ Beta(1,5)
    ρ ~ Uniform(0.0,1.0) 
    β ~ Uniform(0.0,1.0) 
    η ~ Uniform(0.0,1.0) 
    ω ~ Uniform(1.0, 3.0*365.0) 
    φ ~ Uniform(0.0,364.0)

    theta_est = [β,η,ω,φ]
    p_new = @LArray vcat(theta_est, theta_fix) parnames[3:end]

    # Update problem and solve ODEs
    problem_new = remake(problem, p=p_new, u0=eltype(p_new).(u0)) 

    sol_new = Array(solve(
        problem_new,
        solvsettings.solver,
        abstol=solvsettings.abstol,
        reltol=solvsettings.reltol,
        isoutofdomain=(u,p,t)->any(<(0),u),
        save_idxs=9,
        saveat=solvsettings.saveat,
        maxiters=solvsettings.maxiters,
        sensealg=sensealg
    ))

    incidence = sol_new[2:end] - sol_new[1:(end-1)]
    # avoid numerical instability issue
    incidence = max.(zero(eltype(incidence)), incidence)
    data ~ arraydist(@. NegativeBinomial2(ψ, incidence * ρ))
end

model = turingmodel(data, theta_fix, u0, problem, parnames, solvsettings);

# Execute once to ensure that it's working correctly.
results = model();
suite = make_turing_suite(
    model, var_info,
    adbackends=(ForwardDiffAD{40}(), )
);
benchmarks = run(suite, seconds=10);
benchmarks
2-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "linked" => 1-element BenchmarkTools.BenchmarkGroup:
	  tags: []
	  "ForwardDiffAD{40}()" => Trial(2.699 ms)
  "not_linked" => 1-element BenchmarkTools.BenchmarkGroup:
	  tags: []
	  "ForwardDiffAD{40}()" => Trial(2.611 ms)
minimum(benchmarks["linked"]["ForwardDiffAD{40}()"])
BenchmarkTools.TrialEstimate: 
  time:             2.699 ms
  gctime:           0.000 ns (0.00%)
  memory:           6.30 MiB
  allocs:           10941

Not a big difference tbh, but maybe a tiny improvement. Let's try Zygote:

model = turingmodel(data, theta_fix, u0, problem, parnames, solvsettings, sensealg=InterpolatingAdjoint(autojacvec=ZygoteVJP()));
test_gradient(model, ZygoteAD(), var_info)
DimensionMismatch("Inconsistent array dimensions.")

Stacktrace:
  [1] macro expansion
    @ ~/.julia/packages/Zygote/TaBlo/src/compiler/interface2.jl:0 [inlined]
  [2] _pullback(ctx::Zygote.Context, f::typeof(throw), args::DimensionMismatch)
    @ Zygote ~/.julia/packages/Zygote/TaBlo/src/compiler/interface2.jl:9
  [3] _pullback
    @ ~/.julia/packages/Distributions/fXTVC/src/multivariates.jl:199 [inlined]
  [4] _pullback
    @ ~/.julia/packages/Distributions/fXTVC/src/multivariates.jl:264 [inlined]
  [5] _pullback
    @ ~/.julia/packages/DynamicPPL/F7F1M/src/context_implementations.jl:269 [inlined]
  [6] _pullback(::Zygote.Context, ::typeof(DynamicPPL.observe), ::Product{Discrete, NegativeBinomial{Float64}, Vector{NegativeBinomial{Float64}}}, ::VectorOfArray{Int64, 1, Vector{Int64}}, ::DynamicPPL.TypedVarInfo{NamedTuple{(:ψ, :ρ, :β, :η, :ω, :φ), Tuple{DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:ψ, Tuple{}}, Int64}, Vector{Beta{Float64}}, Vector{AbstractPPL.VarName{:ψ, Tuple{}}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}, DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:ρ, Tuple{}}, Int64}, Vector{Uniform{Float64}}, Vector{AbstractPPL.VarName{:ρ, Tuple{}}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}, DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:β, Tuple{}}, Int64}, Vector{Uniform{Float64}}, Vector{AbstractPPL.VarName{:β, Tuple{}}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}, DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:η, Tuple{}}, Int64}, Vector{Uniform{Float64}}, Vector{AbstractPPL.VarName{:η, Tuple{}}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}, DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:ω, Tuple{}}, Int64}, Vector{Uniform{Float64}}, Vector{AbstractPPL.VarName{:ω, Tuple{}}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}, DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:φ, Tuple{}}, Int64}, Vector{Uniform{Float64}}, Vector{AbstractPPL.VarName{:φ, Tuple{}}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}}}, Float64})
    @ Zygote ~/.julia/packages/Zygote/TaBlo/src/compiler/interface2.jl:0
...

Now this is a weird error! We're using the same var_info for every call so we should not have any issues with early termination in solve or anything of the sort. I'll have to do some more digging to answer that properly.

But before we check that, let's see what sort of performance we get when using reverse-mode for the sensitivity calculation but ForwardDiff.jl for the rest of the gradient computation:

model = turingmodel(data, theta_fix, u0, problem, parnames, solvsettings, sensealg=ForwardDiffSensitivity());

suite = make_turing_suite(
    model, var_info,
    adbackends=(ForwardDiffAD{40}(), )
);
benchmarks = run(suite, seconds=10);
benchmarks
2-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "linked" => 1-element BenchmarkTools.BenchmarkGroup:
	  tags: []
	  "ForwardDiffAD{40}()" => Trial(2.322 ms)
  "not_linked" => 1-element BenchmarkTools.BenchmarkGroup:
	  tags: []
	  "ForwardDiffAD{40}()" => Trial(2.317 ms)
model = turingmodel(data, theta_fix, u0, problem, parnames, solvsettings, sensealg=InterpolatingAdjoint(autojacvec=ReverseDiffVJP(true)));

suite = make_turing_suite(
    model, var_info,
    adbackends=(ForwardDiffAD{40}(), )
);
benchmarks = run(suite, seconds=10);
benchmarks
2-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "linked" => 1-element BenchmarkTools.BenchmarkGroup:
	  tags: []
	  "ForwardDiffAD{40}()" => Trial(2.378 ms)
  "not_linked" => 1-element BenchmarkTools.BenchmarkGroup:
	  tags: []
	  "ForwardDiffAD{40}()" => Trial(2.365 ms)
minimum(benchmarks["linked"]["ForwardDiffAD{40}()"])
BenchmarkTools.TrialEstimate: 
  time:             2.378 ms
  gctime:           0.000 ns (0.00%)
  memory:           5.82 MiB
  allocs:           10011
model = turingmodel(data, theta_fix, u0, problem, parnames, solvsettings, sensealg=InterpolatingAdjoint(autojacvec=ZygoteVJP()));

suite = make_turing_suite(
    model, var_info,
    adbackends=(ForwardDiffAD{40}(), )
);
benchmarks = run(suite, seconds=10);
benchmarks
2-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "linked" => 1-element BenchmarkTools.BenchmarkGroup:
	  tags: []
	  "ForwardDiffAD{40}()" => Trial(2.519 ms)
  "not_linked" => 1-element BenchmarkTools.BenchmarkGroup:
	  tags: []
	  "ForwardDiffAD{40}()" => Trial(2.544 ms)
minimum(benchmarks["linked"]["ForwardDiffAD{40}()"])
BenchmarkTools.TrialEstimate: 
  time:             2.519 ms
  gctime:           0.000 ns (0.00%)
  memory:           6.10 MiB
  allocs:           10557
using Enzyme
model = turingmodel(data, theta_fix, u0, problem, parnames, solvsettings, sensealg=InterpolatingAdjoint(autojacvec=EnzymeVJP()));

suite = make_turing_suite(
    model, var_info,
    adbackends=(ForwardDiffAD{40}(), )
);
benchmarks = run(suite, seconds=10);
benchmarks
2-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "linked" => 1-element BenchmarkTools.BenchmarkGroup:
	  tags: []
	  "ForwardDiffAD{40}()" => Trial(2.340 ms)
  "not_linked" => 1-element BenchmarkTools.BenchmarkGroup:
	  tags: []
	  "ForwardDiffAD{40}()" => Trial(2.315 ms)
minimum(benchmarks["linked"]["ForwardDiffAD{40}()"])
BenchmarkTools.TrialEstimate: 
  time:             2.340 ms
  gctime:           0.000 ns (0.00%)
  memory:           5.73 MiB
  allocs:           9843

So it's not making things much faster than just using ForwardDiff everywhere, but IIUC this could be a beneficial approach if the number of ODEs we're solving increases but the number of parameters does not. Then we could use the adjoint methods based on reverse-mode differentiation to scale nicely with the number of ODEs while using the speed of ForwardDiff.jl for the rest of the model.

Using Tsit5()

I'm not too familiar with the idea of "stiff ODEs", but I'm guessing that if the problem works nicely without a stiff solver then we probably don't need a stiff solver? So let's try Tsit5:

model = turingmodel(data, theta_fix, u0, problem, parnames, merge(solvsettings, (solver = Tsit5(), )));
suite = make_turing_suite(
    model, var_info,
    adbackends=(ForwardDiffAD{40}(), )
);
benchmarks = run(suite, seconds=10);
benchmarks
2-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "linked" => 1-element BenchmarkTools.BenchmarkGroup:
	  tags: []
	  "ForwardDiffAD{40}()" => Trial(1.763 ms)
  "not_linked" => 1-element BenchmarkTools.BenchmarkGroup:
	  tags: []
	  "ForwardDiffAD{40}()" => Trial(1.782 ms)
minimum(benchmarks["linked"]["ForwardDiffAD{40}()"])
BenchmarkTools.TrialEstimate: 
  time:             1.763 ms
  gctime:           0.000 ns (0.00%)
  memory:           3.03 MiB
  allocs:           5044

Aye, much faster indeed!

ModelingToolkit.jl

We're not done yet though. What about computing the jacobian symbolically?

@model function turingmodel(data, theta_fix, u0, problem, parnames, solvsettings)

    # Priors
    ψ ~ Beta(1,5)
    ρ ~ Uniform(0.0,1.0) 
    β ~ Uniform(0.0,1.0) 
    η ~ Uniform(0.0,1.0) 
    ω ~ Uniform(1.0, 3.0*365.0) 
    φ ~ Uniform(0.0,364.0)

    theta_est = [β,η,ω,φ]
    p_new = @LArray vcat(theta_est, theta_fix) parnames[3:end]

    # Update problem and solve ODEs
    problem_new = remake(problem, p=p_new, u0=eltype(p_new).(u0)) 

    sol_new = Array(solve(
        problem_new,
        solvsettings.solver,
        abstol=solvsettings.abstol,
        reltol=solvsettings.reltol,
        save_idxs=9,
        saveat=solvsettings.saveat,
        maxiters=solvsettings.maxiters
    ))

    incidence = sol_new[2:end] - sol_new[1:(end-1)]
    # avoid numerical instability issue
    incidence = max.(zero(eltype(incidence)), incidence)
    data ~ arraydist(@. NegativeBinomial2(ψ, incidence * ρ))
end

# Symbolic problem
de = modelingtoolkitize(problem)
jac = eval(ModelingToolkit.generate_jacobian(de)[2])
f = ODEFunction(SEIRS2!, jac=jac)
problem_sym = ODEProblem(f,u0,tspan,p)

# Model with symbolic problem.
model = turingmodel(data, theta_fix, u0, problem_sym, parnames, merge(solvsettings, (solver = Tsit5(), )));

# Execute once to ensure that it's working correctly.
results = model();
suite = make_turing_suite(
    model, var_info,
    adbackends=(ForwardDiffAD{40}(), )
);
benchmarks = run(suite, seconds=10);
benchmarks
2-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "linked" => 1-element BenchmarkTools.BenchmarkGroup:
	  tags: []
	  "ForwardDiffAD{40}()" => Trial(1.975 ms)
  "not_linked" => 1-element BenchmarkTools.BenchmarkGroup:
	  tags: []
	  "ForwardDiffAD{40}()" => Trial(1.971 ms)

Seems like this is such a simple ODE that it didn't help much at all 🤷

Immutable version

Curious about an immutable version:

function SEIRS2(u,p,t)

    # states
    (S1, E1, I1, R1, S2, E2, I2, R2) = u[1:8]
    N1 = S1 + E1 + I1 + R1
    N2 = S2 + E2 + I2 + R2
    N = N1 + N2

    # params
    β = p.β
    η = p.η
    φ = p.φ
    ω = 1.0/p.ω
    μ = p.μ
    σ = p.σ
    γ1 = p.γ1
    γ2 = γ1 / p.g2
    δ2 = p.δ2

    # FOI
    βeff = β * (1.0+η*cos(2.0*π*(t-φ)/365.0))
    λ1 = βeff*(I1/N1 + I2/N2)
    λ2 = λ1 * δ2

    # change in states
    du1 = μ*N - λ1*S1 - μ*S1
    du2 = λ1*S1 - σ*E1 - μ*E1
    du3 = σ*E1 - γ1*I1 - μ*I1
    du4 = γ1*I1 - ω*R1 - μ*R1

    du5 = ω*(R1 + R2) - λ2*S2 - μ*S2
    du6 = λ2*S2 - σ*E2 - μ*E2
    du7 = σ*E2 - γ2*I2 - μ*I2
    du8 = γ2*I2 - ω*R2 - μ*R2

    du9 = (σ*(E1 + E2)) # cumulative incidence

    return [du1, du2, du3, du4, du5, du6, du7, du8, du9]
end

# Immutable problem
problem_immutable = ODEProblem{false}(SEIRS2,u0,tspan,p)
ODEProblem with uType Vector{Float64} and tType Float64. In-place: false
timespan: (0.0, 7300.0)
u0: 9-element Vector{Float64}:
 200000.0
   1000.0
   1000.0
 300000.0
 500000.0
   1000.0
   1000.0
 296000.0
   2000.0
@model function turingmodel(data, theta_fix, u0, problem, parnames, solvsettings)

    # Priors
    ψ ~ Beta(1,5)
    ρ ~ Uniform(0.0,1.0) 
    β ~ Uniform(0.0,1.0) 
    η ~ Uniform(0.0,1.0) 
    ω ~ Uniform(1.0, 3.0*365.0) 
    φ ~ Uniform(0.0,364.0)

    theta_est = [β,η,ω,φ]
    p_new = @LArray vcat(theta_est, theta_fix) parnames[3:end]

    # Update problem and solve ODEs
    problem_new = remake(problem, p=p_new, u0=eltype(p_new).(u0)) 

    sol_new = Array(solve(
        problem_new,
        solvsettings.solver,
        abstol=solvsettings.abstol,
        reltol=solvsettings.reltol,
        save_idxs=9,
        saveat=solvsettings.saveat,
        maxiters=solvsettings.maxiters
    ))

    incidence = sol_new[2:end] - sol_new[1:(end-1)]
    # avoid numerical instability issue
    incidence = max.(zero(eltype(incidence)), incidence)
    data ~ arraydist(@. NegativeBinomial2(ψ, incidence * ρ, check_args=false))
end

# Model with symbolic problem.
model = turingmodel(data, theta_fix, u0, problem_immutable, parnames, merge(solvsettings, (solver = Tsit5(), )));

# Execute once to ensure that it's working correctly.
results = model();
suite = make_turing_suite(
    model, var_info,
    adbackends=(ForwardDiffAD{40}(), )
);
benchmarks = run(suite, seconds=10);
benchmarks
2-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "linked" => 1-element BenchmarkTools.BenchmarkGroup:
	  tags: []
	  "ForwardDiffAD{40}()" => Trial(5.576 ms)
  "not_linked" => 1-element BenchmarkTools.BenchmarkGroup:
	  tags: []
	  "ForwardDiffAD{40}()" => Trial(5.538 ms)

Much slower, though not unexpected.

Static problem

With an immutable implementation we can use static arrays to see if that squeezes out any more perf:

function SEIRS2_static(u,p,t)

    # states
    (S1, E1, I1, R1, S2, E2, I2, R2) = u[1:8]
    N1 = S1 + E1 + I1 + R1
    N2 = S2 + E2 + I2 + R2
    N = N1 + N2

    # params
    β = p.β
    η = p.η
    φ = p.φ
    ω = 1.0/p.ω
    μ = p.μ
    σ = p.σ
    γ1 = p.γ1
    γ2 = γ1 / p.g2
    δ2 = p.δ2

    # FOI
    βeff = β * (1.0+η*cos(2.0*π*(t-φ)/365.0))
    λ1 = βeff*(I1/N1 + I2/N2)
    λ2 = λ1 * δ2

    # change in states
    du1 = μ*N - λ1*S1 - μ*S1
    du2 = λ1*S1 - σ*E1 - μ*E1
    du3 = σ*E1 - γ1*I1 - μ*I1
    du4 = γ1*I1 - ω*R1 - μ*R1

    du5 = ω*(R1 + R2) - λ2*S2 - μ*S2
    du6 = λ2*S2 - σ*E2 - μ*E2
    du7 = σ*E2 - γ2*I2 - μ*I2
    du8 = γ2*I2 - ω*R2 - μ*R2

    du9 = (σ*(E1 + E2)) # cumulative incidence

    return @SVector [du1, du2, du3, du4, du5, du6, du7, du8, du9]
end

# Immutable problem
p_static = (@SLVector parnames)([theta_est; theta_fix])
u0_static = SVector(u0...)
problem_static = ODEProblem(SEIRS2_static,u0_static,tspan,p_static)

sol = solve(
    problem_static, 
    solvsettings.solver, 
    abstol=solvsettings.abstol, 
    reltol=solvsettings.reltol, 
    isoutofdomain=(u,p,t)->any(x->x<0.0,u),
    save_idxs=9, 
    saveat=solvsettings.saveat
);
@model function turingmodel_static(data, theta_fix, u0, problem, parnames, solvsettings)

    # Priors
    ψ ~ Beta(1,5)
    ρ ~ Uniform(0.0,1.0) 
    β ~ Uniform(0.0,1.0) 
    η ~ Uniform(0.0,1.0) 
    ω ~ Uniform(1.0, 3.0*365.0) 
    φ ~ Uniform(0.0,364.0)

    theta_est = SVector(β,η,ω,φ)
    p_new = (@SLVector (:β, :η, :ω, :φ, :σ, :μ, :δ2, :γ1, :g2))(vcat(theta_est, theta_fix))

    # Update problem and solve ODEs
    problem_new = remake(problem, p=p_new, u0=eltype(p_new).(u0)) 

    sol_new = Array(solve(
        problem_new,
        solvsettings.solver,
        abstol=solvsettings.abstol,
        reltol=solvsettings.reltol,
        save_idxs=9,
        saveat=solvsettings.saveat,
        maxiters=solvsettings.maxiters
    ))

    incidence = sol_new[2:end] - sol_new[1:(end-1)]
    # avoid numerical instability issue
    incidence = max.(zero(eltype(incidence)), incidence)
    data ~ arraydist(@. NegativeBinomial2(ψ, incidence * ρ, check_args=false))
end

# Model with symbolic problem.
model = turingmodel_static(data, SVector(theta_fix...), u0_static, problem_static, parnames, merge(solvsettings, (solver = Tsit5(), )));

# Execute once to ensure that it's working correctly.
results = model();
suite = make_turing_suite(
    model, var_info,
    adbackends=(ForwardDiffAD{40}(), )
);
benchmarks = run(suite, seconds=10);
benchmarks
2-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "linked" => 1-element BenchmarkTools.BenchmarkGroup:
	  tags: []
	  "ForwardDiffAD{40}()" => Trial(1.745 ms)
  "not_linked" => 1-element BenchmarkTools.BenchmarkGroup:
	  tags: []
	  "ForwardDiffAD{40}()" => Trial(1.764 ms)
minimum(benchmarks["linked"]["ForwardDiffAD{40}()"])
BenchmarkTools.TrialEstimate: 
  time:             1.745 ms
  gctime:           0.000 ns (0.00%)
  memory:           3.12 MiB
  allocs:           5007

Decent, but doesn't seem to be strictly better than non-static.

fkrauer (Author) commented Aug 3, 2021

This is absolutely amazing, thanks so much @torfjelde, you saved my research project. This was a toy model, my actual model is a bit larger (17 compartments), but the code was basically the same. The time saved by changing the outofdomain statement plus type stability for the incidence is incredible.

PS: The 5000 burnin was a typo 🤪 sorry.

torfjelde (Member) commented Aug 3, 2021

Great, really glad to hear!:)

Regarding dropping the outofdomain check:

  1. It's nice to include since it guarantees that you're sampling only from the space of parameters which are indeed valid for your model.
  2. Only using max does not always guarantee that you're sampling from the true posterior, but in most cases it does. It's fine to just use the max to guard against negative values if you never actually hit any during sampling. Essentially, if the data is informative enough to stop us from choosing parameters that lead to invalid values, then we'll never reach the boundary of the space where we go from valid to invalid values (in this case, values below 0) and thus the max will never actually do anything. In that case, sampling with or without the max is equivalent and we're still sampling from the true posterior. But max can be useful to avoid issues in the adaptation/burnin phase, where we indeed can end up in bad regions.

So just keep that in mind 👍 What you could do is add the following line to the end of your model

return sol_new

and then after you're done sampling, just to check that nothing strange occurred, you can run generated_quantities(model, chain) to get all the solutions used to generate chain. Then you can just check that none of the solutions that occurred in chain were invalid (see the sketch below).
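
For example, a minimal sketch of that check (assuming the model now ends with return sol_new):

# Re-run the model deterministically for each posterior draw and collect the returned solutions.
sols = generated_quantities(model, chain)

# No retained draw should have produced an invalid (here: negative) solution.
all(sol -> all(>=(0), sol), sols)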

But if you do end up dropping the outofdomain check, then you can actually use ReverseDiff with Turing.setrdcache(true) + InterpolatingAdjoint(autojacvec=ReverseDiffVJP()), so that might be useful once you go to higher dims (though I think 17 compartments is still something ForwardDiff can handle easily).
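
Concretely, that combination would look something like this (a sketch reusing settings from earlier in the thread; only valid as long as the model body has no value-dependent branches, since the tape gets compiled once):

Turing.setadbackend(:reversediff)
Turing.setrdcache(true)  # compiled-tape cache; requires Memoization.jl to be loaded

# ... and inside the model, pass a reverse-mode adjoint to solve, e.g.
#     sensealg = InterpolatingAdjoint(autojacvec = ReverseDiffVJP(true))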

PS: The 5000 burnin was a typo 🤪 sorry.

Haha, no worries at all. I should have thought of that immediately. I was really confused as to why it was taking so long on your end, and I had to actually run a complete sample and noticed that the iterations in the chain said 5001:5500, haha. Oh well, I don't think either of us will make that mistake again 😅
