Turing MethodError with reversediff but not with forwarddiff #1673
I haven't read through the thread properly, but this might be related to SciML/DifferentialEquations.jl#610? I.e. try not using … And other than that, when using reverse-mode AD, e.g. …
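For illustration only (the original comment is cut off here, so the choices below are my assumptions, not necessarily what was suggested): when differentiating through `solve` with a reverse-mode backend, DiffEqSensitivity lets you pick the sensitivity algorithm explicitly via the `sensealg` keyword.

```julia
# Hypothetical sketch: explicitly choosing a sensitivity algorithm from
# DiffEqSensitivity when `solve` is differentiated with ReverseDiff/Zygote.
# `problem` refers to the ODEProblem from the original issue code.
using DifferentialEquations, DiffEqSensitivity

sol = solve(problem, Tsit5();
    saveat = 7.0,
    # Forward sensitivities are often fastest for small parameter vectors:
    sensealg = ForwardDiffSensitivity(),
    # For larger systems an adjoint method may be preferable, e.g.:
    # sensealg = InterpolatingAdjoint(autojacvec = ReverseDiffVJP(true)),
)
```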
Many thanks (tusen takk) @torfjelde, I have changed … Has anyone ever successfully fitted an ODE model larger than the Lotka-Volterra model (say 10 compartments) with Turing? I'd be very interested to see this code and learn from it.
Haha, that caught me off guard! No problem :)
This is indeed the case for Zygote.jl, but ReverseDiff.jl should be able to handle it. But even if there is mutation within your ODE, I'm pretty certain that the adjoints/gradients defined for …

I recently helped someone with speeding up DiffEq + Turing, so I'll have a crack at this and get back to you. I'm curious myself :)
Okay, so after trying different optimizations I think I realized why it's taking you up to 30 mins to get only 100 samples: you're actually running 5000 + 100 iterations, haha 😅 It took me way longer than I'd like to admit to realize this.

Anyways, the following is using the model from … Using …, as you can see, 1000 samples + 500 adaptation = 1500 iterations in total takes about 100s, which ain't too shabby if I might say so myself. Notice that even the ESS and R̂ values are looking pretty decent with just 1500 iterations.

@model function turingmodel(data, theta_fix, u0, problem, solvsettings)
# Priors
ψ ~ Beta(1,5)
ρ ~ Uniform(0.0,1.0)
β ~ Uniform(0.0,1.0)
η ~ Uniform(0.0,1.0)
ω ~ Uniform(1.0, 3.0*365.0)
φ ~ Uniform(0.0,364.0)
theta_est = [β,η,ω,φ]
p_new = @LArray vcat(theta_est, theta_fix) (:β, :η, :ω, :φ, :σ, :μ, :δ2, :γ1, :g2)
# Update problem and solve ODEs
problem_new = remake(problem, p=p_new, u0=eltype(p_new).(u0))
sol_new = Array(solve(
problem_new,
solvsettings.solver,
abstol=solvsettings.abstol,
reltol=solvsettings.reltol,
isoutofdomain=(u,p,t)->any(<(0),u),
save_idxs=9,
saveat=solvsettings.saveat,
maxiters=solvsettings.maxiters
))
# Early return if we terminated early due to out-of-domain.
if length(sol_new) - 1 != length(data)
Turing.@addlogprob! -Inf
return nothing
end
incidence = sol_new[2:end] - sol_new[1:(end-1)]
# avoid numerical instability issue
incidence = max.(zero(eltype(incidence)), incidence)
data ~ arraydist(@. NegativeBinomial2(ψ, incidence * ρ))
end
model = turingmodel(data, theta_fix, u0, problem, solvsettings);
# Execute once to ensure that it's working correctly.
results = model();

# Re-instantiate the model with Tsit5 as the solver
# (instead of the stiffness-switching AutoTsit5(Rosenbrock23())).
model = turingmodel(data, theta_fix, u0, problem, merge(solvsettings, (solver = Tsit5(),)));
chain = sample(model, NUTS(), 1_000);
chain
theta_est
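For reference (this call is not in the original transcript), the number of adaptation steps can be made explicit in the `NUTS` constructor, which is what the "1000 samples + 500 adaptation" above corresponds to:

```julia
# Sketch: 500 adaptation steps with target acceptance rate 0.65, followed by
# 1000 retained samples, i.e. 1500 iterations in total.
chain = sample(model, NUTS(500, 0.65), 1_000)
```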
Recovered the true parameters nicely. And just for reference, you can see an example run below + a bunch of versions of the model that I tried out in sequence.

Several sections showing different iterations of the model
Setup

using Pkg
Pkg.activate("/tmp/jl_RuWW8I/")
pkgs = string.([
:DifferentialEquations,
:Turing,
:Distributions,
:Random,
:LabelledArrays,
:Serialization,
:LazyArrays,
:DiffEqSensitivity,
:ModelingToolkit,
:RecursiveArrayTools,
:BenchmarkTools,
:Zygote,
:ReverseDiff,
:ForwardDiff,
:Tracker,
:Memoization,
:StaticArrays
])
Pkg.add(pkgs)
using DifferentialEquations
using Turing
using Distributions
using Random
using LabelledArrays
using Serialization
using LazyArrays
using DiffEqSensitivity
using ModelingToolkit
using RecursiveArrayTools
using StaticArrays
# Benchmarking related stuff.
using BenchmarkTools
# Use packages to ensure that we trigger Requires.jl.
using Zygote: Zygote
using ReverseDiff: ReverseDiff
using ForwardDiff: ForwardDiff
using Tracker: Tracker
using Memoization: Memoization # used for ReverseDiff.jl cache.
using Turing.Core: ForwardDiffAD, ReverseDiffAD, TrackerAD, ZygoteAD, CHUNKSIZE
const DEFAULT_ADBACKENDS = [
ForwardDiffAD{40}(), # chunksize=40
ForwardDiffAD{100}(), # chunksize=100
TrackerAD(),
ZygoteAD(),
ReverseDiffAD{false}(), # rdcache=false
ReverseDiffAD{true}() # rdcache=true
]
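As an aside (not part of the original gist), these backends correspond to Turing's global AD switches when actually sampling rather than benchmarking; a sketch using the Turing 0.x API that this thread relies on:

```julia
# Selecting the AD backend globally before calling `sample`:
Turing.setadbackend(:forwarddiff)   # ForwardDiffAD
Turing.setadbackend(:reversediff)   # ReverseDiffAD{false}
Turing.setrdcache(true)             # ReverseDiffAD{true}: compile and memoize the tape
```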
# This is a piece of code I often use to benchmark models. It'll likely make its
# way into Turing.jl soonish.
# https://gist.github.com/torfjelde/7794c384d82d03c36625cd25b702b8d7
"""
make_turing_suite(model; kwargs...)
Create default benchmark suite for `model`.
# Keyword arguments
- `adbackends`: a collection of adbackends to use. Defaults to `$(DEFAULT_ADBACKENDS)`.
- `run_once=true`: if `true`, the body of each benchmark will be run once beforehand so that
compilation is not included in the timings (which can otherwise happen if compilation runs
longer than the allowed time limit).
- `save_grads=false`: if `true` and `run_once` is `true`, the gradients from the initial
execution will be saved and returned as the second return value. This is useful if you
want to check correctness of the gradients for different backends.
# Notes
- A separate "parameter" instance (`DynamicPPL.VarInfo`) will be created for _each test_.
Hence if you have a particularly large model, you might want to only pass one `adbackend`
at a time.
"""
function make_turing_suite(
model, vi_orig=DynamicPPL.VarInfo(model);
adbackends=DEFAULT_ADBACKENDS,
run_once=true,
save_grads=false
)
suite = BenchmarkGroup()
suite["not_linked"] = BenchmarkGroup()
suite["linked"] = BenchmarkGroup()
grads = Dict(:not_linked => Dict(), :linked => Dict())
spl = DynamicPPL.SampleFromPrior()
for adbackend in adbackends
vi = DynamicPPL.VarInfo(model)
vi[spl] = deepcopy(vi_orig[spl])
if run_once
ℓ, ∇ℓ = Turing.Core.gradient_logp(
adbackend,
vi[spl],
vi,
model,
spl
)
if save_grads
grads[:not_linked][adbackend] = (ℓ, ∇ℓ)
end
end
suite["not_linked"]["$(adbackend)"] = @benchmarkable $(Turing.Core.gradient_logp)(
$adbackend,
$(vi[spl]),
$vi,
$model,
$spl
)
# Need a separate `VarInfo` for the linked version since otherwise we risk the
# `vi` from above being mutated.
vi_linked = deepcopy(vi)
DynamicPPL.link!(vi_linked, spl)
if run_once
ℓ, ∇ℓ = Turing.Core.gradient_logp(
adbackend,
vi_linked[spl],
vi_linked,
model,
spl
)
if save_grads
grads[:linked][adbackend] = (ℓ, ∇ℓ)
end
end
suite["linked"]["$(adbackend)"] = @benchmarkable $(Turing.Core.gradient_logp)(
$adbackend,
$(vi_linked[spl]),
$vi_linked,
$model,
$spl
)
end
return save_grads ? (suite, grads) : suite
end
function test_gradient(model, adbackend, vi=DynamicPPL.VarInfo(model))
spl = DynamicPPL.SampleFromPrior()
return Turing.Core.gradient_logp(
adbackend,
vi[spl],
vi,
model,
spl
)
end
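A usage sketch (not in the original gist) of the `save_grads` option defined above, for checking that the backends agree once a `model` has been instantiated further down:

```julia
# Build the suite and keep the gradients from the initial run of each backend.
suite, grads = make_turing_suite(
    model;
    adbackends = (ForwardDiffAD{40}(), ReverseDiffAD{false}()),
    save_grads = true
)
# Compare ForwardDiff against ReverseDiff on the unlinked parameterisation;
# `grads[:not_linked][adbackend]` is a `(logp, gradient)` tuple.
grads[:not_linked][ForwardDiffAD{40}()][2] ≈ grads[:not_linked][ReverseDiffAD{false}()][2]
```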
# ODE model
function SEIRS2!(du,u,p,t)
# states
(S1, E1, I1, R1, S2, E2, I2, R2) = u[1:8]
N1 = S1 + E1 + I1 + R1
N2 = S2 + E2 + I2 + R2
N = N1 + N2
# params
β = p.β
η = p.η
φ = p.φ
ω = 1.0/p.ω
μ = p.μ
σ = p.σ
γ1 = p.γ1
γ2 = γ1 / p.g2
δ2 = p.δ2
# FOI
βeff = β * (1.0+η*cos(2.0*π*(t-φ)/365.0))
λ1 = βeff*(I1/N1 + I2/N2)
λ2 = λ1 * δ2
# change in states
du[1] = μ*N - λ1*S1 - μ*S1
du[2] = λ1*S1 - σ*E1 - μ*E1
du[3] = σ*E1 - γ1*I1 - μ*I1
du[4] = γ1*I1 - ω*R1 - μ*R1
du[5] = ω*(R1 + R2) - λ2*S2 - μ*S2
du[6] = λ2*S2 - σ*E2 - μ*E2
du[7] = σ*E2 - γ2*I2 - μ*I2
du[8] = γ2*I2 - ω*R2 - μ*R2
du[9] = (σ*(E1 + E2)) # cumulative incidence
end
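As a side note (this sketch is not part of the original benchmarks): Zygote.jl cannot differentiate through the in-place `du[...] = ...` writes above, so a mutation-free, out-of-place variant returning an `SVector` (with `u0` as an `SVector` and `ODEProblem(SEIRS2, ...)`) is one way to experiment with it. The equations below simply restate `SEIRS2!`.

```julia
# Mutation-free version of the dynamics, returning the derivatives as an SVector.
function SEIRS2(u, p, t)
    S1, E1, I1, R1, S2, E2, I2, R2 = u  # 9th state (cumulative incidence) not needed here
    N1 = S1 + E1 + I1 + R1
    N2 = S2 + E2 + I2 + R2
    N = N1 + N2
    β, η, φ, μ, σ, γ1, δ2 = p.β, p.η, p.φ, p.μ, p.σ, p.γ1, p.δ2
    ω = 1.0 / p.ω
    γ2 = γ1 / p.g2
    βeff = β * (1.0 + η * cos(2.0 * π * (t - φ) / 365.0))
    λ1 = βeff * (I1 / N1 + I2 / N2)
    λ2 = λ1 * δ2
    return SVector(
        μ*N - λ1*S1 - μ*S1,
        λ1*S1 - σ*E1 - μ*E1,
        σ*E1 - γ1*I1 - μ*I1,
        γ1*I1 - ω*R1 - μ*R1,
        ω*(R1 + R2) - λ2*S2 - μ*S2,
        λ2*S2 - σ*E2 - μ*E2,
        σ*E2 - γ2*I2 - μ*I2,
        γ2*I2 - ω*R2 - μ*R2,
        σ*(E1 + E2),  # cumulative incidence
    )
end
```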
# observation model
function NegativeBinomial2(ψ, incidence; check_args=true)
p = 1.0/(1.0 + ψ*incidence)
r = 1.0/ψ
return NegativeBinomial(r, p; check_args=check_args)
end
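A quick sanity check of this parameterisation (added here, not in the original thread): `NegativeBinomial2(ψ, m)` has mean `m` and variance `m + ψ*m^2`, so `ψ` acts as an overdispersion parameter.

```julia
# Verify mean/variance of the reparameterised negative binomial.
let ψ = 0.15, m = 100.0
    d = NegativeBinomial2(ψ, m)
    @assert mean(d) ≈ m           # r*(1-p)/p = m
    @assert var(d) ≈ m + ψ * m^2  # mean/p = m*(1 + ψ*m)
end
```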
# Solver settings
tmin = 0.0; tmax = 20.0*365.0; tspan = (tmin, tmax)
solvsettings = (
abstol = 1.0e-3,
reltol = 1.0e-3,
saveat = 7.0,
solver = AutoTsit5(Rosenbrock23()),
maxiters = 1e6
)
# Parameters (fixed and to-be-estimated) and initial conditions
theta_fix = [1.0/4.98, 1.0/(80*365), 0.89, 1/6.16, 0.87]
theta_est = [0.15, 0.001, 0.28, 0.07, 365.0, 180.0]
parnames = (:ψ, :ρ, :β, :η, :ω, :φ, :σ, :μ, :δ2, :γ1, :g2)
p = @LArray [theta_est; theta_fix] parnames
u0 = [200_000.0,1000.0,1000.0,300_000.0, 500_000.0, 1000.0,1000.0, 296_000, 2000.0]
# Initiate ODE problem
problem = ODEProblem(SEIRS2!,u0,tspan,p)
# Solve
sol = solve(
problem,
solvsettings.solver,
abstol=solvsettings.abstol,
reltol=solvsettings.reltol,
isoutofdomain=(u,p,t)->any(x->x<0.0,u),
save_idxs=9,
saveat=solvsettings.saveat
);
foo = (sol[2:end] - sol[1:(end-1)]) .* p.ρ;
data = rand.(NegativeBinomial2.(p.ψ, foo));

Initial run

Same model as in the original issue (with the exception of the if-statement at the end).

@model function turingmodel(data, theta_fix, u0, problem, parnames, solvsettings)
# Priors
ψ ~ Beta(1,5)
ρ ~ Uniform(0.0,1.0)
β ~ Uniform(0.0,1.0)
η ~ Uniform(0.0,1.0)
ω ~ Uniform(1.0, 3.0*365.0)
φ ~ Uniform(0.0,364.0)
theta_est = [β,η,ω,φ]
p_new = @LArray vcat(theta_est, theta_fix) parnames[3:end]
# Update problem and solve ODEs
problem_new = remake(problem, p=p_new, u0=eltype(p_new).(u0))
sol_new = solve(
problem_new,
solvsettings.solver,
abstol=solvsettings.abstol,
reltol=solvsettings.reltol,
isoutofdomain=(u,p,t)->any(<(0),u),
save_idxs=9,
saveat=solvsettings.saveat,
maxiters=solvsettings.maxiters
)
incidence = sol_new[2:end] - sol_new[1:(end-1)]
# avoid numerical instability issue
incidence = max.(0.0, incidence)
data ~ arraydist(LazyArray(@~ @. NegativeBinomial2(ψ, incidence * ρ)))
end
model = turingmodel(data, theta_fix, u0, problem, parnames, solvsettings);
# Execute once to ensure that it's working correctly.
results = model();

Since the execution is non-deterministic, we need to use the same set of parameters in the benchmarking of the different models.

var_info = DynamicPPL.VarInfo(model);
test_gradient(model, ForwardDiffAD{40}(), var_info)
# ReverseDiff.jl without tape-compilation (i.e. `rdcache=false`).
test_gradient(model, ReverseDiffAD{false}(), var_info)
test_gradient(model, ZygoteAD(), var_info)
All right, seems like Zygote isn't too happy about the usage of LazyArrays.jl. This could maybe be addressed by defining an adjoint for `LazyArray`, but let's leave that for now. Even though ReverseDiff.jl works, we're not going to bother benchmarking it since we can't use the compiled tape due to the conditional statements in the model, i.e. we're only going to benchmark ForwardDiff.

suite = make_turing_suite(
model, var_info,
adbackends=(ForwardDiffAD{40}(), )
);
benchmarks = run(suite, seconds=10);
benchmarks
The minimum:

minimum(benchmarks["linked"]["ForwardDiffAD{40}()"])
Fix type-instability in the computation of `incidence`
This is absolutely amazing, thanks so much @torfjelde, you saved my research project. This was a toy model; my actual model is a bit larger (17 compartments), but the code was basically the same. The time saved by changing the out-of-domain statement plus the type stability for the incidence is incredible. PS: The 5000 burn-in was a typo 🤪 sorry.
Great, really glad to hear! :) Regarding dropping the …
So just keep that in mind 👍 What you could do is add the following line to the end of your model: `return sol_new`. Then, after you're done sampling, just to check that nothing strange occurred, you can run … But if you do end up dropping the …
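The exact call is cut off in the comment above; one way to do this kind of check (an assumption on my part, using DynamicPPL's `generated_quantities`) would be:

```julia
# With `return sol_new` at the end of the model, recover the return value for
# every posterior draw and look for anything suspicious, e.g. early terminations.
sols = generated_quantities(model, chain)
```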
Haha, no worries at all. I should have thought of that immediately. I was really confused as to why it was taking so long on your end, and I had to actually run a complete sample before I noticed the …
Hi everyone
I have a small ODE model which runs fine (albeit slowly) with Turing's forwarddiff AD. However, I cannot get it to run with reversediff (nor with Zygote). It throws a MethodError (see below). Is this some type problem, maybe related to the parameters being an array? What would I need to change? TIA
PS: I also posted this in the Julia Forum.
Error:
Code: