Sean L. Wu (@slwu89) and Simon Frost (@sdwfrost), 2023-5-9
SDDP.jl (stochastic dual dynamic programming) is a package designed to compute optimal policies in multi-stage (time or world state) linear programming problems with exogenous stochasticity. We can use it to optimize the policy for a non-pharmaceutical intervention which decreases the transmission rate.
Because SDDP.jl solves an optimization problem for each node in a graph of nodes (which may represent the passage of time, or other changes in world state), the model we solve is a discretization of the following ODEs, where υ is the intervention intensity:

$$
\begin{aligned}
\frac{dS}{dt} &= -\beta (1 - \upsilon) S I, \\
\frac{dI}{dt} &= \beta (1 - \upsilon) S I - \gamma I, \\
\frac{dC}{dt} &= \beta (1 - \upsilon) S I.
\end{aligned}
$$

The minimization objective at each node (time point) is a linear combination of the cumulative intervention applied and the cumulative cases. The total cumulative intervention force applied cannot exceed some maximum value. The decision variable is the intensity of the intervention at each time point (node).
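As an illustrative sketch of ours (not part of the model code), the discretization converts the continuous rates into per-step fractions via 1 - exp(-rate * δt), which always lies in [0, 1], so the state updates cannot produce negative compartments:

```julia
# One step of the discretized dynamics, using the same parameter values
# as below; υ = 0 means no intervention is applied.
β, γ, δt = 0.5, 0.25, 1.0
S, I = 0.99, 0.01
υ = 0.0
infection = (1 - exp(-(1 - υ) * β * I * δt)) * S  # new infections this step
recovery = (1 - exp(-γ * δt)) * I                 # recoveries this step
S_next = S - infection
I_next = I + infection - recovery
```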
using SDDP, JuMP, Ipopt, Plots;
We set the parameters, which include the maximum intervention level at any node, υ_max, and the maximum cost, which is the integral of the intervention level over time, υ_total.
β = 0.5 # infectivity rate
γ = 0.25 # recovery rate
υ_max = 0.5 # maximum intervention
υ_total = 10.0; # maximum cost
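As a sanity check (our own aside, not from the original analysis), the basic reproduction number β/γ = 2 implies that an intervention intensity above 1 - 1/R₀ = 0.5 is needed to push the effective reproduction number below one, so υ_max sits exactly at that threshold:

```julia
# Critical intervention intensity for the parameters above.
β, γ = 0.5, 0.25
R₀ = β / γ            # basic reproduction number without intervention
υ_crit = 1 - 1 / R₀   # minimum intensity bringing (1 - υ) * R₀ below 1
```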
We set the time horizon to be long enough for the system to settle down to an equilibrium. We use a grid of timepoints fine enough to capture a wide variety of policy shapes, but coarse enough to keep the number of policy parameters to optimize low.
tmax = 100.0
δt = 1.0
nsteps = Int(tmax / δt);
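To check that this horizon is indeed long enough, a plain-Julia run of the uncontrolled (υ = 0) discretized dynamics (a sketch of ours, independent of SDDP.jl) shows the epidemic has burnt out well before tmax:

```julia
# Run the discretized SIR dynamics without intervention; assumes the same
# exponential per-step updates as the SDDP model below.
function run_sir(; β = 0.5, γ = 0.25, δt = 1.0, nsteps = 100)
    S, I, C = 0.99, 0.01, 0.0
    for _ in 1:nsteps
        infection = (1 - exp(-β * I * δt)) * S
        recovery = (1 - exp(-γ * δt)) * I
        S -= infection
        I += infection - recovery
        C += infection
    end
    return S, I, C
end
S_end, I_end, C_end = run_sir()  # I_end is close to zero by t = 100
```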
We set the initial conditions for the proportion of susceptibles and infecteds.
u0 = [0.99, 0.01]; # S,I
We specify a model using SDDP.LinearPolicyGraph. Because the nodes in the policy graph represent the passage of time, we use a linear policy graph. We set the optimizer to the one from Ipopt, as the model is nonlinear.
We set S, I, and C to be SDDP.State variables, meaning that the values from the previous node in the policy graph will be available to the current node. We specify two constraints on the intervention. While the second constraint is mathematically the same as specifying υ_cumulative.out ≤ υ_total, we must write it in the form shown so that υ appears in the constraint.
We then set up the differences as non-linear expressions and the update rules as non-linear constraints.
Finally, we use @stageobjective to set the minimization objective for this node to be a linear combination of total intervention pressure and cumulative cases.
model = SDDP.LinearPolicyGraph(
    stages = nsteps,
    sense = :Min,
    lower_bound = 0,
    optimizer = Ipopt.Optimizer,
) do sp, t
    @variable(sp, 0 ≤ S, SDDP.State, initial_value = u0[1])
    @variable(sp, 0 ≤ I, SDDP.State, initial_value = u0[2])
    @variable(sp, 0 ≤ C, SDDP.State, initial_value = 0)
    @variable(sp, 0 ≤ υ_cumulative, SDDP.State, initial_value = 0)
    @variable(sp, 0 ≤ υ ≤ υ_max)
    # constraints on control
    @constraint(sp, υ_cumulative.out == υ_cumulative.in + (δt * υ))
    @constraint(sp, υ_cumulative.in + (δt * υ) ≤ υ_total)
    # expressions to simplify the state updates
    @NLexpression(sp, infection, (1 - exp(-(1 - υ) * β * I.in * δt)) * S.in)
    @NLexpression(sp, recovery, (1 - exp(-γ * δt)) * I.in)
    # state updating rules
    @NLconstraint(sp, S.out == S.in - infection)
    @NLconstraint(sp, I.out == I.in + infection - recovery)
    @NLconstraint(sp, C.out == C.in + infection)
    # linear weighting of objectives
    @stageobjective(sp, υ_cumulative.out + 40 * C.out)
end;
We train the model for 50 iterations. SDDP.jl iterates between forward passes over the policy graph, where the policy is optimized given an approximation of the overall objective for each node, and backward passes to improve that approximation.
SDDP.train(model; iteration_limit = 50, print_level = 0);
After the model has been trained, we can simulate from the model under the final optimal policy. The second argument is the number of trajectories to draw (because the model is deterministic, a single trajectory will suffice). The third argument is the variables to record during simulation.
sims = SDDP.simulate(model, 1, [:S, :I, :C, :υ, :υ_cumulative]);
We can use the plotting utilities of SDDP.jl to show the optimal policy and state variables.
Plots.plot(
    SDDP.publication_plot(sims, title = "S") do data
        return data[:S].out
    end,
    SDDP.publication_plot(sims, title = "I") do data
        return data[:I].out
    end,
    SDDP.publication_plot(sims, title = "C") do data
        return data[:C].out
    end,
    SDDP.publication_plot(sims, title = "Control") do data
        return data[:υ]
    end,
    SDDP.publication_plot(sims, title = "Cumulative control") do data
        return data[:υ_cumulative].out
    end;
    xlabel = "Time"
)