First pass on enhancing the vignette

UofUEpiBio · Dec 11, 2024 · 4011c4b · 4011c4b
1 parent 6c6568d
commit 4011c4b
Showing 1 changed file with 58 additions and 34 deletions.
diff --git a/vignettes/likelihood-free-mcmc.Rmd b/vignettes/likelihood-free-mcmc.Rmd
@@ -18,13 +18,16 @@ knitr::opts_chunk$set(
 ```
 # Introduction
 The purpose of the "lfmcmc" function is to perform a Likelihood-Free Markhov Chain Monte Carlo simulation.
+In this example, we use LFMCMC to recover the parameters of an SIR model.
 
-# Example: Using LFMCMC to calibrate an SIR Model
+# Setup the SIR Model
 
-## Setup and Running the Model
-Create an SIR Model and add a small world population.
-Then, run the model.
-```{r sir-setup}
+This example uses a social network with parameters listed within the
+`ModelSIR` function.
+The virus name is specified (COVID-19), 1000 agents are initialized, the virus prevalence of 0.1 is declared, the transmission rate for any given agent is 0.3, and the recovery rate is set to 0.3.
+To create this model in epiworldR, use the `ModelSIR` and `agents_smallworld` functions.
+
+```{r setup-sir}
 library(epiworldR)
 
 model_seed <- 122
@@ -43,7 +46,11 @@ agents_smallworld(
   d = FALSE,
   p = 0.01
 )
+```
 
+# Run the Model
+We run the model for 50 days and print the summary statistics.
+```{r run-sir}
 verbose_off(model_sir)
 
 run(
@@ -54,14 +61,25 @@ run(
 
 summary(model_sir)
 ```
+For this example, we will try to recover the Recovery and Transmission rate parameters with LFMCMC.
+In most cases, you'd use an observed dataset, but we're using the simulated data generated by running the model to show we can recover the parameters through LFMCMC.
+```{r get-obs-data}
+sir_model_data <- get_today_total(model_sir)
+```
 
-## Setup LFMCMC
-```{r lfmcmc-setup}
-# Extract the observed data from the model
-obs_data <- get_today_total(model_sir)
+# Setup LFMCMC
+Now that we have our dataset, we can set up the LFMCMC object.
+LFMCMC requires four functions:
+
+* Simulation function: This runs an epiworld SIR model with a given set of parameters and outputs a simulated dataset from the model run. In our case, we are setting the Recovery and Transmission rate parameters and running the model for 50 days, then returning the final totals for each SIR state. If you were using an observed dataset, this function should return output that is has the same structure as the observed data (e.g., dimensions, value types, etc.)
+* Summary function: This extracts summary statistics from the dataset. This should produce the same output format for both the observed data and the simulation function output. In our case, since we are using the final SIR state totals as our "observed" data which is the same as the output of the simulation function, our summary function simply passes that data through.
+* Proposal function: This returns a new set of parameters. It is called the proposal function, because it is "proposing" the new parameters for the LFMCMC algorithm to try in the simulation function. It does this by taking the parameters from the previous run (`old_params`) and taking a random step away from those values.
+* Kernel function: This scores the results of the latest simulation run against the observed dataset, by comparing the summary statistics (from `summary_fun()`). LFMCMC uses the kernel score and the Hastings Ratio to determine whether or not to accept that run.
+
+We define our LFMCMC functions:
 
-# Define the LFMCMC simulation function
-simfun <- function(params) {
+```{r lfmcmc-fun-setup}
+simulation_fun <- function(params) {
 
   set_param(model_sir, "Recovery rate", params[1])
   set_param(model_sir, "Transmission rate", params[2])
@@ -75,50 +93,56 @@ simfun <- function(params) {
 
 }
 
-# Define the LFMCMC summary function
-sumfun <- function(dat) {
+summary_fun <- function(dat) {
   return(dat)
 }
 
-# Define the LFMCMC proposal function
-propfun <- function(old_params) {
+proposal_fun <- function(old_params) {
   res <- plogis(qlogis(old_params) + rnorm(length(old_params)))
   return(res)
 }
 
-# Define the LFMCMC kernel function
-kernelfun <- function(simulated_stats, observed_stats, epsilon) {
+kernel_fun <- function(simulated_stats, observed_stats, epsilon) {
   dnorm(sqrt(sum((simulated_stats - observed_stats)^2)))
 }
+```
 
-# Create the LFMCMC model
+Then we initialize the LFMCMC object using the `LFMCMC` function in epiworldR and the various setters for the four LFMCMC functions and the observed data.
+```{r lfmcmc-setup}
 lfmcmc_model <- LFMCMC(model_sir) |>
-  set_simulation_fun(simfun) |>
-  set_summary_fun(sumfun) |>
-  set_proposal_fun(propfun) |>
-  set_kernel_fun(kernelfun) |>
-  set_observed_data(obs_data)
+  set_simulation_fun(simulation_fun) |>
+  set_summary_fun(summary_fun) |>
+  set_proposal_fun(proposal_fun) |>
+  set_kernel_fun(kernel_fun) |>
+  set_observed_data(sir_model_data)
 ```
 
-## Run LFMCMC simulation
+# Run LFMCMC simulation
+We set the initial parameters of the simulation with Recovery rate of 0.1 and a Transmission rate of 0.5.
+We will run LFMCMC for 2,000 iterations and use a kernel epsilon of 1.0.
+Now, we can run the LFMCMC simulation with the `run_lfmcmc()` function in epiworldR.
+
 ```{r lfmcmc-run}
-# Set initial parameters
-par0 <- c(0.1, 0.5)
-n_samp <- 2000
-epsil <- 1.0
+initial_params <- c(0.1, 0.5)
+n_samples <- 2000
+epsilon <- 1.0
 
-# Run the LFMCMC simulation
 run_lfmcmc(
   lfmcmc = lfmcmc_model,
-  params_init_ = par0,
-  n_samples_ = n_samp,
-  epsilon_ = epsil,
+  params_init_ = initial_params,
+  n_samples_ = n_samples,
+  epsilon_ = epsilon,
   seed = model_seed
 )
-
-# Print the results
+```
+# Print results
+We set the name of the LFMCMC statistics using the state names of the SIR model and set the names of the estimated parameters.
+Then we print the results of the simulation.
+```{r lfmcmc-print}
 set_stats_names(lfmcmc_model, get_states(model_sir))
 set_params_names(lfmcmc_model, c("Immune recovery", "Infectiousness"))
 
 print(lfmcmc_model)
 ```
+LFMCMC didn't perfectly recover the initial model parameters, but it did produce reasonably close estimates.
+If the dataset were observed and the initial Recovery and Transmission rates unknown, we could estimate them using LFMCMC.