A PyMC and Bambi implementation of the algorithms from:
Sean Talts, Michael Betancourt, Daniel Simpson, Aki Vehtari, Andrew Gelman: “Validating Bayesian Inference Algorithms with Simulation-Based Calibration”, 2018; arXiv:1804.06788
Many thanks to the authors for providing open, reproducible code and implementations in rstan and PyStan (link).
May be pip installed from github:

```bash
pip install git+https://github.com/ColCarroll/simulation_based_calibration
```
- Define a PyMC model, wrapping the observed data in `pm.MutableData` so that simulated datasets can be swapped in for each run.
```python
import numpy as np
import pymc as pm

# Eight schools data: observed treatment effects and their standard errors
data = np.array([28.0, 8.0, -3.0, 7.0, -1.0, 1.0, 18.0, 12.0])
sigma = np.array([15.0, 10.0, 16.0, 11.0, 9.0, 11.0, 10.0, 18.0])

with pm.Model() as centered_eight:
    # Wrap the observations so they can be replaced with simulated data
    obs = pm.MutableData("obs", data)
    mu = pm.Normal('mu', mu=0, sigma=5)
    tau = pm.HalfCauchy('tau', beta=5)
    theta = pm.Normal('theta', mu=mu, sigma=tau, shape=8)
    y_obs = pm.Normal('y', mu=theta, sigma=sigma, observed=obs)
```
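Wrapping the observations in `pm.MutableData` is what makes the resampling work: simulated data can be substituted without rebuilding the model. A minimal sketch of the mechanism (this is standard PyMC, not the library's internal code, and the replacement values are purely illustrative):

```python
# Swap in a new dataset for the MutableData container "obs"
with centered_eight:
    pm.set_data({"obs": np.zeros(8)})  # illustrative replacement data
```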
- Run the simulations:
```python
from simulation_based_calibration import SBC

sbc = SBC(centered_eight, {'obs': 'y'},
          num_simulations=100,  # ideally this should be higher, like 1000
          sample_kwargs={'draws': 25, 'tune': 50})
sbc.run_simulations()
```
```
79%|███████▉ | 79/100 [05:36<01:29, 4.27s/it]
```
- Plot the empirical CDFs of the rank statistics (the rank of each prior draw among its posterior draws). For a well-calibrated sampler the ranks are uniform, so each line should stay close to the uniform CDF and within the oval envelope.
```python
sbc.plot_results()
```
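For intuition about what this plot shows (the sketch below is not the library's plotting code, and all names in it are illustrative): if each of N simulations yields a rank statistic taking values 0 through L, the empirical CDF of the ranks is compared against the uniform CDF, and a calibrated sampler keeps the difference near zero.

```python
# Standalone sketch of the ECDF-difference idea, assuming N rank
# statistics taking values in {0, ..., L}
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
L, N = 100, 1000
ranks = rng.integers(0, L + 1, size=N)  # stand-in for well-calibrated ranks

grid = np.arange(L + 1)
ecdf = (ranks[:, None] <= grid).mean(axis=0)  # empirical CDF of the ranks
uniform_cdf = (grid + 1) / (L + 1)            # CDF of the discrete uniform

plt.step(grid, ecdf - uniform_cdf)            # should hover near zero
plt.axhline(0.0, color="gray", linestyle="--")
plt.xlabel("rank")
plt.ylabel("ECDF - uniform CDF")
plt.show()
```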
The paper on the arXiv is well written and explains the algorithm clearly. Morally, the example below is exactly what this library does; the library just generalizes it to more complicated models:
```python
def my_model(y=None):
    with pm.Model() as model:
        x = pm.Normal('x')
        pm.Normal('y', mu=x, observed=y)
    return model
```
Then what this library does is compute:
```python
num_trials = 100  # number of simulated datasets

with my_model():
    prior_samples = pm.sample_prior_predictive(num_trials)

simulations = {'x': []}
for idx in range(num_trials):
    # Draw (x~, y~) from the joint prior
    y_tilde = prior_samples.prior['y'].values[0, idx]
    x_tilde = prior_samples.prior['x'].values[0, idx]
    # Refit conditional on y~, then record the rank of x~
    # among the posterior draws
    with my_model(y=y_tilde):
        trace = pm.sample()
    simulations['x'].append((trace.posterior['x'].values < x_tilde).sum())
```
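Given those rank counts, the simplest diagnostic is a histogram, which should look uniform for a calibrated sampler; a U-shape, a central hump, or mass piled at either end each signal a specific kind of miscalibration (see the paper). A quick sketch:

```python
# Histogram of the SBC rank statistics; deviations from uniformity
# indicate miscalibration
import matplotlib.pyplot as plt

plt.hist(simulations['x'], bins=20)
plt.xlabel("rank of prior draw among posterior draws")
plt.show()
```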