To install this package, run `pip install metabayes`.
Note: see the demonstration notebook for a more detailed example.
First, import the required modules:

```python
from metabayes import GibbsSampler, Prior
```
Then, you will need to define a `Prior` object to be used in the hierarchical model. Using the default settings:

```python
prior = Prior()
```
After your `Prior` is specified, you can run the model as follows:
```python
gs = GibbsSampler.from_binary_counts(
    control_num_trials,
    control_successes,
    treatment_num_trials,
    treatment_successes,
    prior,
)
gs.run_model()
results = gs.get_posterior_samples()
```
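Here the inputs have one entry per experiment. For instance (hypothetical data; the input variables above are placeholders you define yourself, and list/NumPy-array inputs per test are an assumption, so check the demonstration notebook for the expected format):

```python
import numpy as np

# Hypothetical counts for three experiments (one entry per test).
control_num_trials = np.array([10000, 12000, 9000])
control_successes = np.array([500, 660, 430])
treatment_num_trials = np.array([10000, 12000, 9000])
treatment_successes = np.array([540, 650, 470])
```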
Note that the model currently only supports binomial metrics (e.g., conversion rate).
This repo uses a Bayesian hierarchical model to analyze multiple experiments. The intuition behind this methodology is that the model learns a prior from the experiments included in the dataset. This prior is used to compute a posterior for the true lift of each experiment.
The key results of the model are as follows:
- A posterior distribution for the average lift across all experiments. Intuitively, this quantifies the model's "best guess" for the true lift of a new experiment not in the dataset but belonging to the same category of tests included in the dataset.
- A posterior distribution for the standard deviation of the true lifts across the experiments. Intuitively, this quantifies variation in true effects across experiments. Unlike a purely "fixed effects" approach, the hierarchical model recognizes that the true effect of each experiment can be different. If the model concludes that the true effect is likely consistent across all tests, the posterior distribution for the effect standard deviation will be small. If the model concludes that the true effect varies significantly across tests, the posterior distribution for the standard deviation of true effects will be large.
- A posterior distribution for the true effect of each test. This provides a useful estimate for each test's lift with an informed prior. In this sense, the model intelligently uses data from other tests to inform the lift estimate of a given test (via the informed prior), but it also recognizes that each test can have a unique true effect.
For a set of $N$ experiments, the observed data for the $i\text{th}$ test are:

- The number of users who converted in the control variant of the $i\text{th}$ test, $y_{i, c}$
- The total number of users in the control variant of the $i\text{th}$ test, $n_{i, c}$
- The number of users who converted in the treatment variant of the $i\text{th}$ test, $y_{i, t}$
- The total number of users in the treatment variant of the $i\text{th}$ test, $n_{i, t}$
From these counts we compute the observed conversion rates,

$$\hat{p}_{i, c} = \frac{y_{i, c}}{n_{i, c}}, \qquad \hat{p}_{i, t} = \frac{y_{i, t}}{n_{i, t}},$$

where $\hat{p}_{i, c}$ and $\hat{p}_{i, t}$ are estimates of the true conversion rates of the control and treatment variants. The observed lift of the $i\text{th}$ test is then

$$\hat{\delta}_i = \hat{p}_{i, t} - \hat{p}_{i, c},$$

where $\hat{\delta}_i$ is an estimate of the true lift $\delta_i$. By the Central Limit Theorem, each observed lift is approximately normally distributed around the corresponding true lift,

$$\hat{\delta}_i \mid \delta_i \sim \mathcal{N}\left(\delta_i, \sigma_i^2\right),$$

where $\sigma_i^2$ is the sampling variance of the observed lift, which can be estimated directly from the observed counts. For reasons that will become clearer later, we have also defined the precision of each observed lift, $\tau_i = 1 / \sigma_i^2$.
These equations give us well-defined statistical descriptions of how the observed lifts are distributed given the underlying true lifts. In other words, the likelihood of our Bayesian model is fully defined:

$$p\left(\hat{\delta}_1, \dots, \hat{\delta}_N \mid \delta_1, \dots, \delta_N\right) = \prod_{i=1}^{N} \mathcal{N}\left(\hat{\delta}_i \,;\, \delta_i, \tfrac{1}{\tau_i}\right).$$
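As a concrete sketch of these likelihood inputs (plain NumPy, independent of the package; the variable names follow the notation above, and the variance formula assumes the lift is the difference in conversion rates as defined above):

```python
import numpy as np

# Counts for N experiments (one entry per test), using the notation above.
y_c = np.array([500, 660, 430])       # control conversions, y_{i,c}
n_c = np.array([10000, 12000, 9000])  # control totals, n_{i,c}
y_t = np.array([540, 650, 470])       # treatment conversions, y_{i,t}
n_t = np.array([10000, 12000, 9000])  # treatment totals, n_{i,t}

# Observed conversion rates and observed lifts.
p_c_hat = y_c / n_c
p_t_hat = y_t / n_t
lift_hat = p_t_hat - p_c_hat

# Sampling variance of each observed lift (sum of the two binomial
# proportion variances), and the corresponding precision tau_i.
var_i = p_c_hat * (1 - p_c_hat) / n_c + p_t_hat * (1 - p_t_hat) / n_t
tau_i = 1.0 / var_i
```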
Now that we have fully defined the likelihood, let's consider the priors. Given that our likelihood is normal, the natural (conjugate) choice of prior for the true lifts is also normal:

$$\delta_i \sim \mathcal{N}\left(\mu, \frac{1}{\tau}\right),$$

where $\mu$ is the average true lift across experiments (the "meta-lift") and $\tau$ is the precision of the true lifts across experiments. This is the hierarchical part of the model: every test's true lift is drawn from a shared distribution whose parameters are learned from the data.
We could stop here if we wanted: we would have a fully specified Bayesian model with a prior and a likelihood, and we could leave it to users to choose sensible values for $\mu$ and $\tau$. Instead, we treat $\mu$ and $\tau$ as learned parameters and place hyperpriors on them. For $\mu$, we use a normal hyperprior:

$$\mu \sim \mathcal{N}\left(\mu_0, \frac{1}{\tau_0}\right).$$

We use a mean of $\mu_0 = 0$ to reflect agnosticism about the direction of the meta-lift. The precision of this prior, $\tau_0$, controls how strongly it shrinks the meta-lift toward zero; smaller values make the hyperprior weaker.
We also need to define a prior for $\tau$, the precision of the true lifts across experiments. The conjugate choice is a Gamma distribution:

$$\tau \sim \text{Gamma}(\alpha, \beta).$$

Default values for $\alpha$ and $\beta$ are provided by the `Prior` class.
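To make the full generative model concrete, here is a small simulation sketch (plain NumPy; the hyperparameter values and per-test precisions are illustrative placeholders, not the package defaults):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 50  # number of experiments

# Hyperprior parameters (illustrative values, not the package defaults).
mu_0, tau_0 = 0.0, 100.0   # normal hyperprior on the meta-lift mu
alpha, beta = 2.0, 0.0005  # Gamma hyperprior on the precision tau

# Draw the shared parameters, then each test's true lift.
mu = rng.normal(mu_0, 1.0 / np.sqrt(tau_0))
tau = rng.gamma(alpha, 1.0 / beta)  # numpy uses scale = 1 / rate
delta = rng.normal(mu, 1.0 / np.sqrt(tau), size=N)

# Each observed lift is the true lift plus sampling noise with known
# per-test precision tau_i (taken as fixed here for illustration).
tau_i = np.full(N, 4000.0)
lift_hat = rng.normal(delta, 1.0 / np.sqrt(tau_i))
```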
Now that we have fully defined our priors and our likelihood, we can discuss how the posterior distributions are sampled using a Gibbs sampler. The end result is a trace for each learned parameter, which approximates samples from the relevant posterior distribution. In other words, if you plot a histogram of the trace for $\mu$, you get an approximation of the posterior distribution of the meta-lift.
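For instance, assuming `get_posterior_samples()` returns a mapping with the trace for $\mu$ stored under a key like `"mu"` (a hypothetical name; check the demonstration notebook for the actual structure):

```python
import matplotlib.pyplot as plt

mu_trace = results["mu"]  # hypothetical key; see the demo notebook
plt.hist(mu_trace, bins=50, density=True)
plt.xlabel("meta-lift")
plt.ylabel("approximate posterior density")
plt.show()
```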
The Gibbs sampler draws from a closed-form conditional posterior for each learned parameter in turn. In each conditional posterior calculation, we condition on the most recently drawn values of all the other parameters.
To compute the next value in the trace for $\mu$, we use the normal-normal conjugate pair: conditional on the current draws of $\delta_1, \dots, \delta_N$ and $\tau$, the posterior for $\mu$ is normal,

$$\mu \mid \cdot \sim \mathcal{N}\left(\frac{\tau_0 \mu_0 + \tau \sum_{i=1}^{N} \delta_i}{\tau_0 + N\tau}, \; \frac{1}{\tau_0 + N\tau}\right).$$
We again use the normal-normal conjugate pair to update each true lift $\delta_i$: conditional on $\mu$, $\tau$, and the observed lift $\hat{\delta}_i$ (with precision $\tau_i$), the posterior mean is a precision-weighted average of the prior mean and the observation,

$$\delta_i \mid \cdot \sim \mathcal{N}\left(\frac{\tau \mu + \tau_i \hat{\delta}_i}{\tau + \tau_i}, \; \frac{1}{\tau + \tau_i}\right).$$
Finally, we use the normal-gamma conjugate pair to update $\tau$: conditional on $\mu$ and the true lifts $\delta_i$, the posterior for $\tau$ is

$$\tau \mid \cdot \sim \text{Gamma}\left(\alpha + \frac{N}{2}, \; \beta + \frac{1}{2} \sum_{i=1}^{N} (\delta_i - \mu)^2\right).$$
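Putting the three updates together, a minimal NumPy sketch of the sampler loop might look like this (an illustration of the conditional updates above, not the package's actual implementation; `lift_hat` and `tau_i` are the observed lifts and precisions computed earlier, and the hyperparameter defaults are placeholders):

```python
import numpy as np

def gibbs(lift_hat, tau_i, n_iter=5000, mu_0=0.0, tau_0=100.0,
          alpha=2.0, beta=0.0005, seed=0):
    """Minimal Gibbs sampler for the hierarchical model described above."""
    rng = np.random.default_rng(seed)
    N = len(lift_hat)

    # Initialize the learned parameters.
    mu, tau = 0.0, 1.0
    delta = lift_hat.copy()
    trace = {"mu": np.empty(n_iter), "tau": np.empty(n_iter),
             "delta": np.empty((n_iter, N))}

    for s in range(n_iter):
        # 1. Normal-normal update for mu, conditioning on delta and tau.
        prec = tau_0 + N * tau
        mean = (tau_0 * mu_0 + tau * delta.sum()) / prec
        mu = rng.normal(mean, 1.0 / np.sqrt(prec))

        # 2. Normal-normal update for each true lift delta_i: a
        #    precision-weighted average of mu and the observed lift.
        prec_d = tau + tau_i
        mean_d = (tau * mu + tau_i * lift_hat) / prec_d
        delta = rng.normal(mean_d, 1.0 / np.sqrt(prec_d))

        # 3. Normal-gamma update for tau, conditioning on mu and delta.
        shape = alpha + N / 2.0
        rate = beta + 0.5 * np.sum((delta - mu) ** 2)
        tau = rng.gamma(shape, 1.0 / rate)  # numpy parameterizes by scale

        trace["mu"][s], trace["tau"][s], trace["delta"][s] = mu, tau, delta

    return trace
```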