
Python versions

Currently, only Python 3.6+ is supported. Support for Python 3.5 can be added if needed. We do not plan to support Python 2.
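To fail early on an unsupported interpreter, a script could check the Python version before importing the package. This is a minimal sketch using only the standard library; `check_python_version` is a hypothetical helper, not part of posteriordb.

```python
import sys

# Minimum interpreter version stated above (Python 3.6+).
MIN_VERSION = (3, 6)


def check_python_version(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if (major, minor) of version_info is at least minimum.

    Hypothetical helper: posteriordb does not ship this function.
    """
    return tuple(version_info[:2]) >= minimum


if not check_python_version():
    raise RuntimeError("posteriordb requires Python 3.6 or later")
```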

Installation

Currently, only a local install is supported. From the main directory of this project, run

pip install python/posterior_db

Installing from a git URL will be supported soon, and the package will also be published to PyPI at some point.

Using the posterior database from python

The Python package provides convenience functions for accessing the data, model code, and information of individual posteriors in the included database.

First, we create the posterior database object to use; here, it points to the cloned posterior database.

>>> from posteriordb import PosteriorDatabase
>>> import os
>>> pdb_path = os.path.join(os.getcwd(), "posterior_database")
>>> my_pdb = PosteriorDatabase(pdb_path)

The code above assumes that your working directory is the main folder of your copy of this project. Alternatively, you can specify the path to the posterior_database folder directly. To list the available posteriors, use posterior_names().

>>> pos = my_pdb.posterior_names()
>>> pos[:5]

['roaches-roaches_negbin',
 'syn_gmK2D1n200-gmm_diagonal_nonordered',
 'radon_mn-radon_variable_intercept_centered',
 'syn_gmK3D2n300-gmm_nonordered',
 'radon-radon_hierarchical_intercept_centered']

In the same fashion, we can list the models and datasets included in the database:

>>> mn = my_pdb.model_names()
>>> mn[:5]

['gmm_diagonal_nonordered',
 'radon_pool',
 'radon_partial_pool_noncentered',
 'blr',
 'radon_hierarchical_intercept_noncentered']


>>> dn = my_pdb.dataset_names()
>>> dn[:5]

['radon_mn',
 'wells_centered',
 'radon',
 'wells_centered_educ4_interact',
 'wells_centered_educ4']
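The listing methods return plain Python lists of strings, as the outputs above show, so ordinary list operations apply. The sketch below uses the five posterior names printed earlier as a stand-in for my_pdb.posterior_names() and keeps the posteriors fitted to a radon dataset; the prefix test relies on the dataset name coming first in a posterior name.

```python
# Stand-in for my_pdb.posterior_names(): the five names printed above.
pos = [
    "roaches-roaches_negbin",
    "syn_gmK2D1n200-gmm_diagonal_nonordered",
    "radon_mn-radon_variable_intercept_centered",
    "syn_gmK3D2n300-gmm_nonordered",
    "radon-radon_hierarchical_intercept_centered",
]

# Keep posteriors whose dataset name starts with "radon" ("radon", "radon_mn", ...).
radon_posteriors = sorted(p for p in pos if p.startswith("radon"))
print(radon_posteriors)
```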

A posterior's name is made up of the dataset name and the name of the model fitted to that dataset, joined by a hyphen (dataset-model). Together, these two uniquely define a posterior distribution. To access a posterior object, we use the posterior name:

>>> posterior = my_pdb.posterior("eight_schools-eight_schools_centered")
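Because a posterior name joins the dataset name and the model name with a hyphen, both parts can be recovered from the string itself. The helpers below are a hypothetical sketch, not part of posteriordb, and assume the first hyphen is the separator (none of the dataset names listed above contains a hyphen).

```python
def split_posterior_name(name):
    """Split a posterior name into (dataset name, model name).

    Assumes the dataset name contains no hyphen, so the first hyphen
    separates the two parts.
    """
    data_name, sep, model_name = name.partition("-")
    if not sep or not data_name or not model_name:
        raise ValueError("not a 'dataset-model' posterior name: " + repr(name))
    return data_name, model_name


def join_posterior_name(data_name, model_name):
    """Build a posterior name from its dataset and model names."""
    return data_name + "-" + model_name


data_name, model_name = split_posterior_name("eight_schools-eight_schools_centered")
```

With the real database, split_posterior_name(posterior.name) should agree with (data.name, model.name) as shown below.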

From the posterior, we can access the model and the dataset:

>>> model = posterior.model
>>> data = posterior.data

We can also access the names of the posterior, the model, and the dataset:

>>> posterior.name
'eight_schools-eight_schools_centered'

>>> model.name
'eight_schools_centered'

>>> data.name
'eight_schools'

We can also access the same model and dataset directly from the posterior database:

>>> model = my_pdb.model("eight_schools_centered")
>>> data = my_pdb.data("eight_schools")

From the model, we can access the model code and information about the model:

>>> print(model.code("stan"))
data {
  int <lower=0> J; // number of schools
  real y[J]; // estimated treatment
  real<lower=0> sigma[J]; // std of estimated effect
}
parameters {
  real theta[J]; // treatment effect in school j
  real mu; // hyper-parameter of mean
  real<lower=0> tau; // hyper-parameter of sdv
}
model {
  tau ~ cauchy(0, 5); // a non-informative prior
  theta ~ normal(mu, tau);
  y ~ normal(theta , sigma);
  mu ~ normal(0, 5);
}

>>> model.code_file_path("stan")
'/home/eero/posterior_database/content/models/stan/eight_schools_centered.stan'

>>> model.information
{'keywords': ['bda3_example', 'hiearchical'],
 'description': 'A centered hiearchical model for the 8 schools example of Rubin (1981)',
 'urls': ['http://www.stat.columbia.edu/~gelman/arm/examples/schools'],
 'title': 'A centered hiearchical model for 8 schools',
 'references': ['rubin1981estimation', 'gelman2013bayesian'],
 'added_by': 'Mans Magnusson',
 'added_date': '2019-08-12'}

Note that the references are keys of BibTeX entries in content/references/references.bib.
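Since the references are BibTeX keys, one way to check that every key used by a model or dataset has an entry is to collect the keys defined in the .bib file. This is a minimal sketch using a simplified regular expression (it only looks for `@type{key,` headers and is not a full BibTeX parser); `bib_keys` and `missing_references` are hypothetical helpers, not part of posteriordb.

```python
import re

# Matches the citation key in an entry header such as "@book{gelman2013bayesian,".
BIB_KEY_PATTERN = re.compile(r"@\w+\s*\{\s*([^,\s]+)\s*,")


def bib_keys(bib_text):
    """Return the set of citation keys defined in a BibTeX string."""
    return set(BIB_KEY_PATTERN.findall(bib_text))


def missing_references(references, bib_text):
    """Return, sorted, the reference keys that have no entry in bib_text."""
    return sorted(set(references) - bib_keys(bib_text))
```

With a local copy of the project, the check could be run on the text read from content/references/references.bib together with model.information["references"].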

From the dataset, we can access the data values and information about the dataset:

>>> data.values()
{'J': 8,
 'y': [28, 8, -3, 7, -1, 1, 18, 12],
 'sigma': [15, 10, 16, 11, 9, 11, 10, 18]}
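The keys of data.values() match the variables declared in the data block of the Stan model shown earlier (J, y, sigma). The sketch below checks that the values agree with those declarations before they are passed to a sampler; `check_eight_schools_data` is a hypothetical helper written for this example, not part of posteriordb, and the dictionary is copied from the output above.

```python
def check_eight_schools_data(values):
    """Check values against the data block of eight_schools_centered.

    Mirrors the declarations: int<lower=0> J; real y[J];
    real<lower=0> sigma[J]. Raises ValueError on a mismatch.
    """
    J = values["J"]
    if not isinstance(J, int) or J < 0:
        raise ValueError("J must be a non-negative integer")
    for name in ("y", "sigma"):
        if len(values[name]) != J:
            raise ValueError("{} must have length J = {}".format(name, J))
    if any(s < 0 for s in values["sigma"]):
        raise ValueError("sigma must be non-negative")
    return True


# Stand-in for data.values(), copied from the output above.
values = {
    "J": 8,
    "y": [28, 8, -3, 7, -1, 1, 18, 12],
    "sigma": [15, 10, 16, 11, 9, 11, 10, 18],
}
check_eight_schools_data(values)
```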

>>> data.file_path()
'/tmp/tmpx16edu0w'
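data.file_path() returns the path of a file holding the dataset; in the output above it is a temporary file. Assuming that file stores the data as JSON (an assumption, not stated here), it could be read with the standard json module, for example to hand the data to another tool. The round trip below writes the eight schools values to a temporary file as a stand-in and reads them back; `load_data_file` is a hypothetical helper.

```python
import json
import os
import tempfile


def load_data_file(path):
    """Read a data file into a dict, assuming the file contains JSON."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


# Stand-in for the file returned by data.file_path().
values = {
    "J": 8,
    "y": [28, 8, -3, 7, -1, 1, 18, 12],
    "sigma": [15, 10, 16, 11, 9, 11, 10, 18],
}
fd, path = tempfile.mkstemp(suffix=".json")
with os.fdopen(fd, "w", encoding="utf-8") as f:
    json.dump(values, f)

loaded = load_data_file(path)
os.remove(path)  # clean up the temporary stand-in file
```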

>>> data.information
{'keywords': ['bda3_example'],
 'description': 'A study for the Educational Testing Service to analyze the effects of\nspecial coaching programs on test scores. See Gelman et. al. (2014), Section 5.5 for details.',
 'urls': ['http://www.stat.columbia.edu/~gelman/arm/examples/schools'],
 'title': 'The 8 schools dataset of Rubin (1981)',
 'references': ['rubin1981estimation', 'gelman2013bayesian'],
 'added_by': 'Mans Magnusson',
 'added_date': '2019-08-12'}
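Like model.information, data.information is displayed as a plain dictionary, so standard dict access should work on it. The sketch below uses an abridged copy of the output above as a stand-in, checks for a keyword, and builds a one-line summary; `has_keyword` is a hypothetical helper, not part of posteriordb.

```python
# Abridged stand-in for data.information, copied from the output above.
information = {
    "keywords": ["bda3_example"],
    "title": "The 8 schools dataset of Rubin (1981)",
    "references": ["rubin1981estimation", "gelman2013bayesian"],
    "added_by": "Mans Magnusson",
    "added_date": "2019-08-12",
}


def has_keyword(info, keyword):
    """Return True if keyword is listed under info["keywords"]."""
    return keyword in info.get("keywords", [])


summary = "{} (added {} by {})".format(
    information["title"], information["added_date"], information["added_by"]
)
print(summary)
```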

Gold standard posterior draws will be accessible through gold_standard() as follows (NOTE: not implemented yet).

>>> gs = posterior.gold_standard()

NOT_IMPLEMENTED_YET