Genfer (GENerating functions to inFER posteriors) is a tool to compute the exact posterior distribution of discrete probabilistic models, even with infinite support and continuous priors. It is the implementation of the framework from our paper
Zaiser, Murawski, Ong: Exact Bayesian Inference on Discrete Models via Probability Generating Functions: A Probabilistic Programming Approach. NeurIPS 2023 (oral) (arxiv)
The model is specified as a probabilistic program and translated to a generating function representing the posterior distribution. Using automatic differentiation, posterior probabilities and moments can be extracted from this generating function. Further information can be found in the above paper.
To build this tool, you need a Rust installation. We recommend rustup to install Rust.
Having installed Rust, you can build the project using Rust's package and build manager Cargo:
cargo build --release
This may take a minute or two.
The build result is then available at target/release/genfer
.
For development, it may be useful to run cargo build --profile release-dev
, which performs fewer optimizations, resulting in faster compilation but slightly slower binaries.
The build result is then available at target/release-dev/genfer
.
To run exact inference on a probabilistic program path/to/program.sgcl
, use the following command:
target/release/genfer path/to/program.sgcl
You can see a list of command line options as follows:
target/release/genfer --help
- computing posterior moments (expectation, variance, skewness, kurtosis) for continuous and discrete variables
- computing posterior probability masses for discrete variables (limit can be set with
--limit
) - sampling from continuous distributions
- sampling and observing from discrete distributions
- arbitrary precision arithmetic (via
--precision <number>
). This option does not guarantee that the result is correct to that precision, because rounding errors can still compound. But it can reduce the impact of rounding errors compared to the default precision of 53 bits. Together with the--bounds
option, the actual precision of the result can be verified. - interval arithmetic to bound the floating point rounding errors (via
--bounds
) - rational number arithmetic (via
--rational
) to avoid rounding errors when no transcendental functions are used
- observations involving continuous distributions are not supported
- probability densities for posterior distributions of continuous variables cannot be computed
- the running time is exponential in the number of program variables – try to keep this number low by reusing variables! We plan on adding a program transformation pass that reduces the number of variables
- nonlinear operations on variables are not supported (for variables with finite support this can be achieved by checking for each possible value and assigning the desired result)
- unbounded loops are not supported
Consider the following problem:
Suppose your coworker gets 10 calls per day on average and each call is a scam with probability 20%. By the end of today, they have only received one scam call. How many calls did they get in total today?
We can model the number of phone calls per day with a Poisson distribution and the number of scam calls with a Binomial distribution. Then the question is what the posterior distribution of the number of calls is. As a probabilistic program, this can be described as follows:
# File name: example.sgcl
calls ~ Poisson(10);
scams ~ Binomial(calls, 0.2);
observe(scams = 1);
return calls;
Genfer can compute the posterior distribution automatically and will yield an output similar to this:
$ target/release/genfer example.sgcl
Support is a subset of: {0, ...}
Total measure: Z = 0.27067056647322557
Expected value: E = 9.0
Standard deviation: σ = 2.8284271247461876
Variance: V = 7.999999999999986
Skewness (3rd std moment): S = 0.35355339059330737
Kurtosis (4th std moment): K = 3.1249999999997264
Unnormalized: p(0) = 0.0
Normalized: p(0) / Z = 0.0
Unnormalized: p(1) = 0.00009079985952496972
Normalized: p(1) / Z = 0.0003354626279025117
Unnormalized: p(2) = 0.0007263988761997578
Normalized: p(2) / Z = 0.0026837010232200935
[...]
Unnormalized: p(25) = 6.910972968455599e-7
Normalized: p(25) / Z = 2.5532783481055835e-6
Unnormalized: p(n) <= 3.1727834072246485e-7 for all n >= 26
Normalized: p(n) / Z <= 1.1721937292869623e-6 for all n >= 26
The probabilistic programming language is called SGCL (Statistical Guarded Command Language), as it is based on pGCL (probabilistic GCL). The syntax of a probabilistic program is as follows:
<statements>
return X;
where X
is the variable whose posterior distribution will be computed and <statements>
is a sequence of statements.
Statements take one of the following forms:
- assignment & addition:
X := a * Y + b;
orX += a * Y + b;
wherea
andb
are natural numbers andX
andY
are program variables - decrementing:
X -= a
wherea
is a natural number andX
is a discrete variable - sampling:
X ~ <distribution>;
where<distribution>
is one of the supported distributions listed below - observations:
observe <event>
where<event>
is described below.fail
means observing an event that happens with probability 0. - branching:
if <event> { <statements> } else { <statements> }
where<event>
is described below - bounded looping:
loop n { <statements> }
repeats the statement blockn
times wheren
has to be a natural number.
The following distributions are supported (where m
, n
are natural numbers, a
, b
are rational numbers and p
is a rational number between 0 and 1):
Bernoulli(p)
Bernoulli(X)
whereX
is a discrete or continuous variable taking on values between 0 and 1Binomial(n, p)
Binomial(X, p)
whereX
is a discrete variableCategorical(p_0, p_1, ..., p_n)
: categorical distribution on{0, ..., n}
wherei
has probabilityp_i
Dirac(a)
wherea
Exponential(rate)
whererate
is a rational numberGeometric(p)
Gamma(shape, rate)
whereshape
andrate
are rational numbersNegBinomial(n, p)
NegBinomial(X, p)
whereX
is a discrete variablePoisson(rate)
whererate
is a rational numberPoisson(rate * X)
whererate
is a rational number andX
is a discrete or continuous variableUniformCont(a, b)
: uniform distribution on[a, b]
UniformDisc(m, n)
: uniform distribution on{m, ..., n - 1}
Rational numbers can be written as decimals (e.g. 0.4
) or fractions (e.g. 355/113
).
Events take one of the following forms:
n ~ <distribution>
: the event of samplingn
from the given distributionX in [n1, n2, n3, ...]
: the event ofX
being in the given list of natural numbersX not in [n1, n2, n3, ...]
X = n
,X != n
,X < n
,X <= n
,X > n
,X >= n
for a discrete variableX
and a natural numbern
X = Y
,X != Y
,X < Y
,X <= Y
,X > Y
,X >= Y
for discrete variablesX
andY
where one of the two has finite supportnot <event>
: negation/complement<event> and <event>
: conjunction/intersection<event> or <event>
: disjunction/union