Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

{WIP} Developer guide update #7510

Draft
wants to merge 8 commits into
base: main
Choose a base branch
from
Draft
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
233 changes: 133 additions & 100 deletions docs/source/contributing/developer_guide.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,38 +4,70 @@ orphan: true

# PyMC Developer Guide

{doc}`PyMC <index>` is a Python package for Bayesian statistical modeling built on top of {doc}`PyTensor <pytensor:index>`.
This document aims to explain the design and implementation of probabilistic programming in PyMC, with comparisons to other PPLs like TensorFlow Probability (TFP) and Pyro.
{doc}`PyMC <index>` is a Python package for Bayesian statistical modeling built on top of the {doc}`PyTensor <pytensor:index>` library.
This document explains the design and implementation of probabilistic programming in PyMC, with comparisons to other probabilistic programming libraries like TensorFlow Probability (TFP) and Pyro.
A user-facing API introduction can be found in the {ref}`API quickstart <pymc_overview>`.
A more accessible, user facing deep introduction can be found in [Peadar Coyle's probabilistic programming primer](https://github.com/springcoil/probabilisticprogrammingprimer).

## Distribution
An accessible introduction to building models with PyMC can be found in [our PyData London 2022 tutorial](https://github.com/fonnesbeck/probabilistic_python).

## Table of Contents
- [Distributions](#distributions)
- [Reflection](#reflection)
- [PyMC in Comparison](#pymc-in-comparison)
- [PyMC](#pymc)
- [Tensorflow Probability](#tensorflow-probability)
- [Pyro](#pyro)
- [Behind the scenes of the logp function](#behind-the-scenes-of-the-logp-function)
- [Model Context and Random Variables](#model-context-and-random-variables)
- [Additional things that pm.Model does](#additional-things-that-pmmodel-does)
- [Logp and dlogp](#logp-and-dlogp)
- [Inference](#inference)
- [MCMC](#mcmc)
- [Transition kernel](#transition-kernel)
- [Dynamic HMC](#dynamic-hmc)
- [Variational Inference (VI)](#variational-inference-vi)
- [Some challenges and insights from implementing VI](#some-challenges-and-insights-from-implementing-vi)
- [Forward sampling](#forward-sampling)
- [Extending PyMC](#extending-pymc)
- [What we got wrong](#what-we-got-wrong)
- [Shape](#shape)
- [Random methods in numpy](#random-methods-in-numpy)
- [Samplers are in Python](#samplers-are-in-python)

## Distributions

Probability distributions in PyMC are implemented as classes that inherit from {class}`~pymc.Continuous` or {class}`~pymc.Discrete`.
Either of these inherit {class}`~pymc.Distribution` which defines the high level API.
Both of these inherit {class}`~pymc.Distribution` which defines the high level API.

For a detailed introduction on how a new distribution should be implemented check out the {ref}`guide on implementing distributions <implementing_distribution>`.
For a detailed introduction on how a specific statistical distribution should be implemented check out the {ref}`guide on implementing distributions <implementing_distribution>`.


## Reflection

How tensor/value semantics for probability distributions are enabled in PyMC:
Let's consider how the tensor/value semantics for probability distributions are enabled in PyMC.

Model random variables are created by calling probability distribution classes with parameters inside of a `pm.Model` context, using a syntax analogous to statistical notation. For example, a normal distribution with a specified mean and standard deviation is written as:

$$
z \sim \text{Normal}(0, 5)
$$

In PyMC, model variables are defined by calling probability distribution classes with parameters:
And in PyMC:

```python
z = Normal("z", 0, 5)
with pm.Model():
z = pm. Normal("z", 0, 5)
```

This is done inside the context of a ``pm.Model``, which intercepts some information, for example to capture known dimensions.
The notation aligns with the typically used math notation:
The context manager intercepts information about the distribution relevant to the model, such as the variable dimension and any transforms, and registers it with the model.

$$
z \sim \text{Normal}(0, 5)
$$
The call to a {class}`~pymc.Distribution` constructor returns an PyTensor {class}`~pytensor.tensor.TensorVariable`, which is a symbolic representation of the model variable and the graph of inputs it depends on.

A call to a {class}`~pymc.Distribution` constructor as shown above returns an PyTensor {class}`~pytensor.tensor.TensorVariable`, which is a symbolic representation of the model variable and the graph of inputs it depends on.
Under the hood, the variables are created through the {meth}`~pymc.Distribution.dist` API, which calls the {class}`~pytensor.tensor.random.basic.RandomVariable` {class}`~pytensor.graph.op.Op` corresponding to the distribution.
```python
print(type(z))
# ==> <class 'pytensor.tensor.variable.TensorVariable'>
```

Under the hood, the variables are created through the {meth}`~pymc.Distribution.dist` classmethod, which calls the {class}`~pytensor.tensor.random.basic.RandomVariable` {class}`~pytensor.graph.op.Op` corresponding to the distribution.

At a high level of abstraction, the idea behind ``RandomVariable`` ``Op``s is to create symbolic variables (``TensorVariable``s) that can be associated with the properties of a probability distribution.
For example, the ``RandomVariable`` ``Op`` which becomes part of the symbolic computation graph is associated with the random number generators or probability mass/density functions of the distribution.
Expand All @@ -57,7 +89,7 @@ Now, because the ``NormalRV`` can be associated with the [probability density fu
with pm.Model():
z = pm.Normal("z", 0, 5)
symbolic = pm.logp(z, 2.5)
numeric = symbolic.eval()
symbolic.eval()
# array(-2.65337645)
```

Expand Down Expand Up @@ -92,14 +124,17 @@ $$

```python
with pm.Model() as model:
z = pm.Normal('z', mu=0., sigma=5.) # ==> pytensor.tensor.var.TensorVariable
x = pm.Normal('x', mu=z, sigma=1., observed=5.) # ==> pytensor.tensor.var.TensorVariable
z = pm.Normal('z', mu=0., sigma=5.)
# ==> pytensor.tensor.var.TensorVariable
x = pm.Normal('x', mu=z, sigma=1., observed=5.)
# ==> pytensor.tensor.var.TensorVariable
# The log-prior of z=2.5
pm.logp(z, 2.5).eval() # ==> -2.65337645
# ???????
x.logp({'z': 2.5}) # ==> -4.0439386
# ???????
model.logp({'z': 2.5}) # ==> -6.6973152
pm.logp(z, 2.5).eval()
# ==> -2.65337645
x.logp({'z': 2.5})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a thing anymore

# ==> -4.0439386
model.logp({'z': 2.5})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
model.logp({'z': 2.5})
model.compile_logp()({'z': 2.5})

# ==> -6.6973152
```

### Tensorflow Probability
Expand All @@ -110,25 +145,35 @@ import tensorflow.compat.v1 as tf
from tensorflow_probability import distributions as tfd

with tf.Session() as sess:
z_dist = tfd.Normal(loc=0., scale=5.) # ==> <class 'tfp.python.distributions.normal.Normal'>
z = z_dist.sample() # ==> <class 'tensorflow.python.framework.ops.Tensor'>
x = tfd.Normal(loc=z, scale=1.).log_prob(5.) # ==> <class 'tensorflow.python.framework.ops.Tensor'>
z_dist = tfd.Normal(loc=0., scale=5.)
# ==> <class 'tfp.python.distributions.normal.Normal'>
z = z_dist.sample()
# ==> <class 'tensorflow.python.framework.ops.Tensor'>
x = tfd.Normal(loc=z, scale=1.).log_prob(5.)
# ==> <class 'tensorflow.python.framework.ops.Tensor'>
model_logp = z_dist.log_prob(z) + x
print(sess.run(x, feed_dict={z: 2.5})) # ==> -4.0439386
print(sess.run(model_logp, feed_dict={z: 2.5})) # ==> -6.6973152
print(sess.run(x, feed_dict={z: 2.5}))
# ==> -4.0439386
print(sess.run(model_logp, feed_dict={z: 2.5}))
# ==> -6.6973152
```

### Pyro

```python
z_dist = dist.Normal(loc=0., scale=5.) # ==> <class 'pyro.distributions.torch.Normal'>
z = pyro.sample("z", z_dist) # ==> <class 'torch.Tensor'>
z_dist = dist.Normal(loc=0., scale=5.)
# ==> <class 'pyro.distributions.torch.Normal'>
z = pyro.sample("z", z_dist)
# ==> <class 'torch.Tensor'>
# reset/specify value of z
z.data = torch.tensor(2.5)
x = dist.Normal(loc=z, scale=1.).log_prob(5.) # ==> <class 'torch.Tensor'>
x = dist.Normal(loc=z, scale=1.).log_prob(5.)
# ==> <class 'torch.Tensor'>
model_logp = z_dist.log_prob(z) + x
x # ==> -4.0439386
model_logp # ==> -6.6973152
x
# ==> -4.0439386
model_logp
# ==> -6.6973152
```


Expand Down Expand Up @@ -159,21 +204,24 @@ As explained above, distribution in a ``pm.Model()`` context automatically turn
To get the logp of a free\_RV is just evaluating the ``logp()`` [on itself](https://github.com/pymc-devs/pymc/blob/6d07591962a6c135640a3c31903eba66b34e71d8/pymc/model.py#L1212-L1213):

```python
# self is a pytensor.tensor with a distribution attached
self.logp_sum_unscaledt = distribution.logp_sum(self)
self.logp_nojac_unscaledt = distribution.logp_nojac(self)
class Normal(Continuous):
def logp(self, value):
mu = self.mu
tau = self.tau
return bound(
(-tau * (value - mu) ** 2 + pt.log(tau / np.pi / 2.0)) / 2.0,
tau > 0,
)
```

Or for an observed RV. it evaluate the logp on the data:
The logp evaluations are represented as tensors (``RV.logpt``). When we combine different ``logp`` values (for example, by summing all ``RVs.logpt`` to obtain the total logp for the model), PyTensor manages the dependencies automatically during the graph construction and compilation process.
This dependence among nodes in the model graph means that whenever you want to generate a new function that takes new input tensors, you either need to regenerate the graph with the appropriate dependencies, or replace the node by editing the existing graph.
The latter is facilitated by PyTensor's ``pytensor.clone_replace()`` function.

```python
self.logp_sum_unscaledt = distribution.logp_sum(data)
self.logp_nojac_unscaledt = distribution.logp_nojac(data)
```

### Model context and Random Variable
### Model Context and Random Variables

I like to think that the ``with pm.Model() ...`` is a key syntax feature and *the* signature of PyMC model language, and in general a great out-of-the-box thinking/usage of the context manager in Python (with some critics, of course).
A signature feature of PyMC's syntax is the ``with pm.Model() ...`` expression, which extends the functionality of the context manager in Python to make expressing Bayesian models as natural as possible.

Essentially [what a context manager does](https://www.python.org/dev/peps/pep-0343/) is:

Expand All @@ -193,38 +241,33 @@ finally:
VAR.__exit__()
```

or conceptually:

```python
with EXPR as VAR:
# DO SOMETHING
USERCODE
# DO SOME ADDITIONAL THINGS
```

So what happened within the ``with pm.Model() as model: ...`` block, besides the initial set up ``model = pm.Model()``?
Starting from the most elementary:
But what are the implications of this, besides the model instatiation ``model = pm.Model()``?

### Random Variable

From the above session, we know that when we call e.g. ``pm.Normal('x', ...)`` within a Model context, it returns a random variable.
Thus, we have two equivalent ways of adding random variable to a model:
As we have seen already, when we call e.g. ``pm.Normal('x', ...)`` within a Model context, it returns a random variable.

```python
with pm.Model() as m:
with pm.Model() as model:
x = pm.Normal('x', mu=0., sigma=1.)

print(type(x)) # ==> <class 'pytensor.tensor.var.TensorVariable'>
print(m.free_RVs) # ==> [x]
print(logpt(x, 5.0)) # ==> Elemwise{switch,no_inplace}.0
print(logpt(x, 5.).eval({})) # ==> -13.418938533204672
print(m.logp({'x': 5.})) # ==> -13.418938533204672
print(type(x))
# ==> <class 'pytensor.tensor.var.TensorVariable'>
print(model.free_RVs)
# ==> [x]
print(pm.logp(x, 5.0))
# ==> Elemwise{switch,no_inplace}.0
print(pm.logp(x, 5.).eval({}))
# ==> -13.418938533204672
print(model.compile_logp()({'x': 5.}))
# ==> -13.418938533204672
```

In general, if a variable has observations (``observed`` parameter), the RV is an observed RV, otherwise if it has a ``transformed`` (``transform`` parameter) attribute, it is a transformed RV otherwise, it will be the most elementary form: a free RV.

Note that this means that random variables with observations cannot be transformed.

<!--

Below, I will take a deeper look into transformed RV. A normal user
might not necessarily come in contact with the concept, since a
transformed RV and ``TransformedDistribution`` are intentionally not
Expand All @@ -245,18 +288,11 @@ usually created in order to optimise performance. But getting a
possible (see also in
{ref}`doc <pymc_overview##Transformed-distributions-and-changes-of-variables>`):

.. code:: python


lognorm = Exp().apply(pm.Normal.dist(0., 1.))
lognorm


.. parsed-literal::

<pymc.distributions.transforms.TransformedDistribution at 0x7f1536749b00>


```python
lognorm = Exp().apply(pm.Normal.dist(0., 1.))
lognorm
# <pymc.distributions.transforms.TransformedDistribution at 0x7f1536749b00>
```

Now, back to ``model.RV(...)`` - things returned from ``model.RV(...)``
are PyTensor tensor variables, and it is clear from looking at
Expand Down Expand Up @@ -305,29 +341,33 @@ transformation by nested applying multiple transforms to a Distribution

z2 = Exp().apply(z)
z2.transform is None # ==> True
-->



### Additional things that ``pm.Model`` does

In a way, ``pm.Model`` is a tape machine that records what is being added to the model, it keeps track the random variables (observed or unobserved) and potential term (additional tensor that to be added to the model logp), and also deterministic transformation (as bookkeeping):

* named\_vars
* free\_RVs
* observed\_RVs
* deterministics
* potentials
* missing\_values

The model context then computes some simple model properties, builds a bijection mapping that transforms between dictionary and numpy/PyTensor ndarray, thus allowing the ``logp``/``dlogp`` functions to have two equivalent versions:
One takes a ``dict`` as input and the other takes an ``ndarray`` as input.
More importantly, a ``pm.Model()`` contains methods to compile PyTensor functions that take Random Variables (that are also initialised within the same model) as input, for example:

```python
from pymc.blocking import DictToArrayBijection

with pm.Model() as m:
z = pm.Normal('z', 0., 10., shape=10)
x = pm.Normal('x', z, 1., shape=10)

print(m.initial_point)
print(m.dict_to_array(m.initial_point)) # ==> m.bijection.map(m.initial_point)
print(m.initial_point())
print(DictToArrayBijection.map(m.initial_point)) # ==> m.bijection.map(m.initial_point)
print(m.bijection.rmap(np.arange(20)))
# {'z': array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]), 'x': array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])}
# [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
Expand All @@ -336,26 +376,19 @@ print(m.bijection.rmap(np.arange(20)))

```python
list(filter(lambda x: "logp" in x, dir(pm.Model)))
#['d2logp',
# 'd2logp_nojac',
# 'datalogpt',
# 'dlogp',
# 'dlogp_array',
# 'dlogp_nojac',
# 'fastd2logp',
# 'fastd2logp_nojac',
# 'fastdlogp',
# 'fastdlogp_nojac',
# 'fastlogp',
# 'fastlogp_nojac',
# 'logp',
# 'logp_array',
# 'logp_dlogp_function',
# 'logp_elemwise',
# 'logp_nojac',
# 'logp_nojact',
# 'logpt',
# 'varlogpt']
# ['compile_d2logp',
# 'compile_dlogp',
# 'compile_logp',
# 'd2logp',
# 'datalogp',
# 'dlogp',
# 'logp',
# 'logp_dlogp_function',
# 'observedlogp',
# 'point_logps',
# 'potentiallogp',
# 'varlogp',
# 'varlogp_nojac']
```

### Logp and dlogp
Expand Down