Merge pull request #171 from pescadores/distributions

Revised distributions for stochasticmux
pescadores · Mar 11, 2024 · 19a3f37
2 parents 9ad3511 + 1f480e5
commit 19a3f37
Showing 5 changed files with 364 additions and 23 deletions.
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
@@ -103,11 +103,10 @@ jobs:
           run: pytest
 
         - name: Upload coverage to Codecov
-          uses: codecov/codecov-action@v3
+          uses: codecov/codecov-action@v4
           with:
             token: ${{ secrets.CODECOV_TOKEN }}
             files: ./coverage.xml
-            directory: ./coverage/reports/
             flags: unittests
             env_vars: OS,PYTHON
             name: codecov-umbrella

diff --git a/docs/index.rst b/docs/index.rst
@@ -96,6 +96,14 @@ Advanced examples
 
     auto_examples/index
 
+***********************
+Stochastic mux analysis
+***********************
+.. toctree::
+    :maxdepth: 2
+
+    muxanalysis
+
 *************
 API Reference
 *************

diff --git a/docs/muxanalysis.rst b/docs/muxanalysis.rst
@@ -0,0 +1,262 @@
+.. _muxanalysis:
+
+Analysis of Stochastic Mux
+==========================
+
+:ref:`mux` objects (*mux* for short, *muxen* for plural) allow multiple :ref:`Streamer` objects to
+be combined into a single stream by selectively sampling from each constituent stream.
+The different kinds of mux objects provide different behaviors, and among them,
+``StochasticMux`` is the most complex.
+This section provides an in-depth analysis of ``StochasticMux``'s behavior.
+
+
+
+Stream activation and replacement
+---------------------------------
+
+``StochasticMux`` differs from other muxen (``ShuffledMux``, ``RoundRobinMux``, etc.) by
+maintaining an **active set** of streamers from the full collection it is multiplexing.
+At any given time, samples are drawn only from the active set, while the remaining streamers are
+**inactive**.
+Each active streamer is limited to producing a (possibly random) number of samples, after which it is removed from
+the active set and replaced by a new streamer selected at random; hence the name **StochasticMux**.
+
+A key quantity to understand when using ``StochasticMux`` is the streamer replacement rate: how
+often should we expect streamers to be replaced from the active set, as a function of samples
+generated by the mux?
+This quantity is important for a couple of reasons:
+
+    * If we care about the distribution of samples produced by ``StochasticMux`` being a good
+      approximation of what we would get if all streamers were active simultaneously (i.e.,
+      ``ShuffledMux`` behavior), then the streamer replacement rate should be small.
+    * If we have large startup costs involved with activating a streamer (e.g., loading data
+      from disk), then streamer replacement should be infrequent to ensure high throughput.
+      What's more, replacement events should be spread out among the active set, to avoid having several replacement events in a short period of time.
+
+In the following sections, we'll analyze replacement rates for the different choices of rate
+distributions (``constant``, ``poisson``, and ``binomial``).
+We'll focus the analysis on a single (active) streamer at a time.
+The question we'll analyze is specifically: how many samples :math:`N` must we generate (in
+expectation) before a specific streamer is deactivated and replaced?
+Understanding the distribution of :math:`N` (its mean and variance) will help us understand how often
+we should expect to see streamer replacement events.
+
+
+Notation
+--------
+
+Let :math:`A` denote the size of the active set, let :math:`r` denote the number of samples
+generated by a particular streamer, and let :math:`p` denote the probability of selecting the
+active streamer in question.
+We'll make the simplifying assumption that the ``weights`` attached to all streamers are
+uniform, i.e., :math:`p = 1/A`.
+
+
+Constant distribution
+---------------------
+
+When using the ``constant`` distribution, the sample limit :math:`r` is fixed in advance.
+Our question about the number of samples generated by ``StochasticMux`` can then be rephrased
+slightly:
+how many samples :math:`K` must we draw from *all other active streamers* before drawing the
+:math:`r`\ th sample from the streamer under analysis?
+
+This number :math:`K` is a random variable, modeled by the `negative binomial distribution <https://en.wikipedia.org/wiki/Negative_binomial_distribution>`_:
+
+.. math::
+
+   \text{Pr}[K = k] = {k + r - 1 \choose k} {(1-p)^k p^r}
+
+
+It has expected value
+
+.. math::
+
+   \text{E}[K] = r \times \frac{1-p}{p},
+
+and variance
+
+.. math::
+
+   \text{Var}[K] = r \times \frac{1-p}{p^2}.
+
+
+The total number of samples produced by the mux before the streamer is replaced is now a random
+variable :math:`N = K + r`.
+We can use linearity of expectation to compute its expected value as
+
+.. math::
+
+   \text{E}[N] = \text{E}[K] + r = r \times\frac{1-p}{p} + r = \frac{r}{p}.
+
+
+Since :math:`N` and :math:`K` differ only by a constant (:math:`r`), they have the same
+variance:
+
+.. math::
+
+   \text{Var}[N] = \text{Var}[K].
+
+
+If we apply the simplifying assumption that streamers are selected uniformly at random (:math:`p
+= 1/A`), then we get the following:
+
+    * :math:`\text{E}[N] = r \times A`, and 
+    * :math:`\text{Var}[N] = r \times A \times (A-1)`.
+
+In plain language, this says that the expected number of samples generated before a streamer is replaced scales like the product of the size of the active set and the number of samples per streamer.
+Making either of these values large implies that we should expect to wait longer to replace an active streamer.
+However, the variance of replacement event times is approximately **quadratic** in the size of the active set.
+This means that making the active set larger will increase the dispersion of replacement events away from the expected value.
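+
+As a rough numerical check of these formulas, the following sketch simulates the idealized model directly with NumPy rather than running pescador itself.
+It assumes uniform weights (:math:`p = 1/A`) and uses the fact that each sample from the streamer under analysis requires a geometrically distributed number of draws from the mux.
+The helper name ``simulate_constant_mode`` and the values of ``r`` and ``A`` are illustrative choices, not part of the pescador API.
+

```python
import numpy as np


def simulate_constant_mode(r, A, trials=50_000, seed=0):
    """Simulate N, the number of mux samples drawn before one streamer
    (with a fixed limit of r samples) is replaced.

    Idealized model only: uniform weights, so p = 1 / A.
    """
    rng = np.random.default_rng(seed)
    p = 1.0 / A
    # Each sample taken from the streamer under analysis costs a
    # Geometric(p) number of mux draws (counting the successful draw).
    waits = rng.geometric(p, size=(trials, r))
    # N is the total number of mux draws needed to reach the r-th sample.
    return waits.sum(axis=1)


r, A = 64, 16  # illustrative values (assumptions)
N = simulate_constant_mode(r, A)

print("E[N]   empirical:", N.mean(), " predicted:", r * A)
print("Var[N] empirical:", N.var(), " predicted:", r * A * (A - 1))
```

+
+With these settings, the empirical mean and variance should land close to :math:`r \times A = 1024` and :math:`r \times A \times (A-1) = 15360`, up to sampling noise.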
+
+
+Poisson distribution
+--------------------
+
+In pescador version 2 and earlier, the sample limit :math:`r` was not a constant value, but a
+random variable :math:`R` drawn from a Poisson distribution with rate parameter :math:`\lambda`.
+In this case, the mean and variance of :math:`R` are simply :math:`\text{E}[R] =
+\text{Var}[R] = \lambda`.
+
+However, this does not lead to a simple closed-form expression for the distribution of :math:`N`, because we must now marginalize over :math:`R`:
+
+.. math::
+
+   \text{Pr}[N=n]   &= \sum_{r=0}^{\infty} \text{Pr}[K=n-r, R = r]\\
+                    &= \sum_{r=0}^{\infty} \text{Pr}[K=n-r ~|~ R = r] \times \text{Pr}[R=r]\\
+                    &= \sum_{r=0}^{\infty} {n - 1 \choose n-r} {(1-p)^{n-r} p^r} \times \frac{\lambda^r e^{-\lambda}}{r!}
+
+
+While this distribution is still supported, it has been replaced as the default by a binomial
+distribution mode which is more amenable to analysis.
+
+
+.. note::
+    In pescador ≥ 3.0, poisson mode is actually implemented as :math:`R \sim 1 +
+    \text{Poisson}(\lambda - 1)`.  This maintains the expected value of
+    :math:`\lambda`, with slightly reduced variance (:math:`\lambda - 1`), 
+    but it ensures that at least one sample is produced from active streamers before deactivation.
+
+    This logic is also applied to the binomial mode described below, but omitted
+    from the analysis here for simplicity.
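+
+The effect of the shift described in the note can be checked with a few lines of NumPy.
+This is only a sketch of the distribution :math:`1 + \text{Poisson}(\lambda - 1)`, not pescador's own implementation, and the value of ``lam`` is an arbitrary illustrative choice.
+

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 16.0  # illustrative rate parameter (assumption)

# Shifted Poisson as described in the note: R = 1 + Poisson(lam - 1)
R = 1 + rng.poisson(lam - 1, size=500_000)

print("min R :", R.min())    # never below 1
print("mean R:", R.mean())   # should be close to lam
print("var R :", R.var())    # should be close to lam - 1
```

+
+The minimum is at least 1 by construction, so every activated streamer contributes at least one sample.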
+
+Binomial distribution
+---------------------
+
+In the binomial distribution mode, :math:`R` is a random variable governed by a binomial
+distribution with parameters :math:`(m, q)`:
+
+.. math::
+
+   \text{Pr}[R=r] = {m \choose r} q^r (1-q)^{m-r}.
+
+(We will come back to determining values for :math:`(m, q)` later.)
+
+This distribution can be integrated with the negative binomial distribution above to yield a
+straightforward computation of :math:`\text{Pr}[N]`.
+
+.. math::
+
+   \text{Pr}[N=n] &= \sum_{r=0}^{\infty} \text{Pr}[K=n-r ~|~ R= r] \times \text{Pr}[R=r]\\
+   &= \sum_{r=0}^{\infty} {n-1 \choose n-r} {\left(1-p\right)}^{n-r} p^r \times {m \choose r} q^r {(1-q)}^{m-r}.
+
+If we set :math:`q = 1-p`, this simplifies as follows:
+
+.. math::
+
+   \text{Pr}[N=n] &= \sum_{r=0}^{\infty} {n-1 \choose n-r} {\left(1-p\right)}^{n-r} p^r \times {m \choose r} {(1-p)}^r p^{m-r}\\
+   &= \sum_{r=0}^{\infty} {n-1 \choose n-r} {\left(1-p\right)}^n p^m {m \choose r}\\
+   &= {\left(1-p\right)}^n p^m {n + m - 1\choose n}.
+
+This distribution again has the form of a negative binomial with parameters :math:`(m, 1-p)`.
+If we further set
+
+.. math::
+
+   m = \frac{\lambda}{1-p}
+
+for an expected rate parameter :math:`\lambda > 0` (as in the Poisson case above), then the
+distribution :math:`\text{Pr}[N=n]` is
+
+.. math::
+
+   N \sim \text{NB}\left(\frac{\lambda}{1-p}, 1-p\right),
+
+where NB denotes the probability mass function of the negative binomial distribution.
+This yields:
+
+    - :math:`\text{E}[R] = \lambda`: each streamer generates :math:`\lambda` samples
+      on average,
+    - :math:`\text{Var}[R] = \lambda \times p`,
+    - :math:`\text{E}[N] = \lambda / p`, and
+    - :math:`\text{Var}[N] = m \times \frac{1-p}{p^2} = \frac{\lambda}{p^2}`.
+
+The expected value matches the analysis of the constant-mode case above, except that the number of samples per
+streamer is now a random variable with expectation :math:`\lambda`.
+The variance is larger than in the constant-mode case by a factor of :math:`1/(1-p)`, which accounts for the additional randomness in :math:`R`.
+Again, in the special case where :math:`p=1/A`, we recover
+
+    - :math:`\text{E}[N] = \lambda \times A`, and
+    - :math:`\text{Var}[N] = \lambda \times A^2`.
+
+In short, binomial mode ``StochasticMux`` exhibits nearly the same stream replacement
+characteristics as the constant-mode case (the variances agree up to a factor of :math:`A/(A-1)`),
+but relaxes the need for each streamer to generate an identical number of samples.
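+
+The binomial-mode analysis can be checked by simulation in the same spirit.
+The sketch below draws :math:`R \sim \text{Binomial}(m, 1-p)` with :math:`m = \lambda/(1-p)`, then draws :math:`K` from a negative binomial conditioned on :math:`R`, and compares the result with the moments implied by the negative binomial form of :math:`\text{Pr}[N=n]`, namely mean :math:`m(1-p)/p` and variance :math:`m(1-p)/p^2` (the constant-mode formulas with :math:`m` in place of :math:`r`).
+It models the idealized analysis only, omits the shift by one mentioned in the note above, and assumes that :math:`\lambda/(1-p)` is an integer; the function name and parameter values are illustrative.
+

```python
import numpy as np


def simulate_binomial_mode(lam, A, trials=200_000, seed=0):
    """Simulate (R, N) under the idealized binomial-mode model.

    Assumes uniform weights (p = 1 / A) and that lam / (1 - p) is an integer.
    """
    rng = np.random.default_rng(seed)
    p = 1.0 / A
    m = int(round(lam / (1.0 - p)))
    # Sample limit for the streamer: R ~ Binomial(m, q) with q = 1 - p
    R = rng.binomial(m, 1.0 - p, size=trials)
    # Samples drawn from the other active streamers: K | R ~ NB(R, p).
    # NumPy requires n > 0, so guard the (very rare) case R == 0.
    K = rng.negative_binomial(np.maximum(R, 1), p)
    K = np.where(R > 0, K, 0)
    return R, R + K, m, p


lam, A = 48, 4  # illustrative values (assumptions); gives m = 64
R, N, m, p = simulate_binomial_mode(lam, A)

print("E[R]   empirical:", R.mean(), " predicted:", lam)
print("Var[R] empirical:", R.var(), " predicted:", lam * p)
print("E[N]   empirical:", N.mean(), " predicted:", m * (1 - p) / p)
print("Var[N] empirical:", N.var(), " predicted:", m * (1 - p) / p**2)
```

+
+With these settings :math:`m = 64`, so the reference values are :math:`\text{E}[N] = 192` and :math:`\text{Var}[N] = 768`.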
+
+
+Limiting case :math:`p=1`
+-------------------------
+
+As defined above, the binomial mode is ill-defined when :math:`p=1` due to a
+division-by-zero in the parametrization.
+This situation does occur in practice with some configurations of ``StochasticMux``,
+e.g. when operating in **exhaustive** mode so that streamers are activated without
+replacement and are discarded after deactivation.
+In this case, the size of the active set :math:`A` can eventually decay, and the
+probability of choosing the last active streamer :math:`p \rightarrow 1`.
+
+To circumvent this issue, ``StochasticMux`` detects this situation automatically and
+falls back on a Poisson distribution for :math:`R`.
+This is justified by the `Poisson limit theorem <https://en.wikipedia.org/wiki/Poisson_limit_theorem>`_ if we take the product :math:`\lambda/(1-p) \times (1-p) = \lambda` as the limit value as :math:`p \rightarrow 1`.
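+
+To illustrate the limit numerically, the following sketch compares the binomial distribution with parameters :math:`(\lambda/(1-p), 1-p)` against a Poisson distribution with rate :math:`\lambda` as :math:`1-p` shrinks.
+It uses only the standard library; the helper names, the rounding of :math:`m` to an integer, and the truncation point ``kmax`` are assumptions made for this example and do not reflect how ``StochasticMux`` detects the fallback case.
+

```python
import math


def binom_pmf(k, m, q):
    """Pr[R = k] for R ~ Binomial(m, q)."""
    return math.comb(m, k) * q**k * (1.0 - q) ** (m - k)


def poisson_pmf(k, lam):
    """Pr[R = k] for R ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def tv_distance(lam, q, kmax=60):
    """Total variation distance between Binomial(lam / q, q) and
    Poisson(lam), truncated to k <= kmax."""
    m = round(lam / q)  # number of binomial trials, rounded to an integer
    return 0.5 * sum(
        abs(binom_pmf(k, m, q) - poisson_pmf(k, lam)) for k in range(kmax + 1)
    )


lam = 8.0  # illustrative rate parameter (assumption)
qs = (0.5, 0.1, 0.01, 0.001)  # q = 1 - p, shrinking as p approaches 1
distances = [tv_distance(lam, q) for q in qs]

for q, d in zip(qs, distances):
    print(f"1 - p = {q:<6} distance to Poisson = {d:.5f}")
```

+
+The distance shrinks as :math:`p \rightarrow 1`, which is the behavior the fallback relies on.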
+
+
+Discussion
+----------
+
+The above analysis tells us, on average, how long we should expect to wait before a
+given streamer is exhausted and replaced.
+Because this distribution applies equally to all streamers, the variance of this
+distribution tells us how dispersed these replacement events are likely to be.
+
+Qualitatively, there are a few things we can observe from the above analysis.
+
+First, for large active set sizes :math:`A`, binomial mode will behave similarly to constant mode because :math:`\text{Var}[R]` will be inversely proportional to :math:`A`.
+For small active sets, binomial mode will behave more similarly to Poisson mode.
+
+Second, Poisson mode will exhibit the highest variance of sample limit values:
+:math:`\text{Var}[R] = \lambda` upper-bounds that of the binomial mode,
+:math:`\text{Var}[R] = \lambda \times p`.
+We can therefore expect that the replacement event distribution :math:`\text{Pr}[N]`
+under Poisson mode will also exhibit slightly higher variance in general.
+
+Third, the binomial mode provides a controlled interaction between the stream
+replacement rate and the size of the active set, which is difficult to achieve with
+Poisson mode.
+
+Finally, we should emphasize that the analysis in this section represents a common,
+if simplified, application of ``StochasticMux``, and there are many other variables
+at play that may alter the mux's behavior, including:
+
+    - whether streamers are activated with or without replacement,
+    - whether streamers are used exhaustively or replaced after deactivation,
+    - whether streamer weights are uniform or non-uniform,
+    - whether streamers self-limit instead of relying on the mux for deactivation.
+
+The final point is subtle: remember that streamers encapsulate arbitrary generator
+code, and there's nothing stopping a generator from determining its own maximum
+number of samples to produce.
+If this number is smaller than the value assigned by the mux, the streamer will act
+as if it has been exhausted and the mux will replace it immediately.
+This situation would reduce both the average and the variance of the replacement time.
+(If a streamer self-limits at a number larger than the mux's limit, the mux will
+terminate it first and the analysis above still holds.)
+