Revised distributions for stochasticmux #171

Merged
merged 10 commits into from
Mar 11, 2024

Conversation

bmcfee
Collaborator

@bmcfee bmcfee commented Feb 1, 2024

This PR implements several changes described in #148

  • New distributions for stochasticmux: const and binomial (new default)
  • Adjusted the poisson mode so that the expected value is actually rate and not rate+1
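
A minimal sketch of how the three modes listed above might draw the number of samples to take from a streamer. This is not the pescador implementation: `sample_rate` is a hypothetical helper, the mode names are taken from the description above, and the shift by one (so that at least one sample is always drawn) is an assumption inferred from the "rate and not rate+1" remark. The binomial branch uses the parametrization discussed later in this thread and rounds the trial count to an integer, which is also an assumption.

```python
import numpy as np


def sample_rate(rng, dist, rate, p=None):
    """Hypothetical helper: draw how many samples to take from one streamer.

    rng  : numpy.random.Generator
    dist : "const", "poisson", or "binomial"
    rate : target expected number of samples (assumed >= 1)
    p    : probability of selecting this streamer from the active set
           (only used by the binomial mode)
    """
    if dist == "const":
        # Deterministic: always exactly `rate` samples.
        return int(rate)

    if dist == "poisson":
        # Shifted Poisson: 1 + Poisson(rate - 1) has expected value `rate`.
        return 1 + int(rng.poisson(lam=rate - 1))

    if dist == "binomial":
        q = 1.0 - p
        if q <= 0:
            # p == 1 leaves no valid binomial; fall back to the Poisson form.
            return 1 + int(rng.poisson(lam=rate - 1))
        # Rounding the trial count is an assumption of this sketch.
        n_trials = int(round((rate - 1) / q))
        return 1 + int(rng.binomial(n_trials, q))

    raise ValueError(f"unknown dist: {dist!r}")


rng = np.random.default_rng(0)
poisson_draws = [sample_rate(rng, "poisson", 4) for _ in range(20000)]
binomial_draws = [sample_rate(rng, "binomial", 4, p=0.25) for _ in range(20000)]
```

With `rate=4` and `p=0.25` the trial count is exactly 4, so both sets of draws should average close to 4, while the binomial draws are bounded between 1 and 5.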

I've also relaxed the uniform convergence unit tests. A p-value of >=0.95 was probably overkill for the sample size we were drawing, and I've reduced it to 0.5. Strangely, poisson was giving me the most trouble here, while const and binomial were behaving better. It's probably an artifact of setting rate=2 in the test.

@bmcfee bmcfee added this to the 3.0.0 milestone Feb 1, 2024

codecov bot commented Feb 1, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.67%. Comparing base (9ad3511) to head (1f480e5).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #171      +/-   ##
==========================================
- Coverage   97.78%   97.67%   -0.12%     
==========================================
  Files           8        8              
  Lines         542      559      +17     
==========================================
+ Hits          530      546      +16     
- Misses         12       13       +1     
Flag Coverage Δ
unittests 97.67% <100.00%> (-0.12%) ⬇️

@bmcfee
Collaborator Author

bmcfee commented Feb 1, 2024

There's a bit of weirdness here in the initialization with binomial mode. (Will come back to this later...)

The binomials are parametrized by Bin((rate-1)/(1-p), 1-p) where p is the probability of selecting the streamer from the active set. The reason for this dependence is subtle, but it ensures that the replacement times for streamers don't concentrate too much; a streamer gets replaced on average every rate * N_active samples, with variance like rate * N_active * (N_active-1). (At least, assuming uniform weights on the streamers.)
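
For reference, the per-draw moments of this parametrization follow directly from the standard binomial formulas. The symbol q = 1 - p is shorthand introduced here, and the derivation treats (rate-1)/q as an integer trial count, which is an idealization.

```latex
\begin{aligned}
X &\sim \mathrm{Bin}\!\left(\frac{\text{rate}-1}{q},\; q\right), \qquad q = 1 - p,\\
\mathbb{E}[X] &= \frac{\text{rate}-1}{q}\cdot q \;=\; \text{rate}-1,\\
\operatorname{Var}[X] &= \frac{\text{rate}-1}{q}\cdot q\,(1-q) \;=\; (\text{rate}-1)\,p .
\end{aligned}
```

So the mean does not depend on p, while the variance shrinks as p shrinks. Under uniform weights, p = 1/N_active, which gives a per-draw variance of (rate-1)/N_active.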

The weirdness arises when we initialize streamers one at a time: the weight distribution is not fully known until the first batch of active streamers is fully initialized, so the calculations for p above are generally going to be wrong. In the extreme case, the first active streamer will use the poisson approximation (since p=1 when there are no other streamers active yet); the second streamer will have Bin((rate-1)/0.5, 0.5) (assuming uniform weights again), the third will have Bin((rate-1)/0.666, 0.666), and so on. In all cases, the expected values will be the same, but the first few streamers will have higher rate variance than the later ones. Specifically, again assuming uniform streamer weights, the variance sequence will look like (rate-1)/n for n=1, 2, ..., n_active.
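
The sequence can be checked numerically. The following is a rough simulation under stated assumptions, not code from the PR: streamers are activated one at a time with uniform weights, so the k-th streamer sees p = 1/k; the first one uses a Poisson draw with mean rate-1; and `rate=13` is chosen only so that every trial count comes out as an integer. The function names are hypothetical.

```python
import numpy as np


def theoretical_variances(rate, n_active):
    """Variance (rate - 1) / k for the k-th activated streamer, k = 1..n_active."""
    return [(rate - 1) / k for k in range(1, n_active + 1)]


def simulated_moments(rate, n_active, n_draws=50000, seed=0):
    """Empirical (mean, variance) of the draw for each activation step."""
    rng = np.random.default_rng(seed)
    moments = []
    for k in range(1, n_active + 1):
        q = 1.0 - 1.0 / k  # 1 - p, with p = 1/k under uniform weights
        if k == 1:
            # p == 1: no valid binomial, so use the Poisson approximation.
            draws = rng.poisson(lam=rate - 1, size=n_draws)
        else:
            n_trials = int(round((rate - 1) / q))
            draws = rng.binomial(n_trials, q, size=n_draws)
        moments.append((float(draws.mean()), float(draws.var())))
    return moments


theory = theoretical_variances(rate=13, n_active=4)
empirical = simulated_moments(rate=13, n_active=4)
for k, (var_t, (mean_e, var_e)) in enumerate(zip(theory, empirical), start=1):
    print(f"streamer {k}: mean={mean_e:.2f} var={var_e:.2f} (theory {var_t:.2f})")
```

The empirical means should all sit near rate-1 = 12, while the variances fall off roughly as 12, 6, 4, 3.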

Now, we could hack around this by pre-determining the weights so that everything is primed properly. However, I think it might actually be beneficial to leave it as is because it injects more randomness in the rate distributions early on, which ought to have a comparable effect to having random offsets in a burn-in phase as @ejhumphrey suggested in #132.

It's a bit weird, but given the potentially dynamic nature of the active stream distribution (especially in exhaustive mode), it won't be possible to always ensure that the rate distribution for a streamer is "correct" over time. The best we can do is sample the rate value according to whatever the distribution will be at the time the streamer is activated.

@bmcfee
Collaborator Author

bmcfee commented Feb 1, 2024

Having slept on it, I think a better solution here is to initialize the active set weights by a random draw from the weights array instead of with zeros. This won't have any effect on const or poisson, but it will put the binomial mode in a less quirky position at initialization time.
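
To illustrate why this helps the binomial mode, here is a small sketch comparing the selection probability p seen by the first activated streamer under the two initializations. It is an illustration only: `selection_probability` is a hypothetical helper, uniform weights are assumed, and the use of `rng.choice` is one plausible reading of "a random draw from the weights array", not necessarily what the PR does.

```python
import numpy as np


def selection_probability(active_weights, idx):
    """Share of the total active weight held by slot `idx`."""
    return active_weights[idx] / active_weights.sum()


rng = np.random.default_rng(0)
weights = np.full(10, 0.1)  # uniform weights over 10 streamers (assumed)
n_active = 4

# Zero initialization: the first activated slot holds all of the active weight.
zero_init = np.zeros(n_active)
zero_init[0] = weights[0]
p_zero = selection_probability(zero_init, 0)  # 1.0, so the binomial degenerates

# Random-draw initialization: the other slots already carry placeholder weights.
drawn_init = rng.choice(weights, size=n_active)
drawn_init[0] = weights[0]
p_drawn = selection_probability(drawn_init, 0)  # about 1 / n_active here

print(p_zero, p_drawn)
```

With placeholder weights in place, the first streamer already sees p near 1/n_active, so its rate draw comes from the same binomial as the later streamers instead of the Poisson fallback.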

@bmcfee
Collaborator Author

bmcfee commented Mar 8, 2024

@cjacoby 👋 I know it's been a gajillion years, but do you have any interest in looking this over? I think it's basically good to go, but it does have some kinda breaky behavior relative to older versions that I'd like to get another set of eyes on.

Quick TLDR is summarized in #148 (comment)

Collaborator

@cjacoby cjacoby left a comment

lgtm, other than that I think a small comment improvement would help (for me in 6mo to a year when I come back and can't remember what this is about).

pescador/mux.py
@bmcfee
Collaborator Author

bmcfee commented Mar 11, 2024

Ok, doc section is added and back-link is included. I tried to clean it up a bit from my original notebook (4 years ago!) and put in some expository text. Hopefully it makes sense?

@cjacoby cjacoby merged commit 19a3f37 into main Mar 11, 2024
11 of 12 checks passed
@cjacoby cjacoby deleted the distributions branch March 11, 2024 23:46
2 participants