StreamingMCMC class #2857
Looks great so far!
We do care about backwards compatibility. I like the idea of a new method either
Hmm... one option is to use the new # in a test file:
mcmc = StreamingMCMC(..., statistics=StatsOfDict(default=StackStats)) and use this in |
I'll need to think more about computing |
Thanks for Draft test with True:
{ (0, 'y'): {'count': 2000, 'mean': tensor([-0.0008]), 'variance': tensor([0.9980])},
(1, 'y'): {'count': 2000, 'mean': tensor([-0.0336]), 'variance': tensor([0.9671])} }
False:
{'y': {'count': 4000, 'mean': tensor([-0.0172]), 'variance': tensor([0.9826])}}
Right now I'm moving to implement tests. When it comes to the refactor of the old I can either:
WDYT?
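The per-chain and pooled count/mean/variance dictionaries in the draft test output above can be related by a small sketch. It is not the pyro.ops.streaming implementation: RunningMoments is a hypothetical pure-Python helper that works on floats rather than tensors, and the unbiased variance estimator is an assumption. It shows one-pass (Welford) updates plus a pairwise merge, so that per-chain accumulators can be combined into a single pooled result without storing any samples.

```python
from dataclasses import dataclass


@dataclass
class RunningMoments:
    """Constant-memory count/mean/variance accumulator (hypothetical helper)."""

    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the running mean

    def update(self, x: float) -> None:
        # Welford's one-pass update.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def merge(self, other: "RunningMoments") -> "RunningMoments":
        # Pairwise combination of two accumulators (Chan et al.).
        total = self.count + other.count
        if total == 0:
            return RunningMoments()
        delta = other.mean - self.mean
        mean = self.mean + delta * other.count / total
        m2 = self.m2 + other.m2 + delta * delta * self.count * other.count / total
        return RunningMoments(total, mean, m2)

    def get(self) -> dict:
        # Unbiased sample variance (an assumption); nan below two samples.
        variance = self.m2 / (self.count - 1) if self.count > 1 else float("nan")
        return {"count": self.count, "mean": self.mean, "variance": variance}


# Two toy "chains", keyed like the grouped output above.
chains = {0: [1.0, 2.0, 3.0], 1: [4.0, 5.0, 6.0]}
per_chain = {}
for chain_id, draws in chains.items():
    acc = RunningMoments()
    for x in draws:
        acc.update(x)
    per_chain[(chain_id, "y")] = acc

# Pooled statistics come from merging the per-chain accumulators.
pooled = {"y": per_chain[(0, "y")].merge(per_chain[(1, "y")]).get()}
```

Because merge only needs the three stored numbers per accumulator, the pooled view never has to revisit individual draws.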
Hmm... it looks like we could maybe refactor
@fritzo Hi! After a short break I continued this PR. In the latest commit I did a rebase to use merged @pytest.mark.parametrize("run_mcmc_cls", [run_default_mcmc, run_streaming_mcmc]) But only for these tests that don't require When it comes to the implementation we discussed to what degree default It resulted in some forced workarounds, like in tests' parametrization - to make output of
Profiling example
Here's a small memory profiling (via memory_profiler) that I did on modified Gist containing modified script for reproducibility (commands are at the bottom): And the results (former is
Questions
https://mc-stan.org/docs/2_18/reference-manual/effective-sample-size-section.html It looks pretty easy (apply every Can I add it to this PR?
Implementation and tests look great, and thanks for adding plots to verify memory overhead 👍 I think you'll need to merge dev since we recently changed from Travis CI to GitHub Actions for CI. Answering your specific questions:
- remaining tests ... require rhat and ess
Yeah, it will take some effort to implement those; each is pretty complex and probably worth a separate PR. In the meantime I think it's fine to xfail in the tests.
- numpyro ... implements "Thinning"
Hmm, I guess thinning could help the original MCMC class (or simply setting max_tree_depth to a large value). However, our streaming statistics are already constant-memory, so I don't see how thinning would help (well, it would reduce computational complexity but also increase the statistical error of e.g. mean and variance). Maybe one way to implement thinning would be to add a thinned StreamingStats subclass, possibly
  - adding an optional thinning argument to StackStats, defaulting to 1;
  - creating a new ThinnedStackStats; or
  - creating a constant-memory ReservoirStackStats that implements reservoir sampling.
WDYT?
- Also some time ago I created a PR
Sorry I lost track of that, review sent!
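The reservoir-sampling option floated in the thinning answer above can be sketched as follows. This is only an illustration under stated assumptions: ReservoirStack is a hypothetical pure-Python class, not an existing pyro.ops.streaming statistic, and its update/get method names simply mirror the streaming style discussed in this thread. It uses the classic Algorithm R to keep a uniform random subset of fixed size, so memory stays constant regardless of chain length.

```python
import random


class ReservoirStack:
    """Keeps a uniform random subset of at most `capacity` streamed samples.

    Hypothetical illustration only; not part of pyro.ops.streaming.
    """

    def __init__(self, capacity: int, seed=None):
        if capacity < 1:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.count = 0  # total number of samples seen so far
        self.samples = []
        self._rng = random.Random(seed)

    def update(self, sample) -> None:
        self.count += 1
        if len(self.samples) < self.capacity:
            # Fill phase: keep everything until the reservoir is full.
            self.samples.append(sample)
            return
        # Algorithm R: the new sample replaces a random slot with
        # probability capacity / count.
        j = self._rng.randrange(self.count)
        if j < self.capacity:
            self.samples[j] = sample

    def get(self) -> dict:
        return {"count": self.count, "samples": list(self.samples)}


# Stream 1000 toy "draws" through a reservoir of size 10.
stats = ReservoirStack(capacity=10, seed=0)
for draw in range(1000):
    stats.update(draw)
result = stats.get()
```

Note that the retained samples are not kept in arrival order, so a reservoir like this would suit marginal summaries rather than anything that depends on the ordering of draws within a chain.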
@mtsokol is there anything else you'd like to add, or is this ready to merge? As mentioned above, I believe thinning would better fit into pyro.ops.streaming than in
@fritzo I think now it's ready to be merged. What I just pushed is a minor docs fix (I checked that the HTML is generated correctly). Now it is ready.
Hi @fritzo!
Here's a workspace PR for StreamingMCMC, currently only with an initial draft (I will rebase all pyro.ops.streaming changes).
I decided to introduce AbstractMCMC to extract a few lines. The initial test for StreamingMCMC only prints correct statistics, so it runs.
Right now I'm wondering how to unify those two classes (their methods):
If we care about backward compatibility then I think summary in StreamingMCMC could be changed to a get_statistics. If not, then summary can be unified to be pure and defined as abstract in AbstractMCMC and implemented by these classes.
Also right now I'm thinking about how to rewrite the test suite, as the current implementation is based on get_samples for each test (that's another argument to make summary return statistics instead of printing).
WDYT?
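The second option in the description (a pure summary declared abstract in AbstractMCMC and implemented by both classes) might look roughly like the sketch below. Everything here is a simplified assumption for illustration: the toy classes, the run(draws) signature, and the returned dictionary shape are hypothetical and do not reflect the actual Pyro classes, which run a kernel on a model rather than consuming a list.

```python
from abc import ABC, abstractmethod


class AbstractMCMC(ABC):
    """Shared interface; a hypothetical simplification of the idea above."""

    @abstractmethod
    def run(self, draws):
        """Consume an iterable of draws."""

    @abstractmethod
    def summary(self) -> dict:
        """Return statistics instead of printing them."""


class ToyMCMC(AbstractMCMC):
    """Stores every draw, like a sampler that exposes get_samples."""

    def __init__(self):
        self._samples = []

    def run(self, draws):
        self._samples.extend(draws)

    def get_samples(self):
        return list(self._samples)

    def summary(self) -> dict:
        n = len(self._samples)
        return {"count": n, "mean": sum(self._samples) / n}


class ToyStreamingMCMC(AbstractMCMC):
    """Keeps only running totals, so it has no get_samples."""

    def __init__(self):
        self._count = 0
        self._total = 0.0

    def run(self, draws):
        for x in draws:
            self._count += 1
            self._total += x

    def summary(self) -> dict:
        return {"count": self._count, "mean": self._total / self._count}


# The same check can run against both implementations via summary().
draws = [0.5, 1.5, 2.5, 3.5]
summaries = []
for cls in (ToyMCMC, ToyStreamingMCMC):
    sampler = cls()
    sampler.run(draws)
    summaries.append(sampler.summary())
```

With summary returning a plain dictionary, tests no longer need get_samples to compare the two samplers, which is the argument made in the description for returning statistics instead of printing them.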