Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement a coalescent time distribution #2474

Merged
merged 17 commits into from
May 14, 2020
Merged

Implement a coalescent time distribution #2474

merged 17 commits into from
May 14, 2020

Conversation

fritzo
Copy link
Member

@fritzo fritzo commented May 10, 2020

Addresses #2426
Blocking #2468

This adds two new distributions CoalescentTimes and CoalescentTimesWithRate and a class CoalescentRateLikelihood implementing a phylogenetic likelihood

p(coalescent times | sample times, base rate)

by splitting a phylogeny into fixed sample times (i.e. EPI nodes, leaves) and unknown coalescent times (i.e. internal NODE nodes). For ...WithRates, the base rate is specified as a vector representing a piecewise constant base coalescent rate on a uniform grid (e.g. computed from S and I time series in SIR models).

The design aims to make it cheap to enumerate over multiple phylogenetic samples (e.g. from BEAST). Two crucial observations are: (1) among phylogeny samples, the leaves are fixed, hence the total number of nodes is fixed; and (2) we can avoid summing over intervals by precomputing rate.cumsum(-1) and using this for all samples. Exploiting these facts leads to an O(T + S N log(N)) algorithm where T is the number of simulation time steps, N is the number of leaves in the phylogeny, and S is the number of sampled phylogenies. This PR does not implement batching over different phylogenetic problems (e.g. one phylogeny per region in regional models); that would require more careful masking and padding and will be left to a possible future PR.

These are low level distribution and do not implement preprocessing to transform phylogenetic trees into (coal_times, leaf_times) pairs. This PR is intended as a low-level helper for use in #2468 and an example using pyro.contrib.epidemiology.

Tested

  • shape tests
  • compare against uniform rate
  • added smoke tests for use in an SEIR model
  • visually sanity check samples in a notebook

@fritzo fritzo added the WIP label May 10, 2020
@fritzo fritzo requested a review from eb8680 May 12, 2020 17:20
@fritzo fritzo added the WIP label May 13, 2020
@fritzo
Copy link
Member Author

fritzo commented May 13, 2020

@eb8680 FYI I am adding an interface that plays better with .transition_bwd() as we discussed.

@fritzo
Copy link
Member Author

fritzo commented May 14, 2020

I've added a CoalescentRateLikelihood class compatible with both vectorized and sequential operation, and copied more of @eb8680's #2468 into this PR to exercise the likelihood in an SEIR model. However this PR still deals entirely with (leaf_times, coal_times).

I think it would be cleanest to first merge this PR, then update #2468 to use CoalescentRateLikelihood and update the example script there to parse a .nwk phylogeny and extract to (leaf_times, coal_times) either using @eb8680's r2py code or the Bio.Phylo python library.

Copy link
Member

@eb8680 eb8680 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM per our review over zoom, is there anything else you want me to look at?

R = R0 * prev["S"] / self.population
coal_rate = R * (1. + 1. / k) / (tau_i * prev["I"] + 1e-8)
pyro.factor("coalescent_{}".format(t),
self.coal_likelihood(coal_rate, t))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a good interface

return log_prob


class CoalescentRateLikelihood:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be a nn.Module?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment there is no reason to make this an nn.Module. It is a little more similar to a Distribution, but doesn't quite follow the distribution interface (indeed I've given it a validate_args kwarg like a distribution). Moreover in typical usage the parameters are all fixed rather than learned.

BTW this is a very natural object from the perspective of Funsor. Consider a funsor

leaf_times = Tensor(...)
coal_times = Tensor(...)
rate_grid = Variable("rate_grid", reals(duration))
f = CoalescentTimesWithRate(leaf_times, rate_grid).log_prob(coal_times)

Then this likelihood represents roughly the funsor

f(rate_grid=rate_grid["t"])

I look forward to the bright future when we won't need special classes like this 🚀

@fritzo
Copy link
Member Author

fritzo commented May 14, 2020

Thanks for reviewing!

@fritzo fritzo removed the request for review from martinjankowiak May 14, 2020 22:20
@eb8680 eb8680 merged commit d293cba into dev May 14, 2020
@fritzo fritzo deleted the coalescent branch June 5, 2020 15:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants