Question Regarding Effective Population Size #36

Open
SriRaj34 opened this issue Dec 5, 2022 · 2 comments


SriRaj34 commented Dec 5, 2022

Hello,

I hope all is well! We are very interested in using phylonco, specifically the binary substitution model and the error model. We first simulated data with a known mutation rate and population size using CellCoal, and we would now like to apply the model to the simulated data.

We simulated data according to exponential growth, and are using the following priors:

  1. Strict molecular clock (with the mutation rate we simulated the data with, 1E-05)
  2. Coalescent Exponential Growth Model

When we run the binary substitution model and error model with these priors, we end up with a final lambda value of 82. In our simulated data, we included all possible sites (including invariable sites) and did not perform any ascertainment bias correction (with constantSiteWeights). Our effective population size is around 5000, which, after correcting for diploidy (a factor of 2), corresponds to an effective cell population size of 2500.

We have attached our test.xml file and log file for your reference, and would be grateful for any insight you have. Specifically, is there a prior we should be using for our lambda value that makes more sense? Thank you in advance!

Sri and Tamara

files_test.zip

Collaborator

alexeid commented Dec 5, 2022

Thanks for your interest in our package. What is the true model for your simulation? Is your true substitution model binary, with a lambda parameter, or something else? What was the true growth rate and final population size? I assume that in your simulated data there was no sequencing/amplification error or allelic dropout?


alexeid commented Dec 5, 2022

Just looking at your data, the vast majority of entries are zeros, so lambda must be very large for the model to produce such an equilibrium distribution. If the inference were conditioned on starting at all zeros at the root, you could get a very different result. However, standard phylogenetic likelihood calculations integrate over every possible sequence at the root and assume that the root sequence is at equilibrium for the given continuous-time Markov chain (CTMC).
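To make the link between the proportion of zeros and lambda concrete, here is a minimal sketch. It assumes a two-state CTMC in which the 0 -> 1 rate is 1 and the 1 -> 0 rate is lambda; that parameterisation, and the function names, are assumptions made for illustration and are not taken from the attached XML or from the package itself.

```python
def binary_equilibrium(lam):
    """Stationary distribution (pi0, pi1) of a two-state CTMC.

    Assumed parameterisation: rate 1 for 0 -> 1, rate `lam` for 1 -> 0.
    Detailed balance gives pi0 * 1 = pi1 * lam.
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    return lam / (1.0 + lam), 1.0 / (1.0 + lam)


def lambda_from_zero_fraction(frac_zero):
    """Invert pi0 = lam / (1 + lam) to get the lambda implied by a zero fraction."""
    if not 0.0 < frac_zero < 1.0:
        raise ValueError("frac_zero must lie strictly between 0 and 1")
    return frac_zero / (1.0 - frac_zero)


if __name__ == "__main__":
    pi0, pi1 = binary_equilibrium(82.0)
    print(f"lambda = 82 -> equilibrium fraction of zeros = {pi0:.4f}")
    for frac in (0.9, 0.99, 0.999):
        print(f"zeros = {frac} -> implied lambda = {lambda_from_zero_fraction(frac):.1f}")
```

Under this assumed parameterisation, the implied lambda grows without bound as the fraction of zeros approaches one, which is the sense in which a mostly-zero alignment pushes lambda upward.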

If you started your simulation at all zeros at the root but then employed a time-reversible CTMC, the simulation would not be consistent with the standard model assumptions for phylogenetic likelihood calculations, so the estimated lambda will not necessarily match the true value in such a case. This reflects a form of model misspecification.
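The mismatch can be illustrated with a rough sketch that compares the probability of observing a 1 at a tip when the root is fixed at 0 with the same probability when the root is drawn from equilibrium. It again assumes rates of 1 for 0 -> 1 and lambda for 1 -> 0, with `t` measured in the units of that unnormalised rate matrix; the function name and the example values are hypothetical.

```python
import math


def prob_one_at_tip(lam, t, root="zero"):
    """P(tip = 1) after time t under a two-state CTMC.

    Assumed rates: 0 -> 1 at rate 1, 1 -> 0 at rate `lam`.
    root="zero":        root state fixed at 0.
    root="equilibrium": root state drawn from the stationary distribution,
                        in which case the tip is also at equilibrium.
    """
    pi1 = 1.0 / (1.0 + lam)
    if root == "equilibrium":
        return pi1
    if root == "zero":
        # Closed-form two-state transition probability p_01(t).
        return pi1 * (1.0 - math.exp(-(1.0 + lam) * t))
    raise ValueError("root must be 'zero' or 'equilibrium'")


if __name__ == "__main__":
    t = 0.01  # hypothetical short root-to-tip time
    for lam in (1.0, 82.0):
        p_zero = prob_one_at_tip(lam, t, "zero")
        p_eq = prob_one_at_tip(lam, t, "equilibrium")
        print(f"lam={lam}: P(1 | root=0)={p_zero:.5f}, P(1 | root at equilibrium)={p_eq:.5f}")
```

With a short tree and a root fixed at 0, ones are rare whatever the value of lambda, whereas an equilibrium root can only explain so few ones through a large lambda.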
