Skip to content

Commit

Permalink
Add an option to set recombination rate in demographic model
Browse files Browse the repository at this point in the history
  • Loading branch information
gregorgorjanc committed Nov 9, 2024
1 parent cb7c7f2 commit e4a6bdf
Show file tree
Hide file tree
Showing 13 changed files with 196 additions and 61 deletions.
21 changes: 16 additions & 5 deletions docs/development.rst
Original file line number Diff line number Diff line change
Expand Up @@ -1023,7 +1023,7 @@ Misspecification of the model can generate unrealistic patterns of genetic
variation that will affect downstream analyses.
So, having at least one detailed demographic model is recommended for every species.
A given species might have more than one demographic model,
fit from different data or by different methods.
fit from different data or by different methods or diffrent assumptions/parameters.

-----------------------------------
What models are appropriate to add?
Expand All @@ -1037,7 +1037,7 @@ such as population splits and changes in the amount of gene flow between populat
The values of different parameters should be specified in units of "number of individuals"
(for population sizes) and generations (for times).
Sometimes, you will need to convert values published in the literature
to these units by making some assumptions on the mutation rate;
to these units by making some assumptions on the mutation rate (sometimes even recombination rate);
typically the same assumptions made by the study that published the demographic model.


Expand Down Expand Up @@ -1110,15 +1110,16 @@ We provide below a template block of code for these two operations:
citations=...,
generation_time=...,
mutation_rate=...,
recombination_rate=...,
population_configurations=...,
migration_matrix=...,
demographic_events=...,
)
_species.add_demographic_model(_model_func_name())
A demographic model is thus defined using ten different attributes.
The first seven attributes are quite straightforward:
A demographic model is thus defined using up to eleven different attributes.
The first eight attributes are quite straightforward:

* ``id`` (`string`): A unique, short-hand identifier for this demographic model.
This id contains a short description written in camel case,
Expand Down Expand Up @@ -1164,6 +1165,16 @@ The first seven attributes are quite straightforward:
However, note that this is quite uncommon, so you should make sure this is the case
before you set the mutation rate to ``None``.

* ``recombination_rate`` (`double`): The recombination rate assumed during the inference
of this demographic model (per bp per generation).
While demographic model inference might make less assumptions about recombination rates
than mutation rates, we provide this option for completness.
Namely, a demographic model might have been inferred under the assumption of a specific
recombination rate, which does not match with the species' recombination rate implemented
in ``stdpopsim``.
Also, some demographic models were inferred under the assumptions of a specifc ratio of
mutation to recombination rates.

The final three attributes
(``population_configurations``, ``migration_matrix``, and ``demographic_events``)
describe the inferred demographic history that you wish to code.
Expand All @@ -1182,7 +1193,7 @@ then we highly recommend that you take some time to read through its
parameter of interest.
In your coded model, you should use some reasonable point estimate,
such as the value associated with the the maximum likelihood fit,
or the mean posterior (for Bayesian methods).
or the mean of posterior distribution for Bayesian methods.

------------------------------------
Adding a parameter table to the docs
Expand Down
1 change: 1 addition & 0 deletions docs/parameter_tables/BosTau/HolsteinFriesian_1M13.csv
Original file line number Diff line number Diff line change
Expand Up @@ -25,3 +25,4 @@ Time (gen.),"6",Begining of 11th time interval
Time (gen.),"3",Begining of 12th time interval
Generation time (yrs.),5,Generation time
Mutation rate,9.4e-9,Per-base per-generation mutation rate
Recombination rate,1.0e-0,Per-base per-generation recombination rate
29 changes: 16 additions & 13 deletions docs/tutorial.rst
Original file line number Diff line number Diff line change
Expand Up @@ -140,13 +140,14 @@ Note that the number of samples from each population are simply specified as
Omitted populations will have no samples in the resulting tree sequence.

.. note::
Many demographic models were inferred or calibrated using a mutation rate that
differs from the cataloged species' mutation rate. Simulations using the CLI now
automatically use the *model's* specified mutation rate instead of the species
rate, so that expected levels of diversity more closely match those observed in
the data that were used to infer the demographic model. For generic demographic
models or those without associated mutation rates, the species mutation rate is
used.
Many demographic models were inferred or calibrated using a mutation rate
or recombination rate that differs from the cataloged species' rate.
Simulations using the CLI now automatically use the *model's* specified
mutation rate or recombination rate instead of the species rate,
so that expected levels of diversity more closely match those observed in
the data that were used to infer the demographic model.
For generic demographic models or those without associated mutation or
recombination rates, the species rates are used.

Now we want to add an empirical genetic map to make the simulation more
realistic. We can look up the available genetic maps using the
Expand Down Expand Up @@ -518,7 +519,7 @@ uniform recombination rate set to the average recombination rate for that chromo
# default is a uniform genetic map
print("mean recombination rate:", f"{contig.recombination_map.mean_rate:.3}")
# mean recombination rate: 1.44e-08
# mean recombination rate: 2.11e-08
# and the default mutation rate is based on the species default
print("mean mutation rate:", contig.mutation_rate)
Expand All @@ -529,18 +530,20 @@ uniform recombination rate set to the average recombination rate for that chromo
# model mutation rate: 2.35e-08
The Gutenkunst OOA model was inferred using a mutation rate much larger than the
default mutation rate in the ``stdpopsim`` catalog. As such, simulating using this
model and default rate will result in levels of diversity substantially lower than
expected for the human population data that this model was inferred from. To match
observed diversity in humans, we should instead use the mutation rate associated
with the demographic model:
default mutation rate for human species in the ``stdpopsim`` catalog. As such,
simulating using this model and default rate will result in levels of diversity
substantially lower than expected for the human population data that this model
was inferred from. To match observed diversity in humans, we should instead use
the mutation rate associated with the demographic model:

.. code-block:: python
contig = species.get_contig("chr22", mutation_rate=model.mutation_rate)
print(contig.mutation_rate == model.mutation_rate)
# True
Similar functionality exists for the recombination rate, if a model specifies one.

Choose a sampling scheme and simulate
-------------------------------------

Expand Down
2 changes: 1 addition & 1 deletion stdpopsim/catalog/BosTau/demographic_models.py
Original file line number Diff line number Diff line change
Expand Up @@ -36,8 +36,8 @@ def _HolsteinFriesian_1M13():
populations=populations,
citations=citations,
generation_time=_species.generation_time,
recombination_rate=1e-8,
mutation_rate=0.94e-8,
recombination_rate=1e-8,
population_configurations=[
# 3 generations at 90, [0- 3)
msprime.PopulationConfiguration(
Expand Down
24 changes: 22 additions & 2 deletions stdpopsim/cli.py
Original file line number Diff line number Diff line change
Expand Up @@ -409,6 +409,11 @@ def write_citations(engine, model, contig, species, dfe):
and model.mutation_rate is not None
):
continue
if (
stdpopsim.citations.CiteReason.REC_RATE in citation.reasons
and model.recombination_rate is not None
):
continue

Check warning on line 416 in stdpopsim/cli.py

View check run for this annotation

Codecov / codecov/patch

stdpopsim/cli.py#L416

Added line #L416 was not covered by tests
cite_str.append("\n")
cite_str.append(citation.displaystr())
logger.warning("".join(cite_str))
Expand Down Expand Up @@ -754,11 +759,13 @@ def run_simulation(args):
contig = species.get_contig(
args.chromosome,
genetic_map=args.genetic_map,
# TODO: what if we would have model.genetic_map?
length_multiplier=args.length_multiplier,
length=args.length,
inclusion_mask=args.inclusion_mask,
exclusion_mask=args.exclusion_mask,
mutation_rate=model.mutation_rate,
recombination_rate=model.recombination_rate,
left=args.left,
right=args.right,
)
Expand Down Expand Up @@ -905,7 +912,20 @@ def write_simulation_summary(
dry_run_text += f"{sample_counts[p]} ({sample_time})\n"
# Get information about relevant contig
gmap = "None" if contig.genetic_map is None else contig.genetic_map.id
mean_recomb_rate = contig.recombination_map.mean_rate
# use the model recombination rate, if provided with the demographic model
# TODO: what if we would have model.genetic_map?
if model.recombination_rate is not None:
mean_recomb_rate = model.recombination_rate
logging.info(

Check warning on line 919 in stdpopsim/cli.py

View check run for this annotation

Codecov / codecov/patch

stdpopsim/cli.py#L918-L919

Added lines #L918 - L919 were not covered by tests
"using recombination rate from demographic model "
f"({model.recombination_rate})"
)
else:
mean_recomb_rate = contig.recombination_map.mean_rate
logging.info(
"using recombination rate from species contig "
f"({contig.recombination_map.mean_rate})"
)
# use the model mutation rate, if provided with the demographic model
if model.mutation_rate is not None:
mut_rate = model.mutation_rate
Expand All @@ -924,8 +944,8 @@ def write_simulation_summary(
dry_run_text += f"{indent}Contig origin: {contig_origin}\n"
dry_run_text += f"{indent}Contig length: {contig_len}\n"
dry_run_text += f"{indent}Contig ploidy: {contig_ploidy}\n"
dry_run_text += f"{indent}Mean recombination rate: {mean_recomb_rate}\n"
dry_run_text += f"{indent}Mean mutation rate: {mut_rate}\n"
dry_run_text += f"{indent}Mean recombination rate: {mean_recomb_rate}\n"
dry_run_text += f"{indent}Genetic map: {gmap}\n"
if contig.bacterial_recombination is True:
dry_run_text += (
Expand Down
28 changes: 23 additions & 5 deletions stdpopsim/engines.py
Original file line number Diff line number Diff line change
Expand Up @@ -118,11 +118,28 @@ def _warn_mutation_rate_mismatch(self, contig, demographic_model):
demographic_model.mutation_rate, contig.mutation_rate
):
warnings.warn(
"The demographic model has mutation rate "
"The demographic model has mean mutation rate "
f"{demographic_model.mutation_rate}, but this simulation used the "
f"contig's mutation rate {contig.mutation_rate}. Diversity levels "
"may be different than expected for this species. For details see "
"documentation at "
"contig's mean mutation rate "
f"{contig.mutation_rate}. "
"Diversity levels may be different than expected for this species. "
"For details see documentation at "
"https://popsim-consortium.github.io/stdpopsim-docs/stable/tutorial.html"
)

def _warn_recombination_rate_mismatch(self, contig, demographic_model):
# TODO: what if we would have demographic_model.genetic_map and/or
# demographic_model.recombination_rate?
if demographic_model.recombination_rate is not None and not math.isclose(
demographic_model.recombination_rate, contig.recombination_map.mean_rate
):
warnings.warn(
"The demographic model has mean recombination rate "
f"{demographic_model.recombination_rate}, but this simulation used the "
"contig's mean recombination rate "
f"{contig.recombination_map.mean_rate}. "
"Diversity levels may be different than expected for this species. "
"For details see documentation at "
"https://popsim-consortium.github.io/stdpopsim-docs/stable/tutorial.html"
)

Expand Down Expand Up @@ -259,6 +276,7 @@ def simulate(
# TODO: remove this after a release or two. See #745.
self._warn_zigzag(demographic_model)
self._warn_mutation_rate_mismatch(contig, demographic_model)
self._warn_recombination_rate_mismatch(contig, demographic_model)

rng = np.random.default_rng(seed)
seeds = rng.integers(1, 2**31 - 1, size=2)
Expand All @@ -276,7 +294,7 @@ def simulate(
)
elif (gc_frac is not None) and (gc_frac > 0.0):
# the recombination map is a map of double-stranded breaks,
# so we need to separate out GC from crossovers
# so we need to separate out gene conversion from crossing-over
gc_rate = contig.recombination_map.mean_rate * gc_frac / (1 - gc_frac)
recombination_map = msprime.RateMap(
position=recombination_map.position,
Expand Down
32 changes: 20 additions & 12 deletions stdpopsim/genomes.py
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,7 @@

def _uniform_rate_map(left, right, length, rate):
"""
Makes a RateMap that is
Makes a uniform RateMap
"""
pos = [
0.0,
Expand Down Expand Up @@ -392,6 +392,7 @@ def species_contig(
length_multiplier=1,
length=None,
mutation_rate=None,
recombination_rate=None,
use_species_gene_conversion=False,
inclusion_mask=None,
exclusion_mask=None,
Expand Down Expand Up @@ -457,7 +458,11 @@ def species_contig(
)
if mutation_rate is None:
mutation_rate = u_tot / L_tot
r = r_tot / L_tot
if recombination_rate is None:
r = r_tot / L_tot
else:
r = recombination_rate
r_tot = r * L_tot
if use_species_gene_conversion is True:
if gcf_tot > 0:
gene_conversion_fraction = gcf_tot / r_tot
Expand Down Expand Up @@ -509,25 +514,28 @@ def species_contig(

if genetic_map is None:
gm = None
if recombination_rate is None:
r = chrom.recombination_rate
else:
r = recombination_rate
if length_multiplier != 1:
logger.debug(
f"Making flat chromosome {length_multiplier} * {chrom.id}"
)
recomb_map = msprime.RateMap.uniform(
round(length_multiplier * chrom.length),
chrom.recombination_rate,
round(length_multiplier * chrom.length), r
)
else:
logger.debug(
f"Making flat contig from {left} to {right} for {chrom.id}"
)
recomb_map = _uniform_rate_map(
left, right, chrom.length, chrom.recombination_rate
)
recomb_map = _uniform_rate_map(left, right, chrom.length, r)
else:
if length_multiplier != 1:
raise ValueError("Cannot use length multiplier with empirical maps")
logger.debug(f"Getting map for {chrom.id} from {genetic_map}")
raise ValueError("Cannot use length multiplier with genetic map")
if recombination_rate is not None:
raise ValueError("Cannot use recombination rate with genetic map")

Check warning on line 537 in stdpopsim/genomes.py

View check run for this annotation

Codecov / codecov/patch

stdpopsim/genomes.py#L537

Added line #L537 was not covered by tests
logger.debug(f"Getting genetic map for {chrom.id} from {genetic_map}")
gm = species.get_genetic_map(genetic_map)
recomb_map = gm.get_chromosome_map(chrom.id)
recomb_map = recomb_map.slice(left=left, right=right)
Expand Down Expand Up @@ -823,15 +831,15 @@ def mutation_types(self):
def __str__(self):
gmap = "None" if self.genetic_map is None else self.genetic_map.id
s = (
"Contig(length={:.2G}, recombination_rate={:.2G}, "
"mutation_rate={:.2G}, bacterial_recombination={}, "
"Contig(length={:.2G}, mutation_rate={:.2G}, "
"recombination_rate={:.2G}, bacterial_recombination={}, "
"gene_conversion_fraction={}, gene_conversion_length={}, "
"genetic_map={}, origin={}, dfe_list={}, "
"interval_list={})"
).format(
self.length,
self.recombination_map.mean_rate,
self.mutation_rate,
self.recombination_map.mean_rate,
self.bacterial_recombination,
self.gene_conversion_fraction,
self.gene_conversion_length,
Expand Down
20 changes: 14 additions & 6 deletions stdpopsim/models.py
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
"""
Common infrastructure for specifying demographic models.
"""

import copy
import textwrap

Expand Down Expand Up @@ -74,6 +75,10 @@ class DemographicModel:
:ivar mutation_rate: The mutation rate associated with the demographic model.
If no mutation rate is given, the species' default mutation rate is used.
:vartype mutation_rate: float
:ivar recombination_rate: The recombination rate associated with the demographic
model. If no recombination rate is given, the species' default recombination
rate is used.
:vartype recombination_rate: float
"""

def __init__(
Expand All @@ -92,12 +97,14 @@ def __init__(
populations=None,
model=None,
mutation_rate=None,
recombination_rate=None,
):
self.id = id
self.description = description
self.long_description = long_description
self.generation_time = 1 if generation_time is None else generation_time
self.mutation_rate = mutation_rate
self.recombination_rate = recombination_rate
self.citations = [] if citations is None else citations
self.qc_model = qc_model

Expand Down Expand Up @@ -148,12 +155,13 @@ def __str__(self):
long_desc = "\n║ ".join(long_desc_lines)
s = (
"Demographic model:\n"
f"║ id = {self.id}\n"
f"║ description = {self.description}\n"
f"║ long_description = {long_desc}\n"
f"║ generation_time = {self.generation_time}\n"
f"║ mutation_rate = {self.mutation_rate}\n"
f"║ citations = {[cite.doi for cite in self.citations]}\n"
f"║ id = {self.id}\n"
f"║ description = {self.description}\n"
f"║ long_description = {long_desc}\n"
f"║ generation_time = {self.generation_time}\n"
f"║ mutation_rate = {self.mutation_rate}\n"
f"║ recombination_rate = {self.recombination_rate}\n"
f"║ citations = {[cite.doi for cite in self.citations]}\n"
f"║{self.model}"
)
return s
Expand Down
Loading

0 comments on commit e4a6bdf

Please sign in to comment.