Very homozygous diploid
Models are not perfect and can converge on a wrong fit under several circumstances. One of them is when the spectrum has only one prominent peak. The Garden Bumblebee (Bombus hortorum), sequenced by the Darwin Tree of Life project, nicely demonstrates this very common convergence problem when a model is fit to a very homozygous k-mer spectrum.
Let's do a default genomescope run on the bumblebee sample:
genomescope.R -i iyBomHort1.hist.txt -o bumble_default
It will generate the following model:
1. What do you think is wrong with this model?
When there is very little heterozygosity, the model cannot differentiate between the homozygous and heterozygous peaks. First, the genomescope model predicted 14.6% heterozygosity, which is unlikely for any organism, and especially for a hymenopteran genome. Second, this would be an extremely large bumblebee genome; unexpected genome size estimates are also a red flag. Whatever genome we analyse, it's always very useful to have expectations.
We can adjust the parameters to treat this main peak as the homozygous peak, rather than the heterozygous peak, and then see that this is in fact a highly inbred sample.
- Download this k-mer spectrum and fit a model.
Can you give the model a coverage prior so that it converges correctly? The coverage prior is specified by the parameter `-l`.
genomescope.R -i iyBomHort1.hist.txt -l 40 -o bumble_l40
This is more like it, right?
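When choosing a coverage prior for other samples, it can help to check where the main peak actually sits in the histogram before deciding what to pass to `-l`. The sketch below is not part of genomescope; it assumes the histogram is a two-column text file of coverage and count per line (as written by common k-mer counters), and the function names and the `min_coverage` cutoff are made up for illustration. The toy numbers are invented and are not the bumblebee data.

```python
import io


def read_histogram(handle):
    """Parse 'coverage count' lines into a list of (coverage, count) tuples."""
    hist = []
    for line in handle:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        hist.append((int(fields[0]), int(fields[1])))
    return hist


def main_peak(hist, min_coverage=5):
    """Return the coverage with the highest count.

    Coverages below min_coverage are ignored because that part of the
    spectrum is usually dominated by sequencing errors. The cutoff of 5
    is an arbitrary assumption and should be adapted to the data.
    """
    candidates = [(cov, n) for cov, n in hist if cov >= min_coverage]
    if not candidates:
        raise ValueError("no histogram rows at or above min_coverage")
    return max(candidates, key=lambda pair: pair[1])[0]


# Toy spectrum (invented): an error tail at low coverage and one peak at 30.
toy = io.StringIO(
    "1 9000\n2 1200\n3 200\n5 50\n"
    "20 400\n29 2500\n30 3000\n31 2600\n60 30\n"
)
hist = read_histogram(toy)
print("main peak at coverage", main_peak(hist))
```

For a real file, `read_histogram(open("iyBomHort1.hist.txt"))` would replace the toy input. The script only reports where the peak is; whether that peak is heterozygous or homozygous, and therefore which value makes sense for `-l`, remains the judgment call this exercise is about.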
Let's keep looking at other examples...
👆 Go back to Table of Content
👉 ⚒ Fit some highly heterozygous diploid genome models of notorious organisms.
👉 📖 Read about Characterization of polyploid genomes using k mer spectra analysis