
Very homozygous diploid

Lucía Campos edited this page Mar 29, 2024 · 3 revisions

Models are not perfect and can converge to a wrong solution under several circumstances. One of them is when the spectrum has only one prominent peak. The Garden Bumblebee (Bombus hortorum), sequenced by the Darwin Tree of Life project, nicely demonstrates this very common convergence problem when a model is fit to a very homozygous k-mer spectrum.

Let's do a default genomescope run on the bumblebee sample:

genomescope.R -i iyBomHort1.hist.txt -o bumble_default

It will generate the following model:

[Figure: iyBomHort1 k31_linear_plot]

1. What do you think is wrong with this model?

When there is very little heterozygosity, the model cannot tell the homozygous peak from the heterozygous one. In this case, the genomescope model predicted 14.6% heterozygosity, which is unlikely for any organism, especially for a hymenopteran genome. Second, this would be an extremely large bumblebee genome; unexpected genome size estimates are also a red flag. Whatever genome we analyse, it is always very useful to have expectations.

We can adjust the parameters so that the main peak is treated as the homozygous peak rather than the heterozygous peak. The new fit then shows that this is in fact a highly inbred sample.

  1. Download this k-mer spectrum and fit a model.

Can you specify a coverage prior so that the model converges correctly? The coverage prior is set with the parameter `-l`.

genomescope.R -i iyBomHort1.hist.txt -l 40 -o bumble_l40

[Figure: linear_plot]

This is more like it, right?
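
When choosing a value for `-l`, it can help to look at where the prominent peak of the histogram actually sits. The sketch below is one rough way to do that. It assumes the histogram is a plain two-column text file (coverage, then number of distinct k-mers) separated by whitespace; `main_peak` is a hypothetical helper written for this page, not part of genomescope, and the default cutoff of 5 for skipping low-coverage error k-mers is an arbitrary choice.

```shell
# Hypothetical helper (not part of genomescope): print the coverage of the
# tallest histogram bin at or above a minimum coverage.
# Assumes two whitespace-separated columns: coverage, number of distinct k-mers.
main_peak() {
    hist="$1"
    min_cov="${2:-5}"   # skip low-coverage error k-mers; 5 is an arbitrary cutoff
    awk -v min="$min_cov" '
        # keep the bin with the highest count among bins with coverage >= min
        $1 + 0 >= min + 0 && $2 + 0 > best { best = $2 + 0; cov = $1 }
        END { print cov }
    ' "$hist"
}
```

For example, `main_peak iyBomHort1.hist.txt` prints the coverage of the tallest bin. How that number relates to the `-l` prior depends on whether the peak is read as the homozygous or the heterozygous one, which is exactly the decision this example is about.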

What's next

Let's keep looking at other examples...

👆 Go back to Table of Content

👉 ⚒ Fit some highly heterozygous diploid genome models of notorious organisms.

👉 📖 Read about Characterization of polyploid genomes using k mer spectra analysis

