-
Notifications
You must be signed in to change notification settings - Fork 9
Matching k‐mers to a reference
Here we want to find exact matches of k-mers in the genome assembly. We can tweak a fast popular mapper bwa mem
to do so (thanks to Daniel Standage for the tip) by setting up minimum seed length -k
(the initial match required to consider mapping) as well as the minimum score to output -T
to the value k
we used. We also specify -a
as we want all the matches (although we don't expect to see many of those). -c
just skips all seeds (k-mers) with more than hits than the value specified (5000 in the example)
bwa mem -k <k> -T <k> -a -c 5000 <assembly.fasta> <kmers.fasta>
The lates Glossina assembly is downloaded from NCBI here: /cluster/projects/nn9458k/oh_know/teachers/kamil/data/tse-tse/ncbi-genomes-2021-09-12
. The mapped bams can be summarised into a table of scaffolds with the number of k-mers from each category. To do that we will use another home-made script, bams2table.py:
python3 bams2table.py <A_kmers_mapped.bam> <X_kmers_mapped.bam> <Y_kmers_mapped.bam> <output_per_scf_table.tsv>
Finally, now <output_per_scf_table.tsv>
contains names of all scaffolds and number of chromosome specific k-mers. Just looking at the table... Q: did the assembly contain Y? Q: What is the longest clearly X-linked scaffold? The signal is usually clear, but hardly ever perfect. What causes that k-mers assigned to all three chromosomes map to so many scaffolds? Can we find any probably chimeric scaffold?
If there will be enough time, we can plot the properties of mapped k-mers and decide on scaffolds that can be clearly assigned to chromosomes and actually generate a table of assignments. Do you have an idea to get the Y chromosome too?
Introduction
k-mer spectra analysis
- 📖 Introduction to K-mer spectra analysis
- 📖 Basics of genome modeling
- ⚒ manual model fitting (for better understanding of the underlying model)
- ⚒ simple diploid
- ⚒ demonstrating the effect of sequencing error rate on k-mer coverage
- 📖 Common difficulties in characterisation of diploid genomes using k mer spectra analysis
- ⚒ low coverage (pitfall) - to be merged
- ⚒ very homozygous diploid
- ⚒ highly heterozygous diploid
- ⚒ Genome size of a repetitive genome (pitfall)
- ⚒ Wrong ploidy (pitfall)
- 📖 Characterization of polyploid genomes using k mer spectra analysis
- ⚒ Autotetraploid
- ⚒ Allotetraploid
- ⚒ Estimating ploidy (smudgeplot)
- 📖 Genome modeling as a quality control
- ⚒ Contamination (pitfall)
- ⚒ k-mers in an assembly (Mercury/KAT)
- 📖 Analysing genome skimming data
Separation of chromosomes
- 📖Separate sub-genomes of an allopolyploid
- 📖Separating chromosomes by comparison of sequencing libraries
Species assignment using short k-mers
Others