Kamil S. Jaron edited this page Feb 18, 2020 · 5 revisions

Parameters L and U set the lower and upper coverage thresholds for kmers that will be considered genomic kmers. Approximate estimates can be made with the smudgeplot cutoff function, but there is nothing wrong with eyeballing them directly from the kmer spectrum (and very often that gives a better estimate).

L

Set it as high as you can while still safely keeping your haploid kmers, i.e. without cutting into the haploid peak.
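For a first guess before eyeballing the spectrum, one option is to look for the valley between the low-coverage error kmers and the haploid peak. The sketch below is only an illustration of that idea, not what the smudgeplot cutoff function does; the function name `first_valley` and the toy histogram are made up for this example, and a real spectrum is usually noisier and may need smoothing first.

```python
def first_valley(hist):
    """Return the coverage of the first local minimum in a kmer histogram.

    hist is a list of (coverage, number_of_kmers) pairs sorted by coverage.
    Returns None if the counts never stop decreasing and turn upward.
    """
    for i in range(1, len(hist) - 1):
        prev_count = hist[i - 1][1]
        count = hist[i][1]
        next_count = hist[i + 1][1]
        # A valley: lower than the previous bin, not higher than the next one.
        if count < prev_count and count <= next_count:
            return hist[i][0]
    return None


# Toy spectrum (hypothetical numbers): an error tail at low coverage,
# then a haploid peak around coverage 9.
toy_hist = [(1, 1000), (2, 400), (3, 150), (4, 60), (5, 40),
            (6, 55), (7, 90), (8, 140), (9, 180), (10, 150), (11, 100)]

candidate_L = first_valley(toy_hist)
print(candidate_L)
```

Treat the result as a starting point and check it against the plotted spectrum; whether to sit right at the valley or move L further up toward the haploid peak is a judgment call.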

U

Perhaps less important than L. You might want to exclude super-repetitive kmers (like mtDNA or kmers from centromeres/telomeres) from your analysis. These kmers usually have enormous coverage, so U can go up to several thousand without a big problem.
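To make the role of the two thresholds concrete, here is a minimal sketch of filtering kmers by coverage. It assumes both bounds are inclusive, which is an assumption of this example rather than something stated on this page; the function name `within_thresholds`, the kmers, and the counts are all hypothetical.

```python
def within_thresholds(kmer_counts, lower, upper):
    """Keep kmers whose coverage lies in [lower, upper] (inclusive, by assumption)."""
    return {kmer: count for kmer, count in kmer_counts.items()
            if lower <= count <= upper}


# Toy counts (hypothetical): one likely error kmer, two genomic kmers,
# and one ultra-repetitive kmer with enormous coverage.
toy_counts = {"ACGT": 2, "CGTA": 25, "GTAC": 48, "TACG": 5000}

genomic = within_thresholds(toy_counts, lower=10, upper=1500)
print(genomic)
```

With these toy values, the low-coverage kmer falls below L and the ultra-repetitive one exceeds U, so only the two mid-coverage kmers remain.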

I am actually considering removing this argument and exploring whether ultra-repetitive kmers really are a problem (we thought they might be, so we kicked them out, but we never actually checked).

TODO add a couple of examples of kmer spectra with appropriate L and U