Hi, Heng!
We experimented with the Drosophila melanogaster genome and found several strange things in minimap2's behaviour (Version: 2.8-r686-dirty). I've attached two of our files for reproducing the issues: the full chrX extracted from the reference genome, and a fragment from our simulated assembly that should map to it perfectly. We also used the full reference Drosophila_melanogaster.BDGP6.dna.toplevel.fa, which can be downloaded e.g. here.
The issues are:
1. `./minimap2 Drosophila_melanogaster.chrX.fa chrX_fragment.fa` gives empty output, while we expect to see several perfect mappings. Note that `./minimap2 chrX_fragment.fa Drosophila_melanogaster.chrX.fa` results in many alignments.
2. `./minimap2 Drosophila_melanogaster.BDGP6.dna.toplevel.fa chrX_fragment.fa` produces the expected matches, and all of them are from chrX, so (1) looks strange even without taking into account the issue with target/query order.
3. `./minimap2 -x asm10 Drosophila_melanogaster.BDGP6.dna.toplevel.fa chrX_fragment.fa` results in no alignments again. Note that all alignments from (2) are perfect, so in theory they should not be excluded because of `-x asm10`.
This is all because chrX_fragment.fa has 65 copies on chrX.
If you look at the minimap2 log, you will see this line: mid_occ = 37. This means minimizers occurring 37 times or more will be ignored. This fragment has 65 copies, more than the threshold.
When you map to the whole genome, the threshold changes to 115, probably because other regions have more copies. At 115, the fragment becomes mappable.
-x asm10 uses k-mers of a different length, so the threshold changes again.
You can choose a fixed threshold with -f 200. You will see consistent results from these three runs.
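To make the effect of the threshold concrete, here is a minimal toy sketch in Python. It is not minimap2's code: the function names `mid_occ_threshold` and `usable_seeds`, the 0.001 fraction, and the occurrence counts are all made up for illustration. It only assumes that the threshold is taken from the most frequent fraction of minimizers in the index, and that query minimizers at or above the threshold are ignored, as described above.

```python
def mid_occ_threshold(occ_counts, frac):
    """Toy threshold: the occurrence count found after skipping the
    top `frac` fraction of distinct minimizers (hypothetical model)."""
    ranked = sorted(occ_counts, reverse=True)
    n_skip = int(len(ranked) * frac)
    return ranked[n_skip]


def usable_seeds(query_counts, threshold):
    """Keep only query minimizers that occur fewer than `threshold` times."""
    return [c for c in query_counts if c < threshold]


# Minimizers of a fragment that has 65 copies in the target (made-up data).
fragment = [65] * 5

# Toy "single chromosome" index: few repeats, so the threshold is low.
small_index = [65] * 5 + [30] * 20 + [1] * 9975
# Toy "whole genome" index: many highly repetitive minimizers elsewhere.
large_index = [500] * 200 + [65] * 5 + [30] * 20 + [1] * 99775

t_small = mid_occ_threshold(small_index, 0.001)
t_large = mid_occ_threshold(large_index, 0.001)

print(t_small, len(usable_seeds(fragment, t_small)))  # fragment is filtered out
print(t_large, len(usable_seeds(fragment, t_large)))  # fragment is kept
print(len(usable_seeds(fragment, 200)))               # fixed threshold: kept in both
```

In this toy model the same fragment loses all of its seeds against the small index and keeps them against the large one, purely because the data-dependent threshold moved. A fixed threshold removes that dependence on the index.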
I wish minimap2 could output a random mapping even if a query has thousands of copies. However, I could not find an efficient way to achieve that. This is one of the things suffix array/BWT based algorithms are better at.
Attached files: Drosophila_melanogaster.chrX.fa.gz, chrX_fragment.fa.gz