Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

minimap2 sometimes misses good base-level alignments #659

Closed
bhandsaker opened this issue Aug 21, 2020 · 7 comments
Closed

minimap2 sometimes misses good base-level alignments #659

bhandsaker opened this issue Aug 21, 2020 · 7 comments

Comments

@bhandsaker
Copy link

bhandsaker commented Aug 21, 2020

I have run into more than one example where minimap2 fails to generate base-level alignments for a query sequence even though good alignments appear to exist.

One example is when I try to align the human hg38 alt contig chr17_GL000258v2_alt to the non-alt chr17. When I generate backbone alignments like this:

minimap2 -x asm10 chr17.fasta.gz chr17_GL000258v2_alt.fasta.gz

I get (in part) the following alignments (sorted by query start), which pretty much tile the query sequence in this region:

chr17_GL000258v2_alt    1821992 494456  534299  -       chr17   83257441        46252401        46292148
chr17_GL000258v2_alt    1821992 534329  566383  +       chr17   83257441        45592735        45625212
chr17_GL000258v2_alt    1821992 549319  1162636 -       chr17   83257441        45625284        46237516

However when I do base-level alignment

minimap2 -c -x asm10 chr17.fasta.gz chr17_GL000258v2_alt.fasta.gz

Then the middle aligment from above largely disappears and the resulting alignments leave a ~10kb gap of unaligned query sequence:

chr17_GL000258v2_alt    1821992 494422  534340  -       chr17   83257441        46252360        46292182
chr17_GL000258v2_alt    1821992 545039  1162656 -       chr17   83257441        45625264        46242506

The query gap segment chr17_GL000258v2_alt:534341-545038 has a good alignment to chr17:45592747-45603440:+, as suggested by the original backbone alignment.
In addition, most of the unaligned query region, specifically segment chr17_GL000258v2_alt:535164-545038 has a good alignment to the target "gap" at chr17:46242508-46252364:-.
I've attached a muscle file showing both of these alignments.
align_segments.txt
align_muscle.clw.txt

I have tried varying the presets (asm5, asm10 and asm20 produce the same results) and I have tried turning up -z (up to -z 10000) as suggested here #158 but neither of these seem to change the behavior.

I have two questions:

  1. I am curious why minimap2 fails to emit one or both of these "manual" alignments for this 10kb segment.
  2. Are there some parameters I can change to coax minimap2 to generate some alignment for this query segment rather failing to align it?

Thanks very much for any insights you can offer.
I am happy to provide the fasta files if that would be helpful.

@lh3
Copy link
Owner

lh3 commented Aug 21, 2020

You may try setting mask level to 0 with -M0 and see what happens. The logic is mostly implemented in function mm_set_parent(), in particular this line:

minimap2/hit.c

Line 165 in c9874e2

if ((float)ol / min - (float)uncov_len / max > mask_level) {

After base alignment, the unaligned length (uncov_len) becomes shorter. The cross_match mask-level heuristics kicks in and filters it out. That is my guess.

PS: on that line, ol is the overlapping length. min is the shorter alignment length between two overlapping alignments and max is the longer alignment length.

@bhandsaker
Copy link
Author

Thank you, Heng. Using -M0 does in fact change the behavior.

minimap2 -c -x asm10 -M0 chr17.fasta.gz chr17_GL000258v2_alt.fasta.gz

yields

chr17_GL000258v2_alt 1821992 494422 553606 - chr17 83257441 46233262 46292182 chr17_GL000258v2_alt 1821992 549319 1162656 - chr17 83257441 45625264 46237516

@lh3
Copy link
Owner

lh3 commented Aug 21, 2020

In practice, -M0 is not recommended, but you can use a smaller value like -M 0.2 to reduce the effect of mask level. You will get more overlapping alignments but will miss few. The default of -M is 0.5.

@bhandsaker
Copy link
Author

If I'm understanding -M correctly, I think what I would ideally like is to be able to set -M in absolute terms to not miss alignments of less than approximately length L, rather than as a fraction of the alignment length. The surprising behavior, which bit me more than once, is that the absence of any reported alignment for a longish query segment does not necessarily imply that no good alignment exists for that segment.

But knowing the behavior and the cause and how to control it is very helpful.

lh3 added a commit that referenced this issue Aug 21, 2020
@lh3
Copy link
Owner

lh3 commented Aug 21, 2020

I have added a new option --mask-len to the mask-len branch. In your case, you can add --mask-len=5k -M0.2. An alignment is kept if dropping it would leave a uncovered region on query longer than this value. This option defaults to INT_MAX for the compatibility with the existing behavior. I wonder if you can help to test the feature. I don't have right data. Thanks.

@bhandsaker
Copy link
Author

I built and tested the branch and it seems to work fine. I tried it on another couple of examples also. It seems to produce a bit more stable behavior with respect to not dropping short but good alignments.

@lh3
Copy link
Owner

lh3 commented Aug 22, 2020

Thanks. I have merged that branch to the master.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants