minimap2 does not respect large -K option? #613

Closed
mrvollger opened this issue May 22, 2020 · 3 comments

Comments

@mrvollger

Hi!

I have been mapping some very large contigs, and I find it useful to raise -K so that more than one of them is mapped at a time. However, when I increase -K beyond 2147m, minimap2 loads and maps only one sequence at a time. Below is a minimal example that shows this:

Make test data:

$ printf ">test\nAAAAAAGGGGGGGGGGGGGGGGCCCCCCCCCCCCTTTTTTTTTTTTT\n" > test_ref.fasta && cat  test_ref.fasta test_ref.fasta test_ref.fasta test_ref.fasta  > test_reads.fasta
$ minimap2 --version
2.17-r941

Maps all 4 sequences at once with -K 2147m:

$ minimap2 -t 128 -K 2147m test_ref.fasta test_reads.fasta  > /dev/null
[M::mm_idx_gen::0.002*3.23] collected minimizers
[M::mm_idx_gen::0.018*13.35] sorted minimizers
[M::main::0.018*13.32] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.018*13.24] mid_occ = 3
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 1
[M::mm_idx_stat::0.018*13.17] distinct minimizers: 4 (75.00% are singletons); average occurrences: 1.250; average spacing: 9.400
[M::worker_pipeline::0.029*8.72] mapped 4 sequences
[M::main] Version: 2.17-r941
[M::main] CMD: minimap2 -t 128 -K 2147m test_ref.fasta test_reads.fasta
[M::main] Real time: 0.030 sec; CPU: 0.255 sec; Peak RSS: 0.005 GB

Maps sequences one at a time with -K 2148m:

$ minimap2 -t 128 -K 2148m test_ref.fasta test_reads.fasta  > /dev/null
[M::mm_idx_gen::0.002*3.35] collected minimizers
[M::mm_idx_gen::0.016*15.18] sorted minimizers
[M::main::0.016*15.14] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.016*15.05] mid_occ = 3
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 1
[M::mm_idx_stat::0.017*14.86] distinct minimizers: 4 (75.00% are singletons); average occurrences: 1.250; average spacing: 9.400
[M::worker_pipeline::0.028*9.42] mapped 1 sequences
[M::worker_pipeline::0.039*7.30] mapped 1 sequences
[M::worker_pipeline::0.048*6.20] mapped 1 sequences
[M::worker_pipeline::0.057*5.49] mapped 1 sequences
[M::main] Version: 2.17-r941
[M::main] CMD: minimap2 -t 128 -K 2148m test_ref.fasta test_reads.fasta
[M::main] Real time: 0.057 sec; CPU: 0.311 sec; Peak RSS: 0.004 GB

If you can verify this issue, there should probably be a note in the man page, but ideally I would like to be able to set -K above 2147m.

Thanks!
Mitchell
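
The two runs above bracket the largest value a signed 32-bit integer can hold, 2^31 - 1 = 2,147,483,647, provided the `m` suffix is read as 10^6. The thread does not state the cause, so the sketch below only checks that arithmetic. The decimal suffix table and the helper names `parse_size` and `fits_in_int32` are assumptions made for illustration, not minimap2's actual option parser.

```python
# Assumption: size suffixes are decimal (k = 10^3, m = 10^6, g = 10^9).
# This table is hypothetical and only mirrors the values used in the report.
SUFFIXES = {"k": 10**3, "m": 10**6, "g": 10**9}

# Largest value representable in a signed 32-bit integer.
INT32_MAX = 2**31 - 1


def parse_size(text):
    """Convert a string such as '2147m' into a plain integer count."""
    text = text.strip()
    factor = SUFFIXES.get(text[-1].lower(), 1)
    digits = text[:-1] if factor != 1 else text
    return int(digits) * factor


def fits_in_int32(text):
    """Return True if the parsed size fits in a signed 32-bit integer."""
    return parse_size(text) <= INT32_MAX


for value in ("2147m", "2148m"):
    print(value, parse_size(value), fits_in_int32(value))
```

Under these assumptions 2147m (2,147,000,000) fits and 2148m (2,148,000,000) does not, which matches the point where the behaviour changes in the logs.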

@lh3
Owner

lh3 commented May 22, 2020

Duplicate of #491 and #562. Try the github HEAD.

@lh3 lh3 closed this as completed May 22, 2020
@lh3 lh3 added the duplicate label May 22, 2020
@mrvollger
Author

Sorry, I should have looked longer at the other issues.

I am noticing similar behavior with yak, and I don't think there is an issue for it there yet. Should I make one?

Will process many at once:

$ ./yak count -K 2000000000 mat_ill.fasta -o /dev/null

Will only process one at a time:

$ ./yak count -K 20000000000 mat_ill.fasta -o /dev/null
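
The two yak values fall on either side of the same signed 32-bit limit. As a rough illustration only, the Python sketch below shows what each value becomes if it is truncated to a 32-bit signed integer. That yak stores -K this way is an assumption, not something confirmed in this thread, and `as_int32` is a hypothetical helper name.

```python
import ctypes


def as_int32(n):
    """Truncate n to 32 bits and reinterpret it as a signed integer.

    ctypes integer types do no overflow checking, so out-of-range
    values wrap the way a C cast to int32_t typically would.
    """
    return ctypes.c_int32(n).value


# The two -K values from the yak commands above.
for k in (2_000_000_000, 20_000_000_000):
    print(k, as_int32(k))
```

In this model the smaller value survives unchanged, while the larger one wraps to a negative number. A negative batch size could plausibly make a reader stop after every sequence, but that last step is a guess about the implementation.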

@lh3
Owner

lh3 commented May 22, 2020

yak and minigraph have a similar issue. I haven't fixed those yet... You can create a new issue as a reminder for me. Thanks!
