multi-part indices / unknown reference size / samtools 1.10 #527

Closed
wwood opened this issue Dec 8, 2019 · 2 comments

wwood commented Dec 8, 2019

Hi there,

I'm using minimap2 as part of a metagenomics coverage calculation tool (CoverM, to be concrete).

The tool takes a reference fasta and reads, and outputs coverage, using minimap2 to do the mapping. The issue is that because the target fasta is of unknown size (and may even be streamed), --split-prefix must always be specified. Unfortunately, as in #400, @SQ lines are duplicated when the number of bases is less than 4G. Piping minimap2 -a to samtools sort worked previously, but as of 1.10 samtools croaks because duplicate sequence names are encountered.

Assuming I'm understanding correctly and there are no other workarounds, would it be possible to not output the duplicated @SQ lines when --split-prefix is specified with a small target fasta, please?

Thanks in advance. ben
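For anyone hitting the same error before a fix is released, one possible stopgap is to filter the SAM stream between minimap2 and samtools so that each sequence name appears in only one @SQ line. The sketch below is an assumption about how that could be done, not something taken from minimap2 or CoverM; the function name `dedup_sq_lines` and the script name `dedup_sq.py` are hypothetical. It only removes repeated header lines and leaves alignment records untouched.

```python
import sys


def dedup_sq_lines(lines):
    """Yield SAM lines, dropping any @SQ header whose SN: name was already seen.

    Hypothetical helper for illustration; not part of minimap2 or samtools.
    """
    seen = set()
    for line in lines:
        if line.startswith("@SQ\t"):
            # Header fields are tab-separated TAG:VALUE pairs; SN holds the name.
            fields = line.rstrip("\n").split("\t")[1:]
            name = next((f[3:] for f in fields if f.startswith("SN:")), None)
            if name is not None:
                if name in seen:
                    continue  # duplicate reference name: skip this header line
                seen.add(name)
        yield line


def main():
    # Stream stdin to stdout so the filter can sit in the middle of a pipe.
    sys.stdout.writelines(dedup_sq_lines(sys.stdin))
```

Saved as a script that ends with a call to `main()`, it could be placed in the pipe, for example `minimap2 -a --split-prefix tmp ref.fa reads.fq | python dedup_sq.py | samtools sort -o out.bam`, where the file names are placeholders. Only the first @SQ line for each name is kept, which assumes the repeated lines describe the same sequence.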

wwood added a commit to wwood/CoverM that referenced this issue Dec 9, 2019
lh3 closed this as completed in 24f50f3 Jan 18, 2020

lh3 (Owner) commented Jan 18, 2020

Addressed in 24f50f3.

wwood (Author) commented Jan 19, 2020

I'll wait for a release to test it, but thanks for addressing this.
