Sambamba does not seem to follow a 'natural' (mixed sort) order when name-sorting, thus breaking compatibility with samtools #132

Closed
schelhorn opened this issue Mar 25, 2015 · 1 comment

Comments

@schelhorn

It appears to me that sambamba applies a lexicographic ordering rather than a natural (aka mixed sort) ordering when sorting by query name. This is in contrast to samtools, which applies mixed sort ordering (i.e., it parses the query name into strings and integers and sorts based on that). These two strategies produce significantly different orderings.
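To illustrate the difference, here is a minimal Python sketch. The helper `natural_key` and the read names are hypothetical, and the key function is a simplified natural sort, not the exact comparator that samtools or sambamba implements (in particular, how digit runs rank against non-digit characters is an assumption here).

```python
import re

def natural_key(name):
    """Split a query name into digit and non-digit runs.

    Digit runs are compared as integers, everything else as plain strings.
    The (0, ...) / (1, ...) tags keep ints and strs from being compared
    with each other; ranking digit runs first is an arbitrary choice.
    """
    key = []
    # With a capturing group, re.split alternates: even index = text, odd index = digits.
    for i, part in enumerate(re.split(r"([0-9]+)", name)):
        if part == "":
            continue
        key.append((0, int(part)) if i % 2 == 1 else (1, part))
    return key

names = ["read10", "read2", "read1"]

lexicographic = sorted(names)               # plain string comparison
natural = sorted(names, key=natural_key)    # mixed string/integer comparison

print(lexicographic)
print(natural)
```

With these names, the plain sort places `read10` before `read2`, while the natural sort places `read2` before `read10`.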

Is this behaviour of sambamba on purpose? It somewhat complicates comparing results between sambamba and the de facto standard, samtools. Other projects, bcbio for example, may expect (perhaps unjustifiably) sambamba to be a drop-in replacement for samtools.

If sambamba's behaviour differs in such core aspects, then I suggest stating this clearly in the documentation to avoid misunderstandings. Even better, of course, an option for natural-sort ordering could be implemented.

@lomereiter
Contributor

The major use of sorting by name is to place paired reads one after another, but I imagine that a mix of sambamba sort -n and samtools merge may fail.

Closing this issue as a duplicate; please subscribe to #109.
