It appears to me that sambamba applies a lexicographic ordering rather than a natural (aka mixed sort) ordering when sorting by query name. This is in contrast to samtools, which applies mixed sort ordering (i.e., it parses the query name into strings and integers and sorts based on that). These two strategies produce significantly different orderings.
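To illustrate the difference, here is a minimal Python sketch. It is not the actual comparison code of either tool; `natural_key` is a hypothetical helper and the read names are made up. It only assumes that "mixed sort" means splitting a name into text and integer chunks and comparing the integer chunks numerically.

```python
import re

def natural_key(name):
    """Hypothetical helper: build a sort key that compares digit runs as integers."""
    # re.split with a capturing group alternates text (even index) and digits (odd index).
    parts = re.split(r"([0-9]+)", name)
    key = []
    for i, part in enumerate(parts):
        if part == "":
            continue
        if i % 2 == 1:
            key.append((1, int(part), ""))   # numeric chunk, compared as an integer
        else:
            key.append((0, 0, part))         # text chunk, compared as a string
    return key

names = ["read10", "read2", "read1"]  # made-up query names

print(sorted(names))                   # lexicographic: ['read1', 'read10', 'read2']
print(sorted(names, key=natural_key))  # natural:       ['read1', 'read2', 'read10']
```

With plain string comparison, `read10` sorts before `read2` because `'1' < '2'`; with the chunked key, `2 < 10` decides the order.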
Is this behaviour of sambamba on purpose? It somewhat complicates comparing results between sambamba and the de facto standard, samtools. Other projects, for example bcbio, may expect (perhaps unjustifiably) sambamba to be a drop-in replacement for samtools.
If sambamba's behaviour is different in such core aspects then I suggest that this should be clearly stated in the documentation in order to avoid misunderstandings. Even better, of course, an option for natural-sort ordering could be implemented.
The major use of sorting by name is to ensure that paired reads appear one after another, but I imagine that a mix of sambamba sort -n and samtools merge may fail.
Closing this issue as a duplicate; please subscribe to #109.