Hi and thanks a lot for your work on Cuttlefish!
I've been working on a GFA1 file parser and have used Cuttlefish to generate GFA1 files. In doing so, I noticed that the Path lines output by Cuttlefish are ambiguous and break the GFA 1.0 Format Specification. Quoting https://github.com/GFA-spec/GFA-spec/blob/master/GFA1.md#p-path-line:
However, Cuttlefish currently outputs an extra overlap value.
Here are two small reproducible examples which highlight the issue. First, consider the FASTA file with contents:
I used the symbol 'N' to induce a zero overlap link so that we can track which overlaps correspond to which links. Running Cuttlefish (and KMC) on this file with -k 3 produces the GFA1 file with contents:
Here, "0M" is given twice. The correct overlap output for the path would be a single "0M". Next, also consider the FASTA file with contents:
and the produced GFA1 file with contents:
Here, the correct overlap output would be "2M,0M,2M".
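To make the expected counts concrete, here is a minimal Python sketch of the check my parser effectively performs. It assumes a linear (non-circular) path, for which a path of n segments carries n - 1 overlaps, and it treats "*" as "overlaps unspecified". The function name and the segment names in the usage lines are hypothetical and are not taken from Cuttlefish's output.

```python
def check_path_overlaps(p_line: str) -> bool:
    """Return True if a GFA1 P line has one overlap per adjacent segment pair.

    Assumption: the path is linear, so n segments imply n - 1 overlaps.
    """
    fields = p_line.rstrip("\n").split("\t")
    if len(fields) < 4 or fields[0] != "P":
        raise ValueError("not a GFA1 path line")
    segments = fields[2].split(",")
    if fields[3] == "*":
        # Overlaps left unspecified; nothing to count.
        return True
    overlaps = fields[3].split(",")
    return len(overlaps) == len(segments) - 1


# Hypothetical path lines mirroring the two examples above.
print(check_path_overlaps("P\tp1\t1+,2+\t0M"))              # one overlap for two segments
print(check_path_overlaps("P\tp1\t1+,2+\t0M,0M"))           # one overlap too many
print(check_path_overlaps("P\tp2\t1+,2+,3+,4+\t2M,0M,2M"))  # three overlaps for four segments
```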
It appears that the overlap buffer for each thread already contains the correct overlaps, so the first overlap of the path, which is currently written out manually before the buffer, is a duplicate. My proposed fix is to simply output the overlap buffer (without the first comma). After applying the fix, Cuttlefish produces the GFA1 files
and
which are correct.
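For reviewers, the idea behind the fix can be sketched in a few lines of Python. This is only an illustration under an assumption: that the per-thread buffer stores every overlap with a leading comma, e.g. ",2M,0M,2M". The function names are hypothetical and this is not the actual Cuttlefish code.

```python
def overlaps_before_fix(first_overlap: str, buffer: str) -> str:
    # Old behaviour as I understand it: the first overlap is written
    # manually, then the whole buffer follows, duplicating that overlap.
    return first_overlap + buffer


def overlaps_after_fix(buffer: str) -> str:
    # Proposed behaviour: write only the buffer, minus its first comma.
    return buffer[1:] if buffer.startswith(",") else buffer


buffer = ",2M,0M,2M"  # assumed buffer layout for a four-segment path
print(overlaps_before_fix("2M", buffer))
print(overlaps_after_fix(buffer))
```

With the assumed buffer, the first call yields four overlaps and the second yields the three that a four-segment path should have.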