When to use --no_force_align #2

Open
sbliven opened this issue Nov 30, 2018 · 1 comment

sbliven commented Nov 30, 2018

It's unclear to me when to use the --no_force_align option to ProGraph. The README describes this as

do not force alignment of initial Methionine

What's the scientific motivation for skipping initial M by default?

I ask because of a potential bug in the interaction with the --repeat option, which matches the sequences against a T-REKS output alignment. These files reference sequence positions, so stripping the initial M causes an off-by-one error in the coordinates.
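To make the off-by-one concrete, here is a minimal sketch in Python. The sequence, the repeat, and the helper `slice_1based` are all made up for illustration and are not taken from ProGraph or from a real T-REKS file; the only assumption carried over from the issue is that repeat coordinates are 1-based positions in the original FASTA sequence.

```python
# Hypothetical data, for illustration only (not ProGraph code).
full_seq = "MKTAYKTAY"   # sequence as it appears in the FASTA file
repeat_start = 2         # 1-based start position, as a repeats file would give it
repeat_text = "KTAY"     # repeat unit expected at that position


def slice_1based(seq, start, length):
    """Return the substring of `seq` that begins at 1-based position `start`."""
    return seq[start - 1:start - 1 + length]


# Against the untouched sequence, the coordinates line up.
assert slice_1based(full_seq, repeat_start, len(repeat_text)) == repeat_text

# If the initial M has been stripped, the same coordinates land one residue late.
stripped_seq = full_seq[1:] if full_seq.startswith("M") else full_seq
shifted = slice_1based(stripped_seq, repeat_start, len(repeat_text))
assert shifted != repeat_text
```
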

I can think of several possible solutions:

  • Default to --no_force_align when the --repeat option is also specified
  • For each sequence, store a flag indicating whether it has been truncated. If so, account for that when reading in the repeats file
  • Be more permissive when verifying the FASTA/T-REKS alignment, and automatically recover from off-by-one errors in the coordinates. (This would have the side benefit of supporting malformed T-REKS files that use 0-based indexes rather than the correct 1-based positions.)
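The second and third options could look roughly like the sketch below. It is only an illustration under stated assumptions: `to_stored_position` and `find_shift` are hypothetical names, not ProGraph functions, and the repeats file is assumed to supply a 1-based start position plus the repeat text for each entry.

```python
from typing import Optional


def to_stored_position(start: int, m_stripped: bool) -> int:
    """Option 2: convert a 1-based file position to a position in the stored
    sequence, using a per-sequence flag that records whether the M was removed."""
    return start - 1 if m_stripped else start


def find_shift(seq: str, start: int, repeat: str, max_shift: int = 1) -> Optional[int]:
    """Option 3: return the shift at which `repeat` is found near 1-based `start`.

    The exact position (shift 0) is tried first, then -1, +1, and so on up to
    `max_shift`. Returns None if no candidate position matches.
    """
    shifts = [0]
    for k in range(1, max_shift + 1):
        shifts += [-k, k]
    for shift in shifts:
        begin = start - 1 + shift  # convert to a 0-based index, then shift
        if begin >= 0 and seq[begin:begin + len(repeat)] == repeat:
            return shift
    return None


# Stripped sequence: the repeat is found one position earlier than stated.
print(find_shift("KTAYKTAY", 2, "KTAY"))
```

One caveat with the permissive approach: tandem repeats with a very short period (for example a run of a single residue) can match at several shifts, which is why the exact position is tried before any shifted one.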

brightrail commented

For now, let's go for a quick fix: default to --no_force_align when the --repeat option is also specified.
Later, when we have more time, it would be good to fix it properly (the last proposed solution).
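
A minimal sketch of the quick fix, assuming the command line is parsed with Python's `argparse` and that --repeat takes a file path (neither is confirmed in this thread). The function name `parse_args` and the program name are placeholders; only the two flag names come from the issue.

```python
import argparse


def parse_args(argv=None):
    """Parse options; --repeat implies --no_force_align (hypothetical wiring)."""
    parser = argparse.ArgumentParser(prog="prograph-sketch")
    parser.add_argument(
        "--no_force_align",
        action="store_true",
        help="do not force alignment of initial Methionine",
    )
    parser.add_argument(
        "--repeat",
        metavar="FILE",
        default=None,
        help="repeats file whose coordinates refer to the untruncated sequences",
    )
    args = parser.parse_args(argv)

    # Quick fix: repeat coordinates assume the initial M is still present,
    # so turn off the default M handling whenever a repeats file is given.
    if args.repeat is not None:
        args.no_force_align = True
    return args
```

With this wiring, passing --repeat alone behaves the same as passing both flags, and runs without --repeat keep the current default.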
