Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Back translate #74

Merged
merged 2 commits into from
Oct 26, 2015
Merged

Back translate #74

merged 2 commits into from
Oct 26, 2015

Conversation

jfjlaros
Copy link
Member

Added back translate interface.

@jfjlaros
Copy link
Member Author

Note that I use the NM to NP lookup table form the db module, this table is not symmetric so perhaps I am using it wrong.

@martijnvermaat martijnvermaat force-pushed the back_translate branch 2 times, most recently from 9e861ea to 8f0ede6 Compare September 25, 2015 09:23
@martijnvermaat
Copy link
Contributor

I rebased and squashed this branch on current master, makes reviewing easier.

@martijnvermaat
Copy link
Contributor

I'm wondering a bit about naming, i.e.,:

  1. Backtranslate versus back-translate versus back translate.
  2. Translate versus translator versus translation.

And in code:

  1. backtranslate.BackTranslate suggests one word and two words.
  2. mutalyzer.back_translate takes another approach.

@martijnvermaat martijnvermaat force-pushed the back_translate branch 4 times, most recently from a05a417 to 5782de9 Compare October 1, 2015 21:09
# TODO:
# - Support reference sequences other than transcript or protein.
# - Check input:
# - without_dna: Does the reference amino acid fit?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this. Is this a TODO before we can merge this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I don't think this is blocking.

  1. Currently the back translator only supports NM or NP. In principle other references (NG, NC, LGC, ENSx) can also work.
  2. This refers to the name checker. We currently do not check whether the reference amino acid is actually there. E.g., for p.Glu1Arg we do not check whether there is a glutamine at position 1.

@martijnvermaat
Copy link
Contributor

@jfjlaros I revised this PR and fixed most of the issues I could find.

I just commented with some small remaining questions. Before merging this I would also like to include some tests.

@martijnvermaat
Copy link
Contributor

From an NP finding the corresponding NM is currently sort of broken when the NM is already in the database. This is because the database stores the NM including its version number and we query for the NM without version number.

It still sort of works, since the NCBI will now be queried, but of course we would like to use the cached NM.

I think we should first fix the database to store the versions separately.

Note that this is only picked up by the unit tests when disabling the internet connection.

@martijnvermaat
Copy link
Contributor

After some tests it looks like we can actually have transcript-protein links with version numbers, so we should probably do that first.

@martijnvermaat martijnvermaat force-pushed the back_translate branch 4 times, most recently from 7dd9623 to ec832c8 Compare October 13, 2015 22:03
@martijnvermaat martijnvermaat force-pushed the back_translate branch 3 times, most recently from 54e705a to ecbc78f Compare October 26, 2015 14:40
martijnvermaat added a commit that referenced this pull request Oct 26, 2015
@martijnvermaat martijnvermaat merged commit e661b72 into master Oct 26, 2015
@martijnvermaat martijnvermaat deleted the back_translate branch October 26, 2015 15:12
@martijnvermaat martijnvermaat mentioned this pull request May 30, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants