Extension of API for specifying transcript variants #380

Closed
mkroon1 opened this issue May 30, 2016 · 8 comments · Fixed by #405

@mkroon1
Contributor

mkroon1 commented May 30, 2016

Clients of Mutalyzer's API may query Mutalyzer with outdated information about transcript variants; see, for example, LOVDnl/LOVD3#73.

I have made a gist with a more elaborate description of the problem and a proposed fix.

@martijnvermaat
Contributor

The scope currently seems limited to specific API calls, and only to accepting this syntax as input, not to having Mutalyzer produce it. While this may very well be a good first step, I think we would eventually like this to be the preferred way of selecting a transcript, so I'll try to consider this proposal in that context.

@martijnvermaat
Contributor

"The version number is not needed when specifying a transcript variant by accession number."

That is correct. There are no RefSeq files which have more than one version of a certain transcript annotated. So we could make the version optional, or even disallow it. I can think of some up- and downsides to both approaches:

  1. GenBank files are annotated with versions, and syntactically a GenBank file could have more than one version of a certain transcript annotated, even if that isn't the case at the moment.
  2. We try to educate people to always include a version number; choosing not to here could be confusing.

Task 1: "Extend grammar to include transcript accession number as transcript variant."

In my opinion the GeneSymbol production rule is not a good name, since it is actually a gene symbol plus optional transcript variant or protein isoform. If we disregard the name for a moment, I think it would make sense to include the AccNo alternative in this rule (instead of having it as an alternative to this rule).

Also, if we choose to make the version optional, reusing AccNo would work fine. If we choose to make it mandatory or disallow it, this would need some other construction.
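One way to picture the three options for the version number is the following sketch. It uses Python's `re` module rather than whatever grammar library Mutalyzer actually uses, and the patterns, the `VERSION_POLICIES` table and the `parse_selector` helper are all hypothetical simplifications (for example, accessions are reduced to two capitals, an underscore and digits). It folds the accession alternative into the same rule as the gene symbol alternative, as suggested above.

```python
import re
from typing import Optional

# Hypothetical, simplified patterns; a real AccNo rule would be richer.
GENE_SELECTOR = r"(?P<gene>[A-Za-z0-9-]+)(?:_(?P<kind>[vi])(?P<number>\d{3}))?"
ACC_BASE = r"(?P<acc>[A-Z]{2}_\d+)"

# The three choices discussed above for the transcript version number.
VERSION_POLICIES = {
    "optional": ACC_BASE + r"(?:\.(?P<version>\d+))?",
    "required": ACC_BASE + r"\.(?P<version>\d+)",
    "disallowed": ACC_BASE,
}


def parse_selector(text: str, policy: str = "required") -> Optional[dict]:
    """Parse the text between parentheses, e.g. 'SDHD_v001' or 'NM_003002.2'.

    The accession alternative is tried first; both alternatives live in one
    function, mirroring the idea of a single selector rule.
    """
    acc_match = re.fullmatch(VERSION_POLICIES[policy], text)
    if acc_match:
        return {"type": "accession", **acc_match.groupdict()}
    gene_match = re.fullmatch(GENE_SELECTOR, text)
    if gene_match:
        return {"type": "gene", **gene_match.groupdict()}
    return None  # neither alternative matched
```

Under the "optional" policy both `NM_003002` and `NM_003002.2` parse, which is where extra equivalent spellings come from; "required" rejects the bare accession.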

Although these may seem like implementation details, I think it is important to think carefully about any BNF changes and the naming of production rules. As you can see, the naming of GeneSymbol is already problematic here.

"Write test"

Yes, I would like to see a few tests, perhaps even start with those 👍

"The Position converter does not need to be adapted since it does not accept a transcript variant anyway."

That is incorrect; the Position converter can accept a transcript variant in some cases. However, I think the only occurrences of this are mtDNA genes, since we don't have mappings for NG_ references in our database.

"The Python example in task 1 above probably won't work as variable AccNo appears twice in the same rule."

I don't know; we will have to try.
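The thread doesn't record how the gist's example behaves, so this does not answer the question for Mutalyzer's grammar library. As a loose analogy only: Python's `re` module refuses to compile a pattern that defines the same named group twice, and giving the reference and transcript accessions distinct names avoids the clash. The group names `RefAccNo` and `TranscriptAccNo` are hypothetical.

```python
import re

ACC = r"[A-Z]{2}_\d+\.\d+"  # simplified accession with a mandatory version

# Naming both accessions "AccNo" in one pattern is a compile-time error in re.
try:
    re.compile(rf"(?P<AccNo>{ACC})\((?P<AccNo>{ACC})\)")
    duplicate_name_allowed = True
except re.error:
    duplicate_name_allowed = False

# Distinct names for the reference and the transcript accession work fine.
SELECTOR = re.compile(rf"(?P<RefAccNo>{ACC})\((?P<TranscriptAccNo>{ACC})\)")

match = SELECTOR.fullmatch("NG_012337.1(NM_003002.2)")
```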

@martijnvermaat
Contributor

Relatedly, keep in mind that transcripts can be annotated without an accession number. For this specific use case it probably won't matter, but it is reason enough to always keep supporting the old syntax. It is also something to be aware of when we start producing descriptions with this syntax: that will not always be possible.

A quick check I ran on our database suggests that most transcripts are annotated with an accession, but not all.
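A minimal sketch of how a description producer could fall back to the old syntax when a transcript has no accession. The `Transcript` record, its fields and the helper names are hypothetical, and the example values are illustrative rather than looked up in any annotation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Transcript:
    """Hypothetical record for one annotated transcript in a reference file."""
    gene: str
    variant: int                      # transcript variant number, 1 -> v001
    accession: Optional[str] = None   # e.g. 'NM_003002.2'; may be missing


def transcript_selector(t: Transcript, prefer_accession: bool = True) -> str:
    """Return the text between the parentheses of a description.

    Uses the accession when preferred and available; otherwise falls back
    to the old GENE_vNNN syntax, which always works.
    """
    if prefer_accession and t.accession:
        return t.accession
    return f"{t.gene}_v{t.variant:03d}"


def describe(reference: str, t: Transcript, change: str,
             prefer_accession: bool = True) -> str:
    selector = transcript_selector(t, prefer_accession)
    return f"{reference}({selector}):{change}"
```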

@ifokkema

Please note that while it's important to match the transcript correctly on input, so that we don't get unintended EREF errors, LOVD also needs the output to carry the transcript information. We normally loop through the results and look for the i001 that matches the v001 number, so we'll need some way to pick out the correct result. Perhaps the webservice could include the v-numbers and transcript IDs, like the Name Checker on the website does?
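The lookup described above could look roughly like this, assuming the protein descriptions arrive as plain strings with the usual `GENE_iNNN` selector. The function name and input shape are hypothetical, not LOVD's actual code, and the example strings are illustrative. Note that the match hinges entirely on the NNN numbers, which is why unstable numbering breaks it.

```python
import re
from typing import Iterable, Optional

# Captures gene symbol and isoform number from e.g. 'NG_012337.1(SDHD_i001):p.?'.
ISOFORM = re.compile(r"\((?P<gene>[^()_]+)_i(?P<number>\d{3})\)")


def protein_for_variant(gene: str, variant_number: int,
                        protein_descriptions: Iterable[str]) -> Optional[str]:
    """Return the description whose iNNN equals the transcript's vNNN."""
    wanted = f"{variant_number:03d}"
    for description in protein_descriptions:
        match = ISOFORM.search(description)
        if match and match["gene"] == gene and match["number"] == wanted:
            return description
    return None  # no isoform with a matching number
```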

@martijnvermaat
Copy link
Contributor

@ifokkema I assume you are talking about getTranscriptsAndInfo? Isn't the id what you want?

@ifokkema

Yes, exactly, and we don't want to make two webservice calls for each Name Checker call. That's why we cache that ID, and only now have we found out that it's not stable.

@jfjlaros
Member

A downside of using a version number is that you get multiple equivalent descriptions.

@martijnvermaat
Contributor

@jfjlaros Only if we make it optional, but that is the same situation we already have for the main accession number. I would actually propose to make the version required, since this matches what is in the annotation.

Also, we already have multiple equivalent descriptions from the transcript selection syntax, and this proposal adds at least one.
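To compare descriptions that differ only in how the transcript is selected, one option is to rewrite the selector to a single canonical form before comparing. In this sketch the `SELECTOR_TO_ACCESSION` table and the `canonical` helper are hypothetical. In practice the mapping would have to come from the reference's annotation, and its single entry here is illustrative only.

```python
import re
from typing import Dict

# Hypothetical mapping from legacy selectors to transcript accessions for one
# reference sequence; the entry is illustrative, not looked up.
SELECTOR_TO_ACCESSION: Dict[str, str] = {"SDHD_v001": "NM_003002.2"}

DESCRIPTION = re.compile(r"(?P<ref>[^()]+)\((?P<selector>[^()]+)\):(?P<change>.+)")


def canonical(description: str) -> str:
    """Replace the transcript selector with its accession, if one is known."""
    m = DESCRIPTION.fullmatch(description)
    if m is None:
        raise ValueError(f"unsupported description: {description}")
    selector = SELECTOR_TO_ACCESSION.get(m["selector"], m["selector"])
    return f"{m['ref']}({selector}):{m['change']}"
```

This only collapses the equivalence introduced by the selector; requiring the version, as proposed above, keeps a bare `NM_003002` from becoming yet another spelling.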
