Extension of API for specifying transcript variants #380

mkroon1 · 2016-05-30T10:42:17Z

There is an issue where clients of Mutalyzer's API may query Mutalyzer with outdated information regarding transcript variants. See, for example, LOVDnl/LOVD3#73.

I have made a gist with a more elaborate description of the problem and a proposed fix.

martijnvermaat · 2016-05-30T10:48:57Z

The scope currently seems limited to specific API calls, and to supporting this syntax only as input, not to Mutalyzer producing it. While this may very well be a good first step, I think we will eventually want to make this the preferred way of selecting a transcript, so I'll try to look at this proposal in that context.

martijnvermaat · 2016-05-30T11:18:36Z

> the version number is not needed when specifying a transcript variant by accession number.

That is correct. There are no RefSeq files that have more than one version of a given transcript annotated. So we could make the version optional, or even disallow it. I can think of some upsides and downsides to both approaches:

- GenBank files are annotated with versions, and syntactically a GenBank file could have more than one version of a given transcript annotated, even if that is not the case at the moment.
- We try to educate people to always include a version number. Choosing not to here could be confusing.

> Extend grammar to include transcript accession number as transcript variant. I.e.:

In my opinion, GeneSymbol is not a good name for this production rule, since it actually matches a gene symbol plus an optional transcript variant or protein isoform. If we disregard the name for a moment, I think it would make sense to include the AccNo alternative in this rule (instead of making it an alternative to this rule).

Also, if we choose to make the version optional, reusing AccNo would work fine. If we choose to make it mandatory or disallow it, this would need some other construction.

Although these may seem like implementation details, I think it is important to think carefully about any BNF changes and the naming of production rules. As you can see, the name GeneSymbol is already problematic here.
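To make the suggestion above concrete, here is a minimal sketch using Python's `re` module. It does not reproduce Mutalyzer's actual grammar code. The accession alternative is folded into the same rule that matches a gene symbol with an optional transcript variant (`_v001`) or protein isoform (`_i001`), and the version is assumed to be mandatory. The pattern names, `parse_selector` and `parse_description` are hypothetical, and the identifiers in the usage lines are illustrative only.

```python
import re
from typing import Optional

# Hypothetical stand-in for the rule currently named GeneSymbol: a gene
# symbol plus an optional transcript variant (_v001) or protein isoform (_i001).
GENE_SELECTOR = r"(?P<gene>[A-Za-z][A-Za-z0-9-]*)(?:_(?P<kind>[vi])(?P<number>\d{3}))?"
# Hypothetical accession alternative; the version is assumed to be mandatory.
ACC_SELECTOR = r"(?P<accession>[A-Z]{2}_\d+)\.(?P<version>\d+)"

# Both alternatives live in one rule, so callers see a single selector.
SELECTOR = re.compile(rf"{ACC_SELECTOR}|{GENE_SELECTOR}")
DESCRIPTION = re.compile(
    r"(?P<reference>[A-Z]{2}_\d+\.\d+)\((?P<selector>[^)]+)\):(?P<change>\S+)")


def parse_selector(text: str) -> Optional[dict]:
    """Parse the part between parentheses, or return None if it is invalid."""
    match = SELECTOR.fullmatch(text)
    if match is None:
        return None
    if match.group("accession"):
        return {"accession": match.group("accession"),
                "version": int(match.group("version"))}
    number = match.group("number")
    return {"gene": match.group("gene"),
            "kind": match.group("kind"),
            "number": int(number) if number else None}


def parse_description(text: str) -> Optional[dict]:
    """Split REF(SELECTOR):CHANGE into its parts, or return None."""
    match = DESCRIPTION.fullmatch(text)
    if match is None:
        return None
    selector = parse_selector(match.group("selector"))
    if selector is None:
        return None
    return {"reference": match.group("reference"),
            "selector": selector,
            "change": match.group("change")}


print(parse_description("NG_012232.1(DMD_v001):c.93+1G>T")["selector"])
# {'gene': 'DMD', 'kind': 'v', 'number': 1}
print(parse_description("NG_012232.1(NM_004006.1):c.93+1G>T")["selector"])
# {'accession': 'NM_004006', 'version': 1}
print(parse_selector("NM_004006"))  # None: no version given
```

Keeping both alternatives in one rule means downstream code handles a single "selector" result, which is the point of not making AccNo an alternative to the whole rule.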

> Write test

Yes, I would like to see a few tests, perhaps even start with those 👍

> The Position converter does not need to be adapted since it does not accept a transcript variant anyway.

That is incorrect; here's an example of that. However, I think the only occurrences of this are mtDNA genes, since we don't have mappings for NG_ references in our database.

> The Python example in task 1 above probably won't work as variable AccNo appears twice in the same rule.

I don't know, will have to try.
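The thread leaves this open. As a rough stdlib analogy only (it says nothing definitive about the parser library behind Mutalyzer's grammar, which may behave differently), Python's `re` module refuses a pattern that defines the same group name twice, and giving each occurrence its own name avoids the clash. The names `RefAccNo` and `TransAccNo` are hypothetical.

```python
import re

ACC = r"[A-Z]{2}_\d+\.\d+"


def compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


# The same group name used twice in one rule.
duplicate = rf"(?P<AccNo>{ACC})\((?P<AccNo>{ACC})\)"
# Hypothetical distinct names for the reference and the transcript accession.
distinct = rf"(?P<RefAccNo>{ACC})\((?P<TransAccNo>{ACC})\)"

print(compiles(duplicate))  # False: re reports a redefinition of the group name
print(compiles(distinct))   # True

match = re.fullmatch(distinct, "NG_012232.1(NM_004006.1)")
print(match.group("RefAccNo"), match.group("TransAccNo"))  # NG_012232.1 NM_004006.1
```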

martijnvermaat · 2016-05-30T11:19:26Z

Related: keep in mind that some transcripts are annotated without an accession number. For this specific use case it will probably not matter, but it is reason enough to keep supporting the old syntax permanently. It is also something to be aware of when we start producing descriptions with this syntax: that will not always be possible.

Here's a quick check I ran on our database. It seems that most transcripts are annotated with an accession, but not all.
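Because some transcripts have no accession, code that produces descriptions would need a fallback to the old syntax. A minimal sketch, assuming a hypothetical `Transcript` record; `selector` and `describe` are illustrative names, the unaccessioned transcript in the usage is made up, and pairing NG_012232.1 with NM_004006.1 is only an example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Transcript:
    """Hypothetical record for a transcript annotated on a reference."""
    gene: str
    variant: int                # transcript variant number, 1 for _v001
    accession: Optional[str]    # e.g. "NM_004006", or None if not annotated
    version: Optional[int]      # e.g. 1, or None if not annotated


def selector(transcript: Transcript) -> str:
    """Prefer the accession syntax; fall back to the gene symbol syntax."""
    if transcript.accession and transcript.version is not None:
        return f"{transcript.accession}.{transcript.version}"
    return f"{transcript.gene}_v{transcript.variant:03d}"


def describe(reference: str, transcript: Transcript, change: str) -> str:
    """Build a description of the form REF(SELECTOR):CHANGE."""
    return f"{reference}({selector(transcript)}):{change}"


with_acc = Transcript("DMD", 1, "NM_004006", 1)
without_acc = Transcript("DMD", 2, None, None)  # made-up unaccessioned transcript
print(describe("NG_012232.1", with_acc, "c.93+1G>T"))     # NG_012232.1(NM_004006.1):c.93+1G>T
print(describe("NG_012232.1", without_acc, "c.93+1G>T"))  # NG_012232.1(DMD_v002):c.93+1G>T
```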

ifokkema · 2016-05-30T14:51:07Z

Please note that while it's important to match the transcript correctly on input, so that we don't get unintended EREF errors, LOVD also needs the output to contain the transcript information. We normally loop through the results and look for the i001 matching the v001 number, so we'll need some way to get the correct result. Perhaps the webservice could include the v-numbers and transcript IDs, like the website does in the name checker?

martijnvermaat · 2016-05-30T15:06:41Z

@ifokkema I assume you are talking about getTranscriptsAndInfo? Isn't the id what you want? (example)

ifokkema · 2016-05-30T16:11:14Z

Yes, exactly, and we don't want to need two webservice calls for each name checker call. That's why we cache that ID, and only now have we found out that it's not stable.
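If the variant number can change between annotation updates, matching on the transcript accession avoids relying on a cached `v001`. A sketch under several assumptions: each transcript entry is assumed to expose its accession under `id` and its variant name under `name` (loosely modelled on the getTranscriptsAndInfo output discussed above; the real response fields should be checked), the isoform number is assumed to equal the variant number (as in the i001/v001 matching described above), and all data below is made up.

```python
from typing import Iterable, Mapping, Optional


def variant_name_for(accession: str,
                     transcripts: Iterable[Mapping[str, str]]) -> Optional[str]:
    """Find the current variant name (e.g. "DMD_v001") for a transcript accession."""
    for transcript in transcripts:
        if transcript.get("id") == accession:
            return transcript.get("name")
    return None


def isoform_name(variant_name: str) -> str:
    """Assumed mapping from a transcript variant to its protein isoform: _v001 -> _i001."""
    gene, sep, number = variant_name.rpartition("_v")
    if not sep:
        raise ValueError(f"not a transcript variant name: {variant_name!r}")
    return f"{gene}_i{number}"


# Made-up transcript lists for one reference before and after a renumbering.
before = [{"id": "NM_004006.2", "name": "DMD_v001"}, {"id": "NM_000109.3", "name": "DMD_v002"}]
after = [{"id": "NM_000109.3", "name": "DMD_v001"}, {"id": "NM_004006.2", "name": "DMD_v002"}]

for transcripts in (before, after):
    name = variant_name_for("NM_004006.2", transcripts)
    print(name, isoform_name(name))
# DMD_v001 DMD_i001
# DMD_v002 DMD_i002
```

The same accession resolves to a different variant number in each list, which is exactly what makes a cached `v001` unreliable.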

jfjlaros · 2016-05-31T07:45:38Z

A downside of using a version number is that you get multiple equivalent descriptions.

martijnvermaat · 2016-05-31T08:52:20Z

@jfjlaros Only if we make it optional, but that is the same situation we already have for the main accession number. I would actually propose to make the version required, since this matches what is in the annotation.

Also, we already have multiple equivalent descriptions from the transcript selection syntax, and this proposal adds at least one.
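To make the equivalence point concrete: once the accession syntax is accepted, `REF(DMD_v001)` and `REF(NM_004006.1)` can denote the same transcript, so comparing descriptions means resolving the selector first. A minimal sketch under stated assumptions: the `Annotation` mapping, `resolve` and the sample data are hypothetical, and the version is treated as required, following the proposal above.

```python
from typing import Dict, Optional, Tuple

# Hypothetical annotation of one reference: variant number -> (accession, version).
Annotation = Dict[int, Tuple[str, int]]


def resolve(selector: str, gene: str,
            annotation: Annotation) -> Optional[Tuple[str, int]]:
    """Resolve a selector to the (accession, version) of the transcript it denotes."""
    prefix = f"{gene}_v"
    number = selector[len(prefix):]
    if selector.startswith(prefix) and number.isdigit():
        return annotation.get(int(number))
    accession, sep, version = selector.rpartition(".")
    if not sep or not version.isdigit():
        return None  # a version is required, as proposed above
    wanted = (accession, int(version))
    return wanted if wanted in annotation.values() else None


annotation: Annotation = {1: ("NM_004006", 1)}
print(resolve("DMD_v001", "DMD", annotation))     # ('NM_004006', 1)
print(resolve("NM_004006.1", "DMD", annotation))  # ('NM_004006', 1)
print(resolve("NM_004006", "DMD", annotation))    # None
```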

martijnvermaat added the webservices and grammar labels on May 30, 2016

mkroon1 mentioned this issue Jun 15, 2016

Accept transcript accession number as transcript variant identifier. #405

Merged

martijnvermaat closed this as completed in #405 Jun 15, 2016
