Mapping NCBI accessions/GIs to Uniprot IDs produces few hits #1419
Replies: 4 comments 13 replies
-
If it were me, I would take all the sequences and BLAST them against UniProt to get the associated accessions.
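A minimal sketch of how the accessions might be pulled out of such a BLAST run, assuming tabular output (`-outfmt 6`) and subject IDs in the `sp|ACCESSION|ENTRY_NAME` form used in UniProt FASTA headers. The function name `best_swissprot_hits` and the example rows (including their alignment statistics) are made up for illustration and are not from this thread.

```python
import csv
import io


def best_swissprot_hits(blast_tsv):
    """Return {query_id: accession}, keeping the highest-bitscore hit per query.

    Assumes the 12 default columns of BLAST tabular output (-outfmt 6):
    column 1 = query ID, column 2 = subject ID, column 12 = bit score.
    """
    best = {}
    for row in csv.reader(blast_tsv, delimiter="\t"):
        if len(row) < 12:
            continue  # skip blank or malformed lines
        query, subject, bitscore = row[0], row[1], float(row[11])
        # UniProt FASTA headers look like "sp|P12345|AATM_RABIT";
        # fall back to the raw subject ID if the pattern does not match.
        parts = subject.split("|")
        accession = parts[1] if len(parts) >= 3 else subject
        if query not in best or bitscore > best[query][1]:
            best[query] = (accession, bitscore)
    return {query: acc for query, (acc, _) in best.items()}


# Hypothetical rows; the query names and statistics are invented.
example = io.StringIO(
    "gene1\tsp|P12345|AATM_RABIT\t85.0\t200\t30\t0\t1\t200\t1\t200\t1e-50\t350\n"
    "gene1\tsp|P00505|AATM_HUMAN\t90.0\t200\t20\t0\t1\t200\t1\t200\t1e-60\t380\n"
    "gene2\tsp|Q9Y6K9|NEMO_HUMAN\t70.0\t150\t45\t0\t1\t150\t1\t150\t1e-30\t200\n"
)
hits = best_swissprot_hits(example)
print(hits)
```

The resulting accessions are all of one ID type, so they can be submitted together to the UniProt ID mapping tool.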
-
Can you please provide link to file(s)?
Out of curiosity, why are you using these? I'm fairly certain these were deprecated years ago and are no longer the standard way to identify NCBI "stuff". Which UniProt database were you using (reviewed or unreviewed)? Unreviewed should get you more matches.
-
Having perused your BLAST output table, the primary issue is that you have a mix of database IDs. Some are GenBank (e.g. QYJ58449.1), some are RefSeq (e.g. XP_033432496.1). This makes it difficult to map from one database to another (e.g. with the UniProt ID mapping service). The service expects a single ID type (e.g. GenBank) to be mapped to a single database (e.g. UniProt SwissProt). A mix of ID types like the one in your BLAST table cannot be mapped via a "batch" process. You'd have to parse out the different IDs based on the ID type (which is not indicated in the BLAST table).

To add to this, not all RefSeq accessions are mapped to/from UniProt. The RefSeq mapping criteria seem pretty stringent: https://www.uniprot.org/help/ncbi_mappings

So, with all that in mind, I really think you should take @sr320's approach and perform a BLAST against SwissProt yourself in order to obtain SwissProt accessions. This will then allow you to do a batch submission to UniProt to obtain gene ontology terms.
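If you do want to try splitting the mixed IDs first, here is a rough sketch of sorting accessions by type from their format alone. It assumes RefSeq protein accessions start with a two-letter prefix and an underscore (AP_, NP_, WP_, XP_, YP_) and GenBank protein accessions are three letters followed by five or seven digits. The names `classify` and `split_by_type` are hypothetical, and anything that matches neither pattern (GI numbers, nucleotide accessions) ends up in "other".

```python
import re

# RefSeq protein accessions: AP_, NP_, WP_, XP_ or YP_, digits, optional version.
REFSEQ_PROTEIN = re.compile(r"^[ANWXY]P_\d+(\.\d+)?$")
# GenBank protein accessions: 3 letters + 5 or 7 digits, optional version.
GENBANK_PROTEIN = re.compile(r"^[A-Z]{3}(\d{5}|\d{7})(\.\d+)?$")


def classify(accession):
    """Return 'refseq', 'genbank', or 'other' based on the accession format."""
    accession = accession.strip()
    if REFSEQ_PROTEIN.match(accession):
        return "refseq"
    if GENBANK_PROTEIN.match(accession):
        return "genbank"
    return "other"


def split_by_type(accessions):
    """Group accessions by ID type so each group can be mapped separately."""
    groups = {}
    for acc in accessions:
        groups.setdefault(classify(acc), []).append(acc.strip())
    return groups


# The two protein accessions are the ones quoted above; "123456" stands in
# for a GI number and is made up.
groups = split_by_type(["QYJ58449.1", "XP_033432496.1", "123456"])
print(groups)
```

Each group could then be submitted on its own, though, as noted above, many RefSeq accessions may still have no UniProt mapping.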
-
I have a set of NCBI protein/nucleotide entries in a genome annotation file, and ultimately want to perform GO enrichment analysis. It seems that the best approach is to map the NCBI entries to UniProt IDs using this UniProt tool, which also retrieves GO IDs. My issue is that very few of my entries map to UniProt IDs, for example:
Using NCBI accession numbers as my input:
Using NCBI GI numbers as my input (which I retrieved using the Batch Entrez tool):
Anyone else run into this issue? Or has anyone used an enrichment analysis tool that uses NCBI accession numbers as the input?
FYI I've also tried entering the accession and GI numbers in DAVID, but it doesn't recognize them. For reference, here are a few of the entries:
NOTE: I have seen this issue and Sam's notebook entry where he uses a Python script to retrieve UniProt IDs. I haven't tested it on my gene sets, but I presume it would use the same UniProt database and therefore give the same results.