Making entrez_fetch output accessible in R #179
Comments
That output is definitely not XML. I looked into the code of It works for me with
Hi @allenbaron , thanks a lot for your fast answer and help!! It works perfectly fine with the
I still run into two issues and would be very happy for any suggestions (I'm still an R beginner...). After running entrez_fetch with a list of 181 accessions as a search input ( Strangely, all 181
Would you have an idea what could be wrong with my
Second point: to find out the CDS start and end position, I used
Thanks a lot in advance and have a great day!
I could not reproduce your first problem. If you run into more problems, I highly recommend creating a "reprex" (reproducible example). They provide a solid starting point for help with troubleshooting and identifying problems.
Here's an example with the first six accessions in your list:
library(rentrez)
interest_list_sh1 <- c("NM_001271885.2", "NM_005763.4", "NM_004996.4",
"XM_017020319.1", "NM_007011.8", "NM_001012750.3")
interest_acc <- rentrez::entrez_fetch(
db="nucleotide",
id=interest_list_sh1,
rettype="db",
retmode = "xml",
parsed=T
)
interest_acc1 <- XML::xmlToList(interest_acc)
sapply(interest_acc1, function(.x) .x$GBSeq_locus)
#>          GBSeq          GBSeq          GBSeq          GBSeq          GBSeq
#> "NM_001271885"    "NM_005763"    "NM_004996" "XM_017020319"    "NM_007011"
#>          GBSeq
#> "NM_001012750"
Created on 2022-06-20 by the reprex package (v2.0.1)
For problem 2, converting the XML to a list from the "native" type and then extracting the relevant info would be really painful and I would avoid it. If you have to get the information from that I'd suggest you look into using XML selectors on the XML directly and don't do the
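To illustrate what "XML selectors on the XML directly" could look like for the CDS question, here is a minimal sketch using XPath with the XML package. It runs on a hand-written, simplified stand-in for a GenBank XML (GBSet) response, so the loci and coordinates are made up. It also assumes that every record carries exactly one CDS feature with a single interval; if that does not hold for your records, the vectors will not line up and the extraction needs to be done per record instead.

```r
library(XML)

# Simplified, hand-written stand-in for a GenBank XML (GBSet) response.
# The loci and coordinates are invented for illustration only.
gb_xml <- '
<GBSet>
  <GBSeq>
    <GBSeq_locus>EXAMPLE_1</GBSeq_locus>
    <GBSeq_feature-table>
      <GBFeature>
        <GBFeature_key>source</GBFeature_key>
        <GBFeature_intervals><GBInterval>
          <GBInterval_from>1</GBInterval_from>
          <GBInterval_to>3000</GBInterval_to>
        </GBInterval></GBFeature_intervals>
      </GBFeature>
      <GBFeature>
        <GBFeature_key>CDS</GBFeature_key>
        <GBFeature_intervals><GBInterval>
          <GBInterval_from>151</GBInterval_from>
          <GBInterval_to>1200</GBInterval_to>
        </GBInterval></GBFeature_intervals>
      </GBFeature>
    </GBSeq_feature-table>
  </GBSeq>
  <GBSeq>
    <GBSeq_locus>EXAMPLE_2</GBSeq_locus>
    <GBSeq_feature-table>
      <GBFeature>
        <GBFeature_key>CDS</GBFeature_key>
        <GBFeature_intervals><GBInterval>
          <GBInterval_from>98</GBInterval_from>
          <GBInterval_to>1540</GBInterval_to>
        </GBInterval></GBFeature_intervals>
      </GBFeature>
    </GBSeq_feature-table>
  </GBSeq>
</GBSet>'

doc <- xmlParse(gb_xml, asText = TRUE)

# XPath prefix that selects only features whose key is "CDS"
cds_path <- "//GBSeq/GBSeq_feature-table/GBFeature[GBFeature_key='CDS']"

# Assumption: one single-interval CDS per record, so the three vectors align
cds <- data.frame(
  locus     = xpathSApply(doc, "//GBSeq/GBSeq_locus", xmlValue),
  cds_start = as.integer(xpathSApply(doc, paste0(cds_path, "//GBInterval_from"), xmlValue)),
  cds_end   = as.integer(xpathSApply(doc, paste0(cds_path, "//GBInterval_to"), xmlValue))
)
print(cds)
```

With a parsed document returned by entrez_fetch (parsed = TRUE), the same XPath expressions should apply as long as the response is in the GBSeq format shown in the reprex output above.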
Hi!
Thanks for your nice package for communicating with NCBI in R.
I have a list of transcript accessions for which I want to extract the GenBank entries. My list contains 1232 accessions, which turned out to be too many to fetch at once (HTTP failure 414, the request is too large), and the web_history option didn't help. I read several issues about that problem, e.g. #163, and found that reducing the input list might solve it. I succeeded after reducing my list to 200 accessions (500 did not work).
Now I'm running into the next problem: the fetched file does not seem to be accessible.
I tried:
but I get the error "XML content does not seem to be XML [....]". If I take out "parsed=T", the entrez_fetch command works, but I don't know how to make the fetched file accessible (xmlToList, as shown in issue #113, doesn't work either because of the same XML incompatibility). This seems to be a similar problem to #91.
Is there any way to access the information? I actually only need to know at which nucleotide positions of my transcripts of interest the CDS starts and ends. So maybe I could also fetch a simpler record than the whole "native" one?
Thanks a lot for any help or suggestions!
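The workaround described above (fetching about 200 accessions per request instead of all 1232 at once) can be sketched roughly as follows. The helper names chunk_ids and fetch_in_batches are hypothetical and not part of rentrez, and the accession IDs are placeholders. The batch size of 200 is simply the value that worked in this report, not a documented limit.

```r
# Split a vector of accessions into batches of at most `batch_size` IDs.
# `chunk_ids` is a hypothetical helper, not a rentrez function.
chunk_ids <- function(ids, batch_size = 200) {
  unname(split(ids, ceiling(seq_along(ids) / batch_size)))
}

# Fetch each batch with its own request and return one result per batch.
# Extra arguments (rettype, retmode, parsed, ...) are passed on unchanged
# to rentrez::entrez_fetch. Calling this needs rentrez and network access.
fetch_in_batches <- function(ids, batch_size = 200, ...) {
  lapply(chunk_ids(ids, batch_size), function(batch) {
    rentrez::entrez_fetch(db = "nucleotide", id = batch, ...)
  })
}

# Placeholder IDs standing in for the 1232 real accessions
accessions <- sprintf("ACC_%04d", 1:1232)
batches <- chunk_ids(accessions)
length(batches)   # number of requests that would be sent
lengths(batches)  # number of IDs in each request
```

A call could then look like fetch_in_batches(accessions, rettype = "gb", retmode = "xml", parsed = TRUE), where rettype = "gb" is an assumption on my part for requesting GenBank-format XML rather than something stated in this thread.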