Web history object - No esummary records found in file #178

d-caraballo · 2022-04-27T16:04:03Z

Hi, I am trying to download all available bat coronaviruses. I used the following query:
bat_cov_ids<-entrez_search(db="nuccore", term="Bat coronavirus", retmax = 10000)
This returned 4214 hits, which I could access using entrez_summary coupled with the get-metadata function.

Now I am trying to compare these results with a different search strategy. I want to search for all coronavirus sequences in the "nuccore" database and then filter for bat hosts using the standardised taxonomy, as in the tutorial.

I use the following code:

covs<-entrez_search(db="nuccore", term="txid11118[Organism]", use_history=TRUE)

Which yields:
Entrez search result with 4715446 hits (object contains 20 IDs and a web_history object)
Search term (as translated): txid11118[Organism]

Then I use entrez_summary:
entrez_summary(db="nuccore", web_history=covs$web_history)

And I get the message:
Error during wrapup: No esummary records found in file

What is going wrong?

allenbaron · 2022-04-27T16:20:43Z

You're trying to retrieve too many records, and the only response from the server is "Too many UIDs in request. Maximum number of UIDs is 500 for JSON format output."

d-caraballo · 2022-04-27T16:24:17Z

Thanks, Allen. But isn't web_history meant precisely to avoid the "large request" problem? How can I get the complete set of records (4.7E6 hits!) and then filter by host species?

allenbaron · 2022-04-27T17:29:18Z

I'm sorry to disappoint you, but you're going to have to do some extra work if you want this to work. rentrez cannot handle this use case without extra coding.

Before you do anything else, I recommend you review the E-Utilities documentation, particularly where it discusses large requests in Usage Guidelines and Requirements.

rentrez does instantiate an Entrez History object when use_history = TRUE is passed to entrez_search. A History object is basically required for large requests (> 200 records, I think), but the E-utilities still limit how many records you can retrieve in a single request. For ESummary the limit depends on the record format requested: 500 for JSON and 10,000 for XML (for details on each utility, see "The E-utilities In-Depth: Parameters, Syntax and More"). Retrieving more than that from a History object is possible but requires paging (see "Minimizing the Number of Requests" in the E-Utilities documentation; the Application 3 link provides an example of paging).
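A minimal hand-rolled paging sketch in R. It assumes (this is not confirmed in the thread) that entrez_summary() forwards extra arguments such as retstart and retmax to ESummary through its `...` argument; ESummary itself documents both parameters for use with a History object. The helper names batch_starts() and summaries_from_history() are hypothetical. Offsets are 0-based, as ESummary expects, and are kept as integers so a large offset is not sent as "1e+05".

```r
# Hypothetical helper: 0-based ESummary retstart offsets covering `total`
# records. Integers keep large offsets from being sent as "1e+05".
batch_starts <- function(total, batch_size) {
  if (total <= 0) return(integer(0))
  as.integer(seq(0, total - 1, by = batch_size))
}

# Hypothetical helper: page through a History object one batch at a time.
# Assumption: entrez_summary() passes retstart/retmax on to ESummary via `...`.
summaries_from_history <- function(search, db, batch_size = 500) {
  lapply(batch_starts(search$count, batch_size), function(start) {
    rentrez::entrez_summary(db = db,
                            web_history = search$web_history,
                            retstart = start,
                            retmax = as.integer(batch_size))
  })
}

# Example call (not run here; needs network access):
# covs <- rentrez::entrez_search(db = "nuccore", term = "txid11118[Organism]",
#                                use_history = TRUE)
# batches <- summaries_from_history(covs, db = "nuccore")
```

For the 4,715,446 hits in the original post, batch_starts(4715446, 500) yields 9,431 requests at the JSON limit, or 472 at the 10,000-record XML limit.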

rentrez does not have the ability to page, so it cannot retrieve all of the records in the History object you created. You could do this with the EDirect utilities on the command line, which I recommend if you are serious about getting this data. It might also be possible to get all the record IDs from entrez_search() and then request them in chunks of 10,000 with entrez_summary(), but be aware that a bug in rentrez prevents this from working (see PR #174). I fixed this specific issue in a fork when I realized rentrez is not being actively maintained.
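A sketch of the ID-chunking route, assuming `ids` is a character vector holding every UID and that lowercase retmode = "xml" selects XML output, which carries the 10,000-record limit (the uppercase "XML" reported later in this thread fails to parse). As noted above, PR #174 describes a rentrez bug that may stop this from working in the released package. chunk_ids() and summaries_from_ids() are hypothetical helper names.

```r
# Hypothetical helper: split a vector of UIDs into consecutive chunks of at
# most chunk_size, preserving the original order.
chunk_ids <- function(ids, chunk_size) {
  split(ids, ceiling(seq_along(ids) / chunk_size))
}

# Hypothetical helper: request ESummary records one chunk at a time.
# Assumption: lowercase retmode = "xml" is accepted by entrez_summary().
summaries_from_ids <- function(ids, db, chunk_size = 10000) {
  lapply(chunk_ids(ids, chunk_size), function(chunk) {
    rentrez::entrez_summary(db = db, id = chunk, retmode = "xml")
  })
}

# Example call (not run here; needs network access):
# res <- rentrez::entrez_search(db = "nuccore", term = "Bat coronavirus",
#                               retmax = 10000)
# summaries <- summaries_from_ids(res$ids, db = "nuccore")
```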

One more thing for your consideration, the first 10,000 records of your request have a size of 221 MB.

LauraVP1994 · 2022-07-15T08:28:32Z

You seem to have the same problem I had. I did find a way around it (at least it worked for me with PubMed): you can use lapply or a for loop. I included my code in issue #180.
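A for-loop variant, written as a hypothetical sketch rather than the code posted in #180. Each batch is fetched through a caller-supplied fetch_batch(retstart, retmax) function and saved to its own .rds file, so a large download (221 MB for the first 10,000 records, per the earlier comment) does not have to sit in memory. The 0.34-second default pause assumes NCBI's limit of 3 requests per second without an API key. save_batches() and fetch_batch are hypothetical names.

```r
# Hypothetical loop: fetch each batch via a caller-supplied
# fetch_batch(retstart, retmax), save it to its own .rds file, then pause.
save_batches <- function(total, batch_size, fetch_batch, out_dir,
                         pause = 0.34) {
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  starts <- if (total > 0) {
    as.integer(seq(0, total - 1, by = batch_size))
  } else {
    integer(0)
  }
  files <- character(length(starts))
  for (i in seq_along(starts)) {
    batch <- fetch_batch(starts[i], as.integer(batch_size))
    files[i] <- file.path(out_dir, sprintf("batch_%05d.rds", i))
    saveRDS(batch, files[i])
    # Roughly 3 requests per second, assuming no NCBI API key is set.
    if (i < length(starts)) Sys.sleep(pause)
  }
  files
}

# Example call (not run here; needs network access and a History object
# `covs`; assumes retstart/retmax reach ESummary through `...`):
# files <- save_batches(covs$count, 500, function(start, n)
#   rentrez::entrez_summary(db = "nuccore", web_history = covs$web_history,
#                           retstart = start, retmax = n),
#   out_dir = "esummary_batches")
```

Passing the fetcher in as a function keeps the loop independent of any one rentrez call, so the same skeleton works for entrez_fetch() as well.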

allenbaron · 2022-07-20T14:15:35Z

Ideally, rentrez would be updated to implement the E-utilities paging feature for web history objects.

J-Moravec · 2022-09-29T01:24:26Z

Encountered the same issue:

rentrez::entrez_summary(db="gds", web_history=esearch$web_history)
# Esummary includes error message: Too many UIDs in request. Maximum number of UIDs is 500 for JSON format output.

This got more confusing when I specified retmode="XML" in the hope that it would fix the problem:

rentrez::entrez_summary(db="gds", web_history=esearch$web_history, retmode="XML")
# Error in UseMethod("parse_esummary") : 
# no applicable method for 'parse_esummary' applied to an object of class "character"

Since the documentation specifically says to use the web_history argument when the number of records is too large, it should also note that this is not a panacea and explain how to work with a large number of records.

I will try to submit a PR once I figure out how to do it cleanly.

allenbaron mentioned this issue Jul 14, 2022: Get PubMed metadata from large amount of articles #180 (Open)

allenbaron mentioned this issue Jan 30, 2023: Search and fetch are providing the wrong data? #185 (Open)

ediconchan mentioned this issue Nov 16, 2023: Error: No esummary records found in file ediconchan/Assignment2#1 (Closed)
