I started my latest Crossref download via the works API, using a cursor, on 12 April, and it finished on 18 April. There were no interruptions this time, but I received more pages than I should have: 101441 pages instead of the expected 96102 or so (with a page size of 1000; total results were 96101447 at the start of the download and are now 96251307).
In the results, the same DOI is returned more than once. In some, but not all, of these cases the is-referenced-by-count has changed.
The number of unique DOIs across all results is 96220498, which lies between the total at the start (96101447) and the total at the end (96251307).
My questions would be:
Is that expected or a bug?
Is it possible that it omits works or will it at worst only return them multiple times?
(I have all of the JSON responses saved if that helps)
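For reference, this is roughly how the unique DOIs can be counted from the saved responses. It is only a sketch: it assumes each saved response has the usual `message.items` layout and that each record carries an `indexed.timestamp` value, which is used here to keep the most recently indexed copy of a repeated DOI. The function names are made up for illustration.

```python
from typing import Iterable, Iterator


def iter_items(pages: Iterable[dict]) -> Iterator[dict]:
    """Yield every work record from a sequence of saved API responses."""
    for page in pages:
        yield from page.get("message", {}).get("items", [])


def dedupe_by_doi(items: Iterable[dict]) -> dict:
    """Map each DOI (lowercased) to the copy with the newest index timestamp.

    Assumption: records expose `indexed.timestamp`; records without it are
    treated as timestamp 0, so a later copy in the stream wins on ties.
    """
    latest = {}
    for item in items:
        doi = item.get("DOI", "").lower()
        if not doi:
            continue  # skip records without a DOI rather than guessing
        ts = item.get("indexed", {}).get("timestamp", 0)
        seen = latest.get(doi)
        if seen is None or ts >= seen.get("indexed", {}).get("timestamp", 0):
            latest[doi] = item
    return latest
```

Calling `len(dedupe_by_doi(iter_items(pages)))` on the loaded pages gives the unique-DOI count. Note that this only removes repeats; it cannot tell whether any works were omitted.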
I think this is a Solr thing: the result set can change during iteration. Otherwise the contents of the set would have to be saved somewhere for the lifetime of the cursor, which would need a huge amount of RAM or disk space. I downloaded an initial set with the until-index-date filter, and I now use until-index-date in combination with from-index-date to get updates. You will only get the whole set in a single pass if you are lucky, but after a few updates most of the data should have been downloaded.
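A rough sketch of how the request URLs for that approach could be built. The `works_url` helper and the dates are illustrative assumptions, not something from my actual setup; only the `filter`, `rows` and `cursor` parameters and the two index-date filters are from the Crossref API.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://api.crossref.org/works"


def works_url(until_index_date: str,
              from_index_date: Optional[str] = None,
              cursor: str = "*",
              rows: int = 1000) -> str:
    """Build a works URL bounded by index date (hypothetical helper).

    With only `until_index_date` this describes the initial set; adding
    `from_index_date` describes an update window on top of it.
    """
    filters = []
    if from_index_date:
        filters.append(f"from-index-date:{from_index_date}")
    filters.append(f"until-index-date:{until_index_date}")
    params = {"filter": ",".join(filters), "rows": rows, "cursor": cursor}
    return f"{BASE_URL}?{urlencode(params)}"


# Initial set: everything indexed up to an example cut-off date.
initial = works_url(until_index_date="2020-01-01")

# Update: everything indexed between the cut-off and a later example date.
update = works_url(until_index_date="2020-01-08", from_index_date="2020-01-01")
```

For later pages, pass the `next-cursor` value from the previous response as `cursor`. Records that are re-indexed while an update is running can still show up again in the next update window, so deduplicating by DOI afterwards is still needed.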