Registry api /dataset/search sometimes does not include recordCount #52

Open
rukayaj opened this issue May 8, 2020 · 3 comments

@rukayaj
rukayaj commented May 8, 2020

Hiya, I was wondering if there's a reason why some results in /dataset/search don't return a recordCount attribute?

An example is the 7th result here https://api.gbif.org/v1/dataset/search?q=plant&publishingCountry=AR - (key = 0ce9ca26-0e89-4f63-94fe-124d47a4451a). The API result doesn't have recordCount included, but you can see on https://www.gbif.org/dataset/0ce9ca26-0e89-4f63-94fe-124d47a4451a it has 10343 records.
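
To see which results on a page are affected, a search response can be scanned for entries that lack the field. This is only a minimal sketch: it assumes the response is a JSON object with a top-level `results` list, and the helper names `keys_missing_record_count` and `fetch_results` are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

SEARCH_URL = "https://api.gbif.org/v1/dataset/search"


def keys_missing_record_count(results):
    """Return the keys of search results that have no recordCount attribute."""
    return [r.get("key") for r in results if "recordCount" not in r]


def fetch_results(**params):
    """Fetch one page of search results.

    Assumption: the response body is a JSON object with a "results" list.
    """
    url = SEARCH_URL + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp).get("results", [])


# Example usage (performs a network request, so it is left commented out):
# page = fetch_results(q="plant", publishingCountry="AR")
# print(keys_missing_record_count(page))
```

Keeping the check separate from the HTTP call means it can be run against a saved response as well as a live one.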

@MortenHofft
Member

intro: recordCount is a confusing field. It means something different for checklists and occurrences.

But even knowing that, I agree the numbers look wrong. The field is sometimes missing (as in your example), and at other times it doesn't match the number of records in the checklist.

{
  "key": "6ac09c4d-bf7b-4e47-9c2d-f5abf6e89aa0",
  ...
  "recordCount": 1109
}

But it has 1211 records and 1205 as source. There might be some filter applied, but I cannot figure out what it would be.

@MattBlissett
Member

The counts in the search index are maintained separately from other systems, to allow the registry to be independent of them. Sometimes, updates to the search index fail.

We're just transitioning to a new search index (probably on Monday), and the counts should then be current. I'm not sure whether there's been any work to improve reliability beyond rebuilding the search index, though -- @fmendezh ?
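
Because the count in the search index can be stale or absent, a client may prefer to treat `recordCount` as optional rather than assume it is always present. A minimal sketch of that idea follows; `record_count` and its `fallback` parameter are hypothetical names, and the fallback is whatever lookup the caller trusts more, not something the registry provides.

```python
from typing import Callable, Optional


def record_count(
    result: dict,
    fallback: Optional[Callable[[str], int]] = None,
) -> Optional[int]:
    """Return recordCount from a search result, tolerating its absence.

    If the field is missing and a fallback callable is given, the fallback
    is called with the dataset key. Otherwise None is returned so callers
    can tell "unknown" apart from a genuine count of 0.
    """
    count = result.get("recordCount")
    if count is not None:
        return count
    if fallback is not None:
        return fallback(result["key"])
    return None
```

Returning `None` instead of `0` keeps a missing count from being mistaken for an empty dataset.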

@rukayaj
Author

rukayaj commented May 15, 2020

Perhaps it would be better to exclude it from the API results if it's not reliable, or to add an explanation to the documentation page?

I guess it's related to gbif/pipelines#245, but I'm not clear on what @muttcg means by saying that it returns the count number.

Edit: I also just noticed gbif/registry#9, a suggestion to refactor recordCount into taxonCount and occurrenceCount, which would make the difference between checklists and occurrences more explicit.
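
For illustration only, here is how a client might approximate that split on its side today. This is a sketch under assumptions, not the change proposed in gbif/registry#9: it assumes each search result carries a `type` field whose value is `CHECKLIST` for checklists, it treats every other type as occurrence-based, and `split_record_count` is a made-up helper name.

```python
def split_record_count(result: dict) -> dict:
    """Relabel recordCount as taxonCount or occurrenceCount by dataset type.

    Assumptions: result["type"] == "CHECKLIST" marks a checklist, and any
    other type is counted in occurrences. A missing recordCount yields {}.
    """
    count = result.get("recordCount")
    if count is None:
        return {}
    if result.get("type") == "CHECKLIST":
        return {"taxonCount": count}
    return {"occurrenceCount": count}
```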


3 participants