Downloads per month for datasets and rollups for publishers #117

Closed
MortenHofft opened this issue Apr 29, 2019 · 9 comments

@MortenHofft
Member

MortenHofft commented Apr 29, 2019

To support
gbif/portal-feedback#1912
gbif/portal16#761
gbif/portal16#138

we need an API to get

  • downloads over time (per month) for a dataset
  • total downloads for a publisher
  • downloads per month for a publisher
  • downloads per user country for datasets and publishers.

We might want to expand this to downloads in the last 365 days, year to date, etc., but the above should be a good start.

So something along these lines:

/occurrence/download/dataset/key?facet=yearMonth
// giving me counts a la 2019-jan: 132, 2019-feb: 43
// and of course the usual list of paginated downloads

/occurrence/download/dataset/key?yearMonth=2019-01&facet=country
// counts per country for january 2019
// and the list of downloads for that month

/occurrence/download/dataset/key?facet=year
// breakdown per year only

/occurrence/download/dataset/key?yearMonth=2019-01,2019-10
// downloads from january to october (I cannot remember what our incl/excl convention is)

/occurrence/download/dataset/key?yearMonth=2019-01&facet=yearMonthDay
// giving me counts per day for January 2019

// similar for publishers
/occurrence/download/organization/key

Or we could do something simple for now (if it is simpler at all):

/occurrence/download/dataset/key/stats
yearMonth: {
  2019-03: 234,
  ...
}

Aside from counting download events, we could perhaps count records from the dataset as well, so that /occurrence/download/dataset/7f2edc10-f762-11e1-a439-00145eb45e9a would also return the total record count?
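To make the proposed yearMonth facet concrete, here is a minimal sketch of the aggregation it implies: counting download events per month bucket. The function name and the sample dates are hypothetical and only illustrate the shape of the result; this is not how the GBIF API computes it.

```python
from collections import Counter
from datetime import date


def facet_year_month(download_dates):
    """Count download events per 'YYYY-MM' bucket."""
    return Counter(d.strftime("%Y-%m") for d in download_dates)


# Hypothetical download dates, for illustration only.
dates = [date(2019, 1, 3), date(2019, 1, 20), date(2019, 2, 7)]
counts = facet_year_month(dates)
print(dict(counts))
```

The resulting mapping has the same shape as the `yearMonth` object in the simpler `/stats` proposal above.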

@jeroencreuwels

Hi Morten,
Just a quick note to let you know that exactly this issue was raised by the management of my museum (as mentioned in other issues as well). The number of download events as well as the number of occurrence records is useful, but at the moment they are most interested in the number of downloads. Jeroen Creuwels - Data manager Naturalis, The Netherlands

@fmendezh
Contributor

I agree that we should provide better analytics API services for downloads. There is one thing we should consider for this specific request: it is not very common, but the publisher of a dataset can change from time to time, and we do not store the publisher of a dataset at the time a download was created. If we use the current publisher of a dataset, we might provide inaccurate results. I know this would be very rare, but it can happen. For example, we already store the dataset title at the time the download was created to avoid such inconsistencies.
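The concern can be illustrated with a small sketch. It assumes a hypothetical `publisher_at_download` field (which, per the comment above, is not stored) and made-up keys, and compares a rollup by current publisher with a rollup by the publisher recorded at download time.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class DownloadEvent:
    dataset_key: str
    publisher_at_download: str  # hypothetical snapshot field, not stored today


def by_current_publisher(events, current_publisher):
    """Attribute every download to whoever publishes the dataset now."""
    return Counter(current_publisher[e.dataset_key] for e in events)


def by_snapshot(events):
    """Attribute every download to the publisher at download time."""
    return Counter(e.publisher_at_download for e in events)


# Made-up scenario: dataset "ds-1" moved from "org-A" to "org-B".
events = [
    DownloadEvent("ds-1", "org-A"),
    DownloadEvent("ds-1", "org-A"),
    DownloadEvent("ds-1", "org-B"),
]
current = by_current_publisher(events, {"ds-1": "org-B"})
snapshot = by_snapshot(events)
print(dict(current), dict(snapshot))
```

In this scenario the current-publisher rollup credits all three downloads to org-B, while the snapshot rollup splits them between the two organizations.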

@jlegind

jlegind commented Feb 24, 2021

Hi @fmendezh, I have an SQL statement for a 'complete' CSV table that covers most of the statistics requested by publishers on GBIF data usage. It is broken down to dataset level by month. These are the headers:
number_of_users;download_events;total_downloaded_gbif_records;dataset_title;publisher_title;country;month_

For the year 2020, this amounts to ~400K rows. Is this something you could use to support the analytics API, or put into https://analytics-files.gbif-uat.org/download/csv/ ?
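As a rough sketch of how such a file could feed publisher rollups, the snippet below reads a semicolon-delimited table with the headers listed above and sums download events and downloaded records per publisher. The sample rows are made up, and the assumption that the count columns hold integers is mine.

```python
import csv
import io
from collections import defaultdict

HEADERS = (
    "number_of_users;download_events;total_downloaded_gbif_records;"
    "dataset_title;publisher_title;country;month_"
)

# Made-up rows for illustration; not real GBIF statistics.
SAMPLE = HEADERS + "\n" + "\n".join([
    "3;5;1200;Dataset A;Publisher X;NL;1",
    "2;4;800;Dataset B;Publisher X;NL;1",
    "1;1;50;Dataset C;Publisher Y;US;2",
])


def rollup_by_publisher(text):
    """Sum download events and downloaded records per publisher."""
    totals = defaultdict(lambda: {"download_events": 0, "records": 0})
    for row in csv.DictReader(io.StringIO(text), delimiter=";"):
        entry = totals[row["publisher_title"]]
        entry["download_events"] += int(row["download_events"])
        entry["records"] += int(row["total_downloaded_gbif_records"])
    return dict(totals)


rollup = rollup_by_publisher(SAMPLE)
print(rollup)
```

Note that `number_of_users` is deliberately not summed here: the same user may appear in several rows, so adding the column up could overcount.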

@jlegind

jlegind commented Mar 18, 2021

Naturalis has also been requesting API based stats on user downloads:

I have come quite far in analysing the 811k downloads from the 41 occurrence datasets that Naturalis shares with GBIF. We are very interested in information on the country or continent where the users of our data come from. The country report of the Netherlands does include metrics on downloads of Dutch users, so this information should be stored somewhere with the downloads. Is there a way to access this information through the API?

This could be based on a table made of user-downloads count by country, and by year.

@jlegind

jlegind commented Mar 19, 2021

In a similar vein, but at dataset granularity:
Rep from Cal. Academy:

For these 10 datasets, could we please get a count of download events and occurrences within those download events for 2020 broken down by dataset and month? Something like this:

Dataset      Download Events   Occurrences   Month
Antweb       50                325           1
Antweb       37                540           2
Antweb       44                1210          3
...
CAS Botany   29                3010          1
CAS Botany   34                4090          2
CAS Botany   59                10254         3

@jlegind

jlegind commented Mar 24, 2021

Continuing Naturalis:

We are (additionally) interested in the regions where the download requests come from, i.e. where are the users of our data located. Maybe something for the next API release?

fmendezh added a commit that referenced this issue May 18, 2021
@fmendezh
Contributor

fmendezh commented Jun 1, 2021

The download stats API has been improved with some new services and parameters:

  1. publishingCountry, date range and datasetKey:
    http://api.gbif.org/v1/occurrence/download/statistics/downloadedRecordsByDataset?publishingCountry=US&fromDate=2015-12&toDate=2019-11&datasetKey=4fa7b334-ce0d-4e88-aaae-2e0c138d049e

  2. Filter by publishing organization:
    http://api.gbif.org/v1/occurrence/download/statistics/downloadedRecordsByDataset?publishingOrgKey=ccc2e3ec-98ba-4e74-878d-7dcf0f57baba

  3. More detailed stats:
    http://api.gbif.org/v1/occurrence/download/statistics?publishingCountry=US&fromDate=2015-12&toDate=2019-11&datasetKey=4fa7b334-ce0d-4e88-aaae-2e0c138d049e
    { "datasetKey": "4fa7b334-ce0d-4e88-aaae-2e0c138d049e", "totalRecords": 48092005443, "numberDownloads": 3061, "month": 11, "year": 2019 }

  4. Export to csv or tsv:
    http://api.gbif.org/v1/occurrence/download/statistics/export?publishingCountry=US&fromDate=2015-12&toDate=2019-11&datasetKey=4fa7b334-ce0d-4e88-aaae-2e0c138d049e
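For anyone scripting against these services, a minimal sketch of building the query URLs and reading one stats record follows. The helper name `stats_url` is hypothetical; the base URL, parameters, and the sample record are copied from the list above, and no request is sent, so nothing is assumed about the full response format.

```python
import json
from urllib.parse import urlencode

BASE = "http://api.gbif.org/v1/occurrence/download/statistics"


def stats_url(path="", **params):
    """Build a download statistics URL; `path` is e.g. '/export'."""
    query = urlencode(params)
    return f"{BASE}{path}?{query}" if query else f"{BASE}{path}"


export_url = stats_url(
    "/export",
    publishingCountry="US",
    fromDate="2015-12",
    toDate="2019-11",
    datasetKey="4fa7b334-ce0d-4e88-aaae-2e0c138d049e",
)
print(export_url)

# The sample record from item 3, parsed as JSON.
sample = (
    '{ "datasetKey": "4fa7b334-ce0d-4e88-aaae-2e0c138d049e", '
    '"totalRecords": 48092005443, "numberDownloads": 3061, '
    '"month": 11, "year": 2019 }'
)
record = json.loads(sample)
year_month = f'{record["year"]}-{record["month"]:02d}'
print(year_month, record["numberDownloads"], record["totalRecords"])
```

Keeping URL construction separate from fetching makes it easy to swap in any HTTP client once the actual response structure has been checked.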

@fmendezh fmendezh closed this as completed Jun 1, 2021
@ManonGros
Contributor

@jeroencreuwels take a look at the new download stats API functions.

@jeroencreuwels

Thanks for all the work! Seems perfect for our goals!
