Better documentation of, &/or support for, other platforms #126

Closed
mjwestgate opened this issue Jan 31, 2022 · 3 comments

@mjwestgate
Collaborator

As of v1.4.0, galah supports the ALA (default) and five other atlases (Austria, Guatemala, Spain, Sweden & UK). Although other atlases exist (see the full list), it is unclear whether they are unsupported because of technical issues or because we simply haven't added them yet.

In increasing order of difficulty, a useful set of activities to address this might include:

  1. [SIMPLE] List all living atlases within show_all_atlases(), and record whether they can be called by galah
  2. Allow other atlases to be queried by data-retrieving functions such as atlas_counts and atlas_occurrences
  3. Support queries to taxonomic name services of non-ALA atlases via search_taxa or search_identifiers, rather than requiring the user to search using taxize. This could work either by importing taxize, or by writing new code
  4. [DIFFICULT] Support calls to the GBIF API using galah functions. This would require galah_filter to support query syntaxes other than Solr. We would also need to either call rgbif::occ within atlas_occurrences to download the data, or write new code to achieve the same goal.
@mjwestgate mjwestgate added the enhancement New feature or request label Jan 31, 2022
mjwestgate added a commit that referenced this issue Apr 27, 2022
- add `galah_config(atlas = "Global")`
- update `show_all_atlases`
- `show_all_fields` has hard-coded GBIF field names (as per `rgbif::occ_fields`)
- ditto `show_all_ranks`
- `search_taxa` supports calls to `https://api.gbif.org/v1/species/match`

Needed:
- support use of GBIF API in `search_taxa` for atlases that use ALA collectory but GBIF taxonomy
- `atlas_` functions etc
@mjwestgate
Collaborator Author

As of commit a0e0b2e, we now support name matching for all atlases that use GBIF taxonomy (currently Austria, Guatemala, Spain & Sweden). This effectively means we can remove the advice to use taxize for this purpose from our documentation. What we can't do yet is query other GBIF APIs from galah, which leaves open the question of what to do next. Basically we have two options:

  • Only support the living atlas community, by supporting name matching services from GBIF, but no other GBIF data/services
  • Expand galah to retrieve record counts or occurrences from GBIF

Based on my reading of the documentation here, there are a few points to consider in reaching a decision. For occurrence data:

  • There is no API for field values, meaning that show_all_values/search_values won't work unless we hard-code some entries. This is effectively the approach taken by e.g. rgbif::isocodes
  • GBIF doesn't allow the user to restrict the columns returned in an occurrence download, meaning that galah_select would either be ignored or applied post-download (which is inefficient)
  • The occurrence download process for GBIF is fairly involved, but does support filtering on a large number of fields. The syntax is different from Solr, in that it requires a JSON-type structure which we don't currently produce; but it could be made to work
  • Downloads appear limited to 101k records, much less than the 50 million allowed by ALA. This is somewhat irrelevant as clearly people still want information from GBIF; but it would require that we update error messages etc.
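
To make the "JSON-type structure" point concrete, here is a minimal, language-neutral sketch (in Python rather than R, purely for illustration) of how simple key/value filters might be translated into a GBIF-style download request body. The field names (`creator`, `notificationAddresses`, `sendNotification`, `format`, `predicate`) and the `equals`/`in`/`and` predicate types reflect my reading of the GBIF occurrence download documentation and should be checked against it. The function name `build_download_request`, the user name, the email address and the taxon key are all hypothetical, and nothing here is galah code.

```python
import json


def build_download_request(filters, creator, email, fmt="SIMPLE_CSV"):
    """Translate {KEY: value} filters into a GBIF-style download request body.

    Assumption: scalar values map to an "equals" predicate and lists map to
    an "in" predicate; multiple predicates are combined with "and".
    """
    predicates = []
    for key, value in filters.items():
        if isinstance(value, (list, tuple)):
            predicates.append(
                {"type": "in", "key": key, "values": [str(v) for v in value]}
            )
        else:
            predicates.append({"type": "equals", "key": key, "value": str(value)})

    # A single filter does not need an enclosing "and".
    if len(predicates) == 1:
        predicate = predicates[0]
    else:
        predicate = {"type": "and", "predicates": predicates}

    return {
        "creator": creator,                # hypothetical GBIF user name
        "notificationAddresses": [email],  # hypothetical address
        "sendNotification": False,
        "format": fmt,
        "predicate": predicate,
    }


# Illustrative values only; the taxon key is a placeholder.
request_body = build_download_request(
    {"TAXON_KEY": 2435099, "YEAR": [2020, 2021]},
    creator="example_user",
    email="user@example.org",
)
request_json = json.dumps(request_body, indent=2)
print(request_json)
```

The equivalent in galah would presumably live behind galah_filter, which currently produces Solr query strings instead.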

For counts the same issues apply, plus a few more:

  • The GBIF counts API only supports a narrow range of facets (effectively only year and basisOfRecord), which limits the usefulness of galah_group_by unless we write our own iteration code
  • There is a hack to apply occurrence filters on counts, by using the occurrence/search API and adding &limit=0. This would maintain equivalence between atlas_counts and atlas_occurrences, which is an important feature.
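
As a rough sketch of the `limit=0` approach (again in Python for illustration, not galah code): the same filters that would drive an occurrence query are sent to the occurrence/search endpoint with `limit=0`, and only the `count` field of the response is read. The helper names `count_url` and `parse_count` are hypothetical, the taxon key is a placeholder, and the sample response is made up to show the shape I understand the API to return, not real data.

```python
import json
from urllib.parse import urlencode

GBIF_SEARCH = "https://api.gbif.org/v1/occurrence/search"


def count_url(filters):
    """Build an occurrence/search URL that returns no records, only a count."""
    params = dict(filters)
    params["limit"] = 0  # ask for zero records; the total is still reported
    return GBIF_SEARCH + "?" + urlencode(params, doseq=True)


def parse_count(response_text):
    """Extract the total number of matching records from a search response."""
    return json.loads(response_text)["count"]


url = count_url({"taxonKey": 2435099, "year": 2021})
print(url)

# Made-up response body, for illustration of the expected shape only.
sample_response = (
    '{"offset": 0, "limit": 0, "endOfRecords": false, "count": 1234, "results": []}'
)
print(parse_count(sample_response))
```

Because the count and the download would be built from the same filter set, this keeps atlas_counts and atlas_occurrences consistent with one another, at the cost of one request per group if galah_group_by were emulated by iteration.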

In summary, there is merit in supporting count and occurrence queries from GBIF, but the architecture is sufficiently different to make this somewhat challenging.

mjwestgate added a commit that referenced this issue May 5, 2022
Does not (yet) support facets, but this is possible within the API. Probably needs to distinguish in `galah_config()` between the atlas you are calling and the API 'engine' it uses (ALA vs GBIF). This could be built into show_all_atlases pretty easily, or create an object within galah_config that stores atlas-related metadata
@mjwestgate
Collaborator Author

There is a new Living Atlases website that lists current participants and their status here. GBIF 'noted portals' are available here, but it is unclear if there are other sites that use GBIF infrastructure hosted elsewhere.

mjwestgate added a commit that referenced this issue Jul 1, 2022
- data on atlases and url information moved to `sysdata.rda`
- definition of `sysdata.rda` moved to `data-raw/internal_data.R`
- rebuild show_all_atlases with new portal information
- convert atlas config from series of functions to an internal list named `all_atlas_config`
mjwestgate added a commit that referenced this issue Jul 1, 2022
- now supports three types of API; name-matching, ALA-species and GBIF
- all atlases now return a `taxon_concept_id` column
- columns are in a similar order regardless of atlas used
@mjwestgate mjwestgate self-assigned this Jul 4, 2022
mjwestgate added a commit that referenced this issue Jul 6, 2022
- add extra API links for Austrian atlas, check all supported functions work
- add new function `R/show_all_atlases.R/species_facets` to ensure correct field is used to define unique species in different atlases
- update `test-international.R` to ensure all supported functions for a given atlas are tested.
- bug fixes to `show_all_reasons` and `show_all_fields`
- `search_all` now supports a `taxa` argument
mjwestgate added a commit that referenced this issue Jul 20, 2022
mjwestgate added a commit that referenced this issue Aug 31, 2022
Previous list version was harder to check and generalise, especially once there are many atlases (#126). New version stores the object internally, and makes it available to users via `show_all_apis`. New approach is to call `atlas_url` to build the url for you. Note that APIs combine `base_url` and `path` from previous versions.
mjwestgate added a commit that referenced this issue Aug 31, 2022
- new function `show_values` to show subsets from `show_all`; deprecate `show_all_values` (#131)
- rename `show_all_species_lists` to `show_all_lists`
- new function `show_all_apis` (#126)
- replace `search_profile_attributes` with `show_profile_values`
@mjwestgate
Collaborator Author

This is closed, with the caveat that different atlases perform differently, and GBIF is not currently supported.
