Could the IPT take DwCA that are updated online? #2608

Open
ManonGros opened this issue Dec 3, 2024 · 4 comments

@ManonGros
Contributor

The Specify platforms can generate Darwin Core Archives and make them available online. These archives can be updated automatically.
Usually, we ask publishers to register the archive endpoints directly (or the helpdesk does it for them).

However, the metadata associated with these datasets isn't always what the publishers would like to share on GBIF. @spalp asked whether it would be possible to have the Darwin Core Archive content in the IPT but take the EML from the IPT.

@mike-podolskiy90
Contributor

mike-podolskiy90 commented Dec 3, 2024

Thanks Marie
cc @spalp

@spalp
Contributor

spalp commented Dec 3, 2024

Thanks @marie. My question was whether some of the metadata fields could be taken as they are from the archive, while others are automatically updated or added at ingestion. The fields that are almost certain to change between archive versions are:

  1. Coverage: temporalCoverage, geographicCoverage and taxonomicCoverage,
  2. Additional metadata: dateStamp, citation

I just talked with @mike-podolskiy90 and he assured me that even if we fetch a complete DwC-A from a URL, together with its EML, the option to automatically infer geographic, temporal and taxonomic scope, if previously selected, should not be affected.

So, the only new feature I would like is the ability to provide a URL to a DwC-A to be regularly monitored and published via the IPT.
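To make "regularly monitored" concrete, here is a minimal sketch of one way a change check could work: hash the downloaded archive and republish only when the digest differs from the last published one. The function names are hypothetical and this is not how the IPT does (or will do) it; an HTTP conditional request would be another option.

```python
import hashlib
from typing import Optional


def archive_digest(content: bytes) -> str:
    """Return the SHA-256 hex digest of the downloaded archive bytes."""
    return hashlib.sha256(content).hexdigest()


def needs_republish(previous_digest: Optional[str], content: bytes) -> bool:
    """True when nothing was published yet or the archive bytes have changed."""
    return previous_digest is None or archive_digest(content) != previous_digest
```

A scheduled job would download the archive, call `needs_republish` with the stored digest, and store the new digest after a successful publication.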

@mike-podolskiy90
Contributor

Thanks Salza
This sounds like a useful feature. I would like to gather more opinions on how this should be implemented.

@mike-podolskiy90 mike-podolskiy90 self-assigned this Dec 3, 2024
@mike-podolskiy90 mike-podolskiy90 added this to the 3.2 milestone Dec 3, 2024
@mike-podolskiy90
Contributor

My idea here is to create a new source type: DwCA (or URL/DwCA, we can discuss). The IPT takes the archive from the provided URL and publishes it. I am not sure whether we should unpack the archive and reassemble it in the IPT, with validation and all. I suppose we should, to ensure the quality of the DwCA.
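As a rough illustration of what a first validation pass on a fetched archive might look like, the sketch below checks only the structure: the file is a zip, it contains a `meta.xml` descriptor, and the files the descriptor points to are present. `check_dwca` is a hypothetical name, the sketch assumes `meta.xml` is always present, and it does not represent the IPT's actual validation.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from typing import List

# Namespace used by the Darwin Core text descriptor (meta.xml).
DWC_TEXT_NS = "{http://rs.tdwg.org/dwc/text/}"


def check_dwca(content: bytes) -> List[str]:
    """Return a list of structural problems found in the archive (empty if none)."""
    try:
        archive = zipfile.ZipFile(io.BytesIO(content))
    except zipfile.BadZipFile:
        return ["not a zip file"]
    problems = []
    with archive:
        names = set(archive.namelist())
        if "meta.xml" not in names:
            return ["meta.xml is missing"]
        try:
            root = ET.fromstring(archive.read("meta.xml"))
        except ET.ParseError:
            return ["meta.xml is not well-formed XML"]
        # The metadata attribute names the EML file shipped with the archive.
        metadata = root.get("metadata")
        if metadata and metadata not in names:
            problems.append(f"metadata file {metadata} is missing")
        core = root.find(f"{DWC_TEXT_NS}core")
        if core is None:
            problems.append("meta.xml has no core element")
        else:
            for location in core.iter(f"{DWC_TEXT_NS}location"):
                name = (location.text or "").strip()
                if name not in names:
                    problems.append(f"core file {name} is missing")
    return problems
```

Checks on the records themselves (required terms, identifiers, data types) would come after this step.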

EML will be taken from the archive, but with the ability to automatically infer coverage metadata.
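For discussion, a minimal sketch of what inferring coverage from the core data could mean: scan the records and derive a bounding box and a date range. It assumes a tab-delimited core file with a header row of Darwin Core term names and ISO 8601 `eventDate` values in a single format (so that string comparison orders them correctly); `infer_coverage` is a hypothetical name and not the IPT's implementation.

```python
import csv
import io
from typing import Dict, Optional, Tuple


def infer_coverage(core_text: str) -> Dict[str, Optional[Tuple]]:
    """Derive a bounding box and a date range from tab-delimited core records."""
    lats, lons, dates = [], [], []
    reader = csv.DictReader(io.StringIO(core_text), delimiter="\t")
    for row in reader:
        try:
            lat = float(row.get("decimalLatitude") or "")
            lon = float(row.get("decimalLongitude") or "")
        except ValueError:
            # Skip coordinates that are missing or not numeric.
            lat = lon = None
        if lat is not None and lon is not None:
            lats.append(lat)
            lons.append(lon)
        date = (row.get("eventDate") or "").strip()
        if date:
            dates.append(date)
    return {
        # (west, south, east, north)
        "bounding_box": (min(lons), min(lats), max(lons), max(lats)) if lats else None,
        "temporal_range": (min(dates), max(dates)) if dates else None,
    }
```

The resulting values would replace the geographic and temporal coverage in the EML taken from the archive, leaving the other fields untouched.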

@gbif/dataproducts Andrea and Cecilie, maybe you have something to add here?
