Roadmap for merging with STAC #21
Comments
I guess you could just start with a branch here?
That's meant to be
First week of May sounds good to me. Doodle is good, too.
Thank you @rabernat @jhamman & @m-mohr for putting this together! Looking forward to seeing this brought to completion.
The first week of May works for me as well.
Fixed I have created a Doodle here: The goals of the hack session are:
If time permits, we can start updating processing tools (e.g. pangeo catalog, intake-esm) to adapt to the new conventions. However, this is not the main goal. Anything I missed?
The winning time is Thursday, May 7, 1:00 PM - 3:00 PM EDT. We can use https://whereby.com/pangeo to chat / coordinate.
A few updates before the telco: based on the last telco, I tried to come up with a new example. I think it aligns both specs better. The biggest change, and probably the biggest point of discussion, is splitting the vocabulary links into assets and a separate array of attribute names.
{
"stac_version": "0.9.0",
"stac_extensions": [
"collection-assets",
"https://github.com/NCAR/esm-collection-spec/tree/master/schema.json"
],
"id": "pangeo-cmip6",
"title": "Google CMIP6",
"description": "This is an ESM collection for CMIP6 Zarr data residing in Pangeo's Google Storage.",
"extent": {
"spatial": {
"bbox": [[-180, -90, 180, 90]]
},
"temporal": {
"interval": [["1850-01-15T12:00:00Z", "2014-12-15T12:00:00Z"]]
}
},
"providers": [
{
"name": "World Climate Research Programme",
"roles": ["producer","licensor"],
"url": "https://www.wcrp-climate.org/wgcm-cmip/wgcm-cmip6"
},
{
"name": "The Pangeo Project",
"roles": ["processor"],
"url": "https://console.cloud.google.com/pangeo.io"
},
{
"name": "Google",
"roles": ["host"],
"url": "https://console.cloud.google.com/marketplace/details/noaa-public/cmip6"
}
],
"license": "proprietary",
"links": [
{
"href": "https://pcmdi.llnl.gov/CMIP6/TermsOfUse/TermsOfUse6-1.html",
"type": "text/html",
"rel": "license",
"title": "CMIP6: Terms of Use"
}
],
"assets": {
"thumbnail": {
"href": "logo.png",
"title": "A preview image for visualization.",
"type": "image/png",
"roles": ["thumbnail"]
},
"catalog": {
"href": "sample-pangeo-cmip6-zarr-stores.csv",
"title": "Catalog",
"description": "Path to the CSV file with the catalog contents.",
"type": "text/csv",
"roles": ["esm-catalog"],
"esm:column_name": "path"
},
"activity_id": {
"href": "https://raw.githubusercontent.com/WCRP-CMIP/CMIP6_CVs/master/CMIP6_activity_id.json",
"type": "application/json",
"roles": ["esm-vocabulary"],
"esm:column_name": "activity_id"
},
"source_id": {
"href": "https://raw.githubusercontent.com/WCRP-CMIP/CMIP6_CVs/master/CMIP6_source_id.json",
"type": "application/json",
"roles": ["esm-vocabulary"],
"esm:column_name": "source_id"
},
"institution_id": {
"href": "https://raw.githubusercontent.com/WCRP-CMIP/CMIP6_CVs/master/CMIP6_institution_id.json",
"type": "application/json",
"roles": ["esm-vocabulary"],
"esm:column_name": "institution_id"
},
"experiment_id": {
"href": "https://raw.githubusercontent.com/WCRP-CMIP/CMIP6_CVs/master/CMIP6_experiment_id.json",
"type": "application/json",
"roles": ["esm-vocabulary"],
"esm:column_name": "experiment_id"
},
"table_id": {
"href": "https://raw.githubusercontent.com/WCRP-CMIP/CMIP6_CVs/master/CMIP6_table_id.json",
"type": "application/json",
"roles": ["esm-vocabulary"],
"esm:column_name": "table_id"
},
"grid_label": {
"href": "https://raw.githubusercontent.com/WCRP-CMIP/CMIP6_CVs/master/CMIP6_grid_label.json",
"type": "application/json",
"roles": ["esm-vocabulary"],
"esm:column_name": "grid_label"
}
},
"esm:catalog": {},
"esm:attributes": ["activity_id", "source_id", "institution_id", "experiment_id", "member_id", "table_id", "variable_id", "grid_label"],
"esm:aggregation_control": {
"variable_column_name": "variable_id",
"groupby_attrs": [
"activity_id",
"institution_id",
"source_id",
"experiment_id",
"table_id",
"grid_label"
],
"aggregations": [
{
"type": "join_new",
"attribute_name": "member_id",
"options": { "coords": "minimal", "compat": "override" }
},
{
"type": "join_existing",
"attribute_name": "time_range",
"options": { "dim": "time" }
},
{
"type": "union",
"attribute_name": "variable_id"
}
]
}
}
There have also been recent discussions in STAC on how best to integrate things like Zarr. Based on radiantearth/stac-spec#779, I'm working on collection-level assets (a PR is coming in the next few hours), which we'll probably use for the ESM collection extension. There have also been discussions on how we could allow Items to represent "parts" of a Zarr archive, which led to the idea of nullable timestamps (see radiantearth/stac-spec#798).
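To make the proposed split concrete, here is a minimal sketch of how a processor might recombine the two pieces: vocabulary links live in assets with the `esm-vocabulary` role, while `esm:attributes` lists every attribute name. The helper `vocabulary_links` is hypothetical (it is not part of intake-esm or any STAC tooling), and the dictionary is a trimmed-down copy of the example above.

```python
def vocabulary_links(collection: dict) -> dict:
    """Map catalog column names to vocabulary URLs.

    Hypothetical helper: scans the collection-level assets for the
    `esm-vocabulary` role used in the example collection.
    """
    links = {}
    for asset in collection.get("assets", {}).values():
        if "esm-vocabulary" in asset.get("roles", []):
            links[asset["esm:column_name"]] = asset["href"]
    return links


# Trimmed-down copy of the example collection (only the fields used here).
collection = {
    "assets": {
        "catalog": {
            "href": "sample-pangeo-cmip6-zarr-stores.csv",
            "roles": ["esm-catalog"],
            "esm:column_name": "path",
        },
        "activity_id": {
            "href": "https://raw.githubusercontent.com/WCRP-CMIP/CMIP6_CVs/master/CMIP6_activity_id.json",
            "roles": ["esm-vocabulary"],
            "esm:column_name": "activity_id",
        },
        "source_id": {
            "href": "https://raw.githubusercontent.com/WCRP-CMIP/CMIP6_CVs/master/CMIP6_source_id.json",
            "roles": ["esm-vocabulary"],
            "esm:column_name": "source_id",
        },
    },
    "esm:attributes": ["activity_id", "source_id", "member_id", "variable_id"],
}

links = vocabulary_links(collection)

# Attributes that are listed but have no vocabulary asset attached.
without_vocabulary = [
    name for name in collection["esm:attributes"] if name not in links
]
```

With this layout, attributes such as `member_id` and `variable_id` can be declared in `esm:attributes` without needing a vocabulary link, which is the main thing the split buys.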
I've had some family stuff come up, so I may miss Thursday's meeting completely, and at the very least will likely be in and out. But I don't think I'm core to it - psyched to see what the group comes up with!
Hi All! I'm looking forward to our little sprint today at 1pm EST. I suggest we convene briefly at https://whereby.com/pangeo at 1pm to discuss our work plan.
Sounds good 👌! I will be there at 1pm.
Great work today. I went through the example PRs with the new JSON schema in #27 and left comments on how they could be made to validate.
Hi Folks--sorry for letting this hang for so long. I'd like to get the PRs merged asap. It seems like the only PR missing is @jhamman's narrative description of the new spec. Am I remembering things correctly? I have assigned reviewers to all the PRs. Let's get them reviewed, approved, and merged.
It seems there are some points left for discussion, especially self-contained catalogs (i.e. esm:catalog).
Just wanted to drop a quick note here to highlight the upcoming STAC sprint (https://medium.com/radiant-earth-insights/join-us-for-stac-sprint-6-our-first-fully-remote-event-28e118a5279c). Might be a good opportunity to push things forward on the esm spec front.
Would definitely be great if people could join. I'd really love to get at least a small sample zarr+stac catalog up. May even be able to structure some sort of 'prize' to make that happen, as there are sponsors interested in seeing this happen, and I think it'd be a great test to ensure STAC is ready for 1.0.
I'll be available the first and last day of the data sprint until around 11pm CEST, if you need me for anything.
I'm curious how this issue has progressed. Are we any closer to being able to catalog our cloud-based data in STAC? Is there a way I can help?
This is a follow up to the discussion in radiantearth/stac-spec#713 (comment).
On 2020-04-20, we had a call with myself, @jhamman, @cholmes, @m-mohr, and @matthewhanson. The aim was to make progress on something everyone wants: to merge the esm collection spec with STAC. That was our intention from the beginning, but we chose to fork temporarily to get something working fast.
The goal for now is to make as few changes as possible to get this working. My recollection of the meeting is that there are two steps to the proposed plan:
1. Define the `esm` extension as a new valid STAC extension. That extension will probably need to live in a new repo (I propose `NCAR/stac-esm`), or alternatively this repo could morph into that project. The esm extension would include most of the custom fields we have already defined for esm-collection-spec.
2. Define `esm-collection.json` files as valid STAC Collections. This means adding some additional required metadata fields per the collection spec. These collections will use the esm extension.

In radiantearth/stac-spec#713 (comment), @m-mohr worked up a really nice example of how it might look. During the meeting, we agreed that we won't try to also use the `datacube` extension. That is an eventual goal as well, but we noted several challenges in terms of reconciling datacube with Zarr and CF metadata.

So here I repeat @m-mohr's example minus the datacube part

One thing I changed was to define the `role` for the `asset` as `esm-catalog` rather than `catalog`. This can hopefully let a processor (like `intake-esm`) know that this asset has a special role within the esm extension.

I'd love some feedback on whether I remembered the meeting accurately (it was a few days ago and our notes were sparse) and whether this sounds like a good plan. The STAC folks proposed organizing a 2-hour sprint to bang this out, and I think that's a great idea. I would not be free until the first week of May. If others agree (I particularly need help from @andersy005 and @charlesbluca), I'll send out a Doodle.