Alignment with schema.org #285

Closed
m-mohr opened this issue Oct 12, 2018 · 9 comments

@m-mohr
Collaborator

m-mohr commented Oct 12, 2018

Now that we have Google Dataset Search and the schema.org vocabulary, I think we should evaluate whether we want to align STAC with schema.org or support it in some way. I'm not really familiar with schema.org yet, but it seems like a good way to make STAC datasets more discoverable through Google Dataset Search. Opinions? Anyone with more insight into schema.org?

cc @simonff

@cholmes
Contributor

cholmes commented Oct 15, 2018

AFAIK Google Dataset Search and the schema.org stuff work on HTML pages, not JSON. That is, JSON-LD on its own isn't crawled by search engines. We originally talked about an HTML version of the spec, but a better solution seems to be STAC Browser, where we can generate HTML from the JSON.

So I think we may just need STAC Browser's HTML to do the mapping to schema.org / DCAT / linked data that gets crawled, and then publish that mapping as a recommendation.

But a big +1 to aligning - I just think we might have an alternate route to achieving the goal.

cc @mojodna

@m-mohr
Collaborator Author

m-mohr commented Oct 15, 2018

Can Google crawl dynamically generated JS-based Vue.js pages? @simonff

@cholmes
Contributor

cholmes commented Oct 15, 2018

If that doesn't work, @mojodna had an idea to use another tool to autogenerate the HTML, either ahead of time or as needed. So worst case, people could crawl their own JSON and write out HTML pages that search engines can crawl.
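
A minimal sketch of that worst-case route, in Python: pre-render one static HTML page per STAC JSON document and embed the schema.org metadata in a JSON-LD script tag. The function name `render_page` and the page layout are hypothetical, not anything STAC Browser actually does, and the JSON-LD dict is assumed to be produced elsewhere.

```python
import html
import json


def render_page(stac_item: dict, jsonld: dict) -> str:
    """Return a static HTML page for one STAC item with JSON-LD embedded.

    Hypothetical helper for illustration; the layout is an assumption.
    """
    # Escape the id so it is safe to place in HTML text content.
    title = html.escape(str(stac_item.get("id", "untitled")))
    # Escape "</" so a string value cannot close the <script> element early.
    payload = json.dumps(jsonld, indent=2).replace("</", "<\\/")
    return (
        "<!DOCTYPE html>\n"
        f"<html><head><title>{title}</title>\n"
        f'<script type="application/ld+json">\n{payload}\n</script>\n'
        f"</head><body><h1>{title}</h1></body></html>\n"
    )
```

Writing the returned string to one file per item, with plain links between the pages, would give a crawler something to follow without executing any JavaScript.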

@simonff

simonff commented Oct 15, 2018

Google can crawl anything that looks like a human-oriented HTML page, though we might have to make sure the crawl goes through all the pages correctly in case of paged results.

I'm talking with the Dataset Search people about setting up the process for aligning schema.org, DCAT and STAC vocabularies.

@cholmes cholmes added this to the 0.7.0 milestone Dec 28, 2018
@cholmes cholmes modified the milestones: 0.7.0, future Apr 22, 2019
@cholmes
Contributor

cholmes commented Apr 22, 2019

There's lots of progress on this in #378, and we are writing up HTML best practices in #32.

Those do not quite 'finish' this - ideally we have something in the spec that talks explicitly about this. But moving this to 'future' for now, and we'll make STAC Browser the focus of our alignment and then eventually pull its learnings into the spec.

@dazza-codes

dazza-codes commented Apr 23, 2019

I have experience with ontologies and linked-data solutions, having worked with the ontology group at Stanford and the Stanford library on several linked-data projects. However, there is a lot of work to be done on GIS ontologies and linked data for catalog systems. The technology most aligned with STAC is obviously JSON-LD, but defining the context for STAC needs some work. I'm generally open to discussions about EOS metadata standards (CF) and GIS metadata as ontologies and linked data. Related projects:

Also, with regard to linking data with publications:

@cholmes
Contributor

cholmes commented Apr 23, 2019

Awesome @darrenleeweber! Your help would be great.

We have flirted with trying to make STAC JSON into JSON-LD, but stopped short of going all in, since we'd then give up GeoJSON compatibility, which is pretty important for working properly in geospatial tools. There is a GeoJSON-LD, but it doesn't have much traction yet, unfortunately.

It would be awesome if we could get to a JSON-LD version of the core STAC JSON, especially one that is compatible with Google's Dataset Search. STAC Browser does have a mapping to JSON-LD, for example:

{
  "@context": "https://schema.org/",
  "@type": "Dataset",
  "name": "20170831_162740_ssc1d1",
  "identifier": "20170831_162740_ssc1d1",
  "keywords": [
    "disaster",
    "open"
  ],
  "license": "https://spdx.org/licenses/CC-BY-SA-4.0.html",
  "isBasedOn": "https://storage.googleapis.com/pdd-stac/disasters/hurricane-harvey/0831/20170831_162740_ssc1d1.json",
  "url": "/item/5k3UqPNLpDJMxoAfw1YUV9y9QsbZpgkBacBWwUJ9/3MxsQZbdxjScFVpNqiHrDSMjKgPQo9Uq1JYtn2CAwxwSj9F/HuwgeSATdb44aYX7WSCsDbh9RmKQG6FvYNWicrF6j51Kqqfg1AE2TmRv59g9W1bBcjF",
  "workExample": [],
  "includedInDataCatalog": [
    {
      "isBasedOn": "https://storage.googleapis.com/pdd-stac/disasters/catalog.json",
      "url": "2smqrzEmZVVnN6aCM"
    },
    {
      "isBasedOn": "https://storage.googleapis.com/pdd-stac/disasters/hurricane-harvey/catalog.json",
      "url": "5k3UqPNLpDJMxoAfw1YUV9y9QsbZpgkBacBWwUJ9"
    }
  ],
  "spatialCoverage": {
    "@type": "Place",
    "geo": {
      "@type": "GeoShape",
      "box": "-95.46998977661133 28.872261720487128 -95.23927688598633 29.064872627755797"
    }
  },
  "temporalCoverage": "2017-08-31T16:27:42.176605Z",
  "distribution": [
    {
      "contentUrl": "https://storage.googleapis.com/pdd-stac/disasters/hurricane-harvey/0831/SkySat_Freeport_s03_20170831T162740Z.png",
      "fileFormat": "image/png",
      "name": "Thumbnail"
    },
    {
      "contentUrl": "https://storage.googleapis.com/pdd-stac/disasters/hurricane-harvey/0831/SkySat_Freeport_s03_20170831T162740Z.tif",
      "fileFormat": "image/vnd.stac.geotiff; cloud-optimized=true",
      "name": "SkySatScene Visual GeoTIFF"
    }
  ],
  "image": "https://storage.googleapis.com/pdd-stac/disasters/hurricane-harvey/0831/SkySat_Freeport_s03_20170831T162740Z.png"
}

But it doesn't attempt to capture all the STAC fields; it just maps the bits that are relevant for Dataset Search.
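
To make the shape of such a mapping concrete, here is a rough Python sketch that builds a subset of the JSON-LD above from a STAC Item. It is not STAC Browser's actual code: the function name `stac_item_to_dataset` is hypothetical, and the field choices (Item `id` for `name` and `identifier`, `bbox` joined with spaces for the `GeoShape` box, `properties.datetime` for `temporalCoverage`, assets for `distribution`) are assumptions read off the example. The box coordinate order simply mirrors the example; whether it should be latitude-first is worth checking against the schema.org and Google Dataset Search guidelines.

```python
def stac_item_to_dataset(item: dict) -> dict:
    """Map a STAC Item dict to a schema.org Dataset dict (partial, illustrative)."""
    props = item.get("properties", {})
    dataset = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": item["id"],
        "identifier": item["id"],
    }

    # Assumption: a 2D bbox is written out in the order STAC stores it.
    bbox = item.get("bbox")
    if bbox and len(bbox) == 4:
        dataset["spatialCoverage"] = {
            "@type": "Place",
            "geo": {"@type": "GeoShape", "box": " ".join(str(v) for v in bbox)},
        }

    if "datetime" in props:
        dataset["temporalCoverage"] = props["datetime"]

    # One distribution entry per asset; fall back to the asset key as the name.
    distribution = []
    for key, asset in item.get("assets", {}).items():
        entry = {"contentUrl": asset["href"], "name": asset.get("title", key)}
        if "type" in asset:
            entry["fileFormat"] = asset["type"]
        distribution.append(entry)
    dataset["distribution"] = distribution

    return dataset
```

Fields such as `keywords`, `license` and `includedInDataCatalog` are left out here because they would presumably come from the parent catalogs rather than from the Item itself.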

If you want to come to the next sprint this can definitely be a topic.

@dazza-codes

dazza-codes commented Apr 23, 2019

I can't make the next sprint due to company commitments on those dates, but I will keep an eye on these developments and help where I can. Consider extending an invite to @gkellogg, since he is a Bay Area native too. Perhaps @azaroth42 might also have some interest in these catalog developments.

@m-mohr
Collaborator Author

m-mohr commented Jul 22, 2019

This is basically tackled by #378 and more or less a duplicate. See #378 for further discussions.

@m-mohr m-mohr closed this as completed Jul 22, 2019