context must be a map #151

Closed
fils opened this issue Feb 17, 2021 · 41 comments
@fils
Collaborator

fils commented Feb 17, 2021

So I have been running into this with @smrgeoinfo and I saw it in the example by @datadavev

Using Dave's example of

{
  "@context":"https://schema.org/",
  "@type":"Dataset",
  "name":"test",
  "description": "This is a description of the test. Here's some more words to make it long enough."
}

If you paste this into the JSON-LD playground, you will see that it expands to http, not https.

Modify the context to be a map, as in:

{
  "@context": {
    "@vocab": "https://schema.org/"
  },
  "@type": "Dataset",
  "name": "test",
  "description": "This is a description of the test. Here's some more words to make it long enough."
}

It will expand correctly with https as at https://tinyurl.com/y99kj7d7

reference https://www.w3.org/TR/json-ld/#context-definitions

specifically:

A context definition MUST be a map whose keys MUST be either terms, 
compact IRIs, IRIs, or one of the keywords @base, 
@import, @language, @propagate, @protected, @type, @version, or @vocab.

It would appear that we need to make sure the contexts in our examples and recommendations are maps (at least if we want JSON-LD 1.1, which I suspect this is part of).

I've been running into this issue in some of my development work....
Comments and observations welcome..

@datadavev
Collaborator

Contexts can either be directly embedded into the document (an embedded context) or be referenced using a URL.
-- w3.org/TR/json-ld11/

The JSON-LD processor makes a request like:

 curl -v -H "Accept: application/ld+json" "https://schema.org/" > /dev/null

it gets back a response that includes a link:

link: </docs/jsonldcontext.jsonld>; rel="alternate"; type="application/ld+json"

That is followed to the context document located at https://schema.org/docs/jsonldcontext.jsonld which is the remote context referenced in the example. That context specifies, among other items:

"schema": "http://schema.org/",

Hence, the properties are expanded with the namespace http://schema.org/.
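
The link-following step described above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not how any particular JSON-LD processor implements it. The helper name `resolve_context_link` is hypothetical, and the header parsing is deliberately simplified: it handles only a single link value like the one quoted above.

```python
import re
from urllib.parse import urljoin


def resolve_context_link(request_url, link_header):
    """Return the absolute URL of the alternate JSON-LD document named in a
    Link header, or None if the header does not advertise one.

    Simplified: handles a single link value like the one quoted above.
    """
    match = re.match(r'\s*<([^>]*)>\s*(.*)', link_header)
    if match is None:
        return None
    target, params = match.group(1), match.group(2)
    # Collect the ;key="value" parameters that follow the <target>.
    attrs = dict(re.findall(r';\s*([\w-]+)\s*=\s*"([^"]*)"', params))
    if attrs.get("rel") == "alternate" and attrs.get("type") == "application/ld+json":
        # The link target is relative to the URL that was requested.
        return urljoin(request_url, target)
    return None


header = '</docs/jsonldcontext.jsonld>; rel="alternate"; type="application/ld+json"'
print(resolve_context_link("https://schema.org/", header))
# https://schema.org/docs/jsonldcontext.jsonld
```

A processor that skips this step never reaches the context document, which is one way the same input can behave differently across libraries.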

This is exactly why we needed clarification on the "https" vs "http" namespace issue in #52.

I agree that sticking with https://schema.org/ as the namespace does require specifying the default context like:

"@context": {"@vocab":"https://schema.org/"}

@fils
Collaborator Author

fils commented Feb 17, 2021

@datadavev

Thanks for the nice expansion...

Going further you can look at the context file pulled down and look for http

https is sadly missing
and curl for either https://schema.org/docs/jsonldcontext.jsonld or http://schema.org/docs/jsonldcontext.jsonld returns the
same file.. don't get me started..

looking for http (or https via substring match) we get

~/tmp grep http jsonldcontext.json 
        "@vocab": "http://schema.org/",
        "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
        "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
        "xsd": "http://www.w3.org/2001/XMLSchema#",
        "schema": "http://schema.org/",
        "owl": "http://www.w3.org/2002/07/owl#",
        "dc": "http://purl.org/dc/elements/1.1/",
        "dct": "http://purl.org/dc/terms/",
        "dctype": "http://purl.org/dc/dcmitype/",
        "void": "http://rdfs.org/ns/void#",
        "dcat": "http://www.w3.org/ns/dcat#",
        "httpMethod": { "@id": "schema:httpMethod"},

yet.. in the example https://tinyurl.com/y99kj7d7 things correctly expand to their https namespace, not http. Any insight into why this is the case?

This seems like it should not occur if the above context is pulled. Seems like application logic coming into play perhaps?

@datadavev
Collaborator

datadavev commented Feb 17, 2021

This is the challenge of namespace ambiguity introduced by the "s". Despite progression towards a duality of schema.org concepts under http and https, the official and current context for schema.org resides at https://schema.org/docs/jsonldcontext.jsonld and that context specifies http://schema.org/ as the namespace.

Writing:

"@context": {"@vocab":"https://schema.org/"}

tells the JSON-LD processor that the entire context definition for the document is exactly the map that is the value of the "@context" key. Since that map does not contain a reference to a remote context (i.e. using the @import key), that map is the entirety of the context and so the JSON-LD processor does not retrieve a remote context when processing the document. Instead, the default context IRI specified by the value of @vocab is used to expand the relative IRIs in the document. Dataset is equal to https://schema.org/Dataset.

It's important to note that remote contexts are retrieved by a JSON-LD processor by following the spec for Remote Document and Context Retrieval. Basically, requests are made, following 303 redirects and using an Accept: application/ld+json header. Steps 4 and 5 therein describe how Link headers in the response are handled, and this step is typically not visible when using curl and other common HTTP clients unless specifically looking for that information.

Anyway, the outcome of all this is that specifying a context of "@context":{"@vocab":"https://schema.org/"} means that is the entire context. Specifying "@context":"https://schema.org/" means the JSON-LD processor will go and fetch a context document from that IRI, and that document provides the context map that uses a namespace of http://schema.org/ for the schema.org terms.

This of course does have much broader implications, since in specifying the context of "@vocab":"https://schema.org/", none of the information in the remote context is being retrieved and utilized in the processing of the document.
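
To make the map case concrete, here is a rough sketch of what a context containing only @vocab does to the terms in a document. It is a toy model, not a JSON-LD processor: the function `expand_with_vocab` is hypothetical, handles only flat documents with string values, and ignores everything else in the real expansion algorithm.

```python
def expand_with_vocab(doc):
    """Toy expansion for a flat document whose @context is a map with @vocab.

    No remote document is fetched: the map is the whole context.
    """
    context = doc.get("@context")
    if not isinstance(context, dict) or "@vocab" not in context:
        raise ValueError("this sketch only handles a map context with @vocab")
    vocab = context["@vocab"]
    expanded = {}
    for key, value in doc.items():
        if key == "@context":
            continue
        if key == "@type":
            # Type values are terms, so they are resolved against @vocab too.
            expanded["@type"] = [vocab + value]
        else:
            expanded[vocab + key] = [{"@value": value}]
    return [expanded]


doc = {
    "@context": {"@vocab": "https://schema.org/"},
    "@type": "Dataset",
    "name": "test",
}
print(expand_with_vocab(doc))
```

Whatever string is given as @vocab is simply prepended, which is why the https form shows up in the expanded output even though the published context uses http.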

[edit: added note on default context]

@fils
Collaborator Author

fils commented Feb 17, 2021

It is as I figured.... I appreciate the confirmation though. Sigh. From a developer POV, this little "s" really causes a lot of "hit" (sorry. there is a missing "s" in that "hit") ;)

@datadavev
Collaborator

It's a widespread challenge, e.g. RDFLib/rdflib#1120

@smrgeoinfo
Contributor

It's the cost of conflating the location of the resolver used to dereference an identifier with the identifier itself.

@datadavev
Collaborator

Note that this issue will vaporize when schema.org v 12 comes out in March.

See: https://github.com/schemaorg/schemaorg/blob/main/data/releases/12.0/schemaorgcontext.jsonld

@fils
Collaborator Author

fils commented Feb 18, 2021

@datadavev you made my day!!!!!!!

@datadavev
Collaborator

Big relief for me too - there's a whole bunch of normalization code and gymnastics that can go away. Huzzah!

@bonnland

bonnland commented Dec 15, 2021

Hi, could someone confirm if these two @context definitions are different or equivalent now? I'm seeing both forms in the ESIP recommendations examples, and I want to know if there is a "more correct" version:

{
  "@context": "https://schema.org/",
  "@type": "Dataset",
  "author": {
    "@type": "Person",
    "name": "Jane Goodall"
  }
}

vs.

{
  "@context": {
    "@vocab": "https://schema.org/"
  },  
  "@type": "Dataset",
  "author": {
    "@type": "Person",
    "name": "Jane Goodall"
  }
}

@fils
Collaborator Author

fils commented Dec 15, 2021

@bonnland

The first is valid for JSON-LD 1.0
The second for JSON-LD 1.1

If you are working at this point forward, you should be using the map, the second one.

@mbjones
Collaborator

mbjones commented Dec 15, 2021

We probably should update all of our examples to use the recommended form.

@datadavev
Collaborator

Those two contexts are quite different. The first basically indicates "use the context that you can find at this address" (remote context [1]), the second "the default context for this document is this value" (default vocabulary [2]).

Footnotes

  1. https://www.w3.org/TR/json-ld11/#example-5-referencing-a-json-ld-context

  2. https://www.w3.org/TR/json-ld11/#default-vocabulary

@fils
Collaborator Author

fils commented Dec 15, 2021

@datadavev I get your point.. that is only true in the context (no pun intended) that you view the document as a JSON-LD 1.1 document in both cases, correct?

I need to revisit now why I had processing errors in 1.1 mode with the previous approach when, as you point out, it seems a valid 1.1 pattern for remote context. (though that seems very poorly worded in the docs.. since all the contexts are typically web resolved in principle)

oddly there is

A context definition MUST be a map whose keys MUST be either terms, compact IRIs, IRIs, or one of the keywords @base, @import, @language, @propagate, @protected, @type, @version, or @vocab.

which seems at odds with the remote context reference https://www.w3.org/TR/json-ld11/#example-5-referencing-a-json-ld-context

Have you had the previous (un-mapped version) fail in a forced 1.1 process? I have.

@fils
Collaborator Author

fils commented Dec 15, 2021

@datadavev Is it just me or the docs say...

"a context MUST be a map, except when it's not a map and then it is a remote context, though you can use @import for a remote context too, to make the context a map.... oh .. and any context you provide that isn't relative, is pulled remotely based on the IRI you provide" (this seems even more fun to read if you do it in an English accent) ;)

that seems less than wonderful :)

@datadavev
Collaborator

It is messy, and further complicated by the opacity of what can go on behind the scenes when retrieving a remote context [1].

If the value of @context is a relative or absolute URL, the document retrieved from that URL becomes the context.

In this case:

{
  "@context": "http://shorturl.at/ciqMW",
  "title": "A remote context doc"
}

the contents of the document retrieved by following the rules for JSON-LD retrieval becomes the context. That URL resolves to the JSON-LD:

{
  "@context": {
    "@vocab":"http://a.b/c/"
  }
}

That JSON-LD is processed like:

{
  "@context": {
    "@vocab":"http://a.b/c/"
  },
  "title": "A remote context doc"
}

and so expands like:

[
  {
    "http://a.b/c/title": [
      {
        "@value": "A remote context doc"
      }
    ]
  }
]

On the other hand, if the value of @context is a map, then that map becomes the context. So for example:

{
  "@context": {
    "@vocab": "http://shorturl.at/ciqMW/"
  },
  "title": "A local context doc"
}

The context is exactly as written, and the document expands to:

[
  {
    "http://shorturl.at/ciqMW/title": [
      {
        "@value": "A local context doc"
      }
    ]
  }
]
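
The two cases above can be summarized in a short sketch. It assumes a hypothetical in-memory table, `REMOTE_DOCS`, standing in for HTTP retrieval (no network request is made), and the function `effective_context` is illustrative only. It covers string and map contexts, not arrays.

```python
# Stand-in for HTTP retrieval: maps a URL to the JSON-LD document found there.
# The entry mirrors the short URL example above; no network request is made.
REMOTE_DOCS = {
    "http://shorturl.at/ciqMW": {"@context": {"@vocab": "http://a.b/c/"}},
}


def effective_context(doc, loader=REMOTE_DOCS.get):
    """Return the context map that applies to doc (sketch: string or map only)."""
    context = doc["@context"]
    if isinstance(context, str):
        # A string is a reference: the context comes from the retrieved document.
        remote = loader(context)
        if remote is None:
            raise LookupError(f"cannot retrieve context from {context}")
        return remote["@context"]
    if isinstance(context, dict):
        # A map is used exactly as written; nothing is retrieved.
        return context
    raise TypeError("array contexts are not handled in this sketch")


remote_doc = {"@context": "http://shorturl.at/ciqMW", "title": "A remote context doc"}
local_doc = {"@context": {"@vocab": "http://shorturl.at/ciqMW/"}, "title": "A local context doc"}

print(effective_context(remote_doc)["@vocab"])  # http://a.b/c/
print(effective_context(local_doc)["@vocab"])   # http://shorturl.at/ciqMW/
```

The same URL string plays two different roles: as the value of @context it is dereferenced, while as the value of @vocab it is only ever used as a prefix.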

Footnotes

  1. https://www.w3.org/TR/json-ld11-api/#loaddocumentcallback, especially steps 4-5

@fils
Collaborator Author

fils commented Dec 15, 2021

@datadavev

Your post above really needs to go into the docs, and it's clearer than the JSON-LD docs IMHO. I do follow what you are saying, and based on that I think I have a bug report to write up for a JSON-LD lib I use. :)

mbjones added this to the v1.3 milestone Jan 27, 2022
@mbjones
Collaborator

mbjones commented Jan 27, 2022

Just to clarify all of this, I think our recommendations have shifted but we have not updated our documentation. Now that schema.org has clarified that the true namespace is http://schema.org/, but that https://schema.org/ can be used to retrieve a context file, I think this is what we are recommending:

  1. Best option for context
{
  "@context": {
    "@vocab": "http://schema.org/"
  },  
  "@type": "Dataset",
  "author": {
    "@type": "Person",
    "name": "Jane Goodall"
  }
}
  2. Acceptable for context
{
  "@context": "http://schema.org/",  
  "@type": "Dataset",
  "author": {
    "@type": "Person",
    "name": "Jane Goodall"
  }
}

OR

{
  "@context": "https://schema.org/",  
  "@type": "Dataset",
  "author": {
    "@type": "Person",
    "name": "Jane Goodall"
  }
}
  3. Incorrect / invalid as it produces the wrong namespace (https)
{
  "@context": {
    "@vocab": "https://schema.org/"
  },  
  "@type": "Dataset",
  "author": {
    "@type": "Person",
    "name": "Jane Goodall"
  }
}

If this is right, we need to update all docs, guidelines, examples, and shacl rules.

@mbjones
Collaborator

mbjones commented Jan 27, 2022

Started branch feature_151_context_namespace for fixing the namespace context consistency issues. More changes needed before we have a consistent set of guides.

@datadavev
Collaborator

(1) has the effect of setting the default vocabulary. (2) has the effect of including the context statements defined in the referenced context document.

Effectively (1) replaces the document https://schema.org/docs/jsonldcontext.jsonld with the document:

  "@context": {
    "@vocab": "http://schema.org/"
  }

Hence, the general recommendation would be (2).

@smrgeoinfo
Contributor

@mbjones in your recent post it says "schema.org has clarified that the true namespace is http://schema.org", but in the examples
'http://schema.org/' is used (with the trailing slash). I'm guessing the true namespace should be http://schema.org/?

@datadavev
Collaborator

For reference, the schema.org context document, and so namespace definition, is located here: https://schema.org/docs/jsonldcontext.jsonld

@smrgeoinfo
Contributor

the @vocab there is http://schema.org/, there's my answer. Thanks!

@mbjones
Collaborator

mbjones commented Jan 27, 2022

Thanks for the clarifications, and yes, I should have said http://schema.org/. I'll go fix that.

@mbjones
Collaborator

mbjones commented Jan 27, 2022

So, if the preference is for option 2, in our full example, how do we define the additional namespaces we need? Right now, on the branch I have the full.jsonld example as:

"@context": {
"@vocab": "http://schema.org/",
"prov": "http://www.w3.org/ns/prov#",
"provone": "http://purl.dataone.org/provone/2015/01/15/ontology#",
"spdx": "http://spdx.org/rdf/terms#"
}

Should the guidance be that we recommend option 2, except for when people need to define additional namespace prefixes?

@datadavev
Collaborator

datadavev commented Jan 27, 2022

"@context": [
    "https://schema.org/",
    {
        "prov": "http://www.w3.org/ns/prov#",
        "provone": "http://purl.dataone.org/provone/2015/01/15/ontology#",
        "spdx": "http://spdx.org/rdf/terms#"
    }
]

[edit: use https for schema.org retrieval]

@fils
Collaborator Author

fils commented Jan 27, 2022

So to be clear the schema.org FAQ at https://schema.org/docs/faq.html#19 is now wrong? Schema.org is saying to use http? Also the developer section at https://schema.org/docs/developers.html shows there are multiple context files for the various namespace approaches. Yet our recommendation is to stick with the old http pattern?

@datadavev
Collaborator

datadavev commented Jan 27, 2022

I think the FAQ is a bit misleading. The namespace is http://schema.org/, associated documents (such as the context) can be retrieved using http or https. The context document for schema.org defines the namespace and that is currently located at https://schema.org/docs/jsonldcontext.jsonld.

However, just to confuse things more, there are http and https variants of the vocabulary!

@fils
Collaborator Author

fils commented Jan 27, 2022

That's what I mean.. the multiple vocab elements. I understand all of this. and I appreciate that currently the https file call returns http namespaced file (which I don't agree with) :)

this just worries me... it's a kicking the can down the road event IMHO.

agree to disagree I guess

@datadavev
Collaborator

Adding to the confusion, some libraries (e.g. RDFLib) internally define constants for common namespaces, and RDFLib uses https://schema.org/ as the namespace. So I guess be prepared to be flexible.

@fils
Collaborator Author

fils commented Jan 27, 2022

The libraries are going to be a pain.. major pain..
Also, you can't content negotiate for the schema.org JSON-LD context anyway. Due to DOS issues they don't allow it so then libraries have to implement the resolution as a special case.

you can't curl negotiate at https://schema.org for the context.

@datadavev
Collaborator

Right, there's a different set of rules beyond simple content negotiation [1] for finding the context - need to look at the response link header:

link: </docs/jsonldcontext.jsonld>; rel="alternate"; type="application/ld+json"

This is also something that is poorly implemented in the major libs (pyld and rdflib at least). I use a patched version of pyld to get around this issue and honor the json-ld processing rules in the spec.

Footnotes

  1. https://www.w3.org/TR/json-ld11-api/#remote-document-and-context-retrieval

@fils
Collaborator Author

fils commented Jan 27, 2022

right..

curl -v https://schema.org
...
link: </docs/jsonldcontext.jsonld>; rel="alternate"; type="application/ld+json"
...

and I get it.. (literal and figurative) ;)

As you point out though the issues with the python libraries (same as in the Go libraries by the way)..
This is an implementation mess... my point though is that the trend in general will be toward https not away and since both namespace uses are accepted by schema.org (unless that policy is now changed?) we are tossing out future LOD patterns if we go http since the data web will be https, it has to be.

I'm not trying to change any minds. It sounds like it is already a done deal. I just have to resolve how to connect the other groups I work with, who are https focused now, with SOS, which will be http focused.

@mbjones
Collaborator

mbjones commented Jan 27, 2022

I don't think it's a done deal if @fils and @datadavev aren't on board -- you two have more practical experience with this than anyone I know. I am just trying to clean up our recs and be consistent. And I don't have a strong opinion myself -- I agree the future is https, but thought SO had decided to stick with http in their context doc. If there is a straightforward way for us to recommend https where most libs and the shacl processor, etc would recognize the terms as SO properly, then that has advantages. But given that https://schema.org/ returns a JSON-LD context with the http namespace, it seems like they are still using http. Please, propose what you think we should do, and how providers and consumers should handle it.

@fils
Collaborator Author

fils commented Jan 27, 2022

You are correct there.. their default is to return the http namespace even though they are rather indecisive elsewhere in their documentation. The result of that unfortunately is they seed confusion and delay (cue Thomas the Tank Engine) in the library developers and elsewhere. :)

@datadavev
Collaborator

datadavev commented Jan 28, 2022

Science-on-schema.org is about recommendations for application of schema.org to this domain, and so my impression is this group should not be overriding the specification. Hence, the recommendation here should be to use the namespace as published, which would be http://schema.org/. Options for specifying the context then include:

  1. {
      "@context":"https://schema.org/"
    }
  2. {
      "@context":"https://schema.org/docs/jsonldcontext.jsonld"
    }
  3. {
      [
        "@context":"https://schema.org/",
        {
          "prov": "http://www.w3.org/ns/prov#",
          "provone": "http://purl.dataone.org/provone/2015/01/15/ontology#",
          "spdx": "http://spdx.org/rdf/terms#"
        }
      ]
    }
  4. {
      "@context": {
          "@vocab": "http://schema.org/"
      }
    }

Where:

  1. Remote context reference (note that http or https may be specified here)
  2. Functionally equivalent to (1). The JSON-LD processor should resolve to this document from (1) if it implements the specification for following link headers [1].
  3. Remote context reference for schema.org and including other namespaces. Note that other remote contexts may also be specified in the list.
  4. Ignores the remote schema.org context, but makes http://schema.org/ the default namespace for the document.

Implementors should be aware that this may change in the future (i.e. "http" -> "https") and that existing implementations may internally use "https://schema.org/" as the namespace (e.g. RDFLib). Hence consumers should probably be applying namespace normalization to schema.org content to ensure consistent interpretation in an RDF processing environment.

Footnotes

  1. https://www.w3.org/TR/json-ld11-api/#remote-document-and-context-retrieval

@smrgeoinfo
Contributor

+1 on Recommending namespace normalization. Dealing with the two namespaces has been an ongoing challenge with metadata integration in EarthCube GeoCodes, requiring messy SPARQL queries.
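
As an illustration of what such normalization might look like on the consumer side, here is a minimal sketch that rewrites schema.org IRIs in expanded JSON-LD to a single namespace. The function name `normalize_schema_org` is hypothetical, the choice of http as the default target follows the recommendation above, and the sketch assumes the input is already in expanded form. It is not taken from any of the libraries mentioned in this thread.

```python
HTTPS_NS = "https://schema.org/"
HTTP_NS = "http://schema.org/"


def normalize_schema_org(node, target=HTTP_NS):
    """Recursively rewrite schema.org IRIs in expanded JSON-LD to one namespace."""
    source = HTTPS_NS if target == HTTP_NS else HTTP_NS

    def fix(text):
        return target + text[len(source):] if text.startswith(source) else text

    if isinstance(node, list):
        return [normalize_schema_org(item, target) for item in node]
    if isinstance(node, dict):
        result = {}
        for key, value in node.items():
            if key == "@value":
                # Literal values are data, not identifiers: leave them alone.
                result[key] = value
            elif key in ("@id", "@type"):
                if isinstance(value, str):
                    result[key] = fix(value)
                else:
                    result[key] = [fix(v) for v in value]
            else:
                result[fix(key)] = normalize_schema_org(value, target)
        return result
    return node


expanded = [{
    "@type": ["https://schema.org/Dataset"],
    "https://schema.org/name": [{"@value": "test"}],
}]
print(normalize_schema_org(expanded))
```

Normalizing once at ingest keeps downstream queries from having to match both namespace variants.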

mbjones added a commit that referenced this issue Jan 29, 2022
@mbjones
Collaborator

mbjones commented Jan 29, 2022

OK, summarizing... going with Dave's examples, I'll write up a plan to recommend using the http namespace definition (as SO uses by default) by retrieving the context file from the https location, noting that it's also possible to retrieve it from the http location, and that the @vocab default can be used with http as well. We don't recommend using @vocab with the https URL, but harvesters and processors should in general normalize and treat https versions of the terms as equivalent to the http terms for SO. Finally, if one needs to include multiple namespaces, that can be done by building a context map from the retrieved context file plus additional namespace definitions. In my testing, I think the syntax in Dave's examples was a little turned around, so I think we should be using:

{
  "@context": [
    "https://schema.org/",
    {
      "prov": "http://www.w3.org/ns/prov#",
      "provone": "http://purl.dataone.org/provone/2015/01/15/ontology#",
      "spdx": "http://spdx.org/rdf/terms#"
    }
  ],
  "@type": "Dataset",
  "name": "Test data",
  "prov:wasDerivedFrom": {
    "@id": "https://doi.org/10.xxxx/Dataset-1"
  }
}
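
A rough sketch of how the array form combines the two pieces: each entry is applied in order, and the prefixes from the local map are added on top of whatever the remote context defines. The names `merge_contexts` and `expand_term` are hypothetical, and `SCHEMA_ORG_CONTEXT` is a stand-in reduced to the single @vocab entry quoted earlier in this thread rather than the full retrieved document.

```python
# Stand-in for the retrieved schema.org context, reduced to the one entry
# quoted earlier in this thread. A real processor would fetch the full document.
SCHEMA_ORG_CONTEXT = {"@vocab": "http://schema.org/"}


def merge_contexts(context_list, remote=None):
    """Combine an array context into one map; later entries override earlier ones."""
    remote = remote or {}
    merged = {}
    for entry in context_list:
        merged.update(remote[entry] if isinstance(entry, str) else entry)
    return merged


def expand_term(term, context):
    """Expand a compact IRI via its prefix, or a plain term via @vocab."""
    prefix, sep, suffix = term.partition(":")
    if sep and prefix in context:
        return context[prefix] + suffix
    if sep:
        # Looks like an absolute IRI already (sketch-level check only).
        return term
    return context["@vocab"] + term


context = merge_contexts(
    ["https://schema.org/", {"prov": "http://www.w3.org/ns/prov#"}],
    remote={"https://schema.org/": SCHEMA_ORG_CONTEXT},
)
print(expand_term("prov:wasDerivedFrom", context))  # http://www.w3.org/ns/prov#wasDerivedFrom
print(expand_term("Dataset", context))              # http://schema.org/Dataset
```

Note that the context is retrieved from the https location while the terms still expand into the http namespace, which matches the plan summarized above.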

@mbjones
Collaborator

mbjones commented Jan 29, 2022

Work on branch feature_151_context_namespace:

  • Update all example files in examples
  • Write context summary for GETTING_STARTED
  • Update all context statements in examples in guidelines md files
  • Update decision 52 document

mbjones added a commit that referenced this issue Feb 1, 2022
Feature 151 context namespace. 

Note that, while this PR is being merged to develop now to keep things clean because of the extent of changes, comments are still welcome on issue #151 . If no changes are deemed necessary, this PR will be released with v 1.3.
@mbjones
Collaborator

mbjones commented Feb 1, 2022

Checked that shapes all validate with the namespace changes on our example files, and merged PR #199. This issue will remain open for commentary for a bit longer, but the planned changes are now merged into develop.

@mbjones
Collaborator

mbjones commented Feb 7, 2022

Reviewed at meeting on 7 Feb 2022 -- agreed it was complete, but reopen this issue if discrepancies are found.

mbjones closed this as completed Feb 7, 2022