Google Structured Data Testing Tool doesn't like mixing Schema.org and Dublin Core in our RDFa #1074

Open
seth-shaw-unlv opened this issue Mar 29, 2019 · 9 comments
Labels
Subject: Linked Data related to linked data. Consider also using metadata or modelling tags. Subject: Metadata related to metadata issues. Consider also using the search tag.
Milestone

Comments

@seth-shaw-unlv
Contributor

Related to our SEO issue #882:

This is probably a documentation issue, but the Google Structured Data Testing Tool doesn't like mixing Dublin Core and Schema.org terms.

Declaring something with a Schema.org type (e.g. schema:ImageObject) and adding Dublin Core elements to it will throw errors because those properties are not in that type's scope. E.g.
[Screenshot: the structured data tester showing the existence of errors.]

The reverse is also true: if you add Schema.org properties (e.g. schema:sameAs) to something that doesn't have a Schema.org type, it will also complain:
[Screenshot: the structured data tester throwing an error on the type because a schema property was used. It doesn't complain that the property is there, just that the PCDM type is not known to Google.]
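
To make the two failure cases above concrete, here is a minimal Python sketch of a pre-flight check. It only approximates the behaviour observed in the tester and is not Google's actual validation logic; the function name and warning labels are hypothetical, and it assumes types and properties are given as full IRIs.

```python
# Namespaces treated as "Schema.org" (both http and https forms are in use).
SCHEMA_NS = ("http://schema.org/", "https://schema.org/")


def is_schema(iri):
    """True if the IRI lives in the Schema.org namespace."""
    return iri.startswith(SCHEMA_NS)


def vocabulary_conflicts(types, properties):
    """Flag the two mixes described above for a single node.

    Hypothetical helper: returns a list of (label, offending_iris) tuples.
    """
    warnings = []
    schema_types = [t for t in types if is_schema(t)]
    schema_props = [p for p in properties if is_schema(p)]
    other_props = [p for p in properties if not is_schema(p)]

    # Case 1: a Schema.org type carrying properties from another vocabulary.
    if schema_types and other_props:
        warnings.append(("non_schema_property_on_schema_type", other_props))

    # Case 2: Schema.org properties on a node with no Schema.org type.
    if schema_props and not schema_types:
        warnings.append(("schema_property_without_schema_type", schema_props))

    return warnings
```

A node typed only as `http://pcdm.org/models#Object` with only Dublin Core properties produces no warnings under this check.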

@seth-shaw-unlv
Contributor Author

I should have also noted that our default Repository Item works fine because the only fields using schema are the created/modified dates which don't appear to show up in the RDFa.

I also should have included an example of straight-up Dublin Core working fine:
[Screenshot: Screen Shot 2019-03-29 at 8 31 10 AM]
Although, how much SEO benefit that actually gives us on those fields is unclear to me.

Also, some of the Dublin Core fields, such as subject, don't appear to work well:
[Screenshot: Screen Shot 2019-03-29 at 8 35 54 AM]
You can see the subject values on the left clearly (Costume design, Dancers, etc.) but the structured data tester says these subjects are of an "Unspecified Type". This may be an RDFa issue.

@seth-shaw-unlv
Contributor Author

That last bit with the subjects is my fault. It comes from the subject's href linking to a resource the tester can't access. It actually works fine.

@dannylamb
Contributor

dannylamb commented Apr 2, 2019

@seth-shaw-unlv If we give a resource both a schema type and a dc type, is it then ok? Making everything a schema:Article by default (in addition to pcdm) seems appropriate. Not sure about dc (I'm assuming there's a generic thing from the DCMI types we can use or something).

In theory that should appease our semantic robot overlords.

@seth-shaw-unlv
Contributor Author

@dannylamb Nope. Still grumpy. I added dcterms:BibliographicResource to the RDFa and Google was still mad, because a dcterms term is being applied to a schema type. It looks like Google doesn't like other vocabularies being used near schema things.

[Screenshot: Screen Shot 2019-04-02 at 12 26 14 PM]

As far as Google is concerned, it looks like a node has to be either all schema or none at all.

In semi-related news, the Schema.org architypes proposal was accepted and added to schema.org! This makes schema-only descriptions a bit easier to do. BTW, has anyone thought to do a Dublin Core -> schema.org comparison/map? Could a repository conceivably abandon Dublin Core for pure schema.org without (much) loss?
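
As a starting point for the comparison asked about above, here is a small Python sketch of a Dublin Core to Schema.org crosswalk. The pairs in the table are illustrative guesses at near-equivalents, not an authoritative or complete mapping, and the function name is hypothetical. Returning the unmapped properties separately gives a rough measure of what a schema-only description would lose.

```python
DCT = "http://purl.org/dc/terms/"
SDO = "http://schema.org/"

# Illustrative pairs only (an assumption, not a vetted crosswalk).
DC_TO_SCHEMA = {
    DCT + "title": SDO + "name",
    DCT + "creator": SDO + "creator",
    DCT + "contributor": SDO + "contributor",
    DCT + "description": SDO + "description",
    DCT + "subject": SDO + "about",
    DCT + "publisher": SDO + "publisher",
    DCT + "language": SDO + "inLanguage",
    DCT + "created": SDO + "dateCreated",
    DCT + "modified": SDO + "dateModified",
}


def crosswalk(properties):
    """Split {property IRI: [values]} into (mapped, unmapped) dicts.

    `mapped` is keyed by Schema.org IRIs; `unmapped` keeps the original
    keys for anything the table above does not cover.
    """
    mapped, unmapped = {}, {}
    for prop, values in properties.items():
        target = DC_TO_SCHEMA.get(prop)
        if target is None:
            unmapped[prop] = list(values)
        else:
            mapped.setdefault(target, []).extend(values)
    return mapped, unmapped
```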

@DiegoPino
Contributor

@seth-shaw-unlv, 2 cents here: this is one of the reasons why mixing and matching ontologies and properties from different ones is not such a good idea unless you make sure a property is valid in the other's class definition/domain/ontology. It's a bit like the work on MODS to RDF mapping that happened in that great working group: it works for internal use, but is not semantically correct for exposing the data to the outside (and by saying that now I deserve to be hated).

Google tries to apply its ontology validation correctly and, in that validation, if an object is of type schema:Thing, only properties in that domain are valid. Google cannot do ontology intersection, alignment, or inference, so in RDFa specifically it will try to match any given property against all classes. A better way of getting away with this is to avoid other ontologies in the RDFa (stick with schema) but embed JSON-LD as a script in the body. That is what Zenodo and DataCite are doing with great success. In that case your JSON-LD can have many contexts and Google will not complain (namespaces will match too, because the expansion will only apply to the right RDF (or OWL) class). Still, it's good to check whether a certain group of properties can freely be moved between ontologies; I highly recommend not doing that without validating.
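
For readers unfamiliar with the embedding approach, a minimal Python sketch of producing such a script element follows. The function name, the example IRI, and the example values are hypothetical, and the example sticks to the Schema.org context only; it does not show whether Google tolerates additional contexts.

```python
import json


def jsonld_script_element(data):
    """Serialize a dict as a JSON-LD <script> element for the page body."""
    payload = json.dumps(data, indent=2, ensure_ascii=False)
    # Escape "</" so a string value cannot close the script element early;
    # "\/" is a valid JSON escape for "/".
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


# Hypothetical description using only Schema.org terms.
example = {
    "@context": "https://schema.org",
    "@type": "ImageObject",
    "@id": "https://example.org/node/1",
    "name": "Costume design sketch",
    "keywords": ["Costume design", "Dancers"],
}

if __name__ == "__main__":
    print(jsonld_script_element(example))
```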

@seth-shaw-unlv
Contributor Author

@DiegoPino I'm not seeing any examples that would allow us to use multiple ontologies in the JSON-LD without Google still freaking out. The multiple contexts seem to mostly be used as namespace definitions (multiple mappings of predicates to field names), but the resulting set of edges still mixes ontologies. The DataCite JSON-LD examples I found only use schema.org.

Having one set in the JSON-LD script tag and another in the RDFa doesn't work because Google appears to ignore the RDFa when it finds JSON-LD.

So, really, it looks like anything we want to hand off to Google needs ontological consistency, but we can index whatever we want in our Fedora and triple-store. This implies to me that we need to keep the JSON-LD just for indexing and have some way to either filter what gets pushed into the RDFa vs. JSON-LD or keep separate configs for each.

@DiegoPino
Contributor

DiegoPino commented Apr 2, 2019 via email

@dannylamb
Contributor

@seth-shaw-unlv @DiegoPino https://www.drupal.org/project/schema_metatag does just that. We can set up how we want stuff for Google, and that gets embedded as JSON-LD. At that point there is a discrepancy between the RDFa and the embedded JSON-LD and what goes in Fedora/triplestore, but I guess Google's behaviour works in our favor there w/r/t RDFa vs. embedded JSON-LD. And really, we have no choice but to separate what Google wants from how users choose to model their data.

@seth-shaw-unlv
Contributor Author

Based on the devel call this week, this issue will likely wait until someone has an Islandora 8 site live and indexed by Google/Bing so we can test the real-world impact of multiple ontologies.

If it truly is a problem, then we can probably have a module pull in the JSON-LD and do a simple filter or map so only schema.org appears in the page's script element and trim the "_format=jsonld" off the URIs.
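
A rough Python sketch of what such a filter could look like is below. It assumes the JSON-LD is in expanded form (full IRIs as keys, values as lists of objects); all function names are hypothetical and this is not an existing Islandora module.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

SCHEMA_PREFIXES = ("http://schema.org/", "https://schema.org/")


def strip_format_param(uri):
    """Drop the `_format=jsonld` query parameter, keeping any others."""
    if "_format=" not in uri:
        return uri
    parts = urlsplit(uri)
    query = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if (k, v) != ("_format", "jsonld")
    ]
    return urlunsplit(parts._replace(query=urlencode(query)))


def schema_only(node):
    """Keep only Schema.org types and properties of one expanded node."""
    out = {}
    for key, value in node.items():
        if key == "@id":
            out[key] = strip_format_param(value)
        elif key == "@type":
            kept = [t for t in value if t.startswith(SCHEMA_PREFIXES)]
            if kept:
                out[key] = kept
        elif key.startswith(SCHEMA_PREFIXES):
            # Clean up references to other nodes as well as literal values.
            out[key] = [
                {**v, "@id": strip_format_param(v["@id"])} if "@id" in v else v
                for v in value
            ]
    return out


def schema_only_graph(graph):
    """Apply the filter to every node in a list of expanded nodes."""
    return [schema_only(node) for node in graph]
```

The filtered graph could then be serialized into the page's script element; nodes left without a Schema.org type would still need handling before that step.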

@whikloj whikloj added this to the 1.x milestone Apr 11, 2019
@kstapelfeldt kstapelfeldt added the Subject: Linked Data and Subject: Metadata labels and removed the Linked Data label Sep 25, 2021
5 participants