Google Structured Data Testing Tool doesn't like mixing Schema.org and Dublin Core in our RDFa #1074

seth-shaw-unlv · 2019-03-29T15:29:55Z

Related to our SEO issue #882:

This is probably a documentation issue, but the Google Structured Data Tool doesn't like mixing Dublin Core and Schema.org terms.

Declaring something with a Schema.org Type (e.g. schema:ImageObject) and adding Dublin Core elements to it throws errors because those properties are not in that type's scope.

The reverse is also true: if you add Schema.org properties (e.g. schema:sameAs) to something that doesn't have a Schema.org type, it will also complain.
It doesn't complain that the property is there, just that the PCDM type is not known to Google.

seth-shaw-unlv · 2019-03-29T15:37:45Z

I should have also noted that our default Repository Item works fine because the only fields using schema are the created/modified dates, which don't appear to show up in the RDFa.

Also should have included an example of straight-up Dublin Core working fine:

Although, how much SEO benefit that actually gives us on those fields is unclear to me.

Also, some of the Dublin Core fields, such as subject, don't appear to work well:

You can see the subject values on the left clearly (Costume design, Dancers, etc.) but the structured data tester says these subjects are of an "Unspecified Type". This may be an RDFa issue.

seth-shaw-unlv · 2019-03-29T15:43:18Z

That last bit with the subjects is my fault. That comes from the subject's href linking a resource the tester can't access. It actually works fine.

dannylamb · 2019-04-02T18:27:21Z

@seth-shaw-unlv If we give a resource both a schema and a dc type, is it ok then? Making everything a schema:Article by default (in addition to pcdm) seems appropriate. Not sure about dc (I'm assuming there's a generic thing from the DCMI types we can use or something).

In theory that should appease our semantic robot overlords.

seth-shaw-unlv · 2019-04-02T19:35:00Z

@dannylamb Nope. Still grumpy. I added dcterms:BibliographicResource to the RDFa and Google was still mad, because the dcterm is being applied to a schema Type. It looks like Google doesn't like other vocabularies being used near schema things.

It looks like it's either all schema for a node or none at all, as far as Google is concerned.
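A rough sketch of that "all schema or none" behaviour, in case it helps when checking nodes before they reach the tester. This only models what was observed in this thread; it is not Google's actual validation logic, and `check_node` and its warning strings are made-up names for illustration.

```python
# Namespaces treated as "schema.org" for this check.
SCHEMA_NS = ("http://schema.org/", "https://schema.org/")


def is_schema_term(iri: str) -> bool:
    """True if the IRI lives in the schema.org namespace."""
    return iri.startswith(SCHEMA_NS)


def check_node(types, properties):
    """Return warnings for a node that mixes schema.org with other vocabularies.

    `types` and `properties` are lists of full IRIs (hypothetical input shape).
    """
    has_schema_type = any(is_schema_term(t) for t in types)
    schema_props = [p for p in properties if is_schema_term(p)]
    other_props = [p for p in properties if not is_schema_term(p)]

    warnings = []
    if has_schema_type:
        # Observed case 1: Dublin Core (or other) properties on a schema.org type.
        warnings += [
            f"property outside schema.org on a schema.org type: {p}"
            for p in other_props
        ]
    elif schema_props:
        # Observed case 2: schema.org properties on a node with no schema.org type.
        warnings.append("schema.org properties used on a node with no schema.org type")
    return warnings
```

For example, `check_node(["http://schema.org/ImageObject"], ["http://purl.org/dc/terms/title"])` flags the Dublin Core title, while a PCDM-typed node carrying only Dublin Core properties returns no warnings.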

In semi-related news, the Schema.org architypes proposal was accepted and added to schema.org! This makes schema-only descriptions a bit easier to do. BTW, has anyone thought to do a Dublin Core -> schema.org comparison/map? Could a repository conceivably abandon Dublin Core for pure schema.org without (much) loss?

DiegoPino · 2019-04-02T19:48:20Z

@seth-shaw-unlv, my 2 cents: this is one of the reasons why mixing and matching properties from different ontologies is not such a good idea unless you make sure a property is valid in the other ontology's class definition/domain. It's a bit like the MODS to RDF mapping work that happened in that great working group: it works for internal use, but it is not semantically correct for exposing the data to the outside (and by saying that I now deserve to be hated).

Google tries to apply its ontology validation correctly, and in that validation, if an object is of type schema:Thing, only properties in that domain are valid. Google cannot do ontology intersection, alignment, or inference, so specifically in RDFa it will try to match every property given against all classes. A better way of getting away with this is to avoid other ontologies in the RDFa (stick with schema) and embed JSON-LD as a script in the body. That is what Zenodo and DataCite are doing with great success. In that case your JSON-LD can have many contexts and Google will not complain (namespaces will also match, because the expansion will only apply to the right RDF (or OWL) class). Still, it's good to check whether a certain group of properties can freely be moved between ontologies; I highly recommend not doing that without validating.
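To make the "embed a JSON-LD as script in the body" idea concrete, here is a minimal sketch of rendering a schema.org-only description into a script element. The function name, the example values, and the example.org URL are all hypothetical; this is not how Zenodo, DataCite, or Islandora actually generate theirs.

```python
import json


def jsonld_script_tag(data: dict) -> str:
    """Serialize a dict as JSON-LD inside a <script> element."""
    payload = json.dumps(data, indent=2, ensure_ascii=False)
    # Escape "</" so a string value cannot close the script element early;
    # "\/" is a valid JSON escape for "/".
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


# Hypothetical schema.org-only description of a repository item.
example = {
    "@context": "https://schema.org",
    "@type": "ImageObject",
    "name": "Costume design sketch",
    "sameAs": "https://example.org/node/1",
}

print(jsonld_script_tag(example))
```

The resulting string would go somewhere in the page markup, leaving the RDFa and whatever is sent to Fedora and the triple store untouched.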

seth-shaw-unlv · 2019-04-02T21:05:39Z

@DiegoPino I'm not seeing any examples that would allow us to use multiple ontologies in the JSON-LD without Google freaking out. The multiple contexts seem to mostly be used as namespace definitions (multiple mappings of predicates to field names) but the resulting set of edges still results in a mixing of ontologies. The DataCite examples I found (https://blog.datacite.org/schema-org-register-dois/) of JSON-LD only use schema.org.

Having one set in the JSON-LD script tag and another in the RDFa doesn't work because Google appears to ignore the RDFa when it finds JSON-LD.

So, really, it looks like anything we want to hand off to Google needs ontological consistency, but we can index whatever we want in our Fedora and triple-store. This implies to me that we need to keep the JSON-LD just for indexing and have some way to either filter what gets pushed into the RDFa vs. the JSON-LD OR keep separate configs for each.

DiegoPino · 2019-04-02T22:19:10Z

Hi, I will share some examples with you tomorrow (on the phone now). Google can handle some other stuff if it is inside JSON-LD. Contexts can in fact contain many ontologies (that's what namespaces are for, among other things); e.g. the IIIF Presentation context uses quite a few. But also, you just answered your own issue :). Since you have basically no control in your Islandora 8 architecture to remove some predicates from the RDFa without affecting every other mapping you have in Drupal to talk to Fedora, etc., by embedding a simpler JSON-LD (and by that I mean schema.org only, which seems the lowest barrier) you ensure Google is happy and you can keep your full-blown mix and match for your RDFa and triple-store needs. Seems like a win-win situation. Now you just need to embed it.

-- Diego Pino Navarro Digital Repositories Developer Metropolitan New York Library Council (METRO)

dannylamb · 2019-04-03T14:37:10Z

@seth-shaw-unlv @DiegoPino https://www.drupal.org/project/schema_metatag does just that. We can set up how we want stuff for Google and that gets embedded as JSON-LD. At that point there is a discrepancy between the RDFa and the embedded JSON-LD and what goes in Fedora/Triplestore, but I guess Google's behaviour works in our favor there with respect to RDFa vs. embedded JSON-LD. And really, we have no choice but to separate what Google wants from how users choose to model their data.

seth-shaw-unlv · 2019-04-11T17:42:06Z

Based on the devel call this week, this issue will likely wait until someone has an Islandora 8 site live and indexed by Google/Bing so we can test the real-world impact of multiple ontologies.

If it truly is a problem, then we can probably have a module pull in the JSON-LD and do a simple filter or map so only schema.org appears in the page's script element and trim the "_format=jsonld" off the URIs.
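A minimal sketch of what that filter could look like, assuming each node arrives as expanded JSON-LD (full IRIs as keys, values as lists of `@value`/`@id` objects). That input shape, the function names, and the localhost URIs in the usage note are assumptions for illustration, not taken from an actual Islandora response.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

SCHEMA_NS = ("http://schema.org/", "https://schema.org/")


def strip_format_param(uri: str) -> str:
    """Drop a `_format=jsonld` query parameter, keeping any other parameters."""
    parts = urlsplit(uri)
    query = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if not (k == "_format" and v == "jsonld")
    ]
    return urlunsplit(parts._replace(query=urlencode(query)))


def _clean_value(value):
    """Trim the format parameter from `@id` references; leave literals alone."""
    if isinstance(value, dict) and "@id" in value:
        return {**value, "@id": strip_format_param(value["@id"])}
    return value


def schema_only(node: dict) -> dict:
    """Keep only schema.org types and properties of one expanded JSON-LD node."""
    out = {}
    for key, value in node.items():
        if key == "@id":
            out[key] = strip_format_param(value)
        elif key == "@type":
            types = value if isinstance(value, list) else [value]
            kept = [t for t in types if t.startswith(SCHEMA_NS)]
            if kept:
                out[key] = kept
        elif key.startswith(SCHEMA_NS):
            values = value if isinstance(value, list) else [value]
            out[key] = [_clean_value(v) for v in values]
    return out
```

Given a node with `@id` `http://localhost:8000/node/1?_format=jsonld`, a PCDM type plus `http://schema.org/ImageObject`, and a mix of Dublin Core and schema.org properties, `schema_only` keeps only the schema.org type and properties and rewrites the `@id` to `http://localhost:8000/node/1`. A map from Dublin Core to schema.org terms could be layered on top, but that crosswalk is not sketched here.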

seth-shaw-unlv mentioned this issue Mar 29, 2019

Meta-Issue: SEO #882

Open

whikloj added this to the 1.x milestone Apr 11, 2019

kstapelfeldt added Linked Data labels Sep 9, 2021

kstapelfeldt added Subject: Linked Data and Subject: Metadata labels and removed Linked Data labels Sep 25, 2021

kstapelfeldt added this to Islandora Issues Queue Feb 8, 2022

kstapelfeldt moved this to Todo in Islandora Issues Queue Feb 8, 2022
