specific recommendations for metadata distribution info #4
Comments
Good topic! It doesn't seem like schema.org has a defined way of describing related resources. It looks like this is being discussed by the W3C Data Exchange Working Group w3c/dxwg#482 where ORE got mentioned w3c/dxwg#482 (comment) |
dxwg#482 is about distributions in which the files in a package are distinct but all are necessary. A pointer to richer metadata is different. Perhaps https://www.w3.org/TR/vocab-dcat-2/#Property:resource_relation with subrelation dct:ReferencedBy? |
Note that @smrgeoinfo incorporated such a link in IEDA JSON-LD as an entry in the
{
  "@type": "DataDownload",
  "additionalType": "http://www.w3.org/ns/dcat#DataCatalog",
  "encodingFormat": "text/xml",
  "name": "ISO Metadata Document",
  "url": "http://get.iedadata.org/metadata/iso/usap/609070iso.xml"
}
That provides the link, but there is no way to semantically distinguish it from any of the other data files in the Dataset. The |
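To make the problem concrete, here is a minimal Python sketch of what a client would have to do with a record like the one above. It assumes the entry sits in a `distribution` list (the property name is not visible in the comment), and the CSV entry, dataset name, and the `guess_metadata_links` helper are hypothetical. The point is that the client can only guess, using the `additionalType` hint or the human-readable name.

```python
import json

# Hypothetical Dataset record. The property name "distribution" is an
# assumption; the second DataDownload entry is copied from the comment above.
dataset = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example dataset (hypothetical)",
    "distribution": [
        {
            "@type": "DataDownload",
            "encodingFormat": "text/csv",
            "name": "Data file (hypothetical)",
            "url": "https://example.org/data/609070.csv",
        },
        {
            "@type": "DataDownload",
            "additionalType": "http://www.w3.org/ns/dcat#DataCatalog",
            "encodingFormat": "text/xml",
            "name": "ISO Metadata Document",
            "url": "http://get.iedadata.org/metadata/iso/usap/609070iso.xml",
        },
    ],
}


def guess_metadata_links(record):
    """Return distribution URLs that look like metadata, by heuristic only."""
    hits = []
    for entry in record.get("distribution", []):
        looks_like_metadata = (
            entry.get("additionalType") == "http://www.w3.org/ns/dcat#DataCatalog"
            or "metadata" in entry.get("name", "").lower()
        )
        if looks_like_metadata:
            hits.append(entry["url"])
    return hits


metadata_urls = guess_metadata_links(dataset)
print(json.dumps(metadata_urls, indent=2))
```

Both checks are conventions rather than semantics: a provider that names the file differently, or omits `additionalType`, would be missed.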
One approach might be to use
[edited] |
@ashepherd That's interesting. Wouldn't the list of |
The DCAT revision group seems to be of a mind that distributions are different representations of the described resource content (some level of 'information equivalence', see w3c/dxwg#531). The metadata is a description of the resource, not a representation of its actual information content. I think what is really needed is a related resource pattern that provides a qualified association, something like xlink:href, xlink:role or the qualifiedAssociations in PROV. Schema.org Role might do the trick. If one took this pattern to heart, it could be used for distributions as well, where distribution is just one of the roles for a related resource. p.s. I agree dcat:DataCatalog is an odd property to put on a link to a metadata record describing the resource, but I'm pretty sure I got the recommendation to use that from DataOne (DV?). |
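A rough Python sketch of how the schema.org Role pattern might carry a qualified association. This is an illustration only: hanging the Role off `subjectOf`, the `roleName` value, the URL, and the `related_resources` helper are all assumptions, not something agreed in this thread. In the Role pattern the property is repeated inside the Role node, so a client has to unwrap one level.

```python
# Hypothetical record: the link to the metadata document is wrapped in a
# schema.org Role so that the relationship itself can carry a label.
dataset = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example dataset (hypothetical)",
    "subjectOf": {
        "@type": "Role",
        "roleName": "metadata record",  # hypothetical label
        "subjectOf": {
            "@type": "CreativeWork",
            "url": "https://example.org/link/to/iso.xml",
            "encodingFormat": "application/xml",
        },
    },
}


def related_resources(record, prop):
    """Return (role name, target node) pairs for one property.

    Plain values get a role name of None; Role-wrapped values are unwrapped.
    """
    values = record.get(prop, [])
    if isinstance(values, dict):
        values = [values]
    pairs = []
    for value in values:
        if value.get("@type") == "Role":
            pairs.append((value.get("roleName"), value.get(prop)))
        else:
            pairs.append((None, value))
    return pairs


pairs = related_resources(dataset, "subjectOf")
for role_name, target in pairs:
    print(role_name, target["url"])
```

The same helper would work for `distribution` if distributions were treated as just one role among several, as suggested above.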
@smrgeoinfo do you get the sense the data packaging is out of scope of what DCAT group is thinking about? @mbjones I was thinking those MediaObject could be in the same JSON-LD document like IEDA has above (but I guess they could live elsewhere too). Maybe my approach isn't the best if data packaging isn't aligned with the meaning of distributions like @smrgeoinfo mentions. |
It's hard to say, but it seems that they're taking a pretty narrow interpretation of distribution. I think a broader concept of 'related resource links' that includes properties specifying what the links are about would be a better long-term solution, so it could be used for landing pages, ftp directories, data packages, services for visualization and subsetting, and applicable specifications. In the ISO 19115 world, the distributionInfo does get used to link to a broad array of resources related to using the metadata subject resource. The dct:conformsTo property could be used to identify a specification that describes the link function/role, and DCAT has this property on datasets and distributions. The boundary between distributions and related resources is pretty fuzzy-- the DCAT profiles ontology group has stepped into that one big time with their ResourceDescriptor class-- basically a set of related resources about a profile (but not distributions!) https://github.com/w3c/dxwg/issues/573 |
Will review in EarthCube P419. @fils mentioned the Digital Object model. I'm curious whether anyone else has heard of it, and whether there are any big differences between it and OAI-ORE. |
I'm not really up to speed on recent Digital Object architecture, but from a quick look at Larry L's paper and C2CAMP, it looks like the basic concepts are overlapping-- package data with metadata. What I don't see is any specification of precisely what the metadata for both file and data typing would look like, and the exercise is academic until that exists. |
At least for metadata describing the same dataset as
The
Note that the The premise of these suggestions is that |
I like Dave's proposal for the use of encoding on
@datadavev, have you seen an example where the specific distribution has another MediaObject? I just want to make sure we are clear on the distinction. |
@ashepherd, I totally agree, my mistake there. Adding a separate There are a couple of situations where additional |
@datadavev, those are good examples! If you feel up to it, could you open another pull request that adds them as an example just above here: https://github.com/ESIPFed/science-on-schema.org/blob/master/guides/Dataset.md#accessing-data-through-a-service-endpoint and below the basic example of a DataDownload? @smrgeoinfo, does this help address your thoughts about MIME Type (application/xml) and format (ISO 19115-2)? Should we consider making
{
  "@type":"Dataset",
  ...
  "encoding":{
    "@type": "MediaObject",
    "contentUrl":"https://example.org/link/to/iso.xml",
    "encodingFormat": ["application/xml", "http://www.isotc211.org/2005/gmd"],
    "description":"ISO TC211 XML rendering of metadata.",
    "dateModified":"2019-06-12T14:44:15Z"
  },
  ...
} |
I really don't think using sdo:encoding to link to another metadata record about the same resource is consistent with the intention of sdo:encoding. The closest match I can see is using sdo:subjectOf (scope note "A CreativeWork or Event about this Thing"). The ISO metadata record is a separate CreativeWork.Dataset that is about the same thing that the sdo:Dataset record describes. Encoding might look like this:
Of course this would be even better if there were a subtype of 'Dataset' for data that is about other datasets, i.e. metadata; lacking that, we have to rely on the encoding format including a URI for the metadata scheme, as in the example. |
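The commenter's own example did not survive in this copy of the thread, so the following is only a hedged Python sketch of what a `subjectOf` link could look like, not a reconstruction of it. The URL, formats, description, and date are borrowed from the earlier `encoding` example; typing the metadata record as `Dataset` is one reading of the comment, and the `with_metadata_record` helper and dataset name are hypothetical.

```python
import json


def with_metadata_record(dataset, content_url, formats, description):
    """Return a copy of `dataset` with an external metadata record attached
    via subjectOf instead of encoding."""
    linked = dict(dataset)
    linked["subjectOf"] = {
        "@type": "Dataset",  # assumption: no metadata-specific subtype exists
        "url": content_url,
        # The MIME type says how to parse the file; the namespace URI says
        # which metadata scheme it follows.
        "encodingFormat": list(formats),
        "description": description,
        "dateModified": "2019-06-12T14:44:15Z",
    }
    return linked


base = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example dataset (hypothetical)",
}

linked = with_metadata_record(
    base,
    "https://example.org/link/to/iso.xml",
    ["application/xml", "http://www.isotc211.org/2005/gmd"],
    "ISO TC211 XML rendering of metadata.",
)
print(json.dumps(linked, indent=2))
```

Because the link lives under `subjectOf`, the `encoding` and `distribution` properties stay reserved for representations of the data itself.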
@smrgeoinfo can you tell us how you interpret the sdo:encoding definition? That might help us understand where you disagree. |
A media object that encodes this CreativeWork. This property is a synonym for associatedMedia. My thinking is that the metadata record is a separate resource, thus NOT an encoding of the resource that a sdo:Dataset object is about. Simple test is how you would use them-- can you do any scientific analysis with a metadata record about a dataset, or do you need an actual representation of the data? |
Consider though that an |
Key question-- Is the subject of sdo:encoding SELF (i.e. the sdo:DataSet instance), or the dataset that the sdo:Dataset instance describes? As usual, the documentation for the element is unclear -- what does 'this CreativeWork' refer to? SELF, or a dataset that SELF is about? sdo:MediaObject is defined as "A media object, such as an image, video, or audio object embedded in a web page or a downloadable dataset"; is that what the sdo:encoding definition is referring to? Googling 'media object' is interesting. I think the more useful and consistent interpretation is that the subject of the elements in the sdo:DataSet instance is the dataset, not the sdo:DataSet instance. ISO 19115 makes this distinction clear with properties named gmd:metadata... (not always consistently...), and DCAT distinguishes dcat:CatalogRecord and dcat:Resource. Schema.org has some proposed properties on sdo:Dataset with sd... prefixes (sdDatePublished, sdLicense, sdPublisher) that appear to be about the metadata record ('structured data') as opposed to the dataset that is the subject of the record. |
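As a small illustration of that reading, the Python sketch below separates the sd-prefixed properties (statements about the structured-data record) from everything else (statements about the dataset). The three property names come from the comment above; the record values and the `split_subjects` helper are hypothetical.

```python
# Properties that, on the reading above, describe the metadata record
# rather than the dataset it is about.
SD_PROPERTIES = {"sdDatePublished", "sdLicense", "sdPublisher"}

record = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example dataset (hypothetical)",
    "datePublished": "2018-01-01",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "sdDatePublished": "2019-06-12",
    "sdLicense": "https://creativecommons.org/publicdomain/zero/1.0/",
    "sdPublisher": {
        "@type": "Organization",
        "name": "Example repository (hypothetical)",
    },
}


def split_subjects(node):
    """Split one JSON-LD node into (about the dataset, about the record)."""
    about_record = {k: v for k, v in node.items() if k in SD_PROPERTIES}
    about_dataset = {
        k: v
        for k, v in node.items()
        if k not in SD_PROPERTIES and not k.startswith("@")
    }
    return about_dataset, about_record


about_dataset, about_record = split_subjects(record)
print(sorted(about_dataset))
print(sorted(about_record))
```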
I wrote up three approaches to providing external links to metadata associated with an
See: https://so-tools.readthedocs.io/en/latest/external_metadata.html The write up focussed on functionality being implemented for harvesting in DataONE, so emphasis is more on functionality that can be leveraged by that infrastructure, though is intended to be generally applicable. In summary each approach will work, with It would not be difficult to recast that document to align with the Guideline document, which I'm happy to do if there's broader agreement. |
+1 for the subjectOf approach. In the 'about' example, a client parsing the so:DataSet/hasPart links would face a problem determining which part is actually the data (as opposed to metadata describing the data). One could probably infer the correct answer, but it requires client developers to write more code. In the encoding approach, I think that since the information content of the data and of the metadata describing the data are different, they are not encodings of the same resource. The data is about something in the world, the metadata is about that data. |
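The difference in client effort can be sketched in Python as follows. The two records are hypothetical stand-ins for the 'about' and subjectOf approaches in the write-up (which is not reproduced here), and the URLs and helper names are assumptions. With `hasPart`, the client has to inspect each part and infer which is metadata; with `subjectOf`, it reads the answer directly.

```python
DATASET_ID = "https://example.org/dataset/1"  # hypothetical identifier

# 'about' style: data and metadata are both parts; the metadata part points
# back at the dataset with `about`.
about_style = {
    "@type": "Dataset",
    "@id": DATASET_ID,
    "hasPart": [
        {"@type": "MediaObject", "contentUrl": "https://example.org/data.csv"},
        {
            "@type": "MediaObject",
            "contentUrl": "https://example.org/link/to/iso.xml",
            "about": {"@id": DATASET_ID},
        },
    ],
}

# subjectOf style: data stays in distribution, metadata is linked separately.
subject_of_style = {
    "@type": "Dataset",
    "@id": DATASET_ID,
    "distribution": [
        {"@type": "DataDownload", "contentUrl": "https://example.org/data.csv"}
    ],
    "subjectOf": [
        {
            "@type": "DataDownload",
            "contentUrl": "https://example.org/link/to/iso.xml",
        }
    ],
}


def split_parts(record):
    """Infer (data URLs, metadata URLs) from hasPart by checking `about`."""
    data, metadata = [], []
    for part in record.get("hasPart", []):
        target = part.get("about", {}).get("@id")
        if target == record.get("@id"):
            metadata.append(part["contentUrl"])
        else:
            data.append(part["contentUrl"])
    return data, metadata


def metadata_from_subject_of(record):
    """Read metadata URLs directly; no inference needed."""
    return [node["contentUrl"] for node in record.get("subjectOf", [])]


print(split_parts(about_style))
print(metadata_from_subject_of(subject_of_style))
```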
would return the So I think the |
Implemented just showing I typed the metadata as DataDownload (which I'm not sold on), but it was the only type that let you specify both the contentUrl and contentSize, encodingFormat. I also investigated using NOTE: I also used the
|
@ashepherd I think your 'sdo:subjectOf' example looks good. My only suggestion is to add "http://www.isotc211.org/2005/gmd" as an encoding format, since I assume the noaa profile is consistent with base gmd. I'm guessing a client might likely look for a particular xml metadata format by searching for the URI for the xml schema to which it conforms; it could be in sdo:encodingFormat or sdo:additionalType. They might be fine with any flavor of ISO19139 xml... Perhaps we can recommend conventions for recording encoding scheme hierarchies like application/xml --> http://www.isotc211.org/2005/gmd --> xml world: |
Note to above -- my assumption was that gmd-noaa was schema valid against gmd (Type 1 profile), but it's actually not ISO schema valid, so the http://www.isotc211.org/2005/gmd URI shouldn't be included. |
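A hedged Python sketch of the client lookup described above: search the linked metadata records for a particular schema URI in `encodingFormat`, which may be a single string or a list. The record contents, the profile URI `https://example.org/profiles/gmd-noaa`, and the helper names are hypothetical. Per the note above, the NOAA-profile record does not list the base gmd URI.

```python
GMD = "http://www.isotc211.org/2005/gmd"

dataset = {
    "@type": "Dataset",
    "subjectOf": [
        {
            "@type": "DataDownload",
            "contentUrl": "https://example.org/link/to/iso.xml",
            "encodingFormat": ["application/xml", GMD],
        },
        {
            "@type": "DataDownload",
            "contentUrl": "https://example.org/link/to/iso-noaa.xml",
            # Hypothetical profile URI; the base gmd URI is deliberately
            # omitted because the profile is not schema valid against it.
            "encodingFormat": [
                "application/xml",
                "https://example.org/profiles/gmd-noaa",
            ],
        },
    ],
}


def formats_of(node):
    """Normalize encodingFormat to a list (it may be a string or a list)."""
    value = node.get("encodingFormat", [])
    return [value] if isinstance(value, str) else list(value)


def find_by_format(record, wanted):
    """Return contentUrls of linked metadata records declaring `wanted`."""
    linked = record.get("subjectOf", [])
    if isinstance(linked, dict):
        linked = [linked]
    return [node["contentUrl"] for node in linked if wanted in formats_of(node)]


print(find_by_format(dataset, GMD))
print(find_by_format(dataset, "application/xml"))
```

A client that accepts any XML gets both records; one that needs schema-valid gmd gets only the first.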
for profiles, see this W3C spec https://www.w3.org/TR/dx-prof/ with this
where |
Bioschemas.org draft profile for DataRecord is probably relevant. |
Schema.org WebAPI issue also has a relevant discussion with respect to using mime type for encoding formats (as well as additional properties). |
ESIP Winter Meeting Notes Doc: https://docs.google.com/document/d/1ycG9Dlt6xRr9wxjqkQrPkJQJvm83E34eue_cxkrSGUI/edit?ts=5e1503e3# |
Please review the DECISION on this issue at: https://github.com/ESIPFed/science-on-schema.org/blob/master/decisions/4-dataset-metadata-distributions.md |
@ashepherd I reviewed the ADR for this, and it generally looks good. I think we need to still:
I have created a feature branch feature_4_dataset_metadata_distributions to start incorporating these changes. I added a diagram, revised the text to be consistent with the ADR, and removed the SHACL block, which seems like a more advanced topic than would be needed in the guide itself.
@datadavev it would be great if you could edit this branch with any changes you see are needed, as I modified a bunch of your text. I created a PR #81 from this branch for review and commenting, but please feel free to edit the branch directly.
Other issues
We also discussed controlled vocabularies for the encodingFormat, and it seems there are multiple options, and if we want to make a recommendation for that, we should do so in a separate issue. We also agreed that following the development of |
@ashepherd The diagram I made for the metadata section is here: https://www.lucidchart.com/invitations/accept/b1db3455-e7a1-486e-9f54-2e3bc692450c I couldn't seem to gain access to your folder with my free Lucid account. Probably missing something simple. Can you move it over? |
This ADR is probably fine as is - but the bigger issue of typing multiple kinds of relationships hasn't been resolved. Is that a new issue? |
Decision on 02-27 call; merge to develop. |
Done, merged to develop. Closing issue. |
Some schema.org providers use DataDownload to provide an explicit distribution link to an associated metadata file that might have more detailed metadata in a common XML format like ISO-19115. Is there a clear convention on how one can indicate that a particular distribution represents the metadata for the package? In DataONE's ORE-based packaging format, we use the cito:documents property and its inverse to indicate that a specific metadata document provides documentation for a particular set of data files. Would that be reasonable here, or is there a schema.org property that I missed?