Proposal to generalise property dcat:dataset #116

Closed
andrea-perego opened this issue Feb 14, 2018 · 11 comments

@andrea-perego
Contributor

This proposal is related to requirement [RDST] ("Dataset type") #64

In the current version of DCAT, the relationship used to link a dcat:Catalog with the documented resources is dcat:dataset, which is meant to be used only for dcat:Dataset's.

If we plan to support the possibility of documenting resources not falling into the DCAT definition of "datasets", we need to use another property.

For instance, as illustrated by UC20, in GeoDCAT-AP, catalogues can include records about services. In this case, the relationship between the catalogue and the service is specified by using dct:hasPart.

This is in line with DCAT, as dcat:dataset is defined as a subproperty of dct:hasPart. The issue is that dct:hasPart is quite generic, and it is already used for expressing other "inclusion" relationships. E.g., in DCAT-AP 1.1, it is used to specify the relationship between catalogues and subcatalogues (i.e., "a catalogue that is part of the described catalogue").
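
To make the ambiguity concrete, here is a minimal Turtle sketch. All `ex:` IRIs are invented for illustration, and `dctype:Service` is only one possible class for the service. The sub-catalogue and the service hang off the same predicate, so a consumer has to inspect `rdf:type` to tell them apart.

```turtle
@prefix dcat:   <http://www.w3.org/ns/dcat#> .
@prefix dct:    <http://purl.org/dc/terms/> .
@prefix dctype: <http://purl.org/dc/dcmitype/> .
@prefix ex:     <http://example.org/> .

# Hypothetical catalogue: every ex: IRI is made up for this sketch.
ex:catalogue a dcat:Catalog ;
    dcat:dataset ex:land-cover ;            # dataset: specific predicate
    dct:hasPart  ex:regional-catalogue ,    # sub-catalogue (DCAT-AP 1.1 style)
                 ex:view-service .          # service (GeoDCAT-AP style)

ex:land-cover         a dcat:Dataset .
ex:regional-catalogue a dcat:Catalog .
ex:view-service       a dctype:Service .
```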

So, an option would be to mint a new property (dcat:resource?), more specific than dct:hasPart - e.g.:

dcat:resource rdfs:subPropertyOf dct:hasPart .

dcat:dataset rdfs:subPropertyOf dct:hasPart , dcat:resource .

Please note that the proposal is not to replace or deprecate dcat:dataset: this property should still be used for linking dcat:Catalog's to dcat:Dataset's, whereas dcat:resource (or whatever we decide to call it) would be used for those resources which are not dcat:Dataset's.
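
A minimal sketch of how a catalogue might look under this proposal. `dcat:resource` is the hypothetical property discussed here and is not part of DCAT; the `ex:` IRIs and the choice of `dctype:Service` are assumptions for illustration.

```turtle
@prefix dcat:   <http://www.w3.org/ns/dcat#> .
@prefix dct:    <http://purl.org/dc/terms/> .
@prefix dctype: <http://purl.org/dc/dcmitype/> .
@prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:     <http://example.org/> .

# Proposed axioms (dcat:resource is hypothetical).
dcat:resource rdfs:subPropertyOf dct:hasPart .
dcat:dataset  rdfs:subPropertyOf dct:hasPart , dcat:resource .

# Datasets keep dcat:dataset; anything else uses dcat:resource.
ex:catalogue a dcat:Catalog ;
    dcat:dataset  ex:land-cover ;
    dcat:resource ex:view-service .

ex:land-cover   a dcat:Dataset .
ex:view-service a dctype:Service .
```

Under RDFS entailment, these axioms would also yield `ex:catalogue dcat:resource ex:land-cover`, and both members would be reachable through `dct:hasPart`, so existing queries on `dcat:dataset` or `dct:hasPart` keep working.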

NB: This issue does not concern how dcat:CatalogRecord's are linked to the corresponding resources, as this relationship is expressed in the current version of DCAT by using a generic property, namely, foaf:primaryTopic. Moreover, the creation of a new property (i.e., the hypothetical dcat:resource) has no impact on the current definition of dcat:CatalogRecord, and the related properties.
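
For reference, the record pattern mentioned in the NB can be sketched as follows. The `ex:` IRIs and the service typing are assumptions; `dcat:record` and `foaf:primaryTopic` are the existing DCAT and FOAF terms.

```turtle
@prefix dcat:   <http://www.w3.org/ns/dcat#> .
@prefix dctype: <http://purl.org/dc/dcmitype/> .
@prefix foaf:   <http://xmlns.com/foaf/0.1/> .
@prefix ex:     <http://example.org/> .

# The catalogue points at its record; the record points at the
# described resource through the generic foaf:primaryTopic.
ex:catalogue a dcat:Catalog ;
    dcat:record ex:record-001 .

ex:record-001 a dcat:CatalogRecord ;
    foaf:primaryTopic ex:view-service .

# Hypothetical non-dataset resource described by the record.
ex:view-service a dctype:Service .
```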

@dr-shorthair
Contributor

Related to #117

@stijngoedertier

Andrea, would this also open the door to a more generic class than dcat:Dataset?

Besides your use case ID20, the Flemish public administration ‘Agentschap Informatie Vlaanderen’ also has a use case for a single vocabulary to describe resource types such as datasets, dataset series, services, and documents (see e.g. this presentation at SDSVoc). The approach of soft typing in #64 is helpful, but the current definition of dcat:Dataset still reads ‘a collection of data, published or curated by a single agent, and available for access or download in one or more formats’.
If we want to support describing resource types that arguably do not meet this definition (e.g. services, documents, software, …), we may also need a more general class than dcat:Dataset. Perhaps a class like 'Work' or ‘Expression’, taking inspiration from definitions in FRBR.

@dr-shorthair
Contributor

dr-shorthair commented Mar 20, 2018

This requirement - to be able to have datasets, series, services and documents in the same Catalog - is now looking very strong. @stijngoedertier could you write a UC making this explicit?

Then I suggest that it is better solved by adding more classes to DCAT, like DataService, with their own membership predicates - e.g. see https://github.com/w3c/dxwg/wiki/Cataloguing-data-services

Note that dcat:dataset rdfs:subPropertyOf dcterms:hasPart . is already in DCAT (notwithstanding my concern about mereology #117 (comment) ). If other types are in the catalog, then I suggest they have their own membership predicates, siblings of dcat:dataset rather than losing precision by generalizing dcat:dataset.
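
A minimal sketch of the sibling-predicate alternative. `dcat:service` and `dcat:DataService` are names assumed here for illustration only, and the `ex:` IRIs are invented.

```turtle
@prefix dcat:    <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:      <http://example.org/> .

dcat:dataset rdfs:subPropertyOf dcterms:hasPart .   # already in DCAT
dcat:service rdfs:subPropertyOf dcterms:hasPart .   # assumed sibling predicate

# One membership predicate per resource class.
ex:catalogue a dcat:Catalog ;
    dcat:dataset ex:land-cover ;
    dcat:service ex:view-service .

ex:land-cover   a dcat:Dataset .
ex:view-service a dcat:DataService .   # assumed class name
```

With this design the predicate alone tells a consumer what kind of member it is looking at, at the cost of minting one property per class.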

@dr-shorthair
Contributor

FWIW - overnight I was in a meeting where people are looking at catalogs of samples.
Out of scope for DXWG/DCAT, but I plan to propose a domain extension to DCAT to manage that case as well.

The pattern is general. We just need to decide

  1. how many of these resource classes are in scope for DCAT, and
  2. whether a single membership predicate should be used, or multiple.

After the discussion here and on #117 my votes are

  1. dataset-related resources yes, others no
  2. different classes, different predicates, all sub-properties of (here he grits his teeth) dcterms:hasPart

@andrea-perego
Contributor Author

@stijngoedertier , @dr-shorthair ,

I think that one of the motivations behind the idea of supporting resources different from dcat:Dataset is related to a general use case, namely, the use of DCAT as a cross-domain and cross-platform metadata interchange format. This is actually the use case behind the domain-specific extensions of DCAT-AP (GeoDCAT-AP and StatDCAT-AP).

So, looking at other metadata schemas, such as ISO 19115 and DataCite, datasets are not the only resource "types" - as explained in UC20. The same use case also argues that most of these "resource types" can match the definition of dcat:Dataset (e.g., software, document), but not resources such as "services" and "events". For the former (i.e., those resources matching the definition of dcat:Dataset) the soft typing approach would ensure backward compatibility, but for the other ones we need to have some specific classes (e.g., dctype:Service, dctype:Event).
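
The two cases can be sketched in Turtle as below. Using `dct:type` with DCMI Type terms as the soft-typing mechanism is an assumption of this sketch, not a decision recorded in this thread, and the `ex:` IRIs are invented.

```turtle
@prefix dcat:   <http://www.w3.org/ns/dcat#> .
@prefix dct:    <http://purl.org/dc/terms/> .
@prefix dctype: <http://purl.org/dc/dcmitype/> .
@prefix ex:     <http://example.org/> .

# Soft typing: still a dcat:Dataset, refined through dct:type,
# so existing dcat:dataset links remain valid.
ex:routing-tool a dcat:Dataset ;
    dct:type dctype:Software .

ex:annual-report a dcat:Dataset ;
    dct:type dctype:Text .

# Resources that do not match the dcat:Dataset definition
# get a class of their own instead.
ex:view-service a dctype:Service .
ex:workshop     a dctype:Event .
```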

Not supporting resources different from dcat:Dataset's would therefore result in preventing such resources from being shared when ISO 19115 / DataCite / etc. records are transformed into DCAT.

@dr-shorthair
Contributor

See #172 which is a specific issue to clarify scope, the result of which will determine whether this issue is live or not.

@dr-shorthair
Contributor

Resolved in DCAT team meeting
https://www.w3.org/2018/04/18-dxwgdcat-minutes.html#x06

No relaxation of domain and range of dcat:dataset

@stijngoedertier

@dr-shorthair as you asked some time ago, people from Agentschap Informatie Vlaanderen have come up with this use case to make the requirement to have datasets, series, services and documents in the same Catalog more explicit. Could this still be included in the UCR document?

@dr-shorthair
Contributor

That would be helpful. However, I'm not an editor of the UCR document.
This relates to #180 #181 #182 and in particular #56 .
Let's continue the discussion over there.

@andrea-perego
Contributor Author

> @dr-shorthair as you asked some time ago, people from Agentschap Informatie Vlaanderen have come up with this use case to make the requirement to have datasets, series, services and documents in the same Catalog more explicit. Could this still be included in the UCR document?

@stijngoedertier , was this use case contributed via the GH issue tracker? If not, I would suggest you create a specific issue, so that there is a place to discuss it and to link it to the related use cases and requirements.

@stijngoedertier

Thank you, Simon and Andrea. I have submitted it here: #223.
