Profiles and distributions #531

davebrowning · 2018-11-04T16:45:49Z

(Issue created to track comments received on Second PWD of DCAT Recommendation from Clemens Portele - email archived here)

Additional question about the interplay of profile choice and distribution:

Say I have a dataset of buildings and it is made accessible according to two different profiles (e.g. two different XML schemas or two different JSON schemas). The two profiles use different vocabularies and there are differences in the content. However, both representations are sourced from the same data. To me this would be a single dataset. However, this is not that clear in DCAT 1.0 and one could also take the view that these are two different datasets - with separate dataset metadata. At least I know cases where this has been represented as two datasets in catalogs. The new DCAT draft adds language about dataset as "a single conceptual entity" which seems to support the view that there is a single dataset in this case. Could guidance be included in the revision to support more consistent implementations, maybe just an example for such a case?

Assuming this would be considered one dataset: if both profiles were served through the same API (or service) and profile negotiation were used, would this be one distribution (since it is a single API) or two distributions (one per profile, but with the same accessURL)?

Currently you can only specify the media type of a distribution. Considering the work on profiles and profile negotiation in the DXWG, wouldn't it make sense to be able to specify in DCAT the profile(s) that a distribution supports?

makxdekkers · 2018-11-05T09:32:16Z

This issue seems to be related to the discussion about 'informational equivalence' between distributions. If I understand correctly, serving up data according to a particular profile will deliver a different set of data -- assuming a profile will always deliver a selection/subset of the available data. If we require informational equivalence, in the sense that, for distributions A and B, a transformation A->B->A delivers the exact same data, such profiled data should be modelled as two datasets. Maybe the reference to the profile should then be at the level of Dataset, not on Distribution?
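
The A->B->A criterion can be made concrete with a small check. This is only a sketch: the building record, the two conversion functions and the idea that profile B is a plain subset are all hypothetical, chosen to illustrate the argument rather than to describe any real profile.

```python
from typing import Callable, Dict, Iterable

Record = Dict[str, object]


def to_profile_b(record: Record) -> Record:
    """Hypothetical profile B: keeps only 'id' and 'height' (a subset of A)."""
    return {"id": record["id"], "height": record["height"]}


def back_to_profile_a(record: Record) -> Record:
    """Best-effort inverse: fields dropped by profile B cannot be restored."""
    return dict(record)


def is_round_trip_lossless(
    records: Iterable[Record],
    forward: Callable[[Record], Record],
    backward: Callable[[Record], Record],
) -> bool:
    """True if A -> B -> A returns every record unchanged."""
    return all(backward(forward(r)) == r for r in records)


# Hypothetical source data in profile A.
buildings = [{"id": "b1", "height": 12.5, "owner": "City"}]

# Profile B drops 'owner', so the round trip does not restore the original.
subset_is_lossless = is_round_trip_lossless(buildings, to_profile_b, back_to_profile_a)

# A pure re-encoding (modelled here as a copy) passes the same check.
copy_is_lossless = is_round_trip_lossless(buildings, dict, dict)
```

Under this strict reading, the subsetting profile fails the test and would be catalogued as a separate dataset, while a pure format change would remain a distribution of the same dataset.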

rob-metalinkage · 2018-11-06T04:13:45Z

Each distribution can have multiple values of dct:conformsTo to indicate profiles the distribution conforms to. Different distributions can conform to different sets of profiles.

The range of conformsTo is dct:Standard

The profiles ontology subclasses dct:Standard - so the mechanisms are available for the declarations required.
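
As a rough illustration of the declarations described above, the sketch below models two distributions that share one accessURL but conform to different profiles, and prints them as Turtle-like text. All `example.org` IRIs and the `Distribution` helper class are hypothetical, prefix declarations are omitted, and this is one possible modelling rather than agreed guidance.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Distribution:
    """Minimal stand-in for a dcat:Distribution description."""
    iri: str
    access_url: str
    media_type: str
    conforms_to: List[str] = field(default_factory=list)

    def to_turtle(self) -> str:
        """Serialise as a Turtle-like snippet (prefix declarations omitted)."""
        pairs = [
            ("a", "dcat:Distribution"),
            ("dcat:accessURL", f"<{self.access_url}>"),
            ("dcat:mediaType", f"<{self.media_type}>"),
        ]
        # One dct:conformsTo statement per profile the distribution conforms to.
        pairs += [("dct:conformsTo", f"<{p}>") for p in self.conforms_to]
        body = " ;\n".join(f"    {pred} {obj}" for pred, obj in pairs)
        return f"<{self.iri}>\n{body} ."


# Hypothetical API endpoint serving both profiles via profile negotiation.
API = "https://example.org/api/buildings"
JSON = "https://www.iana.org/assignments/media-types/application/json"

dist_a = Distribution("https://example.org/dist/a", API, JSON,
                      ["https://example.org/profile/buildings-a"])
dist_b = Distribution("https://example.org/dist/b", API, JSON,
                      ["https://example.org/profile/buildings-b"])

for dist in (dist_a, dist_b):
    print(dist.to_turtle())
```

Whether the two profiles should instead be listed as two `dct:conformsTo` values on a single distribution is exactly the open question raised at the start of this issue; the sketch does not settle it.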

The issue of enforcing informational equivalence seems fraught, however. It's hard to imagine any distribution provided by a service that doesn't support subsetting, calculations, additional links or lossy transformations, e.g. flattening objects to CSV with "magic" needed to relate columns together.

e.g. cost="USD 23.44" => "cost" = 23.44, "currency"="USD"
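
The flattening above can be sketched in a few lines. The function names are hypothetical, and parsing the amount as a float is an assumption made here to show how a seemingly harmless conversion can stop being reversible.

```python
from typing import Dict, Union

Row = Dict[str, Union[float, str]]


def flatten_cost(cost: str) -> Row:
    """Split 'USD 23.44' into separate 'cost' and 'currency' columns."""
    currency, amount = cost.split(" ", 1)
    return {"cost": float(amount), "currency": currency}


def unflatten_cost(row: Row) -> str:
    """Rebuild the original string; requires knowing the two columns belong together."""
    return f"{row['currency']} {row['cost']}"


row = flatten_cost("USD 23.44")

# The round trip works for this value...
same = unflatten_cost(row) == "USD 23.44"

# ...but a trailing zero is lost once the amount is parsed as a float.
changed = unflatten_cost(flatten_cost("USD 23.40"))
```

Reversing the flattening depends both on out-of-band knowledge that `cost` and `currency` are related and on the number representation chosen, which is part of why a strict equivalence rule is hard to pin down.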

I haven't seen a cogent argument for requiring informational equivalence in distributions yet...

makxdekkers · 2018-11-06T07:24:03Z

@rob-metalinkage I do not understand your statement that you haven't seen a 'cogent argument'. Is it that you haven't seen any arguments, or that you think that arguments made are not cogent?
Let's try this argument: Suppose there is a dataset that says it contains the data for the budget for the year 2019. As a user, I think it would be reasonable to expect that every one of the distributions contains all the data for that year and that they differ only in format. I would not expect to have to browse through the descriptions of the distributions to find out which of them contains all the data rather than some subset of it. Requiring 'informational equivalence' among distributions would provide this guarantee. Without it, you really do not know beforehand what the distributions contain.
My earlier point was (https://www.w3.org/TR/dcat-ucr/#ID34 and https://www.w3.org/TR/dcat-ucr/#RDIDF) that there needs to be a clear definition, so I would love to hear what your opinion is on this issue.
Please note that for distributions we're not talking about data services; distributions essentially give access to static files.

pwin · 2018-11-06T14:38:49Z

@makxdekkers would https://w3c.github.io/dxwg/ucr/#ID50 come into play here? ... a flag could indicate whether the distribution was the result of a lossy transformation of the dataset

makxdekkers · 2018-11-06T15:01:51Z

@pwin Indeed. The important part in the use case is "these events do not reduce the information content" which to me sounds a lot like "informationally equivalent".

davebrowning · 2019-02-05T22:21:18Z

This will be addressed with #317

rob-metalinkage · 2019-02-06T00:34:21Z

noticed this is being worked on and I missed a question earlier...

I believe the arguments for "informational equivalence" have not sufficiently defined this term, nor the competency questions for the DCAT ontology related to it.

the example about whether a dataset contains data for the year 2019 is more clear cut than, for example, the rounding off of microseconds in dates in a different encoding. So the interpretation of each and every perspective seems to hang on the precise nature of "informational equivalence". I do not believe we have grounded this term well enough in Use Cases or derived Requirements.

dr-shorthair · 2019-02-06T03:01:18Z

@rob-metalinkage the consensus is that anything short of losslessly-convertible would be use-case specific.

And since we are reluctant (unwilling) to go with the former (and we think it would be a hard sell in the market), we will have to just come up with some wording to hedge the issue.

dr-shorthair · 2019-07-18T16:15:13Z

Ultimately it is the prerogative of the provider or cataloguer or indexer to make a judgement about how to factor the descriptions between Datasets and Distributions. Different applications and different communities will have different needs and different practices and I do not think we can provide universal guidelines. The big NOTE in https://www.w3.org/TR/vocab-dcat-2/#Class:Distribution partially speaks to this, but maybe could be improved further. I'll have a go.

davebrowning added dcat profile-guidance dcat:Distribution labels Nov 4, 2018

dr-shorthair mentioned this issue Nov 4, 2018

Distributions, services and implementation-resources #411

Closed

agbeltran mentioned this issue Nov 28, 2018

Confusion between major classes as newly defined in DCAT #431

Closed

smrgeoinfo mentioned this issue Dec 4, 2018

specific recommendations for metadata distribution info ESIPFed/science-on-schema.org#4

Closed

agbeltran added the feedback label Jan 15, 2019

agbeltran mentioned this issue Feb 4, 2019

Change domain or create superclass of dcat:Distribution #317

Closed

davebrowning assigned agbeltran Feb 5, 2019

makxdekkers mentioned this issue Mar 2, 2019

Distribution composed of more than one file, but not packaged #482

Closed

dr-shorthair mentioned this issue Jul 18, 2019

Further tweaks to distribution explanations. #1008

Merged

davebrowning added the due for closing label Jul 24, 2019

davebrowning closed this as completed Sep 5, 2019

davebrowning removed the due for closing label Sep 5, 2019
