Profiles and distributions #531
This issue seems to be related to the discussion about 'informational equivalence' between distributions. If I understand correctly, serving up data according to a particular profile will deliver a different set of data (assuming a profile always delivers a selection or subset of the available data). If we require informational equivalence, in the sense that for distributions A and B a transformation A->B->A delivers exactly the same data, such profiled data should be modelled as two datasets. Maybe the reference to the profile should then be at the level of Dataset, not Distribution?
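The A->B->A test can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the building records, the lossless tuple encoding and the subset "profile" are all hypothetical and not taken from DCAT or any profile specification. The point is that a profile serving a subset of the fields fails the round trip, while a pure re-encoding passes it.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]


def is_informationally_equivalent(
    data: List[Record],
    to_b: Callable[[List[Record]], object],
    to_a: Callable[[object], List[Record]],
) -> bool:
    """Return True if converting A -> B -> A gives back exactly the original data."""
    return to_a(to_b(data)) == data


# Hypothetical dataset of buildings (illustrative only).
buildings = [
    {"id": "b1", "name": "Town Hall", "height_m": 32.5},
    {"id": "b2", "name": "Library", "height_m": 18.0},
]

# A lossless re-encoding: records -> tuples -> records.
FIELDS = ("id", "name", "height_m")


def to_tuples(records: List[Record]) -> list:
    return [tuple(r[f] for f in FIELDS) for r in records]


def from_tuples(rows: list) -> List[Record]:
    return [dict(zip(FIELDS, row)) for row in rows]


# A hypothetical "profile" that only serves a subset of the fields (drops height_m).
SUBSET = ("id", "name")


def to_subset(records: List[Record]) -> List[Record]:
    return [{f: r[f] for f in SUBSET} for r in records]


def from_subset(rows: List[Record]) -> List[Record]:
    # Nothing can restore the dropped field, so the round trip cannot succeed.
    return [dict(r) for r in rows]


print(is_informationally_equivalent(buildings, to_tuples, from_tuples))  # True
print(is_informationally_equivalent(buildings, to_subset, from_subset))  # False
```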
Each distribution can have multiple values of dct:conformsTo to indicate the profiles it conforms to, and different distributions can conform to different sets of profiles. The range of dct:conformsTo is dct:Standard, and the Profiles Ontology subclasses dct:Standard, so the mechanisms needed for these declarations are available. Enforcing informational equivalence seems fraught, however: it's hard to imagine any distribution provided by a service that doesn't support subsetting, calculations, additional links or lossy transformations, e.g. flattening objects to CSV, where "magic" is needed to relate columns together (cost="USD 23.44" => "cost" = 23.44, "currency" = "USD"). I haven't seen a cogent argument for requiring informational equivalence in distributions yet...
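The CSV flattening example can be made concrete with a small Python sketch. The function names and the two-decimal output format are assumptions for illustration only. Splitting "USD 23.44" into two columns is easy, but rebuilding the original value relies on knowing that the columns belong together, and even then the original string formatting is not always preserved.

```python
from typing import Dict


def flatten_cost(cost: str) -> Dict[str, object]:
    """Split a combined value such as "USD 23.44" into two CSV-style columns."""
    currency, amount = cost.split(" ", 1)
    return {"cost": float(amount), "currency": currency}


def unflatten_cost(row: Dict[str, object]) -> str:
    """Rebuild the combined value.

    This only works because we "know" that the cost and currency columns
    belong together; nothing in a flat CSV row says so. The fixed
    two-decimal format is an assumption of this sketch.
    """
    return f"{row['currency']} {row['cost']:.2f}"


print(flatten_cost("USD 23.44"))                   # {'cost': 23.44, 'currency': 'USD'}
print(unflatten_cost(flatten_cost("USD 23.44")))   # USD 23.44
print(unflatten_cost(flatten_cost("USD 23.4")))    # USD 23.40, not the original string
```

The last line shows why "equivalence" is slippery: the amount survives, but the original lexical form does not.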
@rob-metalinkage I do not understand your statement that you haven't seen a 'cogent argument'. Is it that you haven't seen any arguments, or that you think that arguments made are not cogent?
@makxdekkers would https://w3c.github.io/dxwg/ucr/#ID50 come into play here? ... a flag could indicate whether the distribution was the result of a lossy transformation of the dataset.
@pwin Indeed. The important part in the use case is "these events do not reduce the information content", which to me sounds a lot like "informationally equivalent".
This will be addressed with #317.
I noticed this is being worked on and that I missed a question earlier... I believe the arguments for "information equivalence" have not sufficiently defined this term or the related competency questions for the DCAT ontology. The example about whether a dataset contains data for the year 2019 is more clear-cut than, for example, the rounding off of microseconds in dates in a different encoding. So the interpretation of each and every perspective seems to hang on the precise meaning of "informational equivalence". I do not believe we have grounded this term well enough in Use Cases or derived Requirements.
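The microsecond example can be illustrated with Python's standard `datetime` module. The "seconds-only" encoding is a hypothetical stand-in for "a different encoding". It shows that whether two representations count as equivalent depends on the competency question asked: a strict round-trip check fails, while the question "does the dataset cover 2019?" still gets the same answer.

```python
from datetime import datetime


def encode_seconds(ts: datetime) -> str:
    """Hypothetical encoding that keeps only whole seconds."""
    return ts.replace(microsecond=0).isoformat()


def decode(text: str) -> datetime:
    return datetime.fromisoformat(text)


original = datetime(2019, 6, 1, 12, 30, 15, 123456)
round_tripped = decode(encode_seconds(original))

# Strict round trip: the microseconds are gone, so this is False.
print(round_tripped == original)
# Competency question "is there data for 2019?": unaffected, so this is True.
print(round_tripped.year == original.year == 2019)
# A full-precision ISO 8601 encoding, by contrast, round-trips exactly.
print(decode(original.isoformat()) == original)
```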
@rob-metalinkage the consensus is that anything short of losslessly convertible would be use-case specific. Since we are reluctant (unwilling) to require lossless conversion (and we think it would be a hard sell in the market), we will just have to come up with some wording to hedge the issue.
Ultimately it is the prerogative of the provider or cataloguer or indexer to make a judgement about how to factor the descriptions between Datasets and Distributions. Different applications and different communities will have different needs and different practices, and I do not think we can provide universal guidelines. The big NOTE in https://www.w3.org/TR/vocab-dcat-2/#Class:Distribution partially speaks to this, but maybe could be improved further. I'll have a go.
(Issue created to track comments received on Second PWD of DCAT Recommendation from Clemens Portele - email archived here)
Additional question about the interplay of profile choice and distribution:
Say I have a dataset of buildings and it is made accessible according to two different profiles (e.g. two different XML schemas or two different JSON schemas). The two profiles use different vocabularies and there are differences in the content. However, both representations are sourced from the same data. To me this would be a single dataset. However, this is not that clear in DCAT 1.0 and one could also take the view that these are two different datasets - with separate dataset metadata. At least I know cases where this has been represented as two datasets in catalogs. The new DCAT draft adds language about dataset as "a single conceptual entity" which seems to support the view that there is a single dataset in this case. Could guidance be included in the revision to support more consistent implementations, maybe just an example for such a case?
Assuming this would be considered one dataset: if both profiles were served through the same API (or service) and profile negotiation were used, would this be one distribution (since it is a single API) or two distributions (one per profile, but with the same accessURL)?
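The two modelling options in this question can be compared with a small Python sketch. The `Distribution` class is a hypothetical stand-in for dcat:Distribution (not a real DCAT library), and the API and profile URIs are invented for illustration. Either way, a client looking for a given profile finds the same access URL; the options differ only in how the dct:conformsTo values are grouped.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Distribution:
    """Minimal, hypothetical stand-in for a dcat:Distribution description."""
    access_url: str
    media_type: str
    conforms_to: FrozenSet[str] = frozenset()  # values of dct:conformsTo


# Hypothetical identifiers, for illustration only.
API = "https://example.org/api/buildings"
PROFILE_A = "https://example.org/profiles/buildings-a"
PROFILE_B = "https://example.org/profiles/buildings-b"

# Option 1: one distribution for the single API, declaring both profiles.
one = [Distribution(API, "application/json", frozenset({PROFILE_A, PROFILE_B}))]

# Option 2: one distribution per profile, sharing the same access URL.
two = [
    Distribution(API, "application/json", frozenset({PROFILE_A})),
    Distribution(API, "application/json", frozenset({PROFILE_B})),
]


def find_for_profile(dists: List[Distribution], profile: str) -> Optional[Distribution]:
    """Return the first distribution that declares conformance to `profile`, if any."""
    return next((d for d in dists if profile in d.conforms_to), None)


print(find_for_profile(one, PROFILE_B).access_url)  # same URL in both options
print(find_for_profile(two, PROFILE_B).access_url)
```

One consideration, offered tentatively: option 2 leaves room to attach per-profile metadata to each distribution, while option 1 keeps the catalogue closer to "one API, one distribution".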
Currently you can only specify the media type of a distribution. Considering the work on profiles and profile negotiation in the DXWG wouldn’t it make sense to be able to specify the profile(s) that a distribution supports in DCAT?