Digest for DCAT distributions #1287

Closed
cristianolongo opened this issue Jan 6, 2021 · 16 comments · Fixed by #1323
Labels
dcat:Distribution; dcat; due for closing (Issue that is going to be closed if there are no objection within 6 days); feedback (Issues stemming from external feedback to the WG); requirement
Comments

@cristianolongo

A digest of the file may be useful for downloadable dataset distributions, both to verify the authenticity of the downloaded file and to verify that the dataset has not been updated since the digest was created (the digest should be updated only when the dataset itself is updated).

@andrea-perego
Contributor

@cristianolongo, I wonder whether the use case you are proposing is one of those addressed in DCAT-AP and its extensions by using spdx:checksum. Or do you have additional / different requirements?

@cristianolongo
Author

Thanks @andrea-perego, yes, spdx:checksum covers the case. However, I can't find the specification of DCAT-AP.

@andrea-perego
Contributor

@cristianolongo said:

Thanks @andrea-perego, yes, spdx:checksum covers the case. However, I can't find the specification of DCAT-AP.

https://github.com/SEMICeu/DCAT-AP

@cristianolongo
Author

great, thanks

@agbeltran
Member

Hi @cristianolongo @andrea-perego - this seems to me a quite generic use case that might be good to consider in DCAT too.

@cristianolongo
Author

Of course, it is relevant for all downloadable datasets, and maybe also for datasets provided via a SPARQL endpoint. Should I reopen the issue?

@andrea-perego
Contributor

andrea-perego commented Jan 18, 2021

@agbeltran said:

Hi @cristianolongo @andrea-perego - this seems to me a quite generic use case that might be good to consider in DCAT too.

Agreed.

For our records:

DCAT-AP 2.0.1 describes the purpose of spdx:checksum as follows:

This property provides a mechanism that can be used to verify that the contents of a Distribution have not changed. The checksum is related to the download URL.

An example of its use:

https://www.europeandataportal.eu/sparql?default-graph-uri=&query=describe+%3CnodeID%3A%2F%2Fb621486445%3E&format=text%2Fturtle

@prefix spdx: <http://spdx.org/rdf/terms#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<https://europeandataportal.eu/set/distribution/2f3d36a4-79de-4cfb-85d7-706519be7b25> spdx:checksum _:b621486445 .

_:b621486445 a spdx:Checksum ;
  spdx:algorithm spdx:checksumAlgorithm_sha1 ;
  spdx:checksumValue "71bf58e542a47d7092ed1924f34db91bb24fe2c2"^^xsd:hexBinary .
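To illustrate how a consumer might use such a description, here is a minimal Python sketch that recomputes the digest of a downloaded file and compares it with the published spdx:checksumValue. The function names `file_digest_hex` and `matches_published_checksum` are hypothetical, chosen for this example; the sketch assumes the file has already been downloaded to a local path and that the algorithm named by spdx:algorithm is one that Python's `hashlib` supports (here SHA-1, as in the example above).

```python
import hashlib


def file_digest_hex(path, algorithm="sha1", chunk_size=65536):
    """Return the lowercase hex digest of the file at `path`.

    The file is read in chunks so that large distributions do not
    have to fit in memory.
    """
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_published_checksum(path, published_hex, algorithm="sha1"):
    """Compare a local file against a published checksum value.

    xsd:hexBinary literals may use upper- or lowercase hex digits,
    so the published value is normalised before comparing.
    """
    return file_digest_hex(path, algorithm) == published_hex.strip().lower()
```

Here `published_hex` would be the literal taken from spdx:checksumValue. A mismatch means only that the local bytes differ from those the digest was computed over; it does not say whether the file or the metadata is out of date.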

@andrea-perego andrea-perego reopened this Jan 18, 2021
@riccardoAlbertoni
Contributor

@cristianolongo said:

Of course, it is relevant for all downloadable datasets, and maybe also for datasets provided via a SPARQL endpoint. Should I reopen the issue?

@cristianolongo: we discussed the adoption of spdx:checksum in the last DCAT meeting (see meeting minutes).
I wonder if you could describe the mentioned use case for "datasets provided via a SPARQL endpoint" in more detail, and whether you have any spdx example solution when it comes to SPARQL-distributed datasets.

@cristianolongo
Author

Yes, I dump my dataset via a CONSTRUCT query such as:

construct {?x ?y ?z} where {?x ?y ?z}

With appropriate ordering clauses, the output should be predictable (depending on the knowledge base content of course).
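As a rough illustration of the "predictable output" idea, the following Python sketch computes a digest over a dump that is independent of the order in which triples are returned. It is only a sketch under stated assumptions: the dump is serialized as N-Triples (one triple per line), the data contains no blank nodes or uses stable blank node labels, and the endpoint serializes each term the same way on every run. The function name `ntriples_digest` and the choice of SHA-256 are illustrative, not something proposed in this thread.

```python
import hashlib


def ntriples_digest(ntriples_text, algorithm="sha256"):
    """Return a hex digest of an N-Triples dump, ignoring line order.

    Lines are stripped, de-duplicated, and sorted before hashing, so two
    dumps with the same triples in a different order give the same digest.
    """
    lines = {line.strip() for line in ntriples_text.splitlines() if line.strip()}
    canonical = "\n".join(sorted(lines)) + "\n"
    return hashlib.new(algorithm, canonical.encode("utf-8")).hexdigest()
```

Sorting on the client side avoids relying on the endpoint's ordering, but it does not address blank node relabelling or differences in literal serialization, which would need a proper RDF canonicalization step.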

Other cases may be datasets exposed via a REST API that returns JSON-LD.

However, I'm not fully convinced that these examples are in the scope of DCAT.

@andrea-perego andrea-perego added the "feedback" label (Issues stemming from external feedback to the WG) Feb 19, 2021
@agreiner
Contributor

agreiner commented Mar 4, 2021

For what it's worth, I have used checksums for datasets in a web app, though not with SPARQL. The use case was differentiating datasets of electronic potentials for use in quantum mechanical calculations.

@riannella

Agree to the need for a "Checksum" (algorithm + value) for dataset integrity (like spdx).

@andrea-perego
Contributor

I've created a draft PR to integrate the relevant SPDX class and properties: #1323

A preview of the newly added sections:

I've included a couple of EDNOTEs about additional issues to be discussed.

Please review.

@andrea-perego andrea-perego changed the title from "digest for DCAT distributions" to "Digest for DCAT distributions" Mar 13, 2021
@riannella

I would update the definition to be "The Checksum includes the algorithm and value that allows the integrity of a file to be verified to ensure no errors were detected in transmission or storage."

@andrea-perego
Contributor

@riannella said:

I would update the definition to be "The Checksum includes the algorithm and value that allows the integrity of a file to be verified to ensure no errors were detected in transmission or storage."

I've added it as a usage note to spdx:Checksum - see https://raw.githack.com/w3c/dxwg/dcat-distribution-digest/dcat/index.html#Class:Checksum

@andrea-perego andrea-perego linked a pull request Mar 23, 2021 that will close this issue
@andrea-perego
Contributor

The relevant updates have been merged into the ED via PR #1323

Unless there are any objections, I propose we close this issue.

@andrea-perego andrea-perego added the "due for closing" label (Issue that is going to be closed if there are no objection within 6 days) Mar 27, 2021
@riccardoAlbertoni
Contributor

We are closing this issue as proposed above, following tonight's DCAT subgroup meeting.

6 participants