Provenance patterns
This page is for proposals related to the provenance patterns milestone. These requirements need to be addressed.
ℹ️ Source: UC19: Guidance on the use of qualified forms
In most cases, the relationships between datasets and related resources (e.g., author, publisher, contact point, publications / documentation, input data, model(s) / software used to create the dataset) can be specified with simple, binary properties available from widely used vocabularies, such as [DCTerms] and [VOCAB-DCAT].
As an example, dcterms:source can be used to specify a relationship between a dataset (output:Dataset) and the dataset it was derived from (input:Dataset):
output:Dataset a dcat:Dataset ;
dcterms:source input:Dataset .
input:Dataset a dcat:Dataset .
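The snippets on this page omit prefix declarations. A minimal preamble that would let the example above parse as Turtle could look like the following sketch. The namespaces for DCAT, DCMI Terms, PROV-O and XSD are the standard ones; the input:, output: and default namespaces are hypothetical placeholders under example.org, not part of the original proposal.

```turtle
# Standard vocabularies used throughout this page
@prefix dcat:    <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix dct:     <http://purl.org/dc/terms/> .   # same namespace; some snippets use this shorter prefix
@prefix prov:    <http://www.w3.org/ns/prov#> .
@prefix xsd:     <http://www.w3.org/2001/XMLSchema#> .

# Hypothetical namespaces for the example resources (assumptions)
@prefix input:   <http://example.org/input/> .
@prefix output:  <http://example.org/output/> .
@prefix :        <http://example.org/activity/> .
```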
However, there may be a need to provide additional information concerning, e.g., the temporal context of a relationship, which requires a more sophisticated representation, similar to the "qualified" forms used in [PROV-O]. For instance, the previous example may be further detailed by saying that the output dataset is an anonymized version of the input dataset, and that the anonymization process started at time t and ended at time t′. Using [PROV-O], this information can be expressed as follows:
output:Dataset a dcat:Dataset ;
prov:qualifiedDerivation [
a prov:Derivation ;
prov:entity input:Dataset ;
prov:hadActivity :data_anonymization
] .
input:Dataset a dcat:Dataset .
# The process of anonymizing the data (load the data, process it, and generate the anonymized version)
:data_anonymization
a prov:Activity ;
# When the process started
prov:startedAtTime "2018-01-23T01:52:02Z"^^xsd:dateTime;
# When the process ended
prov:endedAtTime "2018-01-23T02:00:02Z"^^xsd:dateTime .
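In PROV-O, a qualified form elaborates a binary property rather than replacing it: prov:qualifiedDerivation is the qualified counterpart of prov:wasDerivedFrom. The following sketch assumes a publisher wants consumers that ignore qualified forms to still see the link, so it keeps the simple statements next to the qualified one. Whether dcterms:source should be retained alongside the PROV-O statements is a modelling choice that this page does not settle.

```turtle
output:Dataset a dcat:Dataset ;
    # Simple, binary forms: enough for consumers that only need the link
    dcterms:source input:Dataset ;
    prov:wasDerivedFrom input:Dataset ;
    # Qualified form: carries the extra context (which activity and, through it, when)
    prov:qualifiedDerivation [
        a prov:Derivation ;
        prov:entity input:Dataset ;
        prov:hadActivity :data_anonymization
    ] .
```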
ℹ️ Source: #132
In federated scenarios, metadata are harvested across catalogs, and possibly transformed from the original metadata schema to the one implemented in the target catalog.
For instance, the following example illustrates a scenario where a catalog (a:Catalogue) includes a metadata record (a:Record) that has been harvested from another catalog (the:SourceCatalogue). In the example, the original record corresponds to the:SourceRecord.
a:Catalogue a dcat:Catalog ;
dcat:record a:Record .
the:SourceCatalogue a dcat:Catalog ;
dcat:record the:SourceRecord .
a:Record a dcat:CatalogRecord ;
dct:source the:SourceRecord .
the:SourceRecord a dcat:CatalogRecord .
The example can be further extended by specifying the metadata schema the two metadata records conform to.
a:Catalogue a dcat:Catalog ;
dcat:record a:Record .
the:SourceCatalogue a dcat:Catalog ;
dcat:record the:SourceRecord .
a:Record a dcat:CatalogRecord ;
# The record conforms to the DCAT metadata schema
dct:conformsTo dcat: ;
dct:source the:SourceRecord .
the:SourceRecord a dcat:CatalogRecord ;
# The record conforms to the ISO-19115 metadata schema
dct:conformsTo iso-19115: .
Since a:Record and the:SourceRecord are encoded in different metadata schemas, it is very likely that the:SourceRecord has been transformed from [ISO-19115-1] to [VOCAB-DCAT] by using a software agent (e.g., the XSLT implementing the transformation rules defined in [GeoDCAT-AP]).
This information can be expressed by using [PROV-O], in a way similar to the example illustrated in the previous section:
a:Catalogue a dcat:Catalog ;
dcat:record a:Record .
the:SourceCatalogue a dcat:Catalog ;
dcat:record the:SourceRecord .
a:Record a dcat:CatalogRecord ;
dct:conformsTo dcat: ;
prov:qualifiedDerivation [
a prov:Derivation ;
prov:entity the:SourceRecord ;
prov:hadActivity :data_transformation
] .
the:SourceRecord a dcat:CatalogRecord ;
dct:conformsTo iso-19115: .
# The process of transforming the metadata record (load the data, process it, and generate the transformed version)
:data_transformation
a prov:Activity ;
prov:wasAssociatedWith :GeoDCAT-AP-XSLT .
# The software agent used for running the transformation
:GeoDCAT-AP-XSLT a prov:SoftwareAgent .
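As in the anonymization example, the activity can also record when the transformation ran, and the simple dct:source link can be kept for consumers that do not process qualified forms. The following is only a sketch: the timestamps are made-up placeholder values, and the use of prov:wasGeneratedBy and prov:used to connect the two records to the activity is an assumption, not something stated in the source proposal.

```turtle
a:Record a dcat:CatalogRecord ;
    dct:source the:SourceRecord ;               # binary link kept for simple consumers
    prov:wasGeneratedBy :data_transformation .  # output of the transformation

:data_transformation
    a prov:Activity ;
    prov:used the:SourceRecord ;                # input of the transformation
    prov:wasAssociatedWith :GeoDCAT-AP-XSLT ;
    # Placeholder timestamps, for illustration only
    prov:startedAtTime "2018-01-23T01:52:02Z"^^xsd:dateTime ;
    prov:endedAtTime   "2018-01-23T01:52:05Z"^^xsd:dateTime .
```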
ℹ️ Source: UC15: Modeling conformance test results on data quality
One way of expressing data quality is to state whether or not a given dataset conforms to a given quality standard / benchmark.
[VOCAB-DQV] allows data conformance with a reference quality standard / benchmark to be specified. However, this can model only one of the possible scenarios, i.e., when data are conformant.
[GeoDCAT-AP] provides an alternative and extended way of expressing "conformance" by using [PROV-O], allowing the specification of additional information about conformance tests (when they were carried out, by whom, etc.), as well as different conformance test results (namely, conformant, not conformant, not evaluated).
An example of the [GeoDCAT-AP] [PROV-O]-based representation of conformance is provided by the following code snippet:
a:Dataset a dcat:Dataset .
a:TestingActivity a prov:Activity ;
prov:used a:Dataset ;
prov:generated a:TestResult ;
prov:qualifiedAssociation [
a prov:Association ;
# Here you can specify which agent ran the test, when, etc.
prov:hadPlan a:ConformanceTest
] .
# Conformance test result
a:TestResult a prov:Entity ;
dcterms:type <http://inspire.ec.europa.eu/metadata-codelist/DegreeOfConformity/conformant> .
a:ConformanceTest a prov:Plan ;
# Here you can specify additional information on the test
prov:wasDerivedFrom <http://data.europa.eu/eli/reg/2014/1312/oj> .
# Reference standard / specification
<http://data.europa.eu/eli/reg/2014/1312/oj> a prov:Entity, dct:Standard ;
dcterms:title "Commission Regulation (EU) No 1089/2010 of 23 November 2010 implementing Directive 2007/2/EC of the European Parliament and of the Council as regards interoperability of spatial data sets and services"@en ;
dcterms:issued "2010-11-23"^^xsd:date .
The example states that the reference dataset is conformant with the Commission Regulation (EU) No 1089/2010 of 23 November 2010 implementing Directive 2007/2/EC of the European Parliament and of the Council as regards interoperability of spatial data sets and services. Since this case corresponds to the scenario supported in [VOCAB-DQV], the [PROV-O]-based representation above is equivalent to:
a:Dataset a dcat:Dataset ;
dcterms:conformsTo <http://data.europa.eu/eli/reg/2014/1312/oj> .
# Reference standard / specification
<http://data.europa.eu/eli/reg/2014/1312/oj> a prov:Entity, dct:Standard ;
dcterms:title "Commission Regulation (EU) No 1089/2010 of 23 November 2010 implementing Directive 2007/2/EC of the European Parliament and of the Council as regards interoperability of spatial data sets and services"@en ;
dcterms:issued "2010-11-23"^^xsd:date .
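The simple dcterms:conformsTo form has no counterpart for the other two outcomes (not conformant, not evaluated). A sketch of a failed test in the [PROV-O]-based pattern is given below. It assumes that the INSPIRE degree-of-conformity code list publishes a notConformant value alongside conformant; the a:FailedTestingActivity and a:FailedTestResult names are hypothetical.

```turtle
a:FailedTestingActivity a prov:Activity ;
    prov:used a:Dataset ;
    prov:generated a:FailedTestResult ;
    prov:qualifiedAssociation [
        a prov:Association ;
        prov:hadPlan a:ConformanceTest
    ] .

# Test result: the dataset did not pass the conformance test
# (code list value assumed, see the note above)
a:FailedTestResult a prov:Entity ;
    dcterms:type <http://inspire.ec.europa.eu/metadata-codelist/DegreeOfConformity/notConformant> .
```

In this case no dcterms:conformsTo statement would be added to a:Dataset, since that property can only assert conformance.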