Provenance patterns

Nicholas Car edited this page Sep 3, 2018 · 7 revisions

This page is for proposals related to the provenance patterns milestone. These requirements need to be addressed.

Specifying the input dataset

ℹ️ Source: UC19: Guidance on the use of qualified forms

In most cases, the relationships between datasets and related resources (e.g., author, publisher, contact point, publications / documentation, input data, model(s) / software used to create the dataset) can be specified with simple, binary properties available from widely used vocabularies, such as [DCTerms] and [VOCAB-DCAT].

As an example, dcterms:source can be used to specify a relationship between a dataset (output:Dataset), and the dataset it was derived from (input:Dataset):

output:Dataset a dcat:Dataset ;
  dcterms:source input:Dataset .
  
input:Dataset a dcat:Dataset .

However, it may be necessary to provide additional information about a relationship (e.g., its temporal context), which requires a more sophisticated representation, similar to the "qualified" forms used in [PROV-O]. For instance, the previous example may be further detailed by saying that the output dataset is an anonymized version of the input dataset, and that the anonymization process started at time t and ended at time t′. By using [PROV-O], this information can be expressed as follows:

output:Dataset a dcat:Dataset ;
  prov:qualifiedDerivation [
    a prov:Derivation ;
    prov:entity input:Dataset ; 
    prov:hadActivity :data_anonymization 
] .

input:Dataset a dcat:Dataset .

# The process of anonymizing the data (load the data, process it, and generate the anonymized version)

:data_anonymization
  a prov:Activity ;
# When the process started  
  prov:startedAtTime  "2018-01-23T01:52:02Z"^^xsd:dateTime;
# When the process ended  
  prov:endedAtTime "2018-01-23T02:00:02Z"^^xsd:dateTime .
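The snippets above omit prefix declarations. As a minimal sketch, the same example can be written as a self-contained Turtle document. The dcat:, dcterms:, prov: and xsd: namespace IRIs are the standard ones; the output:, input: and default (:) namespaces are hypothetical placeholders chosen only for illustration. The sketch also keeps the simple binary statements next to the qualified form, and links the activity to its input and output with prov:used and prov:wasGeneratedBy; these are assumptions about how one might combine the two styles, not a requirement stated on this page.

```turtle
@prefix dcat:    <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix prov:    <http://www.w3.org/ns/prov#> .
@prefix xsd:     <http://www.w3.org/2001/XMLSchema#> .
# Hypothetical namespaces, for illustration only
@prefix output:  <http://example.org/output/> .
@prefix input:   <http://example.org/input/> .
@prefix :        <http://example.org/activity/> .

output:Dataset a dcat:Dataset ;
  # Simple binary forms, usable by consumers that ignore the qualified form
  dcterms:source input:Dataset ;
  prov:wasDerivedFrom input:Dataset ;
  # The activity that produced the output dataset
  prov:wasGeneratedBy :data_anonymization ;
  # Qualified form carrying the additional context
  prov:qualifiedDerivation [
    a prov:Derivation ;
    prov:entity input:Dataset ;
    prov:hadActivity :data_anonymization
  ] .

input:Dataset a dcat:Dataset .

:data_anonymization a prov:Activity ;
  # The activity consumed the input dataset
  prov:used input:Dataset ;
  prov:startedAtTime "2018-01-23T01:52:02Z"^^xsd:dateTime ;
  prov:endedAtTime   "2018-01-23T02:00:02Z"^^xsd:dateTime .
```

Keeping the binary properties alongside the qualified derivation means a client that only understands dcterms:source or prov:wasDerivedFrom still finds the relationship, while a PROV-aware client can follow prov:hadActivity to the timing details.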

Specifying the source catalog and the source metadata

ℹ️ Source: #132

In federated scenarios, metadata are harvested across catalogs, and possibly transformed from the original metadata schema to the one implemented in the target catalog.

For instance, the following example illustrates a scenario where a catalog (a:Catalogue) includes a metadata record (a:Record) that has been harvested from another catalog (the:SourceCatalogue). In the example, the original record corresponds to the:SourceRecord.

a:Catalogue a dcat:Catalog ;
  dcat:record a:Record .

the:SourceCatalogue a dcat:Catalog ;
  dcat:record the:SourceRecord .

a:Record a dcat:CatalogRecord ;
  dct:source the:SourceRecord .

the:SourceRecord a dcat:CatalogRecord .

The example can be further extended by specifying the metadata schema each of the two metadata records conforms to.

a:Catalogue a dcat:Catalog ;
  dcat:record a:Record .

the:SourceCatalogue a dcat:Catalog ;
  dcat:record the:SourceRecord .

a:Record a dcat:CatalogRecord ;
# The record conforms to the DCAT metadata schema
  dct:conformsTo dcat: ;
  dct:source the:SourceRecord .

the:SourceRecord a dcat:CatalogRecord ;
# The record conforms to the ISO-19115 metadata schema
  dct:conformsTo iso-19115: .

Since a:Record and the:SourceRecord are encoded in different metadata schemas, it is very likely that the:SourceRecord has been transformed from [ISO-19115-1] to [VOCAB-DCAT] by a software agent (e.g., the XSLT implementing the transformation rules defined in [GeoDCAT-AP]).

This information can be expressed by using [PROV-O], in a way similar to the example illustrated in the previous section:

a:Catalogue a dcat:Catalog ;
  dcat:record a:Record .

the:SourceCatalogue a dcat:Catalog ;
  dcat:record the:SourceRecord .

a:Record a dcat:CatalogRecord ;
  dct:conformsTo dcat: ;
  prov:qualifiedDerivation [
    a prov:Derivation ;
    prov:entity the:SourceRecord ; 
    prov:hadActivity :data_transformation 
] .

the:SourceRecord a dcat:CatalogRecord ;
  dct:conformsTo iso-19115: .

# The process of transforming the metadata record (load the data, process it, and generate the transformed version)

:data_transformation
  a prov:Activity ;
  prov:wasAssociatedWith :GeoDCAT-AP-XSLT .

# The software agent used for running the transformation

:GeoDCAT-AP-XSLT a prov:SoftwareAgent .
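
The transformation activity can also carry a qualified association, which separates the agent that ran the transformation from the rules it followed. The following is a sketch under stated assumptions: the a:, the: and default (:) namespaces are hypothetical, and :GeoDCAT-AP-rules is a hypothetical resource standing for the transformation rules; it is not an identifier defined by [GeoDCAT-AP].

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
# Hypothetical namespaces, for illustration only
@prefix a:    <http://example.org/a/> .
@prefix the:  <http://example.org/the/> .
@prefix :     <http://example.org/process/> .

a:Record a dcat:CatalogRecord ;
  # Binary shortcuts for the derivation
  prov:wasDerivedFrom the:SourceRecord ;
  prov:wasGeneratedBy :data_transformation .

the:SourceRecord a dcat:CatalogRecord .

:data_transformation a prov:Activity ;
  # The activity consumed the source record
  prov:used the:SourceRecord ;
  prov:wasAssociatedWith :GeoDCAT-AP-XSLT ;
  prov:qualifiedAssociation [
    a prov:Association ;
    # The agent that ran the transformation
    prov:agent :GeoDCAT-AP-XSLT ;
    # Hypothetical resource standing for the transformation rules
    prov:hadPlan :GeoDCAT-AP-rules
  ] .

:GeoDCAT-AP-XSLT a prov:SoftwareAgent .

:GeoDCAT-AP-rules a prov:Plan .
```

This mirrors the agent / plan split used for conformance tests later on this page, so both patterns can be queried in the same way.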

Specifying conformance test results on data quality

ℹ️ Source: UC15: Modeling conformance test results on data quality

One way of expressing data quality is to state whether a given dataset is or is not conformant with a given quality standard / benchmark.

[VOCAB-DQV] allows data conformance with a reference quality standard / benchmark to be specified. However, this can model only one of the possible scenarios, i.e., when data are conformant.

[GeoDCAT-AP] provides an alternative and extended way of expressing "conformance" by using [PROV-O], allowing the specification of additional information about conformance tests (when they were carried out, by whom, etc.), as well as different conformance test results (namely, conformant, not conformant, not evaluated).

An example of the [GeoDCAT-AP] [PROV-O]-based representation of conformance is provided by the following code snippet:

a:Dataset a dcat:Dataset .

a:TestingActivity a prov:Activity ;
  prov:used a:Dataset ;
  prov:generated a:TestResult ;
  prov:qualifiedAssociation [ 
    a prov:Association ;
    # Here you can specify which agent performed the test, when, etc.
    prov:hadPlan a:ConformanceTest 
  ] .

# Conformance test result
a:TestResult a prov:Entity ;
  dcterms:type <http://inspire.ec.europa.eu/metadata-codelist/DegreeOfConformity/conformant> .

a:ConformanceTest a prov:Plan ;
  # Here you can specify additional information on the test
  prov:wasDerivedFrom <http://data.europa.eu/eli/reg/2014/1312/oj> .

# Reference standard / specification
<http://data.europa.eu/eli/reg/2014/1312/oj> a prov:Entity, dct:Standard ;
  dcterms:title "Commission Regulation (EU) No 1089/2010 of 23 November 2010 implementing Directive 2007/2/EC of the European Parliament and of the Council as regards interoperability of spatial data sets and services"@en ;
  dcterms:issued "2010-11-23"^^xsd:date .

The example states that the reference dataset is conformant with the Commission Regulation (EU) No 1089/2010 of 23 November 2010 implementing Directive 2007/2/EC of the European Parliament and of the Council as regards interoperability of spatial data sets and services. Since this case corresponds to the scenario supported in [VOCAB-DQV], the [PROV-O]-based representation above is equivalent to:

a:Dataset a dcat:Dataset ;
  dcterms:conformsTo <http://data.europa.eu/eli/reg/2014/1312/oj> .
# Reference standard / specification
<http://data.europa.eu/eli/reg/2014/1312/oj> a prov:Entity, dct:Standard ;
  dcterms:title "Commission Regulation (EU) No 1089/2010 of 23 November 2010 implementing Directive 2007/2/EC of the European Parliament and of the Council as regards interoperability of spatial data sets and services"@en ;
  dcterms:issued "2010-11-23"^^xsd:date .
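
The simple dcterms:conformsTo form has no counterpart for a failed or pending test, so those cases need the [PROV-O]-based pattern. As a sketch, a non-conformant result could be recorded by changing only the result type, here also naming the agent and the time of the test. The a: namespace, the a:Tester agent and the timestamp are hypothetical values added for illustration; the notConformant code is taken from the same INSPIRE DegreeOfConformity code list used above.

```turtle
@prefix dcat:    <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix prov:    <http://www.w3.org/ns/prov#> .
@prefix xsd:     <http://www.w3.org/2001/XMLSchema#> .
# Hypothetical namespace, for illustration only
@prefix a:       <http://example.org/a/> .

a:Dataset a dcat:Dataset .

a:TestingActivity a prov:Activity ;
  prov:used a:Dataset ;
  prov:generated a:TestResult ;
  # Hypothetical time at which the test was run
  prov:startedAtTime "2018-09-03T10:00:00Z"^^xsd:dateTime ;
  prov:wasAssociatedWith a:Tester ;
  prov:qualifiedAssociation [
    a prov:Association ;
    # Hypothetical agent that performed the test
    prov:agent a:Tester ;
    prov:hadPlan a:ConformanceTest
  ] .

a:Tester a prov:Agent .

# Conformance test result: the dataset did not pass
a:TestResult a prov:Entity ;
  prov:wasGeneratedBy a:TestingActivity ;
  dcterms:type <http://inspire.ec.europa.eu/metadata-codelist/DegreeOfConformity/notConformant> .

a:ConformanceTest a prov:Plan ;
  prov:wasDerivedFrom <http://data.europa.eu/eli/reg/2014/1312/oj> .
```

Replacing notConformant with notEvaluated in the dcterms:type value would cover the third scenario in the same way.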