Joachim Neubert edited this page Apr 9, 2016 · 16 revisions

Using skos-history

This tutorial is intended for you if you want to compare two or more versions of a SKOS vocabulary, whether as the publisher of the vocabulary or as an independent user of it. This includes creating a "version store" which can be queried with SPARQL.

If you just want to learn about the skos-history approach and experiment with queries on version differences, you may use the public SPARQL endpoint with a pre-loaded version store and the accompanying queries.

Requirements

  • A SPARQL 1.1 compliant triple store, accessible in read/write mode. (A tutorial on setting up a Jena Fuseki 2 triple store is available here.)
  • For the data load script: an environment for executing bash scripts (any Linux should do; Cygwin may work).

Creating the version store

The steps to set up a SPARQL triple store for a dataset differ from implementation to implementation, and are here considered out of scope (for Fuseki see the docs).

When the triple store with an empty dataset is functional, the version data can be loaded. This is done entirely through the SPARQL 1.1 HTTP protocols for query and update and the graph store protocol. The bash script that performs the loading, creates the version diffs, and builds the version history graph with the interlinking triples is available as load_versions.sh (source). The load script creates a "version store" which implements the basic concepts of skos-history.

Config file

The load script requires a dataset-specific config file, which describes where the version files can be found and which endpoints can be used for loading and querying the data. As an example, the yso.config (source) for the General Finnish Ontology (YSO) is given here:

DATASET=yso
BASEDIR=/home/sinpessa/tunk/skos-history-yso
FILENAME=yso-skos.ttl
VERSIONS=(20150102 20150128 20150129)
SCHEMEURI='http://www.yso.fi/onto/yso/'

ENDPOINT=http://sparql.dev.finto.fi/skos-history
PUT_URI=$ENDPOINT/data
UPDATE_URI=$ENDPOINT/update
QUERY_URI=$ENDPOINT/sparql

The file is sourced (executed as bash code) by the load script. The variables are defined and used as follows:

Variable | Use
--- | ---
DATASET | A short name for the SKOS dataset.
BASEDIR | Base directory for the location of the version files to load.
FILENAME | File name for all version files. (Currently, local files in Turtle syntax are expected. See an example here for automated download and conversion of RDF/XML files from the web.)
VERSIONS | List of version identifiers, in order from the oldest version to the current one. The identifiers must not include white space; they may be version numbers following any convention, or may be derived from dates. The load script expects each version file at the full path $BASEDIR/$VERSION/$FILENAME. The version identifiers are also used as part of the version and delta graph names. (For an example of how the version identifiers can be obtained dynamically, see here.)
SCHEMEURI | The URI of the skos:ConceptScheme for the dataset. The URI for the version history set is constructed as $SCHEMEURI/version, and is used as the base URI for version and delta graph names.
ENDPOINT | Introduced here as a base URI within the config file (not used by the load script).
UPDATE_URI | URI for a SPARQL Update endpoint.
PUT_URI | URI for an endpoint supporting the SPARQL Graph Store protocol.
QUERY_URI | URI for a SPARQL Query endpoint. $QUERY_URI may be publicly accessible, whereas that would not be recommended for $UPDATE_URI and $PUT_URI.
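
The path convention described for VERSIONS can be checked before a load is attempted. The following is a minimal sketch and not part of load_versions.sh: the function names version_files and missing_files are hypothetical, and the sample values are copied from the yso.config example.

```shell
#!/bin/sh
# Hypothetical helpers, not part of load_versions.sh.

# Usage: version_files BASEDIR FILENAME VERSION...
# Prints the expected path BASEDIR/VERSION/FILENAME for each version.
version_files() (
  basedir=$1
  filename=$2
  shift 2
  for version in "$@"; do
    printf '%s/%s/%s\n' "$basedir" "$version" "$filename"
  done
)

# Usage: missing_files BASEDIR FILENAME VERSION...
# Prints one line for every expected version file that does not exist.
missing_files() (
  version_files "$@" | while IFS= read -r path; do
    [ -f "$path" ] || printf 'missing: %s\n' "$path"
  done
)

# Sample values copied from the yso.config example.
version_files /home/sinpessa/tunk/skos-history-yso yso-skos.ttl \
  20150102 20150128 20150129
```

With the config file sourced in bash, the same call could be written as version_files "$BASEDIR" "$FILENAME" "${VERSIONS[@]}".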

The endpoint URIs in the example given above follow the Fuseki conventions; for Sesame, see an example here.

Further config files for other datasets are available as examples.

Actual loading

After creating the dataset-specific config file, it should be possible to load the data and construct the version store by calling:

./load_versions.sh -f {configfile}

The load script computes deltas between every consecutive version (as determined by the sequence of identifiers in $VERSIONS), and additionally deltas between every version and the current one.
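
To see which deltas this rule implies for a given version list, the pairs can be enumerated. The sketch below is an illustration only and is not taken from load_versions.sh: the function name delta_pairs is hypothetical, and printing an overlapping pair only once is an assumption of this sketch (how the load script itself handles that overlap is not stated here).

```shell
#!/bin/sh
# Hypothetical helper, not part of load_versions.sh.
# Usage: delta_pairs VERSION...   (oldest first, current version last)
# Prints one "OLD NEW" pair per line: first every consecutive pair, then every
# older version paired with the current one. Repeated pairs are printed once.
delta_pairs() (
  [ "$#" -ge 2 ] || exit 0          # fewer than two versions: no deltas
  for current in "$@"; do :; done   # after the loop, holds the last argument
  {
    prev=
    for version in "$@"; do
      [ -z "$prev" ] || printf '%s %s\n' "$prev" "$version"
      prev=$version
    done
    for version in "$@"; do
      [ "$version" = "$current" ] || printf '%s %s\n' "$version" "$current"
    done
  } | awk '!seen[$0]++'             # drop repeats, keep first-occurrence order
)

# Version identifiers copied from the yso.config example.
delta_pairs 20150102 20150128 20150129
```

For the three YSO versions this yields three pairs: the two consecutive ones and 20150102 paired with 20150129. The pair 20150128 and 20150129 qualifies under both rules and is listed once.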

See the querying section of this tutorial on how to inspect the created data structures.

Additional adaptions

The ISO25964/SKOS and dataset versioning approach suggests a dc:identifier and dc:date property for each version history record. The load script tries to extract (and sometimes fix) this information from the loaded version data, more precisely, from the properties of the skos:ConceptScheme given by $SCHEMEURI. It already covers diverse properties used in several vocabularies (and in different versions of these vocabularies) in the wild. If you extend the corresponding section of the script (starting with the comment "add triples to the version history graph") in order to cover the property usage in your vocabulary, please consider contributing the code.

Querying the version store

With the version store and its metadata structures in place, it can be queried in a flexible and generic way via the SPARQL 1.1 Query Language.

An interactive environment

For the example queries of the skos-history project, an environment has been set up which allows you to load queries from GitHub and execute them with one click, and also to modify them in an IDE-like environment for further exploration. This is triggered by the links to the example queries, which normally execute the query on an example endpoint with version data from the STW Thesaurus for Economics. With an additional request argument, &endpoint=$QUERY_URI, the queries can be executed against the version store built in the previous step.

The SPARQL Lab environment allows for overriding the VALUES parameters of a query by request arguments with the same name. So if &language=de is added to the request, a default value "en" for a VALUES parameter ?language in the query code is replaced with "de", before the query is executed.
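
Such request arguments can be appended to a query link by hand or with a small helper. The following is a rough sketch under stated assumptions: the function name lab_link is hypothetical, the example.org addresses are placeholders rather than real SPARQL Lab links, and the base link is assumed to already carry a query string, so that every addition can be joined with "&". No percent-encoding is attempted.

```shell
#!/bin/sh
# Hypothetical helper: append request arguments to a SPARQL Lab query link.
# Usage: lab_link BASE_LINK ENDPOINT [NAME=VALUE]...
# Assumes BASE_LINK already contains a query string, so each addition is
# joined with "&". Values are appended as given, without percent-encoding.
lab_link() (
  link="$1&endpoint=$2"
  shift 2
  for arg in "$@"; do
    link="$link&$arg"
  done
  printf '%s\n' "$link"
)

# Placeholder base link; the endpoint is the QUERY_URI of the yso.config example.
lab_link 'https://example.org/sparql-lab/?queryRef=https://example.org/added_concepts.rq' \
  'http://sparql.dev.finto.fi/skos-history/sparql' language=de
```

Each additional NAME=VALUE argument corresponds to one VALUES parameter of the query that should be overridden.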

(For the curious: The SPARQL Lab environment is provided by ZBW. It is based on the YASGUI libraries, and extended to load queries from the web and to inject parameters into queries through HTTP request arguments. The SPARQL Lab code is publicly available.)

The example queries

The example queries cover the exploration of the version store (version_overview, version_graph, service_graph), basic change information (added_concepts, deleted_concepts, changed_notations), and more subtle changes such as concept splits (labels_moved_to_added_concepts). There are also queries for aggregated values or the history of single concepts across multiple versions (concept_deltas).

These queries are intended to work with every SKOS vocabulary. Please feel free to experiment and possibly improve or adapt them to the specifics of the vocabulary at hand. If you store the adapted queries on the web (e.g., on GitHub), they can be loaded into the SPARQL Lab environment by URI via the &queryRef= parameter.

Dataset-specific queries

A set of STW-specific queries, which is in active use for generating the change reports on the production STW site, may provide further examples and suggestions on how vocabulary-specific features (such as the subject category system of STW) can be exploited to provide improved change information.

Some boilerplate statements to begin with

Maybe you want to query for something completely different. We found it useful to start queries with a fairly standardized block of statements, which bind some variables for later use:

  GRAPH ?versionHistoryGraph {
    # parameters
    VALUES ( ?versionHistoryGraph ?oldVersion ?newVersion ?language ) {
        ( undef undef undef "en" )
    }
    # get the current and the previous version as default versions
    ?versionset dsv:currentVersionRecord/xhv:prev/dc:identifier ?previousVersion .
    ?versionset dsv:currentVersionRecord/dc:identifier ?latestVersion .
    # select the versions to actually use
    BIND(coalesce(?oldVersion, ?previousVersion) AS ?oldVersionSelected)
    BIND(coalesce(?newVersion, ?latestVersion) AS ?newVersionSelected)
    # get the delta and via that the relevant graphs
    ?delta a sh:SchemeDelta ;
      sh:deltaFrom/dc:identifier ?oldVersionSelected ;
      sh:deltaTo/dc:identifier ?newVersionSelected ;
      sh:deltaFrom/sh:usingNamedGraph/sd:name ?oldVersionGraph ;
      sh:deltaTo/sh:usingNamedGraph/sd:name ?newVersionGraph .
    ?insertions a sh:SchemeDeltaInsertions ;
      dcterms:isPartOf ?delta ;
      sh:usingNamedGraph/sd:name ?insertionsGraph .
    ?deletions a sh:SchemeDeltaDeletions ;
      dcterms:isPartOf ?delta ;
      sh:usingNamedGraph/sd:name ?deletionsGraph .
  }

This gives you the relevant old and new version graphs as well as the insertions and deletions graphs to deal with, and also some other often-used variables defined in the VALUES clause.

Contributing custom queries

If you found skos-history useful for exploring your vocabulary in perhaps unprecedented ways, please consider contributing to the project. We would happily include queries tailored for a particular vocabulary on a separate page, or link to a location where such queries can be found on the web.