Inventory of existing extensions to SPARQL 1.1

Marcelo Barbieri edited this page Apr 21, 2024 · 30 revisions

An inventory of existing extensions

Eclipse RDF4J

less strict datatype restrictions in comparison operators

Example:

FILTER("1999"^^xsd:gYear < "2009-01-01T20:20:20Z"^^xsd:dateTime) => true (strict semantics is type error)

extended mathematics operations on date/time/duration datatypes

Examples:

"2013-11"^^xsd:gYearMonth + "P1Y1M"^^xsd:yearMonthDuration => "P2Y1M"^^xsd:yearMonthDuration
"12"^^xsd:integer * "P1Y"^^xsd:yearMonthDuration => "P12Y"^^xsd:yearMonthDuration

GeoSPARQL operators

See GeoSPARQL specs

full-text search extensions

Example:

?subj search:matches [
          search:query "search terms...";
          search:property my:property;
          search:score ?score;
          search:snippet ?snippet ] .

Apache Jena

ARQ extension functions

Functions from "Functions in ARQ" that have no SPARQL 1.1 equivalent:

afn:bnode(?x)
afn:localname(?x)
afn:namespace(?x)
afn:sprintf(format, v1, v2, ...)
afn:min(num1, num2)
afn:max(num1, num2)
afn:pi()
afn:e()
afn:sqrt(num)
afn:now()

XSD Datatypes

Jena provides the functions from "XPath and XQuery Functions and Operators 3.1" for the atomic (so not sequences), non-XML-related datatypes, except for the picture-string formatting operations (no contribution and no requests so far). It does provide an "afn:sprintf" with the more programmer-centric formatting strings.

Property Functions

Example:

?segment apf:strSplit (?s ", ")

This is covered by issue 6.

The list syntax is reused to become multiple arguments/results for the operation.

CONSTRUCT quads

https://jena.apache.org/documentation/query/construct-quad.html

CONSTRUCT {
    GRAPH :g { ?s :p ?o }
    :s ?p :o
} WHERE {  ... }

Generate JSON

https://jena.apache.org/documentation/query/generate-json-from-sparql.html

JSON {
  "author": ?author, 
  "title": ?title 
} WHERE { ... }

Dynamic function call

BIND(CALL(?x, ?y) AS ?z)

See issue #20

Statistics Aggregators

STDEV, STDEV_SAMP, STDEV_POP, VARIANCE, VAR_SAMP, VAR_POP following the SQL operations.

Dataset HTTP verbs

Fuseki provides for GET, PUT, POST on the whole dataset treated as quads.

Extended assignment

https://jena.apache.org/documentation/query/assignment.html

Jena supports LET, where the variable being assigned may already be set in the query. If it is set, LET tests whether the existing value is the same as the result of evaluating the expression; in this case it behaves like a FILTER with sameTerm.
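The behaviour described above can be sketched in a few lines of Python. This is only an illustration of the described semantics, not Jena's implementation: the names let_assign, Term and Solution are hypothetical, and RDF terms are modelled as (lexical form, datatype) tuples so that tuple equality stands in for sameTerm.

```python
from typing import Callable, Dict, Optional, Tuple

# Assumption: an RDF term is modelled as (lexical form, datatype);
# comparing these tuples with == stands in for sameTerm.
Term = Tuple[str, str]
Solution = Dict[str, Term]


def let_assign(
    solution: Solution,
    var: str,
    expression: Callable[[Solution], Term],
) -> Optional[Solution]:
    """Apply LET (var := expression) to one solution.

    Returns the (possibly extended) solution, or None if the solution
    is rejected because var is already bound to a different term.
    """
    value = expression(solution)
    if var not in solution:
        # Unbound variable: plain assignment, like BIND.
        extended = dict(solution)
        extended[var] = value
        return extended
    # Already bound: keep the solution only if the terms are identical.
    return solution if solution[var] == value else None


one = ("1", "xsd:integer")
two = ("2", "xsd:integer")

bound_fresh = let_assign({"x": one}, "y", lambda s: s["x"])
kept = let_assign({"x": one, "y": one}, "y", lambda s: s["x"])
rejected = let_assign({"x": one, "y": two}, "y", lambda s: s["x"])
```

Here bound_fresh gains a binding for y, kept is returned unchanged, and rejected is None, which corresponds to the solution being filtered out.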

GraphDB

See GraphDB documentation, especially Plugins (most are introduced with comprehensive use cases)

RDF-star and SPARQL-star

Functions

  • Functions Reference: mathematical, datetime, SPIN functions, RDF-star extensions, GeoSPARQL and GeoSPARQL extensions
  • Also supports RDF4J "extended mathematics operations" described above
  • JavaScript Functions

Full-Text-Search, Faceting, Text Analysis

  • Autocomplete for simple autocomplete queries
  • Connectors to Lucene, SOLR, Elastic that implement FTS, advanced ranking, limit/offset, snippet extraction (hit highlighting), facets (including hierarchical), aggregations, sub-aggregations
  • Text Mining to invoke any text mining software returning JSON. Examples with Spacy, Ontotext CES (concept extraction service), Google Cloud Natural Language API, Refinitiv API

Connectors, Virtualization

  • SQL Access over JDBC: exposes the result of SPARQL queries as SQL tables, which you can further query and combine with Apache Calcite to use in traditional BI tools
  • RDBMS Virtualization: translates SPARQL queries to SQL queries following an R2RML or OBDA mapping.
  • GraphDB-Mongo Integration: so you can store voluminous JSON in Mongo, and take only relevant parts of them as JSON-LD to GraphDB.
  • Kafka GraphDB connector: send selected changes from the RDF store to Kafka topics.
  • Internal Federation: faster and secure SPARQL Federation
  • FedX Federation: federation without having to explicitly say which endpoint should execute which part of the query

Geospatial

Graph Search

  • RDF Rank: like Page Rank but for RDF graphs. Very useful for ranking nodes if you don't have a more elaborate measure
  • Semantic Similarity Search based on text and graph (predication) vector embedding (distributional semantics)
  • Graph Path Search that uses SERVICE in two different ways:
    • (outer) To invoke the search algorithm (like Blazegraph and others do)
    • (inner) To specify the triple pattern for each step (where you can use SPARQL property paths or any other patterns)

For example, here is a search for actors related through films (the typical Six Degrees of Kevin Bacon problem):

PREFIX path: <http://www.ontotext.com/path#>
SELECT ?edge ?index ?path
WHERE {
    VALUES (?src ?dst) {
        ( <http://dbpedia.org/resource/Chris_Evans_(actor)> <http://dbpedia.org/resource/Chris_Hemsworth> )
    }
    SERVICE <http://www.ontotext.com/path#search> {
        <urn:path> path:findPath path:allPaths ;
                   path:sourceNode ?src ;
                   path:destinationNode ?dst ;
                   path:pathIndex ?path ;
                   path:minPathLength 2 ;
                   path:startNode ?start;
                   path:resultBinding ?edge ;
                   path:endNode ?end;
                   path:resultBindingIndex ?index .
        SERVICE <urn:path> {
            ?film a <http://dbpedia.org/ontology/Film> .
            ?film <http://dbpedia.org/property/starring> ?start .
            ?film <http://dbpedia.org/property/starring> ?end .
        }
    }
}

Reasoning, Proof, Explain

  • Reasoning: standard (from RDFS to OWL RL and QL) and custom
  • Dynamic operations on rulesets
  • Builtin graphs onto:implicit and onto:explicit return only inferred/explicit triples respectively. This is similar to SPARQL Entailment regimes but not compatible.
  • Delete optimization: inferred triples without support are retracted (no full re-infer is needed). Predicate onto:schemaTransaction is used to mark axiomatic (T-Box) triples to make this process more efficient.
  • sameAs optimization: sameAs-equivalent URLs are treated as a cluster, and combinatorial triple expansion is avoided. Graph onto:disable-sameAs controls whether such triples should be returned in result sets (whether the clustered URLs should be enumerated).
  • Explain Plan: Graph onto:explain returns a query plan, instead of actually executing the query
  • Change Tracking in the context of a transaction identified by a unique ID.
  • Data History and Versioning enables you to access past states of the database. Complements the SPOC index with D (transaction Datetime) and Insert/Delete flag
  • Provenance: Generation of inference closure from a specific named graph at query time.
  • Proof: Find out how a given statement has been derived by the inferencer.

Blazegraph

The documentation for Blazegraph extensions can be found on Blazegraph wiki.

Named Subqueries

Named subqueries let you pre-compute solution sets which may be used multiple times within your query. They are useful when you want to process some subset of your data in multiple ways within a single query. You may also have multiple named subqueries. Each named subquery result can be INCLUDEd into the query in one or more places. The solution sets will be stored on the native heap (HTree) if the analytic query mode is enabled.

SELECT ...
WITH {
	# Subquery goes here
} AS %NAME
WHERE {
	# Main query goes here
	INCLUDE %NAME
}

Named solution sets

Blazegraph supports persistent named solution sets, which can be created either with INSERT INTO ... SELECT syntax:

INSERT INTO %solutionSet1
SELECT ?product ?reviewer
WHERE {
          ?product a bsbm-inst:ProductType1 .
          ?review bsbm:reviewFor ?product ;
                  rev:reviewer ?reviewer .
          ?reviewer bsbm:country ?country .
}

or can be managed explicitly by using:

CREATE ( SILENT )? (GRAPH IRIref | SOLUTIONS %VARNAME ( QuadData )? )
DROP ( SILENT )? (GRAPH IRIref | DEFAULT | NAMED | ALL | GRAPHS | SOLUTIONS | SOLUTIONS %VARNAME)
CLEAR ( SILENT )? (GRAPH IRIref | DEFAULT | NAMED | ALL | GRAPHS | SOLUTIONS | SOLUTIONS %VARNAME)

Named solution sets can be used with the INCLUDE %solutionSet syntax described above under Named Subqueries.

RDF Statement Reification

Blazegraph supports an extended syntax that allows a statement to be used as the subject of other triples (RDF reification). Example:

@prefix : <http://bigdata.com> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dct:  <http://purl.org/dc/elements/1.1/> .

:bob foaf:name "Bob" .
<<:bob foaf:age 23>> dct:creator <http://example.com/crawlers#c1> ;
                     dct:source <http://example.net/homepage-listing.html> .

and the query using this:

PREFIX : <http://bigdata.com>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dct:  <http://purl.org/dc/elements/1.1/>

SELECT ?age ?src WHERE {
   ?bob foaf:name "Bob" .
   <<?bob foaf:age ?age>> dct:source ?src .
}

Full-text search

Blazegraph provides an integrated full text indexing and search facility.

Example:

SELECT ?subj ?score
 WHERE {
   ?lit bds:search "mike" .
   ?lit bds:relevance ?score .
   ?subj ?p ?lit .
 }

This is translated by query optimizer into a SERVICE clause:

SELECT ?subj ?score
 WHERE {
   SERVICE <http://www.bigdata.com/rdf/search#search> {
     ?lit bds:search "mike" .
     ?lit bds:relevance ?score .
   }
   ?subj ?p ?lit .
}

External FTS engines are supported too:

PREFIX fts: <http://www.bigdata.com/rdf/fts#>
SELECT ?res WHERE {
  ?res fts:search "Alice" .
  ?res fts:endpoint "http://localhost:1234/solr/blazegraph/select" .
}

Geospatial search

Geospatial datatypes can be queried using Blazegraph’s custom SERVICE extension.

Example:

SELECT * WHERE {
  SERVICE geo:search {
    ?event geo:search "inCircle" .
    ?event geo:searchDatatype geoliteral:lat-lon-time .
    ?event geo:predicate example:happened .
    ?event geo:spatialCircleCenter "48.13743#11.57549" .
    ?event geo:spatialCircleRadius "100" . # default unit: Kilometers
    ?event geo:timeStart "1356994800" .
    ?event geo:timeEnd "1388530799" .   # 31.12.2013, 23:59:59
  }
}

RDF GAS API

Blazegraph provides a set of algorithms for implementing graph traversals (https://wiki.blazegraph.com/wiki/index.php/RDF_GAS_API). Example of a BFS search in a graph:

PREFIX gas: <http://www.bigdata.com/rdf/gas#>
SELECT ?depth (count(?out) as ?cnt) {
  SERVICE gas:service {
     gas:program gas:gasClass "com.bigdata.rdf.graph.analytics.BFS" .
     gas:program gas:in <ip:/112.174.24.90> . # one or more times, specifies the initial frontier.
     gas:program gas:out ?out . # exactly once - will be bound to the visited vertices.
     gas:program gas:out1 ?depth . # exactly once - will be bound to the depth of the visited vertices.
     gas:program gas:maxIterations 4 . # optional limit on breadth first expansion.
     gas:program gas:maxVisited 2000 . # optional limit on the #of visited vertices.
  }
} 
group by ?depth
order by ?depth

Virtual Graphs

Graphs can be members of a virtual graph. Membership in the virtual graph is declared with triples:

:vg bd:virtualGraph :g1 .
:vg bd:virtualGraph :g2 .

and then can be used as:

FROM VIRTUAL GRAPH :vg
FROM NAMED VIRTUAL GRAPH :vg
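One way to picture this is as a rewrite of the dataset clause into one clause per member graph. The Python sketch below assumes that reading; the page itself does not spell out the semantics, and the helper names member_graphs and expand_dataset_clause are hypothetical rather than part of Blazegraph.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


def member_graphs(membership: Set[Triple], virtual_graph: str) -> List[str]:
    """Collect the graphs declared as members of virtual_graph."""
    return sorted(
        o for (s, p, o) in membership
        if s == virtual_graph and p == "bd:virtualGraph"
    )


def expand_dataset_clause(
    membership: Set[Triple], virtual_graph: str, named: bool = False
) -> List[str]:
    """Assumed expansion: one FROM / FROM NAMED clause per member graph."""
    keyword = "FROM NAMED" if named else "FROM"
    return [f"{keyword} {g}" for g in member_graphs(membership, virtual_graph)]


membership = {
    (":vg", "bd:virtualGraph", ":g1"),
    (":vg", "bd:virtualGraph", ":g2"),
}
default_clauses = expand_dataset_clause(membership, ":vg")
named_clauses = expand_dataset_clause(membership, ":vg", named=True)
```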

Truth management

Blazegraph has SPARQL Update syntax to control incremental truth maintenance and entailments:

DISABLE ENTAILMENTS;
ENABLE ENTAILMENTS;
CREATE ENTAILMENTS;
DROP ENTAILMENTS;

Query hints

Blazegraph supports query hints using magic triples in SPARQL queries. Query hints may be used to change the default behavior of the query plan generator, or the runtime evaluation of the compiled query plan.

Example:

SELECT ?x ?o
WHERE {

  # disable join order optimizer for this group graph pattern.
  hint:Query hint:optimizer "None" .

  ?x rdfs:label ?o .
  ?x rdf:type foaf:Person .
}

Hint scope can be: Query, SubQuery, Group, GroupAndSubGroups, Prior. See the docs for the full list of hints.

Kineo

Window Functions

Kineo implements a SQL-like syntax for window functions that allows queries implementing "limit per resource", moving averages, quantiles, etc. For example:

# 3 photos from each country
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?image ?country WHERE {
	?image a foaf:Image ;
		dcterms:coverage [ foaf:name ?country ; dcterms:type "Country" ] ;
		.
}
HAVING (ROW_NUMBER() OVER (PARTITION BY ?country) <= 3)
ORDER BY ?country

This is covered by issue 47.
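The "limit per resource" idea behind ROW_NUMBER() OVER (PARTITION BY ?country) <= 3 can be sketched outside SPARQL. The Python below is a minimal illustration, not Kineo's implementation: limit_per_partition and the sample rows are made up for the example, and rows are numbered in input order because the query's window has no ORDER BY of its own.

```python
from collections import defaultdict
from typing import Dict, List

Row = Dict[str, str]


def limit_per_partition(rows: List[Row], partition_key: str, limit: int) -> List[Row]:
    """Keep at most `limit` rows for each distinct value of partition_key."""
    row_number: Dict[str, int] = defaultdict(int)
    kept: List[Row] = []
    for row in rows:
        key = row[partition_key]
        row_number[key] += 1            # ROW_NUMBER() within this partition
        if row_number[key] <= limit:    # the HAVING condition
            kept.append(row)
    return kept


rows = [
    {"image": "img1", "country": "Japan"},
    {"image": "img2", "country": "Japan"},
    {"image": "img3", "country": "Chile"},
    {"image": "img4", "country": "Japan"},
    {"image": "img5", "country": "Japan"},
]
# Final ORDER BY ?country; sorted() is stable, so per-country order is kept.
result = sorted(limit_per_partition(rows, "country", 3), key=lambda r: r["country"])
```

The fourth Japanese image (img5) is dropped, while Chile keeps its single image.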

Comunica

Generalize SERVICE

By default, SERVICE can be used to delegate queries to other SPARQL endpoints. Comunica generalizes this behaviour by allowing different kinds of sources to be federated over, such as raw RDF documents.

SELECT *
WHERE {
  SERVICE <http://example.org/me.rdf> {
     ?s ?p ?o .
  } 
}

Related to issue 10.

Stardog

Path queries

Stardog's path queries are similar to SPARQL 1.1 property paths except that they return all intermediate nodes on each path.

Simple example: find all paths from :Alice to :Bob via :knows triples: PATHS START ?x = :Alice END ?y VIA :knows.

More complex path queries can return paths where each edge represents a complex connection between nodes in the graph, for example, when two actors co-star in a movie:

PATHS START ?x = :Kevin_Bacon END ?y = :Robert_Redford
VIA { ?movie a :Film ; :starring ?x , ?y  }

Stardog documentation
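The difference from property paths is that a path query reports the nodes along each path rather than just the endpoints. The Python sketch below illustrates this with a depth-first enumeration of simple paths. It is not Stardog's algorithm, and which paths Stardog actually returns (shortest only, all, and so on) is outside its scope. find_paths and the sample edges are hypothetical.

```python
from typing import Dict, List, Tuple


def find_paths(edges: List[Tuple[str, str]], start: str, end: str) -> List[List[str]]:
    """Enumerate simple paths (no repeated nodes) from start to end."""
    adjacency: Dict[str, List[str]] = {}
    for subject, obj in edges:
        adjacency.setdefault(subject, []).append(obj)

    paths: List[List[str]] = []

    def extend(path: List[str]) -> None:
        node = path[-1]
        if node == end and len(path) > 1:
            paths.append(path)          # record every node on the path
            return
        for nxt in adjacency.get(node, []):
            if nxt not in path:         # avoid cycles
                extend(path + [nxt])

    extend([start])
    return paths


# Each pair stands for one ":knows" triple.
knows = [
    (":Alice", ":Carol"),
    (":Carol", ":Bob"),
    (":Alice", ":Bob"),
    (":Bob", ":Alice"),
]
alice_to_bob = find_paths(knows, ":Alice", ":Bob")
```

A property path such as :Alice :knows+ :Bob would only confirm that :Bob is reachable; alice_to_bob additionally exposes :Carol as an intermediate node on one of the two paths.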

Full text search

Stardog supports special triple pattern syntax (inspired by Jena's LARQ) to integrate full-text search into SPARQL:

SELECT DISTINCT ?s ?score
WHERE {
?s ?p ?l.
(?l ?score) <tag:stardog:api:property:textMatch> 'mac'.
}

Stardog documentation

Geospatial queries

Stardog supports geospatial queries over data encoded using WGS 84 or the OGC’s GeoSPARQL vocabulary. Any RDF data stored in Stardog using one or both of these vocabularies will be automatically indexed for geospatial queries.

Stardog documentation

Query hints

Query hints in Stardog are encoded in comments using the #pragma <<hint>> syntax:

select ?s where {
  ?s :p ?o1 .
  {
    #pragma group.joins
    # these patterns will be joined first, before being joined with the other pattern
    ?s :p ?o2 .
    ?o1 :p ?o3 .
  }
}

In addition to controlling certain behaviour of the SPARQL optimiser, hints can also be used to change the semantics of queries; for example, it is possible to selectively turn off reasoning for parts of the query.

Stardog documentation

Graph templates in CONSTRUCT queries

Stardog supports the graph keyword in CONSTRUCT templates. Example (putting each subject into its own named graph):

CONSTRUCT { graph ?s { ?s ?p ?o } } where { ... }

Inputs and outputs of SERVICE patterns

Stardog supports use of services as functions which take inputs and return outputs, particularly, for machine learning:

SELECT * WHERE {
  graph spa:model {
      :myModel  spa:arguments (?director ?year ?studio) ;
                spa:predict ?predictedGenre .
  }

  :TheGodfather :directedBy ?director ;
                :year ?year ;
                :studio ?studio ;
                :genre ?originalGenre .
}

This query evaluates the BGP below and feeds values of ?director, ?year, and ?studio to an ML model which then computes the predicted genre of the movie and returns it via the ?predictedGenre variable. This is a deviation from the standard bottom-up SPARQL evaluation semantics according to which the graph spa:model {..} pattern could be evaluated independently.

Custom functions and aggregation functions

Both normal functions (used in FILTERs, projections, and BIND expressions) and aggregation functions can be written in any JVM-compatible language and dropped in a jar.

Interaction between property paths and named graphs

The query.pp.contexts server property controls how property paths interact with named graphs in the data. When set to true and the property path pattern is in the default scope (i.e. not inside a graph keyword), Stardog will check that paths do not span multiple named graphs (as per 18.1.7). Otherwise paths can span multiple graphs.

Virtuoso

Anytime queries

Return partial evaluation results if a time limit is reached. This works for aggregates, where it can be used to return, for example, the count of matching values found within the time limit. It is activated via the &timeout=xxxx request parameter in the SPARQL protocol. Partial results are indicated by the presence of certain HTTP headers. Documentation

Backquoted expressions in BGPs

In triple patterns, graphs, subjects, predicates and objects can be expressions. The beginning and end of such an expression are marked by backquotes. The semantics is trivial: any backquoted expression expr is replaced with a unique blank node _:x, and a FILTER (_:x = (expr)) is added to the context BGP.
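The replacement rule can be sketched as a purely textual rewrite. The Python below is an illustration of the rule as stated, not Virtuoso's compiler: rewrite_backquoted and the _:bqN blank node labels are hypothetical, and the regular expression assumes backquotes do not occur inside the expression itself.

```python
import re
from typing import List, Tuple


def rewrite_backquoted(triple_pattern: str) -> Tuple[str, List[str]]:
    """Replace each `expr` with a fresh blank node and emit a matching FILTER."""
    filters: List[str] = []

    def replace(match: "re.Match[str]") -> str:
        label = f"_:bq{len(filters)}"   # hypothetical unique blank node label
        filters.append(f"FILTER ({label} = ({match.group(1)}))")
        return label

    rewritten = re.sub(r"`([^`]+)`", replace, triple_pattern)
    return rewritten, filters


pattern, added_filters = rewrite_backquoted("?s :price `?net * 1.2` .")
```

After the call, pattern is "?s :price _:bq0 ." and added_filters holds the single filter "FILTER (_:bq0 = (?net * 1.2))".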

Backquoted expressions in constructor templates

Like backquoted expressions in BGPs, expressions are allowed in constructor templates. The order/count of calculations of these expressions is fully arbitrary.

Property variables

In any place where SPARQL allows the use of expressions or backquoted expressions, there may be expressions of the form ?var +> qNameOfProperty and ?var *> qNameOfProperty . This is syntactic sugar for accessing properties of subjects. Every ?var +> qNameOfProperty is replaced with a plain variable ?x whose name is composed from var and qNameOfProperty, and the group pattern related to the scope of ?x is extended with the triple pattern ?var qNameOfProperty ?x . It is extended this way only once per scope, regardless of the number of uses of the expression ?var +> qNameOfProperty inside the scope. Every ?var *> qNameOfProperty is replaced in a similar way, but the group pattern is extended with OPTIONAL { ?var qNameOfProperty ?x . } . The left-hand side of +> and *> can in turn be a "property variable", so +> and *> can be "chained". Property variables were especially useful in pre-SPARQL-1.0 and pre-SPARQL-1.1 times, i.e., before the property paths of SPARQL 1.1; they are still convenient when the query contains "interesting" joins and filters to find relevant subjects and "boring" retrieval of numerous property values of those subjects.
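The expansion can be sketched as a small text rewrite. The Python below illustrates the rule described above and is not Virtuoso's implementation: expand_property_variables is a hypothetical helper, and the scheme for composing the fresh variable name (joining the variable and the qname with underscores) is an assumption, since the exact naming is not given here.

```python
import re
from typing import List, Tuple

# ?var, then +> or *>, then a prefixed name such as foaf:name
PROPERTY_VARIABLE = re.compile(r"(\?\w+)\s*(\+>|\*>)\s*([A-Za-z_][\w-]*:[\w-]+)")


def expand_property_variables(expression: str) -> Tuple[str, List[str]]:
    """Rewrite property variables to plain variables plus extra patterns."""
    patterns: List[str] = []
    while True:
        match = PROPERTY_VARIABLE.search(expression)
        if match is None:
            return expression, patterns
        var, operator, qname = match.groups()
        # Assumed naming scheme for the plain variable ?x.
        fresh = var + "_" + re.sub(r"\W", "_", qname)
        triple = f"{var} {qname} {fresh} ."
        pattern = triple if operator == "+>" else f"OPTIONAL {{ {triple} }}"
        if pattern not in patterns:     # extend the scope only once
            patterns.append(pattern)
        # Replacing the leftmost match first makes chaining work left to right.
        expression = expression[:match.start()] + fresh + expression[match.end():]


ranged, ranged_patterns = expand_property_variables(
    "?person +> foaf:age > 18 && ?person +> foaf:age < 65"
)
chained, chained_patterns = expand_property_variables("?s +> foaf:knows *> foaf:name")
```

In the first call the property variable is used twice but contributes a single triple pattern; in the second, the chained form produces one required pattern followed by one OPTIONAL pattern.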

NOT FROM and NOT FROM NAMED

In the absence of FROM/FROM NAMED, Virtuoso uses all available graphs as the dataset. NOT FROM <iri> excludes a specific graph from the default graph of the dataset; NOT FROM NAMED <iri> does the same for named graphs.

OPTION list

Every triple pattern may get a list of compiler options as OPTION ( comma-delimited-list ) placed after the last field of the triple. Options are used to provide hints for the SQL optimizer or details about inference, free-text, geo/spatial, transitive execution etc. Similar lists of options are supported for subqueries (mostly SQL and transitive execution) and for FROM/FROM NAMED clauses (mostly details of downloading external documents on the fly). An option list can be placed after SERVICE service-iri without the OPTION keyword.

Compiler directives

At the very beginning of the query, a clause of the form define prefix:localname literal gives a server-specific directive. (Some of these directives may also appear as items of OPTION lists.)

ASSUME optimization hints

The clause ASSUME ( boolean-expr ) is allowed in any place where FILTER ( boolean-expr ) is allowed. Unlike FILTER, which must be checked to restrict the result set, an ASSUME expression is a "promise" from the application developer to the SPARQL optimizer. The developer promises that, at the time of query execution, the graphs of the dataset have such content that any solution will make the expression true. The behavior of the SPARQL processor is undefined on any data that may turn any ASSUME expression to false or error. ASSUME is not an assertion: in case of a false ASSUME, the query MAY fail but it does not HAVE TO. The SPARQL processor may silently ignore any or all ASSUME expressions.

External parameters

If a variable name begins with a colon, it is an external parameter. The prefix ?: or $: is stripped off by the SPARQL compiler and the rest of the name is treated, unchanged, as a name in the host language. As the host language may permit more characters in names, the part of the variable name after ?: need not match the syntax for SPARQL variables. For example, if a SPARQL query is part of an SQL query or SQL stored procedure, then ?:variable, ?::ODBC_parameter and ?:"variable with weird name" are all valid.

EXTRACT FROM CONSTRUCT

The clause EXTRACT { group-pattern-E } FROM CONSTRUCT { ctor-template-1 } WHERE { group-pattern-1 } UNION CONSTRUCT { ctor-template-2 } WHERE { group-pattern-2 } UNION... means: (1) create a new empty "private" graph (maybe a "virtual" one, partially or as a whole); (2) store the result of execution of every CONSTRUCT { ctor-template-i } WHERE { group-pattern-i } into that graph; (3) calculate the solutions matching group-pattern-E over that graph, these solutions are the outcome of the whole clause; (4) mark the graph for an eventual garbage collection. The use of graph group pattern in group-pattern-E is not permitted. In the list of CONSTRUCT-s, a keyword STORAGE is a shorthand for CONSTRUCT { ?s ?p ?o } WHERE { ?s ?p ?o }, i.e. the group-pattern-E should have access to the context dataset too.

RDF Views

The RDF storage may consist of a mix of "physical triples", stored as usual, and "virtual triples", composed on demand from relational tables or views. The mapping of relational data to RDF and back is described by a hierarchy of so-called "quad maps"; there is a rich DDL for that purpose. The SPARQL compiler chooses the appropriate quad maps automatically by analyzing the query, but the SPARQL syntax is also extended with the QUAD MAP list-of-IRIs-of-quad-maps-to-use { body-of-group-pattern } syntax.

Macro

The query may have a list of macro definitions between preamble and the body of the query.

DEFMACRO macro-iri ( parameter-names ) macro-body defines an expression macro if macro-body is an expression, or a group pattern macro if macro-body is a group pattern. Once defined, macro-iri ( parameter-values ) can be used in any place where an expression or a group pattern is permitted. The notation MACRO macro-iri ( parameter-values ) can also be used for macro invocation, to make things clear and syntactically unambiguous.

DEFMACRO macro-iri { subject predicate object } macro-body defines a macro that is instantiated when a triple pattern matches the subject predicate object template of the macro, i.e. when every constant in the template of the macro is present in the same position of the triple pattern. A typical use case is a "magic predicate", where subject and object are variables and the predicate is the only constant of the template, but there may also be "magic subjects" or magic combinations of fields, such as ?s rdf:type <my-magic-type-name> used to enumerate subjects matching some criterion. The DEFMACRO macro-iri GRAPH graph { subject predicate object } macro-body and DEFMACRO macro-iri DEFAULT GRAPH { subject predicate object } macro-body variants are for use cases where the graph is important.

A macro may refer to macros defined before it. Named libraries of macros can be created, imported into each other and, of course, used in queries.

OPT+

See Sijin Cheng and Olaf Hartig: OPT+: A Monotonic Alternative to OPTIONAL in SPARQL

AnzoGraph DB

Window Aggregate and Ranking Functions https://docs.cambridgesemantics.com/anzograph/v2.5/userdoc/system-window.htm

Some other extensions can be found here: https://docs.cambridgesemantics.com/anzograph/v2.5/userdoc/system.htm
