Inventory of existing extensions to SPARQL 1.1

Zachary Whitley edited this page Apr 8, 2019 · 30 revisions

An inventory of existing extensions

Eclipse RDF4J

less strict datatype restrictions in comparison operators

Example:

FILTER("1999"^^xsd:gYear < "2009-01-01T20:20:20Z"^^xsd:dateTime) => true (strict semantics is type error)

extended mathematics operations on date/time/duration datatypes

Examples:

"2013-11"^^xsd:gYearMonth + "P1Y1M"^^xsd:yearMonthDuration => "P2Y1M"^^xsd:yearMonthDuration
"12"^^xsd:integer * "P1Y"^^xsd:yearMonthDuration => "P12Y"^^xsd:yearMonthDuration

GeoSPARQL operators

See GeoSPARQL specs

full-text search extensions

Example:

?subj search:matches [
          search:query "search terms...";
          search:property my:property;
          search:score ?score;
          search:snippet ?snippet ] .

Apache Jena

XSD Datatypes

Jena provides the functions from "XPath and XQuery Functions and Operators 3.1" for the atomic (so not sequences), non-XML-related datatypes, except for the picture string formatting operations (no contribution and no requests so far). It does provide an "afn:sprintf" function that takes the more programmer-centric formatting strings.

Property Functions

Example:

?segment apf:strSplit (?s ", ")

This is covered by issue 6.

The list syntax is reused to become multiple arguments/results for the operation.
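
The one-to-many behaviour of such a property function can be sketched in Python. This is a rough model, not Jena's implementation: `str_split` and the dict-based solution rows are hypothetical names, each input solution is assumed to yield one output solution per segment, and the separator is treated as a literal string for simplicity, which may differ from how Jena interprets it.

```python
from typing import Dict, Iterable, Iterator

Solution = Dict[str, str]  # variable name -> bound value


def str_split(solutions: Iterable[Solution], source_var: str,
              separator: str, target_var: str) -> Iterator[Solution]:
    """Yield one extended solution per segment of the source string."""
    for row in solutions:
        for segment in row[source_var].split(separator):
            # Copy the row so the existing bindings are left untouched.
            yield {**row, target_var: segment}


rows = [{"s": "red, green, blue"}]
segments = [r["segment"] for r in str_split(rows, "s", ", ", "segment")]
print(segments)
```

The object list `(?s ", ")` supplies the arguments, and the subject `?segment` receives the results, one solution at a time.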

CONSTRUCT quads

https://jena.apache.org/documentation/query/construct-quad.html

CONSTRUCT {
    GRAPH :g { ?s :p ?o }
    :s ?p :o
} WHERE {  ... }
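
As a rough illustration of what the template above produces, the Python sketch below instantiates quad patterns for one solution. The names `TEMPLATE` and `instantiate` are hypothetical, `None` stands in for the default graph, and every variable is assumed to be bound (an unbound variable raises `KeyError` here).

```python
from typing import Dict, List, Optional, Tuple

# (graph, subject, predicate, object); graph None means the default graph.
Quad = Tuple[Optional[str], str, str, str]

# Template mirroring the CONSTRUCT query above.
TEMPLATE: List[Quad] = [
    (":g", "?s", ":p", "?o"),
    (None, ":s", "?p", ":o"),
]


def instantiate(template: List[Quad], solution: Dict[str, str]) -> List[Quad]:
    """Replace each ?variable in the template with its binding."""
    def bind(term: Optional[str]) -> Optional[str]:
        if term is not None and term.startswith("?"):
            return solution[term[1:]]
        return term

    return [tuple(bind(term) for term in quad) for quad in template]


quads = instantiate(TEMPLATE, {"s": ":a", "p": ":knows", "o": ":b"})
print(quads)
```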

Generate JSON

https://jena.apache.org/documentation/query/generate-json-from-sparql.html

JSON {
  "author": ?author, 
  "title": ?title 
} WHERE { ... }
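
The mapping from solutions to JSON can be sketched in Python. The sketch assumes the result is a JSON array with one object per solution; `rows_to_json` and the sample rows are made up for illustration.

```python
import json
from typing import Dict, List


def rows_to_json(rows: List[Dict[str, str]], template: Dict[str, str]) -> str:
    """Build one JSON object per solution; template maps JSON key -> variable."""
    objects = [{key: row[var] for key, var in template.items()} for row in rows]
    return json.dumps(objects, indent=2)


# Sample solutions (invented data); ?year is bound but not in the template.
rows = [
    {"author": "Author A", "title": "Title 1", "year": "2001"},
    {"author": "Author B", "title": "Title 2", "year": "2002"},
]
output = rows_to_json(rows, {"author": "author", "title": "title"})
print(output)
```

Only the variables named in the template appear in the output, much as only projected variables appear in a SELECT result.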

Dynamic function call

BIND(CALL(?x, ?y) AS ?z)

See issue #20

Statistics Aggregators

STDEV, STDEV_SAMP, STDEV_POP, VARIANCE, VAR_SAMP, VAR_POP following the SQL operations.
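
The difference between the sample (`_SAMP`) and population (`_POP`) forms can be checked with Python's `statistics` module. This only illustrates the arithmetic on an invented data set; it makes no claim about which form the unqualified STDEV and VARIANCE names select.

```python
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # mean is 5, squared deviations sum to 32

var_samp = statistics.variance(values)    # divides by n - 1
var_pop = statistics.pvariance(values)    # divides by n
stdev_samp = statistics.stdev(values)     # square root of var_samp
stdev_pop = statistics.pstdev(values)     # square root of var_pop

print(var_samp, var_pop, stdev_samp, stdev_pop)
```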

Dataset HTTP verbs

Fuseki provides for GET, PUT, POST on the whole dataset treated as quads.

GraphDB

GraphDB documentation

Enterprise Connectors

Connectors to Lucene, Solr, and Elasticsearch that implement full-text search (FTS), sorting, limit/offset, snippet extraction (hit highlighting), facets (including hierarchical), aggregations, and sub-aggregations.

Mongo Connector

The GraphDB-MongoDB integration lets you store voluminous JSON documents in MongoDB and bring only the relevant parts of them into GraphDB as JSON-LD.

Reasoning Control

Built-in graphs onto:implicit and onto:explicit return only inferred or only explicit triples, respectively. This is similar to SPARQL entailment regimes but not compatible with them.

Delete optimization: inferred triples without support are retracted (no full re-inference is needed). The predicate onto:schemaTransaction marks axiomatic (T-Box) triples to make this process more efficient.

sameAs optimization: sameAs-equivalent URLs are treated as a cluster, and combinatorial triple expansion is avoided. The graph onto:disable-sameAs controls whether such triples are returned in result sets (that is, whether the clustered URLs are enumerated).
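
The idea behind sameAs clustering can be sketched with a union-find structure in Python. This is an illustrative model under stated assumptions, not GraphDB's implementation: the class `SameAsClusters`, the example IRIs, and the choice of the lexicographically smallest IRI as cluster representative are all hypothetical.

```python
from typing import Dict


class SameAsClusters:
    """Union-find over IRIs; each cluster is represented by its smallest IRI."""

    def __init__(self) -> None:
        self._parent: Dict[str, str] = {}

    def find(self, iri: str) -> str:
        self._parent.setdefault(iri, iri)
        while self._parent[iri] != iri:
            # Path halving: point the node at its grandparent as we walk up.
            self._parent[iri] = self._parent[self._parent[iri]]
            iri = self._parent[iri]
        return iri

    def union(self, a: str, b: str) -> None:
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            # Attach the larger root under the smaller one.
            self._parent[max(root_a, root_b)] = min(root_a, root_b)


clusters = SameAsClusters()
clusters.union(":nyc", ":NewYork")
clusters.union(":NewYork", ":BigApple")

triples = [
    (":nyc", ":population", "8M"),
    (":BigApple", ":population", "8M"),
]
# Store each triple once per cluster instead of once per equivalent IRI.
stored = {(clusters.find(s), p, clusters.find(o)) for s, p, o in triples}
print(stored)
```

With three equivalent IRIs, a naive expansion would keep one copy of the statement per IRI; the clustered form keeps a single copy and enumerates the members only on request.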

Explain Plan

The graph onto:explain returns a query plan instead of actually executing the query.

Semantic Similarity

Semantic similarity searches based on text and graph (predication) vector embedding (distributional semantics)

Blazegraph

The documentation for Blazegraph extensions can be found on the Blazegraph wiki.

Named Subqueries

Named subqueries let you pre-compute solution sets which may be used multiple times within your query. They are useful when you want to process some subset of your data in multiple ways within a single query. You may also have multiple named subqueries. Each named subquery result can be INCLUDEd into the query in one or more places. The solution sets will be stored on the native heap (HTree) if the analytic query mode is enabled.

SELECT ...
WITH {
	# Subquery goes here
} AS %NAME
WHERE {
	# Main query goes here
	INCLUDE %NAME
}

Named solution sets

Blazegraph supports persistent named solution sets, which can be created either with INSERT INTO ... SELECT syntax:

INSERT INTO %solutionSet1
SELECT ?product ?reviewer
WHERE {
          ?product a bsbm-inst:ProductType1 .
          ?review bsbm:reviewFor ?product ;
                  rev:reviewer ?reviewer .
          ?reviewer bsbm:country ?country .
}

or can be managed explicitly by using:

CREATE ( SILENT )? (GRAPH IRIref | SOLUTIONS %VARNAME ( QuadData )? )
DROP ( SILENT )? (GRAPH IRIref | DEFAULT | NAMED | ALL | GRAPHS | SOLUTIONS | SOLUTIONS %VARNAME)
CLEAR ( SILENT )? (GRAPH IRIref | DEFAULT | NAMED | ALL | GRAPHS | SOLUTIONS | SOLUTIONS %VARNAME)

Named solution sets can be used with the INCLUDE %solutionSet syntax described above under Named Subqueries.

RDF Statement Reification

Blazegraph supports an extended syntax that lets a statement be used as the subject of other triples (RDF reification). Example:

@prefix : <http://bigdata.com> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dct:  <http://purl.org/dc/elements/1.1/> .

:bob foaf:name "Bob" .
<<:bob foaf:age 23>> dct:creator <http://example.com/crawlers#c1> ;
                     dct:source <http://example.net/homepage-listing.html> .

and the query using this:

PREFIX : <http://bigdata.com>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dct:  <http://purl.org/dc/elements/1.1/>

SELECT ?age ?src WHERE {
   ?bob foaf:name "Bob" .
   <<?bob foaf:age ?age>> dct:source ?src .
}

Full-text search

Blazegraph provides an integrated full text indexing and search facility.

Example:

SELECT ?subj ?score
 WHERE {
   ?lit bds:search "mike" .
   ?lit bds:relevance ?score .
   ?subj ?p ?lit .
 }

This is translated by the query optimizer into a SERVICE clause:

SELECT ?subj ?score
 WHERE {
   SERVICE <http://www.bigdata.com/rdf/search#search> {
     ?lit bds:search "mike" .
     ?lit bds:relevance ?score .
   }
   ?subj ?p ?lit .
}

External FTS engines are supported too:

PREFIX fts: <http://www.bigdata.com/rdf/fts#>
SELECT ?res WHERE {
  ?res fts:search "Alice" .
  ?res fts:endpoint "http://localhost:1234/solr/blazegraph/select" .
}

Geospatial search

Geospatial datatypes can be queried using Blazegraph’s custom SERVICE extension.

Example:

SELECT * WHERE {
  SERVICE geo:search {
    ?event geo:search "inCircle" .
    ?event geo:searchDatatype geoliteral:lat-lon-time .
    ?event geo:predicate example:happened .
    ?event geo:spatialCircleCenter "48.13743#11.57549" .
    ?event geo:spatialCircleRadius "100" . # default unit: Kilometers
    ?event geo:timeStart "1356994800" .
    ?event geo:timeEnd "1388530799" .   # 31.12.2013, 23:59:59
  }
}

RDF GAS API

Blazegraph provides a set of algorithms for implementing graph traversals (https://wiki.blazegraph.com/wiki/index.php/RDF_GAS_API). Example of a breadth-first search (BFS) over the graph:

PREFIX gas: <http://www.bigdata.com/rdf/gas#>
SELECT ?depth (count(?out) as ?cnt) {
  SERVICE gas:service {
     gas:program gas:gasClass "com.bigdata.rdf.graph.analytics.BFS" .
     gas:program gas:in <ip:/112.174.24.90> . # one or more times, specifies the initial frontier.
     gas:program gas:out ?out . # exactly once - will be bound to the visited vertices.
     gas:program gas:out1 ?depth . # exactly once - will be bound to the depth of the visited vertices.
     gas:program gas:maxIterations 4 . # optional limit on breadth first expansion.
     gas:program gas:maxVisited 2000 . # optional limit on the #of visited vertices.
  }
} 
group by ?depth
order by ?depth
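
What the query computes can be sketched as a plain breadth-first search in Python. This is only a model of the idea, not the GAS engine: `bfs_depths` and the toy graph are hypothetical, and the exact semantics of the two limits in Blazegraph are assumed rather than confirmed.

```python
from collections import Counter, deque
from typing import Dict, Iterable, Optional


def bfs_depths(edges: Dict[str, Iterable[str]], frontier: Iterable[str],
               max_iterations: Optional[int] = None,
               max_visited: Optional[int] = None) -> Dict[str, int]:
    """Return the BFS depth of every vertex reached from the initial frontier."""
    depths = {vertex: 0 for vertex in frontier}
    queue = deque(depths)
    while queue:
        vertex = queue.popleft()
        depth = depths[vertex]
        if max_iterations is not None and depth >= max_iterations:
            continue  # do not expand beyond the iteration limit
        for neighbour in edges.get(vertex, ()):
            if neighbour in depths:
                continue  # already visited at an equal or smaller depth
            if max_visited is not None and len(depths) >= max_visited:
                return depths  # stop once the visit budget is used up
            depths[neighbour] = depth + 1
            queue.append(neighbour)
    return depths


graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"]}
depths = bfs_depths(graph, ["a"], max_iterations=4, max_visited=2000)
per_depth = Counter(depths.values())  # plays the role of GROUP BY ?depth
print(sorted(per_depth.items()))
```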

Virtual Graphs

Graphs can be members of a virtual graph. Membership in the virtual graph can be declared with triples:

:vg bd:virtualGraph :g1
:vg bd:virtualGraph :g2

and then can be used as:

FROM VIRTUAL GRAPH :vg
FROM NAMED VIRTUAL GRAPH :vg
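
One way to picture FROM VIRTUAL GRAPH is as the merge of the member graphs. The Python sketch below assumes exactly that; the graph contents, the `membership` table, and `resolve_virtual_graph` are hypothetical and say nothing about how Blazegraph evaluates the clause internally.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

named_graphs: Dict[str, Set[Triple]] = {
    ":g1": {(":a", ":p", ":b")},
    ":g2": {(":b", ":p", ":c")},
    ":g3": {(":x", ":p", ":y")},  # not a member of :vg
}

# Mirrors the bd:virtualGraph triples above.
membership: Dict[str, List[str]] = {":vg": [":g1", ":g2"]}


def resolve_virtual_graph(virtual_graph: str) -> Set[Triple]:
    """Merge the triples of every graph that belongs to the virtual graph."""
    merged: Set[Triple] = set()
    for graph_name in membership[virtual_graph]:
        merged |= named_graphs[graph_name]
    return merged


default_graph = resolve_virtual_graph(":vg")
print(sorted(default_graph))
```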

Truth management

Blazegraph has SPARQL Update syntax to control incremental truth maintenance and entailments:

DISABLE ENTAILMENTS;
ENABLE ENTAILMENTS;
CREATE ENTAILMENTS;
DROP ENTAILMENTS;

Query hints

Blazegraph supports query hints using magic triples in SPARQL queries. Query hints may be used to change the default behavior of the query plan generator, or the runtime evaluation of the compiled query plan.

Example:

SELECT ?x ?o
WHERE {

  # disable join order optimizer for this group graph pattern.
  hint:Query hint:optimizer "None" .

  ?x rdfs:label ?o .
  ?x rdf:type foaf:Person .
}

Hint scope can be: Query, SubQuery, Group, GroupAndSubGroups, Prior. See the docs for the full list of hints.

Kineo

Window Functions

Kineo implements a SQL-like syntax for window functions, which allows queries that implement "limit per resource", moving averages, quantiles, etc. For example:

# 3 photos from each country
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?image ?country WHERE {
	?image a foaf:Image ;
		dcterms:coverage [ foaf:name ?country ; dcterms:type "Country" ] ;
		.
}
HAVING (ROW_NUMBER() OVER (PARTITION BY ?country) <= 3)
ORDER BY ?country

This is covered by issue 47.
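
The "limit per resource" pattern in the query above can be sketched in Python. This is a minimal model, assuming ROW_NUMBER() numbers rows within each partition in their incoming order; `limit_per_partition` and the sample rows are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def limit_per_partition(rows: List[Dict[str, str]], key: str,
                        limit: int) -> List[Dict[str, str]]:
    """Keep at most `limit` rows for each distinct value of `key`."""
    row_number: Dict[str, int] = defaultdict(int)
    kept = []
    for row in rows:
        row_number[row[key]] += 1           # ROW_NUMBER() OVER (PARTITION BY key)
        if row_number[row[key]] <= limit:   # HAVING (... <= limit)
            kept.append(row)
    return sorted(kept, key=lambda r: r[key])  # ORDER BY key (stable sort)


rows = [
    {"image": "img1", "country": "Japan"},
    {"image": "img2", "country": "France"},
    {"image": "img3", "country": "Japan"},
    {"image": "img4", "country": "Japan"},
    {"image": "img5", "country": "Japan"},
]
result = limit_per_partition(rows, "country", 3)
print([r["image"] for r in result])
```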

Comunica

Generalize SERVICE

By default, the SERVICE clause can be used to delegate queries to other SPARQL endpoints. Comunica generalizes this behaviour by allowing different kinds of sources to be federated over, such as raw RDF documents.

SELECT *
WHERE {
  SERVICE <http://example.org/me.rdf> {
     ?s ?p ?o .
  } 
}

Related to issue 10.

Stardog

Path queries

Stardog documentation

Find paths between nodes in the RDF graph:

START ?s [ = <IRI> | <GRAPH PATTERN> ] END ?e [ = <IRI> | <GRAPH PATTERN> ]
VIA <GRAPH PATTERN> | <VAR> | <PATH>
[MAX LENGTH <int>]
[ORDER BY <condition>]
[OFFSET <int>]
[LIMIT <int>]
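
The START, END, and MAX LENGTH parts of this grammar can be pictured with a small path search in Python. The sketch assumes a single shortest path is wanted and that VIA is a plain edge relation; `shortest_path` and the toy graph are hypothetical and do not reflect how Stardog evaluates path queries.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional


def shortest_path(edges: Dict[str, Iterable[str]], start: str, end: str,
                  max_length: Optional[int] = None) -> Optional[List[str]]:
    """Return one shortest node path from start to end, or None if none exists."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == end:
            return path
        if max_length is not None and len(path) - 1 >= max_length:
            continue  # extending this path would exceed MAX LENGTH edges
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


graph = {":a": [":b", ":c"], ":b": [":d"], ":c": [":d"], ":d": [":e"]}
print(shortest_path(graph, ":a", ":e"))
print(shortest_path(graph, ":a", ":e", max_length=2))
```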

Full text search

Stardog documentation

GeoSPARQL

Stardog documentation

Query hints

Stardog documentation

Virtual graphs

Stardog documentation

Machine learning

Stardog documentation