Easier addition of support for custom datatypes to SPARQL endpoints #130

Open
JervenBolleman opened this issue Nov 6, 2020 · 16 comments

@JervenBolleman
Collaborator

JervenBolleman commented Nov 6, 2020

Why?

Currently, most stores require significant work to add new datatypes: anything beyond the built-in XSD types requires custom code. This makes it more difficult to use datatypes as a significant determinant of meaning.

This came up as part of a solution to issue #129.

Previous work

Matching of arbitrary data-types in the earlier Specifications

Proposed solution

An RDF file that lists datatypes and can be read by stores.

unit:K rdfs:subClassOf xsd:decimal ;
  rdfs:label "Kelvin"@en ;
  rdfs:comment "SI Unit for temperature" .

A file such as this adds datatype definitions and allows the mathematical functions of xsd:decimal to be called with these values, e.g. sameTerm("273.1"^^unit:K + "100"^^unit:K, "373.1"^^unit:K).

Allowing casts to, and arithmetic with, xsd numerics allows for more convenient math operations in queries.

{
 ?x ex:temperatureMeasurement ?tempInKelvin .
 FILTER(datatype(?tempInKelvin) = unit:K)

 # arithmetic with xsd numerics preserves the custom type
 BIND(?tempInKelvin - 273.15 AS ?firstStepToCelsius)
 FILTER(datatype(?firstStepToCelsius) = unit:K)

 # casting to xsd:decimal drops the custom type
 BIND(xsd:decimal(?firstStepToCelsius) AS ?tempInCelsiusDecimal)
 FILTER(datatype(?tempInCelsiusDecimal) = xsd:decimal)

 # a cast from one custom datatype to another must go through
 # an xsd numeric. (Custom conversion functions are a different issue)
 BIND(unit:degC(?tempInCelsiusDecimal) AS ?tempInDegreeCelsius)
 FILTER(datatype(?tempInDegreeCelsius) = unit:degC)
}
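
The typing rules this proposal implies can be sketched outside SPARQL. The following Python is only an illustration under assumptions: the `Literal` class and the `add` and `cast` helpers are hypothetical names, not part of any store or of the SPARQL specification. It models "arithmetic with an xsd numeric preserves the custom type" and "casts between custom types must go through xsd:decimal".

```python
from dataclasses import dataclass
from decimal import Decimal

XSD_DECIMAL = "xsd:decimal"


@dataclass(frozen=True)
class Literal:
    """Hypothetical typed literal: a lexical form plus a datatype name."""
    lexical: str
    datatype: str = XSD_DECIMAL

    @property
    def value(self) -> Decimal:
        return Decimal(self.lexical)


def add(left: Literal, right: Literal) -> Literal:
    """Add two literals; a custom type survives addition with itself or xsd:decimal."""
    custom = {t for t in (left.datatype, right.datatype) if t != XSD_DECIMAL}
    if len(custom) > 1:
        raise TypeError(f"cannot add {left.datatype} and {right.datatype}")
    result_type = custom.pop() if custom else XSD_DECIMAL
    return Literal(str(left.value + right.value), result_type)


def cast(literal: Literal, target: str) -> Literal:
    """Allow a cast only when the source or the target is xsd:decimal."""
    if XSD_DECIMAL not in (literal.datatype, target):
        raise TypeError(f"cast {literal.datatype} -> {target} must go through {XSD_DECIMAL}")
    return Literal(literal.lexical, target)


kelvin_sum = add(Literal("273.1", "unit:K"), Literal("100", "unit:K"))
as_decimal = cast(kelvin_sum, XSD_DECIMAL)
as_celsius_type = cast(as_decimal, "unit:degC")
```

Note that the cast only relabels the value; converting the number itself is the separate conversion-function question mentioned in the query comments.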

Sometimes custom datatypes should extend xsd:string instead.

iupac:DNA rdfs:subClassOf xsd:string ;
  rdfs:label "DNA"@en ;
  rdfs:comment "A representation of a DNA sequence encoded per the IUPAC spec" .

Considerations for backward compatibility

More data in the wild will be inconvenient to use in SPARQL 1.1 endpoints.

@kasei
Collaborator

kasei commented Nov 6, 2020

Would the use of subClassOf imply that this would be considered a derived type of decimal? I think it would be really strange to be able to do things like type-promote between Kelvin and decimal, or subtract an integer value from a Kelvin value.

@VladimirAlexiev
Contributor

How can I express the relation between Kelvin and Celsius?

Or between meter and cm?

--

LINDT shows an example of declaring a datatype and implementing it in JS (see #129).
The real LINDT is implemented in Java.

@ericprud
Member

ericprud commented Nov 7, 2020

Interestingly, I'm not sure °C and °K are more related than °C and °F.

supportedUnits

At a minimum, we could add a sd:supportedUnits property to the SPARQL Service Description spec. That would allow clever clients to tailor their queries to whatever the remote endpoint would support.

There's another dimension: which operators are supported. XPath provides names for e.g. lessThan, so a service description might look like:

[] a sd:Service ;
    sd:endpoint <http://www.example/sparql/> ;
    sd:supportedUnits # RDF representation of https://www.w3.org/TR/sparql11-query/#OperatorMapping
      [ sd:left ucum:m ; sd:function op:numericEquals ; sd:right ucum:ft_i ],
      [ sd:left ucum:ft_i ; sd:function op:numericEquals ; sd:right ucum:m ]
      # ...
    .

It's kinda tedious to have to write both of those, but maybe we can't assume symmetry.
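
A client-side check against such a description could look roughly like the sketch below. It assumes the service description has already been fetched and reduced to (left, function, right) triples; `OperatorMapping`, `SUPPORTED` and `is_supported` are hypothetical names, and `op:numericLessThan` is used only as an example of an operator the endpoint did not advertise. Symmetry is deliberately not assumed, so both directions have to be listed.

```python
from typing import FrozenSet, NamedTuple


class OperatorMapping(NamedTuple):
    """One advertised entry: datatype of the left operand, function, datatype of the right operand."""
    left: str
    function: str
    right: str


# Hypothetical result of parsing the sd:supportedUnits entries shown above.
SUPPORTED: FrozenSet[OperatorMapping] = frozenset({
    OperatorMapping("ucum:m", "op:numericEquals", "ucum:ft_i"),
    OperatorMapping("ucum:ft_i", "op:numericEquals", "ucum:m"),
})


def is_supported(left: str, function: str, right: str,
                 mappings: FrozenSet[OperatorMapping] = SUPPORTED) -> bool:
    """Return True only if this exact operand order was advertised."""
    return OperatorMapping(left, function, right) in mappings


can_compare = is_supported("ucum:m", "op:numericEquals", "ucum:ft_i")
can_order = is_supported("ucum:m", "op:numericLessThan", "ucum:ft_i")
```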

unitConversions

The above wouldn't enable clever servers to do automagic conversion. We could pick some base units, e.g. MKS, and have a linear function to capture the mapping a la:

[] a sd:Service ;
    sd:endpoint <http://www.example/sparql/> ;
    sd:unitConversions u:Length, u:Mass, u:Time
    .

and centrally maintain the mappings:

u:Length u:baseUnit ucum:m ;
  u:conversion u:Foot, u:Smoot . # ...
u:Mass u:baseUnit ucum:kg ;
  u:conversion u:Gram, u:Ton, u:LongTon, u:ShortTon, u:Tonne . # ......

u:Foot a u:conversion ; u:factor 0.3048 ; u:offset 0.0 .
u:Smoot a u:conversion ; u:factor 1.7 ; u:offset 0.0 .
...
u:Fahrenheit a u:conversion ; u:factor .555 ; u:offset -17.77 . # assuming offset follows factor.
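
As a rough sketch of how a server might apply these mappings, the Python below reads each entry as "base value = value × factor + offset", matching the "offset follows factor" comment. The `Conversion` class and the `CONVERSIONS` table are hypothetical, and the Fahrenheit entry uses the exact values 5/9 and -160/9 that the example rounds to .555 and -17.77 (which, judging by the numbers, target degrees Celsius).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Conversion:
    """Linear mapping from a unit to its base unit."""
    factor: float
    offset: float = 0.0

    def to_base(self, value: float) -> float:
        # Assumption: the offset is added after multiplying by the factor.
        return value * self.factor + self.offset

    def from_base(self, base_value: float) -> float:
        # Inverse of to_base under the same assumption.
        return (base_value - self.offset) / self.factor


# Hypothetical in-memory form of the centrally maintained mappings.
CONVERSIONS = {
    "u:Foot": Conversion(factor=0.3048),
    "u:Smoot": Conversion(factor=1.7),
    "u:Fahrenheit": Conversion(factor=5 / 9, offset=-160 / 9),
}

ten_feet_in_metres = CONVERSIONS["u:Foot"].to_base(10.0)
boiling_in_celsius = CONVERSIONS["u:Fahrenheit"].to_base(212.0)
```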

@JervenBolleman
Collaborator Author

@VladimirAlexiev @ericprud Easier support for conversion comes down to easier sharing of custom functions/named queries, for which there are already a few issues. I wanted to separate out subparts of the problem so we can discuss one facet at a time.

@kasei I am editing the issue to expand the thought behind rdfs:subClassOf

@JervenBolleman changed the title from "Easier addition of support for newdatatypes" to "Easier addition of support for custom datatypes to SPARQL endpoints" on Nov 8, 2020
@JervenBolleman added the "help wanted" (Extra attention is needed) label on Nov 8, 2020
@afs
Collaborator

afs commented Dec 4, 2020

Should the title be something like:

"Extend SPARQL Service Description to allow declaration of the supported datatypes"

?

@JervenBolleman
Collaborator Author

@afs no, that is not what I was going for. I was going for a declarative system to declare what properties and operators a new datatype has, i.e. to extend https://www.w3.org/TR/sparql11-query/#matchArbDT. For example, declare that a new datatype has a greater-than operator and how that works (in collaboration with issue #131), as well as how it can be cast/converted to a different datatype.

@VladimirAlexiev
Contributor

@JervenBolleman "declare what properties and operators a new datatype has"
@ericprud "RDF representation of https://www.w3.org/TR/sparql11-query/#OperatorMapping"

I agree these would be very useful features.


@maximelefrancois86 and @Antoine-Zimmermann have proposed:


The standards (esp OWL2) have a lot on datatypes:


Can we try to flesh out a list of requirements? Eg

  • subclassing to XSD datatypes
  • hooking to SPARQL Description
  • hooking to SPARQL Operator Mapping
  • API (WebIDL or other)
  • declarative description
  • a variety of implementations
  • web-fetchable implementation
  • datatype (data range) restrictions using facets
  • central register of datatypes

@jmkeil

jmkeil commented Feb 2, 2021

Hi. I like the idea and would like to add a requirement for consideration:

Requirement: Unambiguous definition of conversion values

In an evaluation of several unit ontologies, we identified multiple cases of wrong conversion values caused by mixing up the direction of factor and offset. These mistakes were made by the very people who defined the property. In a standard with wide application, this is even more critical. In fact, factor and offset allow four possible interpretations:

a = b × factor + offset
a = (b + offset) × factor
a × factor + offset = b
(a + offset) × factor = b
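
To make the ambiguity concrete, the sketch below solves each of the four readings for a, given the same b, factor and offset. The function name `interpretations` is hypothetical; the inputs are 212 (degrees Fahrenheit) with the exact Fahrenheit-to-Celsius values 5/9 and -160/9. Only the first reading yields the expected 100 degrees Celsius.

```python
from __future__ import annotations


def interpretations(b: float, factor: float, offset: float) -> dict[str, float]:
    """Solve each of the four possible readings of (factor, offset) for a."""
    return {
        "a = b * factor + offset": b * factor + offset,
        "a = (b + offset) * factor": (b + offset) * factor,
        "a * factor + offset = b": (b - offset) / factor,
        "(a + offset) * factor = b": b / factor - offset,
    }


results = interpretations(212.0, 5 / 9, -160 / 9)
for reading, a in results.items():
    print(f"{reading:28s} -> a = {a:.3f}")
```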

To take the example by @ericprud:

u:Fahrenheit a u:conversion ; u:factor .555 ; u:offset -17.77 . # assuming offset follows factor.

could be less ambiguously expressed e.g. in the following way:

u:Fahrenheit u:oneEquals    ".55555556"^^u:degreeCelsius ;
             u:zeroAt    "-17.77777778"^^u:degreeCelsius .

or

u:Fahrenheit u:oneEquals    .55555556 ;
             u:zeroAt    -17.77777778 ;
             u:of        u:degreeCelsius .

or similar.

@steveraysteveray

steveraysteveray commented Feb 2, 2021 via email

@JervenBolleman
Collaborator Author

Units are not the only datatypes to consider. One I would find very nice to have is conversion between DNA/RNA, plus forward/reverse strands etc., in the biological sphere.

@ericprud
Member

ericprud commented Feb 4, 2021

With linear numeric units, we can define oneEquals and zeroAt per @jmkeil's proposal, which allows a naive, generic engine to handle these with no unit-specific code. We won't achieve that for e.g. your example of converting thymine to uracil or mapping astronomical coordinates, but I think we can still leverage datatypes and operator mappings to advertise capabilities and perform rudimentary unit analysis.
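
A naive, generic engine of this kind might look like the following sketch. It assumes that oneEquals gives the size of one unit in a reference unit and that zeroAt gives the reference-unit value at which the unit reads zero. `UnitDef`, `UNITS` and `convert` are hypothetical names, `u:Kelvin` is added purely for illustration, and exact fractions stand in for the rounded values. Nothing in `convert` is specific to any one unit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnitDef:
    one_equals: float  # size of one unit, expressed in the reference unit
    zero_at: float     # reference-unit value at which this unit reads zero
    of: str            # the reference unit both numbers are expressed in


# Hypothetical registry; every entry is defined relative to degrees Celsius.
UNITS = {
    "u:degreeCelsius": UnitDef(1.0, 0.0, "u:degreeCelsius"),
    "u:Fahrenheit": UnitDef(5 / 9, -160 / 9, "u:degreeCelsius"),
    "u:Kelvin": UnitDef(1.0, -273.15, "u:degreeCelsius"),
}


def convert(value: float, source: str, target: str) -> float:
    """Convert an absolute value by going through the shared reference unit."""
    src, dst = UNITS[source], UNITS[target]
    if src.of != dst.of:
        raise ValueError(f"{source} and {target} have no common reference unit")
    reference = value * src.one_equals + src.zero_at
    return (reference - dst.zero_at) / dst.one_equals


freezing_in_kelvin = convert(32.0, "u:Fahrenheit", "u:Kelvin")
boiling_in_fahrenheit = convert(100.0, "u:degreeCelsius", "u:Fahrenheit")
```

Non-linear cases such as thymine-to-uracil mapping would still need dedicated functions.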

@jmkeil

jmkeil commented Feb 4, 2021

@steveraysteveray wrote: "Please take a look at this link https://github.com/qudt/qudt-public-repo/wiki/Support-for-measures-of-absolute-values-and-for-intervals-(differences)#converting-absolute-values for how I believe unit conversion with offsets should always be calculated. Your examples above seem to only have one offset value which I find confusing. Steve"

Yes, there is a difference between absolute values and intervals. But I don't see which information (conversion offset, conversion multiplier) is missing to do both. (I must confess - the property name "oneEquals" does not fit very well to absolute value conversion.) Of course, it would be good practice to define all units in reference to SI base units and I would expect an implementation to combine at least two conversion definitions for conversion between not directly connected units.

One thing that is missing is the information on when to apply which conversion (absolute or interval). One solution I could think of is the definition of two datatypes (e.g. u:degreeCelsiusAbsolute and u:degreeCelsiusInterval). However, this raises new problems for the definition of basic calculations:

  • absolute - absolute = interval
  • absolute + absolute = ?
  • absolute ± interval = absolute
  • interval ± interval = interval
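
These rules can be written down as a small type check. The sketch below is an assumption-laden illustration: `Kind` and `result_kind` are hypothetical names, "absolute + absolute" is rejected because the list leaves it open, and "interval ± absolute" is also rejected because the list does not cover it.

```python
from enum import Enum


class Kind(Enum):
    ABSOLUTE = "absolute"
    INTERVAL = "interval"


def result_kind(left: Kind, op: str, right: Kind) -> Kind:
    """Return the kind of `left op right`, or raise if the rules above do not define it."""
    if op not in ("+", "-"):
        raise ValueError(f"unsupported operator: {op}")
    if left is Kind.ABSOLUTE and right is Kind.ABSOLUTE:
        if op == "-":
            return Kind.INTERVAL
        raise TypeError("absolute + absolute is left undefined")
    if left is Kind.ABSOLUTE and right is Kind.INTERVAL:
        return Kind.ABSOLUTE
    if left is Kind.INTERVAL and right is Kind.INTERVAL:
        return Kind.INTERVAL
    # interval ± absolute is not in the list; rejecting it is an assumption of this sketch.
    raise TypeError("interval combined with absolute is not covered by the rules")


difference = result_kind(Kind.ABSOLUTE, "-", Kind.ABSOLUTE)
shifted = result_kind(Kind.ABSOLUTE, "+", Kind.INTERVAL)
```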

@steveraysteveray

Quoting from the QUDT wiki,

"To support this, the qudt:Quantity class has a property qudt:isDeltaQuantity. It is associated with the qudt:Quantity because the quantity describes the context of the measurement. qudt:isDeltaQuantity is a boolean property to record whether the measurement is an absolute value of the Quantity instance, or a delta (or difference) value. Setting isDeltaQuantity to "true" means the measurement is an interval. isDeltaQuantity set to "false" means the measurement is an absolute value. An application can then take the appropriate action, such as in unit conversion, etc.

It should be noted that in these cases, the unit is still the same unit on the same scale. There is nothing special about the unit."

@jmkeil

jmkeil commented Feb 5, 2021

Quoting from the QUDT wiki,

"To support this, the qudt:Quantity class has a property qudt:isDeltaQuantity. It is associated with the qudt:Quantity because the quantity describes the context of the measurement. qudt:isDeltaQuantity is a boolean property to record whether the measurement is an absolute value of the Quantity instance, or a delta (or difference) value. Setting isDeltaQuantity to "true" means the measurement is an interval. isDeltaQuantity set to "false" means the measurement is an absolute value. An application can then take the appropriate action, such as in unit conversion, etc.

It should be noted that in these cases, the unit is still the same unit on the same scale. There is nothing special about the unit."

That works if a quantity value is represented as an individual (an IRI or blank node) of a class with several properties. When it is represented as a literal (e.g. "37"^^u:degreeCelsius), adding further properties isn't an option.

@VladimirAlexiev
Contributor

@jmkeil @steveraysteveray @ericprud please make a separate issue for discussing conversions (and take a look at the LINDT issue here that's closely related).

Otherwise your valuable comments will be lost in this issue, which is about a different (though related) topic.

@VladimirAlexiev
Contributor

SciSPARQL is a specialized implementation that includes matrices and tensors and I think will pose strong requirements on this issue.
Eg see
https://ieeexplore.ieee.org/document/6313648 by Andrej Andrejev and Tore Risch.
Does anyone know how to contact them?

7 participants