SimpleIOHandler prints xsd datatype as an abbreviated string, which is subtly off #54

JervenBolleman · 2024-05-02T12:24:13Z

Dear BioPax developers,

On line 669 of the SimpleIOHandler the datatype print is slightly off.

Using a tool like rapper or riot to convert the RDF/XML output to N-Triples, we see the difference (example from rhea-db):

<http://biopax.rhea-db.org/level3/57967_stoichiometry_right_3249> 
  <http://www.biopax.org/release/biopax-level3.owl#stoichiometricCoefficient> 
     "1.0"^^<xsd:float> .

was printed but we expect

<http://biopax.rhea-db.org/level3/57967_stoichiometry_right_3249>
  <http://www.biopax.org/release/biopax-level3.owl#stoichiometricCoefficient> 
    "1.0"^^<http://www.w3.org/2001/XMLSchema#float> .

Quite a lot of tools auto-correct this difference downstream. But a tool that follows the RDF/XML spec strictly does not, and that changes SPARQL value comparisons from numeric comparison to string comparison.

We will open a PR with this fix soon.

Regards,
Jerven

IgorRodchenkov · 2024-05-03T13:19:01Z

Why is using xsd:float or xsd:string wrong in all those cases? I think it is correct, and older versions of Paxtools printed the long “<http://www.w3.org/2001/XMLSchema#float>” over and over, making BioPAX files larger. I suspect the problem is somewhere in the other tools that you use. Tools should understand different styles of RDF/XML syntax rather than expect a particular one. Also, as far as I know, Jena for example has bugs: it uses some odd rules rather than the standard to tell a valid local URI from a valid absolute URI, adds xml:base unnecessarily, or inserts a ‘#’ sign into an absolute URI, etc.

JervenBolleman · 2024-05-03T14:09:21Z

I think it is a bug. The other tools are compliant in how they interpret the string; this also includes RDFLib and RDF4J. It does not show up as a bug for most people because the value "xsd:float" is also an IRI, just not the IRI that one expects, "http://www.w3.org/2001/XMLSchema#float". This is because RDF/XML uses XML namespace declarations for IRI expansion, and that applies only to element and attribute names, not to attribute values (apart from resolution against xml:base, which does not apply here).
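The expansion behaviour described above can be checked with any namespace-aware XML parser. Below is a minimal sketch using Python's standard `xml.etree.ElementTree`. The sample document and the `http://example.org/s1` IRI are made up for illustration and are not taken from a real Paxtools file. The element and attribute names come back expanded, while the attribute value `xsd:float` is returned verbatim.

```python
import xml.etree.ElementTree as ET

# Illustrative RDF/XML fragment (assumption: modelled on the Rhea example,
# not copied from actual Paxtools output).
DOC = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
    xmlns:bp="http://www.biopax.org/release/biopax-level3.owl#">
  <bp:Stoichiometry rdf:about="http://example.org/s1">
    <bp:stoichiometricCoefficient rdf:datatype="xsd:float">1.0</bp:stoichiometricCoefficient>
  </bp:Stoichiometry>
</rdf:RDF>"""

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BP_NS = "http://www.biopax.org/release/biopax-level3.owl#"

root = ET.fromstring(DOC)
coeff = root.find(f"{{{BP_NS}}}Stoichiometry/{{{BP_NS}}}stoichiometricCoefficient")

# The element name was expanded using the xmlns:bp declaration.
element_name = coeff.tag
# The attribute *name* rdf:datatype is expanded too, but its *value* is
# plain text: the xmlns:xsd declaration has no effect on it.
datatype_value = coeff.get(f"{{{RDF_NS}}}datatype")

print(element_name)
print(datatype_value)
```

An RDF/XML parser then treats that verbatim value as the datatype IRI, which is why the declared `xsd` prefix does not help here.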

Now this only matters if someone actually uses RDF tools such as SPARQL engines, which not many people might, but we do.

So currently the full IRI of the datatype is just <xsd:float>, not <http://www.w3.org/2001/XMLSchema#float>. This is an unknown, non-standard datatype, which means SPARQL uses string comparison for ordering. Worse, when writing a SPARQL query, things that at first sight should match do not.
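To make the ordering point concrete, here is a small sketch in plain Python rather than SPARQL. It only mimics the difference between comparing lexical forms as strings and comparing them as numbers, and the three values are chosen purely for illustration.

```python
# Lexical forms of three coefficients (illustrative values only).
lexical_forms = ["10.0", "9.0", "1.5"]

# Ordering when the literals are compared as plain strings,
# as happens when the datatype is not recognised as numeric.
as_strings = sorted(lexical_forms)

# Ordering when the same literals are interpreted as floats.
as_floats = sorted(lexical_forms, key=float)

print(as_strings)  # ['1.5', '10.0', '9.0']
print(as_floats)   # ['1.5', '9.0', '10.0']
```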

For example, a SPARQL query like this

PREFIX xsd:<http://www.w3.org/2001/XMLSchema#>
PREFIX biopax3:<http://www.biopax.org/release/biopax-level3.owl#>
SELECT * 
WHERE {
 ?x biopax3:stoichiometricCoefficient "1.0"^^xsd:float .
}

returns no results with the current BioPAX file.
The query below would work

PREFIX biopax3:<http://www.biopax.org/release/biopax-level3.owl#>
SELECT * 
WHERE {
 ?x biopax3:stoichiometricCoefficient "1.0"^^<xsd:float> .
}

but that is only because <xsd:float> and xsd:float are two different IRIs, so the following returns false:

PREFIX xsd:<http://www.w3.org/2001/XMLSchema#>
SELECT ?same
WHERE { BIND((<xsd:float> = xsd:float) AS ?same) }
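The mismatch between the two queries comes down to prefix expansion. The sketch below uses a hypothetical `expand` helper, written here only for illustration and not part of any SPARQL library, to show what a prefixed name in a query resolves to next to what the file currently contains.

```python
def expand(prefixed_name: str, prefixes: dict[str, str]) -> str:
    """Expand a prefixed name such as 'xsd:float' the way a PREFIX declaration does.

    Hypothetical helper for illustration only.
    """
    prefix, _, local = prefixed_name.partition(":")
    return prefixes[prefix] + local


PREFIXES = {"xsd": "http://www.w3.org/2001/XMLSchema#"}

# "1.0"^^xsd:float in a query refers to this datatype IRI ...
from_query = expand("xsd:float", PREFIXES)
# ... whereas rdf:datatype="xsd:float" in RDF/XML is taken as the IRI itself.
from_file = "xsd:float"

same = from_query == from_file

print(from_query)
print(same)
```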

I saw the comments on the pull request about this being super long and ugly, which I understand and feel. In that case I think we can use an XML entity.

<?xml version="1.0"?>

<!DOCTYPE rdf [
<!ENTITY xsd "http://www.w3.org/2001/XMLSchema#">
]>

...
 <bp:comment rdf:datatype="&xsd;string"
...
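Unlike a namespace prefix, an XML entity is expanded inside attribute values by the XML parser itself, before any RDF layer sees the document. A minimal sketch with Python's standard `xml.etree.ElementTree` follows. The sample document and the `http://example.org/s1` IRI are assumptions for illustration, not actual Paxtools output.

```python
import xml.etree.ElementTree as ET

# Illustrative document using an internal entity for the XSD namespace.
DOC = """<!DOCTYPE rdf:RDF [
<!ENTITY xsd "http://www.w3.org/2001/XMLSchema#">
]>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:bp="http://www.biopax.org/release/biopax-level3.owl#">
  <bp:Stoichiometry rdf:about="http://example.org/s1">
    <bp:stoichiometricCoefficient rdf:datatype="&xsd;float">1.0</bp:stoichiometricCoefficient>
  </bp:Stoichiometry>
</rdf:RDF>"""

RDF_DATATYPE = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}datatype"

root = ET.fromstring(DOC)

# Collect every rdf:datatype value as the parser reports it:
# the &xsd; entity reference has already been replaced by the full namespace.
datatypes = [el.get(RDF_DATATYPE) for el in root.iter() if el.get(RDF_DATATYPE)]

print(datatypes)
```

The file stays compact on disk, while consumers see the full datatype IRI.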

And because we live in the world of RDF 1.1, where a literal without a datatype is already an xsd:string, we can actually delete all the xsd:string attributes.
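A writer could apply both ideas with a small rule: omit the attribute for strings and emit the full IRI for everything else. The sketch below is an assumption about how such a rule might look. `datatype_attribute` is a hypothetical helper and does not mirror the actual SimpleIOHandler code.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"


def datatype_attribute(xsd_local_name: str) -> str:
    """Return the rdf:datatype attribute text for a literal of the given XSD type.

    Hypothetical helper, not part of Paxtools. Strings get no attribute,
    because in RDF 1.1 an untyped literal is already an xsd:string.
    """
    if xsd_local_name == "string":
        return ""
    return f' rdf:datatype="{XSD}{xsd_local_name}"'


print(f"<bp:comment{datatype_attribute('string')}>")
print(f"<bp:stoichiometricCoefficient{datatype_attribute('float')}>")
```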

Of course you don't need to accept either option or the pull request. I just hate opening issues in an important but underfunded project without trying to make life as easy as possible for the kind people maintaining it.

For us it is trivial to "patch" the files before publishing, because all the other work in the new Paxtools updates is totally worth that small hassle. I just want to avoid someone else wondering why their query doesn't work when it all looks right.

JervenBolleman · 2024-05-07T09:04:21Z

I wanted to add that you can see the effect using Comunica: click the execute button and you will see two datatypes, one for the "" as given and one extracted from the hcyc.owl test file.

IgorRodchenkov · 2024-05-09T13:57:24Z

Oh, I get it. I admit that I (re-)introduced this regression bug in 2023-07; it seems it made it into Paxtools releases v5.3.0 and v5.2.1 too (the older v5.1.0 seems fine...).

I like the first PR #55 more (small fix), with comments.

JervenBolleman · 2024-05-11T13:09:02Z

@IgorRodchenkov thank you! While it is not that important, you might also want to pick up the part of the second patch I proposed regarding skipping doctype and entity declarations. It will give you more flexibility in the future.

IgorRodchenkov · 2024-05-12T00:57:48Z

Thanks for reporting, @JervenBolleman!

I am actually likely to make that change and skip printing rdf:datatype attributes for BioPAX OWL properties with a "string" type/range. I am just trying to do some more research to confirm that it is safe (it definitely makes files smaller!), but I am still not so sure...

May I ask which dataset you were working with? Were you just converting an existing BioPAX RDF/XML data file to N-Triples, or were you building your own BioPAX model with Paxtools (6.0.0-SNAPSHOT?), writing it to RDF/XML, and then converting?

IgorRodchenkov · 2024-05-12T20:58:12Z

OMG... It seems that Paxtools IO should not write any of those rdf:datatype attributes to the output OWL (RDF/XML) files at all for BioPAX model property values (e.g. primitive properties such as bp:name, bp:db, bp:id), because their ranges are well defined in the BioPAX L3 specification (which we import into every such file and also use as a namespace prefix, usually "bp").
See the BioPAX specification, http://www.biopax.org/release/biopax-level3.owl#, and e.g. definitions like <rdfs:range rdf:resource="&xsd;string"/> there for OWL properties...

JervenBolleman · 2024-05-13T07:37:37Z

@IgorRodchenkov the specification/OWL import says what the range of each property is. However, the SPARQL and RDF layers don't care about that. Even OWL with datatype reasoning might not do what you expect. I think it is as likely that an OWL reasoner would say the pax file is inconsistent with the definition as that it would coerce a value's datatype to a different one. In any case, before you implement that, may I suggest testing it out first with a wide variety of tools.

JervenBolleman · 2024-05-13T07:40:27Z

We are updating the code that is used to generate the biopax level2 of Rhea and noticed the change in output.

IgorRodchenkov · 2024-05-13T22:07:58Z

@JervenBolleman please do not use BioPAX Level2; use Level3 if you can (Paxtools can in fact auto-convert L2 to L3).
I made an experimental demo file without any rdf:datatype attributes in it; could you test whether it works for you?
demo-pathway.zip

PS: as to "testing it out first, with a wide variety of tools...", IMHO it is unlikely that we could ever satisfy all the tools, which might also have bugs... As long as we have a valid OWL model (written as RDF/XML), we should be good; others can transform/post-fix/mix in the data for their needs. Unfortunately, we also had to write lots of code to semantically validate and "clean" existing BioPAX data that we integrate/merge into the Pathway Commons model (problems include wrong use of BioPAX properties, bad/nonsense/misspelled identifier collection names (in xref.db), unification xrefs attached to the wrong type of individuals/objects, etc...)

JervenBolleman added a commit to JervenBolleman/Paxtools that referenced this issue May 2, 2024

BioPAXGH-54 XSD Datatype should use full IRI in rdf/xml datatype attr…

418beee

…ibute

JervenBolleman mentioned this issue May 2, 2024

GH-54 XSD Datatype should use full IRI in rdf/xml datatype attribute #55

Closed

JervenBolleman added a commit to JervenBolleman/Paxtools that referenced this issue May 5, 2024

BioPAXGH-54 allow using RDF 1.1 (no need for xsd:string) and/or XML e…

2daa42e

…ntities to make long IRI's for xsd datatype references without introducing a lot of bytes

JervenBolleman mentioned this issue May 5, 2024

GH-54 allow using RDF 1.1 (no need for xsd:string) and/or XML entitie… #56

Closed

IgorRodchenkov closed this as completed in 305ff23 May 10, 2024

IgorRodchenkov added a commit that referenced this issue May 12, 2024

Followed up with #54 - added tests and reverted Jena dependency back …

66e012b

…to v3 (due to weird jsonld conversion issues when using jena 4 or 5).

IgorRodchenkov added a commit that referenced this issue May 12, 2024

Followed #54 - polished tests and added more comments.

d70667f

IgorRodchenkov added a commit that referenced this issue May 13, 2024

Fixed #54 for v5.2.x branch

393ad83

IgorRodchenkov added a commit that referenced this issue May 13, 2024

Polished #54 fix (just in case the type would be null/blank...)

ef1d3e3
