SPARQL engine incorrectly permits backslash-escapes #1871

Open
ajnelson-nist opened this issue Apr 25, 2022 · 1 comment
@ajnelson-nist
Contributor

I've recently been exploring the usability of forward slashes in concept identifiers. The particular use case, which I mention only as background and consider out of scope for this issue, is representing IANA Media Types as prefixed IRIs.

I got odd behavior from RDFLib after assuming that SPARQL's syntax was close to Turtle's. After a dive into the BNF grammars, I believe there's a bug in the RDFLib SPARQL engine.

The short of the issue is that:

  • In Turtle, a prefixed concept's suffix can include a forward-slash, but it must be escaped with a single backslash.
  • In JSON-LD, a prefixed concept's suffix can include a forward-slash, and it does not need a backslash to escape.
  • In SPARQL, a prefixed concept's suffix cannot include a forward-slash. There is no capacity to backslash-escape characters outside of strings. Further, forward-slash is the path operator, so not escaping it would be unhelpful.
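
To make the Turtle case concrete: Turtle's PN_LOCAL_ESC production lists the characters that may follow a backslash in a local name, and the backslash is dropped when the prefixed name is expanded to an IRI. Below is a minimal stdlib-only sketch of that expansion. The function name `expand_turtle_pname` is hypothetical and is not part of RDFLib.

```python
import re

# Characters that Turtle's PN_LOCAL_ESC production allows after a backslash.
_PN_LOCAL_ESC = re.compile(r"\\([_~.\-!$&'()*+,;=/?#@%])")


def expand_turtle_pname(namespace: str, local: str) -> str:
    """Join a namespace IRI and a Turtle local name, dropping escape backslashes.

    Hypothetical helper for illustration only; not an RDFLib function.
    """
    return namespace + _PN_LOCAL_ESC.sub(r"\1", local)


iri = expand_turtle_pname("http://example.org/ontology/", r"core\/MyClassB")
print(iri)
```

The expanded IRI contains the forward slash but no backslash, which is the behavior the tests expect from the Turtle parser.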

I've drafted unit tests, with spec citations and XFAILs, that capture how I think the engine should behave; they are coming in a PR in a few moments. That PR does not correct the SPARQL parse engine. I suspect those corrections are better left to the RDFLib maintainers, if we agree the tests are correct.

The current RDFLib SPARQL parse behavior carries an escaping backslash into a URIRef. E.g. this query:

PREFIX ex: <http://example.org/ontology/>
SELECT ?nIndividual
WHERE {
  ?nIndividual a ex:core\/MyClassB .
}

generates this warning:

WARNING  rdflib.term:term.py:275 http://example.org/ontology/core\/MyClassB does not look like a valid URI, trying to serialize this will break.
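
The warning is consistent with the IRIREF production shared by Turtle and SPARQL, which excludes the backslash along with a handful of other characters. The following stdlib-only sketch shows that kind of check. It is an illustration written against the grammar, not RDFLib's actual implementation, and the name `forbidden_iri_chars` is hypothetical.

```python
# Characters excluded by the IRIREF production: < > " { } | ^ ` \
# plus the control characters and space, U+0000 through U+0020.
_FORBIDDEN = set('<>"{}|^`\\') | {chr(c) for c in range(0x21)}


def forbidden_iri_chars(iri: str) -> list:
    """Return the characters in `iri` that IRIREF does not allow, in order.

    Hypothetical helper for illustration only; not an RDFLib function.
    """
    return [ch for ch in iri if ch in _FORBIDDEN]


# The IRI from the warning above, with the escaping backslash carried through.
bad = forbidden_iri_chars("http://example.org/ontology/core\\/MyClassB")
print(bad)
```

Only the backslash is reported for the IRI in the warning; the same IRI without the backslash yields an empty list.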

The SPARQL specification's terminals grammar does not include backslash (reminder: ord 92, 0x5c) between PN_LOCAL and PN_BASE. So, IMO, this should be a parse error.

Judging from the backslash discussion there, this seems loosely related to Issue 1395. The present issue is at least a caution about how far backslashes can be carried.

ajnelson-nist added a commit to ajnelson-nist/rdflib that referenced this issue Apr 25, 2022
…nd SPARQL

Some correction is needed in the SPARQL engine to get these tests to
pass.

References:
* RDFLib#1871

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
@ajnelson-nist
Contributor Author

It currently looks like this issue should have a second associated PR, following on from #1872. The second PR should add a strict flag for SPARQL query processing, specifically to catch the backslash-character behavior.
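
As a rough sketch of what such a strict check might do, assuming the reading of the grammar given in this issue, the stdlib-only code below reports backslashes that appear outside string literals, IRI references, and comments. Everything here is hypothetical: `backslashes_outside_strings` is not an RDFLib function, no such flag exists in the source, and the scan ignores `\u` codepoint escapes.

```python
import re

# Spans that are skipped whole, so backslashes inside them are not reported.
_SKIP = "|".join([
    r'"""(?:\\.|[^\\])*?"""',        # long string, double quotes
    r"'''(?:\\.|[^\\])*?'''",        # long string, single quotes
    r'"(?:\\.|[^"\\\n])*"',          # short string, double quotes
    r"'(?:\\.|[^'\\\n])*'",          # short string, single quotes
    r'<[^<>"{}|^`\\\x00-\x20]*>',    # IRI reference
    r"#[^\n]*",                      # comment to end of line
])
_TOKEN = re.compile("(?:" + _SKIP + r")|(?P<backslash>\\)", re.DOTALL)


def backslashes_outside_strings(query: str) -> list:
    """Return 1-based (line, column) pairs for backslashes outside skipped spans.

    Hypothetical pre-check for illustration only; not an RDFLib function.
    """
    hits = []
    for m in _TOKEN.finditer(query):
        if m.group("backslash") is not None:
            line = query.count("\n", 0, m.start()) + 1
            col = m.start() - (query.rfind("\n", 0, m.start()) + 1) + 1
            hits.append((line, col))
    return hits


QUERY = """\
PREFIX ex: <http://example.org/ontology/>
SELECT ?nIndividual
WHERE {
  ?nIndividual a ex:core\\/MyClassB .
}
"""
hits = backslashes_outside_strings(QUERY)
print(hits)
```

A strict mode could raise a parse error when the returned list is non-empty, while the default mode keeps the current permissive behavior.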
