From 994035be0e3e5b5240655cb75dd6faf34effc500 Mon Sep 17 00:00:00 2001 From: Gregg Kellogg Date: Wed, 28 Jun 2023 17:06:04 -0700 Subject: [PATCH] Adds BNF and text to extend LANGTAG to support base direction and term constructors for creating directional language-tagged strings. Fixes #32. --- spec/index.html | 47 ++++++++++++++++++++++++++++++++---------- spec/ntriples-bnf.html | 2 +- spec/ntriples.bnf | 2 +- 3 files changed, 38 insertions(+), 13 deletions(-) diff --git a/spec/index.html b/spec/index.html index 9316096..8c680be 100644 --- a/spec/index.html +++ b/spec/index.html @@ -85,7 +85,9 @@ subject or object of another triple, - making it possible to make statements about other statements.

+ making it possible to make statements about other statements. + RDF 1.2 N-Triples also adds support for + directional language-tagged strings.

@@ -231,7 +233,11 @@

RDF Literals

are used to identify values such as strings, numbers, dates.

Literals (Grammar production Literal) - have a lexical form followed by a language tag, a datatype IRI, or neither.

+ have a lexical form followed by a + language tag + (possibly including base direction), + a datatype IRI, + or neither.

The representation of the lexical form consists of an initial delimiter " (U+0022), @@ -252,6 +258,9 @@

RDF Literals

is the characters between the delimiters, after processing any escape sequences. If present, the language tag is preceded by a '@' (U+0040). + If present, the base direction + is included in the language tag + separated by '--'. If there is no language tag, there may be a datatype IRI, preceded by two concatenated ^ characters, each having the code point U+005E. @@ -267,8 +276,9 @@

RDF Literals

"That Seventies Show"^^ . # literal with XML Schema string datatype "That Seventies Show" . # same as above "That Seventies Show"@en . # literal with a language tag + "That Seventies Show"@en--ltr . # literal with a language tag and base direction "Cette Série des Années Septante"@fr-be . # literal outside of ASCII range with a region subtag - "This is a multi-linenliteral with many quotes (""""")nand two apostrophes ('')." . + "This is a multi-linenliteral with many quotes (""""") and two apostrophes ('')." . "2"^^ . # xsd:integer "1.663E-4"^^ . # xsd:double --> @@ -325,7 +335,7 @@

A Canonical form of N-Triples

any of which MUST be a single space (U+0020).
  • Literals with the datatype http://www.w3.org/2001/XMLSchema#string - MUST NOT use the datatype IRI part of the literal, + MUST NOT use the datatype IRI part of the literal, and are represented using only STRING_LITERAL_QUOTE.
  • HEX MUST use only uppercase letters ([A-F]).
  • @@ -498,7 +508,9 @@

    RDF Term Constructors

    language tag - The characters following the @ form the unicode string of the language tag. + The characters following the @ form the unicode string of the language tag + and optionally the base direction, + if the matched characters include '--'. @@ -523,13 +535,24 @@

    RDF Term Constructors

    The literal has a lexical form of the first rule argument, STRING_LITERAL_QUOTE, - and either a language tag of LANGTAG - or a datatype IRI of iri, + and either a language tag + with optional base direction + from LANGTAG + or a datatype IRI of iri, depending on which rule matched the input. - If the LANGTAG rule matched, + If the LANGTAG rule matched, + it is split into language tag + and base direction + on '--'. + If there is no base direction, the datatype is rdf:langString - and the language tag is LANGTAG. - If neither a language tag nor a datatype IRI is provided, + and the language tag is LANGTAG. + If there is a base direction, the datatype is rdf:dirLangString, + the language tag is + taken from the portion of the matched LANGTAG preceding '--' + and the base direction + is taken from the portion of the matched LANGTAG following '--'. + If neither a language tag nor a datatype IRI is provided, the literal has a datatype of xsd:string. @@ -743,8 +766,10 @@

    Changes between RDF 1.1 and RDF 1.2

  • Separated from and updated language.
  • Changes to clarify - use of datatype IRIs and expand the use of escapes + use of datatype IRIs and expand the use of escapes in literals.
  • +
  • Extends LANGTAG to include + an optional base direction.
  • diff --git a/spec/ntriples-bnf.html b/spec/ntriples-bnf.html index a6cc0f1..e958e46 100644 --- a/spec/ntriples-bnf.html +++ b/spec/ntriples-bnf.html @@ -64,7 +64,7 @@

    Productions for terminals

    [11] LANGTAG ::= - "@" [a-zA-Z]+ ("-" [a-zA-Z0-9]+)* + "@" [a-zA-Z]+ ("-" [a-zA-Z0-9]+)* ("--" [a-zA-Z0-9]+)? [12] diff --git a/spec/ntriples.bnf b/spec/ntriples.bnf index ea4a578..8b94054 100644 --- a/spec/ntriples.bnf +++ b/spec/ntriples.bnf @@ -10,7 +10,7 @@ quotedTriple ::= '<<' subject predicate object '>>' IRIREF ::= '<' ([^#x00-#x20<>"{}|^`\] | UCHAR)* '>' BLANK_NODE_LABEL ::= '_:' ( PN_CHARS_U | [0-9] ) ((PN_CHARS|'.')* PN_CHARS)? -LANGTAG ::= "@" [a-zA-Z]+ ( "-" [a-zA-Z0-9]+ )* +LANGTAG ::= "@" [a-zA-Z]+ ( "-" [a-zA-Z0-9]+ )* ( "--" [a-zA-Z0-9]+ )? STRING_LITERAL_QUOTE ::= '"' ( [^#x22#x5C#xA#xD] | ECHAR | UCHAR )* '"' UCHAR ::= ( "\u" HEX HEX HEX HEX ) | ( "\U" HEX HEX HEX HEX HEX HEX HEX HEX )
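
Reviewer note (illustrative only, not part of the patch): the extended LANGTAG production and the term-constructor text above can be sketched in Python roughly as follows. The names `LANGTAG_RE`, `LangInfo`, and `split_langtag` are hypothetical helpers invented for this sketch; only the grammar, the `'--'` separator, and the `rdf:langString` / `rdf:dirLangString` datatypes come from the patch itself. The sketch assumes the token passed in is the full LANGTAG match, including the leading `@`, and it does not check that the direction is one of a fixed set of values, because the grammar in this patch allows any alphanumeric run there.

```python
import re
from typing import NamedTuple, Optional

# Mirrors the extended LANGTAG production from this patch:
#   "@" [a-zA-Z]+ ("-" [a-zA-Z0-9]+)* ("--" [a-zA-Z0-9]+)?
# Group 1 is the language tag, group 2 the optional base direction.
LANGTAG_RE = re.compile(r"@([a-zA-Z]+(?:-[a-zA-Z0-9]+)*)(?:--([a-zA-Z0-9]+))?")

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


class LangInfo(NamedTuple):
    """Hypothetical result type for this sketch."""
    language: str
    direction: Optional[str]
    datatype: str


def split_langtag(token: str) -> LangInfo:
    """Split a matched LANGTAG token into language tag and base direction."""
    match = LANGTAG_RE.fullmatch(token)
    if match is None:
        raise ValueError(f"not a valid LANGTAG: {token!r}")
    language, direction = match.group(1), match.group(2)
    # Per the term-constructor text: no base direction gives rdf:langString,
    # a base direction gives rdf:dirLangString.
    datatype = RDF + ("dirLangString" if direction else "langString")
    return LangInfo(language, direction, datatype)


if __name__ == "__main__":
    for token in ("@en", "@fr-be", "@en--ltr", "@ar-EG--rtl"):
        print(token, split_langtag(token))
```

Under this reading of the grammar, a single hyphen is an ordinary subtag separator, so `@en-ltr` would yield the language tag `en-ltr` with no base direction; only `@en--ltr` carries a direction.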