Adds BNF and text to extend LANGTAG to support base direction #34

Merged
gkellogg merged 11 commits into main from base-direction on Oct 13, 2023

Conversation

gkellogg
Member

@gkellogg gkellogg commented Jun 29, 2023

… and term constructors for creating directional language-tagged strings.

Fixes #32. Depends on w3c/rdf-concepts#48.
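A minimal Python sketch of what such a term constructor might do once the tag/direction text has been matched. The names `make_literal` and `Literal` are hypothetical, and the sketch assumes, following the RDF 1.2 Concepts drafts this PR depends on, that a directional language-tagged string has the datatype `rdf:dirLangString` and a base direction of either `ltr` or `rtl`.

```python
from dataclasses import dataclass
from typing import Optional

# Datatype IRIs; rdf:dirLangString is assumed from the RDF 1.2 Concepts drafts.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDF_LANG_STRING = RDF + "langString"
RDF_DIR_LANG_STRING = RDF + "dirLangString"
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"

# Assumption: the only permitted base directions.
BASE_DIRECTIONS = {"ltr", "rtl"}


@dataclass(frozen=True)
class Literal:
    lexical_form: str
    datatype_iri: str
    language_tag: Optional[str] = None
    base_direction: Optional[str] = None


def make_literal(lexical_form: str,
                 lang_dir: Optional[str] = None,
                 datatype_iri: Optional[str] = None) -> Literal:
    """Hypothetical term constructor.

    `lang_dir` is the matched tag text without the leading '@',
    e.g. "en" or "en-US--rtl".
    """
    if lang_dir is not None and datatype_iri is not None:
        raise ValueError("a literal cannot have both a language tag and a datatype IRI")
    if lang_dir is None:
        # No tag: use the explicit datatype IRI, or xsd:string for a simple literal.
        return Literal(lexical_form, datatype_iri or XSD_STRING)
    # Subtags are separated by a single '-', so the first '--' starts the direction.
    language_tag, sep, direction = lang_dir.partition("--")
    if not sep:
        return Literal(lexical_form, RDF_LANG_STRING, language_tag)
    if direction not in BASE_DIRECTIONS:
        raise ValueError(f"unsupported base direction: {direction!r}")
    return Literal(lexical_form, RDF_DIR_LANG_STRING, language_tag, direction)
```

For example, `make_literal("hello", "en-US--rtl")` yields language tag `en-US` and base direction `rtl`, while `make_literal("hello", "en")` yields an ordinary `rdf:langString` literal.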



@gkellogg gkellogg added the spec:substantive Issue or proposed change in the spec that changes its normative content label Jun 29, 2023
@gkellogg gkellogg requested review from afs and domel June 29, 2023 00:07
@gkellogg
Member Author

Considering the fuzziness of the updated LANGTAG, we might want to consider splitting the terminals and use a grammar more like the following:

literal           ::= STRING_LITERAL_QUOTE ('^^' IRIREF | LANGTAG DIRECTION? )?
LANGTAG           ::= "@" [a-zA-Z]+ ( "-" [a-zA-Z0-9]+ )*
DIRECTION         ::= "--" [a-zA-Z0-9]+

The current text makes the term language tag fuzzy, as it is both associated with the language tag from RDF Concepts and the LANGTAG terminal production.
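Under the split grammar proposed above, `--` cannot be mistaken for a subtag separator: LANGTAG's repetition requires an alphanumeric character after each `-`, so it stops just before `--`. A minimal Python sketch (the helper `split_lang_dir` is hypothetical) that matches the two terminals back to back, with nothing allowed between them:

```python
import re
from typing import Optional, Tuple

# The two proposed terminals, transcribed from the EBNF above.
LANGTAG = re.compile(r"@[a-zA-Z]+(?:-[a-zA-Z0-9]+)*")
DIRECTION = re.compile(r"--[a-zA-Z0-9]+")


def split_lang_dir(text: str) -> Tuple[str, Optional[str]]:
    """Match LANGTAG, then an optional DIRECTION immediately after it."""
    tag = LANGTAG.match(text)
    if tag is None:
        raise ValueError(f"no LANGTAG at start of {text!r}")
    rest = text[tag.end():]
    if not rest:
        return tag.group(), None
    direction = DIRECTION.fullmatch(rest)
    if direction is None:
        raise ValueError(f"unexpected text after LANGTAG: {rest!r}")
    return tag.group(), direction.group()


print(split_lang_dir("@en-US--rtl"))  # ('@en-US', '--rtl')
```

This sketch enforces adjacency itself; a tokenizer that skips whitespace between terminals would not.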

@Tpt

Tpt commented Jun 29, 2023

Considering the fuzziness of the updated LANGTAG, we might want to consider splitting the terminals and use a grammar more like the following:

literal           ::= STRING_LITERAL_QUOTE ('^^' IRIREF | LANGTAG DIRECTION? )?
LANGTAG           ::= "@" [a-zA-Z]+ ( "-" [a-zA-Z0-9]+ )*
DIRECTION         ::= "--" [a-zA-Z0-9]+

I am not sure this rewriting works. It makes DIRECTION and LANGTAG separate terminals, so it allows whitespace between them, making "test"@en --ltr valid.
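To illustrate the point, here is a toy Python lexer that skips whitespace between terminals, as Turtle does. All names are hypothetical, STRING is simplified (no escapes), and the COMBINED variant is only an assumed single-terminal alternative, not the grammar the PR adopted. With separate LANGTAG and DIRECTION terminals, a space or newline before `--ltr` produces the same token sequence as the adjacent form; with one combined terminal, the stray `--ltr` matches nothing and is rejected.

```python
import re
from typing import List, Sequence, Tuple

TokenSpec = Sequence[Tuple[str, str]]

WS = ("WS", r"\s+")
STRING = ("STRING", r'"[^"\\\n]*"')  # simplified: no escape sequences

# Proposed split grammar: LANGTAG and DIRECTION are separate terminals.
SPLIT: TokenSpec = [WS, STRING,
                    ("LANGTAG", r"@[a-zA-Z]+(?:-[a-zA-Z0-9]+)*"),
                    ("DIRECTION", r"--[a-zA-Z0-9]+")]

# Assumed alternative: one terminal covering the tag and an optional direction.
COMBINED: TokenSpec = [WS, STRING,
                       ("LANG_DIR", r"@[a-zA-Z]+(?:-[a-zA-Z0-9]+)*(?:--[a-zA-Z0-9]+)?")]


def tokenize(text: str, spec: TokenSpec) -> List[str]:
    """Return the names of matched terminals, discarding whitespace."""
    master = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in spec))
    names: List[str] = []
    pos = 0
    while pos < len(text):
        m = master.match(text, pos)
        if m is None:
            raise ValueError(f"no terminal matches at offset {pos}: {text[pos:]!r}")
        if m.lastgroup != "WS":
            names.append(m.lastgroup)
        pos = m.end()
    return names


print(tokenize('"test"@en --ltr', SPLIT))     # ['STRING', 'LANGTAG', 'DIRECTION']
print(tokenize('"test"\n@en\n--ltr', SPLIT))  # ['STRING', 'LANGTAG', 'DIRECTION']
print(tokenize('"test"@en--ltr', COMBINED))   # ['STRING', 'LANG_DIR']
```

Calling `tokenize('"test"@en --ltr', COMBINED)` raises `ValueError` at `--ltr`, which is the behavior a single terminal buys.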

@afs
Contributor

afs commented Jun 29, 2023

I agree with @Tpt. We should not allow white space.

In Turtle, WS includes newlines, so this would also be legal:

"test"
@en
--ltr

In addition to how it looks, it has the effect that the token DIRECTION may be recognized on its own. That makes using the character sequence -- anywhere else as a future feature problematic. This already applies to @, which is why we have the text about @prefix and @base in Turtle. It is unfortunate that white space was allowed between " and @ at all, but that was many years ago.

@afs
Contributor

afs commented Jun 29, 2023

If the concern that LANGTAG is now fuzzy is widely shared, then rename the token to LANGDIR or LANGTAGDIR (my preference is the former, which doesn't explicitly mention TAG).

As direction only applies when a language tag is present, calling it LANGTAG does not concern me, because the constructor text explains the situation. If the general feeling is that a new name is better, then that's fine by me as well.

@gkellogg
Member Author

IMO, the fuzziness is more about using "language tag" to refer both to the value matched by LANGTAG and to the concept of a language tag. It can be solved with some more creative editing.

Contributor

@afs afs left a comment

I think LANGDIR is used in the latest revision, but GitHub isn't clear to me, so I mentioned it.

I did find class="type langTag in the doc.

@gkellogg gkellogg requested a review from afs July 3, 2023 22:47
@TallTed
Member

TallTed commented Jul 6, 2023

Circumflex accent is not the same as caret, remove that before ^^.

It appears that there is not quite such a strict line between caret and circumflex.

https://en.wikipedia.org/wiki/Caret

Caret is the name used familiarly for the character ^, provided on most QWERTY keyboards by typing ⇧ Shift+6. The symbol has a variety of uses in programming and mathematics. The name "caret" arose from its visual similarity to the original proofreader's caret, a mark used in proofreading to indicate where a punctuation mark, word, or phrase should be inserted into a document. The formal ASCII standard (X3.64.1977) calls it a "circumflex".[1]

I think we should be more explicit in the descriptive text, as well as in the Unicode character notation. We should state that the ^^ are two Unicode CIRCUMFLEX ACCENT characters (U+005E), not to be confused with the visually similar Unicode CARET (‸, U+2038), COMBINING CIRCUMFLEX ACCENT ( ̂, U+0302), or FULLWIDTH CIRCUMFLEX ACCENT (＾, U+FF3E).
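The code points and names in question can be confirmed with Python's bundled Unicode database. A small sketch (the helper `describe` is hypothetical) that prints each character as `U+XXXX NAME`:

```python
import unicodedata

# '^^' is two CIRCUMFLEX ACCENT characters; the others are the look-alikes listed above.
CHARACTERS = ["^", "\u2038", "\u0302", "\uFF3E"]


def describe(ch: str) -> str:
    """Format a character as 'U+XXXX NAME' using Python's Unicode database."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for ch in CHARACTERS:
    print(describe(ch))
# U+005E CIRCUMFLEX ACCENT
# U+2038 CARET
# U+0302 COMBINING CIRCUMFLEX ACCENT
# U+FF3E FULLWIDTH CIRCUMFLEX ACCENT
```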

@afs
Contributor

afs commented Jul 6, 2023

There is a non-blocking issue with renaming LANGTAG as LANGDIR.

LANGDIR is a proposed function name for the language direction accessor. This fits the general style of function naming in SPARQL (cf. LANGMATCHES, isURI), and SPARQL already has LANG as the language tag accessor.

Possibilities:

  • use LANG_DIR as the token name. This also reflects the fact that the token syntax has two parts. _ is already used in some token names.
  • use LANGTAG_DIR or LANGTAGDIR for the token name.

@TallTed
Member

TallTed commented Jul 13, 2023

Blocked by rdf-concepts #48

spec/index.html Outdated
is separated from the <a data-cite="RDF12-CONCEPTS#dfn-language-tag">language tag</a> by '<code>--</code>'.
If there is no matched <a href="#grammar-production-LANG_DIR" class="type langDir">LANG_DIR</a> terminal,
there may be a <a data-cite="RDF12-CONCEPTS#dfn-datatype-iri">datatype IRI</a>,
preceded by '<code>^^</code>'
Member

@TallTed TallTed Jul 19, 2023

Similar notation of these Unicode code points should be added to the EBNF, where these characters are currently specified only visually, as '^^', which is not sufficiently specific.

Suggested change
preceded by '<code>^^</code>'
preceded by two circumflex accent characters, '<code>^^</code>'

Contributor

There's another "circumflex accent".

As a single name isn't clear, just say that "^ is codepoint U+005E".

Member

So far as I have found, while there are other characters where "circumflex accent" is part of their name, there is no other character whose full name is "circumflex accent".

If I'm just searching badly, please provide the codepoint(s) of such other character(s).
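One way to settle this is an exhaustive scan of Python's bundled Unicode database (its Unicode version depends on the Python release). Since Unicode character names are unique, an exact-name search returns at most one character; a substring search lists the other characters whose names merely contain "CIRCUMFLEX ACCENT". The helper `names_matching` is hypothetical.

```python
import sys
import unicodedata
from typing import Callable, List


def names_matching(predicate: Callable[[str], bool]) -> List[str]:
    """Scan every code point; return 'U+XXXX NAME' for names satisfying predicate."""
    found = []
    for cp in range(sys.maxunicode + 1):
        # "" for code points without a name (controls, unassigned, surrogates, ...)
        name = unicodedata.name(chr(cp), "")
        if name and predicate(name):
            found.append(f"U+{cp:04X} {name}")
    return found


exact = names_matching(lambda n: n == "CIRCUMFLEX ACCENT")
partial = names_matching(lambda n: "CIRCUMFLEX ACCENT" in n)
print(exact)  # ['U+005E CIRCUMFLEX ACCENT']
print(partial)
```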

Member Author

See suggested wording in w3c/rdf-turtle#34 (comment).

Member

@TallTed TallTed left a comment

There may be more of these. I'm trying to make all the specs spell out the special characters and sequences in the same way.

gkellogg and others added 6 commits September 25, 2023 12:48
…m constructors for creating directional language-tagged strings.

Fixes #32.
Co-authored-by: Andy Seaborne <andy@apache.org>
Co-authored-by: Ted Thibodeau Jr <tthibodeau@openlinksw.com>
@gkellogg gkellogg merged commit 1aaf1f0 into main Oct 13, 2023
2 checks passed
@gkellogg gkellogg deleted the base-direction branch October 13, 2023 20:19
Development

Successfully merging this pull request may close these issues.

Support for base direction
5 participants