Unable to parse valid W3C EBNF #43

shellscape · 2022-07-13T13:42:37Z

The grammar located here https://github.com/transpect/css-tools/blob/master/ebnf-scheme/CSS3.ebnf is valid W3C EBNF, as verified on railroad https://bottlecaps.de/rr/ui. This package throws an error that it could not parse the grammar at /node_modules/ebnf/dist/Grammars/W3CEBNF.js:288:19.

So it looks like there are some compatibility issues. Perhaps the grammar for W3C is out of date, given the age of the package?

shellscape · 2022-07-13T13:53:58Z

Additionally, this package cannot parse the EBNF grammar that railroad shows on its site:

import { Grammars } from 'ebnf';

const w3grammar = `Grammar ::= Production*
Production ::= NCName '::=' ( Choice | Link )
NCName ::= [http://www.w3.org/TR/xml-names/#NT-NCName]
Choice ::= SequenceOrDifference ( '|' SequenceOrDifference )*
SequenceOrDifference ::= (Item ( '-' Item | Item* ))?
Item ::= Primary ( '?' | '*' | '+' )*
Primary ::= NCName | StringLiteral | CharCode | CharClass | '(' Choice ')'
StringLiteral ::= '"' [^"]* '"' | "'" [^']* "'"
/* ws: explicit */
CharCode ::= '#x' [0-9a-fA-F]+
CharClass ::= '[' '^'? ( Char | CharCode | CharRange | CharCodeRange )+ ']'
Char ::= [http://www.w3.org/TR/xml#NT-Char]
CharRange ::= Char '-' ( Char - ']' )
CharCodeRange ::= CharCode '-' CharCode
Link ::= '[' URL ']'
URL ::= [^#x5D:/?#]+ '://' [^#x5D#]+ ('#' NCName)?
Whitespace ::= S | Comment
S ::= #x9 | #xA | #xD | #x20
Comment ::= '/*' ( [^*] | '*'+ [^*/] )* '*'* '*/'`;

const rules = Grammars.W3C.getRules(w3grammar);

This also fails with throw new Error('Could not parse ' + source); at the same line and position.

menduz · 2022-07-15T12:53:22Z

Hello, can you try ending the document/grammar string with a line-ending character?

kjhughes · 2022-07-27T19:39:49Z

Your Char production looks hosed:

Char ::= [http://www.w3.org/TR/xml#NT-Char]

(A URL doesn't belong in a bracket expression.)

shellscape · 2022-07-27T20:05:45Z

@kjhughes that's straight from W3C

kjhughes · 2022-07-27T20:14:08Z

The RHS is clearly meant to be metadata / documentation, not an EBNF regex. The URL references this EBNF:

Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */
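For anyone hitting the same thing, here is a minimal sketch of how one might spot productions whose right-hand side is only a URL link rather than a real expression, so they can be replaced by hand with the referenced definition. `findLinkProductions` and `LinkProduction` are hypothetical names made up for this example and are not part of the ebnf package; the sketch also assumes one production per line.

```typescript
// Hypothetical helper, not part of the ebnf package.
// Assumes each production sits on a single line.
interface LinkProduction {
  name: string;
  url: string;
}

function findLinkProductions(grammar: string): LinkProduction[] {
  // Matches lines of the form: Name ::= [http://...] with nothing else on the line.
  const linkPattern = /^\s*([^\s:]+)\s*::=\s*\[(https?:\/\/[^\]\s]+)\]\s*$/gm;
  const found: LinkProduction[] = [];
  let match: RegExpExecArray | null;
  while ((match = linkPattern.exec(grammar)) !== null) {
    found.push({ name: match[1], url: match[2] });
  }
  return found;
}

const sample = [
  "NCName ::= [http://www.w3.org/TR/xml-names/#NT-NCName]",
  "CharCode ::= '#x' [0-9a-fA-F]+",
  "Char ::= [http://www.w3.org/TR/xml#NT-Char]",
].join("\n");

// Lists the names of the two link productions: NCName and Char.
console.log(findLinkProductions(sample).map((p) => p.name));
```

Each reported production would then need its linked definition pasted in manually, as with the Char rule quoted above.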

jgeewax · 2022-10-05T06:40:28Z

@menduz : Just tried adding a newline at the end and that seemed to do the trick!

Might be worthwhile to not fail on no final newline character?
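Until that happens, a small wrapper on the caller's side may be enough. This is only a sketch of the workaround described above: `ensureTrailingNewline` is a hypothetical helper name, not something the ebnf package provides.

```typescript
// Hypothetical helper, not part of the ebnf package:
// append a final newline only when the grammar string lacks one.
function ensureTrailingNewline(source: string): string {
  return /\n$/.test(source) ? source : source + "\n";
}

console.log(JSON.stringify(ensureTrailingNewline("A ::= 'a'")));
console.log(JSON.stringify(ensureTrailingNewline("A ::= 'a'\n")));

// Intended usage with the call from the original report (left as a comment
// so that this snippet runs without the package installed):
// const rules = Grammars.W3C.getRules(ensureTrailingNewline(w3grammar));
```

As the later comments show, this does not fix every grammar, only the ones that fail purely because of the missing final newline.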

jimmcslim · 2022-10-28T05:36:58Z

I've tried adding a newline and I'm still not having any success. I've also been trying to parse https://github.com/messagetemplates/grammar/blob/master/message-template.ebnf without success.

Antony74 · 2023-06-26T14:33:03Z

Yes, adding a newline to the end of the string is a great tip! Additionally, even though the parser only gives you a yes/no as to whether it parsed successfully, you can quickly narrow down the problem in the playground

https://menduz.github.io/ebnf-highlighter/

by starting with just one line at a leaf of your parse tree and building your EBNF file back up from there.

e.g. does this parse?

_LETTER-OR-DIGIT ::= [A-Za-z0-9]

No. How about this?

_LETTERORDIGIT ::= [A-Za-z0-9]

No. How about now?

LETTERORDIGIT ::= [A-Za-z0-9]

Yes. So does W3C EBNF not support an NCName entity starting with an underscore? Well, let's look at the node-ebnf source code. This is the top of W3CEBNF.ts:

// https://www.w3.org/TR/REC-xml/#NT-Name
// http://www.bottlecaps.de/rr/ui

// Grammar	::=	Production*
// Production	::=	NCName '::=' Choice
// NCName	::=	[http://www.w3.org/TR/xml-names/#NT-NCName]
// Choice	::=	SequenceOrDifference ( '|' SequenceOrDifference )*
// SequenceOrDifference	::=	(Item ( '-' Item | Item* ))?
// Item	::=	Primary ( '?' | '*' | '+' )?
// Primary	::=	NCName | StringLiteral | CharCode | CharClass | '(' Choice ')'
// StringLiteral	::=	'"' [^"]* '"' | "'" [^']* "'"
// CharCode	::=	'#x' [0-9a-fA-F]+
// CharClass	::=	'[' '^'? ( RULE_Char | CharCode | CharRange | CharCodeRange )+ ']'
// RULE_Char	::=	[http://www.w3.org/TR/xml#NT-RULE_Char]
// CharRange	::=	RULE_Char '-' ( RULE_Char - ']' )
// CharCodeRange	::=	CharCode '-' CharCode
// RULE_WHITESPACE	::=	RULE_S | Comment
// RULE_S	::=	#x9 | #xA | #xD | #x20
// Comment	::=	'/*' ( [^*] | '*'+ [^*/] )* '*'* '*/'

That tells us to look it up here: http://www.w3.org/TR/xml-names/#NT-NCName

click through to the Name: https://www.w3.org/TR/REC-xml/#NT-Name

click through to the NameStartChar: https://www.w3.org/TR/REC-xml/#NT-NameStartChar
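To check a name against those productions directly, here is a rough sketch that transcribes NameStartChar and NameChar from the XML spec into a regular expression, leaving out the colon because NCName forbids it. `isNCName` is a hypothetical helper written for this comment; it is not how node-ebnf itself tokenizes names.

```typescript
// NameStartChar from the XML spec, minus ':' (an NCName may not contain a colon).
const nameStartChar =
  "A-Z_a-z\\u{C0}-\\u{D6}\\u{D8}-\\u{F6}\\u{F8}-\\u{2FF}\\u{370}-\\u{37D}" +
  "\\u{37F}-\\u{1FFF}\\u{200C}-\\u{200D}\\u{2070}-\\u{218F}\\u{2C00}-\\u{2FEF}" +
  "\\u{3001}-\\u{D7FF}\\u{F900}-\\u{FDCF}\\u{FDF0}-\\u{FFFD}\\u{10000}-\\u{EFFFF}";

// NameChar adds '-', '.', digits, #xB7 and two combining/extender ranges.
const nameChar =
  nameStartChar + "\\-.0-9\\u{B7}\\u{300}-\\u{36F}\\u{203F}-\\u{2040}";

const ncNamePattern = new RegExp(`^[${nameStartChar}][${nameChar}]*$`, "u");

// Hypothetical helper: true when the whole string is a valid NCName.
function isNCName(name: string): boolean {
  return ncNamePattern.test(name);
}

for (const name of ["_LETTER-OR-DIGIT", "_LETTERORDIGIT", "LETTERORDIGIT", "1abc"]) {
  console.log(name, isNCName(name));
}
```

Under this reading of the spec, all three rule names tried above are valid NCNames, and only something like a leading digit is rejected.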

Oh dear, it does look to me like you're supposed to be able to start an NCName entity with an underscore. So it does seem a shame that node-ebnf won't parse this. But hopefully what I've been able to demonstrate about how I would isolate a fault and investigate the cause is helpful?
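The same narrowing-down can be roughly automated. The sketch below feeds each line to a yes/no parse check on its own and reports the lines that fail. `findFailingLines` and `ParseCheck` are hypothetical names, the sketch assumes one production per line, and the checker used in the demo is a stand-in that merely imitates the underscore behaviour seen above so that the snippet runs without the package.

```typescript
// A yes/no check: does this source parse on its own?
type ParseCheck = (source: string) => boolean;

// Hypothetical helper: try every non-empty line in isolation
// (with a trailing newline added) and collect the ones that fail.
function findFailingLines(grammar: string, tryParse: ParseCheck): string[] {
  return grammar
    .split(/\r?\n/)
    .filter((line) => line.trim() !== "")
    .filter((line) => !tryParse(line + "\n"));
}

// Stand-in checker for the demo only: rejects rule names starting with '_'.
const standInCheck: ParseCheck = (source) => !/^\s*_/.test(source);

const candidate = [
  "_LETTER-OR-DIGIT ::= [A-Za-z0-9]",
  "_LETTERORDIGIT ::= [A-Za-z0-9]",
  "LETTERORDIGIT ::= [A-Za-z0-9]",
].join("\n");

console.log(findFailingLines(candidate, standInCheck));

// With the real package, the check could wrap the call from the original
// report, which throws when parsing fails:
// const realCheck: ParseCheck = (source) => {
//   try { Grammars.W3C.getRules(source); return true; } catch { return false; }
// };
```

A line that fails in isolation points at the production to investigate; productions spread across several lines would need to be joined first.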

snoozbuster mentioned this issue Jun 6, 2023

Cannot be used in a browser with a CSP set that does not include the unsafe-eval permission #48

Closed
