Unable to parse valid W3C EBNF #43

Open
shellscape opened this issue Jul 13, 2022 · 8 comments

Comments

@shellscape

The grammar located here https://github.com/transpect/css-tools/blob/master/ebnf-scheme/CSS3.ebnf is valid W3C EBNF, as verified on railroad https://bottlecaps.de/rr/ui. This package throws an error that it could not parse the grammar at /node_modules/ebnf/dist/Grammars/W3CEBNF.js:288:19.

So it looks like there are some compatibility issues. Perhaps the grammar for W3C is out of date, given the age of the package?

@shellscape
Author

Additionally, this package cannot parse the EBNF grammar that railroad shows on its site:

import { Grammars } from 'ebnf';

const w3grammar = `Grammar ::= Production*
Production ::= NCName '::=' ( Choice | Link )
NCName ::= [http://www.w3.org/TR/xml-names/#NT-NCName]
Choice ::= SequenceOrDifference ( '|' SequenceOrDifference )*
SequenceOrDifference ::= (Item ( '-' Item | Item* ))?
Item ::= Primary ( '?' | '*' | '+' )*
Primary ::= NCName | StringLiteral | CharCode | CharClass | '(' Choice ')'
StringLiteral ::= '"' [^"]* '"' | "'" [^']* "'"
/* ws: explicit */
CharCode ::= '#x' [0-9a-fA-F]+
CharClass ::= '[' '^'? ( Char | CharCode | CharRange | CharCodeRange )+ ']'
Char ::= [http://www.w3.org/TR/xml#NT-Char]
CharRange ::= Char '-' ( Char - ']' )
CharCodeRange ::= CharCode '-' CharCode
Link ::= '[' URL ']'
URL ::= [^#x5D:/?#]+ '://' [^#x5D#]+ ('#' NCName)?
Whitespace ::= S | Comment
S ::= #x9 | #xA | #xD | #x20
Comment ::= '/*' ( [^*] | '*'+ [^*/] )* '*'* '*/'`;

const rules = Grammars.W3C.getRules(w3grammar);

This also fails with throw new Error('Could not parse ' + source); at the same line and position.

@menduz
Member

menduz commented Jul 15, 2022

Hello, can you try ending the document/grammar string with a line-ending character?

@kjhughes

kjhughes commented Jul 27, 2022

Your Char production looks hosed:

Char ::= [http://www.w3.org/TR/xml#NT-Char]

(A URL doesn't belong in a bracket expression.)

@shellscape
Author

@kjhughes that's straight from W3C

@kjhughes

The RHS is clearly meant to be metadata / documentation, not an EBNF regex. The URL references this EBNF:

Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */
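
For reference, the Char production quoted above translates directly into a code point range check. This is only a minimal TypeScript sketch; the names `XML_CHAR_RANGES` and `isXmlChar` are illustrative and are not part of the ebnf package:

```typescript
// Inclusive code point ranges, copied from the XML 1.0 Char production above.
// XML_CHAR_RANGES and isXmlChar are hypothetical names used for illustration.
const XML_CHAR_RANGES: ReadonlyArray<readonly [number, number]> = [
  [0x9, 0x9],
  [0xa, 0xa],
  [0xd, 0xd],
  [0x20, 0xd7ff],
  [0xe000, 0xfffd],
  [0x10000, 0x10ffff],
];

// Returns true when the code point is matched by the Char production.
function isXmlChar(codePoint: number): boolean {
  return XML_CHAR_RANGES.some(([lo, hi]) => codePoint >= lo && codePoint <= hi);
}
```

The surrogate block (#xD800 to #xDFFF), #xFFFE, and #xFFFF fall outside every range, which matches the comment on the production.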

@jgeewax

jgeewax commented Oct 5, 2022

@menduz : Just tried adding a newline at the end and that seemed to do the trick!

Might be worthwhile to not fail on no final newline character?
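
Until the parser accepts input without a final newline, the workaround discussed above can be applied before the grammar is handed over. A minimal sketch, assuming the missing line ending is the only problem; `ensureTrailingNewline` is a hypothetical helper and not part of the ebnf package:

```typescript
// Hypothetical helper: appends "\n" unless the source already ends with a
// line-ending character ("\n" or "\r").
function ensureTrailingNewline(source: string): string {
  const last = source.charAt(source.length - 1);
  return last === "\n" || last === "\r" ? source : source + "\n";
}

// Possible usage with the call from the earlier comment:
//   const rules = Grammars.W3C.getRules(ensureTrailingNewline(w3grammar));
```

This does not help when the grammar fails for another reason, so a grammar that still does not parse after this step needs a closer look.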

@jimmcslim

I've tried adding a newline and am still not having any success. I've also been trying to parse https://github.com/messagetemplates/grammar/blob/master/message-template.ebnf without success.

@Antony74

Yes, adding a newline at the end of the string is a great tip! Additionally, even though the parser only gives you a yes/no answer as to whether the grammar parsed successfully, you can quickly narrow down the problem in the playground

https://menduz.github.io/ebnf-highlighter/

by starting with just one line at a leaf of your parse tree and building your EBNF file back up from there.
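
The same build-it-back-up approach can be sketched in code. This is a rough illustration rather than anything the package provides: `firstBreakingLine` and the `ParseCheck` type are hypothetical names, and the `parses` predicate is supplied by the caller.

```typescript
// Caller-supplied check: returns true when the grammar text parses.
// Nothing in this sketch calls the ebnf package directly.
type ParseCheck = (grammar: string) => boolean;

// Adds productions one at a time (leaf rules first) and returns the index of
// the first line whose addition makes the grammar fail, or -1 if none does.
function firstBreakingLine(lines: string[], parses: ParseCheck): number {
  let grammar = "";
  for (let i = 0; i < lines.length; i++) {
    // Each line is terminated with "\n", per the tip earlier in this thread.
    const candidate = grammar + lines[i] + "\n";
    if (!parses(candidate)) {
      return i;
    }
    grammar = candidate;
  }
  return -1;
}
```

With this package, `parses` could wrap `Grammars.W3C.getRules(grammar)` in a try/catch and return false when it throws, since the call reported earlier in this thread fails by throwing an error.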

e.g. does this parse?

_LETTER-OR-DIGIT ::= [A-Za-z0-9]

No. How about this?

_LETTERORDIGIT ::= [A-Za-z0-9]

No. How about now?

LETTERORDIGIT ::= [A-Za-z0-9]

Yes. So does W3C EBNF not support an NCName starting with an underscore? Well, let's look at the node-ebnf source code. This is the top of W3CEBNF.ts:

// https://www.w3.org/TR/REC-xml/#NT-Name
// http://www.bottlecaps.de/rr/ui

// Grammar	::=	Production*
// Production	::=	NCName '::=' Choice
// NCName	::=	[http://www.w3.org/TR/xml-names/#NT-NCName]
// Choice	::=	SequenceOrDifference ( '|' SequenceOrDifference )*
// SequenceOrDifference	::=	(Item ( '-' Item | Item* ))?
// Item	::=	Primary ( '?' | '*' | '+' )?
// Primary	::=	NCName | StringLiteral | CharCode | CharClass | '(' Choice ')'
// StringLiteral	::=	'"' [^"]* '"' | "'" [^']* "'"
// CharCode	::=	'#x' [0-9a-fA-F]+
// CharClass	::=	'[' '^'? ( RULE_Char | CharCode | CharRange | CharCodeRange )+ ']'
// RULE_Char	::=	[http://www.w3.org/TR/xml#NT-RULE_Char]
// CharRange	::=	RULE_Char '-' ( RULE_Char - ']' )
// CharCodeRange	::=	CharCode '-' CharCode
// RULE_WHITESPACE	::=	RULE_S | Comment
// RULE_S	::=	#x9 | #xA | #xD | #x20
// Comment	::=	'/*' ( [^*] | '*'+ [^*/] )* '*'* '*/'

That tells us to look it up here: http://www.w3.org/TR/xml-names/#NT-NCName

click through to the Name: https://www.w3.org/TR/REC-xml/#NT-Name

click through to the NameStartChar: https://www.w3.org/TR/REC-xml/#NT-NameStartChar

Oh dear, it does look to me like you're supposed to be able to start an NCName with an underscore. So it does seem a shame that node-ebnf won't parse this. But hopefully what I've been able to demonstrate about how I would isolate a fault and investigate the cause is helpful?
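
To make the spec reading above concrete, here is a rough TypeScript sketch of an NCName check built from the NameStartChar and NameChar productions in XML 1.0, with ":" excluded as Namespaces in XML requires. All helper names are hypothetical, and this says nothing about how node-ebnf tokenizes names; it only shows what the spec allows.

```typescript
type CodePointRange = readonly [number, number];

// NameStartChar from XML 1.0, minus ":" (NCName excludes the colon).
const NCNAME_START: ReadonlyArray<CodePointRange> = [
  [0x41, 0x5a], [0x5f, 0x5f], [0x61, 0x7a],
  [0xc0, 0xd6], [0xd8, 0xf6], [0xf8, 0x2ff], [0x370, 0x37d],
  [0x37f, 0x1fff], [0x200c, 0x200d], [0x2070, 0x218f], [0x2c00, 0x2fef],
  [0x3001, 0xd7ff], [0xf900, 0xfdcf], [0xfdf0, 0xfffd], [0x10000, 0xeffff],
];

// Additional NameChar members: "-" ".", [0-9], #xB7, [#x300-#x36F], [#x203F-#x2040].
const NCNAME_EXTRA: ReadonlyArray<CodePointRange> = [
  [0x2d, 0x2e], [0x30, 0x39], [0xb7, 0xb7], [0x300, 0x36f], [0x203f, 0x2040],
];

function inRanges(cp: number, ranges: ReadonlyArray<CodePointRange>): boolean {
  return ranges.some(([lo, hi]) => cp >= lo && cp <= hi);
}

// Splits a string into code points, combining surrogate pairs.
function codePoints(text: string): number[] {
  const out: number[] = [];
  for (let i = 0; i < text.length; i++) {
    const hi = text.charCodeAt(i);
    const lo = i + 1 < text.length ? text.charCodeAt(i + 1) : 0;
    if (hi >= 0xd800 && hi <= 0xdbff && lo >= 0xdc00 && lo <= 0xdfff) {
      out.push((hi - 0xd800) * 0x400 + (lo - 0xdc00) + 0x10000);
      i++;
    } else {
      out.push(hi);
    }
  }
  return out;
}

// True when `name` is a non-empty NCName under the productions listed above.
function isNCName(name: string): boolean {
  const cps = codePoints(name);
  if (cps.length === 0 || !inRanges(cps[0], NCNAME_START)) {
    return false;
  }
  return cps.slice(1).every(
    (cp) => inRanges(cp, NCNAME_START) || inRanges(cp, NCNAME_EXTRA)
  );
}
```

Under this reading, `_LETTER-OR-DIGIT`, `_LETTERORDIGIT`, and `LETTERORDIGIT` are all valid NCNames, while a name that starts with `-` or a digit is not.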

6 participants