-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Performance tips for grammars with case insensitive keywords? #17
Comments
Hi, thank you for sharing the use case. There are a couple of things we can do to optimize this kind of parsers: Look ahead of the first letter to reduce the full scanKeyword ::= 'And'
| &('B') ('ByVal' | 'ByRef)
| &('C') ('Case' | 'Call' | 'Class' | 'Const') https://github.com/lys-lang/lys/blob/master/src/grammar.ts#L117-L121 Use regular expressions for performanceIndividual constructions are a good mechanism to build grammars, to write nominal options it may be faster to write Keywords ::= [Aa][Nn][Dd]
|[Bb][Yy][Vv][Aa][Ll]
|[Bb][Yy][Rr][Ee][Ff]
|[Cc][Aa][Ss][Ee]
|[Cc][Aa][Ll][Ll]
|[Cc][Ll][Aa][Ss][Ss]
|[Cc][Oo][Nn][Ss][Tt]
|[Dd][Ee][Ff][Aa][Uu][Ll][Tt]
|[Dd][Ii][Mm]
|[Dd][Oo]
|[Ee][Aa][Cc][Hh]
... Usually languages like Basic use a tokenizer as a first step and then the parsing is performed over tokens instead of text. This library generates PEG grammars without a tokenization phase, which in some cases can be unfortunate. Regarding the performance of the case sensitiveness of this library, it can be added as an option of the construction like Keywords := 'and' | 'byval' {ignoreCase=true} There are a couple of options to base the code on those, it should be an easy thing to do since it is only a parameter |
Thank you for responding so fast! Believe it or not, we had just tried:
And it is extremely fast even more so than the when they keywords are explicitly defined.
|
Looking at your BNF grammar, it seems like it's already compiled. Is there a way to produce such a JavaScript array that we can load into the parser directly? That would allow us to 1) skip compiling the grammar each time we transpile, and 2) tweak the regexes so they are more performant? |
Yes, there is a way to export the parsed grammar. Every grammar was scaffolded using the previous one.
You can cache and fine tune your grammar by executing: ebnf Grammar.ebnf >> myFile.js https://github.com/lys-lang/node-ebnf/blob/master/src/bin.ts |
I just ran this utility on our grammar at: https://github.com/vpdb/vpx-js/blob/scripting/ebnf/lib/scripting/grammar/grammar.bnf It appears to not dump all the elements and ends early with
|
Yes please! |
Just to post an update on this, in #20 we have been seeing that regexp's are much faster than string literals. I think it would be beneficial to have an My only thought is from doing testing I can help author this option, based on feedback on #20. |
@menduz Now that #20 is in place (thanks, btw), we started looking at adding Given the rule:
Would the appropriate place be the parser or the grammar? In We looked into placing it the parser, by adding an TBH, I am getting lost where these regular expressions are actually being testing. |
@menduz Thanks for 1.7.4! Quick question. Does ebnf-highlighter get updated with the latest lib? That website has been absolutely invaluable to vpx-js. |
No, it wasn't updated since 2016 https://github.com/menduz/ebnf-highlighter |
We are using
node-ebnf
to transpile vbscript into javascript in an attempt to run Visual Pinball in a web browser.Currently we go through a two pass approach. The first pass format's the original script file to remove comments, standardize keyword case, and remove extra whitespace. The second pass converts the
node-ebnf
ast tree into a js ast tree.To standardize the keyword case, we end up taking a rule like:
Keywords ::= 'And' | 'ByVal' | 'ByRef' | 'Case' | 'Call' | 'Class' | 'Const' | 'Default' | 'Dim' | 'Do' | 'Each' | 'ElseIf' | 'Else' | 'Empty' | 'End' | 'Erase' | 'Error' | 'Eqv' | 'Exit' | 'Explicit' | 'False' | 'For' | 'Function' | 'Get' | 'GoTo' | 'If' | 'In' | 'Is' | 'Let' | 'Loop' | 'Mod' | 'New' | 'Next' | 'Nothing' | 'Not' | 'Null' | 'On' | 'Option' | 'Or' | 'Preserve' | 'Private' | 'Property' | 'Public' | 'ReDim' | 'Resume' | 'Select' | 'Set' | 'Step' | 'Sub' | 'Then' | 'To' | 'True' | 'Until' | 'While' | 'WEnd' | 'With' | 'Xor' Keyword ::= Keywords ![A-Za-z0-9_]
and converting it to:
Keywords ::= ('A'|'a')('N'|'n')('D'|'d')|('B'|'b')('Y'|'y')('V'|'v')('A'|'a')('L'|'l')|('B'|'b')('Y'|'y')('R'|'r')('E'|'e')('F'|'f')|('C'|'c')('A'|'a')('S'|'s')('E'|'e')|('C'|'c')('A'|'a')('L'|'l')('L'|'l')|('C'|'c')('L'|'l')('A'|'a')('S'|'s')('S'|'s')|('C'|'c')('O'|'o')('N'|'n')('S'|'s')('T'|'t')|('D'|'d')('E'|'e')('F'|'f')('A'|'a')('U'|'u')('L'|'l')('T'|'t')|('D'|'d')('I'|'i')('M'|'m')|('D'|'d')('O'|'o')|('E'|'e')('A'|'a')('C'|'c')('H'|'h')|('E'|'e')('L'|'l')('S'|'s')('E'|'e')('I'|'i')('F'|'f')|('E'|'e')('L'|'l')('S'|'s')('E'|'e')|('E'|'e')('M'|'m')('P'|'p')('T'|'t')('Y'|'y')|('E'|'e')('N'|'n')('D'|'d')|('E'|'e')('R'|'r')('A'|'a')('S'|'s')('E'|'e')|('E'|'e')('R'|'r')('R'|'r')('O'|'o')('R'|'r')|('E'|'e')('Q'|'q')('V'|'v')|('E'|'e')('X'|'x')('I'|'i')('T'|'t')|('E'|'e')('X'|'x')('P'|'p')('L'|'l')('I'|'i')('C'|'c')('I'|'i')('T'|'t')|('F'|'f')('A'|'a')('L'|'l')('S'|'s')('E'|'e')|('F'|'f')('O'|'o')('R'|'r')|('F'|'f')('U'|'u')('N'|'n')('C'|'c')('T'|'t')('I'|'i')('O'|'o')('N'|'n')|('G'|'g')('E'|'e')('T'|'t')|('G'|'g')('O'|'o')('T'|'t')('O'|'o')|('I'|'i')('F'|'f')|('I'|'i')('N'|'n')|('I'|'i')('S'|'s')|('L'|'l')('E'|'e')('T'|'t')|('L'|'l')('O'|'o')('O'|'o')('P'|'p')|('M'|'m')('O'|'o')('D'|'d')|('N'|'n')('E'|'e')('W'|'w')|('N'|'n')('E'|'e')('X'|'x')('T'|'t')|('N'|'n')('O'|'o')('T'|'t')('H'|'h')('I'|'i')('N'|'n')('G'|'g')|('N'|'n')('O'|'o')('T'|'t')|('N'|'n')('U'|'u')('L'|'l')('L'|'l')|('O'|'o')('N'|'n')|('O'|'o')('P'|'p')('T'|'t')('I'|'i')('O'|'o')('N'|'n')|('O'|'o')('R'|'r')|('P'|'p')('R'|'r')('E'|'e')('S'|'s')('E'|'e')('R'|'r')('V'|'v')('E'|'e')|('P'|'p')('R'|'r')('I'|'i')('V'|'v')('A'|'a')('T'|'t')('E'|'e')|('P'|'p')('R'|'r')('O'|'o')('P'|'p')('E'|'e')('R'|'r')('T'|'t')('Y'|'y')|('P'|'p')('U'|'u')('B'|'b')('L'|'l')('I'|'i')('C'|'c')|('R'|'r')('E'|'e')('D'|'d')('I'|'i')('M'|'m')|('R'|'r')('E'|'e')('S'|'s')('U'|'u')('M'|'m')('E'|'e')|('S'|'s')('E'|'e')('L'|'l')('E'|'e')('C'|'c')('T'|'t')|('S'|'s')('E'|'e')('T'|'t')|('S'|'s')('T'|'t')('E'|'e')('P'|'p')|('S'|'s')('U'|'u')('B'|'b')|('T'|'t')('H'|'h')('E'|'e')('N'|'n')|('T'|'t')('O'|'o')|('T'|'t')('R'|'r')('U'|'u')('E'|'e')|('U'|'u')('N'|'n')('T'|'t')('I'|'i')('L'|'l')|('W'|'w')('H'|'h')('I'|'i')('L'|'l')('E'|'e')|('W'|'w')('E'|'e')('N'|'n')('D'|'d')|('W'|'w')('I'|'i')('T'|'t')('H'|'h')|('X'|'x')('O'|'o')('R'|'r')
If we could assume the case of the script was correct, not splitting the keywords like this gives us a 40% gain in performance.
Is there any tips or suggestions you could offer on case insensitive grammars?
The text was updated successfully, but these errors were encountered: