After 2.5 years of silence in the original Jison repo the team behind AlaSQL have made this community fork to harvest some of the great pull requests. The code here will be released on npm as "jisons" (extra s) and just like the story about nodejs and iojs the hope is that the two repos can one day merge again.
Update:
We were happy to see https://github.com/GerHobbelt/jison pick up where the original Jison repo left off, but after 1.5 years of silence a dependency of a dependency has started breaking every build that uses fixed versioning of dependencies. AlaSQL is therefore making this repo a community fork of jison-gho instead, to make sure that Jison can still be used without breaking builds.
Jison generates bottom-up parsers in JavaScript. Its API is similar to Bison's, hence the name. It supports many of Bison's major features, plus some of its own. If you are new to parser generators such as Bison, and to context-free grammars in general, a good introduction is found in the Bison manual. If you already know Bison, Jison should be easy to pick up.
Briefly, Jison takes a JSON encoded grammar or Bison style grammar and outputs a JavaScript file capable of parsing the language described by that grammar. You can then use the generated script to parse inputs and accept, reject, or perform actions based on the input.
Install globally
npm install jisons -g
Clone the github repository for examples:
npx degit AlaSQL/jisons/examples jison-examples
cd jison-examples
Now you're ready to generate some parsers:
jison calculator.jison
This will generate calculator.js in your current working directory. This file can be used to parse an input file, like so:
echo "2^32 / 1024" > testcalc
node calculator.js testcalc
This will print out 4194304.
Full cli option list:
Usage: jison [file] [lexfile] [options]
file file containing a grammar
lexfile file containing a lexical grammar
Where the available options are:
: -j, --json force jison to expect a grammar in JSON format [false]
: -o FILE, --outfile FILE Filepath and base module name of the generated parser; when terminated with a / (dir separator) it is treated as the destination directory where the generated output will be stored
: -t, --debug Debug mode [false]
: -I, --info Report some statistics about the generated parser [false]
: -m TYPE, --module-type TYPE The type of module to generate (commonjs, amd, es, js) [commonjs]
: -n NAME, --module-name NAME The name of the generated parser object, namespace supported
: -p TYPE, --parser-type TYPE The type of algorithm to use for the parser (lr0, slr, lalr, lr, ll) [lalr]
: -c, --compress-tables Output compressed parser tables in generated modules (0 = no compression, 1 = default compression, 2 = deep compression) [2]
: -T, --output-debug-tables Output extra parser tables (rules list + look-ahead analysis) in generated modules to assist debugging / diagnostics purposes [false]
: -X, --no-default-resolve Act another way when a conflict is found in the grammar [false]
: --default-action=[for-values,for-locations]
Generate a parser which does NOT include the default "$$ =
: --no-try-catch Generate a parser which does NOT try/catch exceptions (from the grammar action code or parseError error reporting calls). This produces a slightly faster parser at the cost of reduced code safety. [false]
: -Q, --error-recovery-token-discard-count Set the number of lexed tokens that may be gobbled by an error recovery process before we cry wolf (default: 3) [3]
: -E, --export-all-tables Next to producing a grammar source file, also export the symbols, terminals, grammar and parse tables to separate JSON files for further use by other tools. The files' names will be derived from the outputFile name by appending a suffix. [false]
: -x, --main Include .main() entry point in generated commonjs module [false]
: -y NAME, --module-main NAME The main module function definition
: -V, --version print version and exit
You can generate parsers programmatically from JavaScript as well. Assuming Jison is in your CommonJS environment's load path:
// mygenerator.js
var Parser = require("jison").Parser;
// a grammar in JSON
var grammar = {
    "lex": {
        "rules": [
            ["\\s+", "/* skip whitespace */"],
            ["[a-f0-9]+", "return 'HEX';"]
        ]
    },
    "bnf": {
        "hex_strings": [ "hex_strings HEX",
                         "HEX" ]
    }
};
// `grammar` can also be a string that uses jison's grammar format
var parser = new Parser(grammar);
// generate source, ready to be written to disk
var parserSource = parser.generate();
// you can also use the parser directly from memory
// returns true
parser.parse("adfe34bc e82a");
// throws lexical error
parser.parse("adfe34bc zxg");
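To make the two parse calls above more concrete, here is a hand-written sketch of what the example grammar accepts: whitespace is skipped and the input must consist of one or more HEX tokens. This is not the code Jison generates (Jison emits a table-driven parser); the function names tokenize and parseHexStrings are hypothetical and exist only for this illustration.

```javascript
// Hypothetical hand-written equivalent of the example grammar; NOT Jison output.
// The lexer rules mirror the "lex" section: skip whitespace, emit HEX for [a-f0-9]+.
function tokenize(input) {
    var rules = [
        { re: /^\s+/, token: null },        // skip whitespace
        { re: /^[a-f0-9]+/, token: "HEX" }  // return 'HEX'
    ];
    var tokens = [];
    var rest = input;
    while (rest.length > 0) {
        // pick the first rule that matches at the current position
        var rule = rules.find(function (r) { return r.re.test(rest); });
        if (!rule) {
            throw new Error("Lexical error: unrecognized text at '" + rest + "'");
        }
        var text = rest.match(rule.re)[0];
        if (rule.token) {
            tokens.push({ type: rule.token, text: text });
        }
        rest = rest.slice(text.length);
    }
    return tokens;
}

// hex_strings : hex_strings HEX | HEX   (i.e. one or more HEX tokens)
function parseHexStrings(input) {
    var tokens = tokenize(input);
    if (tokens.length === 0) {
        throw new Error("Parse error: expected HEX, got end of input");
    }
    return true;
}
```

With this sketch, parseHexStrings("adfe34bc e82a") returns true, while parseHexStrings("adfe34bc zxg") throws a lexical error because no rule matches the text starting at "z".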
For more information on creating grammars and using the generated parsers, read the documentation.
See CONTRIBUTING.md for contribution guidelines, how to run the tests, etc.
View projects using Jison on the wiki, or add your own.
The JISON tool uses several modules:
- The ebnf-parser library parses BNF and EBNF grammars to a basic AST used by Jison to produce a parser engine for your grammar spec.
- The lex-parser library parses %lex ... /lex lexical grammars to a basic AST used by Jison to produce a parser engine for your grammar spec.
- The jison-lex library/utility generates lexical analyzers which are included by Jison in your parser run-time engine to lex the input according to your %lex ... /lex lexical grammar definition.
- The jison2json utility converts a Jison spec file to JSON format file.
- The json2jison utility converts a JSON format file to a Jison spec file.
Special thanks to Jarred Ligatti, Manuel E. Bermúdez
Here's a comprehensive list of features and fixes compared to the original:
- Full Unicode support: the lexer can handle all Unicode regexes which are supported by the XRegExp library, with a few notes:
  - your own software does not need to include the XRegExp library: jison will produce standard JavaScript regex expressions for every lexer rule so that you can enjoy most Unicode features without the added burden of another library (XRegExp)
  - astral Unicode codepoints are not fully supported within regex character set expressions, unless you yourself include XRegExp and instruct the lexer to produce XRegExp regex expressions via the lexer option %options xregexp
- EBNF LR/LALR/SLR/LR0 grammars are correctly rewritten to BNF grammars, allowing your action code blocks to access all elements of the grammar rule at hand. See also the wiki section about EBNF.
- Parser engine optimization: jison analyzes not just your grammar, but also your action code and will strip any feature you don't use (such as location tracking via @element references and yylloc) from the parser kernel, which will benefit your parser run-time performance. The fastest parsers are obtained when you do not include error recovery (error tokens in your grammar), nor any lexer location tracking: this can potentially result in run-time execution cost reductions of over 70% (hence your parser executes more than 3 times as fast)!
- generated grammar / lexer source files carry a full API and internals documentation in the code comments to help you to read and debug a grammar. For example, every grammar rule is printed above its action code so that stepping through the parser when debugging hard-to-find problems makes it quite obvious which rule the engine is currently 'reducing'.
- Generated parsers and lexers are JavaScript strict mode compliant.
- you can specify a totally custom lexer in the %lex ... /lex section of your grammar definition file if you like, i.e. you can define and use a lexer which is not regex ruleset based / generated by jison lex! This is particularly handy when you want to achieve maximum performance / absolute minimum parse and lexing overhead for your high-performance demand grammars.
- lexer.reject() et al: the lexer comes with extra APIs to help you write more sophisticated lexers based on the lex/jison mechanism. The this.reject() call in your lexer rule action code will reject the current match and continue down the lexer rule set to find another match. Very handy when you do not want flex mode matching all the time, but want specific, local control over when a lexer regex (a.k.a. lexer rule) actually is a correct match.
- You can now enter epsilon as a token in your grammar rules, so no more hacks like /* epsilon */ comments for empty rules: you can type any of these: %epsilon, \u0190, \u025B, \u03B5, \u03F5 (See also https://en.wikipedia.org/wiki/Epsilon#Glyph_variants)
- %options easy_keyword_rules: see also https://github.com/zaach/jison/wiki/Deviations-From-Flex-Bison#user-content-literal-tokens
- ... more lexer features ...
  - %options ...
  - kernel ...
- ... more parser features ...
  - configurable error recovery search depth (default: 3 tokens)
  - augmented error reporting callbacks
  - dedicated parser and lexer Error-derived exception classes so you can use instanceof to help your generic error code discern what type of error has occurred and what info is available next to the text message itself.
- (are we faster even when we run with the same feature set as 'vanilla' zaach jison? Probably a little bit, but haven't measured this thoroughly.)
- JSON (rather than JISON) grammar files support all JSON5 features, i.e. you can include comments, etc. in your JSON-file based grammar specs!
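The "reject the current match and continue down the lexer rule set" behaviour described for lexer.reject() in the list above can be pictured with a small hand-rolled lexer. The sketch below only mimics that behaviour in plain JavaScript: the rule shape, the ctx.reject() callback and the lex function are all hypothetical names made up for this illustration and are not the jison-lex API. In a real grammar you would call this.reject() from inside a lexer rule's action code.

```javascript
// Hypothetical mini-lexer mimicking "reject and try the next rule"; NOT jison-lex itself.
var REJECT = Symbol("reject");

var rules = [
    {
        // Matches any lowercase word, but only accepts it when it is a reserved word.
        re: /^[a-z]+/,
        action: function (text, ctx) {
            if (["if", "else"].indexOf(text) === -1) {
                return ctx.reject();   // decline this match; later rules get a chance
            }
            return "KEYWORD";
        }
    },
    { re: /^[a-z][a-z0-9]*/, action: function () { return "IDENT"; } },
    { re: /^\s+/, action: function () { return null; } }   // skip whitespace
];

function lex(input) {
    var ctx = { reject: function () { return REJECT; } };
    var tokens = [];
    var rest = input;
    while (rest.length > 0) {
        var matched = false;
        for (var i = 0; i < rules.length; i++) {
            var m = rules[i].re.exec(rest);
            if (!m) continue;
            var result = rules[i].action(m[0], ctx);
            if (result === REJECT) continue;   // rule declined: keep scanning later rules
            if (result !== null) tokens.push(result + ":" + m[0]);
            rest = rest.slice(m[0].length);
            matched = true;
            break;
        }
        if (!matched) {
            throw new Error("Lexical error near '" + rest + "'");
        }
    }
    return tokens;
}
```

For the input "if x1 else foo" this sketch yields KEYWORD:if, IDENT:x1, KEYWORD:else, IDENT:foo: the first rule matches "x" and "foo" but rejects them, so the identifier rule further down the list takes over.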
Copyright (c) 2009-2016 Zachary Carter
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.