From 745684ad5550556360943209593c375c37735aaa Mon Sep 17 00:00:00 2001 From: Ulya Trofimovich Date: Thu, 31 Oct 2024 15:24:53 +0000 Subject: [PATCH] Docs: add description of syntax files. --- bootstrap/doc/re2c.1 | 575 +++++++++++++++++++++++++++ bootstrap/doc/re2d.1 | 575 +++++++++++++++++++++++++++ bootstrap/doc/re2go.1 | 575 +++++++++++++++++++++++++++ bootstrap/doc/re2hs.1 | 575 +++++++++++++++++++++++++++ bootstrap/doc/re2java.1 | 575 +++++++++++++++++++++++++++ bootstrap/doc/re2js.1 | 575 +++++++++++++++++++++++++++ bootstrap/doc/re2ocaml.1 | 575 +++++++++++++++++++++++++++ bootstrap/doc/re2py.1 | 575 +++++++++++++++++++++++++++ bootstrap/doc/re2rust.1 | 575 +++++++++++++++++++++++++++ bootstrap/doc/re2v.1 | 575 +++++++++++++++++++++++++++ bootstrap/doc/re2zig.1 | 575 +++++++++++++++++++++++++++ doc/manpage.rst.in | 4 + doc/manual/basics/syntax_files.rst_ | 590 ++++++++++++++++++++++++++++ 13 files changed, 6919 insertions(+) create mode 100644 doc/manual/basics/syntax_files.rst_ diff --git a/bootstrap/doc/re2c.1 b/bootstrap/doc/re2c.1 index 6ff785965..822998a22 100644 --- a/bootstrap/doc/re2c.1 +++ b/bootstrap/doc/re2c.1 @@ -1639,6 +1639,581 @@ raise a warning, and the user will be notified. If some configurations are unused and do not need a definition, they should be explicitly set to \fB\fP\&. .UNINDENT +.SS Syntax files +.sp +Support for different languages in re2c is based on the idea of \fIsyntax files\fP\&. +A syntax file is a configuration file that defines syntax of the target language +\-\- not the whole language, but a small part of it that is used by the generated +code. Syntax files make re2c very flexible, but they should not be used as a +replacement for configurations: their purpose is to define syntax of the target +language, not to customise one particular lexer. 
+Syntax files contain configurations of four different kinds: +.sp +\fBFeature lists\fP +.sp +\fBLanguage configurations\fP +.sp +\fBInplace configurations\fP +.sp +\fBCode templates\fP +.INDENT 0.0 +.INDENT 3.5 +\fICode templates\fP define syntax of the target language. They are written in a +simple domain\-specific language with the following formal grammar: +.INDENT 0.0 +.INDENT 3.5 +.sp +.nf +.ft C +code\-template :: + name \(aq=\(aq code\-exprs \(aq;\(aq + | CODE_TEMPLATE \(aq;\(aq + | \(aq\(aq \(aq;\(aq + +code\-exprs :: + + | code\-exprs code\-expr + +code\-expr :: + STRING + | VARIABLE + | optional + | list + +optional :: + \(aq(\(aq CONDITIONAL \(aq?\(aq code\-exprs \(aq)\(aq + | \(aq(\(aq CONDITIONAL \(aq?\(aq code\-exprs \(aq:\(aq code\-exprs \(aq)\(aq + +list :: + \(aq[\(aq VARIABLE \(aq:\(aq code\-exprs \(aq]\(aq + | \(aq[\(aq VARIABLE \(aq{\(aq NUMBER \(aq}\(aq \(aq:\(aq code\-exprs \(aq]\(aq + | \(aq[\(aq VARIABLE \(aq{\(aq NUMBER \(aq,\(aq NUMBER \(aq}\(aq \(aq:\(aq code\-exprs \(aq]\(aq +.ft P +.fi +.UNINDENT +.UNINDENT +.sp +A code template is a sequence of string literals, variables, optional elements +and lists, or a reference to another code template, or a special value +\fB\fP\&. Variables are placeholders that are substituted during the code +generation phase. List variables are special: when expanding list templates, +re2c repeats the expressions on the right\-hand side of the colon a few times, each +time replacing occurrences of the list variable with a value specific to this +repetition. Lists have optional bounds (negative values are counted from the +end, e.g. \fB\-1\fP means the last element). Conditional names start with a dot. +Both conditionals and variables may be either local (specific to the given +code template) or global (allowed in all code templates). When re2c reads +a syntax file, it checks that each code template uses only the variables and +conditionals that are allowed in it.
+.sp +For example, the following code template defines an if\-then\-else construct for a +C\-like language: +.INDENT 0.0 +.INDENT 3.5 +.sp +.nf +.ft C +code:if_then_else = + [branch{0}: topindent \(dqif \(dq cond \(dq {\(dq nl + indent [stmt: stmt] dedent] + [branch{1:\-1}: topindent \(dq} else\(dq (.cond ? \(dq if \(dq cond) \(dq {\(dq nl + indent [stmt: stmt] dedent] + topindent \(dq}\(dq nl; +.ft P +.fi +.UNINDENT +.UNINDENT +.sp +Here \fBbranch\fP is a list variable. \fBbranch{0}\fP expands to the first branch +\-\- it has to be special, as there is no \fBelse\fP part. \fBbranch{1:\-1}\fP +expands to the remaining branches, if any. \fBtopindent\fP, \fBindent\fP, +\fBdedent\fP and \fBnl\fP are global variables (see below). \fB[stmt: stmt]\fP is a +nested list that expands to the list of statements in the current branch +(\fBstmt\fP is a list variable). The local conditional \fB\&.cond\fP is true if the +current branch has a condition. +This code template could produce the following code: +.INDENT 0.0 +.INDENT 3.5 +.sp +.nf +.ft C +if x { + // do something +} else if y { + // do something else +} else { + // don\(aqt do anything +} +.ft P +.fi +.UNINDENT +.UNINDENT +.sp +Here\(aqs a list of all global variables: +.INDENT 0.0 +.INDENT 3.5 +.INDENT 0.0 +.TP +.B \fBnl\fP +A newline. +.TP +.B \fBindent\fP +A variable that does not produce any code, but has a side\-effect of +increasing indentation level. +.TP +.B \fBdedent\fP +A variable that does not produce any code, but has a side\-effect of +decreasing indentation level. +.TP +.B \fBtopindent\fP +Indent string for the current statement (the indent level is tracked and +updated by the code generator). +.UNINDENT +.UNINDENT +.UNINDENT +.sp +Here\(aqs a list of all global conditionals: +.INDENT 0.0 +.INDENT 3.5 +.INDENT 0.0 +.TP +.B \fB\&.api.simple\fP +True if simple API is used (\fB\-\-api simple\fP or \fBre2c:api = simple\fP).
+.TP +.B \fB\&.api.generic\fP +True if generic API is used (\fB\-\-api generic\fP or +\fBre2c:api = generic\fP). +.TP +.B \fB\&.api.record\fP +True if record API is used (\fB\-\-api record\fP or \fBre2c:api = record\fP). +.TP +.B \fB\&.api_style.functions\fP +True if function\-like API style is used +(\fBre2c:api\-style = functions\fP). +.TP +.B \fB\&.api_style.freeform\fP +True if free\-form API style is used (\fBre2c:api\-style = free\-form\fP). +.TP +.B \fB\&.case_ranges\fP +True if case ranges feature is enabled (\fB\-\-case\-ranges\fP or +\fBre2c:case\-ranges = 1\fP). +.TP +.B \fB\&.code_model.goto_label\fP +True if code model based on goto/label is used (\fB\-\-goto\-label\fP). +.TP +.B \fB\&.code_model.loop_switch\fP +True if code model based on loop/switch is used (\fB\-\-loop\-switch\fP). +.TP +.B \fB\&.code_model.recursive_functions\fP +True if code model based on recursive functions is used +(\fB\-\-recursive\-functions\fP). +.TP +.B \fB\&.date\fP +True if the generated fingerprint should contain the generation date. +.TP +.B \fB\&.loop_label\fP +True if re2c\-generated loops must have a label (\fBre2c:label:yyloop\fP +is set to a nonempty string). +.TP +.B \fB\&.monadic\fP +True if the generated code should be monadic (\fBre2c:monadic = 1\fP). +This is only relevant for pure functional languages. +.TP +.B \fB\&.start_conditions\fP +True if start conditions are enabled (\fB\-\-start\-conditions\fP). +.TP +.B \fB\&.storable_state\fP +True if storable state is enabled (\fB\-\-storable\-state\fP). +.TP +.B \fB\&.unsafe\fP +True if re2c should use \(dqunsafe\(dq blocks in order to generate faster +code (\fB\-\-unsafe\fP, \fBre2c:unsafe = 1\fP). This is only relevant for +languages that have \(dqunsafe\(dq feature. +.TP +.B \fB\&.version\fP +True if the generated fingerprint should contain re2c version.
+.UNINDENT +.UNINDENT +.UNINDENT +.sp +Below is a full list of code templates supported by re2c with their local +variables and conditionals (a definition does not have to use all of them). +Any unused code templates should be set to \fB\fP\&. +.INDENT 0.0 +.TP +.B \fBcode:var_local\fP +.INDENT 7.0 +.INDENT 3.5 +Declaration or definition of a local variable. Supported variables: +\fBtype\fP (the type of the variable), \fBname\fP (its name) and \fBinit\fP +(initial value, if any). Conditionals: \fB\&.init\fP (true if there is an +initializer). +.UNINDENT +.UNINDENT +.INDENT 7.0 +.TP +.B \fBcode:var_global\fP +Same as \fBcode:var_local\fP, except that it\(aqs used in top\-level. +.TP +.B \fBcode:const_local\fP +Definition of a local constant. Supported variables: \fBtype\fP (the type +of the constant), \fBname\fP (its name) and \fBinit\fP (initial value). +.TP +.B \fBcode:const_global\fP +Same as \fBcode:const_local\fP, except that it\(aqs used in top\-level. +.TP +.B \fBcode:array_local\fP +Definition of a local array (table). Supported variables: \fBtype\fP (the +type of array elements), \fBname\fP (array name), \fBsize\fP (its size), +\fBrow\fP (a list variable that does not itself produce any code, but +expands list expression as many times as there are rows in the table) +and \fBelem\fP (a list variable that expands to all table elements in the +current row \-\- it\(aqs meant to be nested in the \fBrow\fP list). +.TP +.B \fBcode:array_global\fP +Same as \fBcode:array_local\fP, except that it\(aqs used in top\-level. +.TP +.B \fBcode:array_elem\fP +Reference to an element of an array (table). Supported variables: +\fBarray\fP (the name of the array) and \fBindex\fP (index of the element). +.TP +.B \fBcode:enum\fP +Definition of an enumeration (it may be defined using a special language +construct for enumerations, or simply as a few standalone constants). 
+Supported variables are \fBtype\fP (user\-defined enumeration type or type +of the constants), \fBelem\fP (list variable that expands to the name of +each member) and \fBinit\fP (initializer for each member). Conditionals: +\fB\&.init\fP (true if there is an initializer). +.TP +.B \fBcode:enum_elem\fP +Enumeration element (a member of a user\-defined enumeration type or a +name of a constant, depending on how \fBcode:enum\fP is defined). +Supported variables are \fBname\fP (the name of the element) and \fBtype\fP +(its type). +.TP +.B \fBcode:assign\fP +Assignment statement. Supported variables are \fBlhs\fP (left hand side) +and \fBrhs\fP (right hand side). +.TP +.B \fBcode:type_int\fP +Signed integer type. +.TP +.B \fBcode:type_uint\fP +Unsigned integer type. +.TP +.B \fBcode:type_yybm\fP +Type of elements in the \fByybm\fP table. +.TP +.B \fBcode:type_yytarget\fP +Type of elements in the \fByytarget\fP table. +.TP +.B \fBcode:cmp_eq\fP +Operator \(dqequals\(dq. +.TP +.B \fBcode:cmp_ne\fP +Operator \(dqnot equals\(dq. +.TP +.B \fBcode:cmp_lt\fP +Operator \(dqless than\(dq. +.TP +.B \fBcode:cmp_gt\fP +Operator \(dqgreater than\(dq +.TP +.B \fBcode:cmp_le\fP +Operator \(dqless or equal\(dq +.TP +.B \fBcode:cmp_ge\fP +Operator \(dqgreater or equal\(dq +.TP +.B \fBcode:if_then_else\fP +If\-then\-else statement with one or more branches. Supported variables: +\fBbranch\fP (a list variable that does not itself produce any code, but +expands list expression as many times as there are branches), \fBcond\fP +(condition of the current branch) and \fBstmt\fP (a list variable that +expands to all statements in the current branch). Conditionals: +\fB\&.cond\fP (true if the current branch has a condition), \fB\&.many\fP (true +if there\(aqs more than one branch). +.TP +.B \fBcode:if_then_else_oneline\fP +A specialization of \fBcode:if_then_else\fP for the case when all branches +have one\-line statements. If this is \fB\fP, +\fBcode:if_then_else\fP is used instead. 
+.TP +.B \fBcode:switch\fP +A switch statement with one or more cases. Supported variables: \fBexpr\fP +(the switched\-on expression) and \fBcase\fP (a list variable that expands +to all case groups with their code blocks). +.TP +.B \fBcode:switch_cases\fP +A group of switch cases that maps to a single code block. Supported +variables are \fBcase\fP (a list variable that expands to all cases in +this group) and \fBstmt\fP (a list variable that expands to all statements +in the code block). +.TP +.B \fBcode:switch_cases_oneline\fP +A specialization of \fBcode:switch_cases\fP for the case when the code +block consists of a single one\-line statement. If this is +\fB\fP, \fBcode:switch_cases\fP is used instead. +.TP +.B \fBcode:switch_case_range\fP +A single switch case that covers a range of values (possibly consisting +of a single value). Supported variable: \fBval\fP (a list variable that +expands to all values in the range). Supported conditionals: \fB\&.many\fP +(true if there\(aqs more than one value in the range) and +\fB\&.char_literals\fP (true if this is a switch on character literals \-\- +some languages provide special syntax for this case). +.TP +.B \fBcode:switch_case_default\fP +Default switch case. +.TP +.B \fBcode:loop\fP +A loop that runs forever (unless interrupted from the loop body). +Supported variables: \fBlabel\fP (loop label), \fBstmt\fP (a list variable +that expands to all statements in the loop body). +.TP +.B \fBcode:continue\fP +Continue statement. Supported variables: \fBlabel\fP (label from which to +continue execution). +.TP +.B \fBcode:goto\fP +Goto statement. Supported variables: \fBlabel\fP (label of the jump +target). +.TP +.B \fBcode:fndecl\fP +Function declaration.
Supported variables: \fBname\fP (function name), +\fBtype\fP (return type), \fBarg\fP (a list variable that does not itself +produce code, but expands the list expression as many times as there are +function arguments), \fBargname\fP (name of the current argument), +\fBargtype\fP (type of the current argument). Conditional: \fB\&.type\fP (true +if this is a non\-void function). +.TP +.B \fBcode:fndef\fP +Like \fBcode:fndecl\fP, but used for function definitions, so it has one +additional list variable \fBstmt\fP that expands to all statements in the +function body. +.TP +.B \fBcode:fncall\fP +Function call statement. Supported variables: \fBname\fP (function name), +\fBretval\fP (l\-value where the return value is stored, if any) and +\fBarg\fP (a list variable that expands to all function arguments). +Conditionals: \fB\&.args\fP (true if the function has arguments) and +\fB\&.retval\fP (true if the return value needs to be saved). +.TP +.B \fBcode:tailcall\fP +Tail call statement. Supported variables: \fBname\fP (function name) +and \fBarg\fP (a list variable that expands to all function arguments). +Conditionals: \fB\&.args\fP (true if the function has arguments) and +\fB\&.retval\fP (true if this is a non\-void function). +.TP +.B \fBcode:recursive_functions\fP +Program body with \fB\-\-recursive\-functions\fP code model. Supported +variables: \fBfn\fP (a list variable that does not itself produce any +code, but expands the list expression as many times as there are functions), +\fBfndecl\fP (declaration of the current function) and \fBfndef\fP +(definition of the current function). +.TP +.B \fBcode:fingerprint\fP +The fingerprint at the top of the generated output file. Supported +variables: \fBver\fP (re2c version that was used to generate this) and +\fBdate\fP (generation date). +.TP +.B \fBcode:line_info\fP +The format of line directives (if this is set to \fB\fP, no +directives are generated). Supported variables: \fBline\fP (line number) +and \fBfile\fP (filename).
+.TP +.B \fBcode:abort\fP +A statement that aborts program execution. +.TP +.B \fBcode:yydebug\fP +\fBYYDEBUG\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYDEBUG\fP, \fByyrecord\fP, \fByych\fP (map to the +corresponding \fBre2c:\fP configurations), \fBstate\fP (DFA state number). +.TP +.B \fBcode:yypeek\fP +\fBYYPEEK\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYPEEK\fP, \fBYYCTYPE\fP, \fBYYINPUT\fP, \fBYYCURSOR\fP, +\fByyrecord\fP, \fByych\fP (map to the corresponding \fBre2c:\fP +configurations). Conditionals: \fB\&.cast\fP (true if +\fBre2c:yych:conversion\fP is set to non\-zero). +.TP +.B \fBcode:yyskip\fP +\fBYYSKIP\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSKIP\fP, \fBYYCURSOR\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations). +.TP +.B \fBcode:yybackup\fP +\fBYYBACKUP\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYBACKUP\fP, \fBYYCURSOR\fP, \fBYYMARKER\fP, +\fByyrecord\fP (map to the corresponding \fBre2c:\fP configurations). +.TP +.B \fBcode:yybackupctx\fP +\fBYYBACKUPCTX\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYBACKUPCTX\fP, \fBYYCURSOR\fP, \fBYYCTXMARKER\fP, +\fByyrecord\fP (map to the corresponding \fBre2c:\fP configurations). +.TP +.B \fBcode:yyskip_yypeek\fP +Combined \fBcode:yyskip\fP and \fBcode:yypeek\fP statement (defaults to +\fBcode:yyskip\fP followed by \fBcode:yypeek\fP). +.TP +.B \fBcode:yypeek_yyskip\fP +Combined \fBcode:yypeek\fP and \fBcode:yyskip\fP statement (defaults to +\fBcode:yypeek\fP followed by \fBcode:yyskip\fP). +.TP +.B \fBcode:yyskip_yybackup\fP +Combined \fBcode:yyskip\fP and \fBcode:yybackup\fP statement (defaults to +\fBcode:yyskip\fP followed by \fBcode:yybackup\fP). 
+.TP +.B \fBcode:yybackup_yyskip\fP +Combined \fBcode:yybackup\fP and \fBcode:yyskip\fP statement (defaults to +\fBcode:yybackup\fP followed by \fBcode:yyskip\fP). +.TP +.B \fBcode:yybackup_yypeek\fP +Combined \fBcode:yybackup\fP and \fBcode:yypeek\fP statement (defaults to +\fBcode:yybackup\fP followed by \fBcode:yypeek\fP). +.TP +.B \fBcode:yyskip_yybackup_yypeek\fP +Combined \fBcode:yyskip\fP, \fBcode:yybackup\fP and \fBcode:yypeek\fP +statement (defaults to \fBcode:yyskip\fP followed by \fBcode:yybackup\fP +followed by \fBcode:yypeek\fP). +.TP +.B \fBcode:yybackup_yypeek_yyskip\fP +Combined \fBcode:yybackup\fP, \fBcode:yypeek\fP and \fBcode:yyskip\fP +statement (defaults to \fBcode:yybackup\fP followed by \fBcode:yypeek\fP +followed by \fBcode:yyskip\fP). +.TP +.B \fBcode:yyrestore\fP +\fBYYRESTORE\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYRESTORE\fP, \fBYYCURSOR\fP, \fBYYMARKER\fP, +\fByyrecord\fP (map to the corresponding \fBre2c:\fP configurations). +.TP +.B \fBcode:yyrestorectx\fP +\fBYYRESTORECTX\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYRESTORECTX\fP, \fBYYCURSOR\fP, \fBYYCTXMARKER\fP, +\fByyrecord\fP (map to the corresponding \fBre2c:\fP configurations). +.TP +.B \fBcode:yyrestoretag\fP +\fBYYRESTORETAG\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYRESTORETAG\fP, \fBYYCURSOR\fP, \fByyrecord\fP (map +to the corresponding \fBre2c:\fP configurations), \fBtag\fP (the name of the tag +variable used to restore position). +.TP +.B \fBcode:yyshift\fP +\fBYYSHIFT\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSHIFT\fP, \fBYYCURSOR\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBoffset\fP (the number of code +units to shift the current position). +.TP +.B \fBcode:yyshiftstag\fP +\fBYYSHIFTSTAG\fP statement, possibly specialized for different APIs.
+Supported variables: \fBYYSHIFTSTAG\fP, \fByyrecord\fP, \fBnegative\fP (map +to the corresponding \fBre2c:\fP configurations), \fBtag\fP (tag variable +which needs to be shifted), \fBoffset\fP (the number of code units to +shift). Conditionals: \fB\&.nested\fP (true if this is a nested tag \-\- in +this case its value may be equal to \fBre2c:tags:negative\fP, which should +not be shifted). +.TP +.B \fBcode:yyshiftmtag\fP +\fBYYSHIFTMTAG\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSHIFTMTAG\fP (maps to the corresponding +\fBre2c:\fP configuration), \fBtag\fP (tag variable which needs to be +shifted), \fBoffset\fP (the number of code units to shift). +.TP +.B \fBcode:yystagp\fP +\fBYYSTAGP\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSTAGP\fP, \fBYYCURSOR\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBtag\fP (tag variable that +should be updated). +.TP +.B \fBcode:yymtagp\fP +\fBYYMTAGP\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYMTAGP\fP (maps to the corresponding \fBre2c:\fP +configuration), \fBtag\fP (tag variable that should be updated). +.TP +.B \fBcode:yystagn\fP +\fBYYSTAGN\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSTAGN\fP, \fBnegative\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBtag\fP (tag variable that +should be updated). +.TP +.B \fBcode:yymtagn\fP +\fBYYMTAGN\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYMTAGN\fP (maps to the corresponding \fBre2c:\fP +configuration), \fBtag\fP (tag variable that should be updated). +.TP +.B \fBcode:yycopystag\fP +\fBYYCOPYSTAG\fP statement, possibly specialized for different APIs.
+Supported variables: \fBYYCOPYSTAG\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBlhs\fP, \fBrhs\fP (left and +right hand side tag variables of the copy operation). +.TP +.B \fBcode:yycopymtag\fP +\fBYYCOPYMTAG\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYCOPYMTAG\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBlhs\fP, \fBrhs\fP (left and +right hand side tag variables of the copy operation). +.TP +.B \fBcode:yygetaccept\fP +\fBYYGETACCEPT\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYGETACCEPT\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBvar\fP (maps to +\fBre2c:yyaccept\fP configuration). +.TP +.B \fBcode:yysetaccept\fP +\fBYYSETACCEPT\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSETACCEPT\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBvar\fP (maps to +\fBre2c:yyaccept\fP configuration) and \fBval\fP (numeric value of the +accepted rule). +.TP +.B \fBcode:yygetcond\fP +\fBYYGETCOND\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYGETCOND\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBvar\fP (maps to +\fBre2c:yycond\fP configuration). +.TP +.B \fBcode:yysetcond\fP +\fBYYSETCOND\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSETCOND\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBvar\fP (maps to +\fBre2c:yycond\fP configuration) and \fBval\fP (numeric condition +identifier). +.TP +.B \fBcode:yygetstate\fP +\fBYYGETSTATE\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYGETSTATE\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBvar\fP (maps to +\fBre2c:yystate\fP configuration). 
+.TP +.B \fBcode:yysetstate\fP +\fBYYSETSTATE\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSETSTATE\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBvar\fP (maps to +\fBre2c:yystate\fP configuration) and \fBval\fP (state number). +.TP +.B \fBcode:yylessthan\fP +\fBYYLESSTHAN\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYLESSTHAN\fP, \fBYYCURSOR\fP, \fBYYLIMIT\fP, +\fByyrecord\fP (map to the corresponding \fBre2c:\fP configurations), +\fBneed\fP (the number of code units to check against). Conditional: +\fB\&.many\fP (true if the \fBneed\fP is more than one). +.TP +.B \fBcode:yybm_filter\fP +Condition that is used to filter out \fByych\fP values that are not +covered by the \fByybm\fP table (used with \fB\-\-bitmaps\fP option). +Supported variable: \fByych\fP (maps to \fBre2c:yych\fP configuration). +.TP +.B \fBcode:yybm_match\fP +The format of \fByybm\fP table check (generated with \fB\-\-bitmaps\fP +option). Supported variables: \fByybm\fP, \fByych\fP (map to the +corresponding \fBre2c:\fP configurations), \fBoffset\fP (offset in the +\fByybm\fP table that needs to be added to \fByych\fP) and \fBmask\fP (bit +mask that should be applied to the table entry to retrieve the boolean +value that needs to be checked) +.UNINDENT +.UNINDENT +.UNINDENT +.UNINDENT .SH HANDLING THE END OF INPUT .sp One of the main problems for the lexer is to know when to stop. diff --git a/bootstrap/doc/re2d.1 b/bootstrap/doc/re2d.1 index 5ba811e23..5a3f596ea 100644 --- a/bootstrap/doc/re2d.1 +++ b/bootstrap/doc/re2d.1 @@ -1557,6 +1557,581 @@ raise a warning, and the user will be notified. If some configurations are unused and do not need a definition, they should be explicitly set to \fB\fP\&. .UNINDENT +.SS Syntax files +.sp +Support for different languages in re2c is based on the idea of \fIsyntax files\fP\&. 
+A syntax file is a configuration file that defines syntax of the target language +\-\- not the whole language, but a small part of it that is used by the generated +code. Syntax files make re2c very flexible, but they should not be used as a +replacement for configurations: their purpose is to define syntax of the target +language, not to customise one particular lexer. +Syntax files contain configurations of four different kinds: +.sp +\fBFeature lists\fP +.sp +\fBLanguage configurations\fP +.sp +\fBInplace configurations\fP +.sp +\fBCode templates\fP +.INDENT 0.0 +.INDENT 3.5 +\fICode templates\fP define syntax of the target language. They are written in a +simple domain\-specific language with the following formal grammar: +.INDENT 0.0 +.INDENT 3.5 +.sp +.nf +.ft C +code\-template :: + name \(aq=\(aq code\-exprs \(aq;\(aq + | CODE_TEMPLATE \(aq;\(aq + | \(aq\(aq \(aq;\(aq + +code\-exprs :: + + | code\-exprs code\-expr + +code\-expr :: + STRING + | VARIABLE + | optional + | list + +optional :: + \(aq(\(aq CONDITIONAL \(aq?\(aq code\-exprs \(aq)\(aq + | \(aq(\(aq CONDITIONAL \(aq?\(aq code\-exprs \(aq:\(aq code\-exprs \(aq)\(aq + +list :: + \(aq[\(aq VARIABLE \(aq:\(aq code\-exprs \(aq]\(aq + | \(aq[\(aq VARIABLE \(aq{\(aq NUMBER \(aq}\(aq \(aq:\(aq code\-exprs \(aq]\(aq + | \(aq[\(aq VARIABLE \(aq{\(aq NUMBER \(aq,\(aq NUMBER \(aq}\(aq \(aq:\(aq code\-exprs \(aq]\(aq +.ft P +.fi +.UNINDENT +.UNINDENT +.sp +A code template is a sequence of string literals, variables, optional elements +and lists, or a reference to another code template, or a special value +\fB\fP\&. Variables are placeholders that are substituted during the code +generation phase. List variables are special: when expanding list templates, +re2d repeats the expressions on the right\-hand side of the colon a few times, each +time replacing occurrences of the list variable with a value specific to this +repetition. Lists have optional bounds (negative values are counted from the +end, e.g.
\fB\-1\fP means the last element). Conditional names start with a dot. +Both conditionals and variables may be either local (specific to the given +code template) or global (allowed in all code templates). When re2d reads +syntax file, it checks that each code template uses only the variables and +conditionals that are allowed in it. +.sp +For example, the following code template defines if\-then\-else construct for a +C\-like language: +.INDENT 0.0 +.INDENT 3.5 +.sp +.nf +.ft C +code:if_then_else = + [branch{0}: topindent \(dqif \(dq cond \(dq {\(dq nl + indent [stmt: stmt] dedent] + [branch{1:\-1}: topindent \(dq} else\(dq (.cond ? \(dq if \(dq cond) \(dq {\(dq nl + indent [stmt: stmt] dedent] + topindent \(dq}\(dq nl; +.ft P +.fi +.UNINDENT +.UNINDENT +.sp +Here \fBbranch\fP is a list variable. \fBbranch{0}\fP expands to the first branch +\-\- it has to be special, as there is no \fBelse\fP part. \fBbranch{1:\-1}\fP +expands to the remaining branches, if any. \fBtopindent\fP, \fBindent\fP, +\fBdedent\fP and \fBnl\fP are global variables (see below). \fB[stmt: stmt]\fP is a +nested list that expands to the list of statements in the current branch +(\fBstmt\fP is a list variable). Local conditional \fB\&.cond\fP is true if the +current branch has a condition. +This code template could produce the following code: +.INDENT 0.0 +.INDENT 3.5 +.sp +.nf +.ft C +if x { + // do something +} else if y { + // do something else +} else { + // don\(aqt do anything +} +.ft P +.fi +.UNINDENT +.UNINDENT +.sp +Here\(aqs a list of all global variables: +.INDENT 0.0 +.INDENT 3.5 +.INDENT 0.0 +.TP +.B \fBnl\fP +A newline. +.TP +.B \fBindent\fP +A variable that does not produce any code, but has a side\-effect of +increasing indentation level. +.TP +.B \fBdedent\fP +A variable that does not produce any code, but has a side\-effect of +decreasing indentation level. 
+.TP +.B \fBtopindent\fP +Indent string for the current statement (the indent level is tracked and +updated by the code generator). +.UNINDENT +.UNINDENT +.UNINDENT +.sp +Here\(aqs a list of all global conditionals: +.INDENT 0.0 +.INDENT 3.5 +.INDENT 0.0 +.TP +.B \fB\&.api.simple\fP +True if simple API is used (\fB\-\-api simple\fP or \fBre2c:api = simple\fP). +.TP +.B \fB\&.api.generic\fP +True if generic API is used (\fB\-\-api generic\fP or +\fBre2c:api = generic\fP). +.TP +.B \fB\&.api.record\fP +True if record API is used (\fB\-\-api record\fP or \fBre2c:api = record\fP). +.TP +.B \fB\&.api_style.functions\fP +True if function\-like API style is used +(\fBre2c:api\-style = functions\fP). +.TP +.B \fB\&.api_style.freeform\fP +True if free\-form API style is used (\fBre2c:api\-style = free\-form\fP). +.TP +.B \fB\&.case_ranges\fP +True if case ranges feature is enabled (\fB\-\-case\-ranges\fP or +\fBre2c:case\-ranges = 1\fP). +.TP +.B \fB\&.code_model.goto_label\fP +True if code model based on goto/label is used (\fB\-\-goto\-label\fP). +.TP +.B \fB\&.code_model.loop_switch\fP +True if code model based on loop/switch is used (\fB\-\-loop\-switch\fP). +.TP +.B \fB\&.code_model.recursive_functions\fP +True if code model based on recursive functions is used +(\fB\-\-recursive\-functions\fP). +.TP +.B \fB\&.date\fP +True if the generated fingerprint should contain the generation date. +.TP +.B \fB\&.loop_label\fP +True if re2d\-generated loops must have a label (\fBre2c:label:yyloop\fP +is set to a nonempty string). +.TP +.B \fB\&.monadic\fP +True if the generated code should be monadic (\fBre2c:monadic = 1\fP). +This is only relevant for pure functional languages. +.TP +.B \fB\&.start_conditions\fP +True if start conditions are enabled (\fB\-\-start\-conditions\fP). +.TP +.B \fB\&.storable_state\fP +True if storable state is enabled (\fB\-\-storable\-state\fP).
+.TP +.B \fB\&.unsafe\fP +True if re2d should use \(dqunsafe\(dq blocks in order to generate faster +code (\fB\-\-unsafe\fP, \fBre2c:unsafe = 1\fP). This is only relevant for +languages that have \(dqunsafe\(dq feature. +.TP +.B \fB\&.version\fP +True if the generated fingerprint should contain re2d version. +.UNINDENT +.UNINDENT +.UNINDENT +.sp +Below is a full list of code templates supported by re2d with their local +variables and conditionals (a definition does not have to use all of them). +Any unused code templates should be set to \fB\fP\&. +.INDENT 0.0 +.TP +.B \fBcode:var_local\fP +.INDENT 7.0 +.INDENT 3.5 +Declaration or definition of a local variable. Supported variables: +\fBtype\fP (the type of the variable), \fBname\fP (its name) and \fBinit\fP +(initial value, if any). Conditionals: \fB\&.init\fP (true if there is an +initializer). +.UNINDENT +.UNINDENT +.INDENT 7.0 +.TP +.B \fBcode:var_global\fP +Same as \fBcode:var_local\fP, except that it\(aqs used in top\-level. +.TP +.B \fBcode:const_local\fP +Definition of a local constant. Supported variables: \fBtype\fP (the type +of the constant), \fBname\fP (its name) and \fBinit\fP (initial value). +.TP +.B \fBcode:const_global\fP +Same as \fBcode:const_local\fP, except that it\(aqs used in top\-level. +.TP +.B \fBcode:array_local\fP +Definition of a local array (table). Supported variables: \fBtype\fP (the +type of array elements), \fBname\fP (array name), \fBsize\fP (its size), +\fBrow\fP (a list variable that does not itself produce any code, but +expands list expression as many times as there are rows in the table) +and \fBelem\fP (a list variable that expands to all table elements in the +current row \-\- it\(aqs meant to be nested in the \fBrow\fP list). +.TP +.B \fBcode:array_global\fP +Same as \fBcode:array_local\fP, except that it\(aqs used in top\-level. +.TP +.B \fBcode:array_elem\fP +Reference to an element of an array (table). 
Supported variables: +\fBarray\fP (the name of the array) and \fBindex\fP (index of the element). +.TP +.B \fBcode:enum\fP +Definition of an enumeration (it may be defined using a special language +construct for enumerations, or simply as a few standalone constants). +Supported variables are \fBtype\fP (user\-defined enumeration type or type +of the constants), \fBelem\fP (list variable that expands to the name of +each member) and \fBinit\fP (initializer for each member). Conditionals: +\fB\&.init\fP (true if there is an initializer). +.TP +.B \fBcode:enum_elem\fP +Enumeration element (a member of a user\-defined enumeration type or a +name of a constant, depending on how \fBcode:enum\fP is defined). +Supported variables are \fBname\fP (the name of the element) and \fBtype\fP +(its type). +.TP +.B \fBcode:assign\fP +Assignment statement. Supported variables are \fBlhs\fP (left hand side) +and \fBrhs\fP (right hand side). +.TP +.B \fBcode:type_int\fP +Signed integer type. +.TP +.B \fBcode:type_uint\fP +Unsigned integer type. +.TP +.B \fBcode:type_yybm\fP +Type of elements in the \fByybm\fP table. +.TP +.B \fBcode:type_yytarget\fP +Type of elements in the \fByytarget\fP table. +.TP +.B \fBcode:cmp_eq\fP +Operator \(dqequals\(dq. +.TP +.B \fBcode:cmp_ne\fP +Operator \(dqnot equals\(dq. +.TP +.B \fBcode:cmp_lt\fP +Operator \(dqless than\(dq. +.TP +.B \fBcode:cmp_gt\fP +Operator \(dqgreater than\(dq. +.TP +.B \fBcode:cmp_le\fP +Operator \(dqless or equal\(dq. +.TP +.B \fBcode:cmp_ge\fP +Operator \(dqgreater or equal\(dq. +.TP +.B \fBcode:if_then_else\fP +If\-then\-else statement with one or more branches. Supported variables: +\fBbranch\fP (a list variable that does not itself produce any code, but +expands the list expression as many times as there are branches), \fBcond\fP +(condition of the current branch) and \fBstmt\fP (a list variable that +expands to all statements in the current branch).
Conditionals: +\fB\&.cond\fP (true if the current branch has a condition), \fB\&.many\fP (true +if there\(aqs more than one branch). +.TP +.B \fBcode:if_then_else_oneline\fP +A specialization of \fBcode:if_then_else\fP for the case when all branches +have one\-line statements. If this is \fB<undefined>\fP, +\fBcode:if_then_else\fP is used instead. +.TP +.B \fBcode:switch\fP +A switch statement with one or more cases. Supported variables: \fBexpr\fP +(the switched\-on expression) and \fBcase\fP (a list variable that expands +to all case groups with their code blocks). +.TP +.B \fBcode:switch_cases\fP +A group of switch cases that maps to a single code block. Supported +variables are \fBcase\fP (a list variable that expands to all cases in +this group) and \fBstmt\fP (a list variable that expands to all statements +in the code block). +.TP +.B \fBcode:switch_cases_oneline\fP +A specialization of \fBcode:switch_cases\fP for the case when the code +block consists of a single one\-line statement. If this is +\fB<undefined>\fP, \fBcode:switch_cases\fP is used instead. +.TP +.B \fBcode:switch_case_range\fP +A single switch case that covers a range of values (possibly consisting +of a single value). Supported variable: \fBval\fP (a list variable that +expands to all values in the range). Supported conditionals: \fB\&.many\fP +(true if there\(aqs more than one value in the range) and +\fB\&.char_literals\fP (true if this is a switch on character literals \-\- +some languages provide special syntax for this case). +.TP +.B \fBcode:switch_case_default\fP +Default switch case. +.TP +.B \fBcode:loop\fP +A loop that runs forever (unless interrupted from the loop body). +Supported variables: \fBlabel\fP (loop label) and \fBstmt\fP (a list variable +that expands to all statements in the loop body). +.TP +.B \fBcode:continue\fP +Continue statement. Supported variables: \fBlabel\fP (label from which to +continue execution). +.TP +.B \fBcode:goto\fP +Goto statement.
Supported variables: \fBlabel\fP (label of the jump +target). +.TP +.B \fBcode:fndecl\fP +Function declaration. Supported variables: \fBname\fP (function name), +\fBtype\fP (return type), \fBarg\fP (a list variable that does not itself +produce code, but expands the list expression as many times as there are +function arguments), \fBargname\fP (name of the current argument), +\fBargtype\fP (type of the current argument). Conditional: \fB\&.type\fP (true +if this is a non\-void function). +.TP +.B \fBcode:fndef\fP +Like \fBcode:fndecl\fP, but used for function definitions, so it has one +additional list variable \fBstmt\fP that expands to all statements in the +function body. +.TP +.B \fBcode:fncall\fP +Function call statement. Supported variables: \fBname\fP (function name), +\fBretval\fP (l\-value where the return value is stored, if any) and +\fBarg\fP (a list variable that expands to all function arguments). +Conditionals: \fB\&.args\fP (true if the function has arguments) and +\fB\&.retval\fP (true if the return value needs to be saved). +.TP +.B \fBcode:tailcall\fP +Tail call statement. Supported variables: \fBname\fP (function name) +and \fBarg\fP (a list variable that expands to all function arguments). +Conditionals: \fB\&.args\fP (true if the function has arguments) and +\fB\&.retval\fP (true if this is a non\-void function). +.TP +.B \fBcode:recursive_functions\fP +Program body with \fB\-\-recursive\-functions\fP code model. Supported +variables: \fBfn\fP (a list variable that does not itself produce any +code, but expands the list expression as many times as there are functions), +\fBfndecl\fP (declaration of the current function) and \fBfndef\fP +(definition of the current function). +.TP +.B \fBcode:fingerprint\fP +The fingerprint at the top of the generated output file. Supported +variables: \fBver\fP (re2d version that was used to generate this) and +\fBdate\fP (generation date).
+.TP +.B \fBcode:line_info\fP +The format of line directives (if this is set to \fB<undefined>\fP, no +directives are generated). Supported variables: \fBline\fP (line number) +and \fBfile\fP (filename). +.TP +.B \fBcode:abort\fP +A statement that aborts program execution. +.TP +.B \fBcode:yydebug\fP +\fBYYDEBUG\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYDEBUG\fP, \fByyrecord\fP, \fByych\fP (map to the +corresponding \fBre2c:\fP configurations), \fBstate\fP (DFA state number). +.TP +.B \fBcode:yypeek\fP +\fBYYPEEK\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYPEEK\fP, \fBYYCTYPE\fP, \fBYYINPUT\fP, \fBYYCURSOR\fP, +\fByyrecord\fP, \fByych\fP (map to the corresponding \fBre2c:\fP +configurations). Conditionals: \fB\&.cast\fP (true if +\fBre2c:yych:conversion\fP is set to a non\-zero value). +.TP +.B \fBcode:yyskip\fP +\fBYYSKIP\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSKIP\fP, \fBYYCURSOR\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations). +.TP +.B \fBcode:yybackup\fP +\fBYYBACKUP\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYBACKUP\fP, \fBYYCURSOR\fP, \fBYYMARKER\fP, +\fByyrecord\fP (map to the corresponding \fBre2c:\fP configurations). +.TP +.B \fBcode:yybackupctx\fP +\fBYYBACKUPCTX\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYBACKUPCTX\fP, \fBYYCURSOR\fP, \fBYYCTXMARKER\fP, +\fByyrecord\fP (map to the corresponding \fBre2c:\fP configurations). +.TP +.B \fBcode:yyskip_yypeek\fP +Combined \fBcode:yyskip\fP and \fBcode:yypeek\fP statement (defaults to +\fBcode:yyskip\fP followed by \fBcode:yypeek\fP). +.TP +.B \fBcode:yypeek_yyskip\fP +Combined \fBcode:yypeek\fP and \fBcode:yyskip\fP statement (defaults to +\fBcode:yypeek\fP followed by \fBcode:yyskip\fP).
+.TP +.B \fBcode:yyskip_yybackup\fP +Combined \fBcode:yyskip\fP and \fBcode:yybackup\fP statement (defaults to +\fBcode:yyskip\fP followed by \fBcode:yybackup\fP). +.TP +.B \fBcode:yybackup_yyskip\fP +Combined \fBcode:yybackup\fP and \fBcode:yyskip\fP statement (defaults to +\fBcode:yybackup\fP followed by \fBcode:yyskip\fP). +.TP +.B \fBcode:yybackup_yypeek\fP +Combined \fBcode:yybackup\fP and \fBcode:yypeek\fP statement (defaults to +\fBcode:yybackup\fP followed by \fBcode:yypeek\fP). +.TP +.B \fBcode:yyskip_yybackup_yypeek\fP +Combined \fBcode:yyskip\fP, \fBcode:yybackup\fP and \fBcode:yypeek\fP +statement (defaults to \fBcode:yyskip\fP followed by \fBcode:yybackup\fP +followed by \fBcode:yypeek\fP). +.TP +.B \fBcode:yybackup_yypeek_yyskip\fP +Combined \fBcode:yybackup\fP, \fBcode:yypeek\fP and \fBcode:yyskip\fP +statement (defaults to \fBcode:yybackup\fP followed by \fBcode:yypeek\fP +followed by \fBcode:yyskip\fP). +.TP +.B \fBcode:yyrestore\fP +\fBYYRESTORE\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYRESTORE\fP, \fBYYCURSOR\fP, \fBYYMARKER\fP, +\fByyrecord\fP (map to the corresponding \fBre2c:\fP configurations). +.TP +.B \fBcode:yyrestorectx\fP +\fBYYRESTORECTX\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYRESTORECTX\fP, \fBYYCURSOR\fP, \fBYYCTXMARKER\fP, +\fByyrecord\fP (map to the corresponding \fBre2c:\fP configurations). +.TP +.B \fBcode:yyrestoretag\fP +\fBYYRESTORETAG\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYRESTORETAG\fP, \fBYYCURSOR\fP, \fByyrecord\fP (map +to the corresponding \fBre2c:\fP configurations), \fBtag\fP (the name of tag +variable used to restore position). +.TP +.B \fBcode:yyshift\fP +\fBYYSHIFT\fP statement, possibly specialized for different APIs.
+Supported variables: \fBYYSHIFT\fP, \fBYYCURSOR\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBoffset\fP (the number of code +units to shift the current position). +.TP +.B \fBcode:yyshiftstag\fP +\fBYYSHIFTSTAG\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSHIFTSTAG\fP, \fByyrecord\fP, \fBnegative\fP (map +to the corresponding \fBre2c:\fP configurations), \fBtag\fP (tag variable +which needs to be shifted), \fBoffset\fP (the number of code units to +shift). Conditionals: \fB\&.nested\fP (true if this is a nested tag \-\- in +this case its value may be equal to \fBre2c:tags:negative\fP, which should +not be shifted). +.TP +.B \fBcode:yyshiftmtag\fP +\fBYYSHIFTMTAG\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSHIFTMTAG\fP (maps to the corresponding +\fBre2c:\fP configuration), \fBtag\fP (tag variable which needs to be +shifted), \fBoffset\fP (the number of code units to shift). +.TP +.B \fBcode:yystagp\fP +\fBYYSTAGP\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSTAGP\fP, \fBYYCURSOR\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBtag\fP (tag variable that +should be updated). +.TP +.B \fBcode:yymtagp\fP +\fBYYMTAGP\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYMTAGP\fP (maps to the corresponding \fBre2c:\fP +configuration), \fBtag\fP (tag variable that should be updated). +.TP +.B \fBcode:yystagn\fP +\fBYYSTAGN\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSTAGN\fP, \fBnegative\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBtag\fP (tag variable that +should be updated). +.TP +.B \fBcode:yymtagn\fP +\fBYYMTAGN\fP statement, possibly specialized for different APIs.
+Supported variables: \fBYYMTAGN\fP (maps to the corresponding \fBre2c:\fP +configuration), \fBtag\fP (tag variable that should be updated). +.TP +.B \fBcode:yycopystag\fP +\fBYYCOPYSTAG\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYCOPYSTAG\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBlhs\fP, \fBrhs\fP (left and +right hand side tag variables of the copy operation). +.TP +.B \fBcode:yycopymtag\fP +\fBYYCOPYMTAG\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYCOPYMTAG\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBlhs\fP, \fBrhs\fP (left and +right hand side tag variables of the copy operation). +.TP +.B \fBcode:yygetaccept\fP +\fBYYGETACCEPT\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYGETACCEPT\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBvar\fP (maps to +\fBre2c:yyaccept\fP configuration). +.TP +.B \fBcode:yysetaccept\fP +\fBYYSETACCEPT\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSETACCEPT\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBvar\fP (maps to +\fBre2c:yyaccept\fP configuration) and \fBval\fP (numeric value of the +accepted rule). +.TP +.B \fBcode:yygetcond\fP +\fBYYGETCOND\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYGETCOND\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBvar\fP (maps to +\fBre2c:yycond\fP configuration). +.TP +.B \fBcode:yysetcond\fP +\fBYYSETCOND\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSETCOND\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBvar\fP (maps to +\fBre2c:yycond\fP configuration) and \fBval\fP (numeric condition +identifier). 
+.TP +.B \fBcode:yygetstate\fP +\fBYYGETSTATE\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYGETSTATE\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBvar\fP (maps to +\fBre2c:yystate\fP configuration). +.TP +.B \fBcode:yysetstate\fP +\fBYYSETSTATE\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSETSTATE\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBvar\fP (maps to +\fBre2c:yystate\fP configuration) and \fBval\fP (state number). +.TP +.B \fBcode:yylessthan\fP +\fBYYLESSTHAN\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYLESSTHAN\fP, \fBYYCURSOR\fP, \fBYYLIMIT\fP, +\fByyrecord\fP (map to the corresponding \fBre2c:\fP configurations), +\fBneed\fP (the number of code units to check against). Conditional: +\fB\&.many\fP (true if the \fBneed\fP is more than one). +.TP +.B \fBcode:yybm_filter\fP +Condition that is used to filter out \fByych\fP values that are not +covered by the \fByybm\fP table (used with \fB\-\-bitmaps\fP option). +Supported variable: \fByych\fP (maps to \fBre2c:yych\fP configuration). +.TP +.B \fBcode:yybm_match\fP +The format of \fByybm\fP table check (generated with \fB\-\-bitmaps\fP +option). Supported variables: \fByybm\fP, \fByych\fP (map to the +corresponding \fBre2c:\fP configurations), \fBoffset\fP (offset in the +\fByybm\fP table that needs to be added to \fByych\fP) and \fBmask\fP (bit +mask that should be applied to the table entry to retrieve the boolean +value that needs to be checked) +.UNINDENT +.UNINDENT +.UNINDENT +.UNINDENT .SH HANDLING THE END OF INPUT .sp One of the main problems for the lexer is to know when to stop. diff --git a/bootstrap/doc/re2go.1 b/bootstrap/doc/re2go.1 index 48e03aac6..e4c604e66 100644 --- a/bootstrap/doc/re2go.1 +++ b/bootstrap/doc/re2go.1 @@ -1637,6 +1637,581 @@ raise a warning, and the user will be notified. 
If some configurations are unused and do not need a definition, they should be explicitly set to \fB<undefined>\fP\&. .UNINDENT +.SS Syntax files +.sp +Support for different languages in re2c is based on the idea of \fIsyntax files\fP\&. +A syntax file is a configuration file that defines syntax of the target language +\-\- not the whole language, but a small part of it that is used by the generated +code. Syntax files make re2c very flexible, but they should not be used as a +replacement for configurations: their purpose is to define syntax of the target +language, not to customise one particular lexer. +Syntax files contain configurations of four different kinds: +.sp +\fBFeature lists\fP +.sp +\fBLanguage configurations\fP +.sp +\fBInplace configurations\fP +.sp +\fBCode templates\fP +.INDENT 0.0 +.INDENT 3.5 +\fICode templates\fP define syntax of the target language. They are written in a +simple domain\-specific language with the following formal grammar: +.INDENT 0.0 +.INDENT 3.5 +.sp +.nf +.ft C +code\-template :: + name \(aq=\(aq code\-exprs \(aq;\(aq + | CODE_TEMPLATE \(aq;\(aq + | \(aq<undefined>\(aq \(aq;\(aq + +code\-exprs :: + + | code\-exprs code\-expr + +code\-expr :: + STRING + | VARIABLE + | optional + | list + +optional :: + \(aq(\(aq CONDITIONAL \(aq?\(aq code\-exprs \(aq)\(aq + | \(aq(\(aq CONDITIONAL \(aq?\(aq code\-exprs \(aq:\(aq code\-exprs \(aq)\(aq + +list :: + \(aq[\(aq VARIABLE \(aq:\(aq code\-exprs \(aq]\(aq + | \(aq[\(aq VARIABLE \(aq{\(aq NUMBER \(aq}\(aq \(aq:\(aq code\-exprs \(aq]\(aq + | \(aq[\(aq VARIABLE \(aq{\(aq NUMBER \(aq,\(aq NUMBER \(aq}\(aq \(aq:\(aq code\-exprs \(aq]\(aq +.ft P +.fi +.UNINDENT +.UNINDENT +.sp +A code template is a sequence of string literals, variables, optional elements +and lists, or a reference to another code template, or a special value +\fB<undefined>\fP\&. Variables are placeholders that are substituted during code +generation phase.
List variables are special: when expanding list templates, +re2go repeats the expressions on the right\-hand side of the colon once for each +list element, each time replacing occurrences of the list variable with a value specific to this +repetition. Lists have optional bounds (negative values are counted from the +end, e.g. \fB\-1\fP means the last element). Conditional names start with a dot. +Both conditionals and variables may be either local (specific to the given +code template) or global (allowed in all code templates). When re2go reads +a syntax file, it checks that each code template uses only the variables and +conditionals that are allowed in it. +.sp +For example, the following code template defines if\-then\-else construct for a +C\-like language: +.INDENT 0.0 +.INDENT 3.5 +.sp +.nf +.ft C +code:if_then_else = + [branch{0}: topindent \(dqif \(dq cond \(dq {\(dq nl + indent [stmt: stmt] dedent] + [branch{1:\-1}: topindent \(dq} else\(dq (.cond ? \(dq if \(dq cond) \(dq {\(dq nl + indent [stmt: stmt] dedent] + topindent \(dq}\(dq nl; +.ft P +.fi +.UNINDENT +.UNINDENT +.sp +Here \fBbranch\fP is a list variable. \fBbranch{0}\fP expands to the first branch +\-\- it has to be special, as there is no \fBelse\fP part. \fBbranch{1:\-1}\fP +expands to the remaining branches, if any. \fBtopindent\fP, \fBindent\fP, +\fBdedent\fP and \fBnl\fP are global variables (see below). \fB[stmt: stmt]\fP is a +nested list that expands to the list of statements in the current branch +(\fBstmt\fP is a list variable). Local conditional \fB\&.cond\fP is true if the +current branch has a condition. +This code template could produce the following code: +.INDENT 0.0 +.INDENT 3.5 +.sp +.nf +.ft C +if x { + // do something +} else if y { + // do something else +} else { + // don\(aqt do anything +} +.ft P +.fi +.UNINDENT +.UNINDENT +.sp +Here\(aqs a list of all global variables: +.INDENT 0.0 +.INDENT 3.5 +.INDENT 0.0 +.TP +.B \fBnl\fP +A newline.
+.TP +.B \fBindent\fP +A variable that does not produce any code, but has a side\-effect of +increasing indentation level. +.TP +.B \fBdedent\fP +A variable that does not produce any code, but has a side\-effect of +decreasing indentation level. +.TP +.B \fBtopindent\fP +Indent string for the current statement (indent level is tracked and +updated by the code generator). +.UNINDENT +.UNINDENT +.UNINDENT +.sp +Here\(aqs a list of all global conditionals: +.INDENT 0.0 +.INDENT 3.5 +.INDENT 0.0 +.TP +.B \fB\&.api.simple\fP +True if the simple API is used (\fB\-\-api simple\fP or \fBre2c:api = simple\fP). +.TP +.B \fB\&.api.generic\fP +True if the generic API is used (\fB\-\-api generic\fP or +\fBre2c:api = generic\fP). +.TP +.B \fB\&.api.record\fP +True if the record API is used (\fB\-\-api record\fP or \fBre2c:api = record\fP). +.TP +.B \fB\&.api_style.functions\fP +True if function\-like API style is used +(\fBre2c:api\-style = functions\fP). +.TP +.B \fB\&.api_style.freeform\fP +True if free\-form API style is used (\fBre2c:api\-style = free\-form\fP). +.TP +.B \fB\&.case_ranges\fP +True if the case ranges feature is enabled (\fB\-\-case\-ranges\fP or +\fBre2c:case\-ranges = 1\fP). +.TP +.B \fB\&.code_model.goto_label\fP +True if the code model based on goto/label is used (\fB\-\-goto\-label\fP). +.TP +.B \fB\&.code_model.loop_switch\fP +True if the code model based on loop/switch is used (\fB\-\-loop\-switch\fP). +.TP +.B \fB\&.code_model.recursive_functions\fP +True if the code model based on recursive functions is used +(\fB\-\-recursive\-functions\fP). +.TP +.B \fB\&.date\fP +True if the generated fingerprint should contain the generation date. +.TP +.B \fB\&.loop_label\fP +True if loops generated by re2go must have a label (\fBre2c:label:yyloop\fP +is set to a nonempty string). +.TP +.B \fB\&.monadic\fP +True if the generated code should be monadic (\fBre2c:monadic = 1\fP). +This is only relevant for pure functional languages.
+.TP +.B \fB\&.start_conditions\fP +True if start conditions are enabled (\fB\-\-start\-conditions\fP). +.TP +.B \fB\&.storable_state\fP +True if storable state is enabled (\fB\-\-storable\-state\fP). +.TP +.B \fB\&.unsafe\fP +True if re2go should use \(dqunsafe\(dq blocks in order to generate faster +code (\fB\-\-unsafe\fP, \fBre2c:unsafe = 1\fP). This is only relevant for +languages that have an \(dqunsafe\(dq feature. +.TP +.B \fB\&.version\fP +True if the generated fingerprint should contain the re2go version. +.UNINDENT +.UNINDENT +.UNINDENT +.sp +Below is a full list of code templates supported by re2go with their local +variables and conditionals (a definition does not have to use all of them). +Any unused code templates should be set to \fB<undefined>\fP\&. +.INDENT 0.0 +.TP +.B \fBcode:var_local\fP +.INDENT 7.0 +.INDENT 3.5 +Declaration or definition of a local variable. Supported variables: +\fBtype\fP (the type of the variable), \fBname\fP (its name) and \fBinit\fP +(initial value, if any). Conditionals: \fB\&.init\fP (true if there is an +initializer). +.UNINDENT +.UNINDENT +.INDENT 7.0 +.TP +.B \fBcode:var_global\fP +Same as \fBcode:var_local\fP, except that it\(aqs used at the top level. +.TP +.B \fBcode:const_local\fP +Definition of a local constant. Supported variables: \fBtype\fP (the type +of the constant), \fBname\fP (its name) and \fBinit\fP (initial value). +.TP +.B \fBcode:const_global\fP +Same as \fBcode:const_local\fP, except that it\(aqs used at the top level. +.TP +.B \fBcode:array_local\fP +Definition of a local array (table). Supported variables: \fBtype\fP (the +type of array elements), \fBname\fP (array name), \fBsize\fP (its size), +\fBrow\fP (a list variable that does not itself produce any code, but +expands the list expression as many times as there are rows in the table) +and \fBelem\fP (a list variable that expands to all table elements in the +current row \-\- it\(aqs meant to be nested in the \fBrow\fP list).
+.TP +.B \fBcode:array_global\fP +Same as \fBcode:array_local\fP, except that it\(aqs used at the top level. +.TP +.B \fBcode:array_elem\fP +Reference to an element of an array (table). Supported variables: +\fBarray\fP (the name of the array) and \fBindex\fP (index of the element). +.TP +.B \fBcode:enum\fP +Definition of an enumeration (it may be defined using a special language +construct for enumerations, or simply as a few standalone constants). +Supported variables are \fBtype\fP (user\-defined enumeration type or type +of the constants), \fBelem\fP (list variable that expands to the name of +each member) and \fBinit\fP (initializer for each member). Conditionals: +\fB\&.init\fP (true if there is an initializer). +.TP +.B \fBcode:enum_elem\fP +Enumeration element (a member of a user\-defined enumeration type or a +name of a constant, depending on how \fBcode:enum\fP is defined). +Supported variables are \fBname\fP (the name of the element) and \fBtype\fP +(its type). +.TP +.B \fBcode:assign\fP +Assignment statement. Supported variables are \fBlhs\fP (left hand side) +and \fBrhs\fP (right hand side). +.TP +.B \fBcode:type_int\fP +Signed integer type. +.TP +.B \fBcode:type_uint\fP +Unsigned integer type. +.TP +.B \fBcode:type_yybm\fP +Type of elements in the \fByybm\fP table. +.TP +.B \fBcode:type_yytarget\fP +Type of elements in the \fByytarget\fP table. +.TP +.B \fBcode:cmp_eq\fP +Operator \(dqequals\(dq. +.TP +.B \fBcode:cmp_ne\fP +Operator \(dqnot equals\(dq. +.TP +.B \fBcode:cmp_lt\fP +Operator \(dqless than\(dq. +.TP +.B \fBcode:cmp_gt\fP +Operator \(dqgreater than\(dq. +.TP +.B \fBcode:cmp_le\fP +Operator \(dqless or equal\(dq. +.TP +.B \fBcode:cmp_ge\fP +Operator \(dqgreater or equal\(dq. +.TP +.B \fBcode:if_then_else\fP +If\-then\-else statement with one or more branches.
Supported variables: +\fBbranch\fP (a list variable that does not itself produce any code, but +expands the list expression as many times as there are branches), \fBcond\fP +(condition of the current branch) and \fBstmt\fP (a list variable that +expands to all statements in the current branch). Conditionals: +\fB\&.cond\fP (true if the current branch has a condition), \fB\&.many\fP (true +if there\(aqs more than one branch). +.TP +.B \fBcode:if_then_else_oneline\fP +A specialization of \fBcode:if_then_else\fP for the case when all branches +have one\-line statements. If this is \fB<undefined>\fP, +\fBcode:if_then_else\fP is used instead. +.TP +.B \fBcode:switch\fP +A switch statement with one or more cases. Supported variables: \fBexpr\fP +(the switched\-on expression) and \fBcase\fP (a list variable that expands +to all case groups with their code blocks). +.TP +.B \fBcode:switch_cases\fP +A group of switch cases that maps to a single code block. Supported +variables are \fBcase\fP (a list variable that expands to all cases in +this group) and \fBstmt\fP (a list variable that expands to all statements +in the code block). +.TP +.B \fBcode:switch_cases_oneline\fP +A specialization of \fBcode:switch_cases\fP for the case when the code +block consists of a single one\-line statement. If this is +\fB<undefined>\fP, \fBcode:switch_cases\fP is used instead. +.TP +.B \fBcode:switch_case_range\fP +A single switch case that covers a range of values (possibly consisting +of a single value). Supported variable: \fBval\fP (a list variable that +expands to all values in the range). Supported conditionals: \fB\&.many\fP +(true if there\(aqs more than one value in the range) and +\fB\&.char_literals\fP (true if this is a switch on character literals \-\- +some languages provide special syntax for this case). +.TP +.B \fBcode:switch_case_default\fP +Default switch case. +.TP +.B \fBcode:loop\fP +A loop that runs forever (unless interrupted from the loop body).
+Supported variables: \fBlabel\fP (loop label) and \fBstmt\fP (a list variable +that expands to all statements in the loop body). +.TP +.B \fBcode:continue\fP +Continue statement. Supported variables: \fBlabel\fP (label from which to +continue execution). +.TP +.B \fBcode:goto\fP +Goto statement. Supported variables: \fBlabel\fP (label of the jump +target). +.TP +.B \fBcode:fndecl\fP +Function declaration. Supported variables: \fBname\fP (function name), +\fBtype\fP (return type), \fBarg\fP (a list variable that does not itself +produce code, but expands the list expression as many times as there are +function arguments), \fBargname\fP (name of the current argument), +\fBargtype\fP (type of the current argument). Conditional: \fB\&.type\fP (true +if this is a non\-void function). +.TP +.B \fBcode:fndef\fP +Like \fBcode:fndecl\fP, but used for function definitions, so it has one +additional list variable \fBstmt\fP that expands to all statements in the +function body. +.TP +.B \fBcode:fncall\fP +Function call statement. Supported variables: \fBname\fP (function name), +\fBretval\fP (l\-value where the return value is stored, if any) and +\fBarg\fP (a list variable that expands to all function arguments). +Conditionals: \fB\&.args\fP (true if the function has arguments) and +\fB\&.retval\fP (true if the return value needs to be saved). +.TP +.B \fBcode:tailcall\fP +Tail call statement. Supported variables: \fBname\fP (function name) +and \fBarg\fP (a list variable that expands to all function arguments). +Conditionals: \fB\&.args\fP (true if the function has arguments) and +\fB\&.retval\fP (true if this is a non\-void function). +.TP +.B \fBcode:recursive_functions\fP +Program body with \fB\-\-recursive\-functions\fP code model.
Supported +variables: \fBfn\fP (a list variable that does not itself produce any +code, but expands the list expression as many times as there are functions), +\fBfndecl\fP (declaration of the current function) and \fBfndef\fP +(definition of the current function). +.TP +.B \fBcode:fingerprint\fP +The fingerprint at the top of the generated output file. Supported +variables: \fBver\fP (re2go version that was used to generate this) and +\fBdate\fP (generation date). +.TP +.B \fBcode:line_info\fP +The format of line directives (if this is set to \fB<undefined>\fP, no +directives are generated). Supported variables: \fBline\fP (line number) +and \fBfile\fP (filename). +.TP +.B \fBcode:abort\fP +A statement that aborts program execution. +.TP +.B \fBcode:yydebug\fP +\fBYYDEBUG\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYDEBUG\fP, \fByyrecord\fP, \fByych\fP (map to the +corresponding \fBre2c:\fP configurations), \fBstate\fP (DFA state number). +.TP +.B \fBcode:yypeek\fP +\fBYYPEEK\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYPEEK\fP, \fBYYCTYPE\fP, \fBYYINPUT\fP, \fBYYCURSOR\fP, +\fByyrecord\fP, \fByych\fP (map to the corresponding \fBre2c:\fP +configurations). Conditionals: \fB\&.cast\fP (true if +\fBre2c:yych:conversion\fP is set to a non\-zero value). +.TP +.B \fBcode:yyskip\fP +\fBYYSKIP\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSKIP\fP, \fBYYCURSOR\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations). +.TP +.B \fBcode:yybackup\fP +\fBYYBACKUP\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYBACKUP\fP, \fBYYCURSOR\fP, \fBYYMARKER\fP, +\fByyrecord\fP (map to the corresponding \fBre2c:\fP configurations). +.TP +.B \fBcode:yybackupctx\fP +\fBYYBACKUPCTX\fP statement, possibly specialized for different APIs.
+Supported variables: \fBYYBACKUPCTX\fP, \fBYYCURSOR\fP, \fBYYCTXMARKER\fP, +\fByyrecord\fP (map to the corresponding \fBre2c:\fP configurations). +.TP +.B \fBcode:yyskip_yypeek\fP +Combined \fBcode:yyskip\fP and \fBcode:yypeek\fP statement (defaults to +\fBcode:yyskip\fP followed by \fBcode:yypeek\fP). +.TP +.B \fBcode:yypeek_yyskip\fP +Combined \fBcode:yypeek\fP and \fBcode:yyskip\fP statement (defaults to +\fBcode:yypeek\fP followed by \fBcode:yyskip\fP). +.TP +.B \fBcode:yyskip_yybackup\fP +Combined \fBcode:yyskip\fP and \fBcode:yybackup\fP statement (defaults to +\fBcode:yyskip\fP followed by \fBcode:yybackup\fP). +.TP +.B \fBcode:yybackup_yyskip\fP +Combined \fBcode:yybackup\fP and \fBcode:yyskip\fP statement (defaults to +\fBcode:yybackup\fP followed by \fBcode:yyskip\fP). +.TP +.B \fBcode:yybackup_yypeek\fP +Combined \fBcode:yybackup\fP and \fBcode:yypeek\fP statement (defaults to +\fBcode:yybackup\fP followed by \fBcode:yypeek\fP). +.TP +.B \fBcode:yyskip_yybackup_yypeek\fP +Combined \fBcode:yyskip\fP, \fBcode:yybackup\fP and \fBcode:yypeek\fP +statement (defaults to \fBcode:yyskip\fP followed by \fBcode:yybackup\fP +followed by \fBcode:yypeek\fP). +.TP +.B \fBcode:yybackup_yypeek_yyskip\fP +Combined \fBcode:yybackup\fP, \fBcode:yypeek\fP and \fBcode:yyskip\fP +statement (defaults to \fBcode:yybackup\fP followed by \fBcode:yypeek\fP +followed by \fBcode:yyskip\fP). +.TP +.B \fBcode:yyrestore\fP +\fBYYRESTORE\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYRESTORE\fP, \fBYYCURSOR\fP, \fBYYMARKER\fP, +\fByyrecord\fP (map to the corresponding \fBre2c:\fP configurations). +.TP +.B \fBcode:yyrestorectx\fP +\fBYYRESTORECTX\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYRESTORECTX\fP, \fBYYCURSOR\fP, \fBYYCTXMARKER\fP, +\fByyrecord\fP (map to the corresponding \fBre2c:\fP configurations).
+.TP +.B \fBcode:yyrestoretag\fP +\fBYYRESTORETAG\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYRESTORETAG\fP, \fBYYCURSOR\fP, \fByyrecord\fP (map +to the corresponding \fBre2c:\fP configurations), \fBtag\fP (the name of the tag +variable used to restore position). +.TP +.B \fBcode:yyshift\fP +\fBYYSHIFT\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSHIFT\fP, \fBYYCURSOR\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBoffset\fP (the number of code +units to shift the current position). +.TP +.B \fBcode:yyshiftstag\fP +\fBYYSHIFTSTAG\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSHIFTSTAG\fP, \fByyrecord\fP, \fBnegative\fP (map +to the corresponding \fBre2c:\fP configurations), \fBtag\fP (tag variable +which needs to be shifted), \fBoffset\fP (the number of code units to +shift). Conditionals: \fB\&.nested\fP (true if this is a nested tag \-\- in +this case its value may be equal to \fBre2c:tags:negative\fP, which should +not be shifted). +.TP +.B \fBcode:yyshiftmtag\fP +\fBYYSHIFTMTAG\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSHIFTMTAG\fP (maps to the corresponding +\fBre2c:\fP configuration), \fBtag\fP (tag variable which needs to be +shifted), \fBoffset\fP (the number of code units to shift). +.TP +.B \fBcode:yystagp\fP +\fBYYSTAGP\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSTAGP\fP, \fBYYCURSOR\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBtag\fP (tag variable that +should be updated). +.TP +.B \fBcode:yymtagp\fP +\fBYYMTAGP\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYMTAGP\fP (maps to the corresponding \fBre2c:\fP +configuration), \fBtag\fP (tag variable that should be updated). +.TP +.B \fBcode:yystagn\fP +\fBYYSTAGN\fP statement, possibly specialized for different APIs.
+Supported variables: \fBYYSTAGN\fP, \fBnegative\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBtag\fP (tag variable that +should be updated). +.TP +.B \fBcode:yymtagn\fP +\fBYYMTAGN\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYMTAGN\fP (maps to the corresponding \fBre2c:\fP +configuration), \fBtag\fP (tag variable that should be updated). +.TP +.B \fBcode:yycopystag\fP +\fBYYCOPYSTAG\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYCOPYSTAG\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBlhs\fP, \fBrhs\fP (left and +right hand side tag variables of the copy operation). +.TP +.B \fBcode:yycopymtag\fP +\fBYYCOPYMTAG\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYCOPYMTAG\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBlhs\fP, \fBrhs\fP (left and +right hand side tag variables of the copy operation). +.TP +.B \fBcode:yygetaccept\fP +\fBYYGETACCEPT\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYGETACCEPT\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBvar\fP (maps to +\fBre2c:yyaccept\fP configuration). +.TP +.B \fBcode:yysetaccept\fP +\fBYYSETACCEPT\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSETACCEPT\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBvar\fP (maps to +\fBre2c:yyaccept\fP configuration) and \fBval\fP (numeric value of the +accepted rule). +.TP +.B \fBcode:yygetcond\fP +\fBYYGETCOND\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYGETCOND\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBvar\fP (maps to +\fBre2c:yycond\fP configuration). +.TP +.B \fBcode:yysetcond\fP +\fBYYSETCOND\fP statement, possibly specialized for different APIs. 
+Supported variables: \fBYYSETCOND\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBvar\fP (maps to +\fBre2c:yycond\fP configuration) and \fBval\fP (numeric condition +identifier). +.TP +.B \fBcode:yygetstate\fP +\fBYYGETSTATE\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYGETSTATE\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBvar\fP (maps to +\fBre2c:yystate\fP configuration). +.TP +.B \fBcode:yysetstate\fP +\fBYYSETSTATE\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSETSTATE\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBvar\fP (maps to +\fBre2c:yystate\fP configuration) and \fBval\fP (state number). +.TP +.B \fBcode:yylessthan\fP +\fBYYLESSTHAN\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYLESSTHAN\fP, \fBYYCURSOR\fP, \fBYYLIMIT\fP, +\fByyrecord\fP (map to the corresponding \fBre2c:\fP configurations), +\fBneed\fP (the number of code units to check against). Conditional: +\fB\&.many\fP (true if the \fBneed\fP is more than one). +.TP +.B \fBcode:yybm_filter\fP +Condition that is used to filter out \fByych\fP values that are not +covered by the \fByybm\fP table (used with \fB\-\-bitmaps\fP option). +Supported variable: \fByych\fP (maps to \fBre2c:yych\fP configuration). +.TP +.B \fBcode:yybm_match\fP +The format of \fByybm\fP table check (generated with \fB\-\-bitmaps\fP +option). Supported variables: \fByybm\fP, \fByych\fP (map to the +corresponding \fBre2c:\fP configurations), \fBoffset\fP (offset in the +\fByybm\fP table that needs to be added to \fByych\fP) and \fBmask\fP (bit +mask that should be applied to the table entry to retrieve the boolean +value that needs to be checked) +.UNINDENT +.UNINDENT +.UNINDENT +.UNINDENT .SH HANDLING THE END OF INPUT .sp One of the main problems for the lexer is to know when to stop. 
diff --git a/bootstrap/doc/re2hs.1 b/bootstrap/doc/re2hs.1 index 7bfbd5fdb..bf762c346 100644 --- a/bootstrap/doc/re2hs.1 +++ b/bootstrap/doc/re2hs.1 @@ -1577,6 +1577,581 @@ raise a warning, and the user will be notified. If some configurations are unused and do not need a definition, they should be explicitly set to \fB\fP\&. .UNINDENT +.SS Syntax files +.sp +Support for different languages in re2c is based on the idea of \fIsyntax files\fP\&. +A syntax file is a configuration file that defines syntax of the target language +\-\- not the whole language, but a small part of it that is used by the generated +code. Syntax files make re2c very flexible, but they should not be used as a +replacement for configurations: their purpose is to define syntax of the target +language, not to customise one particular lexer. +Syntax files contain configurations of four different kinds: +.sp +\fBFeature lists\fP +.sp +\fBLanguage configurations\fP +.sp +\fBInplace configurations\fP +.sp +\fBCode templates\fP +.INDENT 0.0 +.INDENT 3.5 +\fICode templates\fP define syntax of the target language. 
They are written in a +simple domain\-specific language with the following formal grammar: +.INDENT 0.0 +.INDENT 3.5 +.sp +.nf +.ft C +code\-template :: + name \(aq=\(aq code\-exprs \(aq;\(aq + | CODE_TEMPLATE \(aq;\(aq + | \(aq\(aq \(aq;\(aq + +code\-exprs :: + + | code\-exprs code\-expr + +code\-expr :: + STRING + | VARIABLE + | optional + | list + +optional :: + \(aq(\(aq CONDITIONAL \(aq?\(aq code\-exprs \(aq)\(aq + | \(aq(\(aq CONDITIONAL \(aq?\(aq code\-exprs \(aq:\(aq code\-exprs \(aq)\(aq + +list :: + \(aq[\(aq VARIABLE \(aq:\(aq code\-exprs \(aq]\(aq + | \(aq[\(aq VARIABLE \(aq{\(aq NUMBER \(aq}\(aq \(aq:\(aq code\-exprs \(aq]\(aq + | \(aq[\(aq VARIABLE \(aq{\(aq NUMBER \(aq,\(aq NUMBER \(aq}\(aq \(aq:\(aq code\-exprs \(aq]\(aq +.ft P +.fi +.UNINDENT +.UNINDENT +.sp +A code template is a sequence of string literals, variables, optional elements +and lists, or a reference to another code template, or a special value +\fB\fP\&. Variables are placeholders that are substituted during the code +generation phase. List variables are special: when expanding list templates, +re2hs repeats the expressions on the right hand side of the colon several times, each +time replacing occurrences of the list variable with a value specific to this +repetition. Lists have optional bounds (negative values are counted from the +end, e.g. \fB\-1\fP means the last element). Conditional names start with a dot. +Both conditionals and variables may be either local (specific to the given +code template) or global (allowed in all code templates). When re2hs reads a +syntax file, it checks that each code template uses only the variables and +conditionals that are allowed in it. +.sp +For example, the following code template defines an if\-then\-else construct for a +C\-like language: +.INDENT 0.0 +.INDENT 3.5 +.sp +.nf +.ft C +code:if_then_else = + [branch{0}: topindent \(dqif \(dq cond \(dq {\(dq nl + indent [stmt: stmt] dedent] + [branch{1:\-1}: topindent \(dq} else\(dq (.cond ?
\(dq if \(dq cond) \(dq {\(dq nl + indent [stmt: stmt] dedent] + topindent \(dq}\(dq nl; +.ft P +.fi +.UNINDENT +.UNINDENT +.sp +Here \fBbranch\fP is a list variable. \fBbranch{0}\fP expands to the first branch +\-\- it has to be special, as there is no \fBelse\fP part. \fBbranch{1:\-1}\fP +expands to the remaining branches, if any. \fBtopindent\fP, \fBindent\fP, +\fBdedent\fP and \fBnl\fP are global variables (see below). \fB[stmt: stmt]\fP is a +nested list that expands to the list of statements in the current branch +(\fBstmt\fP is a list variable). Local conditional \fB\&.cond\fP is true if the +current branch has a condition. +This code template could produce the following code: +.INDENT 0.0 +.INDENT 3.5 +.sp +.nf +.ft C +if x { + // do something +} else if y { + // do something else +} else { + // don\(aqt do anything +} +.ft P +.fi +.UNINDENT +.UNINDENT +.sp +Here\(aqs a list of all global variables: +.INDENT 0.0 +.INDENT 3.5 +.INDENT 0.0 +.TP +.B \fBnl\fP +A newline. +.TP +.B \fBindent\fP +A variable that does not produce any code, but has a side\-effect of +increasing indentation level. +.TP +.B \fBdedent\fP +A variable that does not produce any code, but has a side\-effect of +decreasing indentation level. +.TP +.B \fBtopindent\fP +Indent string for the current statement (indent level is tracked and +updated by the code generator). +.UNINDENT +.UNINDENT +.UNINDENT +.sp +Here\(aqs a list of all global conditionals: +.INDENT 0.0 +.INDENT 3.5 +.INDENT 0.0 +.TP +.B \fB\&.api.simple\fP +True if simple API is used (\fB\-\-api simple\fP or \fBre2c:api = simple\fP). +.TP +.B \fB\&.api.generic\fP +True if generic API is used (\fB\-\-api generic\fP or +\fBre2c:api = generic\fP). +.TP +.B \fB\&.api.record\fP +True if record API is used (\fB\-\-api record\fP or \fBre2c:api = record\fP). +.TP +.B \fB\&.api_style.functions\fP +True if function\-like API style is used +(\fBre2c:api\-style = functions\fP).
+.TP +.B \fB\&.api_style.freeform\fP +True if free\-form API style is used (\fBre2c:api\-style = free\-form\fP). +.TP +.B \fB\&.case_ranges\fP +True if case ranges feature is enabled (\fB\-\-case\-ranges\fP or +\fBre2c:case\-ranges = 1\fP). +.TP +.B \fB\&.code_model.goto_label\fP +True if code model based on goto/label is used (\fB\-\-goto\-label\fP). +.TP +.B \fB\&.code_model.loop_switch\fP +True if code model based on loop/switch is used (\fB\-\-loop\-switch\fP). +.TP +.B \fB\&.code_model.recursive_functions\fP +True if code model based on recursive functions is used +(\fB\-\-recursive\-functions\fP). +.TP +.B \fB\&.date\fP +True if the generated fingerprint should contain generation date. +.TP +.B \fB\&.loop_label\fP +True if re2hs generated loops must have a label (\fBre2c:label:yyloop\fP +is set to a nonempty string). +.TP +.B \fB\&.monadic\fP +True if the generated code should be monadic (\fBre2c:monadic = 1\fP). +This is only relevant for pure functional languages. +.TP +.B \fB\&.start_conditions\fP +True if start conditions are enabled (\fB\-\-start\-conditions\fP). +.TP +.B \fB\&.storable_state\fP +True if storable state is enabled (\fB\-\-storable\-state\fP). +.TP +.B \fB\&.unsafe\fP +True if re2hs should use \(dqunsafe\(dq blocks in order to generate faster +code (\fB\-\-unsafe\fP, \fBre2c:unsafe = 1\fP). This is only relevant for +languages that have an \(dqunsafe\(dq feature. +.TP +.B \fB\&.version\fP +True if the generated fingerprint should contain re2hs version. +.UNINDENT +.UNINDENT +.UNINDENT +.sp +Below is a full list of code templates supported by re2hs with their local +variables and conditionals (a definition does not have to use all of them). +Any unused code templates should be set to \fB\fP\&. +.INDENT 0.0 +.TP +.B \fBcode:var_local\fP +.INDENT 7.0 +.INDENT 3.5 +Declaration or definition of a local variable. Supported variables: +\fBtype\fP (the type of the variable), \fBname\fP (its name) and \fBinit\fP +(initial value, if any).
Conditionals: \fB\&.init\fP (true if there is an +initializer). +.UNINDENT +.UNINDENT +.INDENT 7.0 +.TP +.B \fBcode:var_global\fP +Same as \fBcode:var_local\fP, except that it\(aqs used in top\-level. +.TP +.B \fBcode:const_local\fP +Definition of a local constant. Supported variables: \fBtype\fP (the type +of the constant), \fBname\fP (its name) and \fBinit\fP (initial value). +.TP +.B \fBcode:const_global\fP +Same as \fBcode:const_local\fP, except that it\(aqs used in top\-level. +.TP +.B \fBcode:array_local\fP +Definition of a local array (table). Supported variables: \fBtype\fP (the +type of array elements), \fBname\fP (array name), \fBsize\fP (its size), +\fBrow\fP (a list variable that does not itself produce any code, but +expands list expression as many times as there are rows in the table) +and \fBelem\fP (a list variable that expands to all table elements in the +current row \-\- it\(aqs meant to be nested in the \fBrow\fP list). +.TP +.B \fBcode:array_global\fP +Same as \fBcode:array_local\fP, except that it\(aqs used in top\-level. +.TP +.B \fBcode:array_elem\fP +Reference to an element of an array (table). Supported variables: +\fBarray\fP (the name of the array) and \fBindex\fP (index of the element). +.TP +.B \fBcode:enum\fP +Definition of an enumeration (it may be defined using a special language +construct for enumerations, or simply as a few standalone constants). +Supported variables are \fBtype\fP (user\-defined enumeration type or type +of the constants), \fBelem\fP (list variable that expands to the name of +each member) and \fBinit\fP (initializer for each member). Conditionals: +\fB\&.init\fP (true if there is an initializer). +.TP +.B \fBcode:enum_elem\fP +Enumeration element (a member of a user\-defined enumeration type or a +name of a constant, depending on how \fBcode:enum\fP is defined). +Supported variables are \fBname\fP (the name of the element) and \fBtype\fP +(its type). +.TP +.B \fBcode:assign\fP +Assignment statement. 
Supported variables are \fBlhs\fP (left hand side) +and \fBrhs\fP (right hand side). +.TP +.B \fBcode:type_int\fP +Signed integer type. +.TP +.B \fBcode:type_uint\fP +Unsigned integer type. +.TP +.B \fBcode:type_yybm\fP +Type of elements in the \fByybm\fP table. +.TP +.B \fBcode:type_yytarget\fP +Type of elements in the \fByytarget\fP table. +.TP +.B \fBcode:cmp_eq\fP +Operator \(dqequals\(dq. +.TP +.B \fBcode:cmp_ne\fP +Operator \(dqnot equals\(dq. +.TP +.B \fBcode:cmp_lt\fP +Operator \(dqless than\(dq. +.TP +.B \fBcode:cmp_gt\fP +Operator \(dqgreater than\(dq. +.TP +.B \fBcode:cmp_le\fP +Operator \(dqless or equal\(dq. +.TP +.B \fBcode:cmp_ge\fP +Operator \(dqgreater or equal\(dq. +.TP +.B \fBcode:if_then_else\fP +If\-then\-else statement with one or more branches. Supported variables: +\fBbranch\fP (a list variable that does not itself produce any code, but +expands list expression as many times as there are branches), \fBcond\fP +(condition of the current branch) and \fBstmt\fP (a list variable that +expands to all statements in the current branch). Conditionals: +\fB\&.cond\fP (true if the current branch has a condition), \fB\&.many\fP (true +if there\(aqs more than one branch). +.TP +.B \fBcode:if_then_else_oneline\fP +A specialization of \fBcode:if_then_else\fP for the case when all branches +have one\-line statements. If this is \fB\fP, +\fBcode:if_then_else\fP is used instead. +.TP +.B \fBcode:switch\fP +A switch statement with one or more cases. Supported variables: \fBexpr\fP +(the switched\-on expression) and \fBcase\fP (a list variable that expands +to all case groups with their code blocks). +.TP +.B \fBcode:switch_cases\fP +A group of switch cases that maps to a single code block. Supported +variables are \fBcase\fP (a list variable that expands to all cases in +this group) and \fBstmt\fP (a list variable that expands to all statements +in the code block).
+.TP +.B \fBcode:switch_cases_oneline\fP +A specialization of \fBcode:switch_cases\fP for the case when the code +block consists of a single one\-line statement. If this is +\fB\fP, \fBcode:switch_cases\fP is used instead. +.TP +.B \fBcode:switch_case_range\fP +A single switch case that covers a range of values (possibly consisting +of a single value). Supported variable: \fBval\fP (a list variable that +expands to all values in the range). Supported conditionals: \fB\&.many\fP +(true if there\(aqs more than one value in the range) and +\fB\&.char_literals\fP (true if this is a switch on character literals \-\- +some languages provide special syntax for this case). +.TP +.B \fBcode:switch_case_default\fP +Default switch case. +.TP +.B \fBcode:loop\fP +A loop that runs forever (unless interrupted from the loop body). +Supported variables: \fBlabel\fP (loop label), \fBstmt\fP (a list variable +that expands to all statements in the loop body). +.TP +.B \fBcode:continue\fP +Continue statement. Supported variables: \fBlabel\fP (label from which to +continue execution). +.TP +.B \fBcode:goto\fP +Goto statement. Supported variables: \fBlabel\fP (label of the jump +target). +.TP +.B \fBcode:fndecl\fP +Function declaration. Supported variables: \fBname\fP (function name), +\fBtype\fP (return type), \fBarg\fP (a list variable that does not itself +produce code, but expands list expression as many times as there are +function arguments), \fBargname\fP (name of the current argument), +\fBargtype\fP (type of the current argument). Conditional: \fB\&.type\fP (true +if this is a non\-void function). +.TP +.B \fBcode:fndef\fP +Like \fBcode:fndecl\fP, but used for function definitions, so it has one +additional list variable \fBstmt\fP that expands to all statements in the +function body. +.TP +.B \fBcode:fncall\fP +Function call statement.
Supported variables: \fBname\fP (function name), +\fBretval\fP (l\-value where the return value is stored, if any) and +\fBarg\fP (a list variable that expands to all function arguments). +Conditionals: \fB\&.args\fP (true if the function has arguments) and +\fB\&.retval\fP (true if return value needs to be saved). +.TP +.B \fBcode:tailcall\fP +Tail call statement. Supported variables: \fBname\fP (function name), +and \fBarg\fP (a list variable that expands to all function arguments). +Conditionals: \fB\&.args\fP (true if the function has arguments) and +\fB\&.retval\fP (true if this is a non\-void function). +.TP +.B \fBcode:recursive_functions\fP +Program body with \fB\-\-recursive\-functions\fP code model. Supported +variables: \fBfn\fP (a list variable that does not itself produce any +code, but expands list expression as many times as there are functions), +\fBfndecl\fP (declaration of the current function) and \fBfndef\fP +(definition of the current function). +.TP +.B \fBcode:fingerprint\fP +The fingerprint at the top of the generated output file. Supported +variables: \fBver\fP (re2hs version that was used to generate this) and +\fBdate\fP (generation date). +.TP +.B \fBcode:line_info\fP +The format of line directives (if this is set to \fB\fP, no +directives are generated). Supported variables: \fBline\fP (line number) +and \fBfile\fP (filename). +.TP +.B \fBcode:abort\fP +A statement that aborts program execution. +.TP +.B \fBcode:yydebug\fP +\fBYYDEBUG\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYDEBUG\fP, \fByyrecord\fP, \fByych\fP (map to the +corresponding \fBre2c:\fP configurations), \fBstate\fP (DFA state number). +.TP +.B \fBcode:yypeek\fP +\fBYYPEEK\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYPEEK\fP, \fBYYCTYPE\fP, \fBYYINPUT\fP, \fBYYCURSOR\fP, +\fByyrecord\fP, \fByych\fP (map to the corresponding \fBre2c:\fP +configurations). 
Conditionals: \fB\&.cast\fP (true if +\fBre2c:yych:conversion\fP is set to non\-zero). +.TP +.B \fBcode:yyskip\fP +\fBYYSKIP\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSKIP\fP, \fBYYCURSOR\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations). +.TP +.B \fBcode:yybackup\fP +\fBYYBACKUP\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYBACKUP\fP, \fBYYCURSOR\fP, \fBYYMARKER\fP, +\fByyrecord\fP (map to the corresponding \fBre2c:\fP configurations). +.TP +.B \fBcode:yybackupctx\fP +\fBYYBACKUPCTX\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYBACKUPCTX\fP, \fBYYCURSOR\fP, \fBYYCTXMARKER\fP, +\fByyrecord\fP (map to the corresponding \fBre2c:\fP configurations). +.TP +.B \fBcode:yyskip_yypeek\fP +Combined \fBcode:yyskip\fP and \fBcode:yypeek\fP statement (defaults to +\fBcode:yyskip\fP followed by \fBcode:yypeek\fP). +.TP +.B \fBcode:yypeek_yyskip\fP +Combined \fBcode:yypeek\fP and \fBcode:yyskip\fP statement (defaults to +\fBcode:yypeek\fP followed by \fBcode:yyskip\fP). +.TP +.B \fBcode:yyskip_yybackup\fP +Combined \fBcode:yyskip\fP and \fBcode:yybackup\fP statement (defaults to +\fBcode:yyskip\fP followed by \fBcode:yybackup\fP). +.TP +.B \fBcode:yybackup_yyskip\fP +Combined \fBcode:yybackup\fP and \fBcode:yyskip\fP statement (defaults to +\fBcode:yybackup\fP followed by \fBcode:yyskip\fP). +.TP +.B \fBcode:yybackup_yypeek\fP +Combined \fBcode:yybackup\fP and \fBcode:yypeek\fP statement (defaults to +\fBcode:yybackup\fP followed by \fBcode:yypeek\fP). +.TP +.B \fBcode:yyskip_yybackup_yypeek\fP +Combined \fBcode:yyskip\fP, \fBcode:yybackup\fP and \fBcode:yypeek\fP +statement (defaults to \fBcode:yyskip\fP followed by \fBcode:yybackup\fP +followed by \fBcode:yypeek\fP).
+.TP +.B \fBcode:yybackup_yypeek_yyskip\fP +Combined \fBcode:yybackup\fP, \fBcode:yypeek\fP and \fBcode:yyskip\fP +statement (defaults to \fBcode:yybackup\fP followed by \fBcode:yypeek\fP +followed by \fBcode:yyskip\fP). +.TP +.B \fBcode:yyrestore\fP +\fBYYRESTORE\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYRESTORE\fP, \fBYYCURSOR\fP, \fBYYMARKER\fP, +\fByyrecord\fP (map to the corresponding \fBre2c:\fP configurations). +.TP +.B \fBcode:yyrestorectx\fP +\fBYYRESTORECTX\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYRESTORECTX\fP, \fBYYCURSOR\fP, \fBYYCTXMARKER\fP, +\fByyrecord\fP (map to the corresponding \fBre2c:\fP configurations). +.TP +.B \fBcode:yyrestoretag\fP +\fBYYRESTORETAG\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYRESTORETAG\fP, \fBYYCURSOR\fP, \fByyrecord\fP (map +to the corresponding \fBre2c:\fP configurations), \fBtag\fP (the name of the tag +variable used to restore position). +.TP +.B \fBcode:yyshift\fP +\fBYYSHIFT\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSHIFT\fP, \fBYYCURSOR\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBoffset\fP (the number of code +units to shift the current position). +.TP +.B \fBcode:yyshiftstag\fP +\fBYYSHIFTSTAG\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSHIFTSTAG\fP, \fByyrecord\fP, \fBnegative\fP (map +to the corresponding \fBre2c:\fP configurations), \fBtag\fP (tag variable +which needs to be shifted), \fBoffset\fP (the number of code units to +shift). Conditionals: \fB\&.nested\fP (true if this is a nested tag \-\- in +this case its value may be equal to \fBre2c:tags:negative\fP, which should +not be shifted). +.TP +.B \fBcode:yyshiftmtag\fP +\fBYYSHIFTMTAG\fP statement, possibly specialized for different APIs.
+Supported variables: \fBYYSHIFTMTAG\fP (maps to the corresponding +\fBre2c:\fP configuration), \fBtag\fP (tag variable which needs to be +shifted), \fBoffset\fP (the number of code units to shift). +.TP +.B \fBcode:yystagp\fP +\fBYYSTAGP\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSTAGP\fP, \fBYYCURSOR\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBtag\fP (tag variable that +should be updated). +.TP +.B \fBcode:yymtagp\fP +\fBYYMTAGP\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYMTAGP\fP (maps to the corresponding \fBre2c:\fP +configuration), \fBtag\fP (tag variable that should be updated). +.TP +.B \fBcode:yystagn\fP +\fBYYSTAGN\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSTAGN\fP, \fBnegative\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBtag\fP (tag variable that +should be updated). +.TP +.B \fBcode:yymtagn\fP +\fBYYMTAGN\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYMTAGN\fP (maps to the corresponding \fBre2c:\fP +configuration), \fBtag\fP (tag variable that should be updated). +.TP +.B \fBcode:yycopystag\fP +\fBYYCOPYSTAG\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYCOPYSTAG\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBlhs\fP, \fBrhs\fP (left and +right hand side tag variables of the copy operation). +.TP +.B \fBcode:yycopymtag\fP +\fBYYCOPYMTAG\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYCOPYMTAG\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBlhs\fP, \fBrhs\fP (left and +right hand side tag variables of the copy operation). +.TP +.B \fBcode:yygetaccept\fP +\fBYYGETACCEPT\fP statement, possibly specialized for different APIs.
+Supported variables: \fBYYGETACCEPT\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBvar\fP (maps to +\fBre2c:yyaccept\fP configuration). +.TP +.B \fBcode:yysetaccept\fP +\fBYYSETACCEPT\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSETACCEPT\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBvar\fP (maps to +\fBre2c:yyaccept\fP configuration) and \fBval\fP (numeric value of the +accepted rule). +.TP +.B \fBcode:yygetcond\fP +\fBYYGETCOND\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYGETCOND\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBvar\fP (maps to +\fBre2c:yycond\fP configuration). +.TP +.B \fBcode:yysetcond\fP +\fBYYSETCOND\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSETCOND\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBvar\fP (maps to +\fBre2c:yycond\fP configuration) and \fBval\fP (numeric condition +identifier). +.TP +.B \fBcode:yygetstate\fP +\fBYYGETSTATE\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYGETSTATE\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBvar\fP (maps to +\fBre2c:yystate\fP configuration). +.TP +.B \fBcode:yysetstate\fP +\fBYYSETSTATE\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSETSTATE\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBvar\fP (maps to +\fBre2c:yystate\fP configuration) and \fBval\fP (state number). +.TP +.B \fBcode:yylessthan\fP +\fBYYLESSTHAN\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYLESSTHAN\fP, \fBYYCURSOR\fP, \fBYYLIMIT\fP, +\fByyrecord\fP (map to the corresponding \fBre2c:\fP configurations), +\fBneed\fP (the number of code units to check against). Conditional: +\fB\&.many\fP (true if the \fBneed\fP is more than one). 
+.TP +.B \fBcode:yybm_filter\fP +Condition that is used to filter out \fByych\fP values that are not +covered by the \fByybm\fP table (used with \fB\-\-bitmaps\fP option). +Supported variable: \fByych\fP (maps to \fBre2c:yych\fP configuration). +.TP +.B \fBcode:yybm_match\fP +The format of \fByybm\fP table check (generated with \fB\-\-bitmaps\fP +option). Supported variables: \fByybm\fP, \fByych\fP (map to the +corresponding \fBre2c:\fP configurations), \fBoffset\fP (offset in the +\fByybm\fP table that needs to be added to \fByych\fP) and \fBmask\fP (bit +mask that should be applied to the table entry to retrieve the boolean +value that needs to be checked) +.UNINDENT +.UNINDENT +.UNINDENT +.UNINDENT .SH HANDLING THE END OF INPUT .sp One of the main problems for the lexer is to know when to stop. diff --git a/bootstrap/doc/re2java.1 b/bootstrap/doc/re2java.1 index 5884f77ac..e33c2af00 100644 --- a/bootstrap/doc/re2java.1 +++ b/bootstrap/doc/re2java.1 @@ -1599,6 +1599,581 @@ raise a warning, and the user will be notified. If some configurations are unused and do not need a definition, they should be explicitly set to \fB\fP\&. .UNINDENT +.SS Syntax files +.sp +Support for different languages in re2c is based on the idea of \fIsyntax files\fP\&. +A syntax file is a configuration file that defines syntax of the target language +\-\- not the whole language, but a small part of it that is used by the generated +code. Syntax files make re2c very flexible, but they should not be used as a +replacement for configurations: their purpose is to define syntax of the target +language, not to customise one particular lexer. +Syntax files contain configurations of four different kinds: +.sp +\fBFeature lists\fP +.sp +\fBLanguage configurations\fP +.sp +\fBInplace configurations\fP +.sp +\fBCode templates\fP +.INDENT 0.0 +.INDENT 3.5 +\fICode templates\fP define syntax of the target language. 
They are written in a +simple domain\-specific language with the following formal grammar: +.INDENT 0.0 +.INDENT 3.5 +.sp +.nf +.ft C +code\-template :: + name \(aq=\(aq code\-exprs \(aq;\(aq + | CODE_TEMPLATE \(aq;\(aq + | \(aq\(aq \(aq;\(aq + +code\-exprs :: + + | code\-exprs code\-expr + +code\-expr :: + STRING + | VARIABLE + | optional + | list + +optional :: + \(aq(\(aq CONDITIONAL \(aq?\(aq code\-exprs \(aq)\(aq + | \(aq(\(aq CONDITIONAL \(aq?\(aq code\-exprs \(aq:\(aq code\-exprs \(aq)\(aq + +list :: + \(aq[\(aq VARIABLE \(aq:\(aq code\-exprs \(aq]\(aq + | \(aq[\(aq VARIABLE \(aq{\(aq NUMBER \(aq}\(aq \(aq:\(aq code\-exprs \(aq]\(aq + | \(aq[\(aq VARIABLE \(aq{\(aq NUMBER \(aq,\(aq NUMBER \(aq}\(aq \(aq:\(aq code\-exprs \(aq]\(aq +.ft P +.fi +.UNINDENT +.UNINDENT +.sp +A code template is a sequence of string literals, variables, optional elements +and lists, or a reference to another code template, or a special value +\fB\fP\&. Variables are placeholders that are substituted during the code +generation phase. List variables are special: when expanding list templates, +re2java repeats the expressions on the right hand side of the colon several times, each +time replacing occurrences of the list variable with a value specific to this +repetition. Lists have optional bounds (negative values are counted from the +end, e.g. \fB\-1\fP means the last element). Conditional names start with a dot. +Both conditionals and variables may be either local (specific to the given +code template) or global (allowed in all code templates). When re2java reads a +syntax file, it checks that each code template uses only the variables and +conditionals that are allowed in it. +.sp +For example, the following code template defines an if\-then\-else construct for a +C\-like language: +.INDENT 0.0 +.INDENT 3.5 +.sp +.nf +.ft C +code:if_then_else = + [branch{0}: topindent \(dqif \(dq cond \(dq {\(dq nl + indent [stmt: stmt] dedent] + [branch{1:\-1}: topindent \(dq} else\(dq (.cond ?
\(dq if \(dq cond) \(dq {\(dq nl + indent [stmt: stmt] dedent] + topindent \(dq}\(dq nl; +.ft P +.fi +.UNINDENT +.UNINDENT +.sp +Here \fBbranch\fP is a list variable. \fBbranch{0}\fP expands to the first branch +\-\- it has to be special, as there is no \fBelse\fP part. \fBbranch{1:\-1}\fP +expands to the remaining branches, if any. \fBtopindent\fP, \fBindent\fP, +\fBdedent\fP and \fBnl\fP are global variables (see below). \fB[stmt: stmt]\fP is a +nested list that expands to the list of statements in the current branch +(\fBstmt\fP is a list variable). Local conditional \fB\&.cond\fP is true if the +current branch has a condition. +This code template could produce the following code: +.INDENT 0.0 +.INDENT 3.5 +.sp +.nf +.ft C +if x { + // do something +} else if y { + // do something else +} else { + // don\(aqt do anything +} +.ft P +.fi +.UNINDENT +.UNINDENT +.sp +Here\(aqs a list of all global variables: +.INDENT 0.0 +.INDENT 3.5 +.INDENT 0.0 +.TP +.B \fBnl\fP +A newline. +.TP +.B \fBindent\fP +A variable that does not produce any code, but has a side\-effect of +increasing indentation level. +.TP +.B \fBdedent\fP +A variable that does not produce any code, but has a side\-effect of +decreasing indentation level. +.TP +.B \fBtopindent\fP +Indent string for the current statement (indent level is tracked and +updated by the code generator). +.UNINDENT +.UNINDENT +.UNINDENT +.sp +Here\(aqs a list of all global conditionals: +.INDENT 0.0 +.INDENT 3.5 +.INDENT 0.0 +.TP +.B \fB\&.api.simple\fP +True if simple API is used (\fB\-\-api simple\fP or \fBre2c:api = simple\fP). +.TP +.B \fB\&.api.generic\fP +True if generic API is used (\fB\-\-api generic\fP or +\fBre2c:api = generic\fP). +.TP +.B \fB\&.api.record\fP +True if record API is used (\fB\-\-api record\fP or \fBre2c:api = record\fP). +.TP +.B \fB\&.api_style.functions\fP +True if function\-like API style is used +(\fBre2c:api\-style = functions\fP).
+.TP +.B \fB\&.api_style.freeform\fP +True if free\-form API style is used (\fBre2c:api\-style = free\-form\fP). +.TP +.B \fB\&.case_ranges\fP +True if the case ranges feature is enabled (\fB\-\-case\-ranges\fP or +\fBre2c:case\-ranges = 1\fP). +.TP +.B \fB\&.code_model.goto_label\fP +True if code model based on goto/label is used (\fB\-\-goto\-label\fP). +.TP +.B \fB\&.code_model.loop_switch\fP +True if code model based on loop/switch is used (\fB\-\-loop\-switch\fP). +.TP +.B \fB\&.code_model.recursive_functions\fP +True if code model based on recursive functions is used +(\fB\-\-recursive\-functions\fP). +.TP +.B \fB\&.date\fP +True if the generated fingerprint should contain the generation date. +.TP +.B \fB\&.loop_label\fP +True if re2java generated loops must have a label (\fBre2c:label:yyloop\fP +is set to a nonempty string). +.TP +.B \fB\&.monadic\fP +True if the generated code should be monadic (\fBre2c:monadic = 1\fP). +This is only relevant for pure functional languages. +.TP +.B \fB\&.start_conditions\fP +True if start conditions are enabled (\fB\-\-start\-conditions\fP). +.TP +.B \fB\&.storable_state\fP +True if storable state is enabled (\fB\-\-storable\-state\fP). +.TP +.B \fB\&.unsafe\fP +True if re2java should use \(dqunsafe\(dq blocks in order to generate faster +code (\fB\-\-unsafe\fP, \fBre2c:unsafe = 1\fP). This is only relevant for +languages that have an \(dqunsafe\(dq feature. +.TP +.B \fB\&.version\fP +True if the generated fingerprint should contain the re2java version. +.UNINDENT +.UNINDENT +.UNINDENT +.sp +Below is a full list of code templates supported by re2java with their local +variables and conditionals (a definition does not have to use all of them). +Any unused code templates should be set to \fB\fP\&. +.INDENT 0.0 +.TP +.B \fBcode:var_local\fP +.INDENT 7.0 +.INDENT 3.5 +Declaration or definition of a local variable. Supported variables: +\fBtype\fP (the type of the variable), \fBname\fP (its name) and \fBinit\fP +(initial value, if any).
Conditionals: \fB\&.init\fP (true if there is an +initializer). +.UNINDENT +.UNINDENT +.INDENT 7.0 +.TP +.B \fBcode:var_global\fP +Same as \fBcode:var_local\fP, except that it\(aqs used in top\-level. +.TP +.B \fBcode:const_local\fP +Definition of a local constant. Supported variables: \fBtype\fP (the type +of the constant), \fBname\fP (its name) and \fBinit\fP (initial value). +.TP +.B \fBcode:const_global\fP +Same as \fBcode:const_local\fP, except that it\(aqs used in top\-level. +.TP +.B \fBcode:array_local\fP +Definition of a local array (table). Supported variables: \fBtype\fP (the +type of array elements), \fBname\fP (array name), \fBsize\fP (its size), +\fBrow\fP (a list variable that does not itself produce any code, but +expands list expression as many times as there are rows in the table) +and \fBelem\fP (a list variable that expands to all table elements in the +current row \-\- it\(aqs meant to be nested in the \fBrow\fP list). +.TP +.B \fBcode:array_global\fP +Same as \fBcode:array_local\fP, except that it\(aqs used in top\-level. +.TP +.B \fBcode:array_elem\fP +Reference to an element of an array (table). Supported variables: +\fBarray\fP (the name of the array) and \fBindex\fP (index of the element). +.TP +.B \fBcode:enum\fP +Definition of an enumeration (it may be defined using a special language +construct for enumerations, or simply as a few standalone constants). +Supported variables are \fBtype\fP (user\-defined enumeration type or type +of the constants), \fBelem\fP (list variable that expands to the name of +each member) and \fBinit\fP (initializer for each member). Conditionals: +\fB\&.init\fP (true if there is an initializer). +.TP +.B \fBcode:enum_elem\fP +Enumeration element (a member of a user\-defined enumeration type or a +name of a constant, depending on how \fBcode:enum\fP is defined). +Supported variables are \fBname\fP (the name of the element) and \fBtype\fP +(its type). +.TP +.B \fBcode:assign\fP +Assignment statement. 
Supported variables are \fBlhs\fP (left hand side) +and \fBrhs\fP (right hand side). +.TP +.B \fBcode:type_int\fP +Signed integer type. +.TP +.B \fBcode:type_uint\fP +Unsigned integer type. +.TP +.B \fBcode:type_yybm\fP +Type of elements in the \fByybm\fP table. +.TP +.B \fBcode:type_yytarget\fP +Type of elements in the \fByytarget\fP table. +.TP +.B \fBcode:cmp_eq\fP +Operator \(dqequals\(dq. +.TP +.B \fBcode:cmp_ne\fP +Operator \(dqnot equals\(dq. +.TP +.B \fBcode:cmp_lt\fP +Operator \(dqless than\(dq. +.TP +.B \fBcode:cmp_gt\fP +Operator \(dqgreater than\(dq. +.TP +.B \fBcode:cmp_le\fP +Operator \(dqless or equal\(dq. +.TP +.B \fBcode:cmp_ge\fP +Operator \(dqgreater or equal\(dq. +.TP +.B \fBcode:if_then_else\fP +If\-then\-else statement with one or more branches. Supported variables: +\fBbranch\fP (a list variable that does not itself produce any code, but +expands the list expression as many times as there are branches), \fBcond\fP +(condition of the current branch) and \fBstmt\fP (a list variable that +expands to all statements in the current branch). Conditionals: +\fB\&.cond\fP (true if the current branch has a condition), \fB\&.many\fP (true +if there\(aqs more than one branch). +.TP +.B \fBcode:if_then_else_oneline\fP +A specialization of \fBcode:if_then_else\fP for the case when all branches +have one\-line statements. If this is \fB\fP, +\fBcode:if_then_else\fP is used instead. +.TP +.B \fBcode:switch\fP +A switch statement with one or more cases. Supported variables: \fBexpr\fP +(the switched\-on expression) and \fBcase\fP (a list variable that expands +to all case groups with their code blocks). +.TP +.B \fBcode:switch_cases\fP +A group of switch cases that maps to a single code block. Supported +variables are \fBcase\fP (a list variable that expands to all cases in +this group) and \fBstmt\fP (a list variable that expands to all statements +in the code block).
+.TP +.B \fBcode:switch_cases_oneline\fP +A specialization of \fBcode:switch_cases\fP for the case when the code +block consists of a single one\-line statement. If this is +\fB\fP, \fBcode:switch_cases\fP is used instead. +.TP +.B \fBcode:switch_case_range\fP +A single switch case that covers a range of values (possibly consisting +of a single value). Supported variable: \fBval\fP (a list variable that +expands to all values in the range). Supported conditionals: \fB\&.many\fP +(true if there\(aqs more than one value in the range) and +\fB\&.char_literals\fP (true if this is a switch on character literals \-\- +some languages provide special syntax for this case). +.TP +.B \fBcode:switch_case_default\fP +Default switch case. +.TP +.B \fBcode:loop\fP +A loop that runs forever (unless interrupted from the loop body). +Supported variables: \fBlabel\fP (loop label), \fBstmt\fP (a list variable +that expands to all statements in the loop body). +.TP +.B \fBcode:continue\fP +Continue statement. Supported variables: \fBlabel\fP (label from which to +continue execution). +.TP +.B \fBcode:goto\fP +Goto statement. Supported variables: \fBlabel\fP (label of the jump +target). +.TP +.B \fBcode:fndecl\fP +Function declaration. Supported variables: \fBname\fP (function name), +\fBtype\fP (return type), \fBarg\fP (a list variable that does not itself +produce code, but expands list expression as many times as there are +function arguments), \fBargname\fP (name of the current argument), +\fBargtype\fP (type of the current argument). Conditional: \fB\&.type\fP (true +if this is a non\-void function). +.TP +.B \fBcode:fndef\fP +Like \fBcode:fndecl\fP, but used for function definitions, so it has one +additional list variable \fBstmt\fP that expands to all statements in the +function body. +.TP +.B \fBcode:fncall\fP +Function call statement.
Supported variables: \fBname\fP (function name), +\fBretval\fP (l\-value where the return value is stored, if any) and +\fBarg\fP (a list variable that expands to all function arguments). +Conditionals: \fB\&.args\fP (true if the function has arguments) and +\fB\&.retval\fP (true if return value needs to be saved). +.TP +.B \fBcode:tailcall\fP +Tail call statement. Supported variables: \fBname\fP (function name), +and \fBarg\fP (a list variable that expands to all function arguments). +Conditionals: \fB\&.args\fP (true if the function has arguments) and +\fB\&.retval\fP (true if this is a non\-void function). +.TP +.B \fBcode:recursive_functions\fP +Program body with \fB\-\-recursive\-functions\fP code model. Supported +variables: \fBfn\fP (a list variable that does not itself produce any +code, but expands list expression as many times as there are functions), +\fBfndecl\fP (declaration of the current function) and \fBfndef\fP +(definition of the current function). +.TP +.B \fBcode:fingerprint\fP +The fingerprint at the top of the generated output file. Supported +variables: \fBver\fP (re2java version that was used to generate this) and +\fBdate\fP (generation date). +.TP +.B \fBcode:line_info\fP +The format of line directives (if this is set to \fB\fP, no +directives are generated). Supported variables: \fBline\fP (line number) +and \fBfile\fP (filename). +.TP +.B \fBcode:abort\fP +A statement that aborts program execution. +.TP +.B \fBcode:yydebug\fP +\fBYYDEBUG\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYDEBUG\fP, \fByyrecord\fP, \fByych\fP (map to the +corresponding \fBre2c:\fP configurations), \fBstate\fP (DFA state number). +.TP +.B \fBcode:yypeek\fP +\fBYYPEEK\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYPEEK\fP, \fBYYCTYPE\fP, \fBYYINPUT\fP, \fBYYCURSOR\fP, +\fByyrecord\fP, \fByych\fP (map to the corresponding \fBre2c:\fP +configurations). 
Conditionals: \fB\&.cast\fP (true if +\fBre2c:yych:conversion\fP is set to non\-zero). +.TP +.B \fBcode:yyskip\fP +\fBYYSKIP\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSKIP\fP, \fBYYCURSOR\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations). +.TP +.B \fBcode:yybackup\fP +\fBYYBACKUP\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYBACKUP\fP, \fBYYCURSOR\fP, \fBYYMARKER\fP, +\fByyrecord\fP (map to the corresponding \fBre2c:\fP configurations). +.TP +.B \fBcode:yybackupctx\fP +\fBYYBACKUPCTX\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYBACKUPCTX\fP, \fBYYCURSOR\fP, \fBYYCTXMARKER\fP, +\fByyrecord\fP (map to the corresponding \fBre2c:\fP configurations). +.TP +.B \fBcode:yyskip_yypeek\fP +Combined \fBcode:yyskip\fP and \fBcode:yypeek\fP statement (defaults to +\fBcode:yyskip\fP followed by \fBcode:yypeek\fP). +.TP +.B \fBcode:yypeek_yyskip\fP +Combined \fBcode:yypeek\fP and \fBcode:yyskip\fP statement (defaults to +\fBcode:yypeek\fP followed by \fBcode:yyskip\fP). +.TP +.B \fBcode:yyskip_yybackup\fP +Combined \fBcode:yyskip\fP and \fBcode:yybackup\fP statement (defaults to +\fBcode:yyskip\fP followed by \fBcode:yybackup\fP). +.TP +.B \fBcode:yybackup_yyskip\fP +Combined \fBcode:yybackup\fP and \fBcode:yyskip\fP statement (defaults to +\fBcode:yybackup\fP followed by \fBcode:yyskip\fP). +.TP +.B \fBcode:yybackup_yypeek\fP +Combined \fBcode:yybackup\fP and \fBcode:yypeek\fP statement (defaults to +\fBcode:yybackup\fP followed by \fBcode:yypeek\fP). +.TP +.B \fBcode:yyskip_yybackup_yypeek\fP +Combined \fBcode:yyskip\fP, \fBcode:yybackup\fP and \fBcode:yypeek\fP +statement (defaults to \fBcode:yyskip\fP followed by \fBcode:yybackup\fP +followed by \fBcode:yypeek\fP).
+.TP +.B \fBcode:yybackup_yypeek_yyskip\fP +Combined \fBcode:yybackup\fP, \fBcode:yypeek\fP and \fBcode:yyskip\fP +statement (defaults to \fBcode:yybackup\fP followed by \fBcode:yypeek\fP +followed by \fBcode:yyskip\fP). +.TP +.B \fBcode:yyrestore\fP +\fBYYRESTORE\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYRESTORE\fP, \fBYYCURSOR\fP, \fBYYMARKER\fP, +\fByyrecord\fP (map to the corresponding \fBre2c:\fP configurations). +.TP +.B \fBcode:yyrestorectx\fP +\fBYYRESTORECTX\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYRESTORECTX\fP, \fBYYCURSOR\fP, \fBYYCTXMARKER\fP, +\fByyrecord\fP (map to the corresponding \fBre2c:\fP configurations). +.TP +.B \fBcode:yyrestoretag\fP +\fBYYRESTORETAG\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYRESTORETAG\fP, \fBYYCURSOR\fP, \fByyrecord\fP (map +to the corresponding \fBre2c:\fP configurations), \fBtag\fP (the name of the tag +variable used to restore position). +.TP +.B \fBcode:yyshift\fP +\fBYYSHIFT\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSHIFT\fP, \fBYYCURSOR\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBoffset\fP (the number of code +units to shift the current position). +.TP +.B \fBcode:yyshiftstag\fP +\fBYYSHIFTSTAG\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSHIFTSTAG\fP, \fByyrecord\fP, \fBnegative\fP (map +to the corresponding \fBre2c:\fP configurations), \fBtag\fP (tag variable +which needs to be shifted), \fBoffset\fP (the number of code units to +shift). Conditionals: \fB\&.nested\fP (true if this is a nested tag \-\- in +this case its value may be equal to \fBre2c:tags:negative\fP, which should +not be shifted). +.TP +.B \fBcode:yyshiftmtag\fP +\fBYYSHIFTMTAG\fP statement, possibly specialized for different APIs.
+Supported variables: \fBYYSHIFTMTAG\fP (maps to the corresponding +\fBre2c:\fP configuration), \fBtag\fP (tag variable which needs to be +shifted), \fBoffset\fP (the number of code units to shift). +.TP +.B \fBcode:yystagp\fP +\fBYYSTAGP\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSTAGP\fP, \fBYYCURSOR\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBtag\fP (tag variable that +should be updated). +.TP +.B \fBcode:yymtagp\fP +\fBYYMTAGP\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYMTAGP\fP (maps to the corresponding \fBre2c:\fP +configuration), \fBtag\fP (tag variable that should be updated). +.TP +.B \fBcode:yystagn\fP +\fBYYSTAGN\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSTAGN\fP, \fBnegative\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBtag\fP (tag variable that +should be updated). +.TP +.B \fBcode:yymtagn\fP +\fBYYMTAGN\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYMTAGN\fP (maps to the corresponding \fBre2c:\fP +configuration), \fBtag\fP (tag variable that should be updated). +.TP +.B \fBcode:yycopystag\fP +\fBYYCOPYSTAG\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYCOPYSTAG\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBlhs\fP, \fBrhs\fP (left and +right hand side tag variables of the copy operation). +.TP +.B \fBcode:yycopymtag\fP +\fBYYCOPYMTAG\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYCOPYMTAG\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBlhs\fP, \fBrhs\fP (left and +right hand side tag variables of the copy operation). +.TP +.B \fBcode:yygetaccept\fP +\fBYYGETACCEPT\fP statement, possibly specialized for different APIs.
+Supported variables: \fBYYGETACCEPT\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBvar\fP (maps to +\fBre2c:yyaccept\fP configuration). +.TP +.B \fBcode:yysetaccept\fP +\fBYYSETACCEPT\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSETACCEPT\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBvar\fP (maps to +\fBre2c:yyaccept\fP configuration) and \fBval\fP (numeric value of the +accepted rule). +.TP +.B \fBcode:yygetcond\fP +\fBYYGETCOND\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYGETCOND\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBvar\fP (maps to +\fBre2c:yycond\fP configuration). +.TP +.B \fBcode:yysetcond\fP +\fBYYSETCOND\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSETCOND\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBvar\fP (maps to +\fBre2c:yycond\fP configuration) and \fBval\fP (numeric condition +identifier). +.TP +.B \fBcode:yygetstate\fP +\fBYYGETSTATE\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYGETSTATE\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBvar\fP (maps to +\fBre2c:yystate\fP configuration). +.TP +.B \fBcode:yysetstate\fP +\fBYYSETSTATE\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSETSTATE\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBvar\fP (maps to +\fBre2c:yystate\fP configuration) and \fBval\fP (state number). +.TP +.B \fBcode:yylessthan\fP +\fBYYLESSTHAN\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYLESSTHAN\fP, \fBYYCURSOR\fP, \fBYYLIMIT\fP, +\fByyrecord\fP (map to the corresponding \fBre2c:\fP configurations), +\fBneed\fP (the number of code units to check against). Conditional: +\fB\&.many\fP (true if the \fBneed\fP is more than one). 
+.TP +.B \fBcode:yybm_filter\fP +Condition that is used to filter out \fByych\fP values that are not +covered by the \fByybm\fP table (used with \fB\-\-bitmaps\fP option). +Supported variable: \fByych\fP (maps to \fBre2c:yych\fP configuration). +.TP +.B \fBcode:yybm_match\fP +The format of \fByybm\fP table check (generated with \fB\-\-bitmaps\fP +option). Supported variables: \fByybm\fP, \fByych\fP (map to the +corresponding \fBre2c:\fP configurations), \fBoffset\fP (offset in the +\fByybm\fP table that needs to be added to \fByych\fP) and \fBmask\fP (bit +mask that should be applied to the table entry to retrieve the boolean +value that needs to be checked) +.UNINDENT +.UNINDENT +.UNINDENT +.UNINDENT .SH HANDLING THE END OF INPUT .sp One of the main problems for the lexer is to know when to stop. diff --git a/bootstrap/doc/re2js.1 b/bootstrap/doc/re2js.1 index 936fa7ecf..1599231e7 100644 --- a/bootstrap/doc/re2js.1 +++ b/bootstrap/doc/re2js.1 @@ -1591,6 +1591,581 @@ raise a warning, and the user will be notified. If some configurations are unused and do not need a definition, they should be explicitly set to \fB\fP\&. .UNINDENT +.SS Syntax files +.sp +Support for different languages in re2c is based on the idea of \fIsyntax files\fP\&. +A syntax file is a configuration file that defines syntax of the target language +\-\- not the whole language, but a small part of it that is used by the generated +code. Syntax files make re2c very flexible, but they should not be used as a +replacement for configurations: their purpose is to define syntax of the target +language, not to customise one particular lexer. +Syntax files contain configurations of four different kinds: +.sp +\fBFeature lists\fP +.sp +\fBLanguage configurations\fP +.sp +\fBInplace configurations\fP +.sp +\fBCode templates\fP +.INDENT 0.0 +.INDENT 3.5 +\fICode templates\fP define syntax of the target language. 
They are written in a +simple domain\-specific language with the following formal grammar: +.INDENT 0.0 +.INDENT 3.5 +.sp +.nf +.ft C +code\-template :: + name \(aq=\(aq code\-exprs \(aq;\(aq + | CODE_TEMPLATE \(aq;\(aq + | \(aq\(aq \(aq;\(aq + +code\-exprs :: + + | code\-exprs code\-expr + +code\-expr :: + STRING + | VARIABLE + | optional + | list + +optional :: + \(aq(\(aq CONDITIONAL \(aq?\(aq code\-exprs \(aq)\(aq + | \(aq(\(aq CONDITIONAL \(aq?\(aq code\-exprs \(aq:\(aq code\-exprs \(aq)\(aq + +list :: + \(aq[\(aq VARIABLE \(aq:\(aq code\-exprs \(aq]\(aq + | \(aq[\(aq VARIABLE \(aq{\(aq NUMBER \(aq}\(aq \(aq:\(aq code\-exprs \(aq]\(aq + | \(aq[\(aq VARIABLE \(aq{\(aq NUMBER \(aq,\(aq NUMBER \(aq}\(aq \(aq:\(aq code\-exprs \(aq]\(aq +.ft P +.fi +.UNINDENT +.UNINDENT +.sp +A code template is a sequence of string literals, variables, optional elements +and lists, or a reference to another code template, or a special value +\fB\fP\&. Variables are placeholders that are substituted during the code +generation phase. List variables are special: when expanding list templates, +re2js repeats the expressions on the right hand side of the colon several times, each +time replacing occurrences of the list variable with a value specific to this +repetition. Lists have optional bounds (negative values are counted from the +end, e.g. \fB\-1\fP means the last element). Conditional names start with a dot. +Both conditionals and variables may be either local (specific to the given +code template) or global (allowed in all code templates). When re2js reads a +syntax file, it checks that each code template uses only the variables and +conditionals that are allowed in it. +.sp +For example, the following code template defines an if\-then\-else construct for a +C\-like language: +.INDENT 0.0 +.INDENT 3.5 +.sp +.nf +.ft C +code:if_then_else = + [branch{0}: topindent \(dqif \(dq cond \(dq {\(dq nl + indent [stmt: stmt] dedent] + [branch{1:\-1}: topindent \(dq} else\(dq (.cond ?
\(dq if \(dq cond) \(dq {\(dq nl + indent [stmt: stmt] dedent] + topindent \(dq}\(dq nl; +.ft P +.fi +.UNINDENT +.UNINDENT +.sp +Here \fBbranch\fP is a list variable. \fBbranch{0}\fP expands to the first branch +\-\- it has to be special, as there is no \fBelse\fP part. \fBbranch{1:\-1}\fP +expands to the remaining branches, if any. \fBtopindent\fP, \fBindent\fP, +\fBdedent\fP and \fBnl\fP are global variables (see below). \fB[stmt: stmt]\fP is a +nested list that expands to the list of statements in the current branch +(\fBstmt\fP is a list variable). Local conditional \fB\&.cond\fP is true if the +current branch has a condition. +This code template could produce the following code: +.INDENT 0.0 +.INDENT 3.5 +.sp +.nf +.ft C +if x { + // do something +} else if y { + // do something else +} else { + // don\(aqt do anything +} +.ft P +.fi +.UNINDENT +.UNINDENT +.sp +Here\(aqs a list of all global variables: +.INDENT 0.0 +.INDENT 3.5 +.INDENT 0.0 +.TP +.B \fBnl\fP +A newline. +.TP +.B \fBindent\fP +A variable that does not produce any code, but has a side\-effect of +increasing indentation level. +.TP +.B \fBdedent\fP +A variable that does not produce any code, but has a side\-effect of +decreasing indentation level. +.TP +.B \fBtopindent\fP +Indent string for the current statement (indent level is tracked and +updated by the code generator). +.UNINDENT +.UNINDENT +.UNINDENT +.sp +Here\(aqs a list of all global conditionals: +.INDENT 0.0 +.INDENT 3.5 +.INDENT 0.0 +.TP +.B \fB\&.api.simple\fP +True if simple API is used (\fB\-\-api simple\fP or \fBre2c:api = simple\fP). +.TP +.B \fB\&.api.generic\fP +True if generic API is used (\fB\-\-api generic\fP or +\fBre2c:api = generic\fP). +.TP +.B \fB\&.api.record\fP +True if record API is used (\fB\-\-api record\fP or \fBre2c:api = record\fP). +.TP +.B \fB\&.api_style.functions\fP +True if function\-like API style is used +(\fBre2c:api\-style = functions\fP).
+.TP +.B \fB\&.api_style.freeform\fP +True if free\-form API style is used (\fBre2c:api\-style = free\-form\fP). +.TP +.B \fB\&.case_ranges\fP +True if the case ranges feature is enabled (\fB\-\-case\-ranges\fP or +\fBre2c:case\-ranges = 1\fP). +.TP +.B \fB\&.code_model.goto_label\fP +True if code model based on goto/label is used (\fB\-\-goto\-label\fP). +.TP +.B \fB\&.code_model.loop_switch\fP +True if code model based on loop/switch is used (\fB\-\-loop\-switch\fP). +.TP +.B \fB\&.code_model.recursive_functions\fP +True if code model based on recursive functions is used +(\fB\-\-recursive\-functions\fP). +.TP +.B \fB\&.date\fP +True if the generated fingerprint should contain the generation date. +.TP +.B \fB\&.loop_label\fP +True if re2js generated loops must have a label (\fBre2c:label:yyloop\fP +is set to a nonempty string). +.TP +.B \fB\&.monadic\fP +True if the generated code should be monadic (\fBre2c:monadic = 1\fP). +This is only relevant for pure functional languages. +.TP +.B \fB\&.start_conditions\fP +True if start conditions are enabled (\fB\-\-start\-conditions\fP). +.TP +.B \fB\&.storable_state\fP +True if storable state is enabled (\fB\-\-storable\-state\fP). +.TP +.B \fB\&.unsafe\fP +True if re2js should use \(dqunsafe\(dq blocks in order to generate faster +code (\fB\-\-unsafe\fP, \fBre2c:unsafe = 1\fP). This is only relevant for +languages that have an \(dqunsafe\(dq feature. +.TP +.B \fB\&.version\fP +True if the generated fingerprint should contain the re2js version. +.UNINDENT +.UNINDENT +.UNINDENT +.sp +Below is a full list of code templates supported by re2js with their local +variables and conditionals (a definition does not have to use all of them). +Any unused code templates should be set to \fB\fP\&. +.INDENT 0.0 +.TP +.B \fBcode:var_local\fP +.INDENT 7.0 +.INDENT 3.5 +Declaration or definition of a local variable. Supported variables: +\fBtype\fP (the type of the variable), \fBname\fP (its name) and \fBinit\fP +(initial value, if any).
Conditionals: \fB\&.init\fP (true if there is an +initializer). +.UNINDENT +.UNINDENT +.INDENT 7.0 +.TP +.B \fBcode:var_global\fP +Same as \fBcode:var_local\fP, except that it\(aqs used in top\-level. +.TP +.B \fBcode:const_local\fP +Definition of a local constant. Supported variables: \fBtype\fP (the type +of the constant), \fBname\fP (its name) and \fBinit\fP (initial value). +.TP +.B \fBcode:const_global\fP +Same as \fBcode:const_local\fP, except that it\(aqs used in top\-level. +.TP +.B \fBcode:array_local\fP +Definition of a local array (table). Supported variables: \fBtype\fP (the +type of array elements), \fBname\fP (array name), \fBsize\fP (its size), +\fBrow\fP (a list variable that does not itself produce any code, but +expands list expression as many times as there are rows in the table) +and \fBelem\fP (a list variable that expands to all table elements in the +current row \-\- it\(aqs meant to be nested in the \fBrow\fP list). +.TP +.B \fBcode:array_global\fP +Same as \fBcode:array_local\fP, except that it\(aqs used in top\-level. +.TP +.B \fBcode:array_elem\fP +Reference to an element of an array (table). Supported variables: +\fBarray\fP (the name of the array) and \fBindex\fP (index of the element). +.TP +.B \fBcode:enum\fP +Definition of an enumeration (it may be defined using a special language +construct for enumerations, or simply as a few standalone constants). +Supported variables are \fBtype\fP (user\-defined enumeration type or type +of the constants), \fBelem\fP (list variable that expands to the name of +each member) and \fBinit\fP (initializer for each member). Conditionals: +\fB\&.init\fP (true if there is an initializer). +.TP +.B \fBcode:enum_elem\fP +Enumeration element (a member of a user\-defined enumeration type or a +name of a constant, depending on how \fBcode:enum\fP is defined). +Supported variables are \fBname\fP (the name of the element) and \fBtype\fP +(its type). +.TP +.B \fBcode:assign\fP +Assignment statement. 
Supported variables are \fBlhs\fP (left hand side) +and \fBrhs\fP (right hand side). +.TP +.B \fBcode:type_int\fP +Signed integer type. +.TP +.B \fBcode:type_uint\fP +Unsigned integer type. +.TP +.B \fBcode:type_yybm\fP +Type of elements in the \fByybm\fP table. +.TP +.B \fBcode:type_yytarget\fP +Type of elements in the \fByytarget\fP table. +.TP +.B \fBcode:cmp_eq\fP +Operator \(dqequals\(dq. +.TP +.B \fBcode:cmp_ne\fP +Operator \(dqnot equals\(dq. +.TP +.B \fBcode:cmp_lt\fP +Operator \(dqless than\(dq. +.TP +.B \fBcode:cmp_gt\fP +Operator \(dqgreater than\(dq. +.TP +.B \fBcode:cmp_le\fP +Operator \(dqless or equal\(dq. +.TP +.B \fBcode:cmp_ge\fP +Operator \(dqgreater or equal\(dq. +.TP +.B \fBcode:if_then_else\fP +If\-then\-else statement with one or more branches. Supported variables: +\fBbranch\fP (a list variable that does not itself produce any code, but +expands the list expression as many times as there are branches), \fBcond\fP +(condition of the current branch) and \fBstmt\fP (a list variable that +expands to all statements in the current branch). Conditionals: +\fB\&.cond\fP (true if the current branch has a condition), \fB\&.many\fP (true +if there\(aqs more than one branch). +.TP +.B \fBcode:if_then_else_oneline\fP +A specialization of \fBcode:if_then_else\fP for the case when all branches +have one\-line statements. If this is \fB\fP, +\fBcode:if_then_else\fP is used instead. +.TP +.B \fBcode:switch\fP +A switch statement with one or more cases. Supported variables: \fBexpr\fP +(the switched\-on expression) and \fBcase\fP (a list variable that expands +to all case groups with their code blocks). +.TP +.B \fBcode:switch_cases\fP +A group of switch cases that maps to a single code block. Supported +variables are \fBcase\fP (a list variable that expands to all cases in +this group) and \fBstmt\fP (a list variable that expands to all statements +in the code block).
+.TP +.B \fBcode:switch_cases_oneline\fP +A specialization of \fBcode:switch_cases\fP for the case when the code +block consists of a single one\-line statement. If this is +\fB\fP, \fBcode:switch_cases\fP is used instead. +.TP +.B \fBcode:switch_case_range\fP +A single switch case that covers a range of values (possibly consisting +of a single value). Supported variable: \fBval\fP (a list variable that +expands to all values in the range). Supported conditionals: \fB\&.many\fP +(true if there\(aqs more than one value in the range) and +\fB\&.char_literals\fP (true if this is a switch on character literals \-\- +some languages provide special syntax for this case). +.TP +.B \fBcode:switch_case_default\fP +Default switch case. +.TP +.B \fBcode:loop\fP +A loop that runs forever (unless interrupted from the loop body). +Supported variables: \fBlabel\fP (loop label), \fBstmt\fP (a list variable +that expands to all statements in the loop body). +.TP +.B \fBcode:continue\fP +Continue statement. Supported variables: \fBlabel\fP (label from which to +continue execution). +.TP +.B \fBcode:goto\fP +Goto statement. Supported variables: \fBlabel\fP (label of the jump +target). +.TP +.B \fBcode:fndecl\fP +Function declaration. Supported variables: \fBname\fP (function name), +\fBtype\fP (return type), \fBarg\fP (a list variable that does not itself +produce code, but expands list expression as many times as there are +function arguments), \fBargname\fP (name of the current argument), +\fBargtype\fP (type of the current argument). Conditional: \fB\&.type\fP (true +if this is a non\-void function). +.TP +.B \fBcode:fndef\fP +Like \fBcode:fndecl\fP, but used for function definitions, so it has one +additional list variable \fBstmt\fP that expands to all statements in the +function body. +.TP +.B \fBcode:fncall\fP +Function call statement.
Supported variables: \fBname\fP (function name), +\fBretval\fP (l\-value where the return value is stored, if any) and +\fBarg\fP (a list variable that expands to all function arguments). +Conditionals: \fB\&.args\fP (true if the function has arguments) and +\fB\&.retval\fP (true if return value needs to be saved). +.TP +.B \fBcode:tailcall\fP +Tail call statement. Supported variables: \fBname\fP (function name), +and \fBarg\fP (a list variable that expands to all function arguments). +Conditionals: \fB\&.args\fP (true if the function has arguments) and +\fB\&.retval\fP (true if this is a non\-void function). +.TP +.B \fBcode:recursive_functions\fP +Program body with \fB\-\-recursive\-functions\fP code model. Supported +variables: \fBfn\fP (a list variable that does not itself produce any +code, but expands list expression as many times as there are functions), +\fBfndecl\fP (declaration of the current function) and \fBfndef\fP +(definition of the current function). +.TP +.B \fBcode:fingerprint\fP +The fingerprint at the top of the generated output file. Supported +variables: \fBver\fP (re2js version that was used to generate this) and +\fBdate\fP (generation date). +.TP +.B \fBcode:line_info\fP +The format of line directives (if this is set to \fB\fP, no +directives are generated). Supported variables: \fBline\fP (line number) +and \fBfile\fP (filename). +.TP +.B \fBcode:abort\fP +A statement that aborts program execution. +.TP +.B \fBcode:yydebug\fP +\fBYYDEBUG\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYDEBUG\fP, \fByyrecord\fP, \fByych\fP (map to the +corresponding \fBre2c:\fP configurations), \fBstate\fP (DFA state number). +.TP +.B \fBcode:yypeek\fP +\fBYYPEEK\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYPEEK\fP, \fBYYCTYPE\fP, \fBYYINPUT\fP, \fBYYCURSOR\fP, +\fByyrecord\fP, \fByych\fP (map to the corresponding \fBre2c:\fP +configurations). 
Conditionals: \fB\&.cast\fP (true if +\fBre2c:yych:conversion\fP is set to non\-zero). +.TP +.B \fBcode:yyskip\fP +\fBYYSKIP\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSKIP\fP, \fBYYCURSOR\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations). +.TP +.B \fBcode:yybackup\fP +\fBYYBACKUP\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYBACKUP\fP, \fBYYCURSOR\fP, \fBYYMARKER\fP, +\fByyrecord\fP (map to the corresponding \fBre2c:\fP configurations). +.TP +.B \fBcode:yybackupctx\fP +\fBYYBACKUPCTX\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYBACKUPCTX\fP, \fBYYCURSOR\fP, \fBYYCTXMARKER\fP, +\fByyrecord\fP (map to the corresponding \fBre2c:\fP configurations). +.TP +.B \fBcode:yyskip_yypeek\fP +Combined \fBcode:yyskip\fP and \fBcode:yypeek\fP statement (defaults to +\fBcode:yyskip\fP followed by \fBcode:yypeek\fP). +.TP +.B \fBcode:yypeek_yyskip\fP +Combined \fBcode:yypeek\fP and \fBcode:yyskip\fP statement (defaults to +\fBcode:yypeek\fP followed by \fBcode:yyskip\fP). +.TP +.B \fBcode:yyskip_yybackup\fP +Combined \fBcode:yyskip\fP and \fBcode:yybackup\fP statement (defaults to +\fBcode:yyskip\fP followed by \fBcode:yybackup\fP). +.TP +.B \fBcode:yybackup_yyskip\fP +Combined \fBcode:yybackup\fP and \fBcode:yyskip\fP statement (defaults to +\fBcode:yybackup\fP followed by \fBcode:yyskip\fP). +.TP +.B \fBcode:yybackup_yypeek\fP +Combined \fBcode:yybackup\fP and \fBcode:yypeek\fP statement (defaults to +\fBcode:yybackup\fP followed by \fBcode:yypeek\fP). +.TP +.B \fBcode:yyskip_yybackup_yypeek\fP +Combined \fBcode:yyskip\fP, \fBcode:yybackup\fP and \fBcode:yypeek\fP +statement (defaults to \fBcode:yyskip\fP followed by \fBcode:yybackup\fP +followed by \fBcode:yypeek\fP). 
+.TP +.B \fBcode:yybackup_yypeek_yyskip\fP +Combined \fBcode:yybackup\fP, \fBcode:yypeek\fP and \fBcode:yyskip\fP +statement (defaults to \fBcode:yybackup\fP followed by \fBcode:yypeek\fP +followed by \fBcode:yyskip\fP). +.TP +.B \fBcode:yyrestore\fP +\fBYYRESTORE\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYRESTORE\fP, \fBYYCURSOR\fP, \fBYYMARKER\fP, +\fByyrecord\fP (map to the corresponding \fBre2c:\fP configurations). +.TP +.B \fBcode:yyrestorectx\fP +\fBYYRESTORECTX\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYRESTORECTX\fP, \fBYYCURSOR\fP, \fBYYCTXMARKER\fP, +\fByyrecord\fP (map to the corresponding \fBre2c:\fP configurations). +.TP +.B \fBcode:yyrestoretag\fP +\fBYYRESTORETAG\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYRESTORETAG\fP, \fBYYCURSOR\fP, \fByyrecord\fP (map +to the corresponding \fBre2c:\fP configurations), \fBtag\fP (the name of tag +variable used to restore position). +.TP +.B \fBcode:yyshift\fP +\fBYYSHIFT\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSHIFT\fP, \fBYYCURSOR\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBoffset\fP (the number of code +units to shift the current position). +.TP +.B \fBcode:yyshiftstag\fP +\fBYYSHIFTSTAG\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSHIFTSTAG\fP, \fByyrecord\fP, \fBnegative\fP (map +to the corresponding \fBre2c:\fP configurations), \fBtag\fP (tag variable +which needs to be shifted), \fBoffset\fP (the number of code units to +shift). Conditionals: \fB\&.nested\fP (true if this is a nested tag \-\- in +this case its value may be equal to \fBre2c:tags:negative\fP, which should +not be shifted). +.TP +.B \fBcode:yyshiftmtag\fP +\fBYYSHIFTMTAG\fP statement, possibly specialized for different APIs. 
+Supported variables: \fBYYSHIFTMTAG\fP (maps to the corresponding +\fBre2c:\fP configuration), \fBtag\fP (tag variable which needs to be +shifted), \fBoffset\fP (the number of code units to shift). +.TP +.B \fBcode:yystagp\fP +\fBYYSTAGP\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSTAGP\fP, \fBYYCURSOR\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBtag\fP (tag variable that +should be updated). +.TP +.B \fBcode:yymtagp\fP +\fBYYMTAGP\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYMTAGP\fP (maps to the corresponding \fBre2c:\fP +configuration), \fBtag\fP (tag variable that should be updated). +.TP +.B \fBcode:yystagn\fP +\fBYYSTAGN\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSTAGN\fP, \fBnegative\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBtag\fP (tag variable that +should be updated). +.TP +.B \fBcode:yymtagn\fP +\fBYYMTAGN\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYMTAGN\fP (maps to the corresponding \fBre2c:\fP +configuration), \fBtag\fP (tag variable that should be updated). +.TP +.B \fBcode:yycopystag\fP +\fBYYCOPYSTAG\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYCOPYSTAG\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBlhs\fP, \fBrhs\fP (left and +right hand side tag variables of the copy operation). +.TP +.B \fBcode:yycopymtag\fP +\fBYYCOPYMTAG\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYCOPYMTAG\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBlhs\fP, \fBrhs\fP (left and +right hand side tag variables of the copy operation). +.TP +.B \fBcode:yygetaccept\fP +\fBYYGETACCEPT\fP statement, possibly specialized for different APIs. 
+Supported variables: \fBYYGETACCEPT\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBvar\fP (maps to +\fBre2c:yyaccept\fP configuration). +.TP +.B \fBcode:yysetaccept\fP +\fBYYSETACCEPT\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSETACCEPT\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBvar\fP (maps to +\fBre2c:yyaccept\fP configuration) and \fBval\fP (numeric value of the +accepted rule). +.TP +.B \fBcode:yygetcond\fP +\fBYYGETCOND\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYGETCOND\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBvar\fP (maps to +\fBre2c:yycond\fP configuration). +.TP +.B \fBcode:yysetcond\fP +\fBYYSETCOND\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSETCOND\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBvar\fP (maps to +\fBre2c:yycond\fP configuration) and \fBval\fP (numeric condition +identifier). +.TP +.B \fBcode:yygetstate\fP +\fBYYGETSTATE\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYGETSTATE\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBvar\fP (maps to +\fBre2c:yystate\fP configuration). +.TP +.B \fBcode:yysetstate\fP +\fBYYSETSTATE\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSETSTATE\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBvar\fP (maps to +\fBre2c:yystate\fP configuration) and \fBval\fP (state number). +.TP +.B \fBcode:yylessthan\fP +\fBYYLESSTHAN\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYLESSTHAN\fP, \fBYYCURSOR\fP, \fBYYLIMIT\fP, +\fByyrecord\fP (map to the corresponding \fBre2c:\fP configurations), +\fBneed\fP (the number of code units to check against). Conditional: +\fB\&.many\fP (true if the \fBneed\fP is more than one). 
+.TP +.B \fBcode:yybm_filter\fP +Condition that is used to filter out \fByych\fP values that are not +covered by the \fByybm\fP table (used with \fB\-\-bitmaps\fP option). +Supported variable: \fByych\fP (maps to \fBre2c:yych\fP configuration). +.TP +.B \fBcode:yybm_match\fP +The format of \fByybm\fP table check (generated with \fB\-\-bitmaps\fP +option). Supported variables: \fByybm\fP, \fByych\fP (map to the +corresponding \fBre2c:\fP configurations), \fBoffset\fP (offset in the +\fByybm\fP table that needs to be added to \fByych\fP) and \fBmask\fP (bit +mask that should be applied to the table entry to retrieve the boolean +value that needs to be checked) +.UNINDENT +.UNINDENT +.UNINDENT +.UNINDENT .SH HANDLING THE END OF INPUT .sp One of the main problems for the lexer is to know when to stop. diff --git a/bootstrap/doc/re2ocaml.1 b/bootstrap/doc/re2ocaml.1 index ec914254c..0b3404e66 100644 --- a/bootstrap/doc/re2ocaml.1 +++ b/bootstrap/doc/re2ocaml.1 @@ -1564,6 +1564,581 @@ raise a warning, and the user will be notified. If some configurations are unused and do not need a definition, they should be explicitly set to \fB\fP\&. .UNINDENT +.SS Syntax files +.sp +Support for different languages in re2c is based on the idea of \fIsyntax files\fP\&. +A syntax file is a configuration file that defines syntax of the target language +\-\- not the whole language, but a small part of it that is used by the generated +code. Syntax files make re2c very flexible, but they should not be used as a +replacement for configurations: their purpose is to define syntax of the target +language, not to customise one particular lexer. +Syntax files contain configurations of four different kinds: +.sp +\fBFeature lists\fP +.sp +\fBLanguage configurations\fP +.sp +\fBInplace configurations\fP +.sp +\fBCode templates\fP +.INDENT 0.0 +.INDENT 3.5 +\fICode templates\fP define syntax of the target language. 
They are written in a +simple domain\-specific language with the following formal grammar: +.INDENT 0.0 +.INDENT 3.5 +.sp +.nf +.ft C +code\-template :: + name \(aq=\(aq code\-exprs \(aq;\(aq + | CODE_TEMPLATE \(aq;\(aq + | \(aq\(aq \(aq;\(aq + +code\-exprs :: + + | code\-exprs code\-expr + +code\-expr :: + STRING + | VARIABLE + | optional + | list + +optional :: + \(aq(\(aq CONDITIONAL \(aq?\(aq code\-exprs \(aq)\(aq + | \(aq(\(aq CONDITIONAL \(aq?\(aq code\-exprs \(aq:\(aq code\-exprs \(aq)\(aq + +list :: + \(aq[\(aq VARIABLE \(aq:\(aq code\-exprs \(aq]\(aq + | \(aq[\(aq VARIABLE \(aq{\(aq NUMBER \(aq}\(aq \(aq:\(aq code\-exprs \(aq]\(aq + | \(aq[\(aq VARIABLE \(aq{\(aq NUMBER \(aq,\(aq NUMBER \(aq}\(aq \(aq:\(aq code\-exprs \(aq]\(aq +.ft P +.fi +.UNINDENT +.UNINDENT +.sp +A code template is a sequence of string literals, variables, optional elements +and lists, or a reference to another code template, or a special value +\fB\fP\&. Variables are placeholders that are substituted during the code +generation phase. List variables are special: when expanding list templates, +re2ocaml repeats the expressions on the right hand side of the colon a few times, each +time replacing occurrences of the list variable with a value specific to this +repetition. Lists have optional bounds (negative values are counted from the +end, e.g. \fB\-1\fP means the last element). Conditional names start with a dot. +Both conditionals and variables may be either local (specific to the given +code template) or global (allowed in all code templates). When re2ocaml reads +a syntax file, it checks that each code template uses only the variables and +conditionals that are allowed in it. +.sp +For example, the following code template defines an if\-then\-else construct for a +C\-like language: +.INDENT 0.0 +.INDENT 3.5 +.sp +.nf +.ft C +code:if_then_else = + [branch{0}: topindent \(dqif \(dq cond \(dq {\(dq nl + indent [stmt: stmt] dedent] + [branch{1:\-1}: topindent \(dq} else\(dq (.cond ? 
\(dq if \(dq cond) \(dq {\(dq nl + indent [stmt: stmt] dedent] + topindent \(dq}\(dq nl; +.ft P +.fi +.UNINDENT +.UNINDENT +.sp +Here \fBbranch\fP is a list variable. \fBbranch{0}\fP expands to the first branch +\-\- it has to be special, as there is no \fBelse\fP part. \fBbranch{1:\-1}\fP +expands to the remaining branches, if any. \fBtopindent\fP, \fBindent\fP, +\fBdedent\fP and \fBnl\fP are global variables (see below). \fB[stmt: stmt]\fP is a +nested list that expands to the list of statements in the current branch +(\fBstmt\fP is a list variable). Local conditional \fB\&.cond\fP is true if the +current branch has a condition. +This code template could produce the following code: +.INDENT 0.0 +.INDENT 3.5 +.sp +.nf +.ft C +if x { + // do something +} else if y { + // do something else +} else { + // don\(aqt do anything +} +.ft P +.fi +.UNINDENT +.UNINDENT +.sp +Here\(aqs a list of all global variables: +.INDENT 0.0 +.INDENT 3.5 +.INDENT 0.0 +.TP +.B \fBnl\fP +A newline. +.TP +.B \fBindent\fP +A variable that does not produce any code, but has a side\-effect of +increasing indentation level. +.TP +.B \fBdedent\fP +A variable that does not produce any code, but has a side\-effect of +decreasing indentation level. +.TP +.B \fBtopindent\fP +Indent string for the current statement (indent level is tracked and +updated by the code generator). +.UNINDENT +.UNINDENT +.UNINDENT +.sp +Here\(aqs a list of all global conditionals: +.INDENT 0.0 +.INDENT 3.5 +.INDENT 0.0 +.TP +.B \fB\&.api.simple\fP +True if simple API is used (\fB\-\-api simple\fP or \fBre2c:api = simple\fP). +.TP +.B \fB\&.api.generic\fP +True if generic API is used (\fB\-\-api generic\fP or +\fBre2c:api = generic\fP). +.TP +.B \fB\&.api.record\fP +True if record API is used (\fB\-\-api record\fP or \fBre2c:api = record\fP). +.TP +.B \fB\&.api_style.functions\fP +True if function\-like API style is used +(\fBre2c:api\-style = functions\fP). 
+.TP +.B \fB\&.api_style.freeform\fP +True if free\-form API style is used (\fBre2c:api\-style = free\-form\fP). +.TP +.B \fB\&.case_ranges\fP +True if case ranges feature is enabled (\fB\-\-case\-ranges\fP or +\fBre2c:case\-ranges = 1\fP). +.TP +.B \fB\&.code_model.goto_label\fP +True if code model based on goto/label is used (\fB\-\-goto\-label\fP). +.TP +.B \fB\&.code_model.loop_switch\fP +True if code model based on loop/switch is used (\fB\-\-loop\-switch\fP). +.TP +.B \fB\&.code_model.recursive_functions\fP +True if code model based on recursive functions is used +(\fB\-\-recursive\-functions\fP). +.TP +.B \fB\&.date\fP +True if the generated fingerprint should contain generation date. +.TP +.B \fB\&.loop_label\fP +True if re2ocaml generated loops must have a label (\fBre2c:label:yyloop\fP +is set to a nonempty string). +.TP +.B \fB\&.monadic\fP +True if the generated code should be monadic (\fBre2c:monadic = 1\fP). +This is only relevant for pure functional languages. +.TP +.B \fB\&.start_conditions\fP +True if start conditions are enabled (\fB\-\-start\-conditions\fP). +.TP +.B \fB\&.storable_state\fP +True if storable state is enabled (\fB\-\-storable\-state\fP). +.TP +.B \fB\&.unsafe\fP +True if re2ocaml should use \(dqunsafe\(dq blocks in order to generate faster +code (\fB\-\-unsafe\fP, \fBre2c:unsafe = 1\fP). This is only relevant for +languages that have \(dqunsafe\(dq feature. +.TP +.B \fB\&.version\fP +True if the generated fingerprint should contain re2ocaml version. +.UNINDENT +.UNINDENT +.UNINDENT +.sp +Below is a full list of code templates supported by re2ocaml with their local +variables and conditionals (a definition does not have to use all of them). +Any unused code templates should be set to \fB\fP\&. +.INDENT 0.0 +.TP +.B \fBcode:var_local\fP +.INDENT 7.0 +.INDENT 3.5 +Declaration or definition of a local variable. Supported variables: +\fBtype\fP (the type of the variable), \fBname\fP (its name) and \fBinit\fP +(initial value, if any). 
Conditionals: \fB\&.init\fP (true if there is an +initializer). +.UNINDENT +.UNINDENT +.INDENT 7.0 +.TP +.B \fBcode:var_global\fP +Same as \fBcode:var_local\fP, except that it\(aqs used in top\-level. +.TP +.B \fBcode:const_local\fP +Definition of a local constant. Supported variables: \fBtype\fP (the type +of the constant), \fBname\fP (its name) and \fBinit\fP (initial value). +.TP +.B \fBcode:const_global\fP +Same as \fBcode:const_local\fP, except that it\(aqs used in top\-level. +.TP +.B \fBcode:array_local\fP +Definition of a local array (table). Supported variables: \fBtype\fP (the +type of array elements), \fBname\fP (array name), \fBsize\fP (its size), +\fBrow\fP (a list variable that does not itself produce any code, but +expands list expression as many times as there are rows in the table) +and \fBelem\fP (a list variable that expands to all table elements in the +current row \-\- it\(aqs meant to be nested in the \fBrow\fP list). +.TP +.B \fBcode:array_global\fP +Same as \fBcode:array_local\fP, except that it\(aqs used in top\-level. +.TP +.B \fBcode:array_elem\fP +Reference to an element of an array (table). Supported variables: +\fBarray\fP (the name of the array) and \fBindex\fP (index of the element). +.TP +.B \fBcode:enum\fP +Definition of an enumeration (it may be defined using a special language +construct for enumerations, or simply as a few standalone constants). +Supported variables are \fBtype\fP (user\-defined enumeration type or type +of the constants), \fBelem\fP (list variable that expands to the name of +each member) and \fBinit\fP (initializer for each member). Conditionals: +\fB\&.init\fP (true if there is an initializer). +.TP +.B \fBcode:enum_elem\fP +Enumeration element (a member of a user\-defined enumeration type or a +name of a constant, depending on how \fBcode:enum\fP is defined). +Supported variables are \fBname\fP (the name of the element) and \fBtype\fP +(its type). +.TP +.B \fBcode:assign\fP +Assignment statement. 
Supported variables are \fBlhs\fP (left hand side) +and \fBrhs\fP (right hand side). +.TP +.B \fBcode:type_int\fP +Signed integer type. +.TP +.B \fBcode:type_uint\fP +Unsigned integer type. +.TP +.B \fBcode:type_yybm\fP +Type of elements in the \fByybm\fP table. +.TP +.B \fBcode:type_yytarget\fP +Type of elements in the \fByytarget\fP table. +.TP +.B \fBcode:cmp_eq\fP +Operator \(dqequals\(dq. +.TP +.B \fBcode:cmp_ne\fP +Operator \(dqnot equals\(dq. +.TP +.B \fBcode:cmp_lt\fP +Operator \(dqless than\(dq. +.TP +.B \fBcode:cmp_gt\fP +Operator \(dqgreater than\(dq. +.TP +.B \fBcode:cmp_le\fP +Operator \(dqless or equal\(dq. +.TP +.B \fBcode:cmp_ge\fP +Operator \(dqgreater or equal\(dq. +.TP +.B \fBcode:if_then_else\fP +If\-then\-else statement with one or more branches. Supported variables: +\fBbranch\fP (a list variable that does not itself produce any code, but +expands list expression as many times as there are branches), \fBcond\fP +(condition of the current branch) and \fBstmt\fP (a list variable that +expands to all statements in the current branch). Conditionals: +\fB\&.cond\fP (true if the current branch has a condition), \fB\&.many\fP (true +if there\(aqs more than one branch). +.TP +.B \fBcode:if_then_else_oneline\fP +A specialization of \fBcode:if_then_else\fP for the case when all branches +have one\-line statements. If this is \fB\fP, +\fBcode:if_then_else\fP is used instead. +.TP +.B \fBcode:switch\fP +A switch statement with one or more cases. Supported variables: \fBexpr\fP +(the switched\-on expression) and \fBcase\fP (a list variable that expands +to all cases\-groups with their code blocks). +.TP +.B \fBcode:switch_cases\fP +A group of switch cases that maps to a single code block. Supported +variables are \fBcase\fP (a list variable that expands to all cases in +this group) and \fBstmt\fP (a list variable that expands to all statements +in the code block).
+.TP +.B \fBcode:switch_cases_oneline\fP +A specialization of \fBcode:switch_cases\fP for the case when the code +block consists of a single one\-line statement. If this is +\fB\fP, \fBcode:switch_cases\fP is used instead. +.TP +.B \fBcode:switch_case_range\fP +A single switch case that covers a range of values (possibly consisting +of a single value). Supported variable: \fBval\fP (a list variable that +expands to all values in the range). Supported conditionals: \fB\&.many\fP +(true if there\(aqs more than one value in the range) and +\fB\&.char_literals\fP (true if this is a switch on character literals \-\- +some languages provide special syntax for this case). +.TP +.B \fBcode:switch_case_default\fP +Default switch case. +.TP +.B \fBcode:loop\fP +A loop that runs forever (unless interrupted from the loop body). +Supported variables: \fBlabel\fP (loop label), \fBstmt\fP (a list variable +that expands to all statements in the loop body). +.TP +.B \fBcode:continue\fP +Continue statement. Supported variables: \fBlabel\fP (label from which to +continue execution). +.TP +.B \fBcode:goto\fP +Goto statement. Supported variables: \fBlabel\fP (label of the jump +target). +.TP +.B \fBcode:fndecl\fP +Function declaration. Supported variables: \fBname\fP (function name), +\fBtype\fP (return type), \fBarg\fP (a list variable that does not itself +produce code, but expands list expression as many times as there are +function arguments), \fBargname\fP (name of the current argument), +\fBargtype\fP (type of the current argument). Conditional: \fB\&.type\fP (true +if this is a non\-void function). +.TP +.B \fBcode:fndef\fP +Like \fBcode:fndecl\fP, but used for function definitions, so it has one +additional list variable \fBstmt\fP that expands to all statements in the +function body. +.TP +.B \fBcode:fncall\fP +Function call statement. 
Supported variables: \fBname\fP (function name), +\fBretval\fP (l\-value where the return value is stored, if any) and +\fBarg\fP (a list variable that expands to all function arguments). +Conditionals: \fB\&.args\fP (true if the function has arguments) and +\fB\&.retval\fP (true if return value needs to be saved). +.TP +.B \fBcode:tailcall\fP +Tail call statement. Supported variables: \fBname\fP (function name), +and \fBarg\fP (a list variable that expands to all function arguments). +Conditionals: \fB\&.args\fP (true if the function has arguments) and +\fB\&.retval\fP (true if this is a non\-void function). +.TP +.B \fBcode:recursive_functions\fP +Program body with \fB\-\-recursive\-functions\fP code model. Supported +variables: \fBfn\fP (a list variable that does not itself produce any +code, but expands list expression as many times as there are functions), +\fBfndecl\fP (declaration of the current function) and \fBfndef\fP +(definition of the current function). +.TP +.B \fBcode:fingerprint\fP +The fingerprint at the top of the generated output file. Supported +variables: \fBver\fP (re2ocaml version that was used to generate this) and +\fBdate\fP (generation date). +.TP +.B \fBcode:line_info\fP +The format of line directives (if this is set to \fB\fP, no +directives are generated). Supported variables: \fBline\fP (line number) +and \fBfile\fP (filename). +.TP +.B \fBcode:abort\fP +A statement that aborts program execution. +.TP +.B \fBcode:yydebug\fP +\fBYYDEBUG\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYDEBUG\fP, \fByyrecord\fP, \fByych\fP (map to the +corresponding \fBre2c:\fP configurations), \fBstate\fP (DFA state number). +.TP +.B \fBcode:yypeek\fP +\fBYYPEEK\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYPEEK\fP, \fBYYCTYPE\fP, \fBYYINPUT\fP, \fBYYCURSOR\fP, +\fByyrecord\fP, \fByych\fP (map to the corresponding \fBre2c:\fP +configurations). 
Conditionals: \fB\&.cast\fP (true if +\fBre2c:yych:conversion\fP is set to non\-zero). +.TP +.B \fBcode:yyskip\fP +\fBYYSKIP\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSKIP\fP, \fBYYCURSOR\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations). +.TP +.B \fBcode:yybackup\fP +\fBYYBACKUP\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYBACKUP\fP, \fBYYCURSOR\fP, \fBYYMARKER\fP, +\fByyrecord\fP (map to the corresponding \fBre2c:\fP configurations). +.TP +.B \fBcode:yybackupctx\fP +\fBYYBACKUPCTX\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYBACKUPCTX\fP, \fBYYCURSOR\fP, \fBYYCTXMARKER\fP, +\fByyrecord\fP (map to the corresponding \fBre2c:\fP configurations). +.TP +.B \fBcode:yyskip_yypeek\fP +Combined \fBcode:yyskip\fP and \fBcode:yypeek\fP statement (defaults to +\fBcode:yyskip\fP followed by \fBcode:yypeek\fP). +.TP +.B \fBcode:yypeek_yyskip\fP +Combined \fBcode:yypeek\fP and \fBcode:yyskip\fP statement (defaults to +\fBcode:yypeek\fP followed by \fBcode:yyskip\fP). +.TP +.B \fBcode:yyskip_yybackup\fP +Combined \fBcode:yyskip\fP and \fBcode:yybackup\fP statement (defaults to +\fBcode:yyskip\fP followed by \fBcode:yybackup\fP). +.TP +.B \fBcode:yybackup_yyskip\fP +Combined \fBcode:yybackup\fP and \fBcode:yyskip\fP statement (defaults to +\fBcode:yybackup\fP followed by \fBcode:yyskip\fP). +.TP +.B \fBcode:yybackup_yypeek\fP +Combined \fBcode:yybackup\fP and \fBcode:yypeek\fP statement (defaults to +\fBcode:yybackup\fP followed by \fBcode:yypeek\fP). +.TP +.B \fBcode:yyskip_yybackup_yypeek\fP +Combined \fBcode:yyskip\fP, \fBcode:yybackup\fP and \fBcode:yypeek\fP +statement (defaults to \fBcode:yyskip\fP followed by \fBcode:yybackup\fP +followed by \fBcode:yypeek\fP). 
+.TP +.B \fBcode:yybackup_yypeek_yyskip\fP +Combined \fBcode:yybackup\fP, \fBcode:yypeek\fP and \fBcode:yyskip\fP +statement (defaults to \fBcode:yybackup\fP followed by \fBcode:yypeek\fP +followed by \fBcode:yyskip\fP). +.TP +.B \fBcode:yyrestore\fP +\fBYYRESTORE\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYRESTORE\fP, \fBYYCURSOR\fP, \fBYYMARKER\fP, +\fByyrecord\fP (map to the corresponding \fBre2c:\fP configurations). +.TP +.B \fBcode:yyrestorectx\fP +\fBYYRESTORECTX\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYRESTORECTX\fP, \fBYYCURSOR\fP, \fBYYCTXMARKER\fP, +\fByyrecord\fP (map to the corresponding \fBre2c:\fP configurations). +.TP +.B \fBcode:yyrestoretag\fP +\fBYYRESTORETAG\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYRESTORETAG\fP, \fBYYCURSOR\fP, \fByyrecord\fP (map +to the corresponding \fBre2c:\fP configurations), \fBtag\fP (the name of tag +variable used to restore position). +.TP +.B \fBcode:yyshift\fP +\fBYYSHIFT\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSHIFT\fP, \fBYYCURSOR\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBoffset\fP (the number of code +units to shift the current position). +.TP +.B \fBcode:yyshiftstag\fP +\fBYYSHIFTSTAG\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSHIFTSTAG\fP, \fByyrecord\fP, \fBnegative\fP (map +to the corresponding \fBre2c:\fP configurations), \fBtag\fP (tag variable +which needs to be shifted), \fBoffset\fP (the number of code units to +shift). Conditionals: \fB\&.nested\fP (true if this is a nested tag \-\- in +this case its value may be equal to \fBre2c:tags:negative\fP, which should +not be shifted). +.TP +.B \fBcode:yyshiftmtag\fP +\fBYYSHIFTMTAG\fP statement, possibly specialized for different APIs. 
+Supported variables: \fBYYSHIFTMTAG\fP (maps to the corresponding +\fBre2c:\fP configuration), \fBtag\fP (tag variable which needs to be +shifted), \fBoffset\fP (the number of code units to shift). +.TP +.B \fBcode:yystagp\fP +\fBYYSTAGP\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSTAGP\fP, \fBYYCURSOR\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBtag\fP (tag variable that +should be updated). +.TP +.B \fBcode:yymtagp\fP +\fBYYMTAGP\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYMTAGP\fP (maps to the corresponding \fBre2c:\fP +configuration), \fBtag\fP (tag variable that should be updated). +.TP +.B \fBcode:yystagn\fP +\fBYYSTAGN\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSTAGN\fP, \fBnegative\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBtag\fP (tag variable that +should be updated). +.TP +.B \fBcode:yymtagn\fP +\fBYYMTAGN\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYMTAGN\fP (maps to the corresponding \fBre2c:\fP +configuration), \fBtag\fP (tag variable that should be updated). +.TP +.B \fBcode:yycopystag\fP +\fBYYCOPYSTAG\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYCOPYSTAG\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBlhs\fP, \fBrhs\fP (left and +right hand side tag variables of the copy operation). +.TP +.B \fBcode:yycopymtag\fP +\fBYYCOPYMTAG\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYCOPYMTAG\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBlhs\fP, \fBrhs\fP (left and +right hand side tag variables of the copy operation). +.TP +.B \fBcode:yygetaccept\fP +\fBYYGETACCEPT\fP statement, possibly specialized for different APIs. 
+Supported variables: \fBYYGETACCEPT\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBvar\fP (maps to +\fBre2c:yyaccept\fP configuration). +.TP +.B \fBcode:yysetaccept\fP +\fBYYSETACCEPT\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSETACCEPT\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBvar\fP (maps to +\fBre2c:yyaccept\fP configuration) and \fBval\fP (numeric value of the +accepted rule). +.TP +.B \fBcode:yygetcond\fP +\fBYYGETCOND\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYGETCOND\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBvar\fP (maps to +\fBre2c:yycond\fP configuration). +.TP +.B \fBcode:yysetcond\fP +\fBYYSETCOND\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSETCOND\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBvar\fP (maps to +\fBre2c:yycond\fP configuration) and \fBval\fP (numeric condition +identifier). +.TP +.B \fBcode:yygetstate\fP +\fBYYGETSTATE\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYGETSTATE\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBvar\fP (maps to +\fBre2c:yystate\fP configuration). +.TP +.B \fBcode:yysetstate\fP +\fBYYSETSTATE\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSETSTATE\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBvar\fP (maps to +\fBre2c:yystate\fP configuration) and \fBval\fP (state number). +.TP +.B \fBcode:yylessthan\fP +\fBYYLESSTHAN\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYLESSTHAN\fP, \fBYYCURSOR\fP, \fBYYLIMIT\fP, +\fByyrecord\fP (map to the corresponding \fBre2c:\fP configurations), +\fBneed\fP (the number of code units to check against). Conditional: +\fB\&.many\fP (true if the \fBneed\fP is more than one). 
+.TP +.B \fBcode:yybm_filter\fP +Condition that is used to filter out \fByych\fP values that are not +covered by the \fByybm\fP table (used with \fB\-\-bitmaps\fP option). +Supported variable: \fByych\fP (maps to \fBre2c:yych\fP configuration). +.TP +.B \fBcode:yybm_match\fP +The format of \fByybm\fP table check (generated with \fB\-\-bitmaps\fP +option). Supported variables: \fByybm\fP, \fByych\fP (map to the +corresponding \fBre2c:\fP configurations), \fBoffset\fP (offset in the +\fByybm\fP table that needs to be added to \fByych\fP) and \fBmask\fP (bit +mask that should be applied to the table entry to retrieve the boolean +value that needs to be checked) +.UNINDENT +.UNINDENT +.UNINDENT +.UNINDENT .SH HANDLING THE END OF INPUT .sp One of the main problems for the lexer is to know when to stop. diff --git a/bootstrap/doc/re2py.1 b/bootstrap/doc/re2py.1 index f41eb3bbc..a4a050635 100644 --- a/bootstrap/doc/re2py.1 +++ b/bootstrap/doc/re2py.1 @@ -1564,6 +1564,581 @@ raise a warning, and the user will be notified. If some configurations are unused and do not need a definition, they should be explicitly set to \fB\fP\&. .UNINDENT +.SS Syntax files +.sp +Support for different languages in re2c is based on the idea of \fIsyntax files\fP\&. +A syntax file is a configuration file that defines syntax of the target language +\-\- not the whole language, but a small part of it that is used by the generated +code. Syntax files make re2c very flexible, but they should not be used as a +replacement for configurations: their purpose is to define syntax of the target +language, not to customise one particular lexer. +Syntax files contain configurations of four different kinds: +.sp +\fBFeature lists\fP +.sp +\fBLanguage configurations\fP +.sp +\fBInplace configurations\fP +.sp +\fBCode templates\fP +.INDENT 0.0 +.INDENT 3.5 +\fICode templates\fP define syntax of the target language. 
They are written in a +simple domain\-specific language with the following formal grammar: +.INDENT 0.0 +.INDENT 3.5 +.sp +.nf +.ft C +code\-template :: + name \(aq=\(aq code\-exprs \(aq;\(aq + | CODE_TEMPLATE \(aq;\(aq + | \(aq\(aq \(aq;\(aq + +code\-exprs :: + + | code\-exprs code\-expr + +code\-expr :: + STRING + | VARIABLE + | optional + | list + +optional :: + \(aq(\(aq CONDITIONAL \(aq?\(aq code\-exprs \(aq)\(aq + | \(aq(\(aq CONDITIONAL \(aq?\(aq code\-exprs \(aq:\(aq code\-exprs \(aq)\(aq + +list :: + \(aq[\(aq VARIABLE \(aq:\(aq code\-exprs \(aq]\(aq + | \(aq[\(aq VARIABLE \(aq{\(aq NUMBER \(aq}\(aq \(aq:\(aq code\-exprs \(aq]\(aq + | \(aq[\(aq VARIABLE \(aq{\(aq NUMBER \(aq,\(aq NUMBER \(aq}\(aq \(aq:\(aq code\-exprs \(aq]\(aq +.ft P +.fi +.UNINDENT +.UNINDENT +.sp +A code template is a sequence of string literals, variables, optional elements +and lists, or a reference to another code template, or a special value +\fB\fP\&. Variables are placeholders that are substituted during the code +generation phase. List variables are special: when expanding list templates, +re2py repeats the expressions on the right\-hand side of the colon a few times, each +time replacing occurrences of the list variable with a value specific to this +repetition. Lists have optional bounds (negative values are counted from the +end, e.g. \fB\-1\fP means the last element). Conditional names start with a dot. +Both conditionals and variables may be either local (specific to the given +code template) or global (allowed in all code templates). When re2py reads +a syntax file, it checks that each code template uses only the variables and +conditionals that are allowed in it. +.sp +For example, the following code template defines an if\-then\-else construct for a +C\-like language: +.INDENT 0.0 +.INDENT 3.5 +.sp +.nf +.ft C +code:if_then_else = + [branch{0}: topindent \(dqif \(dq cond \(dq {\(dq nl + indent [stmt: stmt] dedent] + [branch{1:\-1}: topindent \(dq} else\(dq (.cond ?
\(dq if \(dq cond) \(dq {\(dq nl + indent [stmt: stmt] dedent] + topindent \(dq}\(dq nl; +.ft P +.fi +.UNINDENT +.UNINDENT +.sp +Here \fBbranch\fP is a list variable. \fBbranch{0}\fP expands to the first branch +\-\- it has to be special, as there is no \fBelse\fP part. \fBbranch{1:\-1}\fP +expands to the remaining branches, if any. \fBtopindent\fP, \fBindent\fP, +\fBdedent\fP and \fBnl\fP are global variables (see below). \fB[stmt: stmt]\fP is a +nested list that expands to the list of statements in the current branch +(\fBstmt\fP is a list variable). Local conditional \fB\&.cond\fP is true if the +current branch has a condition. +This code template could produce the following code: +.INDENT 0.0 +.INDENT 3.5 +.sp +.nf +.ft C +if x { + // do something +} else if y { + // do something else +} else { + // don\(aqt do anything +} +.ft P +.fi +.UNINDENT +.UNINDENT +.sp +Here\(aqs a list of all global variables: +.INDENT 0.0 +.INDENT 3.5 +.INDENT 0.0 +.TP +.B \fBnl\fP +A newline. +.TP +.B \fBindent\fP +A variable that does not produce any code, but has a side\-effect of +increasing indentation level. +.TP +.B \fBdedent\fP +A variable that does not produce any code, but has a side\-effect of +decreasing indentation level. +.TP +.B \fBtopindent\fP +Indent string for the current statement (indent level is tracked and +updated by the code generator). +.UNINDENT +.UNINDENT +.UNINDENT +.sp +Here\(aqs a list of all global conditionals: +.INDENT 0.0 +.INDENT 3.5 +.INDENT 0.0 +.TP +.B \fB\&.api.simple\fP +True if simple API is used (\fB\-\-api simple\fP or \fBre2c:api = simple\fP). +.TP +.B \fB\&.api.generic\fP +True if generic API is used (\fB\-\-api generic\fP or +\fBre2c:api = generic\fP). +.TP +.B \fB\&.api.record\fP +True if record API is used (\fB\-\-api record\fP or \fBre2c:api = record\fP). +.TP +.B \fB\&.api_style.functions\fP +True if function\-like API style is used +(\fBre2c:api\-style = functions\fP).
+.TP +.B \fB\&.api_style.freeform\fP +True if free\-form API style is used (\fBre2c:api\-style = free\-form\fP). +.TP +.B \fB\&.case_ranges\fP +True if case ranges feature is enabled (\fB\-\-case\-ranges\fP or +\fBre2c:case\-ranges = 1\fP). +.TP +.B \fB\&.code_model.goto_label\fP +True if code model based on goto/label is used (\fB\-\-goto\-label\fP). +.TP +.B \fB\&.code_model.loop_switch\fP +True if code model based on loop/switch is used (\fB\-\-loop\-switch\fP). +.TP +.B \fB\&.code_model.recursive_functions\fP +True if code model based on recursive functions is used +(\fB\-\-recursive\-functions\fP). +.TP +.B \fB\&.date\fP +True if the generated fingerprint should contain generation date. +.TP +.B \fB\&.loop_label\fP +True if re2py generated loops must have a label (\fBre2c:label:yyloop\fP +is set to a nonempty string). +.TP +.B \fB\&.monadic\fP +True if the generated code should be monadic (\fBre2c:monadic = 1\fP). +This is only relevant for pure functional languages. +.TP +.B \fB\&.start_conditions\fP +True if start conditions are enabled (\fB\-\-start\-conditions\fP). +.TP +.B \fB\&.storable_state\fP +True if storable state is enabled (\fB\-\-storable\-state\fP). +.TP +.B \fB\&.unsafe\fP +True if re2py should use \(dqunsafe\(dq blocks in order to generate faster +code (\fB\-\-unsafe\fP, \fBre2c:unsafe = 1\fP). This is only relevant for +languages that have an \(dqunsafe\(dq feature. +.TP +.B \fB\&.version\fP +True if the generated fingerprint should contain re2py version. +.UNINDENT +.UNINDENT +.UNINDENT +.sp +Below is a full list of code templates supported by re2py with their local +variables and conditionals (a definition does not have to use all of them). +Any unused code templates should be set to \fB\fP\&. +.INDENT 0.0 +.TP +.B \fBcode:var_local\fP +.INDENT 7.0 +.INDENT 3.5 +Declaration or definition of a local variable. Supported variables: +\fBtype\fP (the type of the variable), \fBname\fP (its name) and \fBinit\fP +(initial value, if any).
Conditionals: \fB\&.init\fP (true if there is an +initializer). +.UNINDENT +.UNINDENT +.INDENT 7.0 +.TP +.B \fBcode:var_global\fP +Same as \fBcode:var_local\fP, except that it\(aqs used in top\-level. +.TP +.B \fBcode:const_local\fP +Definition of a local constant. Supported variables: \fBtype\fP (the type +of the constant), \fBname\fP (its name) and \fBinit\fP (initial value). +.TP +.B \fBcode:const_global\fP +Same as \fBcode:const_local\fP, except that it\(aqs used in top\-level. +.TP +.B \fBcode:array_local\fP +Definition of a local array (table). Supported variables: \fBtype\fP (the +type of array elements), \fBname\fP (array name), \fBsize\fP (its size), +\fBrow\fP (a list variable that does not itself produce any code, but +expands list expression as many times as there are rows in the table) +and \fBelem\fP (a list variable that expands to all table elements in the +current row \-\- it\(aqs meant to be nested in the \fBrow\fP list). +.TP +.B \fBcode:array_global\fP +Same as \fBcode:array_local\fP, except that it\(aqs used in top\-level. +.TP +.B \fBcode:array_elem\fP +Reference to an element of an array (table). Supported variables: +\fBarray\fP (the name of the array) and \fBindex\fP (index of the element). +.TP +.B \fBcode:enum\fP +Definition of an enumeration (it may be defined using a special language +construct for enumerations, or simply as a few standalone constants). +Supported variables are \fBtype\fP (user\-defined enumeration type or type +of the constants), \fBelem\fP (list variable that expands to the name of +each member) and \fBinit\fP (initializer for each member). Conditionals: +\fB\&.init\fP (true if there is an initializer). +.TP +.B \fBcode:enum_elem\fP +Enumeration element (a member of a user\-defined enumeration type or a +name of a constant, depending on how \fBcode:enum\fP is defined). +Supported variables are \fBname\fP (the name of the element) and \fBtype\fP +(its type). +.TP +.B \fBcode:assign\fP +Assignment statement. 
Supported variables are \fBlhs\fP (left hand side) +and \fBrhs\fP (right hand side). +.TP +.B \fBcode:type_int\fP +Signed integer type. +.TP +.B \fBcode:type_uint\fP +Unsigned integer type. +.TP +.B \fBcode:type_yybm\fP +Type of elements in the \fByybm\fP table. +.TP +.B \fBcode:type_yytarget\fP +Type of elements in the \fByytarget\fP table. +.TP +.B \fBcode:cmp_eq\fP +Operator \(dqequals\(dq. +.TP +.B \fBcode:cmp_ne\fP +Operator \(dqnot equals\(dq. +.TP +.B \fBcode:cmp_lt\fP +Operator \(dqless than\(dq. +.TP +.B \fBcode:cmp_gt\fP +Operator \(dqgreater than\(dq. +.TP +.B \fBcode:cmp_le\fP +Operator \(dqless or equal\(dq. +.TP +.B \fBcode:cmp_ge\fP +Operator \(dqgreater or equal\(dq. +.TP +.B \fBcode:if_then_else\fP +If\-then\-else statement with one or more branches. Supported variables: +\fBbranch\fP (a list variable that does not itself produce any code, but +expands list expression as many times as there are branches), \fBcond\fP +(condition of the current branch) and \fBstmt\fP (a list variable that +expands to all statements in the current branch). Conditionals: +\fB\&.cond\fP (true if the current branch has a condition), \fB\&.many\fP (true +if there\(aqs more than one branch). +.TP +.B \fBcode:if_then_else_oneline\fP +A specialization of \fBcode:if_then_else\fP for the case when all branches +have one\-line statements. If this is \fB\fP, +\fBcode:if_then_else\fP is used instead. +.TP +.B \fBcode:switch\fP +A switch statement with one or more cases. Supported variables: \fBexpr\fP +(the switched\-on expression) and \fBcase\fP (a list variable that expands +to all cases\-groups with their code blocks). +.TP +.B \fBcode:switch_cases\fP +A group of switch cases that maps to a single code block. Supported +variables are \fBcase\fP (a list variable that expands to all cases in +this group) and \fBstmt\fP (a list variable that expands to all statements +in the code block).
+.TP +.B \fBcode:switch_cases_oneline\fP +A specialization of \fBcode:switch_cases\fP for the case when the code +block consists of a single one\-line statement. If this is +\fB\fP, \fBcode:switch_cases\fP is used instead. +.TP +.B \fBcode:switch_case_range\fP +A single switch case that covers a range of values (possibly consisting +of a single value). Supported variable: \fBval\fP (a list variable that +expands to all values in the range). Supported conditionals: \fB\&.many\fP +(true if there\(aqs more than one value in the range) and +\fB\&.char_literals\fP (true if this is a switch on character literals \-\- +some languages provide special syntax for this case). +.TP +.B \fBcode:switch_case_default\fP +Default switch case. +.TP +.B \fBcode:loop\fP +A loop that runs forever (unless interrupted from the loop body). +Supported variables: \fBlabel\fP (loop label), \fBstmt\fP (a list variable +that expands to all statements in the loop body). +.TP +.B \fBcode:continue\fP +Continue statement. Supported variables: \fBlabel\fP (label from which to +continue execution). +.TP +.B \fBcode:goto\fP +Goto statement. Supported variables: \fBlabel\fP (label of the jump +target). +.TP +.B \fBcode:fndecl\fP +Function declaration. Supported variables: \fBname\fP (function name), +\fBtype\fP (return type), \fBarg\fP (a list variable that does not itself +produce code, but expands list expression as many times as there are +function arguments), \fBargname\fP (name of the current argument), +\fBargtype\fP (type of the current argument). Conditional: \fB\&.type\fP (true +if this is a non\-void function). +.TP +.B \fBcode:fndef\fP +Like \fBcode:fndecl\fP, but used for function definitions, so it has one +additional list variable \fBstmt\fP that expands to all statements in the +function body. +.TP +.B \fBcode:fncall\fP +Function call statement.
Supported variables: \fBname\fP (function name), +\fBretval\fP (l\-value where the return value is stored, if any) and +\fBarg\fP (a list variable that expands to all function arguments). +Conditionals: \fB\&.args\fP (true if the function has arguments) and +\fB\&.retval\fP (true if return value needs to be saved). +.TP +.B \fBcode:tailcall\fP +Tail call statement. Supported variables: \fBname\fP (function name), +and \fBarg\fP (a list variable that expands to all function arguments). +Conditionals: \fB\&.args\fP (true if the function has arguments) and +\fB\&.retval\fP (true if this is a non\-void function). +.TP +.B \fBcode:recursive_functions\fP +Program body with \fB\-\-recursive\-functions\fP code model. Supported +variables: \fBfn\fP (a list variable that does not itself produce any +code, but expands list expression as many times as there are functions), +\fBfndecl\fP (declaration of the current function) and \fBfndef\fP +(definition of the current function). +.TP +.B \fBcode:fingerprint\fP +The fingerprint at the top of the generated output file. Supported +variables: \fBver\fP (re2py version that was used to generate this) and +\fBdate\fP (generation date). +.TP +.B \fBcode:line_info\fP +The format of line directives (if this is set to \fB\fP, no +directives are generated). Supported variables: \fBline\fP (line number) +and \fBfile\fP (filename). +.TP +.B \fBcode:abort\fP +A statement that aborts program execution. +.TP +.B \fBcode:yydebug\fP +\fBYYDEBUG\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYDEBUG\fP, \fByyrecord\fP, \fByych\fP (map to the +corresponding \fBre2c:\fP configurations), \fBstate\fP (DFA state number). +.TP +.B \fBcode:yypeek\fP +\fBYYPEEK\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYPEEK\fP, \fBYYCTYPE\fP, \fBYYINPUT\fP, \fBYYCURSOR\fP, +\fByyrecord\fP, \fByych\fP (map to the corresponding \fBre2c:\fP +configurations). 
Conditionals: \fB\&.cast\fP (true if +\fBre2c:yych:conversion\fP is set to non\-zero). +.TP +.B \fBcode:yyskip\fP +\fBYYSKIP\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSKIP\fP, \fBYYCURSOR\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations). +.TP +.B \fBcode:yybackup\fP +\fBYYBACKUP\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYBACKUP\fP, \fBYYCURSOR\fP, \fBYYMARKER\fP, +\fByyrecord\fP (map to the corresponding \fBre2c:\fP configurations). +.TP +.B \fBcode:yybackupctx\fP +\fBYYBACKUPCTX\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYBACKUPCTX\fP, \fBYYCURSOR\fP, \fBYYCTXMARKER\fP, +\fByyrecord\fP (map to the corresponding \fBre2c:\fP configurations). +.TP +.B \fBcode:yyskip_yypeek\fP +Combined \fBcode:yyskip\fP and \fBcode:yypeek\fP statement (defaults to +\fBcode:yyskip\fP followed by \fBcode:yypeek\fP). +.TP +.B \fBcode:yypeek_yyskip\fP +Combined \fBcode:yypeek\fP and \fBcode:yyskip\fP statement (defaults to +\fBcode:yypeek\fP followed by \fBcode:yyskip\fP). +.TP +.B \fBcode:yyskip_yybackup\fP +Combined \fBcode:yyskip\fP and \fBcode:yybackup\fP statement (defaults to +\fBcode:yyskip\fP followed by \fBcode:yybackup\fP). +.TP +.B \fBcode:yybackup_yyskip\fP +Combined \fBcode:yybackup\fP and \fBcode:yyskip\fP statement (defaults to +\fBcode:yybackup\fP followed by \fBcode:yyskip\fP). +.TP +.B \fBcode:yybackup_yypeek\fP +Combined \fBcode:yybackup\fP and \fBcode:yypeek\fP statement (defaults to +\fBcode:yybackup\fP followed by \fBcode:yypeek\fP). +.TP +.B \fBcode:yyskip_yybackup_yypeek\fP +Combined \fBcode:yyskip\fP, \fBcode:yybackup\fP and \fBcode:yypeek\fP +statement (defaults to \fBcode:yyskip\fP followed by \fBcode:yybackup\fP +followed by \fBcode:yypeek\fP).
+.TP +.B \fBcode:yybackup_yypeek_yyskip\fP +Combined \fBcode:yybackup\fP, \fBcode:yypeek\fP and \fBcode:yyskip\fP +statement (defaults to \fBcode:yybackup\fP followed by \fBcode:yypeek\fP +followed by \fBcode:yyskip\fP). +.TP +.B \fBcode:yyrestore\fP +\fBYYRESTORE\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYRESTORE\fP, \fBYYCURSOR\fP, \fBYYMARKER\fP, +\fByyrecord\fP (map to the corresponding \fBre2c:\fP configurations). +.TP +.B \fBcode:yyrestorectx\fP +\fBYYRESTORECTX\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYRESTORECTX\fP, \fBYYCURSOR\fP, \fBYYCTXMARKER\fP, +\fByyrecord\fP (map to the corresponding \fBre2c:\fP configurations). +.TP +.B \fBcode:yyrestoretag\fP +\fBYYRESTORETAG\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYRESTORETAG\fP, \fBYYCURSOR\fP, \fByyrecord\fP (map +to the corresponding \fBre2c:\fP configurations), \fBtag\fP (the name of tag +variable used to restore position). +.TP +.B \fBcode:yyshift\fP +\fBYYSHIFT\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSHIFT\fP, \fBYYCURSOR\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBoffset\fP (the number of code +units to shift the current position). +.TP +.B \fBcode:yyshiftstag\fP +\fBYYSHIFTSTAG\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSHIFTSTAG\fP, \fByyrecord\fP, \fBnegative\fP (map +to the corresponding \fBre2c:\fP configurations), \fBtag\fP (tag variable +which needs to be shifted), \fBoffset\fP (the number of code units to +shift). Conditionals: \fB\&.nested\fP (true if this is a nested tag \-\- in +this case its value may be equal to \fBre2c:tags:negative\fP, which should +not be shifted). +.TP +.B \fBcode:yyshiftmtag\fP +\fBYYSHIFTMTAG\fP statement, possibly specialized for different APIs.
+Supported variables: \fBYYSHIFTMTAG\fP (maps to the corresponding +\fBre2c:\fP configuration), \fBtag\fP (tag variable which needs to be +shifted), \fBoffset\fP (the number of code units to shift). +.TP +.B \fBcode:yystagp\fP +\fBYYSTAGP\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSTAGP\fP, \fBYYCURSOR\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBtag\fP (tag variable that +should be updated). +.TP +.B \fBcode:yymtagp\fP +\fBYYMTAGP\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYMTAGP\fP (maps to the corresponding \fBre2c:\fP +configuration), \fBtag\fP (tag variable that should be updated). +.TP +.B \fBcode:yystagn\fP +\fBYYSTAGN\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSTAGN\fP, \fBnegative\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBtag\fP (tag variable that +should be updated). +.TP +.B \fBcode:yymtagn\fP +\fBYYMTAGN\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYMTAGN\fP (maps to the corresponding \fBre2c:\fP +configuration), \fBtag\fP (tag variable that should be updated). +.TP +.B \fBcode:yycopystag\fP +\fBYYCOPYSTAG\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYCOPYSTAG\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBlhs\fP, \fBrhs\fP (left and +right hand side tag variables of the copy operation). +.TP +.B \fBcode:yycopymtag\fP +\fBYYCOPYMTAG\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYCOPYMTAG\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBlhs\fP, \fBrhs\fP (left and +right hand side tag variables of the copy operation). +.TP +.B \fBcode:yygetaccept\fP +\fBYYGETACCEPT\fP statement, possibly specialized for different APIs.
+Supported variables: \fBYYGETACCEPT\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBvar\fP (maps to +\fBre2c:yyaccept\fP configuration). +.TP +.B \fBcode:yysetaccept\fP +\fBYYSETACCEPT\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSETACCEPT\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBvar\fP (maps to +\fBre2c:yyaccept\fP configuration) and \fBval\fP (numeric value of the +accepted rule). +.TP +.B \fBcode:yygetcond\fP +\fBYYGETCOND\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYGETCOND\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBvar\fP (maps to +\fBre2c:yycond\fP configuration). +.TP +.B \fBcode:yysetcond\fP +\fBYYSETCOND\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSETCOND\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBvar\fP (maps to +\fBre2c:yycond\fP configuration) and \fBval\fP (numeric condition +identifier). +.TP +.B \fBcode:yygetstate\fP +\fBYYGETSTATE\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYGETSTATE\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBvar\fP (maps to +\fBre2c:yystate\fP configuration). +.TP +.B \fBcode:yysetstate\fP +\fBYYSETSTATE\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSETSTATE\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBvar\fP (maps to +\fBre2c:yystate\fP configuration) and \fBval\fP (state number). +.TP +.B \fBcode:yylessthan\fP +\fBYYLESSTHAN\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYLESSTHAN\fP, \fBYYCURSOR\fP, \fBYYLIMIT\fP, +\fByyrecord\fP (map to the corresponding \fBre2c:\fP configurations), +\fBneed\fP (the number of code units to check against). Conditional: +\fB\&.many\fP (true if the \fBneed\fP is more than one). 
+.TP +.B \fBcode:yybm_filter\fP +Condition that is used to filter out \fByych\fP values that are not +covered by the \fByybm\fP table (used with \fB\-\-bitmaps\fP option). +Supported variable: \fByych\fP (maps to \fBre2c:yych\fP configuration). +.TP +.B \fBcode:yybm_match\fP +The format of \fByybm\fP table check (generated with \fB\-\-bitmaps\fP +option). Supported variables: \fByybm\fP, \fByych\fP (map to the +corresponding \fBre2c:\fP configurations), \fBoffset\fP (offset in the +\fByybm\fP table that needs to be added to \fByych\fP) and \fBmask\fP (bit +mask that should be applied to the table entry to retrieve the boolean +value that needs to be checked) +.UNINDENT +.UNINDENT +.UNINDENT +.UNINDENT .SH HANDLING THE END OF INPUT .sp One of the main problems for the lexer is to know when to stop. diff --git a/bootstrap/doc/re2rust.1 b/bootstrap/doc/re2rust.1 index b58817144..d91f0d971 100644 --- a/bootstrap/doc/re2rust.1 +++ b/bootstrap/doc/re2rust.1 @@ -1608,6 +1608,581 @@ raise a warning, and the user will be notified. If some configurations are unused and do not need a definition, they should be explicitly set to \fB\fP\&. .UNINDENT +.SS Syntax files +.sp +Support for different languages in re2c is based on the idea of \fIsyntax files\fP\&. +A syntax file is a configuration file that defines syntax of the target language +\-\- not the whole language, but a small part of it that is used by the generated +code. Syntax files make re2c very flexible, but they should not be used as a +replacement for configurations: their purpose is to define syntax of the target +language, not to customise one particular lexer. +Syntax files contain configurations of four different kinds: +.sp +\fBFeature lists\fP +.sp +\fBLanguage configurations\fP +.sp +\fBInplace configurations\fP +.sp +\fBCode templates\fP +.INDENT 0.0 +.INDENT 3.5 +\fICode templates\fP define syntax of the target language. 
They are written in a +simple domain\-specific language with the following formal grammar: +.INDENT 0.0 +.INDENT 3.5 +.sp +.nf +.ft C +code\-template :: + name \(aq=\(aq code\-exprs \(aq;\(aq + | CODE_TEMPLATE \(aq;\(aq + | \(aq\(aq \(aq;\(aq + +code\-exprs :: + + | code\-exprs code\-expr + +code\-expr :: + STRING + | VARIABLE + | optional + | list + +optional :: + \(aq(\(aq CONDITIONAL \(aq?\(aq code\-exprs \(aq)\(aq + | \(aq(\(aq CONDITIONAL \(aq?\(aq code\-exprs \(aq:\(aq code\-exprs \(aq)\(aq + +list :: + \(aq[\(aq VARIABLE \(aq:\(aq code\-exprs \(aq]\(aq + | \(aq[\(aq VARIABLE \(aq{\(aq NUMBER \(aq}\(aq \(aq:\(aq code\-exprs \(aq]\(aq + | \(aq[\(aq VARIABLE \(aq{\(aq NUMBER \(aq,\(aq NUMBER \(aq}\(aq \(aq:\(aq code\-exprs \(aq]\(aq +.ft P +.fi +.UNINDENT +.UNINDENT +.sp +A code template is a sequence of string literals, variables, optional elements +and lists, or a reference to another code template, or a special value +\fB\fP\&. Variables are placeholders that are substituted during the code +generation phase. List variables are special: when expanding list templates, +re2rust repeats the expressions on the right\-hand side of the colon a few times, each +time replacing occurrences of the list variable with a value specific to this +repetition. Lists have optional bounds (negative values are counted from the +end, e.g. \fB\-1\fP means the last element). Conditional names start with a dot. +Both conditionals and variables may be either local (specific to the given +code template) or global (allowed in all code templates). When re2rust reads +a syntax file, it checks that each code template uses only the variables and +conditionals that are allowed in it. +.sp +For example, the following code template defines an if\-then\-else construct for a +C\-like language: +.INDENT 0.0 +.INDENT 3.5 +.sp +.nf +.ft C +code:if_then_else = + [branch{0}: topindent \(dqif \(dq cond \(dq {\(dq nl + indent [stmt: stmt] dedent] + [branch{1:\-1}: topindent \(dq} else\(dq (.cond ?
\(dq if \(dq cond) \(dq {\(dq nl + indent [stmt: stmt] dedent] + topindent \(dq}\(dq nl; +.ft P +.fi +.UNINDENT +.UNINDENT +.sp +Here \fBbranch\fP is a list variable. \fBbranch{0}\fP expands to the first branch +\-\- it has to be special, as there is no \fBelse\fP part. \fBbranch{1:\-1}\fP +expands to the remaining branches, if any. \fBtopindent\fP, \fBindent\fP, +\fBdedent\fP and \fBnl\fP are global variables (see below). \fB[stmt: stmt]\fP is a +nested list that expands to the list of statements in the current branch +(\fBstmt\fP is a list variable). Local conditional \fB\&.cond\fP is true if the +current branch has a condition. +This code template could produce the following code: +.INDENT 0.0 +.INDENT 3.5 +.sp +.nf +.ft C +if x { + // do something +} else if y { + // do something else +} else { + // don\(aqt do anything +} +.ft P +.fi +.UNINDENT +.UNINDENT +.sp +Here\(aqs a list of all global variables: +.INDENT 0.0 +.INDENT 3.5 +.INDENT 0.0 +.TP +.B \fBnl\fP +A newline. +.TP +.B \fBindent\fP +A variable that does not produce any code, but has a side\-effect of +increasing indentation level. +.TP +.B \fBdedent\fP +A variable that does not produce any code, but has a side\-effect of +decreasing indentation level. +.TP +.B \fBtopindent\fP +Indent string for the current statement (indent level is tracked and +updated by the code generator). +.UNINDENT +.UNINDENT +.UNINDENT +.sp +Here\(aqs a list of all global conditionals: +.INDENT 0.0 +.INDENT 3.5 +.INDENT 0.0 +.TP +.B \fB\&.api.simple\fP +True if simple API is used (\fB\-\-api simple\fP or \fBre2c:api = simple\fP). +.TP +.B \fB\&.api.generic\fP +True if generic API is used (\fB\-\-api generic\fP or +\fBre2c:api = generic\fP). +.TP +.B \fB\&.api.record\fP +True if record API is used (\fB\-\-api record\fP or \fBre2c:api = record\fP). +.TP +.B \fB\&.api_style.functions\fP +True if function\-like API style is used +(\fBre2c:api\-style = functions\fP).
+.TP +.B \fB\&.api_style.freeform\fP +True if free\-form API style is used (\fBre2c:api\-style = free\-form\fP). +.TP +.B \fB\&.case_ranges\fP +True if case ranges feature is enabled (\fB\-\-case\-ranges\fP or +\fBre2c:case\-ranges = 1\fP). +.TP +.B \fB\&.code_model.goto_label\fP +True if code model based on goto/label is used (\fB\-\-goto\-label\fP). +.TP +.B \fB\&.code_model.loop_switch\fP +True if code model based on loop/switch is used (\fB\-\-loop\-switch\fP). +.TP +.B \fB\&.code_model.recursive_functions\fP +True if code model based on recursive functions is used +(\fB\-\-recursive\-functions\fP). +.TP +.B \fB\&.date\fP +True if the generated fingerprint should contain generation date. +.TP +.B \fB\&.loop_label\fP +True if re2rust generated loops must have a label (\fBre2c:label:yyloop\fP +is set to a nonempty string). +.TP +.B \fB\&.monadic\fP +True if the generated code should be monadic (\fBre2c:monadic = 1\fP). +This is only relevant for pure functional languages. +.TP +.B \fB\&.start_conditions\fP +True if start conditions are enabled (\fB\-\-start\-conditions\fP). +.TP +.B \fB\&.storable_state\fP +True if storable state is enabled (\fB\-\-storable\-state\fP). +.TP +.B \fB\&.unsafe\fP +True if re2rust should use \(dqunsafe\(dq blocks in order to generate faster +code (\fB\-\-unsafe\fP, \fBre2c:unsafe = 1\fP). This is only relevant for +languages that have an \(dqunsafe\(dq feature. +.TP +.B \fB\&.version\fP +True if the generated fingerprint should contain re2rust version. +.UNINDENT +.UNINDENT +.UNINDENT +.sp +Below is a full list of code templates supported by re2rust with their local +variables and conditionals (a definition does not have to use all of them). +Any unused code templates should be set to \fB\fP\&. +.INDENT 0.0 +.TP +.B \fBcode:var_local\fP +.INDENT 7.0 +.INDENT 3.5 +Declaration or definition of a local variable. Supported variables: +\fBtype\fP (the type of the variable), \fBname\fP (its name) and \fBinit\fP +(initial value, if any).
Conditionals: \fB\&.init\fP (true if there is an +initializer). +.UNINDENT +.UNINDENT +.INDENT 7.0 +.TP +.B \fBcode:var_global\fP +Same as \fBcode:var_local\fP, except that it\(aqs used in top\-level. +.TP +.B \fBcode:const_local\fP +Definition of a local constant. Supported variables: \fBtype\fP (the type +of the constant), \fBname\fP (its name) and \fBinit\fP (initial value). +.TP +.B \fBcode:const_global\fP +Same as \fBcode:const_local\fP, except that it\(aqs used in top\-level. +.TP +.B \fBcode:array_local\fP +Definition of a local array (table). Supported variables: \fBtype\fP (the +type of array elements), \fBname\fP (array name), \fBsize\fP (its size), +\fBrow\fP (a list variable that does not itself produce any code, but +expands list expression as many times as there are rows in the table) +and \fBelem\fP (a list variable that expands to all table elements in the +current row \-\- it\(aqs meant to be nested in the \fBrow\fP list). +.TP +.B \fBcode:array_global\fP +Same as \fBcode:array_local\fP, except that it\(aqs used in top\-level. +.TP +.B \fBcode:array_elem\fP +Reference to an element of an array (table). Supported variables: +\fBarray\fP (the name of the array) and \fBindex\fP (index of the element). +.TP +.B \fBcode:enum\fP +Definition of an enumeration (it may be defined using a special language +construct for enumerations, or simply as a few standalone constants). +Supported variables are \fBtype\fP (user\-defined enumeration type or type +of the constants), \fBelem\fP (list variable that expands to the name of +each member) and \fBinit\fP (initializer for each member). Conditionals: +\fB\&.init\fP (true if there is an initializer). +.TP +.B \fBcode:enum_elem\fP +Enumeration element (a member of a user\-defined enumeration type or a +name of a constant, depending on how \fBcode:enum\fP is defined). +Supported variables are \fBname\fP (the name of the element) and \fBtype\fP +(its type). +.TP +.B \fBcode:assign\fP +Assignment statement. 
Supported variables are \fBlhs\fP (left hand side) +and \fBrhs\fP (right hand side). +.TP +.B \fBcode:type_int\fP +Signed integer type. +.TP +.B \fBcode:type_uint\fP +Unsigned integer type. +.TP +.B \fBcode:type_yybm\fP +Type of elements in the \fByybm\fP table. +.TP +.B \fBcode:type_yytarget\fP +Type of elements in the \fByytarget\fP table. +.TP +.B \fBcode:cmp_eq\fP +Operator \(dqequals\(dq. +.TP +.B \fBcode:cmp_ne\fP +Operator \(dqnot equals\(dq. +.TP +.B \fBcode:cmp_lt\fP +Operator \(dqless than\(dq. +.TP +.B \fBcode:cmp_gt\fP +Operator \(dqgreater than\(dq. +.TP +.B \fBcode:cmp_le\fP +Operator \(dqless or equal\(dq. +.TP +.B \fBcode:cmp_ge\fP +Operator \(dqgreater or equal\(dq. +.TP +.B \fBcode:if_then_else\fP +If\-then\-else statement with one or more branches. Supported variables: +\fBbranch\fP (a list variable that does not itself produce any code, but +expands the list expression as many times as there are branches), \fBcond\fP +(condition of the current branch) and \fBstmt\fP (a list variable that +expands to all statements in the current branch). Conditionals: +\fB\&.cond\fP (true if the current branch has a condition), \fB\&.many\fP (true +if there\(aqs more than one branch). +.TP +.B \fBcode:if_then_else_oneline\fP +A specialization of \fBcode:if_then_else\fP for the case when all branches +have one\-line statements. If this is \fB\fP, +\fBcode:if_then_else\fP is used instead. +.TP +.B \fBcode:switch\fP +A switch statement with one or more cases. Supported variables: \fBexpr\fP +(the switched\-on expression) and \fBcase\fP (a list variable that expands +to all case groups with their code blocks). +.TP +.B \fBcode:switch_cases\fP +A group of switch cases that maps to a single code block. Supported +variables are \fBcase\fP (a list variable that expands to all cases in +this group) and \fBstmt\fP (a list variable that expands to all statements +in the code block).
+.TP +.B \fBcode:switch_cases_oneline\fP +A specialization of \fBcode:switch_cases\fP for the case when the code +block consists of a single one\-line statement. If this is +\fB\fP, \fBcode:switch_cases\fP is used instead. +.TP +.B \fBcode:switch_case_range\fP +A single switch case that covers a range of values (possibly consisting +of a single value). Supported variable: \fBval\fP (a list variable that +expands to all values in the range). Supported conditionals: \fB\&.many\fP +(true if there\(aqs more than one value in the range) and +\fB\&.char_literals\fP (true if this is a switch on character literals \-\- +some languages provide special syntax for this case). +.TP +.B \fBcode:switch_case_default\fP +Default switch case. +.TP +.B \fBcode:loop\fP +A loop that runs forever (unless interrupted from the loop body). +Supported variables: \fBlabel\fP (loop label), \fBstmt\fP (a list variable +that expands to all statements in the loop body). +.TP +.B \fBcode:continue\fP +Continue statement. Supported variables: \fBlabel\fP (label from which to +continue execution). +.TP +.B \fBcode:goto\fP +Goto statement. Supported variables: \fBlabel\fP (label of the jump +target). +.TP +.B \fBcode:fndecl\fP +Function declaration. Supported variables: \fBname\fP (function name), +\fBtype\fP (return type), \fBarg\fP (a list variable that does not itself +produce code, but expands the list expression as many times as there are +function arguments), \fBargname\fP (name of the current argument), +\fBargtype\fP (type of the current argument). Conditional: \fB\&.type\fP (true +if this is a non\-void function). +.TP +.B \fBcode:fndef\fP +Like \fBcode:fndecl\fP, but used for function definitions, so it has one +additional list variable \fBstmt\fP that expands to all statements in the +function body. +.TP +.B \fBcode:fncall\fP +Function call statement.
Supported variables: \fBname\fP (function name), +\fBretval\fP (l\-value where the return value is stored, if any) and +\fBarg\fP (a list variable that expands to all function arguments). +Conditionals: \fB\&.args\fP (true if the function has arguments) and +\fB\&.retval\fP (true if return value needs to be saved). +.TP +.B \fBcode:tailcall\fP +Tail call statement. Supported variables: \fBname\fP (function name), +and \fBarg\fP (a list variable that expands to all function arguments). +Conditionals: \fB\&.args\fP (true if the function has arguments) and +\fB\&.retval\fP (true if this is a non\-void function). +.TP +.B \fBcode:recursive_functions\fP +Program body with \fB\-\-recursive\-functions\fP code model. Supported +variables: \fBfn\fP (a list variable that does not itself produce any +code, but expands list expression as many times as there are functions), +\fBfndecl\fP (declaration of the current function) and \fBfndef\fP +(definition of the current function). +.TP +.B \fBcode:fingerprint\fP +The fingerprint at the top of the generated output file. Supported +variables: \fBver\fP (re2rust version that was used to generate this) and +\fBdate\fP (generation date). +.TP +.B \fBcode:line_info\fP +The format of line directives (if this is set to \fB\fP, no +directives are generated). Supported variables: \fBline\fP (line number) +and \fBfile\fP (filename). +.TP +.B \fBcode:abort\fP +A statement that aborts program execution. +.TP +.B \fBcode:yydebug\fP +\fBYYDEBUG\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYDEBUG\fP, \fByyrecord\fP, \fByych\fP (map to the +corresponding \fBre2c:\fP configurations), \fBstate\fP (DFA state number). +.TP +.B \fBcode:yypeek\fP +\fBYYPEEK\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYPEEK\fP, \fBYYCTYPE\fP, \fBYYINPUT\fP, \fBYYCURSOR\fP, +\fByyrecord\fP, \fByych\fP (map to the corresponding \fBre2c:\fP +configurations). 
Conditionals: \fB\&.cast\fP (true if +\fBre2c:yych:conversion\fP is set to non\-zero). +.TP +.B \fBcode:yyskip\fP +\fBYYSKIP\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSKIP\fP, \fBYYCURSOR\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations). +.TP +.B \fBcode:yybackup\fP +\fBYYBACKUP\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYBACKUP\fP, \fBYYCURSOR\fP, \fBYYMARKER\fP, +\fByyrecord\fP (map to the corresponding \fBre2c:\fP configurations). +.TP +.B \fBcode:yybackupctx\fP +\fBYYBACKUPCTX\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYBACKUPCTX\fP, \fBYYCURSOR\fP, \fBYYCTXMARKER\fP, +\fByyrecord\fP (map to the corresponding \fBre2c:\fP configurations). +.TP +.B \fBcode:yyskip_yypeek\fP +Combined \fBcode:yyskip\fP and \fBcode:yypeek\fP statement (defaults to +\fBcode:yyskip\fP followed by \fBcode:yypeek\fP). +.TP +.B \fBcode:yypeek_yyskip\fP +Combined \fBcode:yypeek\fP and \fBcode:yyskip\fP statement (defaults to +\fBcode:yypeek\fP followed by \fBcode:yyskip\fP). +.TP +.B \fBcode:yyskip_yybackup\fP +Combined \fBcode:yyskip\fP and \fBcode:yybackup\fP statement (defaults to +\fBcode:yyskip\fP followed by \fBcode:yybackup\fP). +.TP +.B \fBcode:yybackup_yyskip\fP +Combined \fBcode:yybackup\fP and \fBcode:yyskip\fP statement (defaults to +\fBcode:yybackup\fP followed by \fBcode:yyskip\fP). +.TP +.B \fBcode:yybackup_yypeek\fP +Combined \fBcode:yybackup\fP and \fBcode:yypeek\fP statement (defaults to +\fBcode:yybackup\fP followed by \fBcode:yypeek\fP). +.TP +.B \fBcode:yyskip_yybackup_yypeek\fP +Combined \fBcode:yyskip\fP, \fBcode:yybackup\fP and \fBcode:yypeek\fP +statement (defaults to \fBcode:yyskip\fP followed by \fBcode:yybackup\fP +followed by \fBcode:yypeek\fP).
+.TP +.B \fBcode:yybackup_yypeek_yyskip\fP +Combined \fBcode:yybackup\fP, \fBcode:yypeek\fP and \fBcode:yyskip\fP +statement (defaults to \fBcode:yybackup\fP followed by \fBcode:yypeek\fP +followed by \fBcode:yyskip\fP). +.TP +.B \fBcode:yyrestore\fP +\fBYYRESTORE\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYRESTORE\fP, \fBYYCURSOR\fP, \fBYYMARKER\fP, +\fByyrecord\fP (map to the corresponding \fBre2c:\fP configurations). +.TP +.B \fBcode:yyrestorectx\fP +\fBYYRESTORECTX\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYRESTORECTX\fP, \fBYYCURSOR\fP, \fBYYCTXMARKER\fP, +\fByyrecord\fP (map to the corresponding \fBre2c:\fP configurations). +.TP +.B \fBcode:yyrestoretag\fP +\fBYYRESTORETAG\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYRESTORETAG\fP, \fBYYCURSOR\fP, \fByyrecord\fP (map +to the corresponding \fBre2c:\fP configurations), \fBtag\fP (the name of the tag +variable used to restore position). +.TP +.B \fBcode:yyshift\fP +\fBYYSHIFT\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSHIFT\fP, \fBYYCURSOR\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBoffset\fP (the number of code +units to shift the current position). +.TP +.B \fBcode:yyshiftstag\fP +\fBYYSHIFTSTAG\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSHIFTSTAG\fP, \fByyrecord\fP, \fBnegative\fP (map +to the corresponding \fBre2c:\fP configurations), \fBtag\fP (tag variable +which needs to be shifted), \fBoffset\fP (the number of code units to +shift). Conditionals: \fB\&.nested\fP (true if this is a nested tag \-\- in +this case its value may be equal to \fBre2c:tags:negative\fP, which should +not be shifted). +.TP +.B \fBcode:yyshiftmtag\fP +\fBYYSHIFTMTAG\fP statement, possibly specialized for different APIs.
+Supported variables: \fBYYSHIFTMTAG\fP (maps to the corresponding +\fBre2c:\fP configuration), \fBtag\fP (tag variable which needs to be +shifted), \fBoffset\fP (the number of code units to shift). +.TP +.B \fBcode:yystagp\fP +\fBYYSTAGP\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSTAGP\fP, \fBYYCURSOR\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBtag\fP (tag variable that +should be updated). +.TP +.B \fBcode:yymtagp\fP +\fBYYMTAGP\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYMTAGP\fP (maps to the corresponding \fBre2c:\fP +configuration), \fBtag\fP (tag variable that should be updated). +.TP +.B \fBcode:yystagn\fP +\fBYYSTAGN\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSTAGN\fP, \fBnegative\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBtag\fP (tag variable that +should be updated). +.TP +.B \fBcode:yymtagn\fP +\fBYYMTAGN\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYMTAGN\fP (maps to the corresponding \fBre2c:\fP +configuration), \fBtag\fP (tag variable that should be updated). +.TP +.B \fBcode:yycopystag\fP +\fBYYCOPYSTAG\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYCOPYSTAG\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBlhs\fP, \fBrhs\fP (left and +right hand side tag variables of the copy operation). +.TP +.B \fBcode:yycopymtag\fP +\fBYYCOPYMTAG\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYCOPYMTAG\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBlhs\fP, \fBrhs\fP (left and +right hand side tag variables of the copy operation). +.TP +.B \fBcode:yygetaccept\fP +\fBYYGETACCEPT\fP statement, possibly specialized for different APIs.
+Supported variables: \fBYYGETACCEPT\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBvar\fP (maps to +\fBre2c:yyaccept\fP configuration). +.TP +.B \fBcode:yysetaccept\fP +\fBYYSETACCEPT\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSETACCEPT\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBvar\fP (maps to +\fBre2c:yyaccept\fP configuration) and \fBval\fP (numeric value of the +accepted rule). +.TP +.B \fBcode:yygetcond\fP +\fBYYGETCOND\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYGETCOND\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBvar\fP (maps to +\fBre2c:yycond\fP configuration). +.TP +.B \fBcode:yysetcond\fP +\fBYYSETCOND\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSETCOND\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBvar\fP (maps to +\fBre2c:yycond\fP configuration) and \fBval\fP (numeric condition +identifier). +.TP +.B \fBcode:yygetstate\fP +\fBYYGETSTATE\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYGETSTATE\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBvar\fP (maps to +\fBre2c:yystate\fP configuration). +.TP +.B \fBcode:yysetstate\fP +\fBYYSETSTATE\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSETSTATE\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBvar\fP (maps to +\fBre2c:yystate\fP configuration) and \fBval\fP (state number). +.TP +.B \fBcode:yylessthan\fP +\fBYYLESSTHAN\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYLESSTHAN\fP, \fBYYCURSOR\fP, \fBYYLIMIT\fP, +\fByyrecord\fP (map to the corresponding \fBre2c:\fP configurations), +\fBneed\fP (the number of code units to check against). Conditional: +\fB\&.many\fP (true if the \fBneed\fP is more than one). 
+.TP +.B \fBcode:yybm_filter\fP +Condition that is used to filter out \fByych\fP values that are not +covered by the \fByybm\fP table (used with \fB\-\-bitmaps\fP option). +Supported variable: \fByych\fP (maps to \fBre2c:yych\fP configuration). +.TP +.B \fBcode:yybm_match\fP +The format of \fByybm\fP table check (generated with \fB\-\-bitmaps\fP +option). Supported variables: \fByybm\fP, \fByych\fP (map to the +corresponding \fBre2c:\fP configurations), \fBoffset\fP (offset in the +\fByybm\fP table that needs to be added to \fByych\fP) and \fBmask\fP (bit +mask that should be applied to the table entry to retrieve the boolean +value that needs to be checked) +.UNINDENT +.UNINDENT +.UNINDENT +.UNINDENT .SH HANDLING THE END OF INPUT .sp One of the main problems for the lexer is to know when to stop. diff --git a/bootstrap/doc/re2v.1 b/bootstrap/doc/re2v.1 index 3fbb44d0f..42e84e086 100644 --- a/bootstrap/doc/re2v.1 +++ b/bootstrap/doc/re2v.1 @@ -1556,6 +1556,581 @@ raise a warning, and the user will be notified. If some configurations are unused and do not need a definition, they should be explicitly set to \fB\fP\&. .UNINDENT +.SS Syntax files +.sp +Support for different languages in re2c is based on the idea of \fIsyntax files\fP\&. +A syntax file is a configuration file that defines syntax of the target language +\-\- not the whole language, but a small part of it that is used by the generated +code. Syntax files make re2c very flexible, but they should not be used as a +replacement for configurations: their purpose is to define syntax of the target +language, not to customise one particular lexer. +Syntax files contain configurations of four different kinds: +.sp +\fBFeature lists\fP +.sp +\fBLanguage configurations\fP +.sp +\fBInplace configurations\fP +.sp +\fBCode templates\fP +.INDENT 0.0 +.INDENT 3.5 +\fICode templates\fP define syntax of the target language. 
They are written in a +simple domain\-specific language with the following formal grammar: +.INDENT 0.0 +.INDENT 3.5 +.sp +.nf +.ft C +code\-template :: + name \(aq=\(aq code\-exprs \(aq;\(aq + | CODE_TEMPLATE \(aq;\(aq + | \(aq\(aq \(aq;\(aq + +code\-exprs :: + + | code\-exprs code\-expr + +code\-expr :: + STRING + | VARIABLE + | optional + | list + +optional :: + \(aq(\(aq CONDITIONAL \(aq?\(aq code\-exprs \(aq)\(aq + | \(aq(\(aq CONDITIONAL \(aq?\(aq code\-exprs \(aq:\(aq code\-exprs \(aq)\(aq + +list :: + \(aq[\(aq VARIABLE \(aq:\(aq code\-exprs \(aq]\(aq + | \(aq[\(aq VARIABLE \(aq{\(aq NUMBER \(aq}\(aq \(aq:\(aq code\-exprs \(aq]\(aq + | \(aq[\(aq VARIABLE \(aq{\(aq NUMBER \(aq,\(aq NUMBER \(aq}\(aq \(aq:\(aq code\-exprs \(aq]\(aq +.ft P +.fi +.UNINDENT +.UNINDENT +.sp +A code template is a sequence of string literals, variables, optional elements +and lists, or a reference to another code template, or a special value +\fB\fP\&. Variables are placeholders that are substituted during the code +generation phase. List variables are special: when expanding list templates, +re2v repeats the expressions on the right hand side of the colon a few times, each +time replacing occurrences of the list variable with a value specific to this +repetition. Lists have optional bounds (negative values are counted from the +end, e.g. \fB\-1\fP means the last element). Conditional names start with a dot. +Both conditionals and variables may be either local (specific to the given +code template) or global (allowed in all code templates). When re2v reads +a syntax file, it checks that each code template uses only the variables and +conditionals that are allowed in it. +.sp +For example, the following code template defines an if\-then\-else construct for a +C\-like language: +.INDENT 0.0 +.INDENT 3.5 +.sp +.nf +.ft C +code:if_then_else = + [branch{0}: topindent \(dqif \(dq cond \(dq {\(dq nl + indent [stmt: stmt] dedent] + [branch{1:\-1}: topindent \(dq} else\(dq (.cond ?
\(dq if \(dq cond) \(dq {\(dq nl + indent [stmt: stmt] dedent] + topindent \(dq}\(dq nl; +.ft P +.fi +.UNINDENT +.UNINDENT +.sp +Here \fBbranch\fP is a list variable. \fBbranch{0}\fP expands to the first branch +\-\- it has to be special, as there is no \fBelse\fP part. \fBbranch{1:\-1}\fP +expands to the remaining branches, if any. \fBtopindent\fP, \fBindent\fP, +\fBdedent\fP and \fBnl\fP are global variables (see below). \fB[stmt: stmt]\fP is a +nested list that expands to the list of statements in the current branch +(\fBstmt\fP is a list variable). Local conditional \fB\&.cond\fP is true if the +current branch has a condition. +This code template could produce the following code: +.INDENT 0.0 +.INDENT 3.5 +.sp +.nf +.ft C +if x { + // do something +} else if y { + // do something else +} else { + // don\(aqt do anything +} +.ft P +.fi +.UNINDENT +.UNINDENT +.sp +Here\(aqs a list of all global variables: +.INDENT 0.0 +.INDENT 3.5 +.INDENT 0.0 +.TP +.B \fBnl\fP +A newline. +.TP +.B \fBindent\fP +A variable that does not produce any code, but has a side\-effect of +increasing indentation level. +.TP +.B \fBdedent\fP +A variable that does not produce any code, but has a side\-effect of +decreasing indentation level. +.TP +.B \fBtopindent\fP +Indent string for the current statement (indent level is tracked and +updated by the code generator). +.UNINDENT +.UNINDENT +.UNINDENT +.sp +Here\(aqs a list of all global conditionals: +.INDENT 0.0 +.INDENT 3.5 +.INDENT 0.0 +.TP +.B \fB\&.api.simple\fP +True if simple API is used (\fB\-\-api simple\fP or \fBre2c:api = simple\fP). +.TP +.B \fB\&.api.generic\fP +True if generic API is used (\fB\-\-api generic\fP or +\fBre2c:api = generic\fP). +.TP +.B \fB\&.api.record\fP +True if record API is used (\fB\-\-api record\fP or \fBre2c:api = record\fP). +.TP +.B \fB\&.api_style.functions\fP +True if function\-like API style is used +(\fBre2c:api\-style = functions\fP).
+.TP +.B \fB\&.api_style.freeform\fP +True if free\-form API style is used (\fBre2c:api\-style = free\-form\fP). +.TP +.B \fB\&.case_ranges\fP +True if the case ranges feature is enabled (\fB\-\-case\-ranges\fP or +\fBre2c:case\-ranges = 1\fP). +.TP +.B \fB\&.code_model.goto_label\fP +True if code model based on goto/label is used (\fB\-\-goto\-label\fP). +.TP +.B \fB\&.code_model.loop_switch\fP +True if code model based on loop/switch is used (\fB\-\-loop\-switch\fP). +.TP +.B \fB\&.code_model.recursive_functions\fP +True if code model based on recursive functions is used +(\fB\-\-recursive\-functions\fP). +.TP +.B \fB\&.date\fP +True if the generated fingerprint should contain the generation date. +.TP +.B \fB\&.loop_label\fP +True if re2v generated loops must have a label (\fBre2c:label:yyloop\fP +is set to a nonempty string). +.TP +.B \fB\&.monadic\fP +True if the generated code should be monadic (\fBre2c:monadic = 1\fP). +This is only relevant for pure functional languages. +.TP +.B \fB\&.start_conditions\fP +True if start conditions are enabled (\fB\-\-start\-conditions\fP). +.TP +.B \fB\&.storable_state\fP +True if storable state is enabled (\fB\-\-storable\-state\fP). +.TP +.B \fB\&.unsafe\fP +True if re2v should use \(dqunsafe\(dq blocks in order to generate faster +code (\fB\-\-unsafe\fP, \fBre2c:unsafe = 1\fP). This is only relevant for +languages that have an \(dqunsafe\(dq feature. +.TP +.B \fB\&.version\fP +True if the generated fingerprint should contain the re2v version. +.UNINDENT +.UNINDENT +.UNINDENT +.sp +Below is a full list of code templates supported by re2v with their local +variables and conditionals (a definition does not have to use all of them). +Any unused code templates should be set to \fB\fP\&. +.INDENT 0.0 +.TP +.B \fBcode:var_local\fP +.INDENT 7.0 +.INDENT 3.5 +Declaration or definition of a local variable. Supported variables: +\fBtype\fP (the type of the variable), \fBname\fP (its name) and \fBinit\fP +(initial value, if any).
Conditionals: \fB\&.init\fP (true if there is an +initializer). +.UNINDENT +.UNINDENT +.INDENT 7.0 +.TP +.B \fBcode:var_global\fP +Same as \fBcode:var_local\fP, except that it\(aqs used in top\-level. +.TP +.B \fBcode:const_local\fP +Definition of a local constant. Supported variables: \fBtype\fP (the type +of the constant), \fBname\fP (its name) and \fBinit\fP (initial value). +.TP +.B \fBcode:const_global\fP +Same as \fBcode:const_local\fP, except that it\(aqs used in top\-level. +.TP +.B \fBcode:array_local\fP +Definition of a local array (table). Supported variables: \fBtype\fP (the +type of array elements), \fBname\fP (array name), \fBsize\fP (its size), +\fBrow\fP (a list variable that does not itself produce any code, but +expands list expression as many times as there are rows in the table) +and \fBelem\fP (a list variable that expands to all table elements in the +current row \-\- it\(aqs meant to be nested in the \fBrow\fP list). +.TP +.B \fBcode:array_global\fP +Same as \fBcode:array_local\fP, except that it\(aqs used in top\-level. +.TP +.B \fBcode:array_elem\fP +Reference to an element of an array (table). Supported variables: +\fBarray\fP (the name of the array) and \fBindex\fP (index of the element). +.TP +.B \fBcode:enum\fP +Definition of an enumeration (it may be defined using a special language +construct for enumerations, or simply as a few standalone constants). +Supported variables are \fBtype\fP (user\-defined enumeration type or type +of the constants), \fBelem\fP (list variable that expands to the name of +each member) and \fBinit\fP (initializer for each member). Conditionals: +\fB\&.init\fP (true if there is an initializer). +.TP +.B \fBcode:enum_elem\fP +Enumeration element (a member of a user\-defined enumeration type or a +name of a constant, depending on how \fBcode:enum\fP is defined). +Supported variables are \fBname\fP (the name of the element) and \fBtype\fP +(its type). +.TP +.B \fBcode:assign\fP +Assignment statement. 
Supported variables are \fBlhs\fP (left hand side) +and \fBrhs\fP (right hand side). +.TP +.B \fBcode:type_int\fP +Signed integer type. +.TP +.B \fBcode:type_uint\fP +Unsigned integer type. +.TP +.B \fBcode:type_yybm\fP +Type of elements in the \fByybm\fP table. +.TP +.B \fBcode:type_yytarget\fP +Type of elements in the \fByytarget\fP table. +.TP +.B \fBcode:cmp_eq\fP +Operator \(dqequals\(dq. +.TP +.B \fBcode:cmp_ne\fP +Operator \(dqnot equals\(dq. +.TP +.B \fBcode:cmp_lt\fP +Operator \(dqless than\(dq. +.TP +.B \fBcode:cmp_gt\fP +Operator \(dqgreater than\(dq. +.TP +.B \fBcode:cmp_le\fP +Operator \(dqless or equal\(dq. +.TP +.B \fBcode:cmp_ge\fP +Operator \(dqgreater or equal\(dq. +.TP +.B \fBcode:if_then_else\fP +If\-then\-else statement with one or more branches. Supported variables: +\fBbranch\fP (a list variable that does not itself produce any code, but +expands the list expression as many times as there are branches), \fBcond\fP +(condition of the current branch) and \fBstmt\fP (a list variable that +expands to all statements in the current branch). Conditionals: +\fB\&.cond\fP (true if the current branch has a condition), \fB\&.many\fP (true +if there\(aqs more than one branch). +.TP +.B \fBcode:if_then_else_oneline\fP +A specialization of \fBcode:if_then_else\fP for the case when all branches +have one\-line statements. If this is \fB\fP, +\fBcode:if_then_else\fP is used instead. +.TP +.B \fBcode:switch\fP +A switch statement with one or more cases. Supported variables: \fBexpr\fP +(the switched\-on expression) and \fBcase\fP (a list variable that expands +to all case groups with their code blocks). +.TP +.B \fBcode:switch_cases\fP +A group of switch cases that maps to a single code block. Supported +variables are \fBcase\fP (a list variable that expands to all cases in +this group) and \fBstmt\fP (a list variable that expands to all statements +in the code block).
+.TP +.B \fBcode:switch_cases_oneline\fP +A specialization of \fBcode:switch_cases\fP for the case when the code +block consists of a single one\-line statement. If this is +\fB\fP, \fBcode:switch_cases\fP is used instead. +.TP +.B \fBcode:switch_case_range\fP +A single switch case that covers a range of values (possibly consisting +of a single value). Supported variable: \fBval\fP (a list variable that +expands to all values in the range). Supported conditionals: \fB\&.many\fP +(true if there\(aqs more than one value in the range) and +\fB\&.char_literals\fP (true if this is a switch on character literals \-\- +some languages provide special syntax for this case). +.TP +.B \fBcode:switch_case_default\fP +Default switch case. +.TP +.B \fBcode:loop\fP +A loop that runs forever (unless interrupted from the loop body). +Supported variables: \fBlabel\fP (loop label), \fBstmt\fP (a list variable +that expands to all statements in the loop body). +.TP +.B \fBcode:continue\fP +Continue statement. Supported variables: \fBlabel\fP (label from which to +continue execution). +.TP +.B \fBcode:goto\fP +Goto statement. Supported variables: \fBlabel\fP (label of the jump +target). +.TP +.B \fBcode:fndecl\fP +Function declaration. Supported variables: \fBname\fP (function name), +\fBtype\fP (return type), \fBarg\fP (a list variable that does not itself +produce code, but expands the list expression as many times as there are +function arguments), \fBargname\fP (name of the current argument), +\fBargtype\fP (type of the current argument). Conditional: \fB\&.type\fP (true +if this is a non\-void function). +.TP +.B \fBcode:fndef\fP +Like \fBcode:fndecl\fP, but used for function definitions, so it has one +additional list variable \fBstmt\fP that expands to all statements in the +function body. +.TP +.B \fBcode:fncall\fP +Function call statement.
Supported variables: \fBname\fP (function name), +\fBretval\fP (l\-value where the return value is stored, if any) and +\fBarg\fP (a list variable that expands to all function arguments). +Conditionals: \fB\&.args\fP (true if the function has arguments) and +\fB\&.retval\fP (true if return value needs to be saved). +.TP +.B \fBcode:tailcall\fP +Tail call statement. Supported variables: \fBname\fP (function name), +and \fBarg\fP (a list variable that expands to all function arguments). +Conditionals: \fB\&.args\fP (true if the function has arguments) and +\fB\&.retval\fP (true if this is a non\-void function). +.TP +.B \fBcode:recursive_functions\fP +Program body with \fB\-\-recursive\-functions\fP code model. Supported +variables: \fBfn\fP (a list variable that does not itself produce any +code, but expands list expression as many times as there are functions), +\fBfndecl\fP (declaration of the current function) and \fBfndef\fP +(definition of the current function). +.TP +.B \fBcode:fingerprint\fP +The fingerprint at the top of the generated output file. Supported +variables: \fBver\fP (re2v version that was used to generate this) and +\fBdate\fP (generation date). +.TP +.B \fBcode:line_info\fP +The format of line directives (if this is set to \fB\fP, no +directives are generated). Supported variables: \fBline\fP (line number) +and \fBfile\fP (filename). +.TP +.B \fBcode:abort\fP +A statement that aborts program execution. +.TP +.B \fBcode:yydebug\fP +\fBYYDEBUG\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYDEBUG\fP, \fByyrecord\fP, \fByych\fP (map to the +corresponding \fBre2c:\fP configurations), \fBstate\fP (DFA state number). +.TP +.B \fBcode:yypeek\fP +\fBYYPEEK\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYPEEK\fP, \fBYYCTYPE\fP, \fBYYINPUT\fP, \fBYYCURSOR\fP, +\fByyrecord\fP, \fByych\fP (map to the corresponding \fBre2c:\fP +configurations). 
Conditionals: \fB\&.cast\fP (true if +\fBre2c:yych:conversion\fP is set to non\-zero). +.TP +.B \fBcode:yyskip\fP +\fBYYSKIP\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSKIP\fP, \fBYYCURSOR\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations). +.TP +.B \fBcode:yybackup\fP +\fBYYBACKUP\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYBACKUP\fP, \fBYYCURSOR\fP, \fBYYMARKER\fP, +\fByyrecord\fP (map to the corresponding \fBre2c:\fP configurations). +.TP +.B \fBcode:yybackupctx\fP +\fBYYBACKUPCTX\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYBACKUPCTX\fP, \fBYYCURSOR\fP, \fBYYCTXMARKER\fP, +\fByyrecord\fP (map to the corresponding \fBre2c:\fP configurations). +.TP +.B \fBcode:yyskip_yypeek\fP +Combined \fBcode:yyskip\fP and \fBcode:yypeek\fP statement (defaults to +\fBcode:yyskip\fP followed by \fBcode:yypeek\fP). +.TP +.B \fBcode:yypeek_yyskip\fP +Combined \fBcode:yypeek\fP and \fBcode:yyskip\fP statement (defaults to +\fBcode:yypeek\fP followed by \fBcode:yyskip\fP). +.TP +.B \fBcode:yyskip_yybackup\fP +Combined \fBcode:yyskip\fP and \fBcode:yybackup\fP statement (defaults to +\fBcode:yyskip\fP followed by \fBcode:yybackup\fP). +.TP +.B \fBcode:yybackup_yyskip\fP +Combined \fBcode:yybackup\fP and \fBcode:yyskip\fP statement (defaults to +\fBcode:yybackup\fP followed by \fBcode:yyskip\fP). +.TP +.B \fBcode:yybackup_yypeek\fP +Combined \fBcode:yybackup\fP and \fBcode:yypeek\fP statement (defaults to +\fBcode:yybackup\fP followed by \fBcode:yypeek\fP). +.TP +.B \fBcode:yyskip_yybackup_yypeek\fP +Combined \fBcode:yyskip\fP, \fBcode:yybackup\fP and \fBcode:yypeek\fP +statement (defaults to \fBcode:yyskip\fP followed by \fBcode:yybackup\fP +followed by \fBcode:yypeek\fP).
+.TP +.B \fBcode:yybackup_yypeek_yyskip\fP +Combined \fBcode:yybackup\fP, \fBcode:yypeek\fP and \fBcode:yyskip\fP +statement (defaults to \fBcode:yybackup\fP followed by \fBcode:yypeek\fP +followed by \fBcode:yyskip\fP). +.TP +.B \fBcode:yyrestore\fP +\fBYYRESTORE\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYRESTORE\fP, \fBYYCURSOR\fP, \fBYYMARKER\fP, +\fByyrecord\fP (map to the corresponding \fBre2c:\fP configurations). +.TP +.B \fBcode:yyrestorectx\fP +\fBYYRESTORECTX\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYRESTORECTX\fP, \fBYYCURSOR\fP, \fBYYCTXMARKER\fP, +\fByyrecord\fP (map to the corresponding \fBre2c:\fP configurations). +.TP +.B \fBcode:yyrestoretag\fP +\fBYYRESTORETAG\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYRESTORETAG\fP, \fBYYCURSOR\fP, \fByyrecord\fP (map +to the corresponding \fBre2c:\fP configurations), \fBtag\fP (the name of the tag +variable used to restore position). +.TP +.B \fBcode:yyshift\fP +\fBYYSHIFT\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSHIFT\fP, \fBYYCURSOR\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBoffset\fP (the number of code +units to shift the current position). +.TP +.B \fBcode:yyshiftstag\fP +\fBYYSHIFTSTAG\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSHIFTSTAG\fP, \fByyrecord\fP, \fBnegative\fP (map +to the corresponding \fBre2c:\fP configurations), \fBtag\fP (tag variable +which needs to be shifted), \fBoffset\fP (the number of code units to +shift). Conditionals: \fB\&.nested\fP (true if this is a nested tag \-\- in +this case its value may be equal to \fBre2c:tags:negative\fP, which should +not be shifted). +.TP +.B \fBcode:yyshiftmtag\fP +\fBYYSHIFTMTAG\fP statement, possibly specialized for different APIs.
+Supported variables: \fBYYSHIFTMTAG\fP (maps to the corresponding +\fBre2c:\fP configuration), \fBtag\fP (tag variable which needs to be +shifted), \fBoffset\fP (the number of code units to shift). +.TP +.B \fBcode:yystagp\fP +\fBYYSTAGP\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSTAGP\fP, \fBYYCURSOR\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBtag\fP (tag variable that +should be updated). +.TP +.B \fBcode:yymtagp\fP +\fBYYMTAGP\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYMTAGP\fP (maps to the corresponding \fBre2c:\fP +configuration), \fBtag\fP (tag variable that should be updated). +.TP +.B \fBcode:yystagn\fP +\fBYYSTAGN\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSTAGN\fP, \fBnegative\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBtag\fP (tag variable that +should be updated). +.TP +.B \fBcode:yymtagn\fP +\fBYYMTAGN\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYMTAGN\fP (maps to the corresponding \fBre2c:\fP +configuration), \fBtag\fP (tag variable that should be updated). +.TP +.B \fBcode:yycopystag\fP +\fBYYCOPYSTAG\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYCOPYSTAG\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBlhs\fP, \fBrhs\fP (left and +right hand side tag variables of the copy operation). +.TP +.B \fBcode:yycopymtag\fP +\fBYYCOPYMTAG\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYCOPYMTAG\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBlhs\fP, \fBrhs\fP (left and +right hand side tag variables of the copy operation). +.TP +.B \fBcode:yygetaccept\fP +\fBYYGETACCEPT\fP statement, possibly specialized for different APIs.
+Supported variables: \fBYYGETACCEPT\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBvar\fP (maps to +\fBre2c:yyaccept\fP configuration). +.TP +.B \fBcode:yysetaccept\fP +\fBYYSETACCEPT\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSETACCEPT\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBvar\fP (maps to +\fBre2c:yyaccept\fP configuration) and \fBval\fP (numeric value of the +accepted rule). +.TP +.B \fBcode:yygetcond\fP +\fBYYGETCOND\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYGETCOND\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBvar\fP (maps to +\fBre2c:yycond\fP configuration). +.TP +.B \fBcode:yysetcond\fP +\fBYYSETCOND\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSETCOND\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBvar\fP (maps to +\fBre2c:yycond\fP configuration) and \fBval\fP (numeric condition +identifier). +.TP +.B \fBcode:yygetstate\fP +\fBYYGETSTATE\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYGETSTATE\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBvar\fP (maps to +\fBre2c:yystate\fP configuration). +.TP +.B \fBcode:yysetstate\fP +\fBYYSETSTATE\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSETSTATE\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBvar\fP (maps to +\fBre2c:yystate\fP configuration) and \fBval\fP (state number). +.TP +.B \fBcode:yylessthan\fP +\fBYYLESSTHAN\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYLESSTHAN\fP, \fBYYCURSOR\fP, \fBYYLIMIT\fP, +\fByyrecord\fP (map to the corresponding \fBre2c:\fP configurations), +\fBneed\fP (the number of code units to check against). Conditional: +\fB\&.many\fP (true if the \fBneed\fP is more than one). 
+.TP +.B \fBcode:yybm_filter\fP +Condition that is used to filter out \fByych\fP values that are not +covered by the \fByybm\fP table (used with \fB\-\-bitmaps\fP option). +Supported variable: \fByych\fP (maps to \fBre2c:yych\fP configuration). +.TP +.B \fBcode:yybm_match\fP +The format of \fByybm\fP table check (generated with \fB\-\-bitmaps\fP +option). Supported variables: \fByybm\fP, \fByych\fP (map to the +corresponding \fBre2c:\fP configurations), \fBoffset\fP (offset in the +\fByybm\fP table that needs to be added to \fByych\fP) and \fBmask\fP (bit +mask that should be applied to the table entry to retrieve the boolean +value that needs to be checked) +.UNINDENT +.UNINDENT +.UNINDENT +.UNINDENT .SH HANDLING THE END OF INPUT .sp One of the main problems for the lexer is to know when to stop. diff --git a/bootstrap/doc/re2zig.1 b/bootstrap/doc/re2zig.1 index ea1946762..171370dee 100644 --- a/bootstrap/doc/re2zig.1 +++ b/bootstrap/doc/re2zig.1 @@ -1579,6 +1579,581 @@ raise a warning, and the user will be notified. If some configurations are unused and do not need a definition, they should be explicitly set to \fB\fP\&. .UNINDENT +.SS Syntax files +.sp +Support for different languages in re2c is based on the idea of \fIsyntax files\fP\&. +A syntax file is a configuration file that defines syntax of the target language +\-\- not the whole language, but a small part of it that is used by the generated +code. Syntax files make re2c very flexible, but they should not be used as a +replacement for configurations: their purpose is to define syntax of the target +language, not to customise one particular lexer. +Syntax files contain configurations of four different kinds: +.sp +\fBFeature lists\fP +.sp +\fBLanguage configurations\fP +.sp +\fBInplace configurations\fP +.sp +\fBCode templates\fP +.INDENT 0.0 +.INDENT 3.5 +\fICode templates\fP define syntax of the target language. 
They are written in a +simple domain\-specific language with the following formal grammar: +.INDENT 0.0 +.INDENT 3.5 +.sp +.nf +.ft C +code\-template :: + name \(aq=\(aq code\-exprs \(aq;\(aq + | CODE_TEMPLATE \(aq;\(aq + | \(aq\(aq \(aq;\(aq + +code\-exprs :: + + | code\-exprs code\-expr + +code\-expr :: + STRING + | VARIABLE + | optional + | list + +optional :: + \(aq(\(aq CONDITIONAL \(aq?\(aq code\-exprs \(aq)\(aq + | \(aq(\(aq CONDITIONAL \(aq?\(aq code\-exprs \(aq:\(aq code\-exprs \(aq)\(aq + +list :: + \(aq[\(aq VARIABLE \(aq:\(aq code\-exprs \(aq]\(aq + | \(aq[\(aq VARIABLE \(aq{\(aq NUMBER \(aq}\(aq \(aq:\(aq code\-exprs \(aq]\(aq + | \(aq[\(aq VARIABLE \(aq{\(aq NUMBER \(aq,\(aq NUMBER \(aq}\(aq \(aq:\(aq code\-exprs \(aq]\(aq +.ft P +.fi +.UNINDENT +.UNINDENT +.sp +A code template is a sequence of string literals, variables, optional elements +and lists, or a reference to another code template, or a special value +\fB\fP\&. Variables are placeholders that are substituted during the code +generation phase. List variables are special: when expanding list templates, +re2zig repeats the expressions on the right hand side of the colon a few times, each +time replacing occurrences of the list variable with a value specific to this +repetition. Lists have optional bounds (negative values are counted from the +end, e.g. \fB\-1\fP means the last element). Conditional names start with a dot. +Both conditionals and variables may be either local (specific to the given +code template) or global (allowed in all code templates). When re2zig reads +a syntax file, it checks that each code template uses only the variables and +conditionals that are allowed in it. +.sp +For example, the following code template defines an if\-then\-else construct for a +C\-like language: +.INDENT 0.0 +.INDENT 3.5 +.sp +.nf +.ft C +code:if_then_else = + [branch{0}: topindent \(dqif \(dq cond \(dq {\(dq nl + indent [stmt: stmt] dedent] + [branch{1:\-1}: topindent \(dq} else\(dq (.cond ?
\(dq if \(dq cond) \(dq {\(dq nl + indent [stmt: stmt] dedent] + topindent \(dq}\(dq nl; +.ft P +.fi +.UNINDENT +.UNINDENT +.sp +Here \fBbranch\fP is a list variable. \fBbranch{0}\fP expands to the first branch +\-\- it has to be special, as there is no \fBelse\fP part. \fBbranch{1:\-1}\fP +expands to the remaining branches, if any. \fBtopindent\fP, \fBindent\fP, +\fBdedent\fP and \fBnl\fP are global variables (see below). \fB[stmt: stmt]\fP is a +nested list that expands to the list of statements in the current branch +(\fBstmt\fP is a list variable). Local conditional \fB\&.cond\fP is true if the +current branch has a condition. +This code template could produce the following code: +.INDENT 0.0 +.INDENT 3.5 +.sp +.nf +.ft C +if x { + // do something +} else if y { + // do something else +} else { + // don\(aqt do anything +} +.ft P +.fi +.UNINDENT +.UNINDENT +.sp +Here\(aqs a list of all global variables: +.INDENT 0.0 +.INDENT 3.5 +.INDENT 0.0 +.TP +.B \fBnl\fP +A newline. +.TP +.B \fBindent\fP +A variable that does not produce any code, but has a side\-effect of +increasing the indentation level. +.TP +.B \fBdedent\fP +A variable that does not produce any code, but has a side\-effect of +decreasing the indentation level. +.TP +.B \fBtopindent\fP +Indent string for the current statement (the indent level is tracked and +updated by the code generator). +.UNINDENT +.UNINDENT +.UNINDENT +.sp +Here\(aqs a list of all global conditionals: +.INDENT 0.0 +.INDENT 3.5 +.INDENT 0.0 +.TP +.B \fB\&.api.simple\fP +True if simple API is used (\fB\-\-api simple\fP or \fBre2c:api = simple\fP). +.TP +.B \fB\&.api.generic\fP +True if generic API is used (\fB\-\-api generic\fP or +\fBre2c:api = generic\fP). +.TP +.B \fB\&.api.record\fP +True if record API is used (\fB\-\-api record\fP or \fBre2c:api = record\fP). +.TP +.B \fB\&.api_style.functions\fP +True if function\-like API style is used +(\fBre2c:api\-style = functions\fP).
+.TP +.B \fB\&.api_style.freeform\fP +True if free\-form API style is used (\fBre2c:api\-style = free\-form\fP). +.TP +.B \fB\&.case_ranges\fP +True if the case ranges feature is enabled (\fB\-\-case\-ranges\fP or +\fBre2c:case\-ranges = 1\fP). +.TP +.B \fB\&.code_model.goto_label\fP +True if code model based on goto/label is used (\fB\-\-goto\-label\fP). +.TP +.B \fB\&.code_model.loop_switch\fP +True if code model based on loop/switch is used (\fB\-\-loop\-switch\fP). +.TP +.B \fB\&.code_model.recursive_functions\fP +True if code model based on recursive functions is used +(\fB\-\-recursive\-functions\fP). +.TP +.B \fB\&.date\fP +True if the generated fingerprint should contain the generation date. +.TP +.B \fB\&.loop_label\fP +True if loops generated by re2zig must have a label (\fBre2c:label:yyloop\fP +is set to a nonempty string). +.TP +.B \fB\&.monadic\fP +True if the generated code should be monadic (\fBre2c:monadic = 1\fP). +This is only relevant for pure functional languages. +.TP +.B \fB\&.start_conditions\fP +True if start conditions are enabled (\fB\-\-start\-conditions\fP). +.TP +.B \fB\&.storable_state\fP +True if storable state is enabled (\fB\-\-storable\-state\fP). +.TP +.B \fB\&.unsafe\fP +True if re2zig should use \(dqunsafe\(dq blocks in order to generate faster +code (\fB\-\-unsafe\fP, \fBre2c:unsafe = 1\fP). This is only relevant for +languages that have an \(dqunsafe\(dq feature. +.TP +.B \fB\&.version\fP +True if the generated fingerprint should contain the re2zig version. +.UNINDENT +.UNINDENT +.UNINDENT +.sp +Below is a full list of code templates supported by re2zig with their local +variables and conditionals (a definition does not have to use all of them). +Any unused code templates should be set to \fB\fP\&. +.INDENT 0.0 +.TP +.B \fBcode:var_local\fP +.INDENT 7.0 +.INDENT 3.5 +Declaration or definition of a local variable. Supported variables: +\fBtype\fP (the type of the variable), \fBname\fP (its name) and \fBinit\fP +(initial value, if any).
Conditionals: \fB\&.init\fP (true if there is an +initializer). +.UNINDENT +.UNINDENT +.INDENT 7.0 +.TP +.B \fBcode:var_global\fP +Same as \fBcode:var_local\fP, except that it\(aqs used in top\-level. +.TP +.B \fBcode:const_local\fP +Definition of a local constant. Supported variables: \fBtype\fP (the type +of the constant), \fBname\fP (its name) and \fBinit\fP (initial value). +.TP +.B \fBcode:const_global\fP +Same as \fBcode:const_local\fP, except that it\(aqs used in top\-level. +.TP +.B \fBcode:array_local\fP +Definition of a local array (table). Supported variables: \fBtype\fP (the +type of array elements), \fBname\fP (array name), \fBsize\fP (its size), +\fBrow\fP (a list variable that does not itself produce any code, but +expands list expression as many times as there are rows in the table) +and \fBelem\fP (a list variable that expands to all table elements in the +current row \-\- it\(aqs meant to be nested in the \fBrow\fP list). +.TP +.B \fBcode:array_global\fP +Same as \fBcode:array_local\fP, except that it\(aqs used in top\-level. +.TP +.B \fBcode:array_elem\fP +Reference to an element of an array (table). Supported variables: +\fBarray\fP (the name of the array) and \fBindex\fP (index of the element). +.TP +.B \fBcode:enum\fP +Definition of an enumeration (it may be defined using a special language +construct for enumerations, or simply as a few standalone constants). +Supported variables are \fBtype\fP (user\-defined enumeration type or type +of the constants), \fBelem\fP (list variable that expands to the name of +each member) and \fBinit\fP (initializer for each member). Conditionals: +\fB\&.init\fP (true if there is an initializer). +.TP +.B \fBcode:enum_elem\fP +Enumeration element (a member of a user\-defined enumeration type or a +name of a constant, depending on how \fBcode:enum\fP is defined). +Supported variables are \fBname\fP (the name of the element) and \fBtype\fP +(its type). +.TP +.B \fBcode:assign\fP +Assignment statement. 
Supported variables are \fBlhs\fP (left hand side) +and \fBrhs\fP (right hand side). +.TP +.B \fBcode:type_int\fP +Signed integer type. +.TP +.B \fBcode:type_uint\fP +Unsigned integer type. +.TP +.B \fBcode:type_yybm\fP +Type of elements in the \fByybm\fP table. +.TP +.B \fBcode:type_yytarget\fP +Type of elements in the \fByytarget\fP table. +.TP +.B \fBcode:cmp_eq\fP +Operator \(dqequals\(dq. +.TP +.B \fBcode:cmp_ne\fP +Operator \(dqnot equals\(dq. +.TP +.B \fBcode:cmp_lt\fP +Operator \(dqless than\(dq. +.TP +.B \fBcode:cmp_gt\fP +Operator \(dqgreater than\(dq. +.TP +.B \fBcode:cmp_le\fP +Operator \(dqless or equal\(dq. +.TP +.B \fBcode:cmp_ge\fP +Operator \(dqgreater or equal\(dq. +.TP +.B \fBcode:if_then_else\fP +If\-then\-else statement with one or more branches. Supported variables: +\fBbranch\fP (a list variable that does not itself produce any code, but +expands the list expression as many times as there are branches), \fBcond\fP +(condition of the current branch) and \fBstmt\fP (a list variable that +expands to all statements in the current branch). Conditionals: +\fB\&.cond\fP (true if the current branch has a condition), \fB\&.many\fP (true +if there\(aqs more than one branch). +.TP +.B \fBcode:if_then_else_oneline\fP +A specialization of \fBcode:if_then_else\fP for the case when all branches +have one\-line statements. If this is \fB\fP, +\fBcode:if_then_else\fP is used instead. +.TP +.B \fBcode:switch\fP +A switch statement with one or more cases. Supported variables: \fBexpr\fP +(the switched\-on expression) and \fBcase\fP (a list variable that expands +to all case groups with their code blocks). +.TP +.B \fBcode:switch_cases\fP +A group of switch cases that maps to a single code block. Supported +variables are \fBcase\fP (a list variable that expands to all cases in +this group) and \fBstmt\fP (a list variable that expands to all statements +in the code block).
+.TP +.B \fBcode:switch_cases_oneline\fP +A specialization of \fBcode:switch_cases\fP for the case when the code +block consists of a single one\-line statement. If this is +\fB\fP, \fBcode:switch_cases\fP is used instead. +.TP +.B \fBcode:switch_case_range\fP +A single switch case that covers a range of values (possibly consisting +of a single value). Supported variable: \fBval\fP (a list variable that +expands to all values in the range). Supported conditionals: \fB\&.many\fP +(true if there\(aqs more than one value in the range) and +\fB\&.char_literals\fP (true if this is a switch on character literals \-\- +some languages provide special syntax for this case). +.TP +.B \fBcode:switch_case_default\fP +Default switch case. +.TP +.B \fBcode:loop\fP +A loop that runs forever (unless interrupted from the loop body). +Supported variables: \fBlabel\fP (loop label), \fBstmt\fP (a list variable +that expands to all statements in the loop body). +.TP +.B \fBcode:continue\fP +Continue statement. Supported variables: \fBlabel\fP (label from which to +continue execution). +.TP +.B \fBcode:goto\fP +Goto statement. Supported variables: \fBlabel\fP (label of the jump +target). +.TP +.B \fBcode:fndecl\fP +Function declaration. Supported variables: \fBname\fP (function name), +\fBtype\fP (return type), \fBarg\fP (a list variable that does not itself +produce code, but expands the list expression as many times as there are +function arguments), \fBargname\fP (name of the current argument), +\fBargtype\fP (type of the current argument). Conditional: \fB\&.type\fP (true +if this is a non\-void function). +.TP +.B \fBcode:fndef\fP +Like \fBcode:fndecl\fP, but used for function definitions, so it has one +additional list variable \fBstmt\fP that expands to all statements in the +function body. +.TP +.B \fBcode:fncall\fP +Function call statement.
Supported variables: \fBname\fP (function name), +\fBretval\fP (l\-value where the return value is stored, if any) and +\fBarg\fP (a list variable that expands to all function arguments). +Conditionals: \fB\&.args\fP (true if the function has arguments) and +\fB\&.retval\fP (true if return value needs to be saved). +.TP +.B \fBcode:tailcall\fP +Tail call statement. Supported variables: \fBname\fP (function name), +and \fBarg\fP (a list variable that expands to all function arguments). +Conditionals: \fB\&.args\fP (true if the function has arguments) and +\fB\&.retval\fP (true if this is a non\-void function). +.TP +.B \fBcode:recursive_functions\fP +Program body with \fB\-\-recursive\-functions\fP code model. Supported +variables: \fBfn\fP (a list variable that does not itself produce any +code, but expands list expression as many times as there are functions), +\fBfndecl\fP (declaration of the current function) and \fBfndef\fP +(definition of the current function). +.TP +.B \fBcode:fingerprint\fP +The fingerprint at the top of the generated output file. Supported +variables: \fBver\fP (re2zig version that was used to generate this) and +\fBdate\fP (generation date). +.TP +.B \fBcode:line_info\fP +The format of line directives (if this is set to \fB\fP, no +directives are generated). Supported variables: \fBline\fP (line number) +and \fBfile\fP (filename). +.TP +.B \fBcode:abort\fP +A statement that aborts program execution. +.TP +.B \fBcode:yydebug\fP +\fBYYDEBUG\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYDEBUG\fP, \fByyrecord\fP, \fByych\fP (map to the +corresponding \fBre2c:\fP configurations), \fBstate\fP (DFA state number). +.TP +.B \fBcode:yypeek\fP +\fBYYPEEK\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYPEEK\fP, \fBYYCTYPE\fP, \fBYYINPUT\fP, \fBYYCURSOR\fP, +\fByyrecord\fP, \fByych\fP (map to the corresponding \fBre2c:\fP +configurations). 
Conditionals: \fB\&.cast\fP (true if +\fBre2c:yych:conversion\fP is set to non\-zero). +.TP +.B \fBcode:yyskip\fP +\fBYYSKIP\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSKIP\fP, \fBYYCURSOR\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations). +.TP +.B \fBcode:yybackup\fP +\fBYYBACKUP\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYBACKUP\fP, \fBYYCURSOR\fP, \fBYYMARKER\fP, +\fByyrecord\fP (map to the corresponding \fBre2c:\fP configurations). +.TP +.B \fBcode:yybackupctx\fP +\fBYYBACKUPCTX\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYBACKUPCTX\fP, \fBYYCURSOR\fP, \fBYYCTXMARKER\fP, +\fByyrecord\fP (map to the corresponding \fBre2c:\fP configurations). +.TP +.B \fBcode:yyskip_yypeek\fP +Combined \fBcode:yyskip\fP and \fBcode:yypeek\fP statement (defaults to +\fBcode:yyskip\fP followed by \fBcode:yypeek\fP). +.TP +.B \fBcode:yypeek_yyskip\fP +Combined \fBcode:yypeek\fP and \fBcode:yyskip\fP statement (defaults to +\fBcode:yypeek\fP followed by \fBcode:yyskip\fP). +.TP +.B \fBcode:yyskip_yybackup\fP +Combined \fBcode:yyskip\fP and \fBcode:yybackup\fP statement (defaults to +\fBcode:yyskip\fP followed by \fBcode:yybackup\fP). +.TP +.B \fBcode:yybackup_yyskip\fP +Combined \fBcode:yybackup\fP and \fBcode:yyskip\fP statement (defaults to +\fBcode:yybackup\fP followed by \fBcode:yyskip\fP). +.TP +.B \fBcode:yybackup_yypeek\fP +Combined \fBcode:yybackup\fP and \fBcode:yypeek\fP statement (defaults to +\fBcode:yybackup\fP followed by \fBcode:yypeek\fP). +.TP +.B \fBcode:yyskip_yybackup_yypeek\fP +Combined \fBcode:yyskip\fP, \fBcode:yybackup\fP and \fBcode:yypeek\fP +statement (defaults to \fBcode:yyskip\fP followed by \fBcode:yybackup\fP +followed by \fBcode:yypeek\fP).
+.TP +.B \fBcode:yybackup_yypeek_yyskip\fP +Combined \fBcode:yybackup\fP, \fBcode:yypeek\fP and \fBcode:yyskip\fP +statement (defaults to \fBcode:yybackup\fP followed by \fBcode:yypeek\fP +followed by \fBcode:yyskip\fP). +.TP +.B \fBcode:yyrestore\fP +\fBYYRESTORE\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYRESTORE\fP, \fBYYCURSOR\fP, \fBYYMARKER\fP, +\fByyrecord\fP (map to the corresponding \fBre2c:\fP configurations). +.TP +.B \fBcode:yyrestorectx\fP +\fBYYRESTORECTX\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYRESTORECTX\fP, \fBYYCURSOR\fP, \fBYYCTXMARKER\fP, +\fByyrecord\fP (map to the corresponding \fBre2c:\fP configurations). +.TP +.B \fBcode:yyrestoretag\fP +\fBYYRESTORETAG\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYRESTORETAG\fP, \fBYYCURSOR\fP, \fByyrecord\fP (map +to the corresponding \fBre2c:\fP configurations), \fBtag\fP (the name of the tag +variable used to restore position). +.TP +.B \fBcode:yyshift\fP +\fBYYSHIFT\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSHIFT\fP, \fBYYCURSOR\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBoffset\fP (the number of code +units to shift the current position). +.TP +.B \fBcode:yyshiftstag\fP +\fBYYSHIFTSTAG\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSHIFTSTAG\fP, \fByyrecord\fP, \fBnegative\fP (map +to the corresponding \fBre2c:\fP configurations), \fBtag\fP (tag variable +which needs to be shifted), \fBoffset\fP (the number of code units to +shift). Conditionals: \fB\&.nested\fP (true if this is a nested tag \-\- in +this case its value may be equal to \fBre2c:tags:negative\fP, which should +not be shifted). +.TP +.B \fBcode:yyshiftmtag\fP +\fBYYSHIFTMTAG\fP statement, possibly specialized for different APIs.
+Supported variables: \fBYYSHIFTMTAG\fP (maps to the corresponding +\fBre2c:\fP configuration), \fBtag\fP (tag variable which needs to be +shifted), \fBoffset\fP (the number of code units to shift). +.TP +.B \fBcode:yystagp\fP +\fBYYSTAGP\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSTAGP\fP, \fBYYCURSOR\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBtag\fP (tag variable that +should be updated). +.TP +.B \fBcode:yymtagp\fP +\fBYYMTAGP\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYMTAGP\fP (maps to the corresponding \fBre2c:\fP +configuration), \fBtag\fP (tag variable that should be updated). +.TP +.B \fBcode:yystagn\fP +\fBYYSTAGN\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSTAGN\fP, \fBnegative\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBtag\fP (tag variable that +should be updated). +.TP +.B \fBcode:yymtagn\fP +\fBYYMTAGN\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYMTAGN\fP (maps to the corresponding \fBre2c:\fP +configuration), \fBtag\fP (tag variable that should be updated). +.TP +.B \fBcode:yycopystag\fP +\fBYYCOPYSTAG\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYCOPYSTAG\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBlhs\fP, \fBrhs\fP (left and +right hand side tag variables of the copy operation). +.TP +.B \fBcode:yycopymtag\fP +\fBYYCOPYMTAG\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYCOPYMTAG\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBlhs\fP, \fBrhs\fP (left and +right hand side tag variables of the copy operation). +.TP +.B \fBcode:yygetaccept\fP +\fBYYGETACCEPT\fP statement, possibly specialized for different APIs.
+Supported variables: \fBYYGETACCEPT\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBvar\fP (maps to +\fBre2c:yyaccept\fP configuration). +.TP +.B \fBcode:yysetaccept\fP +\fBYYSETACCEPT\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSETACCEPT\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBvar\fP (maps to +\fBre2c:yyaccept\fP configuration) and \fBval\fP (numeric value of the +accepted rule). +.TP +.B \fBcode:yygetcond\fP +\fBYYGETCOND\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYGETCOND\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBvar\fP (maps to +\fBre2c:yycond\fP configuration). +.TP +.B \fBcode:yysetcond\fP +\fBYYSETCOND\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSETCOND\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBvar\fP (maps to +\fBre2c:yycond\fP configuration) and \fBval\fP (numeric condition +identifier). +.TP +.B \fBcode:yygetstate\fP +\fBYYGETSTATE\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYGETSTATE\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBvar\fP (maps to +\fBre2c:yystate\fP configuration). +.TP +.B \fBcode:yysetstate\fP +\fBYYSETSTATE\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYSETSTATE\fP, \fByyrecord\fP (map to the +corresponding \fBre2c:\fP configurations), \fBvar\fP (maps to +\fBre2c:yystate\fP configuration) and \fBval\fP (state number). +.TP +.B \fBcode:yylessthan\fP +\fBYYLESSTHAN\fP statement, possibly specialized for different APIs. +Supported variables: \fBYYLESSTHAN\fP, \fBYYCURSOR\fP, \fBYYLIMIT\fP, +\fByyrecord\fP (map to the corresponding \fBre2c:\fP configurations), +\fBneed\fP (the number of code units to check against). Conditional: +\fB\&.many\fP (true if the \fBneed\fP is more than one). 
+.TP +.B \fBcode:yybm_filter\fP +Condition that is used to filter out \fByych\fP values that are not +covered by the \fByybm\fP table (used with \fB\-\-bitmaps\fP option). +Supported variable: \fByych\fP (maps to \fBre2c:yych\fP configuration). +.TP +.B \fBcode:yybm_match\fP +The format of \fByybm\fP table check (generated with \fB\-\-bitmaps\fP +option). Supported variables: \fByybm\fP, \fByych\fP (map to the +corresponding \fBre2c:\fP configurations), \fBoffset\fP (offset in the +\fByybm\fP table that needs to be added to \fByych\fP) and \fBmask\fP (bit +mask that should be applied to the table entry to retrieve the boolean +value that needs to be checked) +.UNINDENT +.UNINDENT +.UNINDENT +.UNINDENT .SH HANDLING THE END OF INPUT .sp One of the main problems for the lexer is to know when to stop. diff --git a/doc/manpage.rst.in b/doc/manpage.rst.in index a9d01bd17..e1b19e1af 100644 --- a/doc/manpage.rst.in +++ b/doc/manpage.rst.in @@ -82,6 +82,10 @@ Warnings can be invividually enabled, disabled and turned into an error. .. include:: @top_srcdir@/doc/manual/basics/warnings/warnings_general.rst_ .. include:: @top_srcdir@/doc/manual/basics/warnings/warnings_list.rst_ +Syntax files +------------ +.. include:: @top_srcdir@/doc/manual/basics/syntax_files.rst_ + Handling the end of input ========================= .. include:: @top_srcdir@/doc/manual/eof/eof.rst_ diff --git a/doc/manual/basics/syntax_files.rst_ b/doc/manual/basics/syntax_files.rst_ new file mode 100644 index 000000000..ae7e3c8f7 --- /dev/null +++ b/doc/manual/basics/syntax_files.rst_ @@ -0,0 +1,590 @@ +Support for different languages in re2c is based on the idea of *syntax files*. +A syntax file is a configuration file that defines syntax of the target language +-- not the whole language, but a small part of it that is used by the generated +code. 
Syntax files make re2c very flexible, but they should not be used as a +replacement for ``re2c:`` configurations: their purpose is to define syntax of +the target language, not to customise one particular lexer. All supported +languages have default syntax files that are part of the distribution (see +``include/syntax`` subdirectory); they are also embedded in the |re2c| binary. +Users may provide a custom syntax file that overrides a few configurations for +one of supported languages, or they may choose to redefine all configurations +(in that case ``--lang none`` option should be used). +Syntax files contain configurations of four different kinds: feature lists, +language configurations, inplace configurations and code templates. + +Feature lists +~~~~~~~~~~~~~ + + Feature lists define various features supported by a given backend, so + that |re2c| may give a clear error if the user tries to enable an unsupported + feature. + + ``supported_apis`` + A list of supported APIs with possible elements ``simple``, ``record``, + ``generic``. + + ``supported_api_styles`` + A list of supported API styles with possible elements ``functions``, + ``free-form``. + + ``supported_code_models`` + A list of supported code models with possible elements ``goto-label``, + ``loop-switch``, ``recursive-functions``. + + ``supported_targets`` + A list of supported codegen targets with possible elements ``code``, + ``dot``, ``skeleton``. + + ``supported_features`` + A list of supported features with possible elements ``nested-ifs``, + ``bitmaps``, ``computed-gotos``, ``case-ranges``, ``monadic``, ``unsafe``, + ``tags``, ``captures``, ``captvars``. + +Language configurations +~~~~~~~~~~~~~~~~~~~~~~~ + + A few boolean configurations describe features of the target language that + affect |re2c| parser and code generator. + + ``semicolons`` + Non-zero if the language uses semicolons after statements. + + ``backtick_quoted_strings`` + Non-zero if the language has backtick-quoted strings. 
+ + ``single_quoted_strings`` + Non-zero if the language has single-quoted strings. + + ``indentation_sensitive`` + Non-zero if the language is indentation sensitive. + + ``wrap_blocks_in_braces`` + Non-zero if compound statements must be wrapped in curly braces. + +Inplace configurations +~~~~~~~~~~~~~~~~~~~~~~ + + Syntax files define initial values of all ``re2c:`` configurations, as they + may differ for different languages. See the configurations section for a full list + of all inplace configurations and their meaning. + +Code templates +~~~~~~~~~~~~~~ + + *Code templates* define syntax of the target language. They are written in a + simple domain-specific language with the following formal grammar: + + .. code-block:: bnf + + code-template :: + name '=' code-exprs ';' + | CODE_TEMPLATE ';' + | '' ';' + + code-exprs :: + + | code-exprs code-expr + + code-expr :: + STRING + | VARIABLE + | optional + | list + + optional :: + '(' CONDITIONAL '?' code-exprs ')' + | '(' CONDITIONAL '?' code-exprs ':' code-exprs ')' + + list :: + '[' VARIABLE ':' code-exprs ']' + | '[' VARIABLE '{' NUMBER '}' ':' code-exprs ']' + | '[' VARIABLE '{' NUMBER ',' NUMBER '}' ':' code-exprs ']' + + A code template is a sequence of string literals, variables, optional elements + and lists, or a reference to another code template, or a special value + ````. Variables are placeholders that are substituted during the code + generation phase. List variables are special: when expanding list templates, + |re2c| repeats the expressions on the right hand side of the colon a few times, each + time replacing occurrences of the list variable with a value specific to this + repetition. Lists have optional bounds (negative values are counted from the + end, e.g. ``-1`` means the last element). Conditional names start with a dot. + Both conditionals and variables may be either local (specific to the given + code template) or global (allowed in all code templates).
When |re2c| reads + a syntax file, it checks that each code template uses only the variables and + conditionals that are allowed in it. + + For example, the following code template defines an if-then-else construct for a + C-like language: + + .. code-block:: cpp + + code:if_then_else = + [branch{0}: topindent "if " cond " {" nl + indent [stmt: stmt] dedent] + [branch{1:-1}: topindent "} else" (.cond ? " if " cond) " {" nl + indent [stmt: stmt] dedent] + topindent "}" nl; + + Here ``branch`` is a list variable: ``branch{0}`` expands to the first branch + (which is special, as there is no ``else`` part), ``branch{1:-1}`` expands to + all remaining branches (if any). ``stmt`` is also a list variable: + ``[stmt: stmt]`` is a nested list that expands to a list of statements in the + body of the current branch. ``topindent``, ``indent``, ``dedent`` and ``nl`` + are global variables, and ``.cond`` is a local conditional (their meaning is + described below). This code template could produce the following code: + + .. code-block:: cpp + + if x { + // do something + } else if y { + // do something else + } else { + // don't do anything + } + +Here's a list of all code templates supported by |re2c| with their local +variables and conditionals. Note that a particular definition may, but does not +have to, use local variables and conditionals. +Any unused code templates should be set to ``<undefined>``. + + ``code:var_local`` + Declaration or definition of a local variable. Supported variables: + ``type`` (the type of the variable), ``name`` (its name) and ``init`` + (initial value, if any). Conditionals: ``.init`` (true if there is an + initializer). + + ``code:var_global`` + Same as ``code:var_local``, except that it's used in top-level. + + ``code:const_local`` + Definition of a local constant. Supported variables: ``type`` (the type + of the constant), ``name`` (its name) and ``init`` (initial value).
+ + ``code:const_global`` + Same as ``code:const_local``, except that it's used in top-level. + + ``code:array_local`` + Definition of a local array (table). Supported variables: ``type`` (the + type of array elements), ``name`` (array name), ``size`` (its size), + ``row`` (a list variable that does not itself produce any code, but + expands list expression as many times as there are rows in the table) + and ``elem`` (a list variable that expands to all table elements in the + current row -- it's meant to be nested in the ``row`` list). + + ``code:array_global`` + Same as ``code:array_local``, except that it's used in top-level. + + ``code:array_elem`` + Reference to an element of an array (table). Supported variables: + ``array`` (the name of the array) and ``index`` (index of the element). + + ``code:enum`` + Definition of an enumeration (it may be defined using a special language + construct for enumerations, or simply as a few standalone constants). + Supported variables are ``type`` (user-defined enumeration type or type + of the constants), ``elem`` (list variable that expands to the name of + each member) and ``init`` (initializer for each member). Conditionals: + ``.init`` (true if there is an initializer). + + ``code:enum_elem`` + Enumeration element (a member of a user-defined enumeration type or a + name of a constant, depending on how ``code:enum`` is defined). + Supported variables are ``name`` (the name of the element) and ``type`` + (its type). + + ``code:assign`` + Assignment statement. Supported variables are ``lhs`` (left hand side) + and ``rhs`` (right hand side). + + ``code:type_int`` + Signed integer type. + + ``code:type_uint`` + Unsigned integer type. + + ``code:type_yybm`` + Type of elements in the ``yybm`` table. + + ``code:type_yytarget`` + Type of elements in the ``yytarget`` table. + + ``code:cmp_eq`` + Operator "equals". + + ``code:cmp_ne`` + Operator "not equals". + + ``code:cmp_lt`` + Operator "less than". 
+ + ``code:cmp_gt`` + Operator "greater than". + + ``code:cmp_le`` + Operator "less or equal". + + ``code:cmp_ge`` + Operator "greater or equal". + + ``code:if_then_else`` + If-then-else statement with one or more branches. Supported variables: + ``branch`` (a list variable that does not itself produce any code, but + expands the list expression as many times as there are branches), ``cond`` + (condition of the current branch) and ``stmt`` (a list variable that + expands to all statements in the current branch). Conditionals: + ``.cond`` (true if the current branch has a condition), ``.many`` (true + if there's more than one branch). + + ``code:if_then_else_oneline`` + A specialization of ``code:if_then_else`` for the case when all branches + have one-line statements. If this is ``<undefined>``, + ``code:if_then_else`` is used instead. + + ``code:switch`` + A switch statement with one or more cases. Supported variables: ``expr`` + (the switched-on expression) and ``case`` (a list variable that expands + to all case groups with their code blocks). + + ``code:switch_cases`` + A group of switch cases that maps to a single code block. Supported + variables are ``case`` (a list variable that expands to all cases in + this group) and ``stmt`` (a list variable that expands to all statements + in the code block). + + ``code:switch_cases_oneline`` + A specialization of ``code:switch_cases`` for the case when the code + block consists of a single one-line statement. If this is + ``<undefined>``, ``code:switch_cases`` is used instead. + + ``code:switch_case_range`` + A single switch case that covers a range of values (possibly consisting + of a single value). Supported variable: ``val`` (a list variable that + expands to all values in the range). Supported conditionals: ``.many`` + (true if there's more than one value in the range) and + ``.char_literals`` (true if this is a switch on character literals -- + some languages provide special syntax for this case).
+ + ``code:switch_case_default`` + Default switch case. + + ``code:loop`` + A loop that runs forever (unless interrupted from the loop body). + Supported variables: ``label`` (loop label), ``stmt`` (a list variable + that expands to all statements in the loop body). + + ``code:continue`` + Continue statement. Supported variables: ``label`` (label from which to + continue execution). + + ``code:goto`` + Goto statement. Supported variables: ``label`` (label of the jump + target). + + ``code:fndecl`` + Function declaration. Supported variables: ``name`` (function name), + ``type`` (return type), ``arg`` (a list variable that does not itself + produce code, but expands the list expression as many times as there are + function arguments), ``argname`` (name of the current argument), + ``argtype`` (type of the current argument). Conditional: ``.type`` (true + if this is a non-void function). + + ``code:fndef`` + Like ``code:fndecl``, but used for function definitions, so it has one + additional list variable ``stmt`` that expands to all statements in the + function body. + + ``code:fncall`` + Function call statement. Supported variables: ``name`` (function name), + ``retval`` (l-value where the return value is stored, if any) and + ``arg`` (a list variable that expands to all function arguments). + Conditionals: ``.args`` (true if the function has arguments) and + ``.retval`` (true if the return value needs to be saved). + + ``code:tailcall`` + Tail call statement. Supported variables: ``name`` (function name) + and ``arg`` (a list variable that expands to all function arguments). + Conditionals: ``.args`` (true if the function has arguments) and + ``.retval`` (true if this is a non-void function). + + ``code:recursive_functions`` + Program body with the ``--recursive-functions`` code model.
Supported + variables: ``fn`` (a list variable that does not itself produce any + code, but expands the list expression as many times as there are functions), + ``fndecl`` (declaration of the current function) and ``fndef`` + (definition of the current function). + + ``code:fingerprint`` + The fingerprint at the top of the generated output file. Supported + variables: ``ver`` (the |re2c| version that was used to generate the output) and + ``date`` (generation date). + + ``code:line_info`` + The format of line directives (if this is set to ``<undefined>``, no + directives are generated). Supported variables: ``line`` (line number) + and ``file`` (filename). + + ``code:abort`` + A statement that aborts program execution. + + ``code:yydebug`` + ``YYDEBUG`` statement, possibly specialized for different APIs. + Supported variables: ``YYDEBUG``, ``yyrecord``, ``yych`` (map to the + corresponding ``re2c:`` configurations), ``state`` (DFA state number). + + ``code:yypeek`` + ``YYPEEK`` statement, possibly specialized for different APIs. + Supported variables: ``YYPEEK``, ``YYCTYPE``, ``YYINPUT``, ``YYCURSOR``, + ``yyrecord``, ``yych`` (map to the corresponding ``re2c:`` + configurations). Conditionals: ``.cast`` (true if + ``re2c:yych:conversion`` is set to non-zero). + + ``code:yyskip`` + ``YYSKIP`` statement, possibly specialized for different APIs. + Supported variables: ``YYSKIP``, ``YYCURSOR``, ``yyrecord`` (map to the + corresponding ``re2c:`` configurations). + + ``code:yybackup`` + ``YYBACKUP`` statement, possibly specialized for different APIs. + Supported variables: ``YYBACKUP``, ``YYCURSOR``, ``YYMARKER``, + ``yyrecord`` (map to the corresponding ``re2c:`` configurations). + + ``code:yybackupctx`` + ``YYBACKUPCTX`` statement, possibly specialized for different APIs. + Supported variables: ``YYBACKUPCTX``, ``YYCURSOR``, ``YYCTXMARKER``, + ``yyrecord`` (map to the corresponding ``re2c:`` configurations).
+ + ``code:yyskip_yypeek`` + Combined ``code:yyskip`` and ``code:yypeek`` statement (defaults to + ``code:yyskip`` followed by ``code:yypeek``). + + ``code:yypeek_yyskip`` + Combined ``code:yypeek`` and ``code:yyskip`` statement (defaults to + ``code:yypeek`` followed by ``code:yyskip``). + + ``code:yyskip_yybackup`` + Combined ``code:yyskip`` and ``code:yybackup`` statement (defaults to + ``code:yyskip`` followed by ``code:yybackup``). + + ``code:yybackup_yyskip`` + Combined ``code:yybackup`` and ``code:yyskip`` statement (defaults to + ``code:yybackup`` followed by ``code:yyskip``). + + ``code:yybackup_yypeek`` + Combined ``code:yybackup`` and ``code:yypeek`` statement (defaults to + ``code:yybackup`` followed by ``code:yypeek``). + + ``code:yyskip_yybackup_yypeek`` + Combined ``code:yyskip``, ``code:yybackup`` and ``code:yypeek`` + statement (defaults to ``code:yyskip`` followed by ``code:yybackup`` + followed by ``code:yypeek``). + + ``code:yybackup_yypeek_yyskip`` + Combined ``code:yybackup``, ``code:yypeek`` and ``code:yyskip`` + statement (defaults to ``code:yybackup`` followed by ``code:yypeek`` + followed by ``code:yyskip``). + + ``code:yyrestore`` + ``YYRESTORE`` statement, possibly specialized for different APIs. + Supported variables: ``YYRESTORE``, ``YYCURSOR``, ``YYMARKER``, + ``yyrecord`` (map to the corresponding ``re2c:`` configurations). + + ``code:yyrestorectx`` + ``YYRESTORECTX`` statement, possibly specialized for different APIs. + Supported variables: ``YYRESTORECTX``, ``YYCURSOR``, ``YYCTXMARKER``, + ``yyrecord`` (map to the corresponding ``re2c:`` configurations). + + ``code:yyrestoretag`` + ``YYRESTORETAG`` statement, possibly specialized for different APIs. + Supported variables: ``YYRESTORETAG``, ``YYCURSOR``, ``yyrecord`` (map + to the corresponding ``re2c:`` configurations), ``tag`` (the name of the tag + variable used to restore the position). + + ``code:yyshift`` + ``YYSHIFT`` statement, possibly specialized for different APIs.
+ Supported variables: ``YYSHIFT``, ``YYCURSOR``, ``yyrecord`` (map to the + corresponding ``re2c:`` configurations), ``offset`` (the number of code + units by which to shift the current position). + + ``code:yyshiftstag`` + ``YYSHIFTSTAG`` statement, possibly specialized for different APIs. + Supported variables: ``YYSHIFTSTAG``, ``yyrecord``, ``negative`` (map + to the corresponding ``re2c:`` configurations), ``tag`` (tag variable + which needs to be shifted), ``offset`` (the number of code units to + shift). Conditionals: ``.nested`` (true if this is a nested tag -- in + this case its value may be equal to ``re2c:tags:negative``, and such a + value should not be shifted). + + ``code:yyshiftmtag`` + ``YYSHIFTMTAG`` statement, possibly specialized for different APIs. + Supported variables: ``YYSHIFTMTAG`` (maps to the corresponding + ``re2c:`` configuration), ``tag`` (tag variable which needs to be + shifted), ``offset`` (the number of code units to shift). + + ``code:yystagp`` + ``YYSTAGP`` statement, possibly specialized for different APIs. + Supported variables: ``YYSTAGP``, ``YYCURSOR``, ``yyrecord`` (map to the + corresponding ``re2c:`` configurations), ``tag`` (tag variable that + should be updated). + + ``code:yymtagp`` + ``YYMTAGP`` statement, possibly specialized for different APIs. + Supported variables: ``YYMTAGP`` (maps to the corresponding ``re2c:`` + configuration), ``tag`` (tag variable that should be updated). + + ``code:yystagn`` + ``YYSTAGN`` statement, possibly specialized for different APIs. + Supported variables: ``YYSTAGN``, ``negative``, ``yyrecord`` (map to the + corresponding ``re2c:`` configurations), ``tag`` (tag variable that + should be updated). + + ``code:yymtagn`` + ``YYMTAGN`` statement, possibly specialized for different APIs. + Supported variables: ``YYMTAGN`` (maps to the corresponding ``re2c:`` + configuration), ``tag`` (tag variable that should be updated). + + ``code:yycopystag`` + ``YYCOPYSTAG`` statement, possibly specialized for different APIs.
+ Supported variables: ``YYCOPYSTAG``, ``yyrecord`` (map to the + corresponding ``re2c:`` configurations), ``lhs``, ``rhs`` (left and + right hand side tag variables of the copy operation). + + ``code:yycopymtag`` + ``YYCOPYMTAG`` statement, possibly specialized for different APIs. + Supported variables: ``YYCOPYMTAG``, ``yyrecord`` (map to the + corresponding ``re2c:`` configurations), ``lhs``, ``rhs`` (left and + right hand side tag variables of the copy operation). + + ``code:yygetaccept`` + ``YYGETACCEPT`` statement, possibly specialized for different APIs. + Supported variables: ``YYGETACCEPT``, ``yyrecord`` (map to the + corresponding ``re2c:`` configurations), ``var`` (maps to + ``re2c:yyaccept`` configuration). + + ``code:yysetaccept`` + ``YYSETACCEPT`` statement, possibly specialized for different APIs. + Supported variables: ``YYSETACCEPT``, ``yyrecord`` (map to the + corresponding ``re2c:`` configurations), ``var`` (maps to + ``re2c:yyaccept`` configuration) and ``val`` (numeric value of the + accepted rule). + + ``code:yygetcond`` + ``YYGETCOND`` statement, possibly specialized for different APIs. + Supported variables: ``YYGETCOND``, ``yyrecord`` (map to the + corresponding ``re2c:`` configurations), ``var`` (maps to + ``re2c:yycond`` configuration). + + ``code:yysetcond`` + ``YYSETCOND`` statement, possibly specialized for different APIs. + Supported variables: ``YYSETCOND``, ``yyrecord`` (map to the + corresponding ``re2c:`` configurations), ``var`` (maps to + ``re2c:yycond`` configuration) and ``val`` (numeric condition + identifier). + + ``code:yygetstate`` + ``YYGETSTATE`` statement, possibly specialized for different APIs. + Supported variables: ``YYGETSTATE``, ``yyrecord`` (map to the + corresponding ``re2c:`` configurations), ``var`` (maps to + ``re2c:yystate`` configuration). + + ``code:yysetstate`` + ``YYSETSTATE`` statement, possibly specialized for different APIs. 
+ Supported variables: ``YYSETSTATE``, ``yyrecord`` (map to the + corresponding ``re2c:`` configurations), ``var`` (maps to + ``re2c:yystate`` configuration) and ``val`` (state number). + + ``code:yylessthan`` + ``YYLESSTHAN`` statement, possibly specialized for different APIs. + Supported variables: ``YYLESSTHAN``, ``YYCURSOR``, ``YYLIMIT``, + ``yyrecord`` (map to the corresponding ``re2c:`` configurations), + ``need`` (the number of code units to check against). Conditional: + ``.many`` (true if ``need`` is more than one). + + ``code:yybm_filter`` + Condition that is used to filter out ``yych`` values that are not + covered by the ``yybm`` table (used with the ``--bitmaps`` option). + Supported variable: ``yych`` (maps to ``re2c:yych`` configuration). + + ``code:yybm_match`` + The format of the ``yybm`` table check (generated with the ``--bitmaps`` + option). Supported variables: ``yybm``, ``yych`` (map to the + corresponding ``re2c:`` configurations), ``offset`` (offset in the + ``yybm`` table that needs to be added to ``yych``) and ``mask`` (bit + mask that should be applied to the table entry to retrieve the boolean + value that needs to be checked). + +Here's a list of all global variables that are allowed in syntax files: + + ``nl`` + A newline. + + ``indent`` + A variable that does not produce any code, but has a side effect of + increasing the indentation level. + + ``dedent`` + A variable that does not produce any code, but has a side effect of + decreasing the indentation level. + + ``topindent`` + Indentation string for the current statement. Indentation level is + tracked and automatically updated by the code generator. + +Here's a list of all global conditionals that are allowed in syntax files: + + ``.api.simple`` + True if the simple API is used (``--api simple`` or ``re2c:api = simple``). + + ``.api.generic`` + True if the generic API is used (``--api generic`` or + ``re2c:api = generic``).
+ + ``.api.record`` + True if the record API is used (``--api record`` or ``re2c:api = record``). + + ``.api_style.functions`` + True if the function-like API style is used + (``re2c:api-style = functions``). + + ``.api_style.freeform`` + True if the free-form API style is used (``re2c:api-style = free-form``). + + ``.case_ranges`` + True if the case ranges feature is enabled (``--case-ranges`` or + ``re2c:case-ranges = 1``). + + ``.code_model.goto_label`` + True if the code model based on goto/label is used (``--goto-label``). + + ``.code_model.loop_switch`` + True if the code model based on loop/switch is used (``--loop-switch``). + + ``.code_model.recursive_functions`` + True if the code model based on recursive functions is used + (``--recursive-functions``). + + ``.date`` + True if the generated fingerprint should contain the generation date. + + ``.loop_label`` + True if |re2c| generated loops must have a label (``re2c:label:yyloop`` + is set to a nonempty string). + + ``.monadic`` + True if the generated code should be monadic (``re2c:monadic = 1``). + This is only relevant for pure functional languages. + + ``.start_conditions`` + True if start conditions are enabled (``--start-conditions``). + + ``.storable_state`` + True if storable state is enabled (``--storable-state``). + + ``.unsafe`` + True if |re2c| should use "unsafe" blocks in order to generate faster + code (``--unsafe``, ``re2c:unsafe = 1``). This is only relevant for + languages that have an "unsafe" feature. + + ``.version`` + True if the generated fingerprint should contain the |re2c| version. +
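+To illustrate how local variables, local conditionals and global variables
+combine in practice, here is a minimal sketch of two code templates for a
+hypothetical C-like language. It is not copied from any syntax file shipped
+with |re2c|: the spacing, the ``=`` sign and the trailing semicolons in the
+string literals are assumptions made for this example, and only names listed
+above are used.
+
+    .. code-block:: cpp
+
+        code:var_local =
+            topindent type " " name (.init ? " = " init) ";" nl;
+
+        code:assign =
+            topindent lhs " = " rhs ";" nl;
+
+With definitions like these, a local variable with an initializer would be
+expected to render as ``type name = init;`` on its own line at the current
+indentation level, and the optional ``= init`` part would be omitted whenever
+the local conditional ``.init`` is false.
+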