Version : 0.7.0
URL : http://github.com/sebastien/parsing
README : https://cdn.rawgit.com/sebastien/libparsing/master/README.html
libparsing
is a parsing element grammar (PEG) library written in C with
Python bindings. It offers decent performance while allowing for a
lot of flexibility. It is mainly intended to be used to create programming
languages and software engineering tools.
As opposed to more traditional parsing techniques, the grammar is not compiled but constructed using an API that allows dynamic update of the grammar.
The parser does not do any tokeninzation, the instead input stream is consumed and parsing elements are dynamically asked to match the next element of it. Once parsing elements match, the resulting matched input is processed and an action is triggered.
libparsing
supports the following features:
- backtracking, ie. going back in the input stream if a match is not found
- cherry-picking, ie. skipping unrecognized input
- contextual rules, ie. a rule that will match or not depending on external variables
Parsing elements are usually slower than compiled or FSM-based parsers as
they trade performance for flexibility. It's probably not a great idea to
use libparsing
if the parsing has to happen as fast as possible (ie. a protocol
implementation), but it is a great use for programming languages, as it
opens up the door to dynamic syntax plug-ins and multiple language
embedding.
If you're interested in PEG, you can start reading Brian Ford's original article. Projects such as PEG/LEG by Ian Piumarta http://piumarta.com/software/peg/ ,OMeta by Alessandro Warth http://www.tinlizzie.org/ometa/ or Haskell's Parsec library https://www.haskell.org/haskellwiki/Parsec are of particular interest in the field.
Here is a short example of what creating a simple grammar looks like in Python:
g = Grammar()
s = g.symbols
g.token("WS", "\s+")
g.token("NUMBER", "\d+(\.\d+)?")
g.token("VARIABLE", "\w+")
g.token("OPERATOR", "[\/\+\-\*]")
g.group("Value", s.NUMBER, s.VARIABLE)
g.rule("Suffix", s.OPERATOR._as("operator"), s.Value._as("value"))
g.rule("Expression", s.Value, s.Suffix.zeroOrMore())
g.axiom(s.Expression)
g.skip(s.WS)
match = g.parseString("10 + 20 / 5")
and the equivalent code in C
Grammar* g = Grammar_new()
SYMBOL(WS, TOKEN("\\s+"))
SYMBOL(NUMBER, TOKEN("\\d+(\\.\\d+)?"))
SYMBOL(VARIABLE, TOKEN("\\w+"))
SYMBOL(OPERATOR, GROUP("[\\/\\+\\-\\*]"))
SYMBOL(Value, GOUP(_S(NUMBER), _S(VARIABLE)))
SYMBOL(Suffix, RULE(_AS(_S(OPERATOR), "operator"), _AS(_S(Value), "value")))
SYMBOL(Expression, RULE(_S(Value), _MO(Suffix))
g->axiom = s_Expression;
g->skip(s_WS);
Grammar_prepare(g);
Match* match = Grammar_parseString(g, "10 + 20 / 5")
To install the Python parsing module:
easy_install libparsing # From Setuptools
pip install libparsing # From PIP
Note that for the above to work, you'll need a C compiler libffi-dev
and libpcre-dev
.
On Ubuntu, do sudo apt install build-essential libffi-dev libprcre-dev
.
To compile the C parsing module:
git clone http://github.com/sebastien/libparsing
cd libparsing
make
make install # You can set PREFIX
libparsing
works with GCC4 and Clang and is written following the c11
standard.
The parsing library is configured at compile-time to iterate on
specific elements of input, typically char
. You can redefine
the macro ITERATION_UNIT
to the type you'd like to iterate on.
By default, the ITERATION_UNIT
is a char
, which works both
for ASCII and UTF8. On the topic of Unicode/UTF8, the parsing
library only uses functions that are UTF8-savvy.
#ifndef ITERATION_UNIT
#define ITERATION_UNIT char
#endif
typedef ITERATION_UNIT iterated_t;
Input data is acquired through iterators. Iterators wrap an input source
(the default input is a FileInput
) and a move
callback that updates the
iterator's offset. The iterator will build a buffer of the acquired input
and maintain a pointer for the current offset within the data acquired from
the input stream.
You can get an iterator on a file by doing:
Iterator* iterator = Iterator_Open("example.txt");
typedef struct Iterator {
char status; // The status of the iterator, one of STATUS_{INIT|PROCESSING|INPUT_ENDED|ENDED}
char* buffer; // The buffer to the read data, note how it is a (void*) and not an `iterated_t`
iterated_t* current; // The pointer current offset within the buffer
iterated_t separator; // The character for line separator, `\n` by default.
size_t offset; // Offset in input (in bytes), might be different from `current - buffer` if some input was freed.
size_t lines; // Counter for lines that have been encountered
size_t capacity; // Content capacity (in bytes), might be bigger than the data acquired from the input
size_t available; // Available data in buffer (in bytes), always `<= capacity`
bool freeBuffer;
void* input; // Pointer to the input source
bool (*move) (struct Iterator*, int n); // Plug-in function to move to the previous/next positions
} Iterator;
The file input wraps information about the input file, such
as the FILE
object and the path
.
typedef struct FileInput {
FILE* file;
const char* path;
} FileInput;
The EOL character used to count lines in an iterator context.
extern iterated_t EOL;
Returns a new iterator instance with the given open file as input
Iterator* Iterator_Open(const char* path);
Returns a new iterator instance with the text
Iterator* Iterator_FromString(const char* text);
Iterator* Iterator_new(void);
void Iterator_free(Iterator* this);
Makes the given iterator open the file at the given path.
This will automatically assign a FileInput
to the iterator
as an input source.
bool Iterator_open( Iterator* this, const char *path );
Tells if the iterator has more available data. This means that there is available data after the current offset.
bool Iterator_hasMore( Iterator* this );
Returns the number of bytes available from the current iterator's position
up to the last available data. For dynamic streams, where the length is
unknown, this should be lesser or equalt to ITERATOR_BUFFER_AHEAD
.
size_t Iterator_remaining( Iterator* this );
Moves the iterator to the given offset
bool Iterator_moveTo ( Iterator* this, size_t offset );
bool String_move ( Iterator* this, int offset );
The number of iterated_t
that should be loaded after the iterator's
current position. This limits the numbers of iterated_t
that a Token
could match.
#define ITERATOR_BUFFER_AHEAD 64000
FileInput* FileInput_new(const char* path );
void FileInput_free(FileInput* this);
Preloads data from the input source so that the buffer has up to ITERATOR_BUFFER_AHEAD characters ahead.
size_t FileInput_preload( Iterator* this );
Advances/rewinds the given iterator, loading new data from the file input
whenever there is not ITERATOR_BUFFER_AHEAD
data elements
ahead of the iterator's current position.
bool FileInput_move ( Iterator* this, int n );
The Grammar
is the concrete definition of the language you're going to
parse. It is defined by an axiom
and input data that can be skipped,
such as white space.
The axiom
and skip
properties are both references to parsing elements.
typedef struct ParsingContext ParsingContext;
typedef struct ParsingElement ParsingElement;
typedef struct ParsingResult ParsingResult;
typedef struct Reference Reference;
typedef struct Match Match;
typedef void Element;
typedef struct Element { char type; // Type is used du differentiate ParsingElement from Reference int id; // The ID, assigned by the grammar, as the relative distance to the axiom } Element;
typedef struct Grammar {
ParsingElement* axiom; // The axiom
ParsingElement* skip; // The skipped element
int axiomCount; // The count of parsing elemetns in axiom
int skipCount; // The count of parsing elements in skip
Element** elements; // The set of all elements in the grammar
bool isVerbose;
} Grammar;
Grammar* Grammar_new(void);
void Grammar_free(Grammar* this);
void Grammar_prepare ( Grammar* this );
int Grammar_symbolsCount ( Grammar* this );
ParsingResult* Grammar_parseIterator( Grammar* this, Iterator* iterator );
ParsingResult* Grammar_parsePath( Grammar* this, const char* path );
ParsingResult* Grammar_parseString( Grammar* this, const char* text );
void Grammar_freeElements(Grammar* this);
typedef int (*WalkingCallback)(Element* this, int step, void* context);
int Element_walk( Element* this, WalkingCallback callback, void* context);
int Element__walk( Element* this, WalkingCallback callback, int step, void* context);
Parsing elements are the core elements that recognize and process input
data. There are 4 basic types: Work
, Token
, Group
and Rule
.
Parsing elements offer two main operations: recognize
and process
.
The recognize
method generates a Match
object (that might be the FAILURE
singleton if the data was not recognized). The process
method tranforms
corresponds to a user-defined action that transforms the Match
object
and returns the generated value.
Parsing element are assigned an id
that corresponds to their breadth-first distance
to the axiom. Before parsing, the grammar will re-assign the parsing element's
id accordingly.
typedef struct Match {
char status; // The status of the match (see STATUS_XXX)
size_t offset; // The offset of `iterated_t` matched
size_t length; // The number of `iterated_t` matched
Element* element;
ParsingContext* context;
void* data; // The matched data (usually a subset of the input stream)
struct Match* next; // A pointer to the next match (see `References`)
struct Match* children; // A pointer to the child match (see `References`)
void* result; // A pointer to the result of the match
} Match;
The different values for a match (or iterator)'s status
#define STATUS_INIT '-'
#define STATUS_PROCESSING '~'
#define STATUS_MATCHED 'M'
#define STATUS_SUCCESS 'S'
#define STATUS_PARTIAL 's'
#define STATUS_FAILED 'X'
#define STATUS_INPUT_ENDED '.'
#define STATUS_ENDED 'E'
#define TYPE_ELEMENT 'E'
#define TYPE_WORD 'W'
#define TYPE_TOKEN 'T'
#define TYPE_GROUP 'G'
#define TYPE_RULE 'R'
#define TYPE_CONDITION 'c'
#define TYPE_PROCEDURE 'p'
#define TYPE_REFERENCE '#'
A parsing element that is not bound to a grammar will have ID_UNBOUND by default.
#define ID_UNBOUND -10
A parsing element that being bound to a grammar (see Grammar_prepare
)
will have an id of ID_BINDING
temporarily.
#define ID_BINDING -1
A specific match that indicates a failure
extern Match FAILURE_S;
extern Match* FAILURE;
Creates new empty (successful) match
Match* Match_Empty(Element* element, ParsingContext* context);
Creates a new successful match of the given length
Match* Match_Success(size_t length, Element* element, ParsingContext* context);
Match* Match_new(void);
Frees the given match. If the match is FAILURE
, then it won't
be feed. This means that most of the times you won't need to free
a failed match, as it's likely to be the FAILURE
singleton.
void Match_free(Match* this);
bool Match_isSuccess(Match* this);
int Match_getOffset(Match* this);
int Match_getLength(Match* this);
int Match__walk(Match* this, WalkingCallback callback, int step, void* context );
int Match_countAll(Match* this);
typedef struct ParsingElement {
char type; // Type is used du differentiate ParsingElement from Reference
int id; // The ID, assigned by the grammar, as the relative distance to the axiom
const char* name; // The parsing element's name, for debugging
void* config; // The configuration of the parsing element
struct Reference* children; // The parsing element's children, if any
struct Match* (*recognize) (struct ParsingElement*, ParsingContext*);
struct Match* (*process) (struct ParsingElement*, ParsingContext*, Match*);
void (*freeMatch) (Match*);
} ParsingElement;
Tells if the given pointer is a pointer to a ParsingElement.
bool ParsingElement_Is(void *);
Creates a new parsing element and adds the given referenced parsing elements as children. Note that this is an internal constructor, and you should use the specialized versions instead.
ParsingElement* ParsingElement_new(Reference* children[]);
void ParsingElement_free(ParsingElement* this);
Adds a new reference as child of this parsing element. This will only
be effective for composite parsing elements such as Rule
or Token
.
ParsingElement* ParsingElement_add(ParsingElement *this, Reference *child);
ParsingElement* ParsingElement_clear(ParsingElement *this);
Returns the match for this parsing element for the given iterator's state. inline Match* ParsingElement_recognize( ParsingElement* this, ParsingContext* context );
Processes the given match once the parsing element has fully succeeded. This is where user-bound actions will be applied, and where you're most likely to do things such as construct an AST.
Match* ParsingElement_process( ParsingElement* this, Match* match );
Transparently sets the name of the element
ParsingElement* ParsingElement_name( ParsingElement* this, const char* name );
Words recognize a static string at the current iterator location.
The parsing element configuration information that is used by the
Token
methods.
typedef struct WordConfig {
const char* word;
size_t length;
} WordConfig;
ParsingElement* Word_new(const char* word);
void Word_free(ParsingElement* this);
The specialized match function for token parsing elements.
Match* Word_recognize(ParsingElement* this, ParsingContext* context);
const char* Word_word(ParsingElement* this);
const char* WordMatch_group(Match* match);
Tokens are regular expression based parsing elements. They do not have any children and test if the regular expression matches exactly at the iterator's current location.
The parsing element configuration information that is used by the
Token
methods.
typedef struct TokenConfig {
const char* expr;
#ifdef WITH_PCRE
pcre* regexp;
pcre_extra* extra;
#endif
} TokenConfig;
typedef struct TokenMatch {
int count;
const char** groups;
} TokenMatch;
Creates a new token with the given POSIX extended regular expression
ParsingElement* Token_new(const char* expr);
void Token_free(ParsingElement*);
The specialized match function for token parsing elements.
Match* Token_recognize(ParsingElement* this, ParsingContext* context);
const char* Token_expr(ParsingElement* this);
Frees the TokenMatch
created in Token_recognize
void TokenMatch_free(Match* match);
const char* TokenMatch_group(Match* match, int index);
int TokenMatch_count(Match* match);
We've seen that parsing elements can have children
. However, a parsing
element's children are not directly parsing elements but rather
parsing elements' Reference
s. This is why the ParsingElement_add
takes
a Reference
object as parameter.
References allow to share a single parsing element between many different
composite parsing elements, while decorating them with additional information
such as their cardinality (ONE
, OPTIONAL
, MANY
and MANY_OPTIONAL
)
and a name
that will allow process
actions to easily access specific
parts of the parsing element.
typedef struct Reference {
char type; // Set to Reference_T, to disambiguate with ParsingElement
int id; // The ID, assigned by the grammar, as the relative distance to the axiom
char cardinality; // Either ONE (default), OPTIONAL, MANY or MANY_OPTIONAL
const char* name; // The name of the reference (optional)
struct ParsingElement* element; // The reference to the parsing element
struct Reference* next; // The next child reference in the parsing elements
} Reference;
The different values for the Reference
cardinality.
#define CARDINALITY_OPTIONAL '?'
#define CARDINALITY_ONE '1'
#define CARDINALITY_MANY_OPTIONAL '*'
#define CARDINALITY_MANY '+'
Tells if the given pointer is a pointer to Reference
bool Reference_Is(void * this);
Ensures that the given element (or reference) is a reference.
Reference* Reference_Ensure(void* elementOrReference);
Returns a new reference wrapping the given parsing element
Reference* Reference_FromElement(ParsingElement* element);
References are typically owned by their single parent composite element.
Reference* Reference_new(void);
void Reference_free(Reference* this);
Sets the cardinality of this reference, returning it transprently.
Reference* Reference_cardinality(Reference* this, char cardinality);
Reference* Reference_name(Reference* this, const char* name);
bool Reference_hasNext(Reference* this);
bool Reference_hasElement(Reference* this);
int Reference__walk( Reference* this, WalkingCallback callback, int step, void* nothing );
Returns the matched value corresponding to the first match of this reference.
OPTIONAL
references might return EMPTY
, ONE
references will return
a match with a next=NULL
while MANY
may return a match with a next
pointing to the next match.
Match* Reference_recognize(Reference* this, ParsingContext* context);
Groups are composite parsing elements that will return the first matching reference's
match. Think of it as a logical or
.
ParsingElement* Group_new(Reference* children[]);
Match* Group_recognize(ParsingElement* this, ParsingContext* context);
Groups are composite parsing elements that only succeed if all their matching reference's.
ParsingElement* Rule_new(Reference* children[]);
Match* Rule_recognize(ParsingElement* this, ParsingContext* context);
Procedures are parsing elements that do not consume any input, always succeed and usually have a side effect, such as setting a variable in the parsing context.
typedef void (*ProcedureCallback)(ParsingElement* this, ParsingContext* context);
typedef void (*MatchCallback)(Match* m);
ParsingElement* Procedure_new(ProcedureCallback c);
Match* Procedure_recognize(ParsingElement* this, ParsingContext* context);
Conditions, like procedures, execute arbitrary code when executed, but they might return a FAILURE.
typedef Match* (*ConditionCallback)(ParsingElement*, ParsingContext*);
ParsingElement* Condition_new(ConditionCallback c);
Match* Condition_recognize(ParsingElement* this, ParsingContext* context);
The parsing itself is the process of taking a grammar
and applying it
to an input stream of data, represented by the iterator
.
The grammar's axiom
will be matched against the iterator
's current
position, and if necessary, the grammar's skip
parsing element
will be applied to advance the iterator.
typedef struct ParsingStep ParsingStep;
typedef struct ParsingOffset ParsingOffset;
typedef struct ParsingStats {
size_t bytesRead;
double parseTime;
size_t symbolsCount;
size_t* successBySymbol;
size_t* failureBySymbol;
size_t failureOffset; // A reference to the deepest failure
size_t matchOffset;
size_t matchLength;
Element* failureElement; // A reference to the failure element
} ParsingStats;
ParsingStats* ParsingStats_new(void);
void ParsingStats_free(ParsingStats* this);
void ParsingStats_setSymbolsCount(ParsingStats* this, size_t t);
Match* ParsingStats_registerMatch(ParsingStats* this, Element* e, Match* m);
typedef struct ParsingContext {
struct Grammar* grammar; // The grammar used to parse
struct Iterator* iterator; // Iterator on the input data
struct ParsingOffset* offsets; // The parsing offsets, starting at 0
struct ParsingOffset* current; // The current parsing offset
struct ParsingStats* stats;
} ParsingContext;
ParsingContext* ParsingContext_new( Grammar* g, Iterator* iterator );
iterated_t* ParsingContext_text( ParsingContext* this );
void ParsingContext_free( ParsingContext* this );
typedef struct ParsingResult {
char status;
Match* match;
ParsingContext* context;
} ParsingResult;
ParsingResult* ParsingResult_new(Match* match, ParsingContext* context);
Frees this parsing result instance as well as all the matches it referes to.
void ParsingResult_free(ParsingResult* this);
bool ParsingResult_isFailure(ParsingResult* this);
bool ParsingResult_isPartial(ParsingResult* this);
bool ParsingResult_isComplete(ParsingResult* this);
iterated_t* ParsingResult_text(ParsingResult* this);
int ParsingResult_textOffset(ParsingResult* this);
size_t ParsingResult_remaining(ParsingResult* this);
The result of recognizing parsing elements at given offsets within the
input stream is stored in ParsingOffset
. Each parsing offset is a stack
of ParsingStep
, corresponding to successive attempts at matching
parsing elements at the current position.
The parsing offset is a stack of parsing steps, where the tail is the most specific parsing step. By following the tail's previous parsing step, you can unwind the stack.
The parsing steps each have an offset within the iterated stream. Offsets where data has been fully extracted (ie, a leaf parsing element has matched and processing returned a NOTHING) can be freed as they are not necessary any more.
typedef struct ParsingOffset {
size_t offset; // The offset matched in the input stream
ParsingStep* last; // The last matched parsing step (ie. corresponding to the most specialized parsing element)
struct ParsingOffset* next; // The link to the next offset (if any)
} ParsingOffset;
ParsingOffset* ParsingOffset_new( size_t offset );
void ParsingOffset_free( ParsingOffset* this );
The parsing step allows to memoize the state of a parsing element at a given offset. This is the data structure that will be manipulated and created/destroyed the most during the parsing process.
typedef struct ParsingStep {
ParsingElement* element; // The parsing elemnt we're matching
char step; // The step corresponds to current child's index (0 for token/word)
unsigned int iteration; // The current iteration (on the step)
char status; // Match status `STATUS_{INIT|PROCESSING|FAILED}`
Match* match; // The corresponding match, if any.
struct ParsingStep* previous; // The previous parsing step on the parsing offset's stack
} ParsingStep;
ParsingStep* ParsingStep_new( ParsingElement* element );
void ParsingStep_free( ParsingStep* this );
typedef struct Processor Processor;
typedef void (*ProcessorCallback)(Processor* processor, Match* match);
typedef struct Processor {
ProcessorCallback fallback;
ProcessorCallback* callbacks;
int callbacksCount;
} Processor;
Processor* Processor_new( );
void Processor_free(Processor* this);
void Processor_register (Processor* this, int symbolID, ProcessorCallback callback) ;
int Processor_process (Processor* this, Match* match, int step);
void Utilities_indent( ParsingElement* this, ParsingContext* context );
void Utilities_dedent( ParsingElement* this, ParsingContext* context );
Match* Utilites_checkIndent( ParsingElement *this, ParsingContext* context );
The parsing library provides a set of macros that make defining grammars a much easier task. A grammar is usually defined in the following way:
- leaf symbols (words & tokens) are defined ;
- compound symbolds (rules & groups) are defined.
Let's take as simple grammar and define it with the straight API:
// Leaf symbols
ParsingElement* s_NUMBER = Token_new("\\d+");
ParsingElement* s_VARIABLE = Token_new("\\w+");
ParsingElement* s_OPERATOR = Token_new("[\\+\\-\\*\\/]");
// We also attach names to the symbols so that debugging will be easier
ParsingElement_name(s_NUMBER, "NUMBER");
ParsingElement_name(s_VARIABLE, "VARIABLE");
ParsingElement_name(s_OPERATOR, "OPERATOR");
// Now we defined the compound symbols
ParsingElement* s_Value = Group_new((Reference*[3]),{
Reference_cardinality(Reference_Ensure(s_NUMBER), CARDINALITY_ONE),
Reference_cardinality(Reference_Ensure(s_VARIABLE), CARDINALITY_ONE)
NULL
});
ParsingElement* s_Suffix = Rule_new((Reference*[3]),{
Reference_cardinality(Reference_Ensure(s_OPERATOR), CARDINALITY_ONE),
Reference_cardinality(Reference_Ensure(s_Value), CARDINALITY_ONE)
NULL
});
* ParsingElement* s_Expr = Rule_new((Reference*[3]),{
Reference_cardinality(Reference_Ensure(s_Value), CARDINALITY_ONE),
Reference_cardinality(Reference_Ensure(s_Suffix), CARDINALITY_MANY_OPTIONAL)
NULL
});
// We define the names as well
ParsingElement_name(s_Value, "Value");
ParsingElement_name(s_Suffix, "Suffix");
ParsingElement_name(s_Expr, "Expr");
As you can see, this is quite verbose and makes reading the grammar declaration a difficult task. Let's introduce a set of macros that will make expressing grammars much easier.
Declares a symbol of name n
as being parsing element e
.
#define SYMBOL(n,e) ParsingElement* s_ ## n = ParsingElement_name(e, #n);
Creates a Word
parsing element with the given regular expression
#define WORD(v) Word_new(v)
Creates a Token
parsing element with the given regular expression
#define TOKEN(v) Token_new(v)
Creates a Rule
parsing element with the references or parsing elements
as children.
#define RULE(...) Rule_new((Reference*[(VA_ARGS_COUNT(__VA_ARGS__)+1)]){__VA_ARGS__,NULL})
Creates a Group
parsing element with the references or parsing elements
as children.
#define GROUP(...) Group_new((Reference*[(VA_ARGS_COUNT(__VA_ARGS__)+1)]){__VA_ARGS__,NULL})
Creates a Procedure
parsing element
#define PROCEDURE(f) Procedure_new(f)
Creates a Condition
parsing element
#define CONDITION(f) Condition_new(f)
Refers to symbol n
, wrapping it in a CARDINALITY_ONE
reference
#define _S(n) ONE(s_ ## n)
Refers to symbol n
, wrapping it in a CARDINALITY_OPTIONAL
reference
#define _O(n) OPTIONAL(s_ ## n)
Refers to symbol n
, wrapping it in a CARDINALITY_MANY
reference
#define _M(n) MANY(s_ ## n)
Refers to symbol n
, wrapping it in a CARDINALITY_MANY_OPTIONAL
reference
#define _MO(n) MANY_OPTIONAL(s_ ## n)
Sets the name of reference r
to be v
#define _AS(r,v) Reference_name(Reference_Ensure(r), v)
The following set of macros is mostly used by the set of macros above. You probably won't need to use them directly.
Sets the name of the given parsing element e
to be the name n
.
#define NAME(n,e) ParsingElement_name(e,n)
Sets the given reference or parsing element's reference to CARDINALITY_ONE If a parsing element is given, it will be automatically wrapped in a reference.
#define ONE(v) Reference_cardinality(Reference_Ensure(v), CARDINALITY_ONE)
Sets the given reference or parsing element's reference to CARDINALITY_OPTIONAL If a parsing element is given, it will be automatically wrapped in a reference.
#define OPTIONAL(v) Reference_cardinality(Reference_Ensure(v), CARDINALITY_OPTIONAL)
Sets the given reference or parsing element's reference to CARDINALITY_MANY If a parsing element is given, it will be automatically wrapped in a reference.
#define MANY(v) Reference_cardinality(Reference_Ensure(v), CARDINALITY_MANY)
Sets the given reference or parsing element's reference to CARDINALITY_MANY_OPTIONAL If a parsing element is given, it will be automatically wrapped in a reference.
#define MANY_OPTIONAL(v) Reference_cardinality(Reference_Ensure(v), CARDINALITY_MANY_OPTIONAL)
The same grammar that we defined previously can now be expressed in the following way:
SYMBOL(NUMBER, TOKEN("\\d+"))
SYMBOL(VAR, TOKEN("\\w+"))
SYMBOL(OPERATOR, TOKEN("[\\+\\-\\*\\/]"))
SYMBOL(Value, GROUP( _S(NUMBER), _S(VAR) ))
SYMBOL(Suffix, RULE( _S(OPERATOR), _S(Value) ))
SYMBOL(Expr, RULE( _S(Value), _MO(Suffix) ))
All symbols will be define as s_XXX
, so that you can do:
ParsingGrammar* g = Grammar_new();
g->axiom = s_Expr;
Revised BSD License Copyright (c) 2014, FFunction inc (1165373771 Quebec inc) All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. Neither the name of the FFunction inc (CANADA) nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.