-
Notifications
You must be signed in to change notification settings - Fork 156
Parsing Performance Tuning
In certain applications, where parsing performance is of the essence, you might want to increase the parsing speed of your grammar to the maximum possible with parboileds parsing engine. There are two basic approaches to optimizing parsing performance: Optimizing the parsing grammar itself and optimizing
parboileds “execution” of the grammar.
One good way of optimizing performance is to think about how parboileds recursive descent parser will match common input in your applications language and identify “problematic” areas.
Consider the following rule:
Rule SomeRule() {
return FirstOf(
Sequence(A(), X()),
Sequence(A(), B(), Y()),
Sequence(A(), B(), Z())
);
}
Even though this rule construct is perfectly legal it is not the most efficient for a recursive descent parser that does not perform match memoization (which parboiled not currently supports). It order to successfully match the third alternative of the FirstOf rule in this example parboiled will have to match rule A three times and rule B twice, since the first alternative will only fail after A has already been successfully matched and parboiled will have to backtrack to the input position before the A in order to try the second alternative (which in turn will only fail after A and B have been successfully matched).
Rewriting the rule to this construct:
Rule SomeRule() {
return Sequence(
A(),
FirstOf(
X(),
Sequence(
B(),
FirstOf(
Y(),
Z()
)
)
)
);
}
can speed up parsing performance considerably, especially if the third alternative is very common in the input and the rules A and B are rather complex. Of course, complete left-factoring in the way of this example is not always possible or convenient since it does require changing the rule structure in a way that might not conform to the semantic logic of the grammar or complicate action design. However, it is helpful to keep left-factoring in mind when designing a grammar, especially for the “costly” rules higher-up in your rule tree .
When optimizing a grammar for parsing performance it is also helpful to look at the statistical properties of the expected parsing input, especially with regard to the probability of occurrence of certain FirstOf alternatives.
Consider this simple example:
Rule SomeRule() {
return FirstOf(
A(),
B(),
C()
);
}
If you already know that alternative C is much more likely to match than alternatives A and B in most inputs you expect your parser to see you might want to reorder the alternatives to this:
Rule SomeRule() {
return FirstOf(
C(),
A(),
B()
);
}
This way parboiled will always try alternative C first and spend less time with trying and failing with the other two alternatives. However, this optimization is only safe when the alternatives being reordered are completely independent. If, for example, A, B, and C are defined to match a 4, 2 or 1 digit number as follows
Rule A() { return Sequence(Digit(), Digit(), Digit(), Digit()); }
Rule B() { return Sequence(Digit(), Digit()); }
Rule C() { return Digit(); }
you cannot simply move C first. If you did C would always match and A and B would never match, since C is a prefix of A and B.
Apart from optimizing the grammar itself for increased parsing performance you can also tweak the way in which parboiled matches the various rules of your grammar.
Since building a parse tree can be quite expensive (time- and memory-wise) it is better to not enable parse-tree building when parsing performance is important.
Many times it is not that easy to completely foresee, how a large and possibly complicated parser digests common input. Profiling common scenarios with The ProfilingParseRunner can yield valuable insights into which areas of your grammar are particularly stressed and where grammar optimization might be most beneficial.
Currently parboiled only features limited “packratting” support, which is the technique of memoizing the results of previously matched rules for certain input locations to curb the effect of excessive backtracking. However, since “dumb” memoization comes with significant time and memory costs itself its effects on parsing performance heavily depend on the actual grammar, input and implementation details. In-depth testing of various parboiled parsers has shown that the effects of memoization are rarely beneficial to bottom-line parsing speed in somewhat optimized, real-world grammars.
One memoization option that parboiled currently supports is the memoization of rule mismatches for consecutive applications of a rule at the same input location. This memoization can be implemented with very little overhead and can help reduce the number of re-mismatches as reported by The ProfilingParseRunner.
- Introduction
- ... Motivation
- ... Features
- ... Simple Java Example
- ... Simple Scala Example
- ... RegEx vs. parboiled vs. Parser Generators
- ... Projects using parboiled
- Installation
- Concepts
- ... The Big Picture
- ... The Rule Tree
- ... The Value Stack
- ... The Parse Tree
- ... AST Construction
- ... Parse Error Handling
- parboiled for Java
- ... Rule Construction in Java
- ... Parser Action Expressions
- ... Working with the Value Stack
- ... Action Variables
- ... Parser Extension in Detail
- ... Style Guide
- ... Java Examples
- ...... ABC Grammar
- ...... Calculators
- ...... Time Parser
- ...... Java Parser
- ...... Markdown processor
- parboiled for Scala
- ... Rule Construction in Scala
- ... Parser Actions in Scala
- ... Parser Testing in Scala
- ... Scala Examples
- ...... Simple Calculator
- ...... JSON Parser
- Advanced Topics
- ... Handling Whitespace
- ... Parsing Performance Tuning
- ... Indentation Based Grammars
- ... The ProfilingParseRunner
- ... Grammar and Parser Debugging
- ... Thread Safety
- Building parboiled
- parboiled-core Javadoc API
- parboiled-java Javadoc API
- parboiled-scala Scaladoc API
- Change Log
- Patch Policy