Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize redundant fail checks #118

Merged
merged 4 commits into from
Jun 1, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -90,6 +90,8 @@ Released: TBD

By default this new option contains an array with [reserved JavaScript words][reserved]
[@Mingun](https://github.com/peggyjs/peggy/pull/150)
- Several optimizations in the generator. Generated parsers should now be faster and smaller
[@Mingun](https://github.com/peggyjs/peggy/pull/118)

[reserved]: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Lexical_grammar#reserved_keywords_as_of_ecmascript_2015

Expand Down
16 changes: 9 additions & 7 deletions lib/compiler/index.js
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@

const generateBytecode = require("./passes/generate-bytecode");
const generateJS = require("./passes/generate-js");
const inferenceMatchResult = require("./passes/inference-match-result");
const removeProxyRules = require("./passes/remove-proxy-rules");
const reportDuplicateLabels = require("./passes/report-duplicate-labels");
const reportDuplicateRules = require("./passes/report-duplicate-rules");
Expand Down Expand Up @@ -38,21 +39,22 @@ const compiler = {
// or modify it as needed. If the pass encounters a semantic error, it throws
// |peg.GrammarError|.
passes: {
check: {
check: [
reportUndefinedRules,
reportDuplicateRules,
reportDuplicateLabels,
reportInfiniteRecursion,
reportInfiniteRepetition,
reportIncorrectPlucking
},
transform: {
removeProxyRules
},
generate: {
],
transform: [
removeProxyRules,
inferenceMatchResult,
],
generate: [
generateBytecode,
generateJS
}
]
},

// Generates a parser from a specified grammar AST. Throws |peg.GrammarError|
Expand Down
127 changes: 100 additions & 27 deletions lib/compiler/passes/generate-bytecode.js
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@
const asts = require("../asts");
const op = require("../opcodes");
const visitor = require("../visitor");
const { ALWAYS_MATCH, SOMETIMES_MATCH, NEVER_MATCH } = require("./inference-match-result");

// Generates bytecode.
//
Expand Down Expand Up @@ -195,6 +196,27 @@ const visitor = require("../visitor");
// [29] SILENT_FAILS_OFF
//
// silentFails--;
//
// This pass can use the results of other previous passes, each of which can
// change the AST (and, as consequence, the bytecode).
//
// In particular, if the pass |inferenceMatchResult| has been run before this pass,
// then each AST node will contain a |match| property, which represents a possible
// match result of the node:
// - `<0` - node is never matched, for example, `!('a'*)` (negation of the always
// matched node). Generator can put |FAILED| to the stack immediately
// - `=0` - sometimes node matched, sometimes not. This is the same behavior
// when |match| is missed
// - `>0` - node is always matched, for example, `'a'*` (because result is an
// empty array, or an array with some elements). The generator does not
// need to add a check for |FAILED|, because it is impossible
//
// To handle the situation, when the |inferenceMatchResult| has not run (that
// happens, for example, in tests), the |match| value extracted using the
// `|0` trick, which performing cast of any value to an integer with value `0`
// that is equivalent of an unknown match result and signals the generator that
// runtime check for the |FAILED| is required. Trick is explained on the
// Wikipedia page (https://en.wikipedia.org/wiki/Asm.js#Code_generation)
function generateBytecode(ast) {
const literals = [];
const classes = [];
Expand Down Expand Up @@ -248,7 +270,10 @@ function generateBytecode(ast) {
return first.concat(...args);
}

function buildCondition(condCode, thenCode, elseCode) {
function buildCondition(match, condCode, thenCode, elseCode) {
if (match === ALWAYS_MATCH) { return thenCode; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This reads a lot easier. thanks for that.

if (match === NEVER_MATCH) { return elseCode; }

return condCode.concat(
[thenCode.length, elseCode.length],
thenCode,
Expand All @@ -267,6 +292,8 @@ function generateBytecode(ast) {
}

function buildSimplePredicate(expression, negative, context) {
const match = expression.match | 0;
Mingun marked this conversation as resolved.
Show resolved Hide resolved

return buildSequence(
[op.PUSH_CURR_POS],
[op.SILENT_FAILS_ON],
Expand All @@ -277,6 +304,7 @@ function generateBytecode(ast) {
}),
[op.SILENT_FAILS_OFF],
buildCondition(
negative ? -match : match,
Mingun marked this conversation as resolved.
Show resolved Hide resolved
[negative ? op.IF_ERROR : op.IF_NOT_ERROR],
buildSequence(
[op.POP],
Expand All @@ -292,15 +320,16 @@ function generateBytecode(ast) {
);
}

function buildSemanticPredicate(code, negative, context) {
function buildSemanticPredicate(node, negative, context) {
const functionIndex = addFunctionConst(
true, Object.keys(context.env), code
true, Object.keys(context.env), node.code
);

return buildSequence(
[op.UPDATE_SAVED_POS],
buildCall(functionIndex, 0, context.env, context.sp),
buildCondition(
node.match | 0,
Mingun marked this conversation as resolved.
Show resolved Hide resolved
[op.IF],
buildSequence(
[op.POP],
Expand Down Expand Up @@ -341,7 +370,11 @@ function generateBytecode(ast) {
},

named(node, context) {
const nameIndex = addExpectedConst({ type: "rule", value: node.name });
const match = node.match | 0;
Mingun marked this conversation as resolved.
Show resolved Hide resolved
// Expectation not required if node always fail
const nameIndex = match === NEVER_MATCH ? null : addExpectedConst(
{ type: "rule", value: node.name }
);

// The code generated below is slightly suboptimal because |FAIL| pushes
// to the stack, so we need to stick a |POP| in front of it. We lack a
Expand All @@ -351,20 +384,34 @@ function generateBytecode(ast) {
[op.SILENT_FAILS_ON],
generate(node.expression, context),
[op.SILENT_FAILS_OFF],
buildCondition([op.IF_ERROR], [op.FAIL, nameIndex], [])
buildCondition(match, [op.IF_ERROR], [op.FAIL, nameIndex], [])
);
},

choice(node, context) {
function buildAlternativesCode(alternatives, context) {
const match = alternatives[0].match | 0;
const first = generate(alternatives[0], {
sp: context.sp,
env: cloneEnv(context.env),
action: null
});
// If an alternative always match, no need to generate code for the next
// alternatives. Because their will never tried to match, any side-effects
// from next alternatives is impossible so we can skip their generation
if (match === ALWAYS_MATCH) {
return first;
}

// Even if an alternative never match it can have side-effects from
// a semantic predicates or an actions, so we can not skip generation
// of the first alternative.
// We can do that when analysis for possible side-effects will be introduced
return buildSequence(
generate(alternatives[0], {
sp: context.sp,
env: cloneEnv(context.env),
action: null
}),
first,
alternatives.length > 1
? buildCondition(
SOMETIMES_MATCH,
[op.IF_ERROR],
buildSequence(
[op.POP],
Expand All @@ -382,21 +429,24 @@ function generateBytecode(ast) {
action(node, context) {
const env = cloneEnv(context.env);
const emitCall = node.expression.type !== "sequence"
|| node.expression.elements.length === 0;
|| node.expression.elements.length === 0;
const expressionCode = generate(node.expression, {
sp: context.sp + (emitCall ? 1 : 0),
env,
action: node
});
const functionIndex = addFunctionConst(
false, Object.keys(env), node.code
);
const match = node.expression.match | 0;
// Function only required if expression can match
const functionIndex = emitCall && match !== NEVER_MATCH
? addFunctionConst(false, Object.keys(env), node.code)
: null;

return emitCall
? buildSequence(
[op.PUSH_CURR_POS],
expressionCode,
buildCondition(
match,
[op.IF_NOT_ERROR],
buildSequence(
[op.LOAD_SAVED_POS, 1],
Expand All @@ -412,8 +462,7 @@ function generateBytecode(ast) {
sequence(node, context) {
function buildElementsCode(elements, context) {
if (elements.length > 0) {
const processedCount
= node.elements.length - elements.slice(1).length;
const processedCount = node.elements.length - elements.length + 1;

return buildSequence(
generate(elements[0], {
Expand All @@ -423,6 +472,7 @@ function generateBytecode(ast) {
action: null
}),
buildCondition(
elements[0].match | 0,
Mingun marked this conversation as resolved.
Show resolved Hide resolved
[op.IF_NOT_ERROR],
buildElementsCode(elements.slice(1), {
sp: context.sp + 1,
Expand Down Expand Up @@ -508,6 +558,7 @@ function generateBytecode(ast) {
action: null
}),
buildCondition(
node.match | 0,
Mingun marked this conversation as resolved.
Show resolved Hide resolved
[op.IF_NOT_ERROR],
buildSequence([op.POP], [op.TEXT]),
[op.NIP]
Expand All @@ -531,6 +582,10 @@ function generateBytecode(ast) {
action: null
}),
buildCondition(
// Check expression match, not the node match
// If expression always match, no need to replace FAILED to NULL,
// because FAILED will never appeared
-(node.expression.match | 0),
Mingun marked this conversation as resolved.
Show resolved Hide resolved
[op.IF_ERROR],
buildSequence([op.POP], [op.PUSH_NULL]),
[]
Expand Down Expand Up @@ -564,6 +619,8 @@ function generateBytecode(ast) {
[op.PUSH_EMPTY_ARRAY],
expressionCode,
buildCondition(
// Condition depends on the expression match, not the node match
node.expression.match | 0,
Mingun marked this conversation as resolved.
Show resolved Hide resolved
[op.IF_NOT_ERROR],
buildSequence(buildAppendLoop(expressionCode), [op.POP]),
buildSequence([op.POP], [op.POP], [op.PUSH_FAILED])
Expand All @@ -580,11 +637,11 @@ function generateBytecode(ast) {
},

semantic_and(node, context) {
return buildSemanticPredicate(node.code, false, context);
return buildSemanticPredicate(node, false, context);
Mingun marked this conversation as resolved.
Show resolved Hide resolved
},

semantic_not(node, context) {
return buildSemanticPredicate(node.code, true, context);
return buildSemanticPredicate(node, true, context);
},

rule_ref(node) {
Expand All @@ -593,19 +650,26 @@ function generateBytecode(ast) {

literal(node) {
if (node.value.length > 0) {
const stringIndex = addLiteralConst(
const match = node.match | 0;
// String only required if condition is generated or string is
// case-sensitive and node always match
const needConst = match === SOMETIMES_MATCH
|| (match === ALWAYS_MATCH && !node.ignoreCase);
const stringIndex = needConst ? addLiteralConst(
node.ignoreCase ? node.value.toLowerCase() : node.value
);
const expectedIndex = addExpectedConst({
) : null;
// Expectation not required if node always match
const expectedIndex = match !== ALWAYS_MATCH ? addExpectedConst({
type: "literal",
value: node.value,
ignoreCase: node.ignoreCase
});
}) : null;

// For case-sensitive strings the value must match the beginning of the
// remaining input exactly. As a result, we can use |ACCEPT_STRING| and
// save one |substr| call that would be needed if we used |ACCEPT_N|.
return buildCondition(
match,
node.ignoreCase
? [op.MATCH_STRING_IC, stringIndex]
: [op.MATCH_STRING, stringIndex],
Expand All @@ -620,25 +684,34 @@ function generateBytecode(ast) {
},

class(node) {
const classIndex = addClassConst(node);
const expectedIndex = addExpectedConst({
const match = node.match | 0;
Mingun marked this conversation as resolved.
Show resolved Hide resolved
// Character class constant only required if condition is generated
const classIndex = match === SOMETIMES_MATCH ? addClassConst(node) : null;
// Expectation not required if node always match
const expectedIndex = match !== ALWAYS_MATCH ? addExpectedConst({
type: "class",
value: node.parts,
inverted: node.inverted,
ignoreCase: node.ignoreCase
});
}) : null;

return buildCondition(
match,
[op.MATCH_CHAR_CLASS, classIndex],
[op.ACCEPT_N, 1],
[op.FAIL, expectedIndex]
);
},

any() {
const expectedIndex = addExpectedConst({ type: "any" });
any(node) {
const match = node.match | 0;
// Expectation not required if node always match
const expectedIndex = match !== ALWAYS_MATCH ? addExpectedConst({
type: "any"
}) : null;

return buildCondition(
match,
[op.MATCH_ANY],
[op.ACCEPT_N, 1],
[op.FAIL, expectedIndex]
Expand Down
Loading