Ambiguity with recursive rules (is this left-recursion?) #917

HoldYourWaffle · 2019-03-10T14:07:16Z

After experimenting in the playground for a bit I decided that this library seemed like the perfect fit for my project. In case you're curious, I'm trying to create a DSL for REST-api generation.

All was well until I tried adding grammar for generic types. In case you need a memory refresher, generics basically have a structure like this:

TypeA<TypeB>

Looks pretty simple and harmless doesn't it? Well what if TypeB is a generic type too?

TypeA<TypeB<TypeC>>

We get a recursive structure. I expressed this in my parser like this:

class MyParser extends Parser {

	private typeReference = this.RULE('typeReference', () => {
		this.CONSUME(Lexer.NamespacedIdentifier); //type name
		
		this.OPTION1(() => this.SUBRULE(this.genericsDefinition)); //optional generic type
		this.OPTION2(() => { //optional array type
			this.CONSUME(Lexer.LSquare);
			this.CONSUME(Lexer.RSquare);
		});
	});
	
	private genericsDefinition = this.RULE('genericsDefinition', () => {
		this.CONSUME(Lexer.LBeak);
		this.AT_LEAST_ONE_SEP({
			SEP: Lexer.Comma,
			DEF: () => this.SUBRULE(this.typeReference)
		});
		this.CONSUME(Lexer.RBeak);
	});

}

This worked fine for me, until I tried to use a typeReference in an OR-rule (full code + error below). I tried increasing maxLookahead, but that didn't help.
I'm not sure if this is left recursion (I know it's unsupported), the fact that it works just fine when it's not wrapped in an OR-rule makes me think it isn't. If it is left-recursion then the warning about it doesn't work.

So my question is: how do I express this grammar in chevrotain? Is it impossible and should I drop support for generics-in-generics? Did I increase my maxLookahead not far enough (I tried till 10)?

Thanks in advance!
(if you need anymore information feel free to ask)

Output

Error: Parser Definition Errors detected:
 Ambiguous alternatives: <1 ,3> in <OR> inside <handlerReturnDeclaration> Rule,
<NamespacedIdentifier, '<', NamespacedIdentifier, '<'> may appears as a prefix path in all these alternatives.To Resolve this, try one of of the following:
1. Refactor your grammar to be LL(K) for the current value of k (by default k=4})
2. Increase the value of K for your grammar by providing a larger 'maxLookahead' value in the parser's config
3. This issue can be ignored (if you know what you are doing...), see https://sap.github.io/chevrotain/documentation/4_2_0/interfaces/iparserconfig.html#ignoredissues for more details

-------------------------------
Ambiguous alternatives: <1 ,3> in <OR> inside <handlerReturnDeclaration> Rule,
<NamespacedIdentifier, '<', NamespacedIdentifier, '['> may appears as a prefix path in all these alternatives.To Resolve this, try one of of the following:
1. Refactor your grammar to be LL(K) for the current value of k (by default k=4})
2. Increase the value of K for your grammar by providing a larger 'maxLookahead' value in the parser's config
3. This issue can be ignored (if you know what you are doing...), see https://sap.github.io/chevrotain/documentation/4_2_0/interfaces/iparserconfig.html#ignoredissues for more details

-------------------------------
Ambiguous alternatives: <1 ,3> in <OR> inside <handlerReturnDeclaration> Rule,
<NamespacedIdentifier, '<', NamespacedIdentifier, ','> may appears as a prefix path in all these alternatives.To Resolve this, try one of of the following:
1. Refactor your grammar to be LL(K) for the current value of k (by default k=4})
2. Increase the value of K for your grammar by providing a larger 'maxLookahead' value in the parser's config
3. This issue can be ignored (if you know what you are doing...), see https://sap.github.io/chevrotain/documentation/4_2_0/interfaces/iparserconfig.html#ignoredissues for more details

-------------------------------
Ambiguous alternatives: <1 ,3> in <OR> inside <handlerReturnDeclaration> Rule,
<NamespacedIdentifier, '<', NamespacedIdentifier, '>'> may appears as a prefix path in all these alternatives.To Resolve this, try one of of the following:
1. Refactor your grammar to be LL(K) for the current value of k (by default k=4})
2. Increase the value of K for your grammar by providing a larger 'maxLookahead' value in the parser's configy providing a larger 'maxLookahead' value in t you are doing...), see https://sap.github.io/chevrotain/documenthe parser's config                          noredissues for more details
3. This issue can be ignored (if you know what you are doing...), see https://sap.github.i:\VOLATILE\workspace-2\TSAG\node_modules\chevrotain\lib\src\parseo/chevrotain/documentation/4_2_0/interfaces/iparserconfig.html#ignoredissues for more details

Lexer

import { Lexer, createToken } from 'chevrotain'

function ucFirst(str: string) {
	return str.charAt(0).toUpperCase() + str.slice(1).toLowerCase();
}

function escapeRegex(str: string) {
	return str.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
}

function createKeywordToken(word: string) {
	return createToken({ name: `Keyword${ucFirst(word)}`, pattern: new RegExp(`${word}`), longer_alt: Identifier });
}

function createCharacterToken(name: string, character: string) {
	const token = createToken({ name: name, pattern: new RegExp(escapeRegex(character)) });
	token.LABEL = `'${character}'`;
	return token;
}

// --- Character tokens ---
export const LRound = createCharacterToken('LRound', '(');
export const RRound = createCharacterToken('RRound', ')');

export const LCurly = createCharacterToken('LCurly', '{');
export const RCurly = createCharacterToken('RCurly', '}');

export const LSquare = createCharacterToken('LSquare', '[');
export const RSquare = createCharacterToken('RSquare', ']');

export const LBeak = createCharacterToken('LBeak', '<');
export const RBeak = createCharacterToken('RBeak', '>');

export const Dot = createCharacterToken('Dot', '.');
export const Comma = createCharacterToken('Comma', ',');
export const Colon = createCharacterToken('Colon', ':');
export const Semicolon = createCharacterToken('Semicolon', ';');
export const Equals = createCharacterToken('Equals', '=');

// --- Literals ---
export const NumberLiteral = createToken({ name: 'NumberLiteral', pattern: /0|[1-9]\d*/ });
export const StringLiteral = createToken({ name: 'StringLiteral', pattern: /'(?:[^\\']|\\(?:[bfnrtv'\\/]|u[0-9a-fA-F]{4}))*'/ });
export const BooleanLiteral = createToken({ name: 'BooleanLiteral', pattern: /(true|false)/ });

// --- Indentifiers ---
export const MethodIdentifier = createToken({ name: 'MethodIdentifier', pattern: /(GET|POST|PUT|PATCH|DELETE)/ });
export const Identifier = createToken({ name: 'Identifier', pattern: /[a-zA-Z]\w*/ });
export const NamespacedIdentifier = createToken({ name: 'NamespacedIdentifier', pattern: /[a-zA-Z][\w\.]*/ });

// --- Keywords ---
export const KeywordWith = createKeywordToken('with');
export const KeywordStatus_code = createKeywordToken('status_code');
export const KeywordRequires = createKeywordToken('requires');

// --- Ignored stuff ---
const WhiteSpace = createToken({
	name: 'WhiteSpace',
	pattern: /\s+/,
	group: Lexer.SKIPPED
});

// --- Initialise lexer ---
export const tokens = [
	WhiteSpace,
	LRound, RRound, LCurly, RCurly, LSquare, RSquare, LBeak, RBeak,
	Dot, Comma, Colon, Semicolon, Equals,
	
	KeywordWith, KeywordStatus_code, KeywordRequires,
	
	NumberLiteral, StringLiteral, BooleanLiteral,
	MethodIdentifier, NamespacedIdentifier, Identifier
];

export const MyLexer = new Lexer(tokens);

Parser

import { Parser } from 'chevrotain'
import * as Lexer from './lexer-redacted'

export default class MyParser extends Parser {
	constructor() {
		super(Lexer.tokens);
		this.performSelfAnalysis();
	}
	
	public program = this.RULE('program', () => {
		this.MANY2(() => {
			this.SUBRULE(this.specialFunctionRequireClause)
			this.CONSUME(Lexer.Semicolon);
		});
		this.AT_LEAST_ONE(() => this.SUBRULE(this.handlerDeclaration));
	});
	
	
	// ---------- MISC ----------
	private typeReference = this.RULE('typeReference', () => {
		this.CONSUME(Lexer.NamespacedIdentifier);
		
		this.OPTION1(() => this.SUBRULE(this.genericsDefinition));
		this.OPTION2(() => { //array
			this.CONSUME(Lexer.LSquare);
			this.CONSUME(Lexer.RSquare);
		});
	});
	
	private genericsDefinition = this.RULE('genericsDefinition', () => {
		this.CONSUME(Lexer.LBeak);
		this.AT_LEAST_ONE_SEP({
			SEP: Lexer.Comma,
			DEF: () => this.SUBRULE(this.typeReference)
		});
		this.CONSUME(Lexer.RBeak);
	});
	
	
	// ---------- SPECIAL FUNCTIONS ----------
	private specialFunctionCall = this.RULE('specialFunctionCall', () => {
		this.SUBRULE(this.typeReference);
		this.CONSUME(Lexer.LRound);
		this.MANY_SEP({
			SEP: Lexer.Comma,
			DEF: () => this.SUBRULE(this.specialFunctionParameter)
		});
		this.CONSUME(Lexer.RRound);
	});
	
	private specialFunctionParameter = this.RULE('specialFunctionParameter', () => {
		this.OR([
			{ ALT: () => this.CONSUME(Lexer.StringLiteral) },
			{ ALT: () => this.CONSUME(Lexer.NumberLiteral) },
			{ ALT: () => this.CONSUME(Lexer.BooleanLiteral) }
		]);
	});
	
	
	// ---------- HANDLERS ----------
	private handlerDeclaration = this.RULE('handlerDeclaration', () => {
		this.CONSUME(Lexer.MethodIdentifier);
		this.CONSUME(Lexer.Equals);
		this.CONSUME(Lexer.Identifier);
		
		this.CONSUME(Lexer.LRound);
		this.MANY_SEP({
			SEP: Lexer.Comma,
			DEF: () => this.SUBRULE(this.handlerParameterDeclaration)
		})
		this.CONSUME(Lexer.RRound);
		
		this.OPTION1(this.handlerReturnDeclaration);
		this.OPTION2(this.specialFunctionRequireClause);
		
		this.CONSUME(Lexer.Semicolon);
	});
	
	private handlerParameterDeclaration = this.RULE('handlerParameterDeclaration', () => {
		this.OR([
			{ ALT: () => { //explicit
				this.CONSUME1(Lexer.Identifier);
				this.CONSUME(Lexer.Colon);
				this.SUBRULE(this.typeReference);
			}},
			
			{ ALT: () => this.CONSUME2(Lexer.Identifier)}, //alias
		]);
	});
	
	// ----------- THIS IS THE ONE THAT CHEVROTAIN DOESN'T LIKE -----------//
	private handlerReturnDeclaration = this.RULE('handlerReturnDeclaration', () => {
		this.CONSUME(Lexer.Colon);
		
		this.OR([
			{ ALT: () => {
				this.SUBRULE1(this.typeReference);
				this.CONSUME(Lexer.KeywordWith);
				this.SUBRULE1(this.statusCodeDeclaration);
			}},
			
			{ ALT: () => {
				this.SUBRULE2(this.statusCodeDeclaration);
			}},
			
			{ ALT: () => {
				this.SUBRULE2(this.typeReference);
			}}
		]);
	});
	// ---------------------------------------------------------------------------//
	
	private statusCodeDeclaration = this.RULE('statusCodeDeclaration', () => {
		this.CONSUME(Lexer.KeywordStatus_code);
		this.CONSUME(Lexer.LRound);
		this.CONSUME(Lexer.NumberLiteral);
		this.CONSUME(Lexer.RRound);
	});
	
	private specialFunctionRequireClause = this.RULE('specialFunctionRequireClause', () => {
		this.CONSUME(Lexer.KeywordRequires);
		
		this.AT_LEAST_ONE_SEP({
			SEP: Lexer.Comma,
			DEF: () => this.OR([
				{ ALT: () => this.SUBRULE(this.specialFunctionCall) }, //full call
				{ ALT: () => this.CONSUME(Lexer.Identifier) } //alias
			])
		});
	});
};

The text was updated successfully, but these errors were encountered:

bd82 · 2019-03-10T14:26:47Z

The first and third options of that OR both start with "typeReference".
"typeReference" may have infinite length. That is to say for every K (maxlookahead) there would
be N > K thus that the "typeReference" rule would consume N tokens.
Chevrotain is a Recursive decent LL(K) parser, this means it needs to lookahead to choose between alternatives, and that it can look ahead at most K tokens ahead. but in our case because the prefix of these two alternatives is infinity long it can never be sure it picked the correct alternative.

You need to refactor this to something equivelent such as:

		this.OR([
			{ ALT: () => {
				this.SUBRULE1(this.typeReference);
                                this.OPTION(() => {
                                    	this.CONSUME(Lexer.KeywordWith);
			          	this.SUBRULE1(this.statusCodeDeclaration);
                                })
			}},			
			{ ALT: () => {
				this.SUBRULE2(this.statusCodeDeclaration);
			}},

		]);

So the lookahead can be resolved using a fixed number of tokens.
It is also possible to use backtracking, but that is strongly discouraged.

HoldYourWaffle · 2019-03-10T16:27:46Z

Aaah that makes a lot of sense. I think my mind almost went there, but sleep deprivation prevented it from actually solving it.

Thanks!

HoldYourWaffle closed this as completed Mar 10, 2019

HoldYourWaffle mentioned this issue Mar 10, 2019

OR-rule not working as expected #918

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Ambiguity with recursive rules (is this left-recursion?) #917

Ambiguity with recursive rules (is this left-recursion?) #917

HoldYourWaffle commented Mar 10, 2019

bd82 commented Mar 10, 2019 •

edited

Loading

HoldYourWaffle commented Mar 10, 2019

Ambiguity with recursive rules (is this left-recursion?) #917

Ambiguity with recursive rules (is this left-recursion?) #917

Comments

HoldYourWaffle commented Mar 10, 2019

Output

Lexer

Parser

bd82 commented Mar 10, 2019 • edited Loading

HoldYourWaffle commented Mar 10, 2019

bd82 commented Mar 10, 2019 •

edited

Loading