Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ambiguity with recursive rules (is this left-recursion?) #917

Closed
HoldYourWaffle opened this issue Mar 10, 2019 · 2 comments
Closed

Ambiguity with recursive rules (is this left-recursion?) #917

HoldYourWaffle opened this issue Mar 10, 2019 · 2 comments

Comments

@HoldYourWaffle
Copy link
Contributor

After experimenting in the playground for a bit I decided that this library seemed like the perfect fit for my project. In case you're curious, I'm trying to create a DSL for REST-api generation.

All was well until I tried adding grammar for generic types. In case you need a memory refresher, generics basically have a structure like this:

TypeA<TypeB>

Looks pretty simple and harmless doesn't it? Well what if TypeB is a generic type too?

TypeA<TypeB<TypeC>>

We get a recursive structure. I expressed this in my parser like this:

class MyParser extends Parser {

	private typeReference = this.RULE('typeReference', () => {
		this.CONSUME(Lexer.NamespacedIdentifier); //type name
		
		this.OPTION1(() => this.SUBRULE(this.genericsDefinition)); //optional generic type
		this.OPTION2(() => { //optional array type
			this.CONSUME(Lexer.LSquare);
			this.CONSUME(Lexer.RSquare);
		});
	});
	
	private genericsDefinition = this.RULE('genericsDefinition', () => {
		this.CONSUME(Lexer.LBeak);
		this.AT_LEAST_ONE_SEP({
			SEP: Lexer.Comma,
			DEF: () => this.SUBRULE(this.typeReference)
		});
		this.CONSUME(Lexer.RBeak);
	});

}

This worked fine for me, until I tried to use a typeReference in an OR-rule (full code + error below). I tried increasing maxLookahead, but that didn't help.
I'm not sure if this is left recursion (I know it's unsupported), the fact that it works just fine when it's not wrapped in an OR-rule makes me think it isn't. If it is left-recursion then the warning about it doesn't work.

So my question is: how do I express this grammar in chevrotain? Is it impossible and should I drop support for generics-in-generics? Did I increase my maxLookahead not far enough (I tried till 10)?

Thanks in advance!
(if you need anymore information feel free to ask)

Output

Error: Parser Definition Errors detected:
 Ambiguous alternatives: <1 ,3> in <OR> inside <handlerReturnDeclaration> Rule,
<NamespacedIdentifier, '<', NamespacedIdentifier, '<'> may appears as a prefix path in all these alternatives.To Resolve this, try one of of the following:
1. Refactor your grammar to be LL(K) for the current value of k (by default k=4})
2. Increase the value of K for your grammar by providing a larger 'maxLookahead' value in the parser's config
3. This issue can be ignored (if you know what you are doing...), see https://sap.github.io/chevrotain/documentation/4_2_0/interfaces/iparserconfig.html#ignoredissues for more details

-------------------------------
Ambiguous alternatives: <1 ,3> in <OR> inside <handlerReturnDeclaration> Rule,
<NamespacedIdentifier, '<', NamespacedIdentifier, '['> may appears as a prefix path in all these alternatives.To Resolve this, try one of of the following:
1. Refactor your grammar to be LL(K) for the current value of k (by default k=4})
2. Increase the value of K for your grammar by providing a larger 'maxLookahead' value in the parser's config
3. This issue can be ignored (if you know what you are doing...), see https://sap.github.io/chevrotain/documentation/4_2_0/interfaces/iparserconfig.html#ignoredissues for more details

-------------------------------
Ambiguous alternatives: <1 ,3> in <OR> inside <handlerReturnDeclaration> Rule,
<NamespacedIdentifier, '<', NamespacedIdentifier, ','> may appears as a prefix path in all these alternatives.To Resolve this, try one of of the following:
1. Refactor your grammar to be LL(K) for the current value of k (by default k=4})
2. Increase the value of K for your grammar by providing a larger 'maxLookahead' value in the parser's config
3. This issue can be ignored (if you know what you are doing...), see https://sap.github.io/chevrotain/documentation/4_2_0/interfaces/iparserconfig.html#ignoredissues for more details

-------------------------------
Ambiguous alternatives: <1 ,3> in <OR> inside <handlerReturnDeclaration> Rule,
<NamespacedIdentifier, '<', NamespacedIdentifier, '>'> may appears as a prefix path in all these alternatives.To Resolve this, try one of of the following:
1. Refactor your grammar to be LL(K) for the current value of k (by default k=4})
2. Increase the value of K for your grammar by providing a larger 'maxLookahead' value in the parser's configy providing a larger 'maxLookahead' value in t you are doing...), see https://sap.github.io/chevrotain/documenthe parser's config                          noredissues for more details
3. This issue can be ignored (if you know what you are doing...), see https://sap.github.i:\VOLATILE\workspace-2\TSAG\node_modules\chevrotain\lib\src\parseo/chevrotain/documentation/4_2_0/interfaces/iparserconfig.html#ignoredissues for more details

Lexer

import { Lexer, createToken } from 'chevrotain'

function ucFirst(str: string) {
	return str.charAt(0).toUpperCase() + str.slice(1).toLowerCase();
}

function escapeRegex(str: string) {
	return str.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
}

function createKeywordToken(word: string) {
	return createToken({ name: `Keyword${ucFirst(word)}`, pattern: new RegExp(`${word}`), longer_alt: Identifier });
}

function createCharacterToken(name: string, character: string) {
	const token = createToken({ name: name, pattern: new RegExp(escapeRegex(character)) });
	token.LABEL = `'${character}'`;
	return token;
}

// --- Character tokens ---
export const LRound = createCharacterToken('LRound', '(');
export const RRound = createCharacterToken('RRound', ')');

export const LCurly = createCharacterToken('LCurly', '{');
export const RCurly = createCharacterToken('RCurly', '}');

export const LSquare = createCharacterToken('LSquare', '[');
export const RSquare = createCharacterToken('RSquare', ']');

export const LBeak = createCharacterToken('LBeak', '<');
export const RBeak = createCharacterToken('RBeak', '>');

export const Dot = createCharacterToken('Dot', '.');
export const Comma = createCharacterToken('Comma', ',');
export const Colon = createCharacterToken('Colon', ':');
export const Semicolon = createCharacterToken('Semicolon', ';');
export const Equals = createCharacterToken('Equals', '=');

// --- Literals ---
export const NumberLiteral = createToken({ name: 'NumberLiteral', pattern: /0|[1-9]\d*/ });
export const StringLiteral = createToken({ name: 'StringLiteral', pattern: /'(?:[^\\']|\\(?:[bfnrtv'\\/]|u[0-9a-fA-F]{4}))*'/ });
export const BooleanLiteral = createToken({ name: 'BooleanLiteral', pattern: /(true|false)/ });

// --- Indentifiers ---
export const MethodIdentifier = createToken({ name: 'MethodIdentifier', pattern: /(GET|POST|PUT|PATCH|DELETE)/ });
export const Identifier = createToken({ name: 'Identifier', pattern: /[a-zA-Z]\w*/ });
export const NamespacedIdentifier = createToken({ name: 'NamespacedIdentifier', pattern: /[a-zA-Z][\w\.]*/ });

// --- Keywords ---
export const KeywordWith = createKeywordToken('with');
export const KeywordStatus_code = createKeywordToken('status_code');
export const KeywordRequires = createKeywordToken('requires');

// --- Ignored stuff ---
const WhiteSpace = createToken({
	name: 'WhiteSpace',
	pattern: /\s+/,
	group: Lexer.SKIPPED
});

// --- Initialise lexer ---
export const tokens = [
	WhiteSpace,
	LRound, RRound, LCurly, RCurly, LSquare, RSquare, LBeak, RBeak,
	Dot, Comma, Colon, Semicolon, Equals,
	
	KeywordWith, KeywordStatus_code, KeywordRequires,
	
	NumberLiteral, StringLiteral, BooleanLiteral,
	MethodIdentifier, NamespacedIdentifier, Identifier
];

export const MyLexer = new Lexer(tokens);

Parser

import { Parser } from 'chevrotain'
import * as Lexer from './lexer-redacted'

export default class MyParser extends Parser {
	constructor() {
		super(Lexer.tokens);
		this.performSelfAnalysis();
	}
	
	public program = this.RULE('program', () => {
		this.MANY2(() => {
			this.SUBRULE(this.specialFunctionRequireClause)
			this.CONSUME(Lexer.Semicolon);
		});
		this.AT_LEAST_ONE(() => this.SUBRULE(this.handlerDeclaration));
	});
	
	
	// ---------- MISC ----------
	private typeReference = this.RULE('typeReference', () => {
		this.CONSUME(Lexer.NamespacedIdentifier);
		
		this.OPTION1(() => this.SUBRULE(this.genericsDefinition));
		this.OPTION2(() => { //array
			this.CONSUME(Lexer.LSquare);
			this.CONSUME(Lexer.RSquare);
		});
	});
	
	private genericsDefinition = this.RULE('genericsDefinition', () => {
		this.CONSUME(Lexer.LBeak);
		this.AT_LEAST_ONE_SEP({
			SEP: Lexer.Comma,
			DEF: () => this.SUBRULE(this.typeReference)
		});
		this.CONSUME(Lexer.RBeak);
	});
	
	
	// ---------- SPECIAL FUNCTIONS ----------
	private specialFunctionCall = this.RULE('specialFunctionCall', () => {
		this.SUBRULE(this.typeReference);
		this.CONSUME(Lexer.LRound);
		this.MANY_SEP({
			SEP: Lexer.Comma,
			DEF: () => this.SUBRULE(this.specialFunctionParameter)
		});
		this.CONSUME(Lexer.RRound);
	});
	
	private specialFunctionParameter = this.RULE('specialFunctionParameter', () => {
		this.OR([
			{ ALT: () => this.CONSUME(Lexer.StringLiteral) },
			{ ALT: () => this.CONSUME(Lexer.NumberLiteral) },
			{ ALT: () => this.CONSUME(Lexer.BooleanLiteral) }
		]);
	});
	
	
	// ---------- HANDLERS ----------
	private handlerDeclaration = this.RULE('handlerDeclaration', () => {
		this.CONSUME(Lexer.MethodIdentifier);
		this.CONSUME(Lexer.Equals);
		this.CONSUME(Lexer.Identifier);
		
		this.CONSUME(Lexer.LRound);
		this.MANY_SEP({
			SEP: Lexer.Comma,
			DEF: () => this.SUBRULE(this.handlerParameterDeclaration)
		})
		this.CONSUME(Lexer.RRound);
		
		this.OPTION1(this.handlerReturnDeclaration);
		this.OPTION2(this.specialFunctionRequireClause);
		
		this.CONSUME(Lexer.Semicolon);
	});
	
	private handlerParameterDeclaration = this.RULE('handlerParameterDeclaration', () => {
		this.OR([
			{ ALT: () => { //explicit
				this.CONSUME1(Lexer.Identifier);
				this.CONSUME(Lexer.Colon);
				this.SUBRULE(this.typeReference);
			}},
			
			{ ALT: () => this.CONSUME2(Lexer.Identifier)}, //alias
		]);
	});
	
	// ----------- THIS IS THE ONE THAT CHEVROTAIN DOESN'T LIKE -----------//
	private handlerReturnDeclaration = this.RULE('handlerReturnDeclaration', () => {
		this.CONSUME(Lexer.Colon);
		
		this.OR([
			{ ALT: () => {
				this.SUBRULE1(this.typeReference);
				this.CONSUME(Lexer.KeywordWith);
				this.SUBRULE1(this.statusCodeDeclaration);
			}},
			
			{ ALT: () => {
				this.SUBRULE2(this.statusCodeDeclaration);
			}},
			
			{ ALT: () => {
				this.SUBRULE2(this.typeReference);
			}}
		]);
	});
	// ---------------------------------------------------------------------------//
	
	private statusCodeDeclaration = this.RULE('statusCodeDeclaration', () => {
		this.CONSUME(Lexer.KeywordStatus_code);
		this.CONSUME(Lexer.LRound);
		this.CONSUME(Lexer.NumberLiteral);
		this.CONSUME(Lexer.RRound);
	});
	
	private specialFunctionRequireClause = this.RULE('specialFunctionRequireClause', () => {
		this.CONSUME(Lexer.KeywordRequires);
		
		this.AT_LEAST_ONE_SEP({
			SEP: Lexer.Comma,
			DEF: () => this.OR([
				{ ALT: () => this.SUBRULE(this.specialFunctionCall) }, //full call
				{ ALT: () => this.CONSUME(Lexer.Identifier) } //alias
			])
		});
	});
};
@bd82
Copy link
Member

bd82 commented Mar 10, 2019

  • The first and third options of that OR both start with "typeReference".

  • "typeReference" may have infinite length. That is to say for every K (maxlookahead) there would
    be N > K thus that the "typeReference" rule would consume N tokens.

  • Chevrotain is a Recursive decent LL(K) parser, this means it needs to lookahead to choose between alternatives, and that it can look ahead at most K tokens ahead. but in our case because the prefix of these two alternatives is infinity long it can never be sure it picked the correct alternative.

You need to refactor this to something equivelent such as:

		this.OR([
			{ ALT: () => {
				this.SUBRULE1(this.typeReference);
                                this.OPTION(() => {
                                    	this.CONSUME(Lexer.KeywordWith);
			          	this.SUBRULE1(this.statusCodeDeclaration);
                                })
			}},			
			{ ALT: () => {
				this.SUBRULE2(this.statusCodeDeclaration);
			}},

		]);

So the lookahead can be resolved using a fixed number of tokens.
It is also possible to use backtracking, but that is strongly discouraged.

@HoldYourWaffle
Copy link
Contributor Author

Aaah that makes a lot of sense. I think my mind almost went there, but sleep deprivation prevented it from actually solving it.

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants