Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support lookaround assertions? #134

Open
1 of 2 tasks
alecthomas opened this issue Mar 22, 2021 · 9 comments
Open
1 of 2 tasks

Support lookaround assertions? #134

alecthomas opened this issue Mar 22, 2021 · 9 comments

Comments

@alecthomas
Copy link
Owner

alecthomas commented Mar 22, 2021

Should be fairly straightforward to implement, but is it useful?

  • Look ahead assertions.
  • Look behind assertions.
@petee-d
Copy link
Contributor

petee-d commented May 6, 2021

I'm not 100% sure we mean the same thing, but it sounds like I have a use case for something like that when writing a parser for Jinja (specifically, implicit test arguments). In the parser, I need to match some complex struct term only if the next token isn't one of certain 3 keywords. Here is a grossly oversimplified example:

type Condition struct {
	True  Expression `@@`
	Cond  Test       `'if' @@ 'else'`
	False Expression `@@`
}

type Test struct {
	Left     Expression  `@@`
	Name     string      `'is' @Ident`
	Argument *Expression `@@?` // Allowed to match only if next token isn't 'else' or a few other keywords
}

type Expression struct {
	// Would be much more complex in reality (operators with 10+ precedence groups, a huge recursive mess)
	Variable string `@Ident`
}

func TestNegativeLookahead(t *testing.T) {
	parser := participle.MustBuild(&Condition{})
	result := &Condition{}
	assert.NoError(t, parser.ParseString("", "one if x is odd else two", result))
	assert.NoError(t, parser.ParseString("", "small if x is below y else large", result))
}

I basically would need some way of modifying the optional Test.Argument (some "tests" may take arguments, some may not, it's unknown to the parser) so that the rule matches only if the next token isn't the keyword else (or 2 other keywords). In real Jinja, they did this with a hand-written recursive descent parser that peeks at the next token and doesn't try to match an expression if it sees one of those keywords. As you can imagine, the real grammar is much more complex (45 structs, ~3400 characters of generated EBNF, much more complex than https://github.com/alecthomas/participle/blob/master/_examples/sql/main.go), so hacks like a separate structure of Expression structs that won't match those keywords for Variable isn't an option.

Do you think this can somehow be done with existing participle grammar options? I've tried !<expr> but that apparently can't be used with a struct (naively, I tried something like @!('else'|'and'|'or')@, to no avail), plus I guess it would match only one token anyway, the expression could in reality be any number of tokens. Any hacks that wouldn't be too nasty? Or do you think participle could be extended somehow to support this use case? Some kind of negative lookahead?

If you look into this, thank you very much in advance! This seems to be about the only thing that my Jinja parser can't handle yet.

@petee-d
Copy link
Contributor

petee-d commented May 7, 2021

@alecthomas is this something you would be able to look at any time soon? Please let me know, as I could give it a try myself, but would like to know what kind of a syntax you would like for the negative lookahead. I think the ideal choice would be the negative lookahead syntax in https://pythonhosted.org/pyrser/dsl.html, so for example:

	Argument *Expression `( !('else'|'and'|'or')  @@ )?` // Match Expression only if the next token isn't else/and/or

I.e. within a sequence, any expression prefixed by ! would be treated as negative lookahead and the sequence will fail before matching anything else if the next token matches the rule, but the negative lookahead itself will not consume any tokens. This is of course in conflict with the existing negation syntax, which behaves the same except it does consume a token. I think it's somewhat unfortunate as with the behavior I'm suggesting the existing use case for negation

	EverythingUntilSemicolon *[]string `@!';'* @';'`

could be written as

	EverythingUntilSemicolon *[]string `@(!';' Ident)* @';'` // Ident could be another token or union of tokens, of course

But since breaking backwards compatibility would suck, I think we could do something like this for my first example:

	Argument *Expression `( !?('else'|'and'|'or')  @@ )?` // Match Expression only if the next token isn't else/and/or

I.e. ! followed by ? will tell the negation not to consume any token. Since it would be using the existing negation code and just work with a boolean flag, I think it wouldn't even be that hard to implement.

What do you think? Do you have another idea for what the syntax should be? Would you like to do it yourself, or should I take a crack at it?

@petee-d
Copy link
Contributor

petee-d commented May 7, 2021

Was kind of pressing me, so I just did it. :) Chose !?<expr> as it's kind of similar to the (?!<expr>) syntax of negative lookahead in PERL-like regular expressions. Changes to the syntax are of course welcome, I just need the functionality.

@alecthomas
Copy link
Owner Author

Mmm actually I did merge it, but I think I'd prefer it to be ?! to match regex. Would you mind updating it to that?

@alecthomas
Copy link
Owner Author

I think, regarding syntax, that it would be preferable to emulate Perl regex syntax. It should be part of the grouping construct, which will probably make this a bit more of a complex change than your one, but will be more consistent.

@alecthomas
Copy link
Owner Author

If you'd like to take a crack at it, I would be most appreciative.

@petee-d
Copy link
Contributor

petee-d commented May 8, 2021

Could try. How about I still use &negation{consume: false, node: ...} in the parse tree, just have an if branch in parseGroup that creates it? Would still be quite a simple change and work correctly, but would help to avoid duplicating a lot of negation.Parse code. The only problem is it would potentially be a bit weird parseGroup can create negation.

@petee-d
Copy link
Contributor

petee-d commented May 8, 2021

Actually, just tried it and it wasn't much good. I'll try something a bit different... :)

@petee-d
Copy link
Contributor

petee-d commented May 8, 2021

I changed it up by separating the code from negation completely (as negation had some unnecessary restrictions anyway) and took the opportunity to also implement positive lookahead with the PERL-like (?= ...) syntax. 🙂
#145

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants