allow lookarounds in conditionals #163
Comments
Original comment by Matthew Barnett (Bitbucket: mrabarnett, GitHub: mrabarnett). I don't see the point; as far as I can see, it doesn't add anything. The purpose of a conditional expression is to test whether a capture group has matched anything. What would it do that a bare lookaround doesn't already do? Wouldn't |
Original comment by Anonymous. Hello, I posted the proposal as anonymous by accident. The real reason it interests me is to make the whole expression more general for metaprogramming or dynamic generation of regexes. In short, the PCRE behavior doesn't add or detract anything from a manually crafted pattern but it would simplify some interesting dynamic techniques, especially in a language like Python that has great metaprogramming capabilities. Regards |
Original comment by Matthew Barnett (Bitbucket: mrabarnett, GitHub: mrabarnett). I've realised that they're not the same. With a bare lookaround, if it chooses the first branch and subsequently fails, it'll backtrack and try the second branch. With a lookaround in a conditional expression, if it chooses the first branch and subsequently fails, it'll backtrack but won't try the second branch. For example, on the string "123abc", |
Original comment by Anonymous. Oh, I feel dumb now. It makes sense the whole conditional has to be skipped, while with alternations it backtracks to another alternate expression. It implements mutual exclusion, not alternation. It turned out that, in my code, the 'then' and 'else' subexpressions were simple and mutually exclusive, I got lucky with my ignorance and it worked (close call!). Any expression of the type (? (test) then | else ) This (unsightly) example:
should be refactored to:
both returning '-3d---8-----9---' Besides making the pattern ugly it can make the match significantly slower since it has to check (complement-of test) every time it has to backtrack. Especially with variable length or complex lookbehinds. For these two reasons I think it is still a valuable enhancement to implement. Actually more than my original motivation since this would be more frequently applicable than some dynamic generation/metaprogramming scenario. |
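The refactoring described in this comment can be sketched with the standard-library `re` module, which has no lookaround conditionals. This is only an illustration under stated assumptions: `guarded_alternation` is a hypothetical helper name, it handles lookahead tests only, and the pattern and input are made up for the example rather than taken from the thread.

```python
import re


def guarded_alternation(test: str, then: str, otherwise: str) -> str:
    """Build (?:(?=test)then|(?!test)otherwise) from three sub-patterns.

    Hypothetical helper for illustration; it covers lookahead tests only.
    """
    return f"(?:(?={test}){then}|(?!{test}){otherwise})"


text = "123abc"

# Bare lookahead inside an alternation: when the first branch fails,
# the engine falls back to the second branch at the same position.
bare = re.search(r"(?:(?=\d)\d+\b|\w+)", text)

# Guarded form: the second branch is reachable only where the test fails,
# which is the mutual exclusion a conditional would provide.
guarded = re.search(guarded_alternation(r"\d", r"\d+\b", r"\w+"), text)

print(bare.group())     # 123abc
print(guarded.group())  # abc
```

The guarded form repeats the test in negated form, which is the extra cost the comment mentions for every backtrack into the second branch.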
Original comment by Matthew Barnett (Bitbucket: mrabarnett, GitHub: mrabarnett). It's not only the 'then' part that could trigger backtracking. It could match the 'then' part, progress into the remainder of the pattern, fail, backtrack through the 'then' part, then try the 'else' part. Anyway, it's now on my todo list. |
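A small sketch of that case, again with the standard-library `re` module and a pattern invented for the illustration (it is not the example from the thread). Here the first branch matches, the remainder of the pattern fails, and only the bare-lookahead version goes on to try the second branch at the same position.

```python
import re

text = "123abc"

# Bare lookahead: \d+ matches "123", the trailing "c" fails, the engine
# backtracks through \d+ and then tries the second branch (\w+) at position 0.
bare = re.search(r"(?:(?=\d)\d+|\w+)c", text)

# Guarded second branch: where the test succeeds, the 'else' side cannot
# be entered, so the match can only start where no digit follows.
guarded = re.search(r"(?:(?=\d)\d+|(?!\d)\w+)c", text)

print(bare.group())     # 123abc
print(guarded.group())  # abc
```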
Original comment by Anonymous. Great!
the 'else' part could be checked after backtracking through the 'then' subexpression? Isn't the whole conditional skipped and have the pattern position pointer after ...\w+) ? I get 123b not a123b in https://regex101.com/#pcre By the way, thank you for your replies and congratulations for the rest of the work you've done to this very good regex package. |
Original comment by Matthew Barnett (Bitbucket: mrabarnett, GitHub: mrabarnett). Added in regex 2015.10.29. |
Original comment by 王珺 (Bitbucket: sulk, GitHub: sulk). When I test the example from the original report, it fails:
yields None while 'you' is expected, and Python crashes while executing
Regards |
Original comment by Matthew Barnett (Bitbucket: mrabarnett, GitHub: mrabarnett). It's fixed now. |
Original report by Anonymous.
It would be really helpful to allow lookarounds, in addition to a group name/id, in conditional expressions, as PCRE does, to allow a regex like this:
regex.findall(r'(?(?<=love\s)you|(?<=hate\s)her)', "I love you but I don't hate her either. You and her are so different")