Request: \K #151

Closed
mrabarnett opened this issue Sep 9, 2015 · 4 comments
Labels
enhancement New feature or request minor

Comments

@mrabarnett

Original report by boolbag NA (Bitbucket: boolbag, GitHub: boolbag).


Hi Matthew,
Thank you as always for the terrific engine.
In my view it's one of the very best engines out there.

There are three missing features that have been "talking to me" for a while, and I thought I'd put in some requests. I'm sure you've considered them before, but I'd like to put forward a case for each of them.

In this thread I'll focus on \K.

I realize that \K was originally intended as a workaround for the lack of infinite lookbehind.
Nevertheless, it is an extremely clean and expressive token.

Without \K, you have to use either a lookbehind or capturing groups.
That's not a problem, but within long expressions \K gives you a clean "drop everything matched so far".
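
For comparison, here is a minimal sketch of the three approaches side by side. It assumes a regex release that supports \K; the sample string and variable names are made up for illustration.

```python
import regex  # third-party module (pip install regex), not the stdlib re

# Hypothetical sample text, chosen only for this illustration.
text = 'id=42; name=foo'

# 1. Lookbehind: the prefix is asserted but never becomes part of the match.
via_lookbehind = regex.search(r'(?<=name=)\w+', text).group(0)

# 2. Capturing group: the prefix is part of group 0, so the payload
#    has to be read from group 1 instead.
via_group = regex.search(r'name=(\w+)', text).group(1)

# 3. \K: the prefix is matched, then dropped from the overall match,
#    so group 0 is already the payload.
via_keep = regex.search(r'name=\K\w+', text).group(0)

print(via_lookbehind, via_group, via_keep)
```

All three return the same text here; the difference is only in how the pattern reads, which matters more as expressions get longer.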

Also, I often have to translate many expressions from PCRE to Python. When the PCRE expressions are rich with \K, the absence of \K in regex is a real speed bump.

Thanks in advance for considering it again.

@mrabarnett

Original comment by Matthew Barnett (Bitbucket: mrabarnett, GitHub: mrabarnett).


As far as I can tell, it would shorten group 0 (the entire match), but not any capture group:

>>> import regex
>>> m = regex.search(r'(abc\Kde)', 'abcde')
>>> m[0]
'de'
>>> m[1]
'abcde'

Therefore, it should also affect the span (start and end position) for group 0, but no other groups.

Is that correct?
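
To make the span question concrete, here is a small check that restates the example above and adds the spans. It is a sketch that assumes a regex release with \K support; the variable names are illustrative.

```python
import regex  # third-party module; needs a release that supports \K

m = regex.search(r'(abc\Kde)', 'abcde')

# Group 0 (the entire match) starts at the \K position.
whole_text, whole_span = m.group(0), m.span(0)

# Group 1 still covers everything matched inside the parentheses.
group_text, group_span = m.group(1), m.span(1)

print(whole_text, whole_span)   # de (3, 5)
print(group_text, group_span)   # abcde (0, 5)
```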

@mrabarnett

Original comment by boolbag NA (Bitbucket: boolbag, GitHub: boolbag).


Hi Matthew,

Yes, that's exactly right.

Also note that it's not a one-off token: it can appear more than once in a pattern. For instance,
abc\Kde|fg\Khij matches de in abcde or hij in fghij.

In PCRE, a\Kbc\Kde is legal too. There's no point in writing it that way, but I guess the idea is that the token can be dropped in anywhere.

You can also have it on just one side of an alternation, for instance ab(?:\Kde|fg).
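
A rough sketch of those cases in Python, assuming a regex release with \K support. The variable names are illustrative, and the repeated-\K result is printed rather than asserted, since the thread only describes PCRE's behaviour for it.

```python
import regex  # third-party module; needs a release that supports \K

# \K on both sides of an alternation: each branch drops its own prefix.
alt = regex.compile(r'abc\Kde|fg\Khij')
first = alt.search('abcde').group(0)
second = alt.search('fghij').group(0)
print(first, second)  # de hij

# Repeated \K in one pattern is legal in PCRE, where only the text after
# the last \K is kept. Print the result to see what your regex version does.
repeated = regex.search(r'a\Kbc\Kde', 'abcde')
print(repeated.group(0))
```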

I know you had EditPadPro at some stage because I recall seeing you on the forum. For testing purposes Jan has a good implementation in EPP and RegexBuddy, except for a minor bug that he plans to fix in the next release (one of the most recent threads on the RB forum).

Regards

@mrabarnett

Original comment by Matthew Barnett (Bitbucket: mrabarnett, GitHub: mrabarnett).


Added in regex 2015.09.14.

@mrabarnett

Original comment by boolbag NA (Bitbucket: boolbag, GitHub: boolbag).


Absolutely fantastic. Thank you so much for this time-saver.

An example for anyone interested in seeing it at work: everything to the left of \K (including the start=> marker) is dropped.

>>> import regex as mrab
>>> bsk = mrab.compile(r'start=>\K.*')
>>> print(bsk.search('boring stuff start=>interesting stuff'))
<regex.Match object; span=(20, 37), match='interesting stuff'>
