Ability to query minimum and maximum length of regular expression #112386

Open
MegaIng opened this issue Nov 24, 2023 · 2 comments
Labels
stdlib Python modules in the Lib dir topic-regex type-feature A feature request or enhancement

MegaIng commented Nov 24, 2023

Feature or enhancement

Proposal:

For the lark parsing library we currently use the private re._parser module, as noticed when reorganizing the relevant libraries in #91308. The only information we need is the minimum and maximum width of a match a pattern can have.
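
For context, here is a minimal sketch of how these widths can be read today through the private parser module. It is not lark's actual code: the helper name `pattern_widths` is made up for illustration, and because `re._parser` (`sre_parse` before Python 3.11) is private, its `parse()` and `getwidth()` behaviour may change between releases.

```python
try:
    from re import _parser  # private module, Python 3.11+
except ImportError:
    import sre_parse as _parser  # same parser under its pre-3.11 name


def pattern_widths(pattern, flags=0):
    """Return (min_width, max_width); max_width is None when unbounded.

    Hypothetical helper: relies on the private SubPattern.getwidth().
    """
    lo, hi = _parser.parse(pattern, flags).getwidth()
    # getwidth() caps the upper bound at MAXREPEAT for unbounded repeats.
    return lo, (None if hi >= _parser.MAXREPEAT else hi)


print(pattern_widths(r"abc?d?e"))      # (3, 5)
print(pattern_widths(r"(a*b+){2,5}"))  # (2, None)
```

Mapping `MAXREPEAT` to `None` here is only one of the two conventions suggested in this issue; returning `MAXREPEAT` unchanged would work equally well.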

My suggestion is to add attributes or properties to the Pattern class, for example named min_width and max_width. When the pattern can match an (essentially) unlimited amount of text, max_width could be either None or MAXREPEAT (the constant from re._constants/_sre).

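import re

# min_width and max_width are the proposed attributes; they do not exist on re.Pattern today.
pattern = re.compile(r"abc?d?e")
assert (pattern.min_width, pattern.max_width) == (3, 5)

# Unbounded repetition: max_width would be None (or MAXREPEAT).
pattern = re.compile(r"(a*b+){2,5}")
assert (pattern.min_width, pattern.max_width) == (2, None)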

As an alternative, the re._* modules could be made a public and stable API, although from my reading of the PR linked above this does not appear to be a well-liked option. I would like this, primarily for implementing custom regex analyzers (there are a few such users of the re._parser module out there), but I think it would require a PEP.

Has this already been discussed elsewhere?

This is a minor feature, which does not need previous discussion elsewhere

Links to previous discussion of this feature:

I don't think this is a major enough feature to require widespread discussion. I requested a similar feature in the third-party regex library. Preferably, of course, both would have the same interface.

@MegaIng MegaIng added the type-feature A feature request or enhancement label Nov 24, 2023
@iritkatriel iritkatriel added the stdlib Python modules in the Lib dir label Nov 28, 2023
@TheCob11

Shouldn't the bounds for the second example be (2, None)/(2, MAXREPEAT) since bb would match?

MegaIng commented Dec 18, 2023

Yes, I don't know what my thought process there was.
