set qualifiers - feature idea #11
Comments
Original comment by Anonymous. Thinking about this a bit more, it would be more appropriate to use something like " |
Original comment by Anonymous. Could you provide me with some test data so that I can see what's needed and how it would be used, try some experiments, and see whether it 'feels' right and whether it's the right approach? |
Original comment by Anonymous. Sure. Here's one I've been trying on CPython 2.6 on 64-bit Ubuntu (works), CPython 2.7 on 64-bit Windows (OverflowError), and IronPython 2.7 on 64-bit .NET (StackOverflowError). |
Original comment by Anonymous. Named lists have been added (provisionally). |
Original comment by Anonymous. I downloaded the PyPI version, built and installed it on Python 2.5.1, and tried it:
Does that seem right to you?
And that should have matched, right? |
Original comment by Anonymous. It was passing "y#" for bytestrings, which is Python 3. Fixed. |
Original comment by Anonymous. Ah, OK. I re-downloaded from PyPI, now it's working. But here's another issue:
|
Original comment by Anonymous. Fixed. |
Original comment by Anonymous. I've updated my test case to add some larger regular expressions. |
Original comment by Anonymous. I just tested this enhancement (cf.: http://mail.python.org/pipermail/python-list/2011-June/1274529.html ) and would like to ask about the treatment of metacharacters in the items of the options set. I somehow inferred from the overview text that they would be escaped, but they appear to be discarded completely, cf.:
I believed the first pattern shouldn't match if the items were escaped (and would cause an error if they were taken unchanged), while the second one would match with escaping. Or am I missing something? Regards, |
Original comment by Anonymous. You're not missing anything. They should match as you say. But I'm seeing a different result (Ubuntu 10 with Python 2.6):
|
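For illustration, the literal-matching behaviour discussed above can be approximated with the standard `re` module by escaping each item before joining the items into an alternation. This is only a sketch: `build_alternation` and the sample items are hypothetical, and it says nothing about how the regex module itself handles named lists.

```python
import re

def build_alternation(items):
    """Join items into one alternation, treating every item as a literal string."""
    # Longest first, so a longer item is tried before one of its own prefixes.
    ordered = sorted(set(items), key=len, reverse=True)
    return "(?:" + "|".join(re.escape(item) for item in ordered) + ")"

# Hypothetical option set whose items contain regex metacharacters.
pattern = re.compile(build_alternation(["a.c", "x+y"]))

print(bool(pattern.fullmatch("a.c")))  # True: the item matches itself literally
print(bool(pattern.fullmatch("abc")))  # False: '.' is not treated as a wildcard
```

With escaping, an item such as `x+y` can only match the three characters `x+y`, which is the behaviour expected in the comment above.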
Original comment by Anonymous. This is an interesting one. If the pattern is known, it fetches from the cache of already-compiled regexes, but the set of strings is different. Should it treat the set as part of the pattern and recompile, much as it does with flags? |
Original comment by Anonymous. Fixed. The regex will be recompiled. |
Original comment by Anonymous. Yes, I think that's the right call. The named keyword argument is local to the particular compile() or search() or findall() call. Different calls may use the same keyword name for different values. |
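The caching point can be made concrete with a small sketch in which the named set is part of the cache key, alongside the pattern and the flags. Everything here is an assumption made for illustration: `compile_with_set`, `_cache`, and the `{name}` placeholder convention are hypothetical and built on the standard `re` module, not taken from the regex module.

```python
import re

_cache = {}

def compile_with_set(template, flags=0, **named_sets):
    """Compile `template`, expanding each {name} placeholder into literal alternatives."""
    # Freeze the sets so they are hashable and can be part of the cache key.
    frozen = tuple(sorted((name, frozenset(items)) for name, items in named_sets.items()))
    key = (template, flags, frozen)
    if key not in _cache:
        pattern = template
        for name, items in frozen:
            ordered = sorted(items, key=len, reverse=True)
            alternation = "|".join(re.escape(s) for s in ordered)
            pattern = pattern.replace("{" + name + "}", "(?:" + alternation + ")")
        _cache[key] = re.compile(pattern, flags)
    return _cache[key]

# Same template and keyword name, different values: two distinct compiled patterns.
first = compile_with_set(r"\b{city}\b", city=["Paris", "Rome"])
second = compile_with_set(r"\b{city}\b", city=["Oslo"])
# Same values in a different order: served from the cache.
again = compile_with_set(r"\b{city}\b", city=["Rome", "Paris"])
```

Because the frozen set is part of the key, reusing a keyword name with different values in a later call cannot pick up a stale compiled pattern.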
Original comment by Anonymous. Sorry for the delayed reaction (I somehow believed I would be notified of further comments after my post). |
Original report by Anonymous.
Some background: I've been working with very large REs in CPython and IronPython. We generate the RE pattern from lists, like lists of cities or lists of names, somewhat like this:
The one I'm working with now is just a pattern for finding substrings that look like the name of a person. It's overflowing the System::Text::RegularExpressions buffers on IronPython, but works OK with CPython 2.6 on 64-bit Ubuntu.
One of the things I've been thinking is that this kind of pattern should be handled differently. Suppose there was some syntax like
where (?S indicates a named ImmutableSet, with the members of that set drawn from the keyword argument of that name. The compiler would generate a reasonably fast pattern from that set, say the union of all characters in all the strings in the set, plus a minimum and maximum length taken from the shortest and longest elements of the set. When the engine runs, it would first match that fast pattern and, on a match, check whether the matched group is a member of the named set. If so, the match would be confirmed; if not, it would fail.
Seems like this might be a useful feature for regex to have, given the popularity of this kind of machine-generated RE.
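A rough sketch of the two-stage idea described in this report, using only the standard `re` module: a cheap pre-filter built from the union of characters and the minimum length, followed by a set-membership check bounded by the maximum length. The name `find_set_members` and the sample data are hypothetical, empty strings in the set are ignored as an assumption, and a real engine would integrate the check into matching rather than scanning in Python.

```python
import re

def find_set_members(text, members):
    """Yield (start, end, string) for non-overlapping substrings found in `members`."""
    members = frozenset(m for m in members if m)  # assumption: ignore empty strings
    if not members:
        return
    alphabet = "".join(sorted(set("".join(members))))
    shortest = min(len(m) for m in members)
    longest = max(len(m) for m in members)
    # Stage 1: fast pre-filter, a run of allowed characters at least `shortest` long.
    prefilter = re.compile("[%s]{%d,}" % (re.escape(alphabet), shortest))
    for run in prefilter.finditer(text):
        pos = run.start()
        while pos + shortest <= run.end():
            # Stage 2: confirm by set membership, preferring the longest candidate.
            for length in range(min(longest, run.end() - pos), shortest - 1, -1):
                candidate = text[pos:pos + length]
                if candidate in members:
                    yield pos, pos + length, candidate
                    pos += length
                    break
            else:
                pos += 1

# Hypothetical set of names.
names = {"Ann", "Anna Lee", "Bob"}
for start, end, name in find_set_members("Anna Lee met Bob.", names):
    print(start, end, name)
```

The pre-filter alone accepts many strings that are not in the set, which is why the membership check is what finally confirms or rejects each candidate.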