Regexp::Scanner::PrematureEndError: Premature end of pattern at #{str} #15
Comments
I'm also interested in this, and some issues which are probably related:
I guess the parser is trying to find a full interval quantifier whenever it encounters a
@Janosch-x are you sure you get a
Regexp::Parser.parse(/{}/) # => ArgumentError: No valid target found for '{}' quantifier
Also, I think the rough explanation for the problem is this: The parser assumes that when it encounters the
The issue is that Ruby seems to default to just matching on plain text if the quantifier doesn't make sense. For example, the regex
2.3.0 :036 > regex = /\Aa{a}\z/
=> /\Aa{a}\z/
2.3.0 :037 > regex =~ 'a'
=> nil
2.3.0 :038 > regex =~ 'a{a}'
=> 0
2.3.0 :039 > Regexp::Parser.parse(regex)
Regexp::Scanner::PrematureEndError: Premature end of pattern at a{a}\z
Unfortunately this doesn't seem to be part of the grammar for this gem.
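The engine side of this can be reproduced in plain Ruby without the gem. This is a minimal sketch, assuming MRI's regex engine behaves as in the session above; the constant names are only illustrative.

```ruby
# A brace group that is not a valid interval is matched as literal text by MRI.
LITERAL_BRACES = /\Aa{a}\z/
p LITERAL_BRACES =~ 'a'     # nil
p LITERAL_BRACES =~ 'a{a}'  # 0

# A well-formed interval is still treated as a quantifier.
REAL_INTERVAL = /\Aa{2}\z/
p REAL_INTERVAL =~ 'aa'     # 0
p REAL_INTERVAL =~ 'a{2}'   # nil
```

So the same two characters, { and }, are either a quantifier or plain text depending on what sits between them, which is the case the scanner does not handle.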
I haven't investigated this in depth yet, but it looks like the treatment of { and } as literals in certain cases is an implementation quirk of the regex engine. In other words, it's not a documented feature. Ruby's documentation explicitly states that meta characters must be backslash-escaped when they are used as literals:
Unless there is documentation that details the cases in which unescaped meta characters will be treated as literals, then I think it is safe to consider this behavior an implementation quirk. I consider using and counting on such quirks to be bad practice, and I prefer not to make the parser accommodate their use.
Despite that, I understand the impact of such issues on the suitability of the parser for certain applications. I would like to find a balance between keeping the parser free from supporting quirks and correctly detecting them. Perhaps by adding a validation phase, which runs before the scanner and issues warnings or errors for questionable patterns like this and the one in issue #3 (which I should update with my findings and mark as
I would like to dig a little deeper to see how Ruby represents these patterns internally. If it is fixed and predictable, I might reconsider addressing them.
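One way such a validation phase could look is sketched below. This is only an illustration, not part of regexp_parser: `questionable_braces` and `INTERVAL_QUANTIFIER` are hypothetical names, and the pre-scan deliberately ignores character classes and escapes that take braces (such as \p{...} or \u{...}), so it would report false positives for those.

```ruby
# Hypothetical pre-scan (not part of regexp_parser). It reports the offsets of
# unescaped "{" characters that do not start a well-formed interval quantifier
# such as {2}, {2,}, {,3} or {2,3}.
INTERVAL_QUANTIFIER = /\G\{(?:\d+(?:,\d*)?|,\d+)\}/

def questionable_braces(source)
  offsets = []
  index = 0
  while index < source.length
    case source[index]
    when '\\'
      index += 2 # skip the backslash and the character it escapes, e.g. \{
      next
    when '{'
      match = INTERVAL_QUANTIFIER.match(source, index)
      if match
        index = match.end(0) # jump past the valid interval
        next
      end
      offsets << index
    end
    index += 1
  end
  offsets
end

# Usage: warn about each questionable brace before handing the pattern on.
questionable_braces(/\Aa{a}\z/.source).each do |offset|
  warn "unescaped '{' at offset #{offset} is not a valid interval quantifier"
end
```

Returning offsets rather than printing directly leaves the caller free to warn or to raise, which is the choice being discussed in this thread.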
@ammar On a specification that is as vague as Ruby:
It's not possible to correctly re-implement "Ruby" because the definition of "Ruby" is in constant flux. Hence I propose to never even try to implement "Ruby", but to implement a sane subset, explicitly not supporting things that make no sense outside of MRI implementation quirks. As for how to explicitly not support something, I strongly recommend raising errors instead of warnings, because this makes it much easier for downstream developers to realize: okay, MRI edge case, do not provide such an input.
@ammar AKA Imo: Ideally
@mbj what does doing "Nothing (but document the fact in the readme)" look like?
@backus It raises exceptions now, keep it like this. But document the fact that
@mbj Thank you for chiming in. Those are very good points about the discrepancies between declared features and the actual implementation. Regarding what to do, I agree, and think that a
Closing this for now since I think the current error is appropriate.
Example:
I don't understand yet what the source of this issue is