Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regexp::Scanner::PrematureEndError: Premature end of pattern at #{str} #15

Closed
backus opened this issue Apr 20, 2016 · 10 comments
Closed

Comments

@backus
Copy link
Contributor

backus commented Apr 20, 2016

Example:

2.3.0 :001 > /\#{str}/
 => /\#{str}/
2.3.0 :003 > require 'regexp_parser'
 => true
2.3.0 :004 > Regexp::Parser.parse('\#{str}')
Regexp::Scanner::PrematureEndError: Premature end of pattern at #{str}
  from /Users/johnbackus/.rvm/gems/ruby-2.3.0/gems/regexp_parser-0.3.2/lib/regexp_parser/scanner.rb:1698:in `scan'
  from /Users/johnbackus/.rvm/gems/ruby-2.3.0/gems/regexp_parser-0.3.2/lib/regexp_parser/lexer.rb:20:in `lex'
  from /Users/johnbackus/.rvm/gems/ruby-2.3.0/gems/regexp_parser-0.3.2/lib/regexp_parser/parser.rb:26:in `parse'
  from (irb):4
  from /Users/johnbackus/.rvm/rubies/ruby-2.3.0/bin/irb:11:in `<main>'

I don't understand yet what the source of this issue is

@jaynetics
Copy link
Collaborator

I'm also interested in this, and some issues which are probably related:

Regexp::Parser.parse('{str}') # => PrematureEndError
Regexp::Parser.parse('\#{}') # => NoMethodError
Regexp::Parser.parse('{}') # => NoMethodError

I guess the parser is trying to find a full interval quantifier whenever it encounters a { that is not preceded by a (non-escaped) #.

@backus
Copy link
Contributor Author

backus commented Apr 20, 2016

@Janosch-x are you sure you get a NoMethodError for your third example? I get the following:

Regexp::Parser.parse(/{}/) # => ArgumentError: No valid target found for '{}' quantifier

@backus
Copy link
Contributor Author

backus commented Apr 20, 2016

Also I think the rough explanation for the problem is this:

The parser assumes that when it encounters the { token it is going to parse either {\d} or {\d,\d}. These would be the patterns for forming a valid quantifier.

The issue is that Ruby seems to default to just matching on plain text if the quantifier doesn't make sense. For example, the regex /\Aa{a}\z/ doesn't make sense if you view the {} as a quantifier so ruby just doesn't parse it as an quantifier:

2.3.0 :036 > regex = /\Aa{a}\z/
 => /\Aa{a}\z/
2.3.0 :037 > regex =~ 'a'
 => nil
2.3.0 :038 > regex =~ 'a{a}'
 => 0
2.3.0 :039 > Regexp::Parser.parse(regex)
Regexp::Scanner::PrematureEndError: Premature end of pattern at a{a}\z

Unfortunately this doesn't seem to be part of the grammar for this gem.

backus added a commit to backus/regexp_parser that referenced this issue Apr 21, 2016
@ammar
Copy link
Owner

ammar commented Apr 21, 2016

I haven't investigated this in depth yet, but it looks like the treatment of a { and } as literals in certain cases is an implementation quirk of the regex engine. In other words, it's not a documented feature.

Ruby's documentation explicitly states that meta characters must be backslash-escaped when they are used as literals:

The following are metacharacters (, ), [, ], {, }, ., ?, +, *. They have a specific
meaning when appearing in a pattern. To match them literally they must be
backslash-escaped.

Unless there is documentation that details the cases in which unescaped meta characters will be treated as literals, then I think it is safe to consider this behavior an implementation quirk.

I consider using and counting on such quirks to be bad practice, and I prefer not to make the parser accommodate their use.

Despite that, I understand the impact of such issues on the suitability of the parser for certain applications. I would like to find a balance between keeping the parser free from supporting quirks and correctly detecting them. Perhaps by adding a validation phase, which runs before the scanner and issues warnings or errors for questionable patterns like this and the one in issue #3 (which I should update with my findings and mark as wont fix).

I would like to dig a little deeper to see how Ruby represents these patterns internally. If it is fixed and predictable, I might reconsider addressing them.

@mbj
Copy link

mbj commented Apr 21, 2016

@ammar On a specification that is as vague as Ruby:

  • Ruby is defined by its implementation
  • Bugs are present
  • Some bugs are declared features because to many depend on them
  • Lots of churn in MRI itself
  • Absence of MRI doing corpus testing

Its not possible to correctly re-implement "Ruby" because the definition of "Ruby" is under steady flux.

Hence I propose to never even try to implement "Ruby" but implement a sane subset, explicitly not supporting stuff that does not make sense outside MRI implementation quirks.

On how to not explicitly support something I heavily recommend raising errors instead of warnings, because this makes it much easier for downstream developers to realize: Okay MRI edge case, do not provide such an input.

@mbj
Copy link

mbj commented Apr 21, 2016

@ammar AKA Imo: Ideally regexp_parser does one of the following on this case:

  • Nothing (but document the fact in the readme)
  • A dedicated exception

@backus
Copy link
Contributor Author

backus commented Apr 21, 2016

@mbj what does doing "Nothing (but document the fact in the readme)" look like?

@mbj
Copy link

mbj commented Apr 21, 2016

@backus It raises exceptions now, keep it like this. But document the fact that regexp_parser does not support each MRI quirk.

@ammar
Copy link
Owner

ammar commented Apr 24, 2016

@mbj Thank you for chiming in. Those are very good points about the discrepancies between declared features and the actual implementation.

Regarding what to do, I agree, and think that a PrematureEndError is a fitting exception in this case. I will update the Supported Syntax section of the README to note that regexp_parser does not support MRI's quirks.

@backus
Copy link
Contributor Author

backus commented May 9, 2016

Closing this for now since I think the current error is appropriate

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants