Regexp::Scanner::PrematureEndError: Premature end of pattern at #{str} #15

backus · 2016-04-20T07:14:41Z

Example:

2.3.0 :001 > /\#{str}/
 => /\#{str}/
2.3.0 :003 > require 'regexp_parser'
 => true
2.3.0 :004 > Regexp::Parser.parse('\#{str}')
Regexp::Scanner::PrematureEndError: Premature end of pattern at #{str}
  from /Users/johnbackus/.rvm/gems/ruby-2.3.0/gems/regexp_parser-0.3.2/lib/regexp_parser/scanner.rb:1698:in `scan'
  from /Users/johnbackus/.rvm/gems/ruby-2.3.0/gems/regexp_parser-0.3.2/lib/regexp_parser/lexer.rb:20:in `lex'
  from /Users/johnbackus/.rvm/gems/ruby-2.3.0/gems/regexp_parser-0.3.2/lib/regexp_parser/parser.rb:26:in `parse'
  from (irb):4
  from /Users/johnbackus/.rvm/rubies/ruby-2.3.0/bin/irb:11:in `<main>'

I don't understand yet what the source of this issue is

jaynetics · 2016-04-20T08:05:17Z

I'm also interested in this, and some issues which are probably related:

Regexp::Parser.parse('{str}') # => PrematureEndError
Regexp::Parser.parse('\#{}') # => NoMethodError
Regexp::Parser.parse('{}') # => NoMethodError

I guess the parser is trying to find a full interval quantifier whenever it encounters a { that is not preceded by a (non-escaped) #.

backus · 2016-04-20T17:41:57Z

@Janosch-x are you sure you get a NoMethodError for your third example? I get the following:

Regexp::Parser.parse(/{}/) # => ArgumentError: No valid target found for '{}' quantifier

backus · 2016-04-20T17:47:22Z

Also I think the rough explanation for the problem is this:

The parser assumes that when it encounters the { token it is going to parse either {\d} or {\d,\d}. These would be the patterns for forming a valid quantifier.

The issue is that Ruby seems to default to just matching on plain text if the quantifier doesn't make sense. For example, the regex /\Aa{a}\z/ doesn't make sense if you view the {} as a quantifier so ruby just doesn't parse it as an quantifier:

2.3.0 :036 > regex = /\Aa{a}\z/
 => /\Aa{a}\z/
2.3.0 :037 > regex =~ 'a'
 => nil
2.3.0 :038 > regex =~ 'a{a}'
 => 0
2.3.0 :039 > Regexp::Parser.parse(regex)
Regexp::Scanner::PrematureEndError: Premature end of pattern at a{a}\z

Unfortunately this doesn't seem to be part of the grammar for this gem.

Fixes ammar#15

ammar · 2016-04-21T12:04:02Z

I haven't investigated this in depth yet, but it looks like the treatment of a { and } as literals in certain cases is an implementation quirk of the regex engine. In other words, it's not a documented feature.

Ruby's documentation explicitly states that meta characters must be backslash-escaped when they are used as literals:

The following are metacharacters (, ), [, ], {, }, ., ?, +, *. They have a specific
meaning when appearing in a pattern. To match them literally they must be
backslash-escaped.

Unless there is documentation that details the cases in which unescaped meta characters will be treated as literals, then I think it is safe to consider this behavior an implementation quirk.

I consider using and counting on such quirks to be bad practice, and I prefer not to make the parser accommodate their use.

Despite that, I understand the impact of such issues on the suitability of the parser for certain applications. I would like to find a balance between keeping the parser free from supporting quirks and correctly detecting them. Perhaps by adding a validation phase, which runs before the scanner and issues warnings or errors for questionable patterns like this and the one in issue #3 (which I should update with my findings and mark as wont fix).

I would like to dig a little deeper to see how Ruby represents these patterns internally. If it is fixed and predictable, I might reconsider addressing them.

mbj · 2016-04-21T20:23:57Z

@ammar On a specification that is as vague as Ruby:

Ruby is defined by its implementation
Bugs are present
Some bugs are declared features because to many depend on them
Lots of churn in MRI itself
Absence of MRI doing corpus testing

Its not possible to correctly re-implement "Ruby" because the definition of "Ruby" is under steady flux.

Hence I propose to never even try to implement "Ruby" but implement a sane subset, explicitly not supporting stuff that does not make sense outside MRI implementation quirks.

On how to not explicitly support something I heavily recommend raising errors instead of warnings, because this makes it much easier for downstream developers to realize: Okay MRI edge case, do not provide such an input.

mbj · 2016-04-21T20:55:23Z

@ammar AKA Imo: Ideally regexp_parser does one of the following on this case:

Nothing (but document the fact in the readme)
A dedicated exception

backus · 2016-04-21T21:59:45Z

@mbj what does doing "Nothing (but document the fact in the readme)" look like?

mbj · 2016-04-21T22:00:37Z

@backus It raises exceptions now, keep it like this. But document the fact that regexp_parser does not support each MRI quirk.

ammar · 2016-04-24T13:26:50Z

@mbj Thank you for chiming in. Those are very good points about the discrepancies between declared features and the actual implementation.

Regarding what to do, I agree, and think that a PrematureEndError is a fitting exception in this case. I will update the Supported Syntax section of the README to note that regexp_parser does not support MRI's quirks.

backus · 2016-05-09T21:52:08Z

Closing this for now since I think the current error is appropriate

backus added a commit to backus/regexp_parser that referenced this issue Apr 21, 2016

Fix parsing of incomplete quantifier ranges

45dcf31

Fixes ammar#15

backus closed this as completed May 9, 2016

jaynetics mentioned this issue Dec 23, 2018

Fails to parse lone opening brace #59

Closed

This was referenced Jun 2, 2020

Lint/MixedRegexpCaptureTypes and "premature end of pattern" rubocop/rubocop#8083

Closed

Rethink fallbacks for formally incorrect grammar #63

Closed

sonalinavlakhe mentioned this issue Aug 5, 2020

Add new Lint/OutOfRangeRefInRegexp cop #7755 rubocop/rubocop#8407

Merged

8 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Regexp::Scanner::PrematureEndError: Premature end of pattern at #{str} #15

Regexp::Scanner::PrematureEndError: Premature end of pattern at #{str} #15

backus commented Apr 20, 2016

jaynetics commented Apr 20, 2016

backus commented Apr 20, 2016

backus commented Apr 20, 2016

ammar commented Apr 21, 2016

mbj commented Apr 21, 2016

mbj commented Apr 21, 2016

backus commented Apr 21, 2016

mbj commented Apr 21, 2016

ammar commented Apr 24, 2016

backus commented May 9, 2016

Regexp::Scanner::PrematureEndError: Premature end of pattern at #{str} #15

Regexp::Scanner::PrematureEndError: Premature end of pattern at #{str} #15

Comments

backus commented Apr 20, 2016

jaynetics commented Apr 20, 2016

backus commented Apr 20, 2016

backus commented Apr 20, 2016

ammar commented Apr 21, 2016

mbj commented Apr 21, 2016

mbj commented Apr 21, 2016

backus commented Apr 21, 2016

mbj commented Apr 21, 2016

ammar commented Apr 24, 2016

backus commented May 9, 2016