Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
impl: fix prefix literal matching bug
This commit fixes a bug where it was possible to report a match where none existed. Basically, the current regex crate just cannot deal with a mixture of look-around assertions in the prefix of a pattern and prefix literal optimizations. Before 1.8, this was handled by simply refusing to extract literals in that case. But in 1.8, with a rewrite of the literal extractor, literals are now extracted for patterns like this:

    (?i:(?:\b|_)win(?:32|64|dows)?(?:\b|_))

So in 1.8, since it was still using the old engines that can't deal with this, I added some extra logic to throw away any extracted prefix literals if a look-around assertion occurred in the prefix of the pattern. The problem is that the logic I used was "always occurs in the prefix of the pattern" instead of "may occur in the prefix of the pattern." The pattern above is the latter case: the leading `(?:\b|_)` may begin with `\b` or with a literal `_`. So it slipped by, and the regex engine tried to use the prefix literals to accelerate the search. This in turn caused mishandling of the `\b` and led to a false positive match.

The specific reason the current regex engines can't deal with this is that they weren't designed to handle searches that took the surrounding context into account when resolving look-around assertions. It was a pretty big oversight on my part many years ago. The new engines we'll be migrating to Real Soon Now don't have this problem. They can use the prefix literal optimizations while correctly handling look-around assertions in the prefix.

Fixes #981
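The following minimal sketch shows the difference between the two checks. It uses a hypothetical, heavily simplified AST. The `Node` type and every function below are illustrative only and are not the regex crate's internals. The sketch ignores case-insensitivity and assumes literals are non-empty. It encodes the pattern above and shows that an "always occurs" check misses the `\b` in the leading `(?:\b|_)`, while a "may occur" check catches it.

```rust
/// Hypothetical toy regex AST (NOT regex-syntax's `Hir`).
#[derive(Debug)]
enum Node {
    Literal(String),     // assumed non-empty
    WordBoundary,        // `\b`, a zero-width look-around assertion
    Concat(Vec<Node>),
    Alt(Vec<Node>),
    Optional(Box<Node>), // `x?`
}

/// True if the node can match the empty string.
fn can_match_empty(node: &Node) -> bool {
    match node {
        Node::Literal(s) => s.is_empty(),
        Node::WordBoundary => true, // zero-width
        Node::Concat(xs) => xs.iter().all(can_match_empty),
        Node::Alt(xs) => xs.iter().any(can_match_empty),
        Node::Optional(_) => true,
    }
}

/// The flawed check: does EVERY match begin with a look-around assertion?
fn prefix_always_has_look(node: &Node) -> bool {
    match node {
        Node::Literal(_) => false,
        Node::WordBoundary => true,
        // Simplification: only the first element decides (literals are non-empty).
        Node::Concat(xs) => xs.first().map_or(false, prefix_always_has_look),
        // Every branch must begin with an assertion.
        Node::Alt(xs) => !xs.is_empty() && xs.iter().all(prefix_always_has_look),
        // An optional node can be skipped, so it never "always" contributes.
        Node::Optional(_) => false,
    }
}

/// The fixed check: can SOME match begin with a look-around assertion?
fn prefix_may_have_look(node: &Node) -> bool {
    match node {
        Node::Literal(_) => false,
        Node::WordBoundary => true,
        // One branch with an assertion is enough.
        Node::Alt(xs) => xs.iter().any(prefix_may_have_look),
        Node::Optional(x) => prefix_may_have_look(x),
        Node::Concat(xs) => {
            for x in xs {
                if prefix_may_have_look(x) {
                    return true;
                }
                // If this element can't be empty, later elements are not in the prefix.
                if !can_match_empty(x) {
                    return false;
                }
            }
            false
        }
    }
}

/// `(?:\b|_)win(?:32|64|dows)?(?:\b|_)` in the toy AST (case-insensitivity ignored).
fn win_pattern() -> Node {
    let lit = |s: &str| Node::Literal(s.to_string());
    Node::Concat(vec![
        Node::Alt(vec![Node::WordBoundary, lit("_")]),
        lit("win"),
        Node::Optional(Box::new(Node::Alt(vec![lit("32"), lit("64"), lit("dows")]))),
        Node::Alt(vec![Node::WordBoundary, lit("_")]),
    ])
}
```

With these definitions, `prefix_always_has_look(&win_pattern())` is false because the `_` branch carries no assertion, so the prefix literals would be kept. By contrast, `prefix_may_have_look(&win_pattern())` is true, so they would be discarded.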
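The next rough sketch shows why literal acceleration mishandles `\b`. It uses a simplified stand-in pattern, `\bwin`, and an ASCII-only `\b`. The regex crate's `\b` is Unicode-aware by default, and the helper names here are hypothetical. Suppose the search restarts at a literal candidate as though the candidate were the start of the haystack. Then `\b` sees no left context, and it succeeds even when a word character precedes the candidate.

```rust
/// ASCII word byte: [0-9A-Za-z_].
fn is_word_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

/// ASCII-only `\b` at offset `at`. Positions outside the slice count as non-word.
fn is_word_boundary(haystack: &[u8], at: usize) -> bool {
    let before = at > 0 && is_word_byte(haystack[at - 1]);
    let after = at < haystack.len() && is_word_byte(haystack[at]);
    before != after
}

/// Buggy: for each literal candidate, evaluate `\b` on the sub-slice starting
/// at the candidate, which throws away the byte that precedes it.
fn matches_without_context(haystack: &[u8]) -> bool {
    let needle: &[u8] = b"win";
    haystack
        .windows(needle.len())
        .enumerate()
        .any(|(i, w)| w == needle && is_word_boundary(&haystack[i..], 0))
}

/// Correct: evaluate `\b` against the full haystack at the candidate offset.
fn matches_with_context(haystack: &[u8]) -> bool {
    let needle: &[u8] = b"win";
    haystack
        .windows(needle.len())
        .enumerate()
        .any(|(i, w)| w == needle && is_word_boundary(haystack, i))
}
```

On the haystack `xwin`, `matches_without_context` reports a match and `matches_with_context` does not. Both `x` and `w` are word characters, so no boundary precedes `win`.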
1 parent 93316a3
commit 3f3587a
Showing 4 changed files with 84 additions and 9 deletions.