Adjust the end of the haystack after a DFA match.
If the caller asks for captures, and the DFA runs, and there's a match,
and there are actually captures in the regex, then the haystack sent to
the NFA is shortened to correspond to only the match plus some room at the
end for matching zero-width assertions. This "room at the end" needs to be
big enough to fit at least one UTF-8 encoded Unicode codepoint.

Fixes #264.
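
To illustrate the "room at the end" described above, here is a minimal sketch. The `next_utf8` below is a hypothetical stand-in for the helper of the same name called in the diff, written from the leading-byte rules of UTF-8 and not copied from the crate; `capture_haystack_end` is likewise an illustrative name. It shows that cutting the haystack at `match_end + 1` can end in the middle of a multi-byte codepoint, while stepping over a whole encoded codepoint does not.

```rust
use std::cmp;

/// Hypothetical stand-in for the `next_utf8` helper used in the diff.
/// Returns the index just past the UTF-8 sequence that starts at `i`,
/// judged from the leading byte alone. Assumes `i` is a sequence start.
fn next_utf8(text: &[u8], i: usize) -> usize {
    let b = match text.get(i) {
        // Past the end of the text: step by one; callers clamp to text.len().
        None => return i + 1,
        Some(&b) => b,
    };
    let len = if b <= 0x7F {
        1 // ASCII
    } else if b <= 0xDF {
        2 // leading byte 110xxxxx
    } else if b <= 0xEF {
        3 // leading byte 1110xxxx
    } else {
        4 // leading byte 11110xxx
    };
    i + len
}

/// Illustrative name: end of the shortened haystack, i.e. the match plus
/// room for one encoded codepoint, clamped to the length of the text.
fn capture_haystack_end(text: &[u8], match_end: usize) -> usize {
    cmp::min(next_utf8(text, match_end), text.len())
}

fn main() {
    // The haystack from the regression test: one 4-byte codepoint.
    let text = "\u{28f3e}".as_bytes();
    let match_end = 0; // the empty match at (0, 0)

    // Old byte-oriented branch: one byte of room.
    let old_end = cmp::min(match_end + 1, text.len());
    // New behavior: room for a whole encoded codepoint.
    let new_end = capture_haystack_end(text, match_end);

    // One byte of room cuts the codepoint; a full codepoint of room does not.
    assert!(std::str::from_utf8(&text[..old_end]).is_err());
    assert!(std::str::from_utf8(&text[..new_end]).is_ok());
    println!("old end = {}, new end = {}", old_end, new_end);
}
```

The clamp to `text.len()` matters when the match ends at the end of the text, where there is no following codepoint to include.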
BurntSushi committed Aug 5, 2016
1 parent 225f8e1 commit 1882b2c
Showing 2 changed files with 5 additions and 5 deletions.
6 changes: 1 addition & 5 deletions src/exec.rs
@@ -840,11 +840,7 @@ impl<'c> ExecNoSync<'c> {
     ) -> Option<(usize, usize)> {
         // We can't use match_end directly, because we may need to examine
         // one "character" after the end of a match for lookahead operators.
-        let e = if self.ro.nfa.uses_bytes() {
-            cmp::min(match_end + 1, text.len())
-        } else {
-            cmp::min(next_utf8(text, match_end), text.len())
-        };
+        let e = cmp::min(next_utf8(text, match_end), text.len());
         self.captures_nfa(slots, &text[..e], match_start)
     }

4 changes: 4 additions & 0 deletions tests/regression.rs
@@ -60,3 +60,7 @@ matiter!(word_boundary_dfa, r"\b", "a b c",
 
 // See: https://github.com/rust-lang-nursery/regex/issues/268
 matiter!(partial_anchor, u!(r"^a|b"), "ba", (0, 1));
+
+// See: https://github.com/rust-lang-nursery/regex/issues/264
+mat!(ascii_boundary_no_capture, u!(r"(?-u)\B"), "\u{28f3e}", Some((0, 0)));
+mat!(ascii_boundary_capture, u!(r"(?-u)(\B)"), "\u{28f3e}", Some((0, 0)));
