Don't union inner literals of repetitions.
If we do, we end up extracting `foofoofoo` from `(\wfoo){3}`, which is
wrong: that pattern matches `afoobfoocfoo`, which never contains
`foofoofoo`. This also prevents us from extracting `foofoofoo` from
`(foo){3}`, which is unfortunate, but we miss plenty of other stuff too.
Literal extraction needs a good rethink (all the way down into the regex
engine).

Fixes BurntSushi#93
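
To make the failure mode concrete, here is a minimal std-only sketch. It assumes a hand-rolled, ASCII-only matcher for `(\wfoo){3}` and a substring check standing in for the literal prefilter; `matches_at`, `is_match`, and `prefilter` are hypothetical names for illustration, not functions from the `grep` crate.

```rust
/// Hypothetical hand-rolled matcher for `(\wfoo){3}` starting at `start`.
/// `\w` is approximated as ASCII alphanumerics plus `_`.
fn matches_at(hay: &[u8], start: usize) -> bool {
    let mut i = start;
    for _ in 0..3 {
        // One repetition: a single word byte followed by the literal `foo`.
        let is_word = hay
            .get(i)
            .map_or(false, |b| b.is_ascii_alphanumeric() || *b == b'_');
        if !is_word || hay.get(i + 1..i + 4) != Some(&b"foo"[..]) {
            return false;
        }
        i += 4;
    }
    true
}

/// Unanchored search: try every start offset.
fn is_match(hay: &str) -> bool {
    let bytes = hay.as_bytes();
    (0..bytes.len()).any(|start| matches_at(bytes, start))
}

/// Stand-in for a literal prefilter: a line is only searched further
/// if it contains the required literal.
fn prefilter(hay: &str, required: &str) -> bool {
    hay.contains(required)
}

fn main() {
    let hay = "afoobfoocfoo";
    // The pattern itself matches the haystack...
    assert!(is_match(hay));
    // ...but the unioned literal would reject it before the regex runs.
    assert!(!prefilter(hay, "foofoofoo"));
    // The literal from a single repetition is a safe requirement.
    assert!(prefilter(hay, "foo"));
}
```

The unioned literal is a stricter requirement than the pattern imposes, so a prefilter built on it drops lines that actually match.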
amsharma91 committed Sep 20, 2016
1 parent e322569 commit 255312f
Showing 2 changed files with 15 additions and 5 deletions.
11 changes: 6 additions & 5 deletions grep/src/literals.rs
@@ -8,7 +8,6 @@ Note that this implementation is incredibly suspicious. We need something more
 principled.
 */
 use std::cmp;
-use std::iter;
 
 use regex::bytes::Regex;
 use syntax::{
@@ -181,17 +180,19 @@ fn repeat_range_literals<F: FnMut(&Expr, &mut Literals)>(
     lits: &mut Literals,
     mut f: F,
 ) {
-    use syntax::Expr::*;
-
     if min == 0 {
         // This is a bit conservative. If `max` is set, then we could
         // treat this as a finite set of alternations. For now, we
         // just treat it as `e*`.
         lits.cut();
     } else {
         let n = cmp::min(lits.limit_size(), min as usize);
-        let es = iter::repeat(e.clone()).take(n).collect();
-        f(&Concat(es), lits);
+        // We only extract literals from a single repetition, even though
+        // we could do more. e.g., `a{3}` will have `a` extracted instead of
+        // `aaa`. The reason is that inner literal extraction can't be unioned
+        // across repetitions. e.g., extracting `foofoofoo` from `(\w+foo){3}`
+        // is wrong.
+        f(e, lits);
         if n < min as usize {
             lits.cut();
         }
9 changes: 9 additions & 0 deletions tests/tests.rs
@@ -703,6 +703,15 @@ clean!(regression_90, "test", ".", |wd: WorkDir, mut cmd: Command| {
     assert_eq!(lines, ".foo:test\n");
 });
 
+// See: https://github.com/BurntSushi/ripgrep/issues/93
+clean!(regression_93, r"(\d{1,3}\.){3}\d{1,3}", ".",
+|wd: WorkDir, mut cmd: Command| {
+    wd.create("foo", "192.168.1.1");
+
+    let lines: String = wd.stdout(&mut cmd);
+    assert_eq!(lines, "foo:192.168.1.1\n");
+});
+
 // See: https://github.com/BurntSushi/ripgrep/issues/20
 sherlock!(feature_20, "Sherlock", ".", |wd: WorkDir, mut cmd: Command| {
     cmd.arg("--no-filename");
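
The regression test can be read through the same lens. The sketch below assumes the old code repeated the inner literal `.` of `(\d{1,3}\.)` three times, presumably yielding `...` as a required literal; that is an inference from the commit message, not something the diff states. `unioned_literal`, `single_literal`, and `passes_prefilter` are hypothetical helpers, not part of ripgrep.

```rust
/// Hypothetical model of the old behavior: the inner literal of a
/// repetition is concatenated `n` times into one required literal.
fn unioned_literal(inner: &str, n: usize) -> String {
    inner.repeat(n)
}

/// Hypothetical model of the new behavior: keep only the literal
/// extracted from a single repetition.
fn single_literal(inner: &str) -> String {
    inner.to_string()
}

/// Stand-in for the literal prefilter: a line is only handed to the
/// regex engine if it contains the required literal.
fn passes_prefilter(line: &str, required: &str) -> bool {
    line.contains(required)
}

fn main() {
    // The file contents created by the regression test.
    let line = "192.168.1.1";

    let old = unioned_literal(".", 3);
    let new = single_literal(".");

    // Under the assumed old behavior the line is discarded early,
    // even though the pattern matches it.
    assert!(!passes_prefilter(line, &old));
    // With a single repetition's literal the line reaches the regex.
    assert!(passes_prefilter(line, &new));
}
```

Under that assumption, the test expects `foo:192.168.1.1` because the line now survives the prefilter and is matched by the full pattern.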
