Skip to content

Commit

Permalink
printer: fix --count-matches output
Browse files Browse the repository at this point in the history
In order to implement --count-matches, we simply re-execute the regex on
the spans reported by the searcher. The spans always correspond to the
lines that participated in the match. This is the correct thing to do,
except when the regex contains look-ahead (or look-behind).

In particular, the look-around permits the regex's match success to
depends on an arbitrary point before or after the lines actually
reported as participating in the match. Since only the matched lines are
reported to the printer, it is possible for subsequent searching on
those lines to fail.

A true fix for this would somehow make the total span available to the
printer. But that seems tricky since it isn't always available. For
PCRE2's case in multiline mode, it is available because we force it to
be so for correctness.

For now, we simply detect this corner case heuristically. If the match
count is zero, then it necessarily means there is some kind of
look-around that isn't matching. So we set the match count to 1. This is
probably incorrect in some cases, although my brain can't quite come up
with a concrete example. Nevertheless, this is strictly better than the
status quo.

Fixes #1573
  • Loading branch information
BurntSushi committed May 9, 2020
1 parent a2e6aec commit 7ed9a31
Show file tree
Hide file tree
Showing 3 changed files with 57 additions and 0 deletions.
2 changes: 2 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,8 @@ Bug fixes:

* [BUG #1537](https://github.com/BurntSushi/ripgrep/issues/1537):
Fix match bug caused by inner literal optimization.
* [BUG #1573](https://github.com/BurntSushi/ripgrep/issues/1573):
Fix incorrect `--count-matches` output when using look-around.


12.0.1 (2020-03-29)
Expand Down
13 changes: 13 additions & 0 deletions crates/printer/src/summary.rs
Original file line number Diff line number Diff line change
Expand Up @@ -591,6 +591,19 @@ impl<'p, 's, M: Matcher, W: WriteColor> Sink for SummarySink<'p, 's, M, W> {
true
})
.map_err(io::Error::error_message)?;
if match_count == 0 {
// It is possible for the match count to be zero when
// look-around is used. Since `SinkMatch` won't necessarily
// contain the look-around in its match span, the search here
// could fail to find anything.
//
// It seems likely that setting match_count=1 here is probably
// wrong in some cases, but I don't think we can do any
// better. (Because this printer cannot assume that subsequent
// contents have been loaded into memory, so we have no way of
// increasing the search span here.)
match_count = 1;
}
stats.add_matches(match_count);
stats.add_matched_lines(mat.lines().count() as u64);
} else if self.summary.config.kind.quit_early() {
Expand Down
42 changes: 42 additions & 0 deletions tests/regression.rs
Original file line number Diff line number Diff line change
Expand Up @@ -822,3 +822,45 @@ foo: TaskID int `json:\"taskID\"`
";
eqnice!(expected, cmd.arg("TaskID +int").stdout());
});

// See: https://github.com/BurntSushi/ripgrep/issues/1573
//
// Tests that if look-ahead is used, then --count-matches is correct.
rgtest!(r1573, |dir: Dir, mut cmd: TestCommand| {
// Only PCRE2 supports look-ahead.
if !dir.is_pcre2() {
return;
}

dir.create_bytes("foo", b"\xFF\xFE\x00\x62");
dir.create(
"foo",
"\
def A;
def B;
use A;
use B;
",
);

// Check that normal --count is correct.
cmd.args(&[
"--pcre2",
"--multiline",
"--count",
r"(?s)def (\w+);(?=.*use \w+)",
"foo",
]);
eqnice!("2\n", cmd.stdout());

// Now check --count-matches.
let mut cmd = dir.command();
cmd.args(&[
"--pcre2",
"--multiline",
"--count-matches",
r"(?s)def (\w+);(?=.*use \w+)",
"foo",
]);
eqnice!("2\n", cmd.stdout());
});

0 comments on commit 7ed9a31

Please sign in to comment.