-
-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
^$ matches a phantom line at end of file #441
Comments
cc @BatmanAoD |
Thanks for filing this. I agree that it's reasonable to keep the (Also, I know you asked for example use-cases for this, and unfortunately, although I know I've used something like |
@BatmanAoD Right. When I asked that, I hadn't thought of "match empty lines." Once I spent a couple of minutes thinking, I remembered it. That's a strong enough use case as far as I'm concerned. :-) |
(I wouldn't expect someone to care about matching empty lines necessarily, but I could see someone being interested in counting them.) |
I use matching empty lines in conjunction with the --invert-match option to filter out empty lines when piping commands. |
@blankname That actually shouldn't be a problem! You'll "erroneously" filter out a phantom line--which should have no effect. @BurntSushi Counting seems like an interesting application indeed, but as soon as I thought about that I thought "isn't that just a sort of crumby, broken |
I've never heard of, or used, |
|
Here are a pair of regression tests: // See: https://github.com/BurntSushi/ripgrep/issues/441
clean!(regression_441_mmap, "^$", "test", |wd: WorkDir, mut cmd: Command| {
wd.create("test", "a\nb\n\nc\n\n\nd\n");
cmd.arg("--mmap");
let lines: String = wd.stdout(&mut cmd);
assert_eq!(lines, "\n\n\n");
});
// See: https://github.com/BurntSushi/ripgrep/issues/441
clean!(regression_441_no_mmap, "^$", "test", |wd: WorkDir, mut cmd: Command| {
wd.create("test", "a\nb\n\nc\n\n\nd\n");
cmd.arg("--no-mmap");
let lines: String = wd.stdout(&mut cmd);
assert_eq!(lines, "\n\n\n");
}); Interestingly, the output of the |
And also a more precise comparison with grep:
|
This fixes a bug where a "match" color escape was erroneously emitted after the new line character. This is because `^` is actually allowed to match after the end of a trailing new line, which means `^$` matches both before and after the trailing new line when multiline mode is enabled. The trailing match was causing the phantom escape sequence to appear, which we don't want. Incidentally, this is the root cause of #441 as well, although this commit doesn't fix that issue, since the line itself is printed before we detect the phantom match. Fixes #599
I believe this bug should be fixed in libripgrep. |
"Should" meaning "ought to be"? Since the libripgrep issue itself states that libripgrep won't be a proper "thing unto itself", do you have an idea of which specific subcomponent should handle this? |
libripgrep is a complete rewrite of the search components of ripgrep. As I progress through that rewrite, I'm looking at bugs in this space that I can fix. The libripgrep milestone is tracked here: https://github.com/BurntSushi/ripgrep/issues?q=is%3Aopen+is%3Aissue+milestone%3Alibripgrep
This just means that there isn't going to be a singular ripgrep library. There will be many. The |
Unlikely. The fix for #416 is to fix the regex engine.
They are both true. I don't think the distinction matters much. |
This commit updates the CHANGELOG to reflect all the work done to make libripgrep a reality. * Closes #162 (libripgrep) * Closes #176 (multiline search) * Closes #188 (opt-in PCRE2 support) * Closes #244 (JSON output) * Closes #416 (Windows CRLF support) * Closes #917 (trim prefix whitespace) * Closes #993 (add --null-data flag) * Closes #997 (--passthru works with --replace) * Fixes #2 (memory maps and context handling work) * Fixes #200 (ripgrep stops when pipe is closed) * Fixes #389 (more intuitive `-w/--word-regexp`) * Fixes #643 (detection of stdin on Windows is better) * Fixes #441, Fixes #690, Fixes #980 (empty matching lines are weird) * Fixes #764 (coalesce color escapes) * Fixes #922 (memory maps failing is no big deal) * Fixes #937 (color escapes no longer used for empty matches) * Fixes #940 (--passthru does not impact exit status) * Fixes #1013 (show runtime CPU features in --version output)
This commit updates the CHANGELOG to reflect all the work done to make libripgrep a reality. * Closes #162 (libripgrep) * Closes #176 (multiline search) * Closes #188 (opt-in PCRE2 support) * Closes #244 (JSON output) * Closes #416 (Windows CRLF support) * Closes #917 (trim prefix whitespace) * Closes #993 (add --null-data flag) * Closes #997 (--passthru works with --replace) * Fixes #2 (memory maps and context handling work) * Fixes #200 (ripgrep stops when pipe is closed) * Fixes #389 (more intuitive `-w/--word-regexp`) * Fixes #643 (detection of stdin on Windows is better) * Fixes #441, Fixes #690, Fixes #980 (empty matching lines are weird) * Fixes #764 (coalesce color escapes) * Fixes #922 (memory maps failing is no big deal) * Fixes #937 (color escapes no longer used for empty matches) * Fixes #940 (--passthru does not impact exit status) * Fixes #1013 (show runtime CPU features in --version output)
This is simple to demonstrate. Note the behavior difference between GNU grep and ripgrep:
The
^$
pattern is a useful idiom for detecting empty lines in a file. ripgrep's behavior in particular outputs a line numbered 8, even though there are only 7 lines in the file:More generally, ripgrep should never emit a line with a number greater than the number of lines in the file, even if the file ends with a trailing
\n
.See also #416 and rust-lang/regex#355 (comment) for the root cause
The text was updated successfully, but these errors were encountered: