Exits immediately without warning if it encounters a NUL byte inside the file to be searched, might exit with wrong exit code depending on the position of the match #1227
Labels: duplicate (an issue that is duplicative of another)
What version of ripgrep are you using?
How did you install ripgrep?
ripgrep package version 0.10.0-2 from Debian Buster (current "Testing", soon to be Debian 10).
What operating system are you using ripgrep on?
Describe your question, feature request, or bug.
When fed with several GB of log files via STDIN (actually xzcat output), rg as well as rg -F immediately exited without any output, while fgrep found many hits until it issued the warning "Binary file (standard input) matches".
If this is a bug, what are the steps to reproduce the behavior?
Consider the following example (based on this file):
In the third example, with "rg a", rg neither crashed nor issued a warning, whereas fgrep did issue a warning.
While the above example might be close to what fgrep does, just without the warning, the following example is even worse:
So fgrep properly indicates via its exit code that there was a hit, even though it didn't output anything besides the warning about binary junk.
But even though the hit is located before the NUL byte, rg claims (via its exit code) that there is no hit in STDIN, whereas rg -a says otherwise (via both output and exit code).

"cat aeh.txt | strace rg ä" shows that it exits rather quickly after having read the NUL byte:

Constraints to trigger the issue: the data must contain a NUL byte, and neither of the options -a and --text may be set. On larger files (gigabytes), it is obvious that rg exits prematurely if the NUL byte is close to the beginning, simply because of how quickly the command exits. That is actually how we discovered the issue: we were searching through syslog files that contained whatever remote syslog servers sent to our server, including garbled binary junk. rg exited far too quickly and without any output at all, especially in comparison to fgrep.

Impact: rg does not indicate that there were hits and exits prematurely without further notice, so it can yield wrong results (exit code as well as output) without any indication that binary junk in the input was the cause.
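To make the reported difference concrete, here is a minimal Python sketch of two binary-handling policies over a line-oriented stream that is read in fixed-size buffers. It is a hypothetical model, not ripgrep's or grep's actual code: the function names, the buffer size counted in lines, and the assumption that the NUL check happens per buffer before that buffer is searched are illustrative assumptions, chosen only because they reproduce the symptoms described above (a silent exit, and exit code 1 when the hit sits before the NUL byte).

```python
from typing import List, Tuple

# Hypothetical model only; this is not ripgrep's or grep's real implementation.
WARNING = b"Binary file (standard input) matches"


def buffers(data: bytes, lines_per_buffer: int) -> List[List[bytes]]:
    """Split the input into lines and group them into fixed-size 'read buffers'."""
    lines = data.split(b"\n")
    return [lines[i:i + lines_per_buffer] for i in range(0, len(lines), lines_per_buffer)]


def quit_on_nul(data: bytes, needle: bytes, lines_per_buffer: int = 4) -> Tuple[List[bytes], int]:
    """Assumed model of the reported rg 0.10.0 behaviour: the first buffer that
    contains a NUL ends the search silently, and matches in that buffer,
    even ones located before the NUL, are never reported."""
    out: List[bytes] = []
    for buf in buffers(data, lines_per_buffer):
        if any(b"\x00" in line for line in buf):
            break  # silent stop: no warning, nothing further is searched
        out.extend(line for line in buf if needle in line)
    return out, (0 if out else 1)  # exit code: 0 = match found, 1 = no match


def grep_like(data: bytes, needle: bytes, lines_per_buffer: int = 4) -> Tuple[List[bytes], int]:
    """grep-style model: print matches until binary data has been seen; the
    first match after that prints one warning and ends the search with 0."""
    out: List[bytes] = []
    binary_seen = False
    for buf in buffers(data, lines_per_buffer):
        binary_seen = binary_seen or any(b"\x00" in line for line in buf)
        for line in buf:
            if needle in line:
                if binary_seen:
                    out.append(WARNING)
                    return out, 0
                out.append(line)
    return out, (0 if out else 1)


if __name__ == "__main__":
    data = b"first hit\nbinary \x00 junk\nsecond hit\n"
    for size in (4, 1):
        print(size, quit_on_nul(data, b"hit", size), grep_like(data, b"hit", size))
```

With four lines per buffer, the hit before the NUL byte lands in the same buffer as the NUL, so quit_on_nul prints nothing and returns 1 while grep_like prints the warning and returns 0. With one line per buffer, quit_on_nul does report the first hit and returns 0, which illustrates how, under this model, the exit code depends on where the match sits relative to the NUL byte.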
Workaround: always use the option -a or --text when the contents might contain binary junk, instead of relying on rg to warn about it the way fgrep does.
If this is a bug, what is the actual behavior?
If this is a bug, what is the expected behavior?
Behave like grep/egrep/fgrep: inform the user about binary junk in the input instead of silently exiting as if no match was found, or as if all matches had been found when that was not the case.
Notes
… rg.
… -F and --no-encoding make a difference in this case, but they don't.
… nul and null in the list of open issues) I stumbled upon "find a way to surface binary file detection better #306", which seems to declare the observed behaviour a feature named "skipping binary files". But IMHO this is definitely the wrong approach, as you can't know it in advance when, e.g., you are fed gigabytes on STDIN only to discover at some point that there's binary junk in the midst of already grepped text. So IMHO rg should just behave like grep/egrep/fgrep here and report the fact that it found a hit in a binary file instead of silently exiting. (It might be ok-ish to just skip such files when grepping recursively, but not, e.g., if the file was explicitly given as a parameter or the data comes from STDIN.)