Redirecting ripgrep to a file within the search directory will cause rg to also search that file #286
Comments
I think we can probably fix this by getting the file descriptor that stdout is being redirected to and then making sure we don't search any file with that same descriptor.
On Windows you'd solve this by getting the ID of the file. Call
@retep998 There's some subtle additional details. From MSDN:
AFAIK, the only way to get a guarantee that the file index numbers aren't reused is to keep the file handle open. Which I guess is fine in this case, since we only need to keep one handle (stdout) open for the lifetime of the process.
I guess this applies similarly on Linux too. You can't rely on just the inode number, since your directory might span multiple mount points.
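To illustrate the mount point concern, here is a minimal Unix-only sketch that identifies a file by its (device, inode) pair instead of the inode alone. The helper names `file_key` and `is_same_file` are hypothetical and only use the standard library; this is not the code that ended up in ripgrep.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::MetadataExt;
use std::path::Path;

/// Hypothetical helper: returns the (device, inode) pair for `path`.
/// Inode numbers are only unique within one filesystem, so the device ID
/// is needed to tell apart files on different mount points.
fn file_key(path: &Path) -> io::Result<(u64, u64)> {
    let md = fs::metadata(path)?; // follows symlinks
    Ok((md.dev(), md.ino()))
}

/// Hypothetical helper: true if both paths refer to the same underlying file.
fn is_same_file(a: &Path, b: &Path) -> io::Result<bool> {
    Ok(file_key(a)? == file_key(b)?)
}
```

Two hard links to the same file compare equal under this check, while two distinct files with identical contents do not.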
There's enough subtlety here that I'm going to push this logic into a new independent crate, and then use that inside of both ripgrep and
When running ripgrep like this:

    rg foo > output

we must be careful not to search `output` since ripgrep is actively writing to it. Searching it can cause massive blowups where the file grows without bound.

While this is conceptually easy to fix (check the inode of the redirection and the inode of the file you're about to search), there are a few problems with it.

First, inodes are a Unix thing, so we need a Windows specific solution to this as well. To resolve this concern, I created a new crate, `same-file`, which provides a cross platform abstraction.

Second, stat'ing every file is costly. This is not avoidable on Windows, but on Unix, we can get the inode number directly from directory traversal. However, this information wasn't exposed, but now it is (through both the ignore and walkdir crates).

Fixes #286
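The approach described in the commit message can be sketched roughly as follows for Unix, using only the standard library. This is an illustration under stated assumptions, not ripgrep's implementation: the function names `key_of_open_file`, `stdout_key`, and `should_skip` are hypothetical, and the real fix goes through the `same-file`, `ignore`, and `walkdir` crates. The idea is to record the identity of stdout once, compare the inode reported by directory traversal first (no extra `stat`), and only `stat` a candidate when the inode matches.

```rust
use std::fs::{self, File};
use std::io;
use std::os::fd::AsFd;
use std::os::unix::fs::{DirEntryExt, MetadataExt};

/// Hypothetical helper: (device, inode) of an already open file.
fn key_of_open_file(file: &File) -> io::Result<(u64, u64)> {
    let md = file.metadata()?;
    Ok((md.dev(), md.ino()))
}

/// Hypothetical helper: (device, inode) of whatever stdout points at.
/// The descriptor is duplicated so that dropping the `File` does not
/// close the real stdout.
fn stdout_key() -> io::Result<(u64, u64)> {
    let dup = io::stdout().as_fd().try_clone_to_owned()?;
    key_of_open_file(&File::from(dup))
}

/// Hypothetical helper: should this directory entry be skipped because it
/// is the file that output is being written to?
fn should_skip(entry: &fs::DirEntry, out_key: (u64, u64)) -> io::Result<bool> {
    // Cheap check: the inode comes from the directory listing itself.
    // Assumption: symlinks are not followed, since for a symlink this is
    // the inode of the link, not of its target.
    if entry.ino() != out_key.1 {
        return Ok(false);
    }
    // Inodes match; confirm with a full stat so the device is compared too.
    let md = fs::metadata(entry.path())?;
    Ok((md.dev(), md.ino()) == out_key)
}
```

A caller would compute `stdout_key()` once at startup and pass the result to `should_skip` for each entry produced by the directory walk.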
This leads to ripgrep searching its own output, taking a lot of time and eating up disk space. Here's a minimal example:
Then, running
rg -e "hello"
will output the following:
When doing the same with grep instead, we get the expected output (and switching hello with yellow)
The more matches ripgrep reports, the bigger the file blowup is (I did this on a ripgrep search that had 17,000 hits...). Currently, this can be worked around by redirecting to a file outside the directory being searched, or by piping into a pager instead.