Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stdin is not searched if it is a directory #2906

Closed
1 task done
lolbinarycat opened this issue Sep 30, 2024 · 4 comments
Closed
1 task done

stdin is not searched if it is a directory #2906

lolbinarycat opened this issue Sep 30, 2024 · 4 comments
Labels
doc An issue with or an improvement to documentation.

Comments

@lolbinarycat
Copy link

Please tick this box to confirm you have reviewed the above.

  • I have a different issue.

What version of ripgrep are you using?

ripgrep 13.0.0
-SIMD -AVX (compiled)
+SIMD -AVX (runtime)

How did you install ripgrep?

nix-inst ripgrep

https://codeberg.org/binarycat/nix-inst

What operating system are you using ripgrep on?

NixOS 24.05 (Uakari)

Describe your bug.

if stdin is a directory, ripgrep acts like it is a terminal

What are the steps to reproduce the behavior?

rg a < some_dir

What is the actual behavior?

if stdin is a directory, ripgrep acts like it is a terminal

What is the expected behavior?

ripgrep recurses into stdin

yes this is kinda a weird request, but it's useful for quick scripts that wrap rg, with no way to pass file arguments to rg.

filing this as a bug because although i can't find any docs that say "ripgrep searches stdin unless it is a terminal", they probably should exist.

@lolbinarycat
Copy link
Author

if this is too niche to spend your time fixing, i am a rust developer and would be willing to fix this.

@BurntSushi BurntSushi added the doc An issue with or an improvement to documentation. label Sep 30, 2024
@BurntSushi
Copy link
Owner

I don't understand the use case. You mention quick wrapper scripts, but I don't really get it. I write lots of wrapper scripts, and it's easy to forward arguments from the wrapper script to a command run by that wrapper script.

You don't mention the expected behavior here. I guess you're expecting ripgrep to search it as if it were just a directory provided as an argument? Do other Unix commands do this? I've never seen this usage pattern before.

The way ripgrep works is that it looks for something resembling a file on stdin to search. And if it can't find a file (which it can't in this case), it searches the current working directory. This is actually a heuristic and this heuristic has caused lots of problems because it's common for things to pass something that looks like a file on stdin even when it's not really intended. So I'm very hesitant to change that heuristic. And your suggesting here would add a third case to that heuristic.

I think at best the docs can be improved. The man page already mentions this behavior, although I suppose it could be slightly more precise to cover this case. But I'm not even sure this is worth doing because this seems so idiosyncratic to me. Maybe you can elaborate more.

if this is too niche to spend your time fixing, i am a rust developer and would be willing to fix this.

Note that it's not just the initial effort here. It's also the ongoing maintenance burden that stuff like this adds. Especially to what is already a finicky heuristic.

@lolbinarycat
Copy link
Author

I don't understand the use case. You mention quick wrapper scripts, but I don't really get it. I write lots of wrapper scripts, and it's easy to forward arguments from the wrapper script to a command run by that wrapper script.

yes, unless all those arguments are already used for something (eg. constructing the regex to search with)

You don't mention the expected behavior here. I guess you're expecting ripgrep to search it as if it were just a directory provided as an argument?

yes, that's what "recurses into" means

Do other Unix commands do this? I've never seen this usage pattern before.

unix commands don't typically change behavior based on the type of stdin. they sometime do for stdout, but usually just for terminal colors or automatically invoking a pager.

usually they just crash if any stdio is a directory, not silently ignore it.

The way ripgrep works is that it looks for something resembling a file on stdin to search. And if it can't find a file (which it can't in this case), it searches the current working directory. This is actually a heuristic and this heuristic has caused lots of problems because it's common for things to pass something that looks like a file on stdin even when it's not really intended. So I'm very hesitant to change that heuristic. And your suggesting here would add a third case to that heuristic.

I was under the impression it just checked if stdin was a pty, and if not, treats it like it was passed as an argument.

@BurntSushi
Copy link
Owner

yes, that's what "recurses into" means

Ah sorry I missed this.

unix commands don't typically change behavior based on the type of stdin. they sometime do for stdout, but usually just for terminal colors or automatically invoking a pager.

usually they just crash if any stdio is a directory, not silently ignore it.

ripgrep isn't without precedent here, but yes, one of the downsides of messing around with stdin like this is that there are indeed some cases where ripgrep will ignore what's on stdin and search the current working directory. This is why in ripgrep 14, I introduced some logging messages (that appear when you pass --debug) that give a little more insight into what's happening.

I was under the impression it just checked if stdin was a pty, and if not, treats it like it was passed as an argument.

No, it's more complicated than that. Whether it's a pty or not is a minimum criteria:

pub fn is_readable_stdin() -> bool {
use std::io::IsTerminal;
#[cfg(unix)]
fn imp() -> bool {
use std::{
fs::File,
os::{fd::AsFd, unix::fs::FileTypeExt},
};
let stdin = std::io::stdin();
let fd = match stdin.as_fd().try_clone_to_owned() {
Ok(fd) => fd,
Err(err) => {
log::debug!(
"for heuristic stdin detection on Unix, \
could not clone stdin file descriptor \
(thus assuming stdin is not readable): {err}",
);
return false;
}
};
let file = File::from(fd);
let md = match file.metadata() {
Ok(md) => md,
Err(err) => {
log::debug!(
"for heuristic stdin detection on Unix, \
could not get file metadata for stdin \
(thus assuming stdin is not readable): {err}",
);
return false;
}
};
let ft = md.file_type();
let is_file = ft.is_file();
let is_fifo = ft.is_fifo();
let is_socket = ft.is_socket();
let is_readable = is_file || is_fifo || is_socket;
log::debug!(
"for heuristic stdin detection on Unix, \
found that \
is_file={is_file}, is_fifo={is_fifo} and is_socket={is_socket}, \
and thus concluded that is_stdin_readable={is_readable}",
);
is_readable
}
#[cfg(windows)]
fn imp() -> bool {
let stdin = winapi_util::HandleRef::stdin();
let typ = match winapi_util::file::typ(stdin) {
Ok(typ) => typ,
Err(err) => {
log::debug!(
"for heuristic stdin detection on Windows, \
could not get file type of stdin \
(thus assuming stdin is not readable): {err}",
);
return false;
}
};
let is_disk = typ.is_disk();
let is_pipe = typ.is_pipe();
let is_readable = is_disk || is_pipe;
log::debug!(
"for heuristic stdin detection on Windows, \
found that is_disk={is_disk} and is_pipe={is_pipe}, \
and thus concluded that is_stdin_readable={is_readable}",
);
is_readable
}
#[cfg(not(any(unix, windows)))]
fn imp() -> bool {
log::debug!("on non-{{Unix,Windows}}, assuming stdin is not readable");
false
}
!std::io::stdin().is_terminal() && imp()
}

I've pushed a wording change to the man page to cover this case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
doc An issue with or an improvement to documentation.
Projects
None yet
Development

No branches or pull requests

2 participants