ripgrep: add --pre flag #989

BurntSushi · 2018-07-21T20:19:46Z

The preprocessor flag accepts a command program and executes this
program for every input file that is searched. Instead of searching the
file directly, ripgrep will instead search the stdout contents of the
program.

Closes #978, Closes #981

This PR is based off of @c-blake's work in #981. See this comment for a description of the differences.

cc @okdana

The preprocessor flag accepts a command program and executes this program for every input file that is searched. Instead of searching the file directly, ripgrep will instead search the stdout contents of the program. Closes #978, Closes #981

c-blake · 2018-07-21T21:01:14Z

Replying to the differences comment here. I've been testing this as my primary grep ever since I posted my pull request. I haven't run into many issues.

For ripgrep's stdin you don't really want/need a preprocessor. There's only one of them. You can probably just insert any preprocessing stage you want in the pipeline before ripgrep in that case, but yeah it should be noted.

In my testing the preprocessor could read stdin already. So, I left that attach-y code in but commented out.

There is one gotcha with regard to stderr/preprocessor log message handling - and this goes for the decompressor, too. If the decompressor or preprocessor fills up the stderr pipe buffer to the point where a write blocks then ripgrep hangs forever. On Linux this is about 64KB of writes. That is more of a gotcha with some chatty programs like pdftotext that might spit out gobs of "warnings" but not really fail than for programs like decompressors, but other OSes might have smaller buffers. What I do is just redirect stderr in my preprocessor to keep ripgrep's buffer from ever getting anything. It might also make sense to just never read stderr or send it to oblivion and have preprocessors send them to a file first if user's want to keep them.

I knew that man page formatting was a disaster. I figured it would be easy to fix if you liked this. Thanks!

I think just a long --pre is reasonable. If you wanted, we could also do --input-preproc with -I as a short spelling. Most of the obvious names are already taken like, --[fF]ilter and so on. Like 34 of the 52 characters are taken already. If someone has a script wrapper that turns it on, and then wants to turn it off on their typed command line they would only have to add -I=, but two more letters without a shift key feels about the same. Anyway, just a long option seems fine if the long option is fairly short like --pre. I'm mostly just happy you're cool with this new feature, and I won't have to maintain some patch forever or write my own with some hypothetical libripgrep.

User-specified commands/programs fail to be found/installed all the time. Somebody in Rust-land should do a better job with that someday, but I don't think we need to here. Anyway, I like all your changes. No real pushback from me.

BurntSushi · 2018-07-21T21:25:05Z

@c-blake Excellent, thanks for the feedback! Your note about stderr and deadlock is a good reminder, thank you. I filed #990 for that, and hopefully we can pick a forward course.

Otherwise, I'm glad you're happy with --pre, so let's run with that!

BurntSushi · 2018-07-21T21:28:09Z

In my testing the preprocessor could read stdin already.

This is very weird. The first thing I tried was actually your example script in the --help docs, and it hung when attempting to search a file that was neither a PDF nor compressed. Namely, this caused it to hit the fallback cat case, which just blocked on stdin. This seemed to make sense from my reading of the code as well, since ripgrep didn't have anything on stdin, so reading from it isn't going to get you anywhere, and it certainly won't contain the contents of the file being searched.

c-blake · 2018-07-21T21:30:30Z

Oops. Yeah, you probably want to put the filename after "cat". You don't for some things like gunzip where if you give it a filename it will replace your compressed file with a decompressed version.

c-blake · 2018-07-21T21:43:46Z

Oh, and of course, I would prefer it if we did attach each new input file to the stdin of the preprocessor. I just didn't get that far, but that kind of redundancy (both stdin and a filename) would be better, IMO. I could swear I tested that and it was working, but maybe I'm misremembering. It's been a while.

BurntSushi · 2018-07-21T21:44:42Z

@c-blake This PR does that. :-)

BurntSushi mentioned this pull request Jul 21, 2018

Try to fullfill all the goals of the generic decoder program feature #981

Closed

ripgrep: add --pre flag

e09265e

The preprocessor flag accepts a command program and executes this program for every input file that is searched. Instead of searching the file directly, ripgrep will instead search the stdout contents of the program. Closes #978, Closes #981

BurntSushi force-pushed the ag/generic-decode branch from 0887977 to e09265e Compare July 21, 2018 20:30

BurntSushi mentioned this pull request Jul 21, 2018

read stderr when running external commands #990

Closed

BurntSushi merged commit 231456c into master Jul 21, 2018

BurntSushi deleted the ag/generic-decode branch July 21, 2018 21:25

chansey97 mentioned this pull request May 6, 2024

--pre doesn't work on Windows (CMD, MSYS2 bash, Powershell) #2798

Closed

1 task

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ripgrep: add --pre flag #989

ripgrep: add --pre flag #989

BurntSushi commented Jul 21, 2018 •

edited

Loading

c-blake commented Jul 21, 2018

BurntSushi commented Jul 21, 2018

BurntSushi commented Jul 21, 2018

c-blake commented Jul 21, 2018

c-blake commented Jul 21, 2018

BurntSushi commented Jul 21, 2018

ripgrep: add --pre flag #989

ripgrep: add --pre flag #989

Conversation

BurntSushi commented Jul 21, 2018 • edited Loading

c-blake commented Jul 21, 2018

BurntSushi commented Jul 21, 2018

BurntSushi commented Jul 21, 2018

c-blake commented Jul 21, 2018

c-blake commented Jul 21, 2018

BurntSushi commented Jul 21, 2018

BurntSushi commented Jul 21, 2018 •

edited

Loading