Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added buffering to stdout when its not a terminal #736

Closed
wants to merge 14 commits into from
Closed

Added buffering to stdout when its not a terminal #736

wants to merge 14 commits into from

Conversation

ghost
Copy link

@ghost ghost commented Mar 4, 2021

The main changes are making the functions in output.rs generic so they accept a BufWriter
I also changed from using a channel to a sync_channel so the program wont use too much memory if the output device is slow.
Now on my system doing fd -uuu . / | grep testfile actually outperforms find

@sharkdp
Copy link
Owner

sharkdp commented Mar 14, 2021

Thank you very much for your contribution.

This is something I wanted to look into for a long time! Before I take a closer look, I have a few questions:

  • You set TTY_FLUSH_INTERVAL to 1 in your last commit. What exactly is there still to buffer if every result is flushed?
  • If we still get a benefit with TTY_FLUSH_INTERVAL == 1, couldn't we also use this in interactive mode, when attached to a terminal?

And concerning the sync-channel change:

  • Could you please explain what the implications of this change are? I would assume that the senders will be blocked if the channel is full? Are there any drawbacks from this (performance-wise or other)?

@ghost
Copy link
Author

ghost commented Mar 15, 2021

TTY_FLUSH_INTERVAL is only active when stdout is an interactive terminal, when its not a terminal the buffer is flushed when the BufWriter is full.

I switched to using a sync channel to reduce the maximum memory consumption of fd when stdout is blocked. Doing
fd -uuu . / | pv -qL 1 > /dev/null leads to 140Mb of usage on my machine. Now with the sync sender i only get 6Mb of usage.

@sharkdp
Copy link
Owner

sharkdp commented Jul 27, 2021

Rebased and fixed conflicts

@sharkdp
Copy link
Owner

sharkdp commented Aug 8, 2021

@sourlemon207 Are you still available to continue working on this? The CI build currently breaks to due TTY_FLUSH_INTERVAL being 1 and taking something modulo TTY_FLUSH_INTERVAL.

@ghost
Copy link
Author

ghost commented Aug 11, 2021

@sharkdp I will try fix this during the week

@tavianator
Copy link
Collaborator

Note that Rust itself will probably do this for you in the future: rust-lang/rust#60673 rust-lang/rust#78515

I think a better approach would be to make a wrapper for StdoutLock that can optionally buffer, instead of buffering to a vec in the workers.

@ghost
Copy link
Author

ghost commented Aug 12, 2021

Note that Rust itself will probably do this for you in the future: rust-lang/rust#60673 rust-lang/rust#78515

I think a better approach would be to make a wrapper for StdoutLock that can optionally buffer, instead of buffering to a vec in the workers.

Are you referring to the line

let mut buffer = Vec::with_capacity(MAX_BUFFER_LENGTH);

that buffer is there so that if there is less than MAX_BUFFER_LENGTH entries the entries will be printed in order.
What I did was wrap StdoutLock in a std::io::BufWriter which is like what you are suggesting.

One rust does the buffering like c you can just replace the lines

let stdout = io::stdout();
let stdout = stdout.lock();
let mut stdout = io::BufWriter::new(stdout);

with

let mut stdout = io::stdout().lock();

@tavianator
Copy link
Collaborator

You're right, I did misread the patch. Some code I thought was new actually just changed indentation. I won't have time for a full review for a few days but that does seem like the right idea.

By the way there are some .orig files added that don't need to be.

&config,
&wants_to_quit,
);
if config.interactive_terminal {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand this correctly, this removes the functionality where the results are sorted if the response is fast and short enough if we aren't using an "interactive_terminal", which probably isn't desirable. I think probably the only difference here should be whether we flush the result after each result.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. That would simplify the diff quite a bit too, and avoid the duplication between the branches of the if config.interactive_terminal.

);
}
buffer.clear();
let r = stdout.flush();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps this chunk of code:

                                       let r = stdout.flush();
                                        if r.is_err() {
                                            // Probably a broken pipe. Exit gracefully.
                                            process::exit(ExitCode::GeneralError.into());
                                        }

should be factored out into a function, since it is repeated a few times?

Copy link
Collaborator

@tmccombs tmccombs Sep 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taking my other comment into account, it could potentially be something like:

fn maybe_flush<W: Write>(config: &Arc<Options>, stdout: &mut W>) {
   if config.interactive_terminal {
     if stdout.flush().is_err() {
       process::exit(ExitCode::GeneralError.into());
    }
  }
}

let r = stdout.flush();
if r.is_err() {
// Probably a broken pipe. Exit gracefully.
process::exit(ExitCode::GeneralError.into());
Copy link
Collaborator

@tmccombs tmccombs Sep 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
process::exit(ExitCode::GeneralError.into());
return ExitCode::GeneralError.into();

@sharkdp
Copy link
Owner

sharkdp commented Nov 14, 2021

I'm closing this due to inactivity. Please feel free to comment in case it should be re-opened.

@sharkdp sharkdp closed this Nov 14, 2021
tmccombs added a commit to tmccombs/fd that referenced this pull request Nov 15, 2021
This is based on the work of sharkdp#736 by @sourlemon207.

I've added the suggestion I recommended on that PR.
sharkdp pushed a commit to tmccombs/fd that referenced this pull request Nov 26, 2021
This is based on the work of sharkdp#736 by @sourlemon207.

I've added the suggestion I recommended on that PR.
sharkdp pushed a commit that referenced this pull request Nov 26, 2021
This is based on the work of #736 by @sourlemon207.

I've added the suggestion I recommended on that PR.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants