Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: allow streaming, line-buffered input and output #287

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

corneliusroemer
Copy link

The major shortcoming of sd right now is that it doesn't support streaming stdin to stdout, instead, all input is read into memory which means sd can't be used for bigger-than-memory tasks.

This PR is based on @vtronko's work in #100 but adapted for current main.

It's a bit of a graft and rough around the edges, no validation of multi-line mode switch off etc, but it's a proof of concept and already useful (roughly 3x faster than sed).

Resolves:

Copy link
Collaborator

@CosmicHorrorDev CosmicHorrorDev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some initial feedback to get things stared

@@ -57,6 +57,10 @@ w - match full words only
*/
pub flags: Option<String>,

#[arg(short = 'l', long = "line-buffered", default_value_t = false)]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#[arg(short = 'l', long = "line-buffered", default_value_t = false)]
#[arg(short, long, default_value_t)]

Nit: we can use clap's "magic values" for all of these

@@ -43,6 +43,34 @@ fn try_main() -> Result<()> {
Source::from_stdin()
};

if options.line_buffered && sources == Source::from_stdin() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently this silently ignores the --line-buffered flag when running with files which seems pretty problematic

Comment on lines +64 to +67
handle
.write_all(replaced.as_ref())
.expect("Error writing to standard output");
handle.flush().expect("Error flushing output");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
handle
.write_all(replaced.as_ref())
.expect("Error writing to standard output");
handle.flush().expect("Error flushing output");
handle.write_all(replaced.as_ref())?;
handle.flush()?;

Failing to write is a pretty normal operation that we handle with the normal failure paths currently

.write_all(replaced.as_ref())
.expect("Error writing to standard output");
handle.flush().expect("Error flushing output");
buffer.clear();
Copy link
Collaborator

@CosmicHorrorDev CosmicHorrorDev Dec 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer if this buffer.clear() was moved right before the .read_line(). Forgetting to clear a buffer that you're reusing is a super common mistake that's annoying to debug

@nc7s
Copy link
Contributor

nc7s commented Dec 25, 2023

2 cents:

  • I'd argue there's no need for a --line-buffered flag. Anyone relying on the full read behavior?
  • I "unified" stdin and file scenarios into mmaps. They don't have read_line(). We can maybe implement a BufRead container for it? Or split into separate logic again?

@CosmicHorrorDev
Copy link
Collaborator

CosmicHorrorDev commented Dec 25, 2023

  • I'd argue there's no need for a --line-buffered flag. Anyone relying on the full read behavior?

Very likely people are, yes. Or anyone who wants to match over multiple lines

  • I "unified" stdin and file scenarios into mmaps. They don't have read_line(). We can maybe implement a BufRead container for it? Or split into separate logic again?

You can get a slice from it and wrap it in an std::io::Cursor. That being said I'm planning on splitting up at least some of the logic again

@nc7s
Copy link
Contributor

nc7s commented Dec 25, 2023

@CosmicHorrorDev:

  • I'd argue there's no need for a --line-buffered flag. Anyone relying on the full read behavior?

Very likely people are, yes. Or anyone who wants to match over multiple lines

Multi-line matching isn't tied to full reads, at least theoretically. Practically it might be, yes. But "line buffered" should be implementation detail. Something like --multiline might be better.

I'm curious if a temp file would be useful here.

You can get a slice from it and wrap it in an std::io::Cursor. That being said I'm planning on splitting up at least some of the logic again

TIL std::io::Cursor ;) The reason of splitting being?

@CosmicHorrorDev
Copy link
Collaborator

CosmicHorrorDev commented Dec 25, 2023

Multi-line matching isn't tied to full reads, at least theoretically. Practically it might be, yes. But "line buffered" should be implementation detail. Something like --multiline might be better.

I'm curious if a temp file would be useful here.

Allowing for line buffering inputs opens the door for streaming stdin and stdout (e.g. someone running just sd can type lines and see the live output after each line). It's more of a conceptual thing since a lot of people think of text files by line

TIL std::io::Cursor ;) The reason of splitting being?

To support streaming reads from stdin. I was looking through ripgreps source and it looks like they still special case streaming stdin, so we'll likely have to too if we want that behavior (which I do)

@nc7s
Copy link
Contributor

nc7s commented Dec 25, 2023

Did a little test on Cursor<Mmap>::lines(), it works. We can unify the logic (again?) on iterating through .lines().

(Excuse my "unificationism" XD)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants