tr: Stream output instead of buffering #6012

BenWiederhake · 2024-02-25T12:00:50Z

This should lower memory consumption, and fixes OOM in some scenarios.

Before, buggy behavior:

$ yes | tr -ds a b | head -n5  # GNU
y
y
y
y
y
$ yes | cargo run -- -ds a b | head -n5  # WARNING: THIS CAUSES OOM!
^C
[$? = 130]
$ yes yes | tr -s ye s | head -n5  # GNU
s
s
s
s
s
$ yes yes | cargo run -- -s ye s | head -n5  # WARNING: THIS CAUSES OOM!
^C
[$? = 130]

After:

$ yes | cargo run -- -ds a b | head -n5
y
y
y
y
y
$ yes yes | cargo run -- -s ye s | head -n5
s
s
s
s
s
$

It might be tempting to factor out the common line translate_input(&mut locked_stdin, &mut buffered_stdout, op); from all if-branches. However, note that op has distinct types in each case, and virtualizing it (e.g. let op: Box<dyn SymbolTranslator>) would incur additional runtime costs. That cost seems unnecessary just to gain a little bit of beauty.

I'm not really sure how to test this. Ideally we could pipe in some data, and NOT close the stdin pipe, and see whether the program produces any partial output. However, I didn't see any methods/functions for doing so, and it sounds like it's too hard to get right to be worth it.

Maybe I could write a BENCHMARKING.md file, like for shuf? What do you suggest?

github-actions · 2024-02-25T12:37:19Z

GNU testsuite comparison:

Skipping an intermittent issue tests/tail/inotify-dir-recreate (passes in this run but fails in the 'main' branch)

BenWiederhake · 2024-02-25T15:27:04Z

Changes since last push:

Nothing, I just want to re-run CI
CI flaked in test_split::test_round_robin_limited_file_descriptors

Test output of flaky test `test_split::test_round_robin_limited_file_descriptors`

---- test_split::test_round_robin_limited_file_descriptors stdout ----
run: /target/i686-unknown-linux-gnu/debug/coreutils split -n r/40 onehundredlines.txt
thread 'test_split::test_round_robin_limited_file_descriptors' panicked at 'Command was expected to succeed.
stdout = 
 stderr = split: unable to open 'xaz'; aborting
', tests/by-util/test_split.rs:1646:10
stack backtrace:
   0: rust_begin_unwind
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:578:5
   1: core::panicking::panic_fmt
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/panicking.rs:67:14
   2: tests::common::util::CmdResult::success
             at ./tests/common/util.rs:388:9

Sadly, this does not show a backtrace of the failing command. Also, I cannot reproduce this problem. On my machine the command only needs 5 open files at any point in time, so 9 seems like this already includes a lot of wiggle room, so raising this even further is unlikely to fix the issue.

github-actions · 2024-02-25T15:56:54Z

GNU testsuite comparison:

Skipping an intermittent issue tests/tail/inotify-dir-recreate (passes in this run but fails in the 'main' branch)

src/uu/tr/src/tr.rs

BenWiederhake · 2024-02-29T23:04:00Z

Changes since last push:

Rebased on current master
Added documentation for SymbolTranslator::chain, and a short explanation that the *_op variables are the translation operations, as requested by @sylvestre

BenWiederhake · 2024-02-29T23:06:23Z

Changes since last push:

Apparently, links in the documentation need <…> around them.

tertsdiepraam

I like it! Just one suggestion on the link in the comment.

src/uu/tr/src/operation.rs

BenWiederhake · 2024-03-01T23:01:44Z

Changes since last push(es):

Rebased on current main
Make documentation prettier. Thanks @tertsdiepraam!
Learn the hard way that the "commit suggestion" button doesn't create nice fixups.

This should lower memory consumption, and fixes OOM in some scenarios.

github-actions · 2024-03-10T01:10:11Z

GNU testsuite comparison:

Congrats! The gnu test tests/chown/preserve-root is no longer failing!

BenWiederhake force-pushed the dev-tr-stream branch from 3dc2b0f to 9af8eda Compare February 25, 2024 15:23

BenWiederhake mentioned this pull request Feb 25, 2024

split: make flaky test more verbose #6015

Closed

sylvestre reviewed Feb 28, 2024

View reviewed changes

src/uu/tr/src/tr.rs Show resolved Hide resolved

BenWiederhake force-pushed the dev-tr-stream branch from 9af8eda to 86b12a5 Compare February 29, 2024 23:02

BenWiederhake force-pushed the dev-tr-stream branch from 86b12a5 to 55eee60 Compare February 29, 2024 23:06

BenWiederhake mentioned this pull request Feb 29, 2024

Windows CI is broken #6033

Open

tertsdiepraam reviewed Mar 1, 2024

View reviewed changes

src/uu/tr/src/operation.rs Outdated Show resolved Hide resolved

BenWiederhake force-pushed the dev-tr-stream branch from 9127b4b to 8898ba8 Compare March 1, 2024 23:00

tr: stream output instead of buffering

9cbe605

This should lower memory consumption, and fixes OOM in some scenarios.

sylvestre force-pushed the dev-tr-stream branch from 8898ba8 to 9cbe605 Compare March 9, 2024 21:51

sylvestre merged commit fe0c814 into uutils:main Mar 10, 2024
62 checks passed

BenWiederhake deleted the dev-tr-stream branch March 11, 2024 01:52

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

tr: Stream output instead of buffering #6012

tr: Stream output instead of buffering #6012

BenWiederhake commented Feb 25, 2024

github-actions bot commented Feb 25, 2024

BenWiederhake commented Feb 25, 2024

github-actions bot commented Feb 25, 2024

BenWiederhake commented Feb 29, 2024

BenWiederhake commented Feb 29, 2024

tertsdiepraam left a comment

BenWiederhake commented Mar 1, 2024

github-actions bot commented Mar 10, 2024

tr: Stream output instead of buffering #6012

tr: Stream output instead of buffering #6012

Conversation

BenWiederhake commented Feb 25, 2024

github-actions bot commented Feb 25, 2024

BenWiederhake commented Feb 25, 2024

github-actions bot commented Feb 25, 2024

BenWiederhake commented Feb 29, 2024

BenWiederhake commented Feb 29, 2024

tertsdiepraam left a comment

Choose a reason for hiding this comment

BenWiederhake commented Mar 1, 2024

github-actions bot commented Mar 10, 2024