cut: allow non utf8 characters for delimiters #6037
@@ -337,6 +347,88 @@ fn cut_files(mut filenames: Vec<String>, mode: &Mode) {
    }
}

// This is temporary helper function to convert OsString to &[u8] for unix targets only
// TODO Remove this function and re-implement the functionality in each place that calls it
// for all targets using https://doc.rust-lang.org/nightly/std/ffi/struct.OsStr.html#method.as_encoded_bytes
`as_encoded_bytes` isn't suitable for this: the results for invalid Unicode on Windows aren't meaningful for anything other than passing back into `from_encoded_bytes_unchecked`. (Specifically they're WTF-8, which doesn't help users.)
Your implementation is the right way to do it. I think we should keep it (and not unwrap).
Thanks! I could not get back to it before it got merged. I will update it in the next PR, as there are still a few things to be fixed in `cut` for full GNU tests compatibility.
// for all targets using https://doc.rust-lang.org/nightly/std/ffi/struct.OsStr.html#method.as_encoded_bytes
// once project's MSRV is bumped up to 1.74.0+ so that function becomes available
// For now - support unix targets only and on non-unix (i.e. Windows) will just return an error if delimiter value is not UTF-8
fn os_string_as_bytes(os_string: &OsString) -> UResult<&[u8]> {
You can use `&OsStr` for this (and also get access to `'static` lifetimes that way).
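To make the suggestion concrete, here is a minimal sketch of a helper that takes `&OsStr` and only falls back to an error on non-Unix targets. It is not the code merged in this PR: the name `os_str_as_bytes`, the `String` error type (standing in for the project's `UResult`), and the error text are all assumptions chosen so the example compiles without any dependencies.

```rust
use std::ffi::OsStr;

// Hypothetical helper (name and error type are assumptions, not the PR's code).
// On Unix an OsStr is an arbitrary byte sequence, so the conversion cannot fail.
#[cfg(unix)]
fn os_str_as_bytes(s: &OsStr) -> Result<&[u8], String> {
    use std::os::unix::ffi::OsStrExt;
    Ok(s.as_bytes())
}

// On other targets (e.g. Windows) only valid UTF-8 is accepted; anything else
// is reported as an error instead of exposing the platform's internal encoding.
#[cfg(not(unix))]
fn os_str_as_bytes(s: &OsStr) -> Result<&[u8], String> {
    s.to_str()
        .map(str::as_bytes)
        .ok_or_else(|| String::from("delimiter is not valid UTF-8")) // illustrative message
}

fn main() {
    // A plain ASCII delimiter converts to the same bytes on every target.
    let bytes = os_str_as_bytes(OsStr::new(":")).expect("valid UTF-8 input");
    assert_eq!(bytes, &b":"[..]);
}
```

Because the parameter is a borrowed `&OsStr`, callers can pass either a borrowed `OsString` (via deref coercion) or an `OsStr::new("...")` built from a string literal.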
.get_one::<OsString>(options::OUTPUT_DELIMITER)
.map(|os_string| {
    if os_string.is_empty() || os_string == "''" {
        "\0".as_bytes()
Suggested change: replace `"\0".as_bytes()` with `b"\0"`.
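As a rough illustration of the suggested literal, the sketch below mirrors the branch shown in the hunk: an empty or `''` output delimiter maps to a single NUL byte. The function name `normalize_output_delimiter` and the choice to operate on already-converted bytes are assumptions made for the example; the PR itself does this inside a closure over an `OsString`.

```rust
// Hypothetical stand-in for the closure body in the diff (name is an assumption).
// `b"\0"` has type `&'static [u8; 1]`; the declared return type lets it coerce
// to `&[u8]`, so no `.as_bytes()` call is needed.
fn normalize_output_delimiter(raw: &[u8]) -> &[u8] {
    if raw.is_empty() || raw == b"''" {
        b"\0"
    } else {
        raw
    }
}

fn main() {
    // Empty and literal '' both collapse to a NUL delimiter.
    assert_eq!(normalize_output_delimiter(b""), &b"\0"[..]);
    assert_eq!(normalize_output_delimiter(b"''"), &b"\0"[..]);
    // A non-UTF-8 byte passes through untouched.
    assert_eq!(normalize_output_delimiter(b"\xff"), &b"\xff"[..]);
}
```

One thing to watch with the byte-string literal: its type is a fixed-size array reference, so in a context without an expected `&[u8]` type (for example an unannotated closure) the branches may need an explicit type annotation or slicing to unify.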
GNU testsuite comparison:
|
* ci: use codecov token in CICD/GnuTests workflows * tee: fail test if string setup fails * sort: add skipped test for combined flags Now that clap#2624 has been resolved, we can and should test both variants. * cat: don't flake even on exotic pipe buffer sizes See also 9995c63. There is a race condition between the writing thread and the command. It is easily possible that on the developer's machine, the writing thread is always faster, filling the kernel's buffer of the stdin pipe, thus succeeding the write. It is also easily possible that on the busy CI machines, the child command runs first for whatever reason, and exits early, thus killing the pipe, which causes the later write to fail. This results in a flaky test. Let's prevent flaky tests. * numfmt: don't flake even on exotic pipe buffer sizes * split: don't flake even on exotic pipe buffer sizes * simulate terminal utility (squash) * workaround: run builds with retry (a) * added configurable terminal size * chore(deps): update rust crate rayon to 1.9 * cargo: fix feature = "cargo-clippy" deprecation * tests/printf: Fix char_as_byte test, add char and string padding tests * printf: Change get_char and write_padded to handle bytes instead of chars * uucore/format: add padlen to spell-checker:ignore * tests/printf: Verify the correct error behavior of printf when provided with '%0c' or '%0s' * printf: Raise error on '%0c' and '%0s' formats * cp: fix flaky test test_cp_arg_interactive_update, document adjacent bug * chore(deps): update rust crate walkdir to 2.5 * cat: permit repeating command-line flags * cat: fix -b and -n anti-symmetry * cat: ignore -u flag, just like GNU does * tests/common/util.rs: add cfg(feature = "env") * cat: prefix two test fns with "test_" * Bump mio from 0.8.10 to 0.8.11 * extend error message for case when writer instanciation fails second time * Bump chrono from 0.4.34 to 0.4.35 * ls: use chrono::TimeDelta::try_seconds instead of deprecated chrono::TimeDelta::seconds * touch: 
replace use of deprecated chrono functions * chmod: slightly adjust error message when preserve-root is triggered One of the GNU tests checks for the exact error message. * chgrp+chown: also trigger preserve-root during dirwalking, fix error message This is explicitly tested in the GNU tests. * uucore: drop unused function resolve_relative_path This function is by necessity ill-defined: Depending on the context, '..' is either the logical parent directory, sometimes the physical parent directory. This function can only work for the latter case, in which case `Path::canonicalize` is often a better approach. * split: close as much fds as needed for opening new one * use std::command::pre_exec() to set limits on child before exec * chore(deps): update softprops/action-gh-release action to v2 * dd: treat arg as bytes if it contains 'B' * Fix clippy warnings * tr: stream output instead of buffering This should lower memory consumption, and fixes OOM in some scenarios. * shuf: fix and test off-by-one errors around ranges * shuf: fix error message text on negative-sized ranges Found by @cakebaker: uutils#6011 (comment) * chcon: allow overriding between --dereference and --no-dereference * chcon: allow repeated flags and arguments * touch: Respect -h when getting metadata (uutils#5951) * Add tests that stat symlinks * Check follow first in stat * Don't run tests on FreeBSD It would be possible to get them to run on FreeBSD by avoiding get_symlink_times, but the behavior we're testing is not platform-specific, so it's fine to not test it on FreeBSD. 
--------- Co-authored-by: Sylvestre Ledru <sylvestre@debian.org> * pr: fix deprecation warnings & remove comment * chgrp: fix clippy warning * cut: allow non utf8 characters for delimiters (uutils#6037) * cp: improve the support of --attributes-only (uutils#6051) * cp: improve the support of --attributes-only * remove useless comments Co-authored-by: Daniel Hofstetter <daniel.hofstetter@42dh.com> --------- Co-authored-by: Daniel Hofstetter <daniel.hofstetter@42dh.com> * cp: Split the copy_file function a bit * parser: if closing square bracket not found, stop looking for it again This solves uutils#5584, where the fuzzing would take hours without this. * Fix install: invalid link at destination also remove some FixMEs for FreeBsd * Bump nix from 0.27 to 0.28 * uucore/pipes: adapt to new return type of nix fn nix 0.28 changed the return type of unistd::pipe() from Result<(RawFd, RawFd), Error> to Result<(OwnedFd, OwnedFd), Error> * tty: unistd::ttyname takes AsFd instead of RawFd change introduced by nix 0.28 * stty: remove ofill output flag flag was removed from nix::sys::termios::OutputFlags in nix 0.28 * cat: adapt to type change of unistd::write() nix 0.28 changed "write(fd: RawFd, buf: &[u8]) -> Result<usize>" to "write<Fd: AsFd>(fd: Fd, buf: &[u8]) -> Result<usize>" * chore(deps): update rust crate blake3 to 1.5.1 --------- Co-authored-by: Daniel Hofstetter <daniel.hofstetter@42dh.com> Co-authored-by: Ben Wiederhake <BenWiederhake.GitHub@gmx.de> Co-authored-by: Ulrich Hornung <hornunguli@gmx.de> Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com> Co-authored-by: Sylvestre Ledru <sylvestre@debian.org> Co-authored-by: Dimitris Apostolou <dimitris.apostolou@icloud.com> Co-authored-by: Dorian Péron <dorianperon.i@gmail.com> Co-authored-by: Terts Diepraam <terts.diepraam@gmail.com> Co-authored-by: mhead <mtrxhead@protonmail.com> Co-authored-by: Yash Thakur <45539777+ysthakur@users.noreply.github.com> Co-authored-by: Zoltan Kiss 
<121870572+cj-zoltan-kiss@users.noreply.github.com>
This PR refactors how `cut` processes delimiters and allows non-UTF-8 values for those options (`-d`/`--delimiter`, `--output-delimiter`) to align with GNU behavior.
It fixes the `8bit-delim` test from the GNU `tests/cut/cut.pl` set of tests.
NOTE: There is a TODO left in for when the MSRV for the project is bumped to 1.74.0+.