join: "support" field numbers larger than usize::MAX #2882

Merged (1 commit) on Jan 31, 2022
5 changes: 5 additions & 0 deletions src/uu/join/src/join.rs
@@ -809,8 +809,13 @@ fn get_field_number(keys: Option<usize>, key: Option<usize>) -> UResult<usize> {
/// Parse the specified field string as a natural number and return
/// the zero-based field number.
fn parse_field_number(value: &str) -> UResult<usize> {
    // TODO: use ParseIntError.kind() once MSRV >= 1.55
    // For now, store an overflow Err from parsing a value 10x 64 bit usize::MAX
    // Adapted from https://github.com/rust-lang/rust/issues/22639
    let overflow = "184467440737095516150".parse::<usize>().err().unwrap();
Contributor Author


This hack could add some overhead, since we probably construct overflow every time a field number gets parsed (I don't know how expensive these operations are, or how aggressive the compiler's optimizations are). In the typical case this should be negligible, since it happens only a linear number of times in the size of the arguments, but someone could be relying on some crazy xargs invocation that makes it show up. Possible workarounds would be passing overflow as an arg, storing it in global scope, or maybe a const fn (I'd have to check when things were added vs. MSRV). My inclination is to keep the hack scoped to these lines and accept whatever overhead there is until either someone complains or MSRV gets bumped enough that we can use the proper solution.
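As a rough illustration of the "passing overflow as an arg" workaround mentioned above, the sketch below builds the sentinel error once and hands it to the parser. The names overflow_sentinel and parse_field_number_with are hypothetical, and a plain String stands in for the crate's UResult/USimpleError types, so this is not the code that was merged.

```rust
use std::num::ParseIntError;

/// Build the sentinel once. Parsing a value well above a 64-bit
/// `usize::MAX` fails with a positive-overflow error.
fn overflow_sentinel() -> ParseIntError {
    "184467440737095516150".parse::<usize>().unwrap_err()
}

/// Hypothetical variant of `parse_field_number` that receives the sentinel
/// from the caller instead of rebuilding it on every call.
/// Assumption: `String` replaces the crate's own error type.
fn parse_field_number_with(value: &str, overflow: &ParseIntError) -> Result<usize, String> {
    match value.parse::<usize>() {
        // Field numbers are one-based on the command line, zero-based internally.
        Ok(result) if result > 0 => Ok(result - 1),
        // `ParseIntError` implements `PartialEq`, so an overflow can be
        // detected by comparing against the sentinel.
        Err(ref e) if e == overflow => Ok(usize::MAX),
        _ => Err(format!("invalid field number: '{}'", value)),
    }
}

fn main() {
    let overflow = overflow_sentinel();
    for arg in ["3", "100000000000000000000", "0", "x"] {
        println!("{} -> {:?}", arg, parse_field_number_with(arg, &overflow));
    }
}
```

Whether this saves anything measurable over rebuilding the sentinel inside the function is not established here; no benchmarking was done in this thread.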

Member


That's a clever trick, hahaha. I can't believe we cannot properly introspect the kind until Rust 1.55. Even though it's not a const fn, it might get optimized out anyway. I think we can accept this if you add a bit of documentation on how this workaround works.

Contributor Author


I've added a bit of explanation, and a link to where I found the trick. But yeah, I suspect the actual runtime cost is either tiny or non-existent, I just haven't done any real investigation or benchmarking to confirm that.
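For comparison, here is a minimal sketch of what the TODO's "proper solution" might look like once the MSRV allows ParseIntError::kind() (stable since Rust 1.55). The function name parse_field_number_kind is hypothetical, and String again stands in for the crate's error type; the project may well settle on something different.

```rust
use std::num::IntErrorKind;

/// Hypothetical post-1.55 variant: inspect the error kind directly
/// instead of comparing against a pre-built sentinel error.
/// Assumption: `String` replaces the crate's own error type.
fn parse_field_number_kind(value: &str) -> Result<usize, String> {
    match value.parse::<usize>() {
        // One-based on the command line, zero-based internally.
        Ok(result) if result > 0 => Ok(result - 1),
        // A value too large for `usize` is clamped to `usize::MAX`.
        Err(e) if matches!(e.kind(), IntErrorKind::PosOverflow) => Ok(usize::MAX),
        // Zero, empty strings, and non-digits are all rejected.
        _ => Err(format!("invalid field number: '{}'", value)),
    }
}

fn main() {
    for arg in ["5", "100000000000000000000", "0", ""] {
        println!("{:?} -> {:?}", arg, parse_field_number_kind(arg));
    }
}
```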

    match value.parse::<usize>() {
        Ok(result) if result > 0 => Ok(result - 1),
        Err(ref e) if *e == overflow => Ok(usize::MAX),
        _ => Err(USimpleError::new(
            1,
            format!("invalid field number: {}", value.quote()),
21 changes: 21 additions & 0 deletions tests/by-util/test_join.rs
@@ -75,6 +75,27 @@ fn different_field() {
        .stdout_only_fixture("different_field.expected");
}

#[test]
fn out_of_bounds_fields() {
    new_ucmd!()
        .arg("fields_1.txt")
        .arg("fields_4.txt")
        .arg("-1")
        .arg("3")
        .arg("-2")
        .arg("5")
        .succeeds()
        .stdout_only_fixture("out_of_bounds_fields.expected");

    new_ucmd!()
        .arg("fields_1.txt")
        .arg("fields_4.txt")
        .arg("-j")
        .arg("100000000000000000000") // > usize::MAX for 64 bits
        .succeeds()
        .stdout_only_fixture("out_of_bounds_fields.expected");
}

#[test]
fn unpaired_lines() {
    new_ucmd!()
25 changes: 25 additions & 0 deletions tests/fixtures/join/out_of_bounds_fields.expected
@@ -0,0 +1,25 @@
1 2 c 1 cd
1 3 d 2 de
1 5 e 3 ef
1 7 f 4 fg
1 11 g 5 gh
2 2 c 1 cd
2 3 d 2 de
2 5 e 3 ef
2 7 f 4 fg
2 11 g 5 gh
3 2 c 1 cd
3 3 d 2 de
3 5 e 3 ef
3 7 f 4 fg
3 11 g 5 gh
5 2 c 1 cd
5 3 d 2 de
5 5 e 3 ef
5 7 f 4 fg
5 11 g 5 gh
8 2 c 1 cd
8 3 d 2 de
8 5 e 3 ef
8 7 f 4 fg
8 11 g 5 gh