csplit: Handle repeated args, fix remainder after error #6114

Merged
merged 2 commits into from
Mar 25, 2024

Conversation

BenWiederhake
Collaborator

@BenWiederhake BenWiederhake commented Mar 23, 2024

This PR tackles two things in csplit:

  • Support (and test) passing repeated arguments; this was pleasantly simple, but it uncovered what seems to be a bug in GNU csplit.
  • Fix a bug in uutils csplit that I found and solved while investigating the "GNU bug". Specifically, we used to unconditionally emit the entire remainder of the input, even though GNU does no such thing. (This is especially silly if -sk is given.)

The GNU bug seems to be that an empty output file sometimes consumes a line:

$ rm xx*; cargo run csplit 50.txt /13/ 9 -k --suppress-matched; echo "= $?"; head -n2 xx02
27
0
111
= 0
14
15
$ rm xx*; csplit 50.txt /13/ 9 -k --suppress-matched; echo "= $?"; head -n2 xx02
27
0
108
= 0
15
16
$

As you can see, GNU csplit consumed the line containing 13, even though there is no reason I can see.

I marked it as FIXME in the code.

This is work towards #5998.

@tertsdiepraam
Member

tertsdiepraam commented Mar 24, 2024

> As you can see, GNU csplit consumed the line containing 13, even though there is no reason I can see.

My hypothesis is that this is due to --suppress-matched. The line with 13 is suppressed because it matches the first pattern. Then 9 is lower than the current line number, so it matches immediately, and that line is suppressed too. This feels like reasonable behaviour to me. What would the bug be exactly?

Edit: basically, we should apply --suppress-matched to line-number patterns as well.

@BenWiederhake
Collaborator Author

Argh, sorry, I made a typo in the worst possible place. Let me retry:

As you can see, GNU csplit consumed the line containing 14, even though there is no reason I can see.

It seems the 9 somehow consumes the 14, but I don't understand why.

@tertsdiepraam
Member

I figured that was a typo. I think line 14 is suppressed because 9 matches it: as soon as we reach line 14, its line number is greater than 9, so the pattern matches and the line is suppressed. It makes sense if you interpret 9 as "match (and suppress) the first line whose line number is greater than or equal to 9".
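The interpretation above can be sketched as a small model. This is a hypothetical illustration, not the uutils or GNU implementation: `Pattern`, `chunk_sizes`, and the `suppress_line_number_match` switch are made-up names, a substring test stands in for the regex, and 50.txt is assumed to hold the numbers 1 to 50, one per line (which is consistent with the byte counts reported above).

```rust
/// Hypothetical, simplified pattern type; not the uutils csplit data structure.
enum Pattern {
    /// Stand-in for `/REGEX/`: matches a line containing the given substring.
    Contains(&'static str),
    /// A bare line number N, read as "first remaining line whose number is >= N".
    LineNumber(usize),
}

/// Returns the byte size of each output chunk (one newline byte per line),
/// followed by the size of the remainder after the last pattern.
/// Substring matches are always suppressed, as with --suppress-matched;
/// `suppress_line_number_match` decides whether line-number matches are too.
fn chunk_sizes(
    lines: &[String],
    patterns: &[Pattern],
    suppress_line_number_match: bool,
) -> Vec<usize> {
    let mut sizes = Vec::new();
    let mut pos = 0; // 0-based index of the next unconsumed line

    for pattern in patterns {
        let mut size = 0;
        while pos < lines.len() {
            let (matched, suppress) = match pattern {
                Pattern::Contains(needle) => (lines[pos].contains(*needle), true),
                // `pos + 1` is the 1-based line number of the current line.
                Pattern::LineNumber(n) => (pos + 1 >= *n, suppress_line_number_match),
            };
            if matched {
                if suppress {
                    pos += 1; // drop the matched line instead of emitting it
                }
                break;
            }
            size += lines[pos].len() + 1;
            pos += 1;
        }
        sizes.push(size);
    }

    // Whatever is left after the last pattern goes into the final chunk.
    sizes.push(lines[pos..].iter().map(|l| l.len() + 1).sum());
    sizes
}
```

Under those assumptions, with the lines "1" to "50" and the patterns `Pattern::Contains("13")` followed by `Pattern::LineNumber(9)`, the model returns `[27, 0, 111]` with the switch off (the uutils output above) and `[27, 0, 108]` with it on (the GNU output).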

@BenWiederhake
Collaborator Author

Is that really the logic here? Oof.

In any case, sounds like suppressing the extra line is unrelated to repeated args or emitting the remainder of the input after an error, and the // FIXME: GNU starts at 15 is already in place. So maybe this PR can be merged first?

@tertsdiepraam
Member

It would be weird if that weren't the logic, because 9 does cause a split, right? So the matching criterion is already >=9. You're right about this PR. I'll take a look in an hour.

@tertsdiepraam tertsdiepraam merged commit e4a1455 into uutils:main Mar 25, 2024
62 checks passed
@BenWiederhake BenWiederhake deleted the dev-csplit-repeated-args branch March 25, 2024 08:51