Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tail: fix issue #6543 (--pid when reading from stdin) #6582

Open
wants to merge 24 commits into
base: main
Choose a base branch
from

Conversation

just-an-engineer
Copy link
Contributor

Fixed tail to follow stdin when a PID is specified to match GNU behavior

All test pass, and manually confirmed the issues raised in #6543 were fixed

@cakebaker cakebaker changed the title Fix issue #6543 tail: fix issue #6543 Jul 18, 2024
Copy link

GNU testsuite comparison:

Skip an intermittent issue tests/tail/inotify-dir-recreate (fails in this run but passes in the 'main' branch)

Copy link
Collaborator

@BenWiederhake BenWiederhake left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add tests for all relevant scenarios. You should add at least one test that would fail with the old code.

src/uu/tail/src/tail.rs Outdated Show resolved Hide resolved
@just-an-engineer
Copy link
Contributor Author

After looking a bit, I don't think there's enough inspection into internal stuff to see if it's saved as a file or stdin, without changing more things in tail. So I can add a TODO in the tail test file to add in a test to check /dev/stdin is processed as stdin, (and stdin to stdin, file to file), if tail is ever changed to allow that visibility. Or I can open an issue. But that's dependent if you think it will ever actually be expanded to that, otherwise it could just sit forever and clutter up other actual todo's.

But I removed the comment, because it was just straight-up incorrect (brain fart on my end), and added a test to check tail doesn't exit automatically when --pid is specified. I confirmed it fails if I remove the || settings.pid != 0 from the follow conditional, and passes when it's added back

Copy link
Collaborator

@BenWiederhake BenWiederhake left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding the test! 10 seconds seems a bit short; some CI runners are incredibly slow, let's crank it up to "basically infinity". Also, I think that the test could check one more thing.

tests/by-util/test_tail.rs Outdated Show resolved Hide resolved
Comment on lines 4856 to 4857
Command::new("kill")
.arg("-9")
.arg(sleep_pid.to_string())
.output()
.expect("failed to kill sleep command");

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After killing sleep, we have a perfect opportunity to quickly check that tail really does terminate by itself. Maybe insert something like this?

    child.make_assertion_with_delay(1000).is_alive();

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, for checking if it's is_alive, why not is_not_alive. Running the commands manually, I confirm that the tail exits after sleep exits, and after it's killed. But, I changed it to is_not_alive, and it failed on Linux. Although it passed on Windows. I need to switch back over to Windows and confirm it's breaking now

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To answer my previous question, I believe it should be is_not_alive, and the cause for failure was because sleep's parent was the Rust test program, and without exiting, and without reaping, it became a zombie. However, the libc kill function in is_dead saw it as still alive (returning 0), and the reason the program exited before was because of recv_timeout(settings.sleep_sec) in watch.rs (currently line 532 in this commit), which I assume errored out, leading to the program ending, and leading to the appearance that it detected the pid specified dying and following suite, when it wasn't actually the case.

tests/by-util/test_tail.rs Outdated Show resolved Hide resolved
@just-an-engineer
Copy link
Contributor Author

Just add those things. Didn't even cross my mind to test it died afterwards, but that's a good idea. And yeah, it was redundant to have a 2nd wait.

@BenWiederhake
Copy link
Collaborator

Aaand there's a formatting error :D

Copy link

GNU testsuite comparison:

Skip an intermittent issue tests/tail/inotify-dir-recreate (fails in this run but passes in the 'main' branch)

.output()
.expect("failed to kill sleep command");

child.make_assertion_with_delay(100).is_alive();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0.1 seconds is a bit very short. Again, some CI runners are really slow, that's why I'm worried that this would result in a flaky test (we already have too many of them). Maybe increase it to 1 second, i.e. 1000 milliseconds?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright. I'll go ahead and do that, then push. I'm usually of the mindset of trying to make sure any one test doesn't take too long, because it begins to add up over a large test suite. But I'd hate to make a weak or flaky test

Copy link

GNU testsuite comparison:

Skip an intermittent issue tests/tail/inotify-dir-recreate (fails in this run but passes in the 'main' branch)
Congrats! The gnu test tests/rm/rm2 is no longer failing!

@sylvestre
Copy link
Contributor

some windows jobs are failing :)

@just-an-engineer
Copy link
Contributor Author

Alright. I'll go ahead and set up a dev environment on windows and get it working, even if that's just a conditional compilation. I'll try to push tomorrow

@BenWiederhake BenWiederhake changed the title tail: fix issue #6543 tail: fix issue #6543 (--pid when reading from stdin) Jul 27, 2024
@just-an-engineer just-an-engineer marked this pull request as draft July 29, 2024 13:54
Copy link

GNU testsuite comparison:

GNU test failed: tests/timeout/timeout. tests/timeout/timeout is passing on 'main'. Maybe you have to rebase?

@just-an-engineer just-an-engineer marked this pull request as ready for review July 29, 2024 16:46
@BenWiederhake
Copy link
Collaborator

Why is macos "FIXME: for currently not working platforms"? It's unixoid enough, it should work, shouldn't it?

@just-an-engineer
Copy link
Contributor Author

My apologies, I missed the notification and didn't see this last week.
I'm not sure. It wasn't something I was looking for for this particular issue, but I'll raise an issue and start a discussion there. I don't have any apple devices on which to test it with, currently, either. I think I saw something about some test being intermittently broken on apple, not all the time, but I could be misremembering.

Copy link
Contributor Author

@just-an-engineer just-an-engineer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the Changes Requested, I believe those were the only changes requested were the tests, which were added. Do I need to do anything to resolve that? Because I see it's one of the blocking things

Copy link

github-actions bot commented Aug 7, 2024

GNU testsuite comparison:

Skip an intermittent issue tests/tail/inotify-dir-recreate (fails in this run but passes in the 'main' branch)

@@ -4824,3 +4825,61 @@ fn test_obsolete_encoding_windows() {
.stderr_is("tail: bad argument encoding: '-�b'\n")
.code_is(1);
}

#[test]
#[cfg(not(target_vendor = "apple"))] // FIXME: for currently not working platforms
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add details why it isn't working on other os

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, I'm sorry I didn't answer for over a month. I remember seeing your message shortly after you wrote it, but I don't remember after that, if I got busy or what. I had thought I had written something for it, but I guess not.
I think I had known at one point, but I don't have an Apple device to test on, and so currently I don't actually know.
I'll reach out to a friend with one and see if they can figure it out or not, and try to update this thread or the code accordingly.

Copy link

GNU testsuite comparison:

Skip an intermittent issue tests/tail/inotify-dir-recreate (fails in this run but passes in the 'main' branch)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants