Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

monitoring arbitrary kevents #579

Closed
wants to merge 3 commits into from
Closed

Conversation

sorcio
Copy link
Contributor

@sorcio sorcio commented Jul 29, 2018

So, this is just to start thinking how we can implement support for arbitrary kevent events (#578). I'm not a kqueue expert at all, I just happen to run macOS. @njsmith is this similar to what you had in mind?

I think there are weird interactions between flags that are not apparent from the documentation. For example it appears that waiting for process exit will get an event with EV_ONESHOT which makes sense (you can't wait for an event from a pid that doesn't exist anymore). So maybe making it actually arbitrary involves some more tricks that I haven't explored yet.

But it seems to work ok to wait for a child process to terminate. I'm afraid I might be missing something for that use case too, though, because there is a race condition between process creation and opening a monitor. If the process has already terminated we get an exception, or we might even monitor the wrong process. I'm not sure this is relevant to this issue. Maybe the API should be changed for the bigger picture but right now I cannot suggest how.

@njsmith
Copy link
Member

njsmith commented Jul 30, 2018

there is a race condition between process creation and opening a monitor. If the process has already terminated we get an exception, or we might even monitor the wrong process. I'm not sure this is relevant to this issue

So this is one of those classic weird and subtle Unix things. You've heard of "zombie processes"? On Unix when a process exits, the kernel cleans up most of it (the address space, the threads, etc.), but it keeps around a little bit of metadata (the process name, the return value) – this is a "zombie process". This metadata stays in the process table under the original process's PID, until someone calls waitpid, which consumes the zombie (om nom nom) and frees up that slot in the process table.

So after you fork a child process, the pid you get back is guaranteed to keep referring to that exact process – or at least, its undead remains – until you call waitpid. This avoids the race condition. (Unless someone else calls waitpid on our process without our knowledge, of course, but hey, that's life.)

This does mean that you have to be careful that for each child process, you call waitpid exactly once, and that once you've called waitpid you can't use EVFILT_PROC. But that's a concern for higher-level code to worry about, not something that we need handle down here in the low-level kevent API.



async def test_kqueue_monitor_proc_event():
p = subprocess.Popen(['sleep', '0.1'])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So all that stuff about zombies means you don't need this process to sleep. Maybe true and false and then you can also check the exit code?

@njsmith
Copy link
Member

njsmith commented Jul 30, 2018

I started writing more high-level review, but realized it was so high-level it wasn't really about this specific PR at all, so I posted it on the issue instead :-): #578 (comment)

@sorcio
Copy link
Contributor Author

sorcio commented Jul 30, 2018

Not sure if Popen does something I don't understand compared to a naked fork/exec, but I cannot register a kevent for a child process pid after that process has terminated. I will definitely look further into it, because I too expected it to provide some way to deal with zombies. Now that I think of it, I was so confused by this that I forgot to reap the child :)

Thanks for the review and the excellent write-up! I will follow up on that later.

@codecov
Copy link

codecov bot commented Jul 30, 2018

Codecov Report

Merging #579 into master will decrease coverage by 1.97%.
The diff coverage is 86.66%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master    #579      +/-   ##
=========================================
- Coverage   99.27%   97.3%   -1.98%     
=========================================
  Files          89      92       +3     
  Lines       10628   12943    +2315     
  Branches      747    1191     +444     
=========================================
+ Hits        10551   12594    +2043     
- Misses         59     325     +266     
- Partials       18      24       +6
Impacted Files Coverage Δ
trio/_core/_io_kqueue.py 86.53% <72.72%> (+6.75%) ⬆️
trio/_core/tests/test_kqueue.py 94.73% <94.73%> (ø)
trio/_core/_windows_cffi.py 43.75% <0%> (-47.56%) ⬇️
trio/_core/_io_windows.py 51.68% <0%> (-25.41%) ⬇️
trio/hazmat.py 81.25% <0%> (-18.75%) ⬇️
trio/_core/tests/test_windows.py 89.18% <0%> (-10.82%) ⬇️
trio/_core/__init__.py 98.21% <0%> (-1.79%) ⬇️
trio/_core/tests/test_io.py 99.23% <0%> (-0.77%) ⬇️
trio/tests/test_socket.py 99.85% <0%> (-0.15%) ⬇️
trio/_core/tests/test_run.py 100% <0%> (ø) ⬆️
... and 15 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fd0d877...48a91ef. Read the comment docs.

@njsmith
Copy link
Member

njsmith commented Jul 30, 2018

I guess subprocess could be automatically reaping the zombie before we notice, but looking at the code I think it would only do this if you let the Popen object get garbage collected, so probably not.

It's also possible that kqueue simple doesn't let you register a filter for a process that's already dead. Maybe you're expected to take the error message as your notification that the process is finished and you should call waitpid :-). If that's true, then testing purposes we could do something like spawning a sleep 9999, registering the kqueue filter, and then killing the process.

@sorcio
Copy link
Contributor Author

sorcio commented Aug 23, 2018

A couple notes on monitoring process termination.

First, it seems this is the correct case:

It's also possible that kqueue simple doesn't let you register a filter for a process that's already dead. Maybe you're expected to take the error message as your notification that the process is finished and you should call waitpid

As I said above, my code totally forgets the reaping part. You have to do waitpid() yourself. When you try to register a NOTE_EXIT event on a zombie pid you will get an OSError. At that point you know you can already reap it.

Another note: some platforms differ on the way they report the exit status of the exited process. On FreeBSD it's included whenever you register NOTE_EXIT, but on Darwin you need to specify NOTE_EXITSTATUS. I didn't check other BSDs but I assume the most cross-platform way would be to ignore the event and just collect the exit status from waitpid().

@njsmith
Copy link
Member

njsmith commented Apr 26, 2019

We should probably still do a pass over the low-level kqueue APIs to clean them up, but this PR has stalled and the discussion is way out of date (e.g. trio mainline is now actually using kqueue to monitor process termination). So I'm gonna close it, and we can always re-open if that turns out to be useful.

@njsmith njsmith closed this Apr 26, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants