monitoring arbitrary kevents #579

sorcio · 2018-07-29T13:39:19Z

So, this is just to start thinking how we can implement support for arbitrary kevent events (#578). I'm not a kqueue expert at all, I just happen to run macOS. @njsmith is this similar to what you had in mind?

I think there are weird interactions between flags that are not apparent from the documentation. For example it appears that waiting for process exit will get an event with EV_ONESHOT which makes sense (you can't wait for an event from a pid that doesn't exist anymore). So maybe making it actually arbitrary involves some more tricks that I haven't explored yet.

But it seems to work ok to wait for a child process to terminate. I'm afraid I might be missing something for that use case too, though, because there is a race condition between process creation and opening a monitor. If the process has already terminated we get an exception, or we might even monitor the wrong process. I'm not sure this is relevant to this issue. Maybe the API should be changed for the bigger picture but right now I cannot suggest how.

njsmith · 2018-07-30T04:01:19Z

there is a race condition between process creation and opening a monitor. If the process has already terminated we get an exception, or we might even monitor the wrong process. I'm not sure this is relevant to this issue

So this is one of those classic weird and subtle Unix things. You've heard of "zombie processes"? On Unix when a process exits, the kernel cleans up most of it (the address space, the threads, etc.), but it keeps around a little bit of metadata (the process name, the return value) – this is a "zombie process". This metadata stays in the process table under the original process's PID, until someone calls waitpid, which consumes the zombie (om nom nom) and frees up that slot in the process table.

So after you fork a child process, the pid you get back is guaranteed to keep referring to that exact process – or at least, its undead remains – until you call waitpid. This avoids the race condition. (Unless someone else calls waitpid on our process without our knowledge, of course, but hey, that's life.)

This does mean that you have to be careful that for each child process, you call waitpid exactly once, and that once you've called waitpid you can't use EVFILT_PROC. But that's a concern for higher-level code to worry about, not something that we need handle down here in the low-level kevent API.

njsmith · 2018-07-30T04:08:09Z

trio/_core/tests/test_kqueue.py

+
+
+async def test_kqueue_monitor_proc_event():
+    p = subprocess.Popen(['sleep', '0.1'])


So all that stuff about zombies means you don't need this process to sleep. Maybe true and false and then you can also check the exit code?

njsmith · 2018-07-30T06:29:30Z

I started writing more high-level review, but realized it was so high-level it wasn't really about this specific PR at all, so I posted it on the issue instead :-): #578 (comment)

sorcio · 2018-07-30T14:21:48Z

Not sure if Popen does something I don't understand compared to a naked fork/exec, but I cannot register a kevent for a child process pid after that process has terminated. I will definitely look further into it, because I too expected it to provide some way to deal with zombies. Now that I think of it, I was so confused by this that I forgot to reap the child :)

Thanks for the review and the excellent write-up! I will follow up on that later.

codecov · 2018-07-30T22:26:09Z

Codecov Report

Merging #579 into master will decrease coverage by 1.97%.
The diff coverage is 86.66%.

@@            Coverage Diff            @@
##           master    #579      +/-   ##
=========================================
- Coverage   99.27%   97.3%   -1.98%     
=========================================
  Files          89      92       +3     
  Lines       10628   12943    +2315     
  Branches      747    1191     +444     
=========================================
+ Hits        10551   12594    +2043     
- Misses         59     325     +266     
- Partials       18      24       +6

Impacted Files	Coverage Δ
trio/_core/_io_kqueue.py	`86.53% <72.72%> (+6.75%)`	⬆️
trio/_core/tests/test_kqueue.py	`94.73% <94.73%> (ø)`
trio/_core/_windows_cffi.py	`43.75% <0%> (-47.56%)`	⬇️
trio/_core/_io_windows.py	`51.68% <0%> (-25.41%)`	⬇️
trio/hazmat.py	`81.25% <0%> (-18.75%)`	⬇️
trio/_core/tests/test_windows.py	`89.18% <0%> (-10.82%)`	⬇️
trio/_core/__init__.py	`98.21% <0%> (-1.79%)`	⬇️
trio/_core/tests/test_io.py	`99.23% <0%> (-0.77%)`	⬇️
trio/tests/test_socket.py	`99.85% <0%> (-0.15%)`	⬇️
trio/_core/tests/test_run.py	`100% <0%> (ø)`	⬆️
... and 15 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fd0d877...48a91ef. Read the comment docs.

njsmith · 2018-07-30T22:51:37Z

I guess subprocess could be automatically reaping the zombie before we notice, but looking at the code I think it would only do this if you let the Popen object get garbage collected, so probably not.

It's also possible that kqueue simple doesn't let you register a filter for a process that's already dead. Maybe you're expected to take the error message as your notification that the process is finished and you should call waitpid :-). If that's true, then testing purposes we could do something like spawning a sleep 9999, registering the kqueue filter, and then killing the process.

sorcio · 2018-08-23T09:19:58Z

A couple notes on monitoring process termination.

First, it seems this is the correct case:

It's also possible that kqueue simple doesn't let you register a filter for a process that's already dead. Maybe you're expected to take the error message as your notification that the process is finished and you should call waitpid

As I said above, my code totally forgets the reaping part. You have to do waitpid() yourself. When you try to register a NOTE_EXIT event on a zombie pid you will get an OSError. At that point you know you can already reap it.

Another note: some platforms differ on the way they report the exit status of the exited process. On FreeBSD it's included whenever you register NOTE_EXIT, but on Darwin you need to specify NOTE_EXITSTATUS. I didn't check other BSDs but I assume the most cross-platform way would be to ignore the event and just collect the exit status from waitpid().

njsmith · 2019-04-26T04:13:50Z

We should probably still do a pass over the low-level kqueue APIs to clean them up, but this PR has stalled and the discussion is way out of date (e.g. trio mainline is now actually using kqueue to monitor process termination). So I'm gonna close it, and we can always re-open if that turns out to be useful.

sorcio added 3 commits July 29, 2018 14:10

monitor_kevent

06fd17a

test monitor_kevent

2fcca09

test monitor_kevent on kqueue platforms only

48a91ef

njsmith reviewed Jul 30, 2018

View reviewed changes

njsmith mentioned this pull request Jul 30, 2018

Arbitrary kevent support #578

Open

njsmith mentioned this pull request Feb 6, 2019

Windows: improve usability for low-level IOCP operations #26

Open

3 tasks

njsmith mentioned this pull request Feb 20, 2019

Remove undocumented public API usage of UnboundedQueue #937

Closed

njsmith closed this Apr 26, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

monitoring arbitrary kevents #579

monitoring arbitrary kevents #579

sorcio commented Jul 29, 2018

njsmith commented Jul 30, 2018

njsmith Jul 30, 2018

njsmith commented Jul 30, 2018

sorcio commented Jul 30, 2018

codecov bot commented Jul 30, 2018 •

edited

Loading

njsmith commented Jul 30, 2018

sorcio commented Aug 23, 2018

njsmith commented Apr 26, 2019



		async def test_kqueue_monitor_proc_event():
		p = subprocess.Popen(['sleep', '0.1'])

monitoring arbitrary kevents #579

monitoring arbitrary kevents #579

Conversation

sorcio commented Jul 29, 2018

njsmith commented Jul 30, 2018

njsmith Jul 30, 2018

Choose a reason for hiding this comment

njsmith commented Jul 30, 2018

sorcio commented Jul 30, 2018

codecov bot commented Jul 30, 2018 • edited Loading

Codecov Report

njsmith commented Jul 30, 2018

sorcio commented Aug 23, 2018

njsmith commented Apr 26, 2019

codecov bot commented Jul 30, 2018 •

edited

Loading