Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-23057: add loop self socket as wakeup fd for signals #11135

Merged
merged 12 commits into from
Dec 18, 2018

Conversation

vladima
Copy link
Contributor

@vladima vladima commented Dec 13, 2018

When wakeup fd is set - receiving the signal will also write its number to a fd. This in turn will unblock the proactor and allow to process pending signal handler.

Is this approach valid in general and if yes - what is the best way to test changes like this? // cc @asvetlov

https://bugs.python.org/issue23057

@asvetlov
Copy link
Contributor

A test for making sure that interruption works would be great to have

@asvetlov
Copy link
Contributor

Looks good!
Can you take a look on failed CI builds?
They are Windows agents, the failure worries me.

@@ -489,6 +490,9 @@ def __init__(self, proactor):
self._accept_futures = {} # socket file descriptor => Future
proactor.set_loop(self)
self._make_self_pipe()
self_no = self._csock.fileno()
if isinstance(self_no, int):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious when fileno() doesn't return int?
Especially taking into account that self._csock is created by asyncio itself, it is not a user-provided object.

Copy link
Contributor Author

@vladima vladima Dec 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw this happening when _csock was mocked.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can fix all mocks from asyncio test suite and don't care if a third-party library will use mocking incorrectly,

@vladima
Copy link
Contributor Author

vladima commented Dec 15, 2018

Yes, I'll take a look at CI failures during the weekend - looks odd since all tests were passing locally.
Also after thinking about this issue a bit more I'm inclined to make implementation a bit different. Currently because of wakeup fd being set for a signal - receiving ctrl-c unblocks the proactor and allows signal processing logic in the interpreter loop to kick in and break out of run_forever. At glance it looks exactly what we need however the downside of this - it leaves event loop in an inconsistent state since we have no idea at what point KeyboardInterrupt was raised. For example:

>>> import asyncio
>>> l = asyncio.get_event_loop() # ProactorEventLoop()
>>> l.run_forever() # at this point everything is stuck
# press ctrl-c to break out of `run_forever` - works
>>> l.run_forever() # run it again using the same loop
# press ctrl-c again - now it does not work because proactor does not receive anything from the self socket

I'm not sure how important is this use-case and what are general expectations for the internal state of event loop after breaking out of run_forever but first impression of this behavior - "it does not look right". I think a better option would be instead of relying on default handler of SIGINT we need to set custom handler that will record the fact of receiving ctrl-c and raise KeyboardInterrupt at some well defined point.

@asvetlov
Copy link
Contributor

Selector based loops use the same technique for waking a loop.
But the loop is responsible to read data from wakeup fd.
Does proactor loop do it?

@vladima
Copy link
Contributor Author

vladima commented Dec 17, 2018

Proactor loop does it as well - it schedules reading when self pipe is created. _loop_self_reading will get the future from the recv method of the proactor and subscribe for a completion of the future. Once data is received - _loop_self_reading will call recv and repeat everything.
The issue with ctrl-c - sending data to wakeup fd unblocks the proactor (that is sitting at GetQueuedCompletionStatus of _poll and interpreter loop will react on pending signal by raising KeyboardInterrupt). This breaks out of run_forever but since future returned from the recv was never actually completed, completion callbacks will not be invoked and _loop_self_reading will not be called again so nobody will be reading the data from the self socket - subsequent attempt to call run_forever on the same loop and break out of it will fail.
I think that in order to support multiple calls to run_forever we might:

  • set an empty handler for SIGINT so instead of raising KeyboardInterrupt somewhere in interpreter loop we'll use the signal number read from self socket and if necessary raise exception from _loop_self_reading.
  • ensure that reading from self socket is scheduled prior to 'forever' part of run_forever - it might not be scheduled if previous call to run_forever was interrupted.

@asvetlov
Copy link
Contributor

Thanks for the explanation.

Changing signal handlers is not an option for this particular pull request. I have a feeling that if we change signal handler for one loop implementation -- we should do it for all others. I see a motivation for handling interruption signals differently but the change is a subject for another discussion.

The second option can work.
Can we use a sort of try/finally to cancel incomplete recv future?

@vladima
Copy link
Contributor Author

vladima commented Dec 18, 2018

in last update I've wrapped call to run_forever into try/except that:

  • will cancel self reading future in case if call was interrupted by ctrl-c
  • before starting everything will check if reading from self socket has been started and will start it if necessary

@vladima
Copy link
Contributor Author

vladima commented Dec 18, 2018

@asvetlov please let me know if there is anything else that needs to be addressed.

@asvetlov
Copy link
Contributor

I still recommend removing isinstance(fd, int) check and fix all mocks in asyncio test suite if needed.

@vladima
Copy link
Contributor Author

vladima commented Dec 18, 2018

will do

# and self-reading routine is not hooked up now and needs
# to be restarted
if (self._self_reading_future is not None and
self._self_reading_future.cancelled()):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please replace .cancelled() with .done().
Interruption can occur when the recv() called fut.set_result() but before loop._loop_self_reading() callback executed on the next loop iteration.

self._self_reading_future = None
self.call_soon(self._loop_self_reading)
super().run_forever()
except KeyboardInterrupt:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace except with finally.
Canceling self-reading doesn't make harm if run_forever() was stopped by another exception (or just normal execution finishing).
We will re-schedule self-reading again on next run_forever() call.

@asvetlov
Copy link
Contributor

Or even better:

  1. drop self.call_soon(self._loop_self_reading) from _make_self_pipe()
  2. Implement run_forever() in the following way:
def run_forever(self):
    assert self._self_reading_future is None
    try:
        super().run_forever()
    finally:
        if self._self_reading_future is not None:
            self._self_reading_future.cancel()
            self._self_reading_future = None

What do you think?

@vladima
Copy link
Contributor Author

vladima commented Dec 18, 2018

I think looks plausible if we add call_soon before the super().run_forever(), otherwise nobody will read the data from the self socket.

@asvetlov
Copy link
Contributor

Excellent!

@asvetlov asvetlov merged commit b5c8cfa into python:master Dec 18, 2018
@bedevere-bot
Copy link

@asvetlov: Please replace # with GH- in the commit message next time. Thanks!

@vladima vladima deleted the ctrl-c-proactor branch December 18, 2018 22:39
@jkloth
Copy link
Contributor

jkloth commented Dec 20, 2018

This change broke the AMD64 Windows7 SP1 3.x builder (https://buildbot.python.org/all/#/builders/40). Note that this builder runs as a service so no console is attached.

@jkloth
Copy link
Contributor

jkloth commented Dec 20, 2018

Seems related but different error on x86 Windows7 3.x (https://buildbot.python.org/all/#/builders/58). Both builders are running on Windows 7, which may make a difference as well.

@vladima
Copy link
Contributor Author

vladima commented Dec 20, 2018

will take a look

@vladima
Copy link
Contributor Author

vladima commented Dec 21, 2018

I think #11274 should address the issue.

  • I don't have similar configuration with the builder, is it possible to ask it to do a test build out of the branch?
  • Now PR is mark as failed because it does not have associated bpo and NEWS file, do I need to create them given that this change is in tests?

@vladima
Copy link
Contributor Author

vladima commented Dec 22, 2018

Got my hands on Win7 VM, reproduced the issue and verified that it is fixed by #11274

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants