Switch to ThreadedChildWatcher and test #5877

Merged: 9 commits, Jul 28, 2021

@sweatybridge (Contributor) commented Jul 14, 2021

What do these changes do?

Switch back to using ThreadedChildWatcher (originally introduced in #5862). Add a unit test to ensure the loop can be set up from a non-main thread, as can happen under pytest-xdist.
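
As a rough illustration of the change described above (not the actual diff from this PR), loop setup that prefers ThreadedChildWatcher and is exercised from a non-main thread might look like the sketch below. The names `setup_loop_sketch` and `run_in_worker_thread` are hypothetical. The `getattr` guard is an assumption made so the sketch still runs on Python versions where `asyncio.ThreadedChildWatcher` is not available (it was added in 3.8, and the child watcher API was deprecated in later releases).

```python
import asyncio
import sys
import threading
import warnings


def setup_loop_sketch() -> asyncio.AbstractEventLoop:
    """Create a loop and, where available, install a ThreadedChildWatcher."""
    loop = asyncio.new_event_loop()
    with warnings.catch_warnings():
        # Newer Python versions emit DeprecationWarning for the watcher API.
        warnings.simplefilter("ignore", DeprecationWarning)
        asyncio.set_event_loop(loop)
        watcher_cls = getattr(asyncio, "ThreadedChildWatcher", None)
        if sys.platform != "win32" and watcher_cls is not None:
            watcher = watcher_cls()
            watcher.attach_loop(loop)
            asyncio.set_child_watcher(watcher)
    return loop


def run_in_worker_thread() -> list:
    """Set up and use a loop in a non-main thread; return any errors seen."""
    errors = []

    def target() -> None:
        try:
            loop = setup_loop_sketch()
            try:
                loop.run_until_complete(asyncio.sleep(0))
            finally:
                loop.close()
        except Exception as exc:  # collected so the caller can inspect it
            errors.append(exc)

    worker = threading.Thread(target=target)
    worker.start()
    worker.join()
    return errors


if __name__ == "__main__":
    print("errors:", run_in_worker_thread())
```

An empty error list means the loop could be set up and run from the worker thread without the watcher trying to install signal handlers.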

Are there changes in behavior for the user?

NA

Related issue number

#5852

Checklist

  • I think the code is well written
  • Unit tests for the changes exist
  • Documentation reflects the changes
  • If you provide code modification, please add yourself to CONTRIBUTORS.txt
    • The format is <Name> <Surname>.
    • Please keep alphabetical order, the file is sorted by names.
  • Add a new news fragment into the CHANGES folder
    • name it <issue_id>.<type> for example (588.bugfix)
    • if you don't have an issue_id change it to the pr id after creating the pr
    • ensure type is one of the following:
      • .feature: Signifying a new feature.
      • .bugfix: Signifying a bug fix.
      • .doc: Signifying a documentation improvement.
      • .removal: Signifying a deprecation or removal of public API.
      • .misc: A ticket has been closed, but it is not of interest to users.
    • Make sure to use full sentences with correct case and punctuation, for example: "Fix issue with non-ascii contents in doctest text files."

@psf-chronographer psf-chronographer bot added the bot:chronographer:provided There is a change note present in this PR label Jul 14, 2021
codecov bot commented Jul 14, 2021

Codecov Report

Merging #5877 (b99f35c) into master (8c5bb41) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master    #5877   +/-   ##
=======================================
  Coverage   96.75%   96.75%           
=======================================
  Files          44       44           
  Lines        9851     9852    +1     
  Branches     1591     1591           
=======================================
+ Hits         9531     9532    +1     
  Misses        182      182           
  Partials      138      138           
Flag Coverage Δ
unit 96.65% <100.00%> (+<0.01%) ⬆️

Impacted Files Coverage Δ
aiohttp/test_utils.py 99.68% <100.00%> (ø)
aiohttp/client.py 94.00% <0.00%> (+0.01%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8c5bb41...b99f35c.

@AustinScola
Contributor

Hey @sweatybridge, can you help me understand this change please? My understanding is that pytest-xdist uses subprocesses and each subprocess runs its own event loop, and that asyncio.ThreadedChildWatcher handles this scenario better than asyncio.MultiLoopChildWatcher does. Is that an accurate description?

@sweatybridge
Contributor Author

Hi @AustinScola, pytest-xdist uses execnet to run workers in subprocesses. According to pytest-dev/pytest-xdist#469, modern versions of execnet no longer guarantee that every worker runs on the main thread of its own subprocess. As a result, when pytest-xdist occasionally schedules these tests on a non-main thread, they crash because MultiLoopChildWatcher tries to register signal handlers, which requires the main thread. Using ThreadedChildWatcher avoids this issue because it doesn't register signal handlers.
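
The underlying restriction mentioned above can be reproduced with the standard library alone. The sketch below is only an illustration, not code from aiohttp or asyncio: it calls `signal.signal()` from a worker thread and captures the resulting `ValueError`. The helper name `try_register_handler_in_thread` is hypothetical.

```python
import signal
import threading


def try_register_handler_in_thread():
    """Return the exception raised by signal.signal() in a worker thread."""
    outcome = {}

    def target() -> None:
        try:
            # Python only allows installing signal handlers from the main thread.
            signal.signal(signal.SIGINT, signal.default_int_handler)
        except ValueError as exc:
            outcome["error"] = exc

    worker = threading.Thread(target=target)
    worker.start()
    worker.join()
    return outcome.get("error")


if __name__ == "__main__":
    error = try_register_handler_in_thread()
    print(type(error).__name__, error)
```

Any child watcher that relies on a SIGCHLD handler runs into this restriction when the loop is set up outside the main thread.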

@AustinScola
Contributor

@sweatybridge, that makes sense. Thank you for the wonderful explanation!

@sweatybridge
Contributor Author

Happy to help! I'm looking forward to xdist support.

@AustinScola
Contributor

Is this ready to be merged?

@AustinScola AustinScola mentioned this pull request Jul 18, 2021
5 tasks
@sweatybridge
Contributor Author

@webknjaz could you take a look at this and merge it if possible?

CHANGES/5877.bugfix (outdated, resolved)

def test_setup_loop_non_main_thread() -> None:
    def target() -> None:
        with loop_context():
            pass
Member

Could we run something in this loop and maybe process some signal? This is what may go wrong potentially.

Contributor Author

On Python 3.7, setting up the loop context still uses SafeChildWatcher, which in turn registers signal handlers. I've updated the test to expect failure on Python 3.7 and success on 3.8+, so both code paths are covered.

After pytest-xdist is merged, we might see occasional failures in the Python 3.7 CI tests for the same reason. This could happen with any test that uses the asyncio event loop, not just this one. It will stop being a problem once we drop 3.7.

Member

I'd rather vendor a backport and use it under Python 3.7. Also, it's not yet time to drop it anyway. aiohttp 3.8 still supports Python 3.6 even. It'll be EOL in 5 months but for CPython 3.7 there are 2 years to go.
Being a framework/library forces us to support a wider range of versions, unlike projects that are just apps bound to a single env.

Contributor Author

Yup, agree that backporting is better.

Contributor Author

Here's the backport with manually resolved merge conflicts: #5919

try:
    with loop_context() as loop:
        assert asyncio.get_event_loop() is loop
        loop.run_until_complete(test_subprocess_co(loop))
Member

I see this calls subprocesses but what about registering signal handlers? This is what actually fails on old watchers.

Contributor Author

@sweatybridge, Jul 28, 2021

Registering signal handlers will always fail in a non-main thread, regardless of whether we are using the old or the new watcher. The new watcher avoids this issue by not registering signal handlers at all (it uses waitpid instead).

What we want to ensure in this test is that child processes can be watched, not that signals can be successfully registered. In other words, since the library no longer registers signal handlers on Python 3.8+, it's not relevant to cover signal registration in tests.

Am I misunderstanding something?

Member

Yes. The original problem with xdist was exactly the fact that there's a problem with the signal handlers that happens in an inconsistent manner. This is what blocks #5431 and all the previous attempts. I was sure that the new watchers were supposed to fix this based on what @asvetlov mentioned to me privately. Are you sure it's still problematic?

Contributor Author

@sweatybridge, Jul 28, 2021

What I'm saying is that signal handling is a problem independent of watching child processes. The old watcher uses signal handling and will crash when mixed with xdist. The new watcher doesn't use signal handling and is therefore not problematic. Our test only needs to verify that the child process is watched; it doesn't matter whether that is implemented with signals or not.

For reference, here's CPython's implementation of ThreadedChildWatcher: https://github.com/python/cpython/blob/3.8/Lib/asyncio/unix_events.py#L1296. It starts a new thread and calls waitpid.

Compare that with SafeChildWatcher, which inherits from BaseChildWatcher: https://github.com/python/cpython/blob/3.8/Lib/asyncio/unix_events.py#L927. It tries to register a signal handler and hence will raise an exception in a non-main thread.
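
To make the "thread plus waitpid" idea concrete, here is a simplified sketch of the technique. It is not CPython's actual ThreadedChildWatcher code, and it assumes a POSIX platform. The names `_reap_in_thread` and `wait_for_child_sketch` are hypothetical. A helper thread blocks in `os.waitpid()` and reports the exit code back to the event loop with `call_soon_threadsafe`, so no signal handler is involved.

```python
import asyncio
import os
import sys
import threading


def _reap_in_thread(loop, pid, future) -> None:
    """Block in os.waitpid() off the loop thread, then hand the result back."""
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        code = os.WEXITSTATUS(status)
    else:
        # Killed by a signal: report it as a negative number.
        code = -os.WTERMSIG(status)
    loop.call_soon_threadsafe(future.set_result, code)


async def wait_for_child_sketch(exit_code: int) -> int:
    """Spawn a child that exits with exit_code and await its reaping."""
    loop = asyncio.get_running_loop()
    future = loop.create_future()
    # POSIX assumption: with P_NOWAIT, os.spawnv returns the child's pid.
    pid = os.spawnv(
        os.P_NOWAIT,
        sys.executable,
        [sys.executable, "-c", f"raise SystemExit({exit_code})"],
    )
    threading.Thread(
        target=_reap_in_thread, args=(loop, pid, future), daemon=True
    ).start()
    return await future


if __name__ == "__main__":
    print(asyncio.run(wait_for_child_sketch(3)))
```

Because the waiting happens in an ordinary thread, this approach works no matter which thread the event loop itself runs on.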

Co-authored-by: Sviatoslav Sydorenko <wk.cvs.github@sydorenko.org.ua>
Member

@webknjaz left a comment

I'll merge this "as is" for now. It's an improvement over what we had before. But it'd be really great to tackle that problem with the signals in order to make xdist enabled by default.

@webknjaz webknjaz merged commit 33a38b8 into aio-libs:master Jul 28, 2021
@patchback
Contributor

patchback bot commented Jul 28, 2021

Backport to 3.8: 💔 cherry-picking failed — conflicts found

❌ Failed to cleanly apply 33a38b8 on top of patchback/backports/3.8/33a38b8c358011fc8bc9198cd62a2a50b69bbc14/pr-5877

Backporting merged PR #5877 into master

  1. Ensure you have a local repo clone of your fork. Unless you cloned it
    from the upstream, this would be your origin remote.
  2. Make sure you have an upstream repo added as a remote too. In these
    instructions you'll refer to it by the name upstream. If you don't
    have it, here's how you can add it:
    $ git remote add upstream https://github.com/aio-libs/aiohttp.git
  3. Ensure you have the latest copy of upstream and prepare a branch
    that will hold the backported code:
    $ git fetch upstream
    $ git checkout -b patchback/backports/3.8/33a38b8c358011fc8bc9198cd62a2a50b69bbc14/pr-5877 upstream/3.8
  4. Now, cherry-pick PR Switch to ThreadedChildWatcher and test #5877 contents into that branch:
    $ git cherry-pick -x 33a38b8c358011fc8bc9198cd62a2a50b69bbc14
    If it yells at you with something like fatal: Commit 33a38b8c358011fc8bc9198cd62a2a50b69bbc14 is a merge but no -m option was given., add -m 1 as follows instead:
    $ git cherry-pick -m1 -x 33a38b8c358011fc8bc9198cd62a2a50b69bbc14
  5. At this point, you'll probably encounter some merge conflicts. You must
    resolve them so as to preserve the patch from PR Switch to ThreadedChildWatcher and test #5877 as close to the
    original as possible.
  6. Push this branch to your fork on GitHub:
    $ git push origin patchback/backports/3.8/33a38b8c358011fc8bc9198cd62a2a50b69bbc14/pr-5877
  7. Create a PR, ensure that the CI is green. If it's not — update it so that
    the tests and any other checks pass. This is it!
    Now relax and wait for the maintainers to process your pull request
    when they have some cycles to do reviews. Don't worry — they'll tell you if
    any improvements are necessary when the time comes!

🤖 @patchback
I'm built with octomachinery and
my source is open — https://github.com/sanitizers/patchback-github-app.

@webknjaz
Member

@sweatybridge could you backport this manually?

sweatybridge added a commit to sweatybridge/aiohttp that referenced this pull request Jul 28, 2021
Co-authored-by: Sviatoslav Sydorenko <wk.cvs.github@sydorenko.org.ua>
(cherry picked from commit 33a38b8)
asvetlov added a commit that referenced this pull request Oct 28, 2021
* Switch to ThreadedChildWatcher and test (#5877)

Co-authored-by: Sviatoslav Sydorenko <wk.cvs.github@sydorenko.org.ua>
(cherry picked from commit 33a38b8)

* Fix event loop access for py310

Co-authored-by: Andrew Svetlov <andrew.svetlov@gmail.com>