-
-
Notifications
You must be signed in to change notification settings - Fork 30.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
test_asyncio: test_start_tls_server_1() fails randomly #76639
Comments
It seems test_asyncio fails sporadically on AppVeyor: ====================================================================== Traceback (most recent call last):
File "C:\projects\cpython\lib\test\test_asyncio\test_sslproto.py", line 284, in test_start_tls_server_1
asyncio.wait_for(main(), loop=self.loop, timeout=10))
File "C:\projects\cpython\lib\asyncio\base_events.py", line 440, in run_until_complete
return future.result()
File "C:\projects\cpython\lib\asyncio\tasks.py", line 398, in wait_for
raise futures.TimeoutError()
concurrent.futures._base.TimeoutError ====================================================================== Traceback (most recent call last):
File "C:\projects\cpython\lib\test\test_asyncio\functional.py", line 43, in tearDown
self.fail('unexpected calls to loop.call_exception_handler()')
AssertionError: unexpected calls to loop.call_exception_handler() |
I'm leaving on a two-weeks vacation today. To avoid risking breaking the workflow, I'll mask this tests on AppVeyor. I'll investigate this when I get back. |
Please keep this issue open. |
After fresh update and debug rebuild on master(3.7), with Win10 patched last night, I got Traceback (most recent call last):
File "F:\dev\3x\lib\test\test_asyncio\test_events.py", line 290, in test_call_later
self.assertTrue(0.08 <= t1-t0 <= 0.8, t1-t0)
AssertionError: False is not true : 0.07800000000861473 I have not seen this before. Four subsequent runs passed. |
Hmm, the scheduled call was executed sooner than expected. BTW I recall an issue with Windows when call_later() was executed sooner already. |
Note, I think this failure is still happening occasionally on the windows7 buildbot; for example: http://buildbot.python.org/all/#/builders/111/builds/68/steps/3/logs/stdio |
test_start_tls_server_1() just failed on my Linux. It likely depends on the system load. |
It would be nice to fix this bug before Python 3.7.0 final: either skip the test, or fix it. Flaky tests can be very annoying. For example, the full test suite is run to build a Red Hat package. If a single test fails, the package build fails and should be retried whereas it's slow. I understood that fixing the root cause might require to rewrite the test, so I don't expect a quick fix on this test. The issue is open since last December... Maybe we can skip the test but fix it in Python 3.7.1? |
I'm OK to skip it for now. Writing functional tests is super hard because some buildbots are super slow and unpredictable. |
Would you mind to write a PR to skip the PR? So you will feel guilty and will try to remind to fix it! |
Not at my computer right now, can do it tomorrow. |
Yury, are you still planning to address this? |
Last time I looked into this I couldn't reproduce the failures on my Windows 10 VM, so it seems like an AppVeyor-specific problem. I'll take another look on Monday. Andrew, if you have a Windows environment, could you please try to run this test? |
But in msg317468 Victor asserts that it sometimes fails on Linux, too? |
Hm, I missed that. I'll definitely take a look. |
It's failing reproducible with OpenSSL 1.1.1 and TLS 1.3 enabled. I haven't seen it failing with TLS 1.2 yet. |
I have not seen test_asyncio fail on AppVeyor since about Sunday. With my local repository 32-bit debug build of master branch: If I run python -m test.test_asyncio -v, If I run python -m test test_asyncio, I get
F:\dev\3x\lib\asyncio\sslproto.py:320: ResourceWarning: unclosed transport <asyncio.sslproto._SSLProtocolTransport object at 0x052DA4D0>
source=self)
F:\dev\3x\lib\asyncio\sslproto.py:320: ResourceWarning: unclosed transport <asyncio.sslproto._SSLProtocolTransport object at 0x0476A118>
source=self)
Exception in callback _ProactorReadPipeTransport._loop_reading__data_received(<_OverlappedFuture cancelled>)
handle: <Handle _ProactorReadPipeTransport._loop_reading__data_received(<_OverlappedFuture cancelled>)>
Traceback (most recent call last):
File "F:\dev\3x\lib\asyncio\events.py", line 88, in _run
self._context.run(self._callback, *self._args)
File "F:\dev\3x\lib\asyncio\proactor_events.py", line 222, in _loop_reading__data_received
self._closing)
AssertionError
test_asyncio passed in 35 sec
1 test OK. With -v, above is scattered in verbose output, but test_start_tls_server_1 passed both times. Specifically, I ran the following 10 times with no failures. |
I presume the above is using the month-old openssl-bin-1.1.0h. |
Terry, you somehow deleted Christian from the nosy list. Christian,
On Linux or Windows? |
Linux |
'deleted somehow': Christian added himself while I had this page open, so when I submitted without refreshing, the nosy list did not include him. When I saw the red warning, I did not notice the nosy list change; I should have. |
Christian, I believe bd17a55 fixes the issue with OpenSSL 1.1.1 (socket.shutdown was missing and asyncio's protocol didn't receive EOF, instead the connection was terminated (as expected though)) |
I just reproduced the issue on Linux. Open 3 terminals and run the commands in parallel: (1) ./python -m test test_asyncio -m test_start_tls_server_1 -F It's a race condition which doesn't depend on the OS, but on the system load. |
Yeah, I agree. The current timeout for the test is 5 seconds and it tries to transfer 2mb of data, which isn't that much actually. I'll bump the timeout and reduce the amount of data transferred in #7130 Thing is, these are the very first *functional* tests for asyncio. Before it was mostly tested with mocks. So we'll need some adjustment period to learn how slow/loaded the CI servers are and what kind of timeouts are safe to use. |
What is the purpose of a timeout of 5 seconds? Some buildbots are really slow. If the purpose is to detect when the test hangs: use a timeout of 5 minutes, or at least 1 minute (60 seconds). |
OK, sounds good, will do that. |
Race condition! I created #7157 to increase the timeout from 5 seconds to 60 seconds. |
#7130 is in. It includes many tests for TLS tests as well as bugfixes to asyncio code. Hopefully this all will make tests more stable. |
While Yury seems unable to reproduce the bug, it's easy for me to reproduce it on Linux. After 2 hours of debugging, I found the root issue: a race condition not in the test, but in asyncio itself! => bpo-33674 "asyncio: race condition in SSLProtocol". |
Should we close this issue now? |
I'm not 100% sure that all issues are fixed, but the tests seem much more reliable yet. I close the issue. If the bug reoccurs, I will reopen the issue or open a new one. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: