Investigate flaky test-net-connect-reset-until-connected #43446
Comments
Seems to have gone on strike June 9th, only on https://ci.nodejs.org/job/node-test-commit-smartos/nodes=smartos20-64/buildTimeTrend
Unfortunately doesn't seem to be just
Yes, some other tests are also timing out. I think there is something wrong with that machine/system. @nodejs/build is it possible to restart it?
ping @nodejs/build this build continues to be a problem
I saw this test failing over and over in #43366 so rebased against the main branch to see if it fixed it, and it did (based on a single run anyway). So maybe try that if all else fails?
Not sure if there is something wrong in #43112 since the server side RST test
For the past few days I have been waiting for my ISP to start my service plan. I will investigate the root cause once my internet service is back (expected Jul 4). :D
This test is also flaky on FreeBSD. Could I mark it as flaky on FreeBSD as well?
not ok 2139 parallel/test-net-connect-reset-until-connected
duration_ms: 120.81
Refs: #43446
PR-URL: nodejs/node#43449
Refs: nodejs/node#43446
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
Fixes a race condition in the test that causes it to randomly time out on Solaris 11.4, SmartOS and potentially also FreeBSD.

The client resets the connection using conn.resetAndDestroy(). This call is asynchronous, and if its effect occurs before the server's listening socket accepts the connection, the test hangs. The fix is to add a synchronization barrier so that the connection is reset only after it is established on both the server and the client side.

Below is a little more about the root cause. I show positive (test works) and negative (test hangs) scenarios. The output contains only the relevant system / library calls, collected using truss(1).

Without the fix the test randomly hangs. With the fix the test completes thousands of runs without a single issue.

Race condition scenario:
```
connect(23, 0x7FFFBFFF7F10, 32, SOV_XPG4_2)        Err#150 EINPROGRESS
 ^ client socket connects to server
close(23)                                          = 0
 ^ client closes the socket too early
accept(21, 0x00000000, 0x00000000, SOV_DEFAULT)    Err#130 ECONNABORTED
accept(21, 0x00000000, 0x00000000, SOV_DEFAULT)    Err#11 EAGAIN
 ^ accept on listening socket fails
... test hangs and times out...
```

Working (good) scenario:
```
connect(23, 0x7FFFBFFF7F00, 32, SOV_XPG4_2)        Err#150 EINPROGRESS
 ^ client socket connects to server
accept(21, 0x00000000, 0x00000000, SOV_DEFAULT)    = 24
 ^ server accepts client connection on listening socket
close(23)                                          = 0
 ^ client socket closes
read(24, 0x046B6010, 65536)                        Err#131 ECONNRESET
 ^ test gets so much wanted error while reading on accepted FD
close(24)                                          = 0
 ^ accepted FD closes
... test completes, passes ...
```

Fixes: nodejs#43446
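
For readers who want to see the shape of the synchronization barrier described above, here is a minimal sketch. It is an illustration only and not necessarily the code that landed: the names `barrier`, `bothConnected` and `runScenario` are hypothetical, and it assumes a Node.js version that provides `socket.resetAndDestroy()`.

```javascript
'use strict';
const net = require('node:net');

// Hypothetical helper: returns a function that calls `cb` once it has
// itself been called `count` times.
function barrier(count, cb) {
  return function arrive() {
    if (--count === 0) cb();
  };
}

// Sketch of the synchronized scenario. It is not invoked automatically;
// call runScenario((code) => console.log(code)) to try it.
function runScenario(done) {
  const server = net.createServer();
  server.listen(0, '127.0.0.1', () => {
    const conn = net.createConnection(server.address().port, '127.0.0.1');

    // Reset only after BOTH the server and the client have seen the
    // connection, so the reset cannot race with accept().
    const bothConnected = barrier(2, () => conn.resetAndDestroy());

    server.on('connection', (socket) => {
      // The accepted socket is expected to report the reset as an error.
      socket.on('error', (err) => done(err.code));
      server.close();
      bothConnected();
    });
    conn.on('connect', bothConnected);
  });
}
```

The point of the design is that neither event alone triggers the reset: whichever of 'connect' and 'connection' fires second is the one that releases it.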
Test
test-net-connect-reset-until-connected
Platform
smartos
Console output
Build links
Additional information
No response