-
Notifications
You must be signed in to change notification settings - Fork 30k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
test: investigate flakiness of parallel/test-async-wrap-getasyncid
#13020
Comments
parallal/test-async-wrap-getasyncid
(on windows)
@addaleax @AndreasMadsen is it possible that the anonymous scope + eager |
As far as I understand |
The only explanation that makes sense to me, is that the client connects, sends data, and shutdown itself before the actual server received any connection. Thus the libuv loop doesn't know that there a connection incoming and will just exit the process. We could wait until the server received its connection before running In any case, the |
Yep...
👍 |
We need to wait for both |
@refack was this issue solved? I don't remember. |
I think someone somewhere said something about #12854 might solve this... 🤔 |
I think the script path was spelled wrong, rerunning: https://ci.nodejs.org/job/node-stress-single-test/1240/ |
✔️ More rigorous stress (-j 16 parallel/test-async-wrap-getasyncid X 100): https://ci.nodejs.org/job/node-stress-single-test/1242/nodes=win10/ |
Closing |
Hello form smartos15-64
|
parallal/test-async-wrap-getasyncid
(on windows)parallel/test-async-wrap-getasyncid
(on windows)
parallel/test-async-wrap-getasyncid
(on windows)parallel/test-async-wrap-getasyncid
I am getting a timeout of this test today on MacOS 10.12.4 and can reproduce this throughout the day (I have not rebooted my machine yet):
|
Do you have a way of knowing which case without breaking the spell? (especially interested if it's the one starting at L160) |
node-test-binary-windows > #9523 > 0,vs2015,win2008r2:
|
@refack Bumped into this one again, this line
causes a hang on my machine when ipv6 is disabled automatically since my current ISP does not support that. I guess I could not stably reproduce this before because I keep moving around... BTW |
I'll get to it after my nap 😪 |
Well I napped on it, and I'm now thinking there's probably a non-async-hooks bug somewhere in there, so I'll copy the flaky block to a new file, as I don't want to make this disappear without a solution. Maybe do a lite pummel in the new test (loop a 1000 times or something) |
On CI, with this recent recurrence (perhaps since disabling snapshots?), we're (so far) only seeing it on SmartOS, right? /cc @nodejs/platform-smartos |
Nope. macOS fail is post snapshotpocalypse https://ci.nodejs.org/job/node-test-commit-osx/11049/nodes=osx1010/ |
@misterdjules of course. Even Amazing. This issue has been there for a 2 months with relatively low frequency of failures. And we're stumped. |
From a quick look, it seems to me that there is a race between the client and the server. In the test's current form, the server is closed by the client, because I would think the test currently assumes that if the client successfully wrote data on the client connection and closed it, then the server must have called its connection callback. However, it is not necessarily the case, and I would think the server instance should be closed only when we can be sure that the server ran its connection callback which is...in the connection callback. I'm currently stress testing the following patch: diff --git a/test/parallel/test-async-wrap-getasyncid.js b/test/parallel/test-async-wrap-getasyncid.js
index 6b9d33b..778ca4c 100644
--- a/test/parallel/test-async-wrap-getasyncid.js
+++ b/test/parallel/test-async-wrap-getasyncid.js
@@ -160,6 +160,7 @@ if (common.hasCrypto) {
const stream_wrap = process.binding('stream_wrap');
const tcp_wrap = process.binding('tcp_wrap');
const server = net.createServer(common.mustCall((socket) => {
+ server.close();
socket.on('data', (x) => {
socket.end();
socket.destroy();
@@ -176,7 +177,6 @@ if (common.hasCrypto) {
sreq.oncomplete = common.mustCall(() => {
handle.close();
- server.close();
});
wreq.handle = handle; on SmartOS and I'll report results back. In the meantime I'd appreciate initial feedback on the analysis above. |
|
The test used to assume that if the client successfully writes data to the server and closes the connection, then the server must have received that data and ran its connection callback wrapped by common.mustCall. However, it is not necessarily the case, and as a result the test was flaky. With this change, the server is closed only once the server's connection callback has been called, which guarantees that the server can accept connections until it actually accepted the single connection that this test requires to accept, making the test not flaky. Fixes: nodejs#13020
PR submitted at #14329. |
FWIW, I ran |
Fixed by 2da1af0. |
The test used to assume that if the client successfully writes data to the server and closes the connection, then the server must have received that data and ran its connection callback wrapped by common.mustCall. However, it is not necessarily the case, and as a result the test was flaky. With this change, the server is closed only once the server's connection callback has been called, which guarantees that the server can accept connections until it actually accepted the single connection that this test requires to accept, making the test not flaky. PR-URL: #14329 Fixes: #13020 Reviewed-By: Colin Ihrig <cjihrig@gmail.com> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Refael Ackermann <refack@gmail.com> Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: Wyatt Preul <wpreul@gmail.com>
The test used to assume that if the client successfully writes data to the server and closes the connection, then the server must have received that data and ran its connection callback wrapped by common.mustCall. However, it is not necessarily the case, and as a result the test was flaky. With this change, the server is closed only once the server's connection callback has been called, which guarantees that the server can accept connections until it actually accepted the single connection that this test requires to accept, making the test not flaky. PR-URL: #14329 Fixes: #13020 Reviewed-By: Colin Ihrig <cjihrig@gmail.com> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Refael Ackermann <refack@gmail.com> Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: Wyatt Preul <wpreul@gmail.com>
Windows VS2017 is back.
|
And ARM:
https://ci.nodejs.org/job/node-test-binary-arm/10781/RUN_SUBSET=1,label=pi2-raspbian-wheezy/console |
@refack I'm thinking that it might be beneficial to open separate issues for these 2 new occurrences. The reason is that the windows failure is a crash, and so seems like a different problem than the original issue. The failure on ARM seems more similar to the original issue, but it's still different (2 actual calls vs 0 actual call in the original issue), and so the cause might be different. Being able to discuss these separate problems in separate issues could help make discussions less confusing. |
master
smartos15-64
++file:
https://github.com/nodejs/node/blob/master/test/parallel/test-async-wrap-getasyncid.js
falakings
TAP output
Hypothesis
Seems like a
net.server
sometimes doesn't complete it's creation before the test exits...const server = net.createServer(common.mustCall((socket) => {
The text was updated successfully, but these errors were encountered: