test: fix test-inspector-port-zero-cluster #13373
Conversation
Does this fix a bug in the test or is this more of a workaround for a bug in Node.js? Seems like the latter to me, but maybe I'm wrong?
Stress (fedora24 / -j 64)
}
process.on('exit', () => {
ports.sort();
serialFork().then(serialFork).then(serialFork).then(common.mustCall(() => {
assert.strictEqual(ports.length, 3);
assert(ports.every((port) => port > 0));
assert(ports.every((port) => port < 65536));
assert.strictEqual(ports[0] + 1, ports[1]); // Ports should be consecutive.
While we're in here fixing up the test, I wonder if we want to change this? It's not really guaranteed that the ports are sequential, is it? If nothing else, won't it wrap around to lower port numbers after 65535? I wonder if we want to change this to a check that we have three unique port numbers rather than sequential port numbers?
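For illustration, the alternative check being floated might look something like this (the helper name is hypothetical, not something in the PR):
const assert = require('assert');

// Hypothetical alternative to the consecutive-port assertion: only require
// three distinct, valid port numbers.
function assertThreeUniquePorts(ports) {
  assert.strictEqual(ports.length, 3);
  assert.strictEqual(new Set(ports).size, 3);               // all unique
  assert(ports.every((port) => port > 0 && port < 65536));  // all valid
}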
They are supposed to be sequential #13343 (comment)
It's a bug/feature of cluster
https://github.com/nodejs/node/blob/master/lib/internal/cluster/master.js#L113
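Roughly, the behavior that link points to (a simplified illustration of the assumed shape, not the actual lib/internal/cluster/master.js code): the master hands each new worker the inspector port plus an incrementing offset, which is why consecutive ports are expected here.
// Simplified illustration only: each forked worker gets the master's debug
// port plus an incrementing offset, passed via execArgv.
let offset = 0;
function inspectorArgsForWorker(masterDebugPort) {
  offset += 1;
  return [`--inspect-port=${masterDebugPort + offset}`];
}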
Ooof, thanks for the pointer.
This test might be more fragile, since with --inspect=0 we ask for ports from the ephemeral range, while with --inspect=3993 it's a less-used range.
The CI output in #13343 does not show two workers grabbing the same port, so I'm not sure this is going to solve the issue. Maybe the worker that fails is conflicting with a pre-existing process on the host? In which case, I'm not sure there's anything we can do about it in the test itself?
I think it narrows down the test case, making sure the ports are assigned and are sequential.
Mark as flaky? Add a known issue (create a server listening to
} else {
-  process.on('message', (message) => {
+  process.on('message', common.mustCall((message) => {
common.mustCall() failing in a worker won't do anything, so I wouldn't include it here because it gives the reader a false sense that it's an effective check. (You can test this by passing common.mustCall(fn, 1000) or something like that.)
Ack
Looked at it again and added cluster.on('exit') with assertions that the worker exited properly.
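A sketch of that kind of master-side check (assuming the usual core-test layout with the common helper; the exact assertions in the PR may differ):
const common = require('../common');  // Node core test helper (path assumed)
const assert = require('assert');
const cluster = require('cluster');

// Assert in the master, rather than in the worker, that both workers exit
// cleanly: no signal, exit code 0.
cluster.on('exit', common.mustCall((worker, code, signal) => {
  assert.strictEqual(signal, null);
  assert.strictEqual(code, 0);
}, 2));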
Maybe an OK workaround would be to check to see if the
I'm +1 on creating a
(Thanks for tackling this, by the way. I'd probably be going down a completely wrong rabbit hole.)
Yes, makes sense.
I was in the neighborhood with #12941
@Trott I did a mix-and-match.
P.S. The test is still flaky unless we allow for more than one failure.
I don't feel so strongly about this that I'd stop this from landing, but: I think blocking one of the ports with a listening socket intentionally kind of changes what this test is about. This is happy-path testing and if we want to check for problems, we should probably set up a different test.
@nodejs/testing
Oops, never mind! It is an additional test case. Ignore me. Sorry for the noise.
Except for asserting current behavior, I'm also not sure what this test is looking for.
See #13343 (comment); if my hypothesis is correct, I don't think this PR will materially fix that, while it rather obscures the actual feature under test.
cluster.on('online', common.mustCall((worker) => worker.send('debugPort'), 2));
cluster.on('error', (err) => assert.fail(err));
cluster.on('exit', (worker, code, signal) => {
  assert.ok(!signal);
assert.strictEqual(signal, null)?
assert.strictEqual(ports[1] + 1, ports[2]);
const oneFailSentinal = common.mustCall();
cluster.on('online', common.mustCall((worker) => worker.send('debugPort'), 2));
cluster.on('error', (err) => assert.fail(err));
Could be shortened to cluster.on('error', assert.fail), but is this event actually emitted? If yes, the documentation in cluster.md is incomplete.
Removed
} else {
  const sentinal = common.mustCall();
Typo: sentinel (ditto a few lines up.)
if (message !== 'debugPort') return;
process.send({ debugPort: process.debugPort });
sentinal();
process.exit(0);
Don't call process.exit(); call process.disconnect() and let the worker exit naturally.
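For example, the worker branch quoted above could end up roughly like this (a sketch following that advice, not necessarily the exact code that landed):
process.on('message', (message) => {
  if (message !== 'debugPort') return;
  process.send({ debugPort: process.debugPort });
  // No process.exit(): closing the IPC channel lets the worker exit naturally
  // once nothing else keeps its event loop alive.
  process.disconnect();
});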
Ack. Re: #13373 (comment)
Force-pushed from 2b44385 to 90dadd9.
Gate is green. Full CI: https://ci.nodejs.org/job/node-test-pull-request/8458/
common.skipIfInspectorDisabled();
// Assert that even when started with `--inspect=0` workers are assigned
Will fix the comment after CI has finished.
@bnoordhuis @Trott @jasnell @nodejs/testing I've split this in two:
I think @bnoordhuis is saying that the behavior is not a bug but rather expected behavior. If so a
So as I see it:
I agree that (2) is an edge case (the real issue is #12941), and (3) has a fix, so if you want I could kick them out.
If there's consensus that we really do want to provide a way for users to opt out, then my preference for (2) would be that we keep it after all but include a comment that explains something along the lines of "known limitation, currently working as intended, but we need a way to allow the user to opt out of sequential port assignment". If another PR is going to fix the issue in (3), then the test can go right into that PR, but I don't object to it being here instead.
// There is an issue when forking workers that try to bind a port for inspector.
// If the port is not free the worker will fail.
// This test should fail with:
Can this comment be clarified a bit? This makes it sound like "If Node.js were working correctly and also addressing this issue, then this test should fail." Maybe instead of "should", something like "This test currently fails with:" or maybe something even more verbose like:
With the current behavior of Node.js (at least as late as 8.1.0), this test fails with the following error:
AssertionError [ERR_ASSERTION]: worker 2 failed to bind port
Ideally, there would be a way for the user to opt out of sequential port assignment and this test would then pass.
Ack
I can't say that it would fix this specific test flakiness, but what if
This approach still has some probability of race conditions: even if we guaranteed that [checking the port and starting the test] were atomic, tests will not start listening on the port right away. But if the port is taken by some system process or another long-enough-running process, it could do the trick.
Separating tests that use a zero port and tests that use
My opinion is that we need to find a way to provide stable infrastructure for the tests, not to rewrite the tests to the point where they are aware of the problem.
We will never be able to completely guarantee that a specified port is available for a test before the test actually tries to use it. So the task at hand is to determine a Good Enough solution. I think the solution we have now is adequate, and adding more layers of engineering will likely make things worse, not better. The solution now is:
If you are using
If someone slips up and puts the test in
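For illustration only, a pre-flight availability check along the lines discussed above might look like this; as noted, it is still inherently racy, because the port can be taken between the check and the test's own bind:
const net = require('net');

// Resolve true if nothing is currently listening on the port, false otherwise.
function portIsFree(port) {
  return new Promise((resolve) => {
    const probe = net.createServer();
    probe.once('error', () => resolve(false));
    probe.listen(port, () => probe.close(() => resolve(true)));
  });
}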
The PR changed considerably after the last review.
Wait, shouldn't we disable the port-incrementing behavior with port 0? If we're letting the OS decide which port to allocate, incrementing past that port is not a good idea IMHO. Users have no control over that port range, so it will lead to port collisions. If the master is started with
It was discussed, and for now we believe that deterministic worker port allocation makes more sense: #13343 (comment). Hopefully your second PR (manual override) will cover the other use cases. [edit]
Yeah, I think users will be able to start the master with
I thought that whole idea of
Validating that
* re-implemented test to parse args instead of post binding (exit 12)
* saved failing case as known issue

PR-URL: nodejs#13373
Fixes: nodejs#13343
Reviewed-By: James M Snell <jasnell@gmail.com>
Force-pushed from 81bc47f to fe2caf6.
Extra stress on
@mutantcornholio IMHO your input is as valuable as anyone else's 👍
Fixes: #13343
Checklist
make -j4 test (UNIX), or vcbuild test (Windows) passes

Affected core subsystem(s)
test, cluster, inspector