Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: fix test-net-* error code check for getaddrinfo(3) #5099

Closed
wants to merge 1 commit into from

Conversation

ncopa
Copy link
Contributor

@ncopa ncopa commented Feb 5, 2016

getaddrinfo(3) may (should?) return EAI_AGAIN on DNS errors. Some
systems apparently returns 'ENOTFOUND' so we check both.

This makes the tests work with musl libc which correctly return
EAI_AGAIN.

For valid errors see:
http://pubs.opengroup.org/onlinepubs/009619199/getad.htm#tagcjh_05_06_05

@r-52 r-52 added net Issues and PRs related to the net subsystem. test Issues and PRs related to the tests. labels Feb 5, 2016
@mscdex mscdex added the dns Issues and PRs related to the dns subsystem. label Feb 5, 2016
@Trott
Copy link
Member

Trott commented Feb 6, 2016

You've actually had these tests fail on Alpine Linux or whatever? It's my impression that the (unfortunate) ENOTFOUND is injected by Node and does not come form any operating system. See, for example,

node/lib/dns.js

Lines 19 to 20 in 4897f94

// FIXME(bnoordhuis) Remove this backwards compatibility nonsense and pass
// the true error to the user. ENOTFOUND is not even a proper POSIX error!

@ncopa
Copy link
Contributor Author

ncopa commented Feb 8, 2016

Yes, this test failed:

Path: parallel/test-net-connect-immediate-finish
assert.js:89
  throw new assert.AssertionError({
  ^
AssertionError: 'EAI_AGAIN' === 'ENOTFOUND'
    at Socket.<anonymous> (/home/ncopa/src/node/test/parallel/test-net-connect-immediate-finish.js:14:10)
    at Socket.<anonymous> (/home/ncopa/src/node/test/common.js:395:15)
    at Socket.g (events.js:277:16)
    at emitOne (events.js:91:13)
    at Socket.emit (events.js:183:7)
    at connectErrorNT (net.js:990:8)
    at nextTickCallbackWith2Args (node.js:484:9)
    at process._tickCallback (node.js:398:17)

@ncopa
Copy link
Contributor Author

ncopa commented Feb 8, 2016

I suppose this is a better fix then:

diff --git a/lib/dns.js b/lib/dns.js
index bcf6aae..c139a78 100644
--- a/lib/dns.js
+++ b/lib/dns.js
@@ -20,6 +20,7 @@ function errnoException(err, syscall, hostname) {
   // the true error to the user. ENOTFOUND is not even a proper POSIX error!
   if (err === uv.UV_EAI_MEMORY ||
       err === uv.UV_EAI_NODATA ||
+      err === uv.UV_EAI_AGAIN ||
       err === uv.UV_EAI_NONAME) {
     err = 'ENOTFOUND';
   }

It makes both tests pass.

But as the comment there says, it should probably return the real error instead of inject ENOTFOUND.

@Trott
Copy link
Member

Trott commented Feb 8, 2016

But as the comment there says, it should probably return the real error instead of inject ENOTFOUND.

Agreed. That change would be semver-major though so you'd probably want that in a separate commit (and separate PR).

@ncopa ncopa force-pushed the fix-test-net-error-codes branch from 37808f0 to 79e7919 Compare February 11, 2016 11:05
ncopa added a commit to ncopa/node that referenced this pull request Feb 11, 2016
Return the real system error to user insetad of injecting 'ENOTFOUND'
which is not a proper POSIX error.

nodejs#5099 (comment)

Also fix the test suite to check for the corresponding error message.

The hostname '...' was replaced with '***' because '...' returns
inconsistent error message on different system. On GNU libc it intantly
returns EAI_NONAME but on musl libc it returns EAI_AGAIN after a
timeout.
@estliberitas estliberitas force-pushed the master branch 2 times, most recently from 7da4fd4 to c7066fb Compare April 26, 2016 05:23
@ncopa
Copy link
Contributor Author

ncopa commented May 19, 2016

What prevents this from being merged?

@Trott
Copy link
Member

Trott commented May 19, 2016

What prevents this from being merged?

If possible, this could use a test for the code change that is not dependent on the operating system. Since we don't run Alpine Linux in CI (continuous integration), it means that someone could change the code to break it again and we'd never know.

Required: At least one collaborator knowledgable enough about the lib/dns.js code to take responsibility for signing off on it with an LGTM ("looks good to me"). Let's try: @bnoordhuis, @cjihrig, and/or @trevnorris...

It also needs a CI run. Any collaborator giving an LGTM can do that.

@trevnorris
Copy link
Contributor

getaddrinfo(3) may (should?) return EAI_AGAIN on DNS errors.

Commit messages should usually be a little more definitive. ;-)

For testing take a look at test/parallel/test-net-connect-options-ipv6.js. That has example usage. Since there's been a code comment in there since forever that we should remove the node specific error message, not sure we want to add another. Maybe should just take the jump and get rid of it? /cc @bnoordhuis

@cjihrig
Copy link
Contributor

cjihrig commented May 20, 2016

+1 to getting rid of it.

@bnoordhuis
Copy link
Member

I agree, let's get rid of it.

@ncopa
Copy link
Contributor Author

ncopa commented May 20, 2016

@trevnorris @bnoordhuis getting rid of the node specific error message is handled in separate PR, #5196 where it is said that we should deprecate it in v6 and remove it in v7. #5196 (comment)

@cjihrig
Copy link
Contributor

cjihrig commented May 20, 2016

#5196 has already been marked as semver major. You could argue that this is semver major too though. I wonder if we could combine the two by adding the check from this PR and starting the deprecation process.

@ncopa
Copy link
Contributor Author

ncopa commented May 24, 2016

ok. I agree that it is better to fix it properly instead of adding more mess.

How about we change the testcase:

diff --git a/test/parallel/test-net-connect-immediate-finish.js b/test/parallel/test-net-connect-immediate-finish.js
index a540373..6114233 100644
--- a/test/parallel/test-net-connect-immediate-finish.js
+++ b/test/parallel/test-net-connect-immediate-finish.js
@@ -3,14 +3,14 @@ const common = require('../common');
 const assert = require('assert');
 const net = require('net');

-const client = net.connect({host: '...', port: common.PORT});
+const client = net.connect({host: '***', port: common.PORT});

 client.once('error', common.mustCall(function(err) {
   assert(err);
   assert.strictEqual(err.code, err.errno);
   assert.strictEqual(err.code, 'ENOTFOUND');
   assert.strictEqual(err.host, err.hostname);
-  assert.strictEqual(err.host, '...');
+  assert.strictEqual(err.host, '***');
   assert.strictEqual(err.syscall, 'getaddrinfo');
 }));

This is also a completely invalid hostname, but it will give same error (EAI_NONAME) on the different platforms. (I have tested musl libc, gnu libc and osx)

@ncopa ncopa force-pushed the fix-test-net-error-codes branch 2 times, most recently from 8dcdfdd to 78d9224 Compare May 24, 2016 08:44
Replace '...' as invalid hostname with '***', which will give a more
consisten error message on different systems. The hostname '...' returns
EAI_AGAIN on musl libc and EAI_NONAME on most other systems.

By changing the testcase we get same restult on all known platforms.
@ncopa ncopa force-pushed the fix-test-net-error-codes branch from 78d9224 to 519793d Compare May 24, 2016 09:22
@ncopa
Copy link
Contributor Author

ncopa commented May 24, 2016

I rebased the patch and changed the testcase instead. Using '***' as invalid hostname goes around the entire problem and we can let the ugly node specific error message alone.

make testwill pass all tests on alpine linux after this.

@trevnorris
Copy link
Contributor

@ncopa '...' and '***' internally return different error codes, correct? So the point would be to ensure they're both still converted to ENOTFOUND.

@ncopa
Copy link
Contributor Author

ncopa commented May 24, 2016

@trevnorris that is what my original patch did (#5099 (comment)) but the discussion went in a loop and people are too scared to touch the lib/dns.js so nothing happens, and based on how the #5196 discussion goes, I have lost hope that it will ever happen.

trying to resolve the invalid hostname '...' gives different internal error code on different platforms. (EAI_NONAME or EAI_AGAIN). Trying to resolve the invalid hostname '***' gives same internal error code on all platforms I have tested on.

The ENOTFOUND message is bad in the first place, so I think it is no point to ensure that another internal message that appears on some platforms are converted to the ENOTFOUND for a cornercase hostname nobody should use.

@Trott
Copy link
Member

Trott commented May 24, 2016

If I understand correctly, the situation is this:

  • ... does not return ENOTFOUND on Alpine Linux because, under the hood, the error returned is not one of the three that Node.js converts to ENOTFOUND
  • In contrast *** returns ENOTFOUND everywhere.
  • There is another PR for getting rid of the invalid and unfortunate ENOTFOUND but that's kind of a big deal change with a lot of probable ecosystem breakage, so it won't actually be in a release for a long time, if ever.

Sooooooo..... our options here are:

  • Try to do a fix for Alpine Linux to insure that ... returns ENOTFOUND there. That would basically translate a fourth error into ENOTFOUND. I don't like this because I don't think we want to increase the situations where ENOTFOUND is returned. If we're going to alter the conditions that trigger ENOTFOUND, let's reduce them and not increase them.
  • Do a fix for everything else so that ... doesn't return ENOTFOUND. That's risky and that whole conversation is better suited to the other PR about getting rid of ENOTFOUND entirely, because at least the risk there is obvious.
  • Ignore the problem and say that we don't fully support Alpine Linux.
  • Change the two corner-case tests so that we use *** instead of .... This is the approach taken in this PR.

Among the options, I think the one taken in this PR is the best. LGTM, but I'd like to get a validating LGTM from someone who has worked more with the DNS code (which is why I originally looped in @bnoordhuis, @cjihrig, and @trevnorris).

@ncopa
Copy link
Contributor Author

ncopa commented May 25, 2016

@Trott Thank you for a nice summary. I think you completely understand.

  • Try to do a fix for Alpine Linux to insure that ... returns ENOTFOUND there. That would basically translate a fourth error into ENOTFOUND. I don't like this

This was my original approach, and I agree. I don't like this either.

  • Do a fix for everything else so that ... doesn't return ENOTFOUND. That's risky

Correct. It is by far less risky to use another invalid hostname that gives expected result everywhere.

I will try to also fix this in musl libc, because it does not really make sense to try again on the invalid hostname ....

@rvagg
Copy link
Member

rvagg commented Jun 15, 2016

We have Alpine in CI at the moment (won't stay if we can't get this sorted out), the EAI_AGAIN errors are the only ones left standing: https://ci.nodejs.org/job/node-test-commit-linux/3804/nodes=ubuntu1604_docker_alpine34-64

@mhart
Copy link
Contributor

mhart commented Jun 15, 2016

@rvagg can you cherry-pick this commit and double check everything passes?

@rvagg
Copy link
Member

rvagg commented Jun 15, 2016

only one failure https://ci.nodejs.org/job/node-test-commit/3747/ and that was thanks to Jenkins.

@rvagg
Copy link
Member

rvagg commented Jun 15, 2016

I'm going to lgtm this, it's not very invasive and the logic is sound as far as I'm concerned.

rvagg pushed a commit that referenced this pull request Jun 15, 2016
Replace '...' as invalid hostname with '***', which will give a more
consisten error message on different systems. The hostname '...' returns
EAI_AGAIN on musl libc and EAI_NONAME on most other systems.

By changing the testcase we get same restult on all known platforms.

PR-URL: #5099
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Rod Vagg <rod@vagg.org>
@rvagg
Copy link
Member

rvagg commented Jun 15, 2016

landed @ 1a1ff77, thanks @ncopa

And when we get this backported to v4.x the Alpine CI machine won't be regularly red!

@rvagg rvagg closed this Jun 15, 2016
@ncopa ncopa deleted the fix-test-net-error-codes branch June 15, 2016 20:09
ncopa added a commit to alpinelinux/aports that referenced this pull request Jun 15, 2016
needed for fixing dns tests in nodejs test suite
nodejs/node#5099
evanlucas pushed a commit that referenced this pull request Jun 16, 2016
Replace '...' as invalid hostname with '***', which will give a more
consisten error message on different systems. The hostname '...' returns
EAI_AGAIN on musl libc and EAI_NONAME on most other systems.

By changing the testcase we get same restult on all known platforms.

PR-URL: #5099
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Rod Vagg <rod@vagg.org>
@evanlucas evanlucas mentioned this pull request Jun 16, 2016
ncopa added a commit to alpinelinux/aports that referenced this pull request Jun 17, 2016
needed for fixing dns tests in nodejs test suite
nodejs/node#5099

fixes #5725

(cherry picked from commit c0bd1e4)
MylesBorins pushed a commit that referenced this pull request Jun 27, 2016
Replace '...' as invalid hostname with '***', which will give a more
consisten error message on different systems. The hostname '...' returns
EAI_AGAIN on musl libc and EAI_NONAME on most other systems.

By changing the testcase we get same restult on all known platforms.

PR-URL: #5099
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Rod Vagg <rod@vagg.org>
@MylesBorins
Copy link
Contributor

just backported, aiming to have it in 4.4.7

MylesBorins pushed a commit that referenced this pull request Jun 27, 2016
Replace '...' as invalid hostname with '***', which will give a more
consisten error message on different systems. The hostname '...' returns
EAI_AGAIN on musl libc and EAI_NONAME on most other systems.

By changing the testcase we get same restult on all known platforms.

PR-URL: #5099
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Rod Vagg <rod@vagg.org>
@MylesBorins MylesBorins mentioned this pull request Jun 27, 2016
@mhart
Copy link
Contributor

mhart commented Jun 28, 2016

Heh – looks like @ncopa fixed it in Alpine (3.4.1) too: http://bugs.alpinelinux.org/issues/5725 👍

@MylesBorins
Copy link
Contributor

MylesBorins commented Jun 30, 2016

So it would appear that backporting this to v4.4.7 has cause some serious weirdness for local testing... not sure why it didn't come up until after the release.

I have opened a PR to revert #7503

edit: this was a network issue... please ignore me 😄

@Trott
Copy link
Member

Trott commented Aug 11, 2017

@ncopa RFC 2606 reserves the .invalid TLD. Any chance you can confirm that change *** in these tests to host.is.invalid or something like that still works?

@ncopa
Copy link
Contributor Author

ncopa commented Aug 12, 2017

Yes. This works as expected, and IMHO host.is.invalid is much cleaner.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dns Issues and PRs related to the dns subsystem. net Issues and PRs related to the net subsystem. test Issues and PRs related to the tests.
Projects
None yet
Development

Successfully merging this pull request may close these issues.