Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement parallel Happy Eyeballs as described in RFC 8305 #48145

Open
tniessen opened this issue May 23, 2023 · 14 comments
Open

Implement parallel Happy Eyeballs as described in RFC 8305 #48145

tniessen opened this issue May 23, 2023 · 14 comments
Labels
feature request Issues that request new features to be added to Node.js. net Issues and PRs related to the net subsystem. never-stale Mark issue so that it is never considered stale

Comments

@tniessen
Copy link
Member

What is the problem this feature will solve?

autoSelectFamily is enabled by default in Node.js 20. However, it does not properly implement parallel connection attempts, which are an integral part of the Happy Eyeballs algorithm. This can cause timeouts to occur that did not occur in versions prior to Node.js 20. See, for example, this discussion.

What is the feature you are proposing to solve the problem?

Implement the algorithm as described in Section 5 of RFC 8305:

Starting a new connection attempt does not affect previous attempts, as multiple connection attempts may occur in parallel.

What alternatives have you considered?

Disabling autoSelectFamily and restoring the pre-20 behavior, see #47644 (comment).

@tniessen tniessen added net Issues and PRs related to the net subsystem. feature request Issues that request new features to be added to Node.js. labels May 23, 2023
@ShogunPanda
Copy link
Contributor

I originally went for the proper parallel approach as states in RFC8305. I chose not to implement it as I had issues with the code.

Consider that implementing proper RFC8305 will imply that all of sudden Node.js starts opening multiple connections instead of one in case of multiple IP returned by DNS, which is dangerous.

The unwanted timeouts are not properly handled in #47860, which will be available in Node.js 20 once I fix #48000 and #47822.

We can of course disable the feature completely by default, but it was my understanding that this was longly awaited to help beginners struggling with poorly configured dual stack networks.

@tniessen
Copy link
Member Author

Consider that implementing proper RFC8305 will imply that all of sudden Node.js starts opening multiple connections instead of one in case of multiple IP returned by DNS, which is dangerous.

Is that a fact? Is there absolutely no way to prevent responding to more than a single address with a TCP ACK packet after having received TCP SYN-ACK?

@ShogunPanda
Copy link
Contributor

Nope, we don't control the connection at that level, AFAIK.

@nodejs/libuv Am I wrong?

@silverwind
Copy link
Contributor

silverwind commented Jun 8, 2023

I think it should be considered disabling this autoSelectFamily by default. It's breaking stuff all over the place. Stabilize first behind a --experimental-auto-select-family or something, but keep this broken stuff out of the upcoming LTS please.

@ShogunPanda
Copy link
Contributor

After 20.3.0 all issues should be fixed. Can we please wait for this release before rushing into decisions?

@silverwind
Copy link
Contributor

The ENETUNREACH issue mentioned here is still unresolved, right? As I understand it, it can only be fixed with parallel connection attempts.

@ShogunPanda
Copy link
Contributor

Not really. We can add a new feature to opt-out A or AAAA in the list of addresses to try when the machine has no IPv4 and IPv6 connectivity respectively.
Note that ENETUNREACH will happen even in previous versions if the DNS only returns AAAA records.

Additionally, to mitigate the problem, a simple fix would be to raise the minimum attempt timeout to 500ms to reduce the possibility that IPv4 fails.

@HinataKah0
Copy link
Contributor

HinataKah0 commented Jun 10, 2023

I was interested so I tried reading the RFC as well.
Just giving my 2 cents of what I understand since I saw the discussion...

TBH I was also confused at first how you can make both "non-simultaneous" and "may occur in parallel" connections.
The first paragraph seems conflicting. 😅

But the key is in 2nd paragraph:

A simple implementation can have a fixed delay for how long to wait before starting the next connection attempt.

So, it's something like this:
image

Which is "non-simultaneous" but also "may occur in parallel".
An alternative/a more sophisticated delay is by using exponential backoff with min & max delay (described in paragraph 3).

@bnoordhuis
Copy link
Member

Is there absolutely no way to prevent responding to more than a single address with a TCP ACK packet after having received TCP SYN-ACK?

Not in a portable fashion. I'm not even sure you could do it reliably if all you cared about was Linux.

The only way I'm aware of is polling getsockopt(TCP_INFO) and checking the tcp_info.tcpi_state field but that's a) timing sensitive, and b) not a viable approach for an asynchronous runtime like Node.js.

@ShogunPanda
Copy link
Contributor

@HinataKah0 Yes, I interpreted in that way as well.
But, as @bnoordhuis said, since we cannot really control the socket once is started, I chose to implement in completely non-overlapped way so that we try to avoid issues and race conditions.

This comment has been minimized.

@tniessen
Copy link
Member Author

On a side note, the implementation in Node.js 20.13.1 also appears to break some node-gyp builds that now fail to download headers from our very own domain nodejs.org (tniessen/node-pqclean#3 (comment)):

gyp http GET https://nodejs.org/download/release/v20.13.1/node-v20.13.1-headers.tar.gz
gyp http fetch GET https://nodejs.org/download/release/v20.13.1/node-v20.13.1-headers.tar.gz attempt 1 failed with ETIMEDOUT
gyp WARN install got an error, rolling back install
gyp ERR! configure error
gyp ERR! stack FetchError: request to https://nodejs.org/download/release/v20.13.1/node-v20.13.1-headers.tar.gz failed, reason:
gyp ERR! stack at ClientRequest.<anonymous> (/usr/lib/node_modules/npm/node_modules/minipass-fetch/lib/index.js:130:14)
gyp ERR! stack at ClientRequest.emit (node:events:519:28)
gyp ERR! stack at _destroy (node:_http_client:880:13)
gyp ERR! stack at onSocketNT (node:_http_client:900:5)
gyp ERR! stack at process.processTicksAndRejections (node:internal/process/task_queues:83:21)

Setting --no-enable-network-family-autoselection seems to fix node-gyp builds.

@ShogunPanda
Copy link
Contributor

Rather than disabling the Happy Eyeballs altogether, you could use --network-family-autoselection-attempt-timeout and see if the situation improves.

@nwalters512
Copy link

nwalters512 commented Sep 4, 2024

Meant to comment this on #54359, sorry. Leaving below for posterity.

This has been discussed elsewhere before, e.g. #52216 and https://github.com/orgs/nodejs/discussions/48028.

I very much agree that Node should provide sensible defaults; I can't see how the current defaults are sensible. As a possibly-helpful datapoint, curl and Python are both able to fetch from https://nodejs.org/download/release/v20.17.0/node-v20.17.0-headers.tar.gz on a machine with high latency, and both (AFAICT) have Happy Eyeballs enabled. That same request fails under Node with the error described on these two threads.

I don't know enough about curl/Python/other runtimes to know what they're doing differently, but it's clear that better default behavior is possible, and I hope that it can work its way into Node, even if it has to wait for a future major version.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request Issues that request new features to be added to Node.js. net Issues and PRs related to the net subsystem. never-stale Mark issue so that it is never considered stale
Projects
Status: Awaiting Triage
Development

No branches or pull requests

7 participants
@silverwind @ShogunPanda @bnoordhuis @nwalters512 @tniessen @HinataKah0 and others