HTTP/1.1 server aborts upload request in Node.js 18 #47421
Comments
Thanks for the test case, but ECONNRESET means the other end closed the connection. It's pretty much guaranteed that this isn't a node bug. My best guess is you're hitting some kind of firewall rule on Actions and CircleCI.
@bnoordhuis Thank you so much for thinking about the issue! I'll try to investigate whether some kind of firewall rule is involved. I captured packets using tcpdump. Here are the last packets the server sends. I ran the two commands:
I hope this is helpful. Here is the whole pcap file as a zip:
Oh, I understand what you mean now. Node v18.x added a
Thanks! I'm not sure about the client data. The solution in the first comment is like:

```sh
# Wait 5 min
sleep 300
# Run server
node server1.js &
# <sufficient wait for starting the server>
# <the curl command>
```

I just waited 5 min before starting the server in GitHub Actions. But anyway, I figured out
Those "on data: 1B" events suggest it's feeding the data byte by byte, but anyway, happy to hear it's working for you now. I'll close out the issue.
@bnoordhuis Thank you so much for dealing with the issue. However, my last comment may not be accurate: I only found a workaround for the error.
I can see how such magic incantations can be unsatisfactory. Just so we're on the same page: you agree those "on data: 1B" events mean the data is arriving one byte at a time? Because it makes sense that node's slowloris protection kicks in under such conditions, and it also explains why a zero (but not a non-zero)
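For anyone trying to reproduce the byte-by-byte scenario without curl, a client along these lines trickles the body one byte per second (a sketch only; the host, port, path, and body size are made up, and a body this slow would exceed Node 18's default 300-second request timeout):

```js
// Hypothetical slow-upload client: writes the request body one byte per
// second, the kind of trickle that a rate limit produces and that a
// request timeout cuts off with ECONNRESET on the client side.
const http = require('node:http');

const req = http.request(
  { method: 'POST', host: 'localhost', port: 8080, path: '/' },
  (res) => {
    res.resume();
    res.on('end', () => console.log('upload finished:', res.statusCode));
  }
);
req.on('error', (err) => console.error('client error:', err.code));

const body = Buffer.from('0123456789'.repeat(40)); // 400 bytes ~ 400 s at 1 B/s
let i = 0;
const timer = setInterval(() => {
  if (i < body.length) {
    req.write(body.subarray(i, i + 1)); // one byte at a time
    i += 1;
  } else {
    clearInterval(timer);
    req.end();
  }
}, 1000);
```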
I am having the same issue in our upload logic since updating to Node 18. It works fine in Node 16. The exact stack trace is this:
Setting requestTimeout: 0 and headersTimeout: 0 (i.e. disabling the timeouts) sorts out the issue. But there is a bug in the default server: I have set those timeouts to high numbers (headersTimeout: 5 * 60 * 1000, requestTimeout: 4 * 60 * 60 * 1000), but they have no effect. What's curious is that this does not happen all the time. I can reproduce it often and reliably; it will be reproducible for a couple of hours and then it will start working! I am going crazy.
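For reference, the workaround being described boils down to something like this (a sketch, not the poster's actual server; the handler and port are placeholders):

```js
// Disable the HTTP-level timeouts on the server; 0 means "no timeout".
const http = require('node:http');

const server = http.createServer((req, res) => {
  // Placeholder handler: drain the upload, then reply.
  req.on('data', () => {});
  req.on('end', () => res.end('ok'));
});

server.requestTimeout = 0; // Node 18 default is 300000 ms (5 minutes)
server.headersTimeout = 0; // 0 disables the headers timeout as well

server.listen(8080);
```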
@bnoordhuis @ShogunPanda (cc'ing Paolo since he seems to have touched this code after Node 17.) OK, I have some more concrete information after debugging this a bit. I compiled Node from source and put some debug messages in the Expired function (https://github.com/nodejs/node/blob/main/src/node_http_parser.cc#L1114). The code is:
When the upload fails, I got:
The headersTimeout (5 minutes) and requestTimeout (4 hours) are correctly printed above. I don't understand what now() is. It doesn't match the epoch, and the libuv docs say it is measured against an arbitrary reference point. OK. Well,
request_deadline should always be positive, as it subtracts the "timeout" from "now". I think I never checked whether request_timeout (or headers_timeout) might be greater than uv_hrtime(). In that case I think the check is simply not applicable and will therefore be implicitly disabled. We should add the check and warn the user if this happens.
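Purely to illustrate the arithmetic being discussed (assuming the deadline is derived from uv_hrtime() minus the configured timeout in unsigned 64-bit math, which is an assumption drawn from this thread rather than a quote of the C++ source), the wrap-around can be shown in plain JavaScript:

```js
// Illustration only, not the actual C++ code: if the monotonic clock is
// still smaller than the configured timeout (e.g. shortly after boot),
// an unsigned "now - timeout" wraps around to a value near 2**64, so a
// request that started one second ago already looks expired.
const NS_PER_SEC = 1_000_000_000n;

const now = 60n * NS_PER_SEC;             // pretend uv_hrtime(): 60 s after boot
const requestTimeout = 300n * NS_PER_SEC; // a 300 s request timeout

const deadline = BigInt.asUintN(64, now - requestTimeout); // wraps to ~2**64

const lastMessageStart = 59n * NS_PER_SEC; // request started 1 s ago
console.log(lastMessageStart < deadline);  // true -> treated as expired
```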
@ShogunPanda I can give it a shot (but give me a day or two). Can you open this bug in the meantime?
@nwtgck mentioned this only happened in CI environments but not locally. It could be time-base related, but it could also be something else. I suggest opening a new issue specifically for the bug you've identified.
@bnoordhuis My bug report describes a similar situation (I guess?). This is reproducible only inside the Docker container created by our CI. We have a home-grown CI that creates VMs on Digital Ocean; one of the services that uses Node.js runs in a Docker container, and that is where this bug is seen. @bnoordhuis @ShogunPanda Also, can someone tell me what uv_hrtime is? Is it just a monotonically increasing clock? Can it be, say, 1 or 2 or any random number on Node.js startup? (As you can see, I am only a Node user; I have little idea about the Node code itself. :-))
Yes. The only guarantee at startup is that it's somewhere between 0 and 2**64-1 inclusive. :-)
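For what it's worth, the same clock is observable from JavaScript via process.hrtime.bigint(), which is backed by uv_hrtime():

```js
// Monotonic time in nanoseconds from an arbitrary reference point
// (on Linux this is typically time since boot); it is unrelated to the
// Unix epoch and only useful for measuring intervals.
console.log(process.hrtime.bigint()); // e.g. 123456789012345n
```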
@bnoordhuis @ShogunPanda I am trying to understand the code at https://github.com/nodejs/node/blob/main/src/node_http_parser.cc#L1110
If the intent is to prevent DoS attacks, then shouldn't the code be based on "entire request" time and not "last message" time?
@gramakri I know it is misleading, but in the C++ parser a
In other words,
I'm seeing this as well when deploying an Express app to @superfly. The specific use case is for Express to proxy large multi-file uploads to Cloudinary via stream.

```sh
# cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 11 (bullseye)"
NAME="Debian GNU/Linux"
VERSION_ID="11"
VERSION="11 (bullseye)"
VERSION_CODENAME=bullseye
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
```

When using v18.16.0, I experience the error; rolling back to v16.20.0 avoids it.
I've confirmed that there's no memory leak, as the memory usage stays steady through uploading 70+ files. I wasn't able to replicate this locally on my Mac M1 Pro.
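In case it helps anyone hitting this from Express: app.listen() returns the underlying node:http server, so the same workaround can be applied to it (a sketch; the port and routes are placeholders):

```js
// Sketch: applying the timeout workaround to an Express app.
const express = require('express');

const app = express();
// ...upload/proxy routes would go here...

const server = app.listen(3000);
server.requestTimeout = 0; // disable the Node 18 request timeout
server.headersTimeout = 0; // disable the headers timeout as well
```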
This can also be replicated using v20.2.0.
@bnoordhuis Is there a way to reopen this ticket for further investigation? I'm not sure what the workaround would be.
The same error can be replicated on v16.20.0 if you close the browser during the file transfer. I can replicate this remotely and locally.
I can reopen, but what is there to investigate? If you've identified a bug, you should send a pull request with a fix.
So how is one supposed to upload large files with Node.js without setting a timeout for the whole server? This is the only place I've found talking about the same issue. At first I thought it was a browser issue, but it is not: I'm trying to upload a big (>5 GB) file to localhost using XMLHttpRequest in the browser with an infinite timeout. This is the ONLY fix I found for it:

```js
Server.requestTimeout = 2 * 60 * 60 * 1000;
```

How come this can't be set per connection or per route? These don't fix it at all, by the way:

```js
Server.on(`connection`, function (socket) {
  socket.setTimeout(2 * 60 * 60 * 1000);
});
Server.keepAliveTimeout = 2 * 60 * 60 * 1000;
Server.timeout = 2 * 60 * 60 * 1000;
Server.setTimeout(2 * 60 * 60 * 1000);
```
You can't set up a timer on a single request due to how
Version
18.0.0, 18.15.0
(14.21.3, 16.20.0, 17.9.1 work fine)
Platform
GitHub Actions ubuntu-20.04
Here is `uname -a`:
Subsystem
http
What steps will reproduce the bug?
First of all, it is a super strange bug, but I made simple code to reproduce it.
Here is the whole code:
nwtgck/public-code@02b852b...6f51243
Run the server with `node server1.js`. The server consumes the request body, responds "Finished!", and ends when the request body has been read. Then upload a text sequence of 1-100 with a 10-byte limit as follows.
After approximately 23-26 seconds, curl stops suddenly.
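For context, a server along the lines described above might look something like the following sketch (this is only an illustration; the actual server1.js is in the linked repository):

```js
// Hypothetical sketch of the kind of server described: log each data
// chunk, consume the request body, respond "Finished!", then shut down.
const http = require('node:http');

const server = http.createServer((req, res) => {
  req.on('data', (chunk) => console.log(`on data: ${chunk.length}B`));
  req.on('end', () => {
    res.end('Finished!');
    server.close();
  });
  req.on('error', (err) => console.error('request error:', err));
});

server.listen(8080, () => console.log('listening on http://localhost:8080'));
```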
Here is the server-side stdout, which shows the ECONNRESET error.
How often does it reproduce? Is there a required condition?
Use GitHub Actions to reproduce it. My local Ubuntu 20.04 physical machine, GitHub Codespaces, an M1 Mac, and Vagrant on an Intel Mac did not reproduce it.
GitHub Actions (without sleep 300)
Here is the result:
The result shows that Node 14-17 work but Node 18 fails more than 80% of the time.
The result was created by re-running GitHub Actions 10 times:
https://github.com/nwtgck/public-code/actions/runs/4602240359
GitHub Actions (with sleep 300)
Here is the super strange part: I added a 5-minute wait and the server on Node 18 works fine. Both Node 18.0.0 and 18.15.0 worked well 10 times (0/10 failed).
All I did was add `sleep 300` at the very beginning, before checkout: nwtgck/public-code@04d99e2
(whole code: nwtgck/public-code@02b852b...04d99e2)
The sleep result was also created by re-running GitHub Actions 10 times:
https://github.com/nwtgck/public-code/actions/runs/4602334265
What is the expected behavior? Why is that the expected behavior?
The expected behavior is curl uploading successfully:
Additional information
I confirmed that CircleCI also had the same issue before I simplified the reproduction code. The machine.image is ubuntu-2004:2023.02.1 and ubuntu-2004:202010-01. I will create a simple CircleCI reproduction if a Node.js member needs it.