Destroy sessions when an error occurs #97
The code at using
And the codes I added to
Hi @sttk! We are experiencing an issue where got sometimes ends up in a state where none of the requests reach the backend. Instead, requests time out with timings that look like this:
This looks fishy to us, as "request" might be undefined but, to our understanding, should never be null. We suspect that this has something to do with http2, and it sounds similar to what you're describing here:
However, we haven't been able to reproduce the issue locally yet. Did you find a way to reliably reproduce this issue? If so, how did you do that?
Hi @sttk and @szmarczak! I was able to reproduce the issue now, please find an example in this repository: https://github.com/thesauri/http2-wrapper-timeout-bug
In the example, the got client ends up in a state where requests stop working completely:
To verify that the issue is in http2-wrapper, and not got, I tried removing the
Also, as a further observation, reducing the number of non-responding requests to a smaller amount (say n=100) does resolve the issue in the example.
Unfortunately, we do run into this issue every once in a while in our production environment, which runs with real-world flakiness. The only solutions we've found so far are either disabling HTTP2 completely or restarting the server when this happens.
Interesting... Commenting out
Hi @szmarczak, I have created a reproduction of a similar issue here: https://github.com/mantysalo/http2-issue
This only seems to manifest itself when using Got with retry enabled, so I opened an issue in the Got repository as well.
I managed to reproduce this by using just the http2-wrapper library, so I closed sindresorhus/got#2310. I have changed the reproduction repository to only use http2-wrapper and the node:http2 module.
There are some interesting things I've noticed while debugging this:
Issue only happens if the Send-Q of the socket is full
What I mean by full is that the value under Send-Q in
See netstat logs below 👇
Netstat logs
This explains why I saw the issue with got, as got adds extra headers to the request, thus making the Send-Q fill up quicker. If the Send-Q is not full and the socket is closed by the OS, the following debug logs are printed:
However, when the Send-Q is full, no such logs are printed and the Http2Session keeps using the socket even though it has been closed by the OS. (Weird that this does not produce any errors?)
Issue happens with
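The stuck state described in the comment above, where an Http2Session keeps using a socket that the OS has already closed and no error surfaces, might be detectable with an HTTP/2 PING that has its own deadline. The following is only a minimal sketch under that assumption. It is not something proposed in this thread or implemented by http2-wrapper. checkSessionAlive and its parameters are hypothetical names, and the only real API it relies on is the ping(callback), destroy() and destroyed members of a node:http2 session. Whether a PING really goes unanswered when the Send-Q is full has not been verified here.

```javascript
// Hypothetical helper (not part of http2-wrapper or got).
// `session` is expected to behave like a node:http2 ClientHttp2Session:
// it must expose ping(callback), destroy() and a `destroyed` flag.
function checkSessionAlive(session, timeoutMs, onResult) {
  let settled = false;
  let timer = null;

  const finish = (alive) => {
    if (settled) return; // report only once
    settled = true;
    clearTimeout(timer);
    if (!alive && !session.destroyed) {
      // Drop the session so that the next request has to open a new one.
      session.destroy();
    }
    onResult(alive);
  };

  // ping() throws on a destroyed session, so bail out early.
  if (session.destroyed) {
    finish(false);
    return;
  }

  // If no PING ACK arrives in time, treat the session as dead.
  timer = setTimeout(() => finish(false), timeoutMs);

  // The callback receives an error if the ping could not be completed.
  session.ping((error) => finish(!error));
}
```

A caller could run this before reusing a cached session, for example checkSessionAlive(session, 1000, (alive) => { if (!alive) { /* create a new session */ } }). The timeout value is arbitrary and would need tuning against real latency.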
I created a BFF with got which sends requests to a backend server on GCP Cloud Run, and encountered repeated ETIMEOUT errors that never recovered. (This ETIMEOUT error was raised by got and passed to http2-wrapper's clientRequest._destroy.)
While this error occurred, none of the requests got sent reached the backend.
According to the logs I embedded in a forked http2-wrapper, while the error occurred, sessions were always taken from agent.sessions.
On the other hand, when an error occurred but the connection recovered, the session was newly created.
I could force this recovering behavior by adding code to clientRequest._destroy which destroys the sessions in clientRequest.agent.sessions, deletes the entries in clientRequest.agent.queue, and deletes the session in clientRequest.agent.tlsSessionCache for the origin (clientRequest[kOrigin]).
However, this modification is not perfect, because unrecovered errors still occurred at times. (At those times, sessions were taken from agent.sessions.)
Isn't it necessary to add such processing to destroy sessions and remove them from the queue, etc. when an error occurs?
P.S. These unrecovered errors never occurred when setting .agent = false in http2.auto's options.
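The cleanup described above (destroy the cached sessions for an origin, then drop the matching queue and TLS session cache entries) can be sketched roughly as follows. This is an illustration only: the actual code added to the fork is not shown in this issue, and the agent shape used here (three Maps keyed by origin) is an assumption made for the sketch, not http2-wrapper's real internal layout. destroySessionsForOrigin is a hypothetical name.

```javascript
// Hypothetical agent-like shape, assumed only for this sketch:
//   sessions:        Map<origin, Array<{ destroy(): void }>>
//   queue:           Map<origin, unknown>
//   tlsSessionCache: Map<origin, unknown>
function destroySessionsForOrigin(agentLike, origin) {
  const sessions = agentLike.sessions.get(origin) ?? [];

  for (const session of sessions) {
    // A destroyed session cannot be handed out again,
    // so the next request has to create a fresh one.
    session.destroy();
  }

  // Forget everything cached for this origin.
  agentLike.sessions.delete(origin);
  agentLike.queue.delete(origin);
  agentLike.tlsSessionCache.delete(origin);

  // Number of sessions that were destroyed.
  return sessions.length;
}
```

In the scenario from this issue, such a function would be called from the request's error path with the request's origin. As noted above, this kind of cleanup reduced but did not fully eliminate the unrecovered errors.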