-
Notifications
You must be signed in to change notification settings - Fork 30k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Http2 Segmentation fault #29902
Comments
Here is a stack which I catch with debug node:
|
@lomaster1 Thanks for the report! I assume this would fix the debug mode crash, but maybe not the original crash? Can you verify? diff --git a/src/node_http2.cc b/src/node_http2.cc
index 6bd282593e28..0ba88946fc5c 100644
--- a/src/node_http2.cc
+++ b/src/node_http2.cc
@@ -2100,6 +2100,7 @@ void Http2Stream::SubmitRstStream(const uint32_t code) {
void Http2Stream::FlushRstStream() {
if (IsDestroyed())
return;
+ HandleScope handle_scope(env()->isolate());
Http2Scope h2scope(this);
CHECK_EQ(nghttp2_submit_rst_stream(**session_, NGHTTP2_FLAG_NONE,
id_, code_), 0); |
@addaleax thanks for the fix. I applied it and got the same error as in production.
|
@lomaster1 Can you print the contents of |
@addaleax I added this before
And got crash with stack:
In logs:
That means that stream_ is null in some cases. I didn't use valgrind before. All with valgrind will be work too slow then debug node. To reproduce the bug I send a lot of requests to service. |
Okay, thanks for the info so far – that’s definitely helpful. |
Some times I got this error:
on the same steps when I'm repeating segfaults. Maybe it's related to issue. Some additional information: we're using TlsSocket around our own Duplex stream. We wrote own protocol to send/receive data from our proxy server (Http2->TlsSocket->Duplex->Proxy). I send a lot of requests to the service and server CPU is about 95-100% (maybe it's because node is in debug). and all works slow. I noticed when Duplex closed, for example, by timeout, this error or segfault happened. There are also caught errors in the log:
|
That makes it likely that looking at your code would be very helpful – would you be willing to share it, or at least the parts that implement that Duplex stream? |
I can't share the code. |
New information: we have a timer to cancel a long query and we're calling |
In production, this solution (no http2stream.close if timeout) worked not good - segfaults returned. |
I created wrapper (new duplex) around our duplex stream to the proxy and segfaults gone. I'm passing with a wrapper to tls socket and then tls socket to http2.
Also, I have been testing with |
I'm having a similar-looking segfault issue in my server, only when using HTTP/2. @lomaster1 / @addaleax did you ever get to the bottom of this? Reproducible for me in node 10.15.3, 10.21.0 and 12.18.3. I'm also wrapping streams up as sockets in some cases in my app, using the built-in Some example traces, via segfault-handler:
Any clues would be much appreciated! I'm still doing some fairly funky networking tricks in places (unwrapping CONNECT requests, peeking at first bytes of connections), I'm going to remove each of those one by one and see if I can hunt down a clearer culprit. For my case, all the code involved is open-source (see https://github.com/httptoolkit/httpolyglot/blob/master/lib/index.js and https://github.com/httptoolkit/mockttp/tree/h2, especially http-combo-server) but it's tricky to set up - currently my most reliable option for reproducing this is by proxying connections from Tiktok on my Android phone for 2 minutes 😄. |
Ok, I'm not sure if I've narrowed down the root cause, but I've found a reliable workaround. Currently, I accept CONNECT requests over HTTP/1, and then call h1Server.on('connect', (req, socket) => {
socket.write('HTTP/' + req.httpVersion + ' 200 OK\r\n\r\n', 'utf-8', () => {
h2Server.emit('connection', socket);
});
}); If I add Not sure why, I can only guess that using the stream wrapper moves it to the simpler (and slower) stream path, avoiding the underlying bug. Regardless of the root cause, that seems to be sufficient as a workaround for me for now. |
I'm now fairly sure this was fixed by #41185 in v16.14+ and this can be closed. I no longer see this issue in production despite heavy use of all the above. |
You're almost certainly right. Thanks, closing. |
I got it on production (many times per day):
Can you help me investigate why it happens?
What can I do to provide you with information to fix it? I can't use debug node in production, because it's very slow.
The text was updated successfully, but these errors were encountered: