Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tls: use after free in tls_wrap #18860

Merged
merged 1 commit into from
Feb 24, 2018
Merged

Conversation

kfarnung
Copy link
Contributor

@kfarnung kfarnung commented Feb 19, 2018

The root cause is that req_wrap is created in StreamBase::Write
and passed to TLSWrap::DoWrite. In the TLS case the object gets
disposed and replaced with a new instance, but the caller's pointer is
never updated. When the StreamBase::Write method returns, it returns
a pointer to the freed object to the caller. In some cases when the
object memory has already been reused an assert is hit in
WriteWrap::SetAllocatedStorage because the pointer is non-null.

Refs: #18676

This was introduced in @addaleax recent PR (#18676) and hasn't propagated to any release branches yet, so I'm going ahead and opening a PR here.

In node-chakracore this was causing a pretty consistent crash only on macOS.

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • commit message follows commit guidelines
Affected core subsystem(s)

tls

@kfarnung kfarnung requested a review from addaleax February 19, 2018 03:39
@nodejs-github-bot nodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. lib / src Issues and PRs related to general changes in the lib or src directory. labels Feb 19, 2018
@kfarnung
Copy link
Contributor Author

kfarnung commented Feb 19, 2018

@addaleax
Copy link
Member

addaleax commented Feb 19, 2018

Hi, thanks for catching this! This would work, but the original approach there was kind of hacky anyway, and if it doesn’t work I’d probably not try to accomodate it by changing the API so that it does…

One alternative would be to not re-use the WriteWrap object, and instead potentially create a new one that’s suited for the underlying stream (which makes more sense semantically anyway):

Diff in the fold
diff --git a/src/tls_wrap.cc b/src/tls_wrap.cc
index f2a84b83f32d..8bd15fca01f4 100644
--- a/src/tls_wrap.cc
+++ b/src/tls_wrap.cc
@@ -303,10 +303,12 @@ void TLSWrap::EncOut() {
 
 
 void TLSWrap::OnStreamAfterWrite(WriteWrap* req_wrap, int status) {
-  // Report back to the previous listener as well. This is only needed for the
-  // "empty" writes that are passed through directly to the underlying stream.
-  if (req_wrap != nullptr)
-    previous_listener_->OnStreamAfterWrite(req_wrap, status);
+  if (current_empty_write_ != nullptr) {
+    WriteWrap* finishing = current_empty_write_;
+    current_empty_write_ = nullptr;
+    finishing->Done(status);
+    return;
+  }
 
   if (ssl_ == nullptr)
     status = UV_ECANCELED;
@@ -572,18 +574,17 @@ int TLSWrap::DoWrite(WriteWrap* w,
     // However, if there is any data that should be written to the socket,
     // the callback should not be invoked immediately
     if (BIO_pending(enc_out_) == 0) {
-      // We destroy the current WriteWrap* object and create a new one that
-      // matches the underlying stream, rather than the TLSWrap itself.
-
-      // Note: We cannot simply use w->object() because of the "optimized"
-      // way in which we read persistent handles; the JS object itself might be
-      // destroyed by w->Dispose(), and the Local<Object> we have is not a
-      // "real" handle in the sense the V8 is aware of its existence.
-      Local<Object> req_wrap_obj =
-          w->GetAsyncWrap()->persistent().Get(env()->isolate());
-      w->Dispose();
-      w = underlying_stream()->CreateWriteWrap(req_wrap_obj);
-      return stream_->DoWrite(w, bufs, count, send_handle);
+      CHECK_EQ(current_empty_write_, nullptr);
+      current_empty_write_ = w;
+      StreamWriteResult res =
+          underlying_stream()->Write(bufs, count, send_handle);
+      if (!res.async) {
+        env()->SetImmediate([](Environment* env, void* data) {
+          TLSWrap* self = static_cast<TLSWrap*>(data);
+          self->OnStreamAfterWrite(self->current_empty_write_, 0);
+        }, this, object());
+      }
+      return 0;
     }
   }
 
diff --git a/src/tls_wrap.h b/src/tls_wrap.h
index afd19c027e70..245a6d518ac5 100644
--- a/src/tls_wrap.h
+++ b/src/tls_wrap.h
@@ -152,6 +152,7 @@ class TLSWrap : public AsyncWrap,
   std::vector<uv_buf_t> pending_cleartext_input_;
   size_t write_size_;
   WriteWrap* current_write_ = nullptr;
+  WriteWrap* current_empty_write_ = nullptr;
   bool write_callback_scheduled_ = false;
   bool started_;
   bool established_;

What do you think?

(Edit: The nicenesses of this approach are that Dispose() and DoWrite() aren’t supposed to be called by stream users anyway, whereas Write() is intended only for stream users. The downside is the added field on the TLSWrap structure)

@kfarnung
Copy link
Contributor Author

Thanks Anna, that fix looks a lot cleaner. I'm still reviewing the content, but in the meantime I've applied it to node-chakracore and kicked off a CI run to verify (https://ci.nodejs.org/job/chakracore-test/247/).

@kfarnung kfarnung force-pushed the wrap-use-after-free branch from 0e7c314 to 75ea02d Compare February 19, 2018 18:32
Copy link
Member

@addaleax addaleax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems a little pointless to say it, but this LGTM 😉

@kfarnung
Copy link
Contributor Author

Applied changes to this PR and started a CI: https://ci.nodejs.org/job/node-test-pull-request/13281/

The original changes are still preserved in a commit that can be squashed out.

@kfarnung
Copy link
Contributor Author

Looks like we have clean CIs. Thanks for the suggestion @addaleax, I wasn't feeling great about my hack, but I figured it was a good starting point. This looks a lot cleaner.

@kfarnung kfarnung force-pushed the wrap-use-after-free branch 2 times, most recently from 3bef7f8 to bf569ad Compare February 19, 2018 21:06
@kfarnung kfarnung self-assigned this Feb 20, 2018
@kfarnung
Copy link
Contributor Author

/cc @apapirovski @bnoordhuis @jasnell

Could I get a couple more eyes on this change? I'd like to land it in the next day or so.

@kfarnung kfarnung force-pushed the wrap-use-after-free branch from bf569ad to 0fb138f Compare February 20, 2018 23:21
@kfarnung kfarnung changed the title src: use after free in tls_wrap tls: use after free in tls_wrap Feb 20, 2018
@kfarnung
Copy link
Contributor Author

@BridgeAR BridgeAR requested a review from jasnell February 21, 2018 16:42
@BridgeAR BridgeAR added the author ready PRs that have at least one approval, no pending requests for changes, and a CI started. label Feb 21, 2018
@kfarnung kfarnung force-pushed the wrap-use-after-free branch from 0fb138f to b40293b Compare February 22, 2018 01:13
@devsnek
Copy link
Member

devsnek commented Feb 22, 2018

i just came across this crash today (on mac) on 9.5.0 release (and master i built this morning). is it possibly related/fixed by this?

node[26422]: ../src/tls_wrap.cc:621:virtual int node::TLSWrap::DoWrite(node::WriteWrap *, uv_buf_t *, size_t, uv_stream_t *): Assertion `(current_write_) == (nullptr)' failed.
 1: node::Abort() [/usr/local/bin/node]
 2: node::InternalCallbackScope::~InternalCallbackScope() [/usr/local/bin/node]
 3: node::TLSWrap::DoWrite(node::WriteWrap*, uv_buf_t*, unsigned long, uv_stream_s*) [/usr/local/bin/node]
 4: node::http2::Http2Session::SendPendingData() [/usr/local/bin/node]
 5: node::http2::Http2Session::MaybeScheduleWrite()::$_1::__invoke(node::Environment*, void*) [/usr/local/bin/node]
 6: node::Environment::RunAndClearNativeImmediates() [/usr/local/bin/node]
 7: node::Environment::CheckImmediate(uv_check_s*) [/usr/local/bin/node]
 8: uv__run_check [/usr/local/bin/node]
 9: uv_run [/usr/local/bin/node]
10: node::Start(v8::Isolate*, node::IsolateData*, int, char const* const*, int, char const* const*) [/usr/local/bin/node]
11: node::Start(uv_loop_s*, int, char const* const*, int, char const* const*) [/usr/local/bin/node]
12: node::Start(int, char**) [/usr/local/bin/node]
13: start [/usr/local/bin/node]
fish: 'node' terminated by signal SIGABRT (Abort)

@addaleax
Copy link
Member

@devsnek Maybe, but not necessarily. Do you have a reproduction, core dump or something of that sort?

@devsnek
Copy link
Member

devsnek commented Feb 22, 2018

@addaleax it happened while i was working on code for node-fetch, and i'm not familiar enough with the source to really know whats up with it. if it helps i added a wrap with tls.connect which should be automatically choosing between http/1.1 and h2 and i haven't updated the rest of the code base for handling an http2 response stream yet so it would be treating it as regular stream and something might be being called with the wrong arity or types (thats my guess)

@addaleax
Copy link
Member

something might be being called with the wrong arity?

I might be wrong but it doesn’t look that way to me.

The stack trace suggests that HTTP/2 tries to schedule a write on a TLS socket while another write is currently under way, which shouldn’t be happening, and I haven’t seen that pop up anywhere.

It’s most likely a bug that’s caused by me, so I’d be more than interested to figure out how that is happening.

@addaleax
Copy link
Member

@devsnek Also, do you know whether this just started happening with 9.5.0? Or is it older than that?

@devsnek
Copy link
Member

devsnek commented Feb 22, 2018

@addaleax i can go back a few versions

@devsnek
Copy link
Member

devsnek commented Feb 22, 2018

9.4.0/9.5.0/9.6.0/master -> SIGABRT

9.3.0 ->

{ FetchError: request to https://httpbin.org/get failed, reason: socket hang up
    at ClientRequest.<anonymous> (/Users/gus/Desktop/misc/node-fetch/lib/index.js:1447:15)
    at ClientRequest.emit (events.js:159:13)
    at TLSSocket.socketCloseListener (_http_client.js:362:9)
    at TLSSocket.emit (events.js:164:20)
    at _handle.close (net.js:568:12)
    at TCP.done [as _onclose] (_tls_wrap.js:379:7)
  message: 'request to https://httpbin.org/get failed, reason: socket hang up',
  type: 'system',
  errno: 'ECONNRESET',
  code: 'ECONNRESET' }

request to httpbin.org would also mean its using https.request not http2.connect.request

@addaleax
Copy link
Member

To give an update here, we pretty much figured out what’s going on in IRC; it’s not related to this PR.

I’d land this PR by tomorrow evening if nobody objects to that, it fixes a real bug and it would be nice to have that in master.

@kfarnung
Copy link
Contributor Author

Thanks @addaleax! I'll plan to land this tomorrow afternoon unless there are any objections.

@kfarnung kfarnung force-pushed the wrap-use-after-free branch 3 times, most recently from 20f91e3 to 438dae3 Compare February 24, 2018 00:07
@kfarnung
Copy link
Contributor Author

One last CI before landing: https://ci.nodejs.org/job/node-test-pull-request/13353/

The root cause is that `req_wrap` is created in `StreamBase::Write`
and passed to `TLSWrap::DoWrite`. In the TLS case the object gets
disposed and replaced with a new instance, but the caller's pointer is
never updated. When the `StreamBase::Write` method returns, it returns
a pointer to the freed object to the caller. In some cases when the
object memory has already been reused an assert is hit in
`WriteWrap::SetAllocatedStorage` because the pointer is non-null.

PR-URL: nodejs#18860
Refs: nodejs#18676
Reviewed-By: Anna Henningsen <anna@addaleax.net>
@kfarnung kfarnung force-pushed the wrap-use-after-free branch from 438dae3 to 743f890 Compare February 24, 2018 01:11
@kfarnung kfarnung merged commit 743f890 into nodejs:master Feb 24, 2018
@kfarnung
Copy link
Contributor Author

Landed in 743f890

@kfarnung kfarnung deleted the wrap-use-after-free branch February 24, 2018 01:12
addaleax pushed a commit to addaleax/node that referenced this pull request Feb 27, 2018
The root cause is that `req_wrap` is created in `StreamBase::Write`
and passed to `TLSWrap::DoWrite`. In the TLS case the object gets
disposed and replaced with a new instance, but the caller's pointer is
never updated. When the `StreamBase::Write` method returns, it returns
a pointer to the freed object to the caller. In some cases when the
object memory has already been reused an assert is hit in
`WriteWrap::SetAllocatedStorage` because the pointer is non-null.

PR-URL: nodejs#18860
Refs: nodejs#18676
Reviewed-By: Anna Henningsen <anna@addaleax.net>
@addaleax addaleax mentioned this pull request Feb 27, 2018
@addaleax addaleax added tls Issues and PRs related to the tls subsystem. dont-land-on-v6.x and removed author ready PRs that have at least one approval, no pending requests for changes, and a CI started. labels Apr 4, 2018
MayaLekova pushed a commit to MayaLekova/node that referenced this pull request May 8, 2018
The root cause is that `req_wrap` is created in `StreamBase::Write`
and passed to `TLSWrap::DoWrite`. In the TLS case the object gets
disposed and replaced with a new instance, but the caller's pointer is
never updated. When the `StreamBase::Write` method returns, it returns
a pointer to the freed object to the caller. In some cases when the
object memory has already been reused an assert is hit in
`WriteWrap::SetAllocatedStorage` because the pointer is non-null.

PR-URL: nodejs#18860
Refs: nodejs#18676
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c++ Issues and PRs that require attention from people who are familiar with C++. lib / src Issues and PRs related to general changes in the lib or src directory. tls Issues and PRs related to the tls subsystem.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants