Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[tracking issue] src: clean up resources on Environment teardown #19422

Closed
gireeshpunathil opened this issue Mar 18, 2018 · 15 comments
Closed

[tracking issue] src: clean up resources on Environment teardown #19422

gireeshpunathil opened this issue Mar 18, 2018 · 15 comments
Labels
c++ Issues and PRs that require attention from people who are familiar with C++. embedding Issues and PRs related to embedding Node.js in another project.

Comments

@gireeshpunathil
Copy link
Member

  • Version: NA
  • Platform: NA
  • Subsystem: NA

This is to track #19377 exclusively, gets resolved when that PR lands. I am expecting a few runtime errors to come up, and opening this to leave the PR discussions free from debugging discussions.
I have developed a small test case that invokes node::Environment::RunCleanup at some arbitrary program control point. To start with, I have this code:

#cat test.js

setTimeout(() => {
  require('./term').t()
}, 1000)

where term is small module that wraps a method t through term::Terminate with the sole purpose of calling node::Environment::RunCleanup() . I will come up with some scenarios to invoke this through a variety of means that mimics launch and re-lauch of node in different possible ways - goal is to have good degree of resilience and stability for emedders.

/cc @addaleax

@gireeshpunathil gireeshpunathil added embedding Issues and PRs related to embedding Node.js in another project. c++ Issues and PRs that require attention from people who are familiar with C++. labels Mar 18, 2018
@gireeshpunathil
Copy link
Member Author

Issue 1. SIGSEGV in node::Environment::CleanupHandles()

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
  * frame #0: 0x0000000100a3e782 libnode.62.dylib`node::Environment::CleanupHandles() [inlined] node::ReqWrap<uv_req_s>::Cancel(this=0x510000230d3b87f2) at req_wrap-inl.h:48 [opt]
    frame #1: 0x0000000100a3e782 libnode.62.dylib`node::Environment::CleanupHandles(this=0x0000230d3b8822e1) at env.cc:234 [opt]
    frame #2: 0x0000000100a3eb3d libnode.62.dylib`node::Environment::RunCleanup(this=0x0000230d3b8822e1) at env.cc:315 [opt]
    frame #3: 0x00000001023dfd3f term.node`term::Terminate(args=<unavailable>) at term.cc:20 [opt]

Please let me know what to look for.

@gireeshpunathil
Copy link
Member Author

here is the debug dump - basically we got a bad piece of memory when iterating over req_wrap_queue_.

(lldb) r
There is a running process, kill it and restart?: [Y/n] y
Process 76788 exited with status = 9 (0x00000009) 
Process 76851 launched: '../node_g' (x86_64)
recycling node!
Process 76851 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x00000001019ef2e9 libnode.62.dylib`node::Environment::RunCleanup(this=0x00003782c44822e1) at env.cc:315
   312 	}
   313 	
   314 	void Environment::RunCleanup() {
-> 315 	  CleanupHandles();
   316 	
   317 	  while (!cleanup_hooks_.empty()) {
   318 	    // Copy into a vector, since we can't sort an unordered_set in-place.
Target 0: (node_g) stopped.
(lldb) step
Process 76851 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = step in
    frame #0: 0x00000001019ee3f9 libnode.62.dylib`node::Environment::CleanupHandles(this=0x00003782c44822e1) at env.cc:233
   230 	}
   231 	
   232 	void Environment::CleanupHandles() {
-> 233 	  for (ReqWrap<uv_req_t>* request : req_wrap_queue_)
   234 	    request->Cancel();
   235 	
   236 	  for (HandleWrap* handle : handle_wrap_queue_)
Target 0: (node_g) stopped.
(lldb) p req_wrap_queue_
(node::Environment::ReqWrapQueue) $28 = {
  head_ = {
    prev_ = 0xe100003782c44822
    next_ = 0x5100003782c44822
  }
}
(lldb) n
Process 76851 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = step over
    frame #0: 0x00000001019ee46c libnode.62.dylib`node::Environment::CleanupHandles(this=0x00003782c44822e1) at env.cc:234
   231 	
   232 	void Environment::CleanupHandles() {
   233 	  for (ReqWrap<uv_req_t>* request : req_wrap_queue_)
-> 234 	    request->Cancel();
   235 	
   236 	  for (HandleWrap* handle : handle_wrap_queue_)
   237 	    handle->Close();
Target 0: (node_g) stopped.
(lldb) p request
(node::ReqWrap<uv_req_s> *) $29 = 0x5100003782c447f2
(lldb) me read 0x5100003782c447f2
error: memory read failed for 0x5100003782c44600
(lldb) 

@gireeshpunathil
Copy link
Member Author

Issue 2. v8::Platform pointer not recycled

This is currently shielded by issue 1. Looks like the platform_ needs to be released as part of RunCleanup() which is not happening currently.

#./foo test.js
starting 1th iteration.
recycling node!
recycling complete!
completed 1th iteration.
starting 2th iteration.


#
# Fatal error in ../deps/v8/src/v8.cc, line 96
# Check failed: !platform_.
#
Illegal instruction: 4
#lldb ./foo
(lldb) target create "./foo"
Current executable set to './foo' (x86_64).
(lldb) r test.js
Process 2067 launched: './foo' (x86_64)
starting 1th iteration.
recycling node!
recycling complete!
completed 1th iteration.
starting 2th iteration.


#
# Fatal error in ../deps/v8/src/v8.cc, line 96
# Check failed: !platform_.
#
Process 2067 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0)
    frame #0: 0x0000000101d969e1 libnode.62.dylib`v8::base::OS::Abort() at platform-posix.cc:372
   369 	
   370 	void OS::Abort() {
   371 	  if (g_hard_abort) {
-> 372 	    V8_IMMEDIATE_CRASH();
   373 	  }
   374 	  // Redirect to std abort to signal abnormal program termination.
   375 	  abort();
Target 0: (foo) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0)
  * frame #0: 0x0000000101d969e1 libnode.62.dylib`v8::base::OS::Abort() at platform-posix.cc:372
    frame #1: 0x0000000101d84a57 libnode.62.dylib`V8_Fatal(file="../deps/v8/src/v8.cc", line=96, format="Check failed: %s.") at logging.cc:136
    frame #2: 0x0000000101505e4d libnode.62.dylib`v8::internal::V8::InitializePlatform(platform=0x0000000105300070) at v8.cc:96
    frame #3: 0x0000000100362655 libnode.62.dylib`v8::V8::InitializePlatform(platform=0x0000000105300070) at api.cc:6131
    frame #4: 0x0000000101a34673 libnode.62.dylib`node::$_0::Initialize(this=0x0000000102dec4f0, thread_pool_size=4) at node.cc:290
    frame #5: 0x0000000101a34039 libnode.62.dylib`node::Start(argc=2, argv=0x0000000105300000) at node.cc:4631
    frame #6: 0x0000000100000eca foo`main(argc=2, argv=0x00007fff5fbffaf0) at foo.cc:14 [opt]
    frame #7: 0x0000000100000e7c foo`start + 52
(lldb) f 2
frame #2: 0x0000000101505e4d libnode.62.dylib`v8::internal::V8::InitializePlatform(platform=0x0000000105300070) at v8.cc:96
   93  	
   94  	
   95  	void V8::InitializePlatform(v8::Platform* platform) {
-> 96  	  CHECK(!platform_);
   97  	  CHECK(platform);
   98  	  platform_ = platform;
   99  	  v8::base::SetPrintStackTrace(platform_->GetStackTracePrinter());
(lldb) p platform_
(v8::Platform *) $0 = 0x0000000105200170

@addaleax
Copy link
Member

@gireeshpunathil It’s hard to tell you where to look without knowing how the code looks like…

Regarding the second item, it sounds like Node’s trying to initialize the V8 platform again? That shouldn’t happen anyway, it’s basically a per-process thing, not per-Environment.

@gireeshpunathil
Copy link
Member Author

@addaleax - code is trivial, a native addon that just calls node::Environment::RunCleanup()

void Terminate(const FunctionCallbackInfo<Value>& args) {
  Isolate* isolate = args.GetIsolate();
  fprintf(stderr, "recycling node!\n");
  node::Environment *env = static_cast<node::Environment*>(
      isolate->GetCurrentContext()->GetAlignedPointerFromEmbedderData(33));
  env->RunCleanup();
  fprintf(stderr, "recycling complete!\n");
} 

and the addon is invoked as:

setTimeout(() => {
  require('./term').t()
}, 1000)

For the second: I just want to point out that embedders may invoke node through a plurality of means. In the absence of a clear guideline, I could call node::Start, I could mimic a portion of what node::Start() does and then invoke node::Init() so on and so forth.

  • can we, under this PR, define what is the endorsed way of embedding node?

  • In the simplest scenario where embedders don't know anything about node's internals and invoke node under single-env model, it is desirable to have RunCleanup as a static method which works on the single prevalent environment.

  • In the same scenario, it is desirable to be able to tear down node completely (force return from node::Start) and re-invoke again - the effect of node::Dispose() being invoked.

@addaleax
Copy link
Member

isolate->GetCurrentContext()->GetAlignedPointerFromEmbedderData(33)

Shouldn’t that be 32?

can we, under this PR, define what is the endorsed way of embedding node?

I would like to get there, yes, but I think it’s out of scope for #19377 – there’s a lot more to be done to have a proper API. :/

Currently, embedders would basically have to re-implement the Start(Isolate* isolate, IsolateData* isolate_data, …) overload of this terribly-named function.

In the simplest scenario where embedders don't know anything about node's internals and invoke node under single-env model, it is desirable to have RunCleanup as a static method which works on the single prevalent environment.

Let’s not introduce any more pseudo-global state to Node. If we want to have this, We should provide a way to get the current Environment*, and expose it as a function that acts on such a an Enviroment* pointer.

@gireeshpunathil
Copy link
Member Author

thanks @addaleax - with NODE_CONTEXT_EMBEDDER_DATA_INDEX instead of NODE_CONTEXT_SANDBOX_OBJECT_INDEX the first issue is resolved.

For the second: without the ability to (and test) recycle node in-process, we cannot claim a proper tear-down IMO. I guess it may be just the re-initializing of platform_ object. As Environment->RunCleanup() is the not the right place for tearing down the platform, how can one achieve that, if at all?

@addaleax
Copy link
Member

@gireeshpunathil Platform management is the embedder’s responsibility, neither Node nor an addon can do anything about that.

For the second: without the ability to (and test) recycle node in-process, we cannot claim a proper tear-down IMO.

Could you describe what you mean by “recycle” and “tear down”?

@gireeshpunathil
Copy link
Member Author

teardown: bring down the machine state (including the resources) prior to node::Start
recycle: being able to re-enter node::Start

@addaleax
Copy link
Member

recycle: being able to re-enter node::Start

I mean, for that we’d probably want to just create another environment instance, right?

@addaleax
Copy link
Member

I think it sounds like what you want isn’t really a way to call RunCleanup() from JS, but rather a way to force the event loop to stop, and RunCleanup() being called at its end automatically?

@gireeshpunathil
Copy link
Member Author

@addaleax - I believe this is where our conceptual difference:

  • For you, the tear-down and re-launch is confined into an Environment.
  • For me, the tear-down and re-lauch is for the entire node runtime, callable through the most outer entry point method possible, with all the code excluded under #if defined (NODE_SHARED_MODE) being re-used.

If I understand correctly, effectively Environment teardown === node tear down, in the most common use case for node (single-env) right? So for the embedders that does not have sufficient insight and control over node internals, the easy thing to perform is to recycle everything and start with fresh configuration.

Example: An embedder offloaded some I/O bound workload to node under normal conditions. After monitoring the resource usage and the throughput, it would want to re-spin node with modified input (configuration and arguments.) So rather thean working with Environment, the more easy thing would be to work at the platform level itself.

What do you think?

@addaleax
Copy link
Member

If I understand correctly, effectively Environment teardown === node tear down, in the most common use case for node (single-env) right?

We should provide embedders with ways to achieve both, I think. But generally, Node maintains only a really small portion of its state in a global manner instead of per-Environment, so it might be good to know what kind of options you'd have in mind for this?

After monitoring the resource usage and the throughput, it would want to re-spin node with modified input (configuration and arguments.)

It's come up in other PRs that we should probably be separating between options that are per-Environment and options that are per-Isolate (or even global). All of these exist, and we don't really have a consistent story around those. (I've been sketching some ideas around that, but that's still very much an early work in progress.)

So, if you're concerned about options, it probably depends on which you think an embedder might tune for re-use...?

@gireeshpunathil
Copy link
Member Author

gireeshpunathil commented Mar 19, 2018

@addaleax - thanks for the clarification / guidance.

Currently, embedders would basically have to re-implement the Start(Isolate* isolate, IsolateData* isolate_data, …)

...

Platform management is the embedder’s responsibility, neither Node nor an addon can do anything about that.

When I did a debug walk-through, I see that for an embedder to manage this, it will end up doing:

  1. define signal handling
  2. define exit handling
  3. define stdios
  4. register builtin modules
  5. parse env and argv
  6. initialize v8 platform
  7. initialize performance counter
  8. define the uv loop
  9. create Array buffer allocator
  10. create isolate
  11. bind isolate, loop, v8, allocator ...

or copy the code from node.

I believe this will be too much to expect from an embedder because:

  • it assumes that the embedder knows the fine grained details of the internal components
  • it necessitates the embedder to repeat node's orchestration of these components
  • It will look like as though it embeds each of these components separately, not node as a whole.

Ideally, an embedder will be interested in:

  • retain uv loop, node platform and v8, but respin environments
    • reconfigure and restart app
    • retain / reuse a number of things (isolate, loop, inspector, ...)
  • retain node platform alone and respin the rest
    • reconfigure v8, and restart the app
    • retain /reuse few things (C env, built in modules, stdios, ...)
  • respin everything
    • reconfigure and restart node
    • reuse only the loaded DLL text!

I understand the scope of #19377 does not cover these, but my original issue #19365 does.

I've been sketching some ideas around that, but that's still very much an early work in progress.

thanks for that, so at the moment, to satisfy #19365 minimally, is it possible to:

  • provide an easy way for cleaning up Environment. I used the error-prone hack here because Environment::GetCurrent() in an addon code throws up cyclic dependancy in the compiler.

  • provide a way to shutdown the node platform, in conjunction with env->RunCleup() , to be invoked from outside of node.dll

@gireeshpunathil
Copy link
Member Author

this should have been closed long back; as its stated purpose is achieved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c++ Issues and PRs that require attention from people who are familiar with C++. embedding Issues and PRs related to embedding Node.js in another project.
Projects
None yet
Development

No branches or pull requests

2 participants