Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

src: don't call into VM from AsyncWrap destructor #9467

Closed
wants to merge 2 commits into from

Conversation

bnoordhuis
Copy link
Member

@bnoordhuis bnoordhuis commented Nov 4, 2016

R=@trevnorris?

See #8216 and #9465, it's making node crash.

It might be possible to retain the destroy hook by maintaining a per-environment list + a uv_idle_t handle or something like that but it's a lot of work and there might be edge cases so I'm opting for simply removing the hook. Whoever disagrees volunteers to do the hard work. :-)

CI: https://ci.nodejs.org/job/node-test-pull-request/4782/

It is not allowed anymore to call JS code when collecting weakly
persistent handles, it hits the assertion below:

    # Fatal error in ../deps/v8/src/execution.cc, line 103
    # Check failed: AllowJavascriptExecution::IsAllowed(isolate).

Remove the call into the VM from the AsyncWrap destructor.  This commit
breaks the destroy hook but that cannot be helped.

Fixes: nodejs#8216
@bnoordhuis bnoordhuis added async_wrap async_hooks Issues and PRs related to the async hooks subsystem. labels Nov 4, 2016
@nodejs-github-bot nodejs-github-bot added the c++ Issues and PRs that require attention from people who are familiar with C++. label Nov 4, 2016
@mscdex
Copy link
Contributor

mscdex commented Nov 4, 2016

Shouldn't the other destroy hook-related code be removed also (e.g. setting of the destroy hook in async-wrap.cc and the relevant persistent string in env.h)?

@AndreasMadsen
Copy link
Member

The destroy hook is a rather important feature in async_wrap. Not just from a performance perspective (the usecase in trace), but tools like dprof would simply not be possible without the destroy hook.

Given that async_wrap is an undocumented API I don't think that removing the destroy hook is urgent and would rather consider it a last resort. If somebody uses async_wrap, one should also accept the associated risk. In trace for example, I say that using it production is strongly discouraged.

/cc @nodejs/diagnostics

@bnoordhuis
Copy link
Member Author

bnoordhuis commented Nov 4, 2016

Shouldn't the other destroy hook-related code be removed also (e.g. setting of the destroy hook in async-wrap.cc and the relevant persistent string in env.h)?

I can do that, I'll update the PR. EDIT: https://ci.nodejs.org/job/node-test-pull-request/4787/

Given that async_wrap is an undocumented API I don't think that removing the destroy hook is urgent and would rather consider it a last resort. If somebody uses async_wrap, one should also accept the associated risk.

Yeah, I don't buy that. People have filed bug reports about it twice now and in both cases it was a module somewhere in their dependency chain, not something they were using directly.

@AndreasMadsen
Copy link
Member

Yeah, I don't buy that. People have filed bug reports about it twice now and in both cases it was a module somewhere in their dependency chain, not something they were using directly.

I wellcome a diffrent perspective on the policy regarding undocumented API. But trace is not something one would have deep in the dependency chain. It's something that is used directly, it actually doesn't have an API (beyond require('trace')).

@holm
Copy link

holm commented Nov 4, 2016

I reported one of the bugs. We use cls-hooked, which in turn uses async-hook, which then uses AsyncWrap. Maybe I didn't read the readme's very carefully, but as far as I can see there is nothing there warning about it being unstable or not fit for production use.

We use it to track transactions through requests, and it's not in anyway optional for us. We haven't observed any issues in production on 6.x.

In my opinion it's not a good idea to have unstable API's enabled by default, even if they are not documented. Similar to V8 experimental features, I would think a flag should be set, so users are clearly aware they are using features not recommended for production use.

@addaleax
Copy link
Member

addaleax commented Nov 4, 2016

Given that async_wrap is an undocumented API I don't think that removing the destroy hook is urgent and would rather consider it a last resort.

#8216 has been open for quite a while with no sign of movement, and as it stands, the number of people willing to work on the async_hooks parts of the codebase seems to be rather overseeable. So, yeah, it sucks, but right now I don’t see a better way than this.

Copy link
Member

@addaleax addaleax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with a green CI

@AndreasMadsen
Copy link
Member

AndreasMadsen commented Nov 4, 2016

Could you maybe explain how this fixes the issue? The modules that uses destroy are still going to fail, just for another reason. I don't think the main issue is that destroy doesn't work, it's that people are using undocumented API without being aware of it. I would suggest that we print a warning when setupHooks() is called, similar to deprecated functionality.

edit: I have published a new version of async-hook which prints a warning when used. The documentation already said it, but now indirect users will know it too.

#8216 has been open for quite a while with no sign of movement, and as it stands, the number of people willing to work on the async_hooks parts of the codebase seems to be rather overseeable.

I'm a little sad that @nodejs/diagnostics wasn't cc'ed on this. This makes it difficult for us to function as a working group.

@addaleax
Copy link
Member

addaleax commented Nov 4, 2016

I'm a little sad that @nodejs/diagnostics wasn't cc'ed on this. This makes it difficult for us to function as a working group.

Sorry, yeah. So far the only person I associated with async_hooks are Trevor (and later you); here’s a PR to tell people to @mention the diagnostics team: #9471

@Qard
Copy link
Member

Qard commented Nov 4, 2016

From APM perspective, this is probably fine. The destructor is nice for tracking handle lifetimes, but doesn't really matter for transaction tracing. I feel like there definitely should've been more care put into async_wrap in regard to runtime warnings and/or flag gating though. If a feature can be used, it will be used, even when not documented. All the people running into issues using async/await flag or generators in early koa days are plenty proof of that.

Copy link
Contributor

@Fishrock123 Fishrock123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this remove the destroy hook? Will this impact the async_hooks PR?

In either case, removing the hook renders async_wrap/async_hooks virtually useless in many cases and should be considered a last resort.

I would strongly prefer not to merge this until @trevnorris reviews it. He's out atm I think but I'll try to make sure he's on this at least first thing Monday.

@bnoordhuis
Copy link
Member Author

Could you maybe explain how this fixes the issue? The modules that uses destroy are still going to fail, just for another reason. I don't think the main issue is that destroy doesn't work, it's that people are using undocumented API without being aware of it.

The problem is that it's categorically unsafe to call into the VM during GC, which is what the AsyncWrap destructor does when it calls the destroy hook. It's never been safe but it slipped through review before V8 started enforcing it.

@Qard
Copy link
Member

Qard commented Nov 4, 2016

Triggering the destroy hook in the next tick using uv_idle_t seems like a reasonable approach to me. As long as the handle itself is not made accessible in any way within the destroy hook, I can't see any edge cases to worry about.

Does that sound reasonable? The queued destroy hook would only be aware of the id to notify the JS side about.

@jasnell
Copy link
Member

jasnell commented Nov 7, 2016

+1 to waiting to land until @trevnorris can have an opportunity to review.

@trevnorris
Copy link
Contributor

Back. I'll look into this more today, but the short answer is I've already been working on removing destroy on GC for a different issue. Though at that point it's more like "done" or "complete". Which I'm fine changing the name to.

@bnoordhuis for a short term solution we should be able to use the same uv_idle_t that setImmediate uses. The destructor will check if the handle is weak. If so then place on the list, if not execute the callback immediately. Sound conceptually sane?

@bnoordhuis
Copy link
Member Author

I don't think "non-weak == can call into the VM" is a safe assumption. Hanging it off a uv_idle_t: seems okay but what if env->async_hooks_destroy_function() changes between time of collection and time of dispatch?

@addaleax
Copy link
Member

addaleax commented Nov 8, 2016

but what if env->async_hooks_destroy_function() changes between time of collection and time of dispatch?

Is there a meaningful difference from the way things are right now? I mean, my impression is that V8 can basically run GC whenever it wants to, so there would be no real way to tell a delayed execution of the hook from a delayed invocation of the GC?

I don't think "non-weak == can call into the VM" is a safe assumption.

Yeah, I’d just queue up all destroy hooks in the uv_idle_t.

@trevnorris
Copy link
Contributor

@bnoordhuis

I don't think "non-weak == can call into the VM" is a safe assumption.

Fair enough. All destroy callbacks can be placed in the uv_idle_t. Makes things more uniform anyway.

Hanging it off a uv_idle_t: seems okay but what if env->async_hooks_destroy_function() changes between time of collection and time of dispatch?

Using some flag magic. Each active set of hooks is assigned an id. If a hook is added/removed then increment the id. Likewise have a flag that indicates whether a destructor ran and depends on the current state of hooks. If this flag is set that a destructor has been called when a hook is added/removed then make a clone of the array of hooks. The only pairing needed is the id of the handle calling destroy, and the id of the hook's state. On the next loop, run through the array of hooks and call them for any associated id.

Cost of this if no hooks are active is zero, and not noticeable even if there are active hooks.

@addaleax

[...] there would be no real way to tell a delayed execution of the hook from a delayed invocation of the GC?

Can you elaborate on this?

@addaleax
Copy link
Member

addaleax commented Nov 9, 2016

Can you elaborate on this?

Mh, basically: If we decided to delay the invocation of the destroy hook until the next event loop iteration, how would the async_hooks consumer (or whatever the right word for that is) be able to tell whether the destroy hook was called later because we artificially delayed it or whether the GC run itself “accidentally” occurred at that later point in time?

@bnoordhuis
Copy link
Member Author

Is there a meaningful difference from the way things are right now?

I don't think so but I figured I'd bring it up anyway.

@trevnorris
Copy link
Contributor

Actually, @bnoordhuis, just realized I combined two angles of approach. Thing is that GC can no longer be allowed to trigger destroy. Because users may have references to the resource. In which case GC won't be able to clean them up, and never allow destroy to fire. So basically it always need to be triggered manually. Makes the name destroy seem poor, but we can change that (how about just done?).

With this in mind, what measurable circumstance would there be for when the destructor can't call into JS? I'd like to setup a test and begin tracing when it may not be appropriate for the destructor to call destroy.

@addaleax

how would the async_hooks consumer (or whatever the right word for that is) be able to tell whether the destroy hook was called later because we artificially delayed it or whether the GC run itself “accidentally” occurred at that later point in time?

If we need to go the route of delaying destroy calls, then we'll delay all of them. This shouldn't affect the utility of the call much because:

  • The most important reason for destroy was to notify the user when, say, a Map of id's and associated resources could be cleaned up. For this, exact timing is not necessary.
  • If timing is important then we could simply record the uv_hrtime() when the destructor is called and pass that in as the second argument to destroy. It would only trigger if any hooks were active, so no additional cost for those not using it.

@bnoordhuis
Copy link
Member Author

We got another report, #9599. If there isn't any progress on an alternative pull request in the next few days, I'm going to go ahead and land this. I'd really like to see this fixed before the next release.

@danscales
Copy link

@bnoordhuis Not sure if this is the best place to ask, but is it possible that the changes to V8 that caused this bug (a change in how/when v8 is collecting weakly persistent handles) might also affect the correctness of the node-weak module (https://github.com/TooTallNate/node-weak) for newer versions of v8/node? We are running into segfaults in the garbage collector in our stressful node application that uses node-weak when we run on node 6.9.1, but we didn't have any such issue when running on node v0.12. This is true even if we get rid of any use of node-weak callback functions. We may also be getting such segfaults in node 4.5.0, but if so, they are much, much rarer.

I realize that the gc test code in the node distribution actually uses node-weak, but that is a very simple, unstressful test case.

@trevnorris
Copy link
Contributor

trevnorris commented Nov 16, 2016

@bnoordhuis I didn't consider my last comment as an approved approach. I'd still like to know when the JS callback shouldn't be run when the destructor is manually triggered? After review it seems like we should be able to manually delete classes that are now weak. Because of the safety mechanisms in place, detaching the C++ class won't cause JS to segfault. Thus, we could completely remove weak handles/requests. Thoughts?

EDIT: Couldn't we also attach the call to JS through WeakCallbackInfo::SetSecondPassCallback() for safety?

@bnoordhuis
Copy link
Member Author

@danscales Yes, node-weak does the same thing.

@trevnorris Like we discussed in today's meeting, making everything non-weak sounds great.

@jasnell
Copy link
Member

jasnell commented Nov 18, 2016

@trevnorris @bnoordhuis ... I just want to make sure I understand the "making everything non-weak" part of this as it would impact some work that I'm doing: does this mean entirely avoiding the use of MakeWeak() in new AsyncWrap subclasses?

@bnoordhuis
Copy link
Member Author

@jasnell Correct.

@trevnorris
Copy link
Contributor

@bnoordhuis sorry for the delay. i'm able to remove the weak handles for everything (even non asyncwrap inheriting classes) for everything except for StatWatcher. so i'll implement the fix for this using uv_idle_t to fix this. Afterward i'll submit another PR to remove as much weak handle support that i've already worked on.

@trevnorris
Copy link
Contributor

Alternate PR at #9753

addaleax added a commit to addaleax/node that referenced this pull request Nov 26, 2016
Add a group of people to the “Who to CC in issues” list
as the maintainers of `async_hooks`.

Ref: nodejs#9467 (comment)
addaleax added a commit that referenced this pull request Dec 5, 2016
Add a group of people to the “Who to CC in issues” list
as the maintainers of `async_hooks`.

Ref: #9467 (comment)
PR-URL: #9471
Reviewed-By: Gibson Fahnestock <gibfahn@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Sam Roberts <vieuxtech@gmail.com>
Reviewed-By: Stephen Belanger <admin@stephenbelanger.com>
Reviewed-By: Josh Gavant <josh.gavant@outlook.com>
@addaleax
Copy link
Member

addaleax commented Dec 6, 2016

This should no longer be necessary since #9753 landed, so I’m closing this

@addaleax addaleax closed this Dec 6, 2016
Fishrock123 pushed a commit that referenced this pull request Dec 6, 2016
Add a group of people to the “Who to CC in issues” list
as the maintainers of `async_hooks`.

Ref: #9467 (comment)
PR-URL: #9471
Reviewed-By: Gibson Fahnestock <gibfahn@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Sam Roberts <vieuxtech@gmail.com>
Reviewed-By: Stephen Belanger <admin@stephenbelanger.com>
Reviewed-By: Josh Gavant <josh.gavant@outlook.com>
jmdarling pushed a commit to jmdarling/node that referenced this pull request Dec 8, 2016
Add a group of people to the “Who to CC in issues” list
as the maintainers of `async_hooks`.

Ref: nodejs#9467 (comment)
PR-URL: nodejs#9471
Reviewed-By: Gibson Fahnestock <gibfahn@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Sam Roberts <vieuxtech@gmail.com>
Reviewed-By: Stephen Belanger <admin@stephenbelanger.com>
Reviewed-By: Josh Gavant <josh.gavant@outlook.com>
MylesBorins pushed a commit that referenced this pull request Dec 20, 2016
Add a group of people to the “Who to CC in issues” list
as the maintainers of `async_hooks`.

Ref: #9467 (comment)
PR-URL: #9471
Reviewed-By: Gibson Fahnestock <gibfahn@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Sam Roberts <vieuxtech@gmail.com>
Reviewed-By: Stephen Belanger <admin@stephenbelanger.com>
Reviewed-By: Josh Gavant <josh.gavant@outlook.com>
MylesBorins pushed a commit that referenced this pull request Dec 20, 2016
Add a group of people to the “Who to CC in issues” list
as the maintainers of `async_hooks`.

Ref: #9467 (comment)
PR-URL: #9471
Reviewed-By: Gibson Fahnestock <gibfahn@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Sam Roberts <vieuxtech@gmail.com>
Reviewed-By: Stephen Belanger <admin@stephenbelanger.com>
Reviewed-By: Josh Gavant <josh.gavant@outlook.com>
MylesBorins pushed a commit that referenced this pull request Dec 21, 2016
Add a group of people to the “Who to CC in issues” list
as the maintainers of `async_hooks`.

Ref: #9467 (comment)
PR-URL: #9471
Reviewed-By: Gibson Fahnestock <gibfahn@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Sam Roberts <vieuxtech@gmail.com>
Reviewed-By: Stephen Belanger <admin@stephenbelanger.com>
Reviewed-By: Josh Gavant <josh.gavant@outlook.com>
MylesBorins pushed a commit that referenced this pull request Dec 21, 2016
Add a group of people to the “Who to CC in issues” list
as the maintainers of `async_hooks`.

Ref: #9467 (comment)
PR-URL: #9471
Reviewed-By: Gibson Fahnestock <gibfahn@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Sam Roberts <vieuxtech@gmail.com>
Reviewed-By: Stephen Belanger <admin@stephenbelanger.com>
Reviewed-By: Josh Gavant <josh.gavant@outlook.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
async_hooks Issues and PRs related to the async hooks subsystem. c++ Issues and PRs that require attention from people who are familiar with C++.
Projects
None yet
Development

Successfully merging this pull request may close these issues.