Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: enable Node.js --expose-gc via require('expose-gc') #3207

Merged
merged 2 commits into from
Jun 2, 2021

Conversation

michaelfig
Copy link
Member

As mentioned by @warner in #3197 (comment) this is a package-based enablement of --expose-gc since NODE_OPTIONS=--expose-gc is rejected by Node.js:

node: --expose-gc is not allowed in NODE_OPTIONS

@michaelfig michaelfig added the cosmic-swingset package: cosmic-swingset label May 28, 2021
@michaelfig michaelfig requested a review from warner May 28, 2021 20:11
@michaelfig michaelfig self-assigned this May 28, 2021
@mhofman
Copy link
Member

mhofman commented May 28, 2021

Lurk: Would it make sense to use the function version of this where needed, to avoid adding to the global where it's not needed?

@michaelfig
Copy link
Member Author

Lurk: Would it make sense to use the function version of this where needed, to avoid adding to the global where it's not needed?

We'd have to plumb it in as function arguments all the way through SwingSet. I'll consider doing that when refactoring the SwingSet controller, but for now I prefer not to make such an intrusive change.

@warner
Copy link
Member

warner commented May 28, 2021

Lurk: Would it make sense to use the function version of this where needed, to avoid adding to the global where it's not needed?

That's nearly what we're doing, at least with respect to all the non-start compartments (where the entire kernel runs). The controller (which runs in the start compartment) imports gc-and-finalize.js, which pulls gc from the global, and wraps it in gcAndFinalize. Then the controller passes gcAndFinalize into the kernel (via kernelEndowments, I think in something named gcTools that also includes WeakRef and FinalizationRegistry). Everything beyond that point is distributed through function arguments, in a properly ocap-confined way.

We could choose to changegc-and-finalize.js to instead return a makeGCAndFinalize(gc), and make controller.js take responsibility for plucking gc from the global (or whatever) and passing it into that constructor, but it seemed unnecessary. I know @dckc is (correctly) always reminding me of how I can do "more ocap", and this might be an instance of that, but I don't know where to draw the line between "this module expects to run in the start compartment and can thus import or globalThis.foo anything it needs", and "this module is pure and the caller must do import/etc and give it the following N things".

Also, maybe I missed something, I didn't think there was an importable form of gc().. the only one I knew about was the global. If controller.js could get gc from some special Node.js import, then yeah, for sure it would be better to do that than to require --expose-gc.. that'd be way more usable.

@dckc
Copy link
Member

dckc commented May 29, 2021

...

We could choose to changegc-and-finalize.js to instead return a makeGCAndFinalize(gc),

Yes, I suppose I neglected to say something like that in a review.

... and make controller.js take responsibility for plucking gc from the global (or whatever) and passing it into that constructor, but it seemed unnecessary. I know @dckc is (correctly) always reminding me of how I can do "more ocap", and this might be an instance of that, but I don't know where to draw the line between "this module expects to run in the start compartment and can thus import or globalThis.foo anything it needs", and "this module is pure and the caller must do import/etc and give it the following N things".

All exports should be powerless; only main scripts and tests can use powerful globals, to summarize discussion in #2160 to date.

The imports in this PR are indeed in main scripts, but gc is then passed via ambient authority to controller.js, so I made a couple suggestions to add a WARNING about the use of ambient authority and a TODO to undo this increased technical debt.

@michaelfig
Copy link
Member Author

Also, maybe I missed something, I didn't think there was an importable form of gc().. the only one I knew about was the global. If controller.js could get gc from some special Node.js import, then yeah, for sure it would be better to do that than to require --expose-gc.. that'd be way more usable.

As @mhofman alluded, that's possible with import gc from 'expose-gc/function';

I will rework this PR to do that in controller.js instead of the --expose-gc compatibility mode it currently uses (globalThis.gc).

@michaelfig michaelfig force-pushed the mfig-expose-gc branch 3 times, most recently from df1167a to 9879bcc Compare May 29, 2021 22:55
@michaelfig michaelfig requested a review from FUDCo May 30, 2021 00:06
@michaelfig
Copy link
Member Author

@warner, @FUDCo R4R.

@FUDCo
Copy link
Contributor

FUDCo commented Jun 1, 2021

I feel like I'm missing something fundamental here. I found the very fact that you can enable the --expose-gc option via an import to be terrifying. I just looked at how the expose-gc package works, and now I'm even more terrified; apparently there are low-level v8 APIs that let you fiddle with everything. I presume we automatically withhold that stuff outside the start compartment the same way we withhold, say, fs or process, but still...

I had been thinking whole point of node having this command line flag was to withhold the power unless it had been delivered from the outside, but it appears to simply be to keep you from using the power unless you want to. In that case, am I correct in understanding that the point of these changes is just to avoid the inconvenience of an apparently pointless command line ritual?

Copy link
Member

@warner warner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems good, although I'm slightly concerned about:

  • Does expose-gc really work? https://github.com/legraphista/expose-gc/blob/master/function.js shows that it's creating a new V8 context (is that their name for a Realm?) and and grabbing the gc global from it with an eval-equivalent. I suspect this also adds gc to the globals of all subsequent new contexts, which might include threads (aka NodeWorkers). And I think we're relying upon the gc from one context working to collect garbage across all contexts (although, if these are basically Realms, then since Realms can still talk to each other, they necessarily have a single common GC domain, so that's probably safe).
  • expose-gc populates global.gc, just like --expose-gc did
    • so I suspect expose-gc is good for removing the need for command-line flags, but not for reducing authority of the start compartment or unpolluting a namespace

If it's possible to shuffle the test to build gcAndFinalize in a different function than the one where the test object is created and released, that'd make me feel slightly better. But if that's too difficult, feel free to land without it.

@@ -22,8 +16,9 @@ function setup() {
finalized[0] = 'finalizer was called';
});
const wr = new WeakRef(victim);
const gcAndFinalize = makeGcAndFinalize(engineGC);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make a separate function to prepare gcAndFinalize independently of where we setup the object that's about to be deleted. I'd like the number of objects that go out of scope during the execution of setup() to be as small as possible, so we aren't accidentally triggering GC and thus masking a faulty gcAndFinalize().

@@ -81,7 +76,8 @@ test(`can provoke gc on xsnap`, async t => {
const opts = options();
const vat = xsnap(opts);
const code = `
${gcAndFinalize}
${makeGcAndFinalize}
const engineGC = globalThis.gc;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems somewhat ironic that we must extract gc from the global after going through so much work to avoid putting it on the global. Although I suppose the main benefit of using the expose-gc NPM module is that we don't need command-line flags, rather than reducing authority added to the start-compartment global.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No irony. On Node.js, import engineGC from 'expose-gc/function'; does not modify globalThis.

The code above is evaluated under xsnap, to which the C code has deliberately endowed globalThis.gc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, right, I assumed their function.js was only reachable through their index.js, but of course you can deep-import it directly

);
}
return;
export function makeGcAndFinalize(gcPower) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm kinda glad to see that alreadyWarned go away.. I'm not sure how it worked on our unit test (which extracted the source code of gcAndFinalize for evaluating in xsnap, which would lose the let alreadyWarnred = false; line). Maybe it was just relying upon unbound names being treated as properties of the global.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I don't know how that worked.

@@ -4,13 +4,14 @@
import '@agoric/install-ses';
import { parentPort } from 'worker_threads';
import anylogger from 'anylogger';
import engineGC from 'expose-gc/function';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little surprised this works, since it means the NodeWorker thread can require('v8') and require('vm'), and I thought NodeWorkers were slightly less powerful than that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NodeWorkers can require(ANYTHING).

@michaelfig
Copy link
Member Author

am I correct in understanding that the point of these changes is just to avoid the inconvenience of an apparently pointless command line ritual?

That's the first point. The second point is not to introduce an ambient globalThis.gc authority in the start compartment.

@michaelfig
Copy link
Member Author

Seems good, although I'm slightly concerned about:

Yes, a V8 context is roughly a Realm. Indeed, that's how realms-shim is implemented on V8.

I suspect this also adds gc to the globals of all subsequent new contexts, which might include threads (aka NodeWorkers).

That's true. I worked around this by creating swingset/src/engine-gc.js, which cleans up after itself.

  • expose-gc populates global.gc, just like --expose-gc did

But expose-gc/function (and engine-gc.js) does not populate globalThis.gc.

If it's possible to shuffle the test to build gcAndFinalize in a different function than the one where the test object is created and released, that'd make me feel slightly better. But if that's too difficult, feel free to land without it.

I've done that, however you should note that your use (and holding onto) a FinalizationRegistry as a result from that function may be affecting the collection of the victim. Maybe this is why there needs to be multiple setImmediates?

@michaelfig michaelfig enabled auto-merge June 2, 2021 02:12
@warner
Copy link
Member

warner commented Jun 2, 2021

If it's possible to shuffle the test to build gcAndFinalize in a different function than the one where the test object is created and released, that'd make me feel slightly better. But if that's too difficult, feel free to land without it.

I've done that, however you should note that your use (and holding onto) a FinalizationRegistry as a result from that function may be affecting the collection of the victim. Maybe this is why there needs to be multiple setImmediates?

Cool, thanks.

In my tests, if the FinalizationRegistry got collected, its callbacks wouldn't fire, so I'm pretty sure the FR is a necessary return.

@michaelfig
Copy link
Member Author

I've done that, however you should note that your use (and holding onto) a FinalizationRegistry as a result from that function may be affecting the collection of the victim. Maybe this is why there needs to be multiple setImmediates?
In my tests, if the FinalizationRegistry got collected, its callbacks wouldn't fire, so I'm pretty sure the FR is a necessary return.

I don't think the FR is collected, I just meant that it should be created outside the function that creates the victim.

However, I'm probably confusing that with V8's collection of captured variables in a closure (which are released as a frame all at once, not when just one (or a few) of them goes out of scope).

Victim is not a captured variable, so nvm. 😃

@michaelfig michaelfig deleted the mfig-expose-gc branch June 2, 2021 16:25
@mhofman
Copy link
Member

mhofman commented Jun 2, 2021

I don't think the FR is collected, I just meant that it should be created outside the function that creates the victim.

Or at least that the victim is nulled inside that init function. But yeah, I'd recommend a single global FinalizationRegistry, it's easier to reason about.

Edit: technically that registry callback doesn't reference anything in its scope, so the engine could collect the victim, however that's relying on the ability of the engine to do hard core optimizations (detect which names from parent lexical scopes are actually used).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cosmic-swingset package: cosmic-swingset
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants