Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should internal modules be affected by monkey-patching? #18795

Closed
hashseed opened this issue Feb 15, 2018 · 49 comments
Closed

Should internal modules be affected by monkey-patching? #18795

hashseed opened this issue Feb 15, 2018 · 49 comments
Assignees
Labels
discuss Issues opened for discussions and feedbacks. lib / src Issues and PRs related to general changes in the lib or src directory.

Comments

@hashseed
Copy link
Member

Many internal modules call methods defined on prototype objects that can be modified (or monkey-patched) in userland. For example, by overriding Object.prototype.getOwnProperty, it is possible to affect async hooks.

One way to fix this is to keep copies of these functions at boostrap, and use these untampered copies inside internal modules.

Optionally, these copies can be exported through a new internal module so that user code can get hold of untampered copies if they intend to.

Advantages of this approach:

  • The way internal modules are implemented remain implementation details and can be changed without risking breaking user code that somehow rely on monkey-patching.
  • Internal modules behave predictably.
  • Usage of JavaScript builtins no longer introduce hidden APIs to override behavior to internal modules.

Advantages of current code:

  • Internal modules can be monkey-patched, e.g. for instrumentation.
  • Internal modules implementation are easier to read.
  • No additional overhead during bootstrap.

I think it is desirable to have consensus over whether internal modules should be affected by monkey-patching. This does not have to be a either/or decision.

Coming from the JavaScript spec, there are JavaScript builtins that are affected by monkey-patching too, intentionally. For example, by overriding RegExp.prototype.exec, the behavior of other RegExp builtins can be changed. Yet another example is the species constructor. Generally though, JavaScript builtins call into internal builtins that cannot be altered.

I think it is important to clearly specify and document which global state (i.e. result of monkey-patching) affect behavior of internal modules rather than having implicit APIs though monkey-patching. Personally, I think that - unless specified otherwise - monkey-patching should not affect internal modules. It needs to be a conscious design choice.

FWIW this problem would become very apparent if Node.js had a formal specification for its internal modules :)

Ref: #18773
CC: @benjamingr @devsnek

@hashseed hashseed changed the title Should internal modules be affected by monkey-patching. Should internal modules be affected by monkey-patching? Feb 15, 2018
@hashseed hashseed added discuss Issues opened for discussions and feedbacks. lib / src Issues and PRs related to general changes in the lib or src directory. labels Feb 15, 2018
@benjamingr
Copy link
Member

benjamingr commented Feb 15, 2018

I think one of the most important points that you made in #18773 is:

All APIs that V8 offer in C++ use the original builtin. Including v8::Object::GetOwnPropertyNames. Let's say, just hypothetically, Node.js core wants to move the implementation of some functions from JS to C++, it then must not use v8::Object::GetOwnPropertyNames, but instead load and call global.Object.getOwnPropertyNames. Otherwise it would observably change behavior.

That is, every time we move code from/to C++ we break any patching userland has for the said call.

Now, another good thing to consider is that there have been cases where overriding built-ins has been tremendously useful when polyfilling - for example https://github.com/drses/weak-map/blob/master/weak-map.js (the WeakMap polyfill).

@devsnek
Copy link
Member

devsnek commented Feb 15, 2018

if users want to patch fs.readFile or something that's cool (although I'm personally apposed to it without hooks) but changing Object.prototype.toString shouldnt change how fs.readFile works

@cjihrig
Copy link
Contributor

cjihrig commented Feb 15, 2018

I'm in favor of Node core's internals being resistant to monkey patching. I don't think we need to expose the untampered functions to users though.

@benjamingr
Copy link
Member

@devsnek

if users want to patch fs.readFile or something that's cool (although I'm personally apposed to it without hooks) but changing Object.prototype.toString shouldnt change how far.readFile works

Probably, but it's more subtle than that - going back to the the WeakMap example case - users changed GetOwnPropertyNames since a new ("hidden") key was added to an object. That key had to be "secret" and not impact other methods - so if util.inspect was "tamper proof" then it would have been impossible to polyfill WeakMap entirely.


Note: Another thing we can do is to explicitly specify that changing built ins is dangerous and doing so is not supported in Node.js .

I'm not sure that's a good idea as it violates "least surprise" and the current behavior is unspecified anyway. Note that fixing this and ensuring builtins are used should be a pretty big code change and will create overhead in maintaining the code since basically built-in instance methods won't be used.

If we do this we need to make sure it doesn't impact performance (or improves it, since technically it's provable which function it is now), and that it's not problematic maintenance wise.

@mscdex
Copy link
Contributor

mscdex commented Feb 15, 2018

I feel like this is a deep rabbit hole. Anyone could override anything, so now any and all object/prototype methods need to be saved up front, which is crazy because there are so many and they're used throughout the codebase. For example: string.trim(), string.toLowerCase(), string.slice(), Object.keys(), Number.isNaN(), etc.

Even if it was decided it's a good idea to do this, we'd need to also maintain a giant list of methods we use anywhere in node core.

@hashseed
Copy link
Member Author

@mscdex are you concerned about memory use or time spent upon startup? One way to do it is to simply shallow clone these prototype objects.

@mscdex
Copy link
Contributor

mscdex commented Feb 15, 2018

@hashseed I'm not even considering performance implications at this point. Just having to touch almost every line of code in node core, getting new collaborators to understand the change, maintaining these methods/prototypes, etc.

@benjamingr
Copy link
Member

I wonder if there is a way from V8 land to "lock" the way Node core uses these objects - but I can't think of anything out of the box.

@bnoordhuis
Copy link
Member

Just having to touch almost every line of code in node core

I don't have a problem with that if it's for the greater good. Most of it is rote search-and-replace, perhaps even mechanical replace.

@jasnell
Copy link
Member

jasnell commented Feb 15, 2018

Could make for a good Code-and-Learn exercise :-)

@apapirovski
Copy link
Member

apapirovski commented Feb 15, 2018

My position on this aligns with @mscdex. There's some parts that are not completely unintuitive because people have likely seen hasOwnProperty.call(obj, 'something') before, but then there's stuff like Promise.prototype.then monkey-patching and needing to have PromiseThen that is honestly going to be nightmare to explain to new contributors.

Also, it's very likely that uncurryThis carries a performance penalty so switching over the entirety of core to use it will lead to a broad performance penalty for the whole of Node.js.

if we're going down this route, my preference is to try & find a path that leads to the core being able to lock or have our own versions of these primitives, as mentioned by @benjamingr.

I wonder if there is a way from V8 land to "lock" the way Node core uses these objects - but I can't think of anything out of the box.

I know this is a more complicated route (that might not be feasible with what we have available right now) but at least it would keep the code maintainable and easy to understand for new contributors. Having to write PromiseThen(promise, fn); is — to me — not an appealing proposition.

(And if we're comparing this to the C++ layer, having our own versions of Object, Promise, etc. would more closely resemble the C++ side than storing references to all the namespaced & prototype functions.)


Also, to further elaborate, this would still need to go hand in hand with being more defensive on anything being passed in from the users. For example, if a user is allowed to pass in a settings object then we shouldn't assume that it has hasOwnProperty or that it even has a non-null prototype.

This aligns with the recent transition to using Reflect.apply on user provided callbacks.

@hashseed
Copy link
Member Author

having our own versions of Object, Promise, etc. would more closely resemble the C++ side than storing references to all the namespaced & prototype functions

This does not work, if you take user-defined objects as arguments and leak internal versions via return values.

@apapirovski
Copy link
Member

This does not work, if you take user-defined objects as arguments

As I mention above, we would still need to be more defensive for user-provided objects but that's a much lesser level of change than having to be defensive about the entirety of internal Object, String, etc. usage. User-provided callbacks were one of the main culprits and then we have some infrequent hasOwnProperty access, but not much more than that.

leak internal versions via return values.

Hence me saying this might not be possible currently. That said, if our internal versions could be frozen but without the currently present performance impact, then it would technically be possible, no?


The fact is that it's hard to know where to draw the line. What about Buffer? It's something that's provided by node itself but it's crucial to the inner workings of at least half of core. And if Buffer is included, what else is?

@devsnek
Copy link
Member

devsnek commented Feb 15, 2018

in my mind i kind of split this into two example cases:

a) util.inspect getting property keys
b) repl using string replacement

for A i would use unsaved prototype methods to allow users to heck with it, but for B there is no benefit to the repl (in fact it can become impossible to use) if someone overrides String.prototype.replace for some reason, so we would use a saved proto method here.

like hashseed said its ok to continue to have planned monkey patching but it shouldn't just randomly happen.

@benjamingr
Copy link
Member

benjamingr commented Feb 15, 2018

in my mind i kind of split this into two example cases:

I think @hashseed's point is that this is generally underspecified in our whole API and that we should probably consider specifying this. (feel free to comment if I misrepresented your intention)

That is, that an effort to go over Node's API in its entirety and specify (at least for internal use) what methods may impact it would be beneficial. I agree with that.

I think we should also discuss how we should do this (split work?) and get some initial data.

b) repl using string replacement

I'm really not sure, I honestly can't think of a case where I'd want to override this but I wouldn't forsee the WeakMap for util.inspect either.

The issue is that this effects older versions of Node (new versions of Node rarely need polyfills).

@jasnell
Copy link
Member

jasnell commented Feb 15, 2018

One other thing to keep in mind here is the non-trivial burden that going all in with this will creating on backporting changes to LTS lines if this mechanism is not also backported.

@mscdex
Copy link
Contributor

mscdex commented Feb 15, 2018

I don't know how the shallow cloning of prototypes would work, but currently even if you save hasOwnProperty early, someone could still override Function.prototype.call after the fact and still completely disrupt your hasOwnProperty.call(...).

@benjamingr
Copy link
Member

@mscdex call(hasOwnProperty, ...) :D Although I see your point

@mscdex
Copy link
Contributor

mscdex commented Feb 15, 2018

One other thing to keep in mind here is the non-trivial burden that going all in with this will creating on backporting changes to LTS lines if this mechanism is not also backported.

Yes, this was kind of what I was hinting at earlier with having to touch the entire codebase.

@mscdex
Copy link
Contributor

mscdex commented Feb 15, 2018

@mscdex call(hasOwnProperty, ...) :D Although I see your point

Right, and having to code like that would be pretty absurd IMHO, even if you wrote a wrapper for it.

@devsnek
Copy link
Member

devsnek commented Feb 15, 2018

as a reference I created https://gist.github.com/devsnek/76aab1dcd169c96b952fbe8c74404475. around 350 functions are saved. L27 exposes it to users.

@apapirovski
Copy link
Member

We would also need to store the constructors to avoid global.Array = function() {/* user code */} and pass them into the internal wrappers.

@hashseed
Copy link
Member Author

Another way I can think of is to store copies of builtin methods under symbols. In internal modules you would invoke builtins via symbols.

@jasnell
Copy link
Member

jasnell commented Feb 15, 2018

One possible approach that carries along a whole range of it's own complexities is to use a separate context (e.g. as in vm.runInNewContext()) for internal code vs. user code. While it would be disruptive in a number of cases, it would prevent a number of issues being discussed here.

@hashseed
Copy link
Member Author

One possible approach that carries along a whole range of it's own complexities is to use a separate context

Unfortunately that does not work. When you call arg.getOwnProperty, you call the one passed with the arg object on the prototype, not the one on the object prototype on from the current context.

@hashseed
Copy link
Member Author

I just checked with @bmeurer that TurboFan handles Reflect.apply. With the following micro-benchmark I observe some regression on Node 8, but no difference at all on current Node master.

const ReflectApply = Reflect.apply;

function uncurryThis(func) {
  return (thisArg, ...args) => ReflectApply(func, thisArg, args);
}
var StringStartsWith = uncurryThis(String.prototype.startsWith);

var s = "Hi Matteo";
function test1() {
  var r = 0;
  console.time("monkey-patchable");
  for (var i = 0; i < 1E7; i++) {
    if (s.startsWith(i)) r++;
  }
  console.timeEnd("monkey-patchable");
  return r;
}

function test2() {
  var r = 0;
  console.time("cached original");
  for (var i = 0; i < 1E7; i++) {
    if (StringStartsWith(s, i)) r++;
  }
  console.timeEnd("cached original");
  return r;
}

test1();
test2();

Is it really that much more unreadable to rename the function and move the receiver to the first argument? You don't have to use some .call pattern here, and there seems to be no performance penalty.

Having to write PromiseThen(promise, fn); is — to me — not an appealing proposition.

I'd argue that you should be writing async/await in most cases anyways.

@devsnek
Copy link
Member

devsnek commented Feb 28, 2018

In the context of this conversation i think primordials are the big global objects that javascript (and we) create (Array, Symbol) and generics are the methodology from js1.6 where you could use Array.every (which doesn't exist anymore) on anything iterable. In this case generics are going to be much slower than the uncurryThis we are looking at currently because v8 appears has specific perf case for uncurryThis and generics have to work with many types calling down to other behaviors

@jdalton
Copy link
Member

jdalton commented Feb 28, 2018

I'm referring to primordials in the context of the Node helpers written to handle handle methods from unknown origins. The term used for those helpers was primordials. I pointed to the docs for the source of the name generics, not using the actual Array.every (that doesn't exist in v8). I'm saying I implemented generics in the same way as uncurryThis. Who knew the name would throw folks off here 😋

@benjamingr
Copy link
Member

I think I'll pick this up at the next open source event we're doing and get 10 or so community members messing with functions and reporting how Node behaves in different scenarios - as well as searching GitHub and NPM for initial data.

I think that should get us the initial data we need to present to the TSC in order to make a push for a more consistent API in terms of overriding builtins.

This will likely be mid-late March (and sounds like a fun activity people unfamiliar with our code base can help with), if anyone has an earlier time frame feel free to "steal" this.

@benjamingr
Copy link
Member

benjamingr commented Feb 28, 2018

I'm going to self assign, I don't think this needs TSC attention until we have said data but feel free to disagree or bring other opinions.

@devsnek
Copy link
Member

devsnek commented Apr 22, 2018

@benjamingr is this something you're interested in doing at a community event?

@benjamingr
Copy link
Member

@devsnek still interested in this but have been a little overwhelmed.

@hashseed
Copy link
Member Author

hashseed commented May 31, 2018

There is a proposal by @domenic that would address this.

@benjamingr
Copy link
Member

@hashseed thanks, that's useful. I should have probably gotten around to this by now but I ended up working on promise-use-cases in the last event. I'll do my best to get to this in the next event (June 16th).

(Random note: I think domenic asked not to be pinged in the nodejs repos by the way - so we should probably not ping him without checking)

@ljharb
Copy link
Member

ljharb commented May 31, 2018

That proposal isn’t solely necessary, fwiw - node could load a single file at the start of the process that caches all the intrinsics and provides a module to expose them internally. The challenge - with that or with domenic’s proposal - would be enforcing the pattern’s usage via linting and code review.

@benjamingr
Copy link
Member

@ljharb see domenic/get-originals#14 - I'm thinking about automatic conversion here rather than changing our code.

The proposal is something we'd potentially want to use after we've mapped how our internals react to these changes.

@ljharb
Copy link
Member

ljharb commented Jun 1, 2018

I'm not sure how you could reliably automatically convert prototype methods; babel has the same challenge.

@benjamingr
Copy link
Member

@ljharb I'm not sure what you mean? If you have a concern regarding automatic conversion please do speak up in the get-originals repo.

@Trott
Copy link
Member

Trott commented Oct 26, 2018

Should this remain open? Or is this a discussion that has run its course and can be closed?

@ljharb
Copy link
Member

ljharb commented Oct 26, 2018

I’d still like to see a policy decision that trends towards robustness, if possible.

targos pushed a commit that referenced this issue Feb 10, 2019
This patches changes the `safe_globals` internal module into a
script that gets run during bootstrap and saves JavaScript builtins
(primordials) into an object that is available for all other builtin
modules to access lexically later.

PR-URL: #25816
Refs: #18795
Reviewed-By: Bradley Farias <bradley.meck@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Gus Caplan <me@gus.host>
@jasnell
Copy link
Member

jasnell commented Jun 26, 2020

Given that there has not been any further discussion on this in 1.5 years and given that we've started moving to the use of primordials and protecting key paths from monkeypatching, I believe this issue can be closed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discuss Issues opened for discussions and feedbacks. lib / src Issues and PRs related to general changes in the lib or src directory.
Projects
None yet
Development

No branches or pull requests