Looking for feedback: Pointer compression in Node.js #790
Is there any aggregate data on heap size in the community? Do we have any way to quantify the impact of restricting heaps to 4 GB? It's also worth considering the added complexity in the downstream ecosystem if there are two official binaries.
Whichever approach is taken, I'd also be interested in the impact on library authors: would they face a similar burden to the Node release and test team? Would two binaries or a flag roughly double (~2x) the testing burden on library authors? Would they only test on the default, and therefore leave a big area of untested entropy in the Node ecosystem when running on the non-default option? |
Not really, no. If you have suggestions for how to gather such data, please feel free to share!
I think it’s safe to answer all of these questions with “no” in 99 % of cases – this would only become relevant if libraries used large heap sizes themselves, I think. Typical npm packages, including native addons, would be unaffected. |
*raises hand* Interested. Is there a way to try this yet? We run with 4-8 GB heaps and I'm currently trying to resolve an issue that I think is due to V8's GC not performing well when heap usage gets that high. My guess is this wouldn't remedy the issue (it doesn't change the object count), but shrinking the heap would be nice regardless. For internal strings, does this still reduce memory usage by ~30%? I thought pointer compression only reduces the object handle size. (When we use more than 4 GB it's usually lots of internal strings.) As far as this breaking native add-ons goes, is that just a matter of rebuilding? We use a custom build of Node.js, so the default wouldn't matter to us. I'm guessing most users would benefit from pointer compression and don't set the max old space size, so that seems like a reasonable default. |
Could platform providers provide anonymised, aggregate heap sizes for Node workloads on their platforms? It would be a limited, point-in-time perspective, but it might give us some idea. Some of the Microsoft/Google/Amazon/IBM folks in the community might be able to help broker this from their public cloud colleagues! |
@zbjornson Well, at least there’s going to be a way soon: nodejs/node#30463. For now it breaks ABI compatibility for non-N-API native addons, as you noted.
I would assume that too, yes.
The memory reduction estimates are based on statistics gathered in Chrome for typical websites. String sizes would remain the same. However, one possible strategy mentioned by the V8 team to reduce the impact of the lower heap limit would be adding something along the lines of an “externalize all strings” step that moves strings off the heap when it would otherwise run out, as I understand it.
You’d have to set the two gyp variables changed in nodejs/node#30463 manually in the binding.gyp, I think. We could maybe figure out something to make this easier; however, as the ABI incompatibility is intended to be temporary, that’s probably not worth the effort.
At our last Node.js collaborator summit, the idea was floated that we could gather some kinds of usage statistics for Node.js, e.g. feature usage counters etc., that would typically be enabled in CI systems or similar situations. However, no work in that direction has been done so far, I think. As for cloud platform providers gathering this data themselves, I guess that would be possible with their buy-in, yes? |
Some observations: A good number of people seem to have trouble getting `--max-old-space-size` to work in the first place. Few user applications that I've seen in past support experiences exceed the default limit. Those that do usually have systemic memory leaks from misuse of language features, particularly Promises. (Or simply unbounded caches. Heh.) As is likely obvious, retaining a massive heap in a single Node application is quite an anti-pattern when examining Node's strengths. Most of the things you'd want to store that would take up a lot of memory are better stored some other way than in the Node heap itself. Additionally, a sizeable amount of memory in Node applications is in Buffers, which are unaffected by this. I think this will benefit the vast majority of users. It's possible that some people may run into issues, but I think the largest percentile of those cases will run into issues already regardless. (Consider: many deployment options these days have low-ish memory limits per cost. Saving people space there would actually be a very large win, particularly in FaaS, e.g. Lambda.) |
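To illustrate the point above about Buffers living outside the V8 heap, here is a minimal sketch using standard Node.js APIs (the 256 MiB allocation is an arbitrary example): Buffer memory is reported under `external`/`arrayBuffers` by `process.memoryUsage()` rather than `heapUsed`, so it would not count towards a pointer-compression heap cap.

```js
// Buffer memory is allocated outside the V8 heap: it shows up as
// "external" / "arrayBuffers" in process.memoryUsage(), not as "heapUsed",
// and does not count towards --max-old-space-size.
const before = process.memoryUsage();

// Allocate ~256 MiB of Buffer data (64 x 4 MiB).
const buffers = Array.from({ length: 64 }, () => Buffer.alloc(4 * 1024 * 1024));

const after = process.memoryUsage();
const toMiB = (bytes) => (bytes / 2 ** 20).toFixed(1);

console.log('heapUsed delta (MiB):    ', toMiB(after.heapUsed - before.heapUsed));         // small
console.log('arrayBuffers delta (MiB):', toMiB(after.arrayBuffers - before.arrayBuffers)); // ~256
console.log('buffers kept alive:', buffers.length);
```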
Often inside enterprise CI systems, builds will suddenly start to fail as the codebase scales to become "too big". While the "best" fix is indeed to address the root cause, often the only practical escape hatch to unblock the entire company's workflow, without disabling critical steps (linting, the build process, tests), is to use `--max-old-space-size`. It would be unfortunate, I think, if the removal of this ability meant that when an organization scales to a certain size, all work has to stop while the appropriate team tries to discover the root cause. |
@ljharb Thank you for your input! Just for a bit of context, until Node.js 12 the default heap limit was around 1.4 GB, so applications like these already need to raise it explicitly. |
@addaleax totally fair! I believe Airbnb's jest tests were taking 8GB the last time I checked prior to leaving, as an example. |
We use WebAssembly extensively and that is where most of our memory usage is. So as long as this does not impact WebAssembly memory (which could go into 10s of GB) this would probably be OK. |
I once wrote a Node server that would handle JSON messages from a queue, each one about 1 MB in size. The operation it performed benefited a lot from batching, and doubling the batch size increased the wall-clock run time per batch by only about 10%, so it would handle 10000 messages at a time. If I weren't able to increase my heap size to 16 GB, I'd lose a great deal of performance by limiting it to 2500 messages at a time. |
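A toy sketch of that batching trade-off (the queue functions and numbers here are hypothetical stand-ins, not the original service): with ~1 MB messages, the in-flight batch alone occupies roughly `BATCH_SIZE` MB of heap, which is why a 4 GB cap forces a much smaller batch than a 16 GB heap allows.

```js
// Hypothetical queue consumer: the batch itself needs roughly BATCH_SIZE MB
// of heap for ~1 MB messages, so the usable batch size is bounded by the
// configured heap limit (--max-old-space-size).
const BATCH_SIZE = 10_000;   // ~10 GB of message data; needs a ~16 GB heap
// const BATCH_SIZE = 2_500; // roughly what fits under a 4 GB heap limit

async function drainQueue(receiveBatch, processBatch) {
  for (;;) {
    const messages = await receiveBatch(BATCH_SIZE); // hypothetical queue API
    if (messages.length === 0) return;
    await processBatch(messages); // larger batches amortize per-batch overhead
  }
}

module.exports = { drainQueue };
```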
Some potentially useless, off-topic feedback from a developer with a Java background.
This is probably a question for the V8 team, but what prevents heaps larger than 4 GB with compressed pointers enabled? A similar optimization has been present in JVMs for many years, and in Java 7+ it's enabled by default when the maximum heap size is less than 32 GB. If V8 also adds padding to objects stored on the heap, a similar trick could be used.
In mainstream JVMs there are several approaches that decrease the memory footprint of strings; I'd like to name a couple of them. The first one is called Compact Strings, which reduces the internal representation of each char from 2 bytes to potentially 1 byte. As many applications store huge amounts of strings that consist of ASCII characters, this optimization may be valuable for such apps. The second one is string deduplication (the name is quite straightforward), but it's tied to a concrete GC algorithm. In general, a 4 GB max heap size sounds like a critical restriction to me. I think such a restriction would prevent the compressed pointers mode from being used in many web applications. But I may be wrong, especially considering V8's default heap size restriction. |
Yes, if padding is being added, the heap limit could be increased – that won’t be worked on until some time next year, but it’s possible. The extra shifts are likely to introduce some performance overhead, though.
I’m not an expert on this topic, but there are definitely a number of cases where V8 already applies this optimization.
This also already happens in V8, afaik. |
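As a toy illustration of the padding-plus-shift scheme discussed above (this sketches the general "compressed oops" idea from the JVM, not V8's actual pointer encoding): with 8-byte object alignment the low three bits of every heap offset are zero, so a 32-bit compressed value can address 2^32 × 8 bytes = 32 GiB of heap, at the cost of an extra shift on every decode.

```js
// Illustration only (not V8's real encoding). With 8-byte alignment, offsets
// are stored shifted right by 3 bits, so 32 bits can cover a 32 GiB heap.
const SHIFT = 3n;                    // log2(object alignment in bytes)
const HEAP_BASE = 0x2000_0000_0000n; // hypothetical base address of the heap cage

const compress = (addr) => Number((addr - HEAP_BASE) >> SHIFT) >>> 0; // fits in 32 bits
const decompress = (c) => HEAP_BASE + (BigInt(c) << SHIFT);           // extra shift + add

const addr = HEAP_BASE + 0x7_ffff_fff8n;                  // an aligned address near 32 GiB
console.log(decompress(compress(addr)) === addr);         // true
console.log('addressable GiB:', (2 ** 32 * 8) / 2 ** 30); // 32
```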
Thanks for these insights @addaleax! Personally, I'd vote for a single (larger) binary with the compressed pointers mode enabled by default. And if the max heap size is explicitly set to a value above 4 GB, the 64-bit pointer mode would kick in. To me, the memory footprint (and, potentially, performance) benefit is much more valuable than a larger binary size. |
I currently work on a Discord bot using the hydrabolt/discord.js library. This library is built around "caching everything", so there are large Maps built out of references to other objects, sometimes circular. I frequently run into issues due to many references to the same object across the application (8 GB on the server). I'd be very interested in trying out this functionality to see what the differences would be. I would also be happy to provide some heap dumps that I have, to see if I would actually benefit from this. |
Some anecdotal feedback in case that helps: at Netflix we're running at least one application that is fairly critical to our service and sometimes uses more than 5 GB of JS heap space. It is quite possible that this is not the optimal trade-off we can come up with but the result of under-investing in that aspect of the application's design. We'll try to look into whether using more than 4 GB of JS heap space is a strong design constraint, or a constraint we can remove without hurting our performance and reliability objectives. I can't commit to a specific deadline before which we will be able to look into this. On the topic of how to expose those JS heap space constraints/parameters to Node.js users: I feel like requiring developers to use a different binary or a different build depending on the heap usage of their application does not provide a good experience, and I would actually echo exactly what @ljharb said above: it would be unfortunate if an issue that can now be mitigated by passing a command line flag (`--max-old-space-size`) required switching to a different binary instead. |
More anecdotal feedback. One thing to take into account, though, is that our use case is minor, as it basically has an unbounded limit on memory depending on the size of the database we are working with. On the topic of how to distribute both versions, I fully agree with @puzpuzpuz here: having a larger binary that switches mode when needed is preferable to having two different binaries. |
I also agree on the compile/deploy time vs runtime tradeoff. Building and delivering a single binary without changes seems to me like a much better choice than disrupting distribution and multiplying the available versions across platforms and runtimes. It also naturally avoids added operational overhead.
Bandwidth and storage are cheap, while change management and people's time aren't. The distribution should be "batteries included" and "just work" without disrupting our users. If an organization needs to optimize their distribution, they can always compile their own - but I believe that's the exception rather than the rule, and that should be the guiding principle here. |
We also plan to run some benchmarks at Netflix on services using < 4 GB of heap space to see how much pointer compression will affect performance (positively or negatively), and we intend to share the results here once we've done so. |
If we decide to have both versions bundled into one binary, one way to mitigate the binary size concern would be to compress both versions at build time (using brotli or zstd, for example); on startup, after deciding which one to use, we can decompress the chosen one and load it dynamically into memory. This would increase complexity and startup time though. Are there any future plans to drop support for builds with pointer compression disabled in V8? We should probably take this into account as well, because if V8 stops supporting builds without pointer compression there's no point in increasing build and release complexity on our side. |
@mmarchini my understanding is that the V8 team said they don't have plans to stop supporting builds without pointer compression. They did point out that such builds won't get the test coverage from Chrome, though at least one V8 team member indicated that this should only affect coverage of a small, focused part of the code (the part that does the compression). |
It seems there is a problem with pointer compression and one of the flags that can be used to profile Node.js applications: https://bugs.chromium.org/p/v8/issues/detail?id=10138. It's probably worth checking if other tools are affected before enabling pointer compression by default on our releases (if we decide to do so). |
Just a quick update on this: I started to prepare a pointer compression build for v12 (current LTS), but found a few issues. I asked on v8-dev and apparently the pointer compression implementation on V8 7.8 might contain bugs and is slower than the version released on 8.0. So we'll probably wait until Node.js v14 to run those benchmarks. |
Is this enabled by default in Node 14? |
No, but you could build a Node.js version with that enabled. We'll probably post a Docker image with it soon. |
We just published an alternative build for v14 with pointer compression enabled, it's available here: https://unofficial-builds.nodejs.org/download/release/v14.0.0/. Be aware that this feature is still experimental on Node.js. Also, while testing this, I noticed that V8/Node.js will let users set `--max-old-space-size` above the 4 GB limit.
Edit: It's only available on v14+ due to bugs reported by the V8 team on the pointer compression implementation before V8 8.0. |
I asked @mcollina if the 4 GB limit is per isolate or per process, and it seems that it used to be per isolate, but now defaults to per process to support shared memory across isolates. This can be changed at compile time. |
It's not the case in Node.js. |
We discussed this in the TSC meeting today and agreed that we should close it until/unless we find a champion to move it forward. |
One little data point here: I recently tested Node 17 with and without pointer compression on one of my packages (a MessagePack (de)serializer) and seemed to find about 5% better performance without pointer compression. Previously I had done some testing and found I could craft an isolated GC-intense test that shows some performance improvement with pointer compression, but my impression, based on my limited testing, is that generally Node performs a little better without pointer compression, and pointer compression really is more about reducing memory usage than improving overall performance. Anyway, I would consider this a small/limited data point, but just noting it FYI. |
The number of articles that reported this as if it were on by default - and have never been corrected since - is astounding. I was unpleasantly surprised to discover that the heap usage of our app did not decrease upon upgrading to Node 14. Since there are no images to be found in the official download section, even the commentary here is a little confusing. This is not happening, and apparently never did happen? |
I'm not sure what sources told you this happened; could you link them? What happened is a build-time config flag. You could create your own build with it if you want. |
It's easy to conflate items in the V8 release notes with the Node feature set because most of the time those features are exposed. https://www.gumlet.com/blog/whats-new-in-node-14/ is the top link on Google for me |
FWIW, as a data point, Electron has V8 pointer compression enabled. We've gotten some complaints about it. We intend to keep it enabled, as Electron's attack surface is more similar to Chrome's than Node's in the general case—i.e. we run untrusted JS sometimes. |
Is there any specific reason why there are no Mac or Windows (there are ARM only) builds with pointer compression provided at https://unofficial-builds.nodejs.org/download/release/ ? Those would be perfect for projects that use Node runtime for quite specific needs that would benefit from lower memory usage (see typescript-language-server/typescript-language-server#472 for example). |
@rchl I believe unofficial-builds only cover Linux at the moment, but contributions would probably be welcome. Are there 32-bit builds available for your platform? Do those not address your concerns? |
It would be nice if those covered the most popular platforms, like macOS Intel 64-bit and ARM 64-bit, Windows 64-bit, and various Linux variants. |
It was not abundantly clear from a first read of the V8 release notes that this functionality was a compile-time flag and not a runtime flag, or I might have had a clearer notion of why this might not be on by default. When I tried to unpack this there were some clues, some more obvious than others, but I suppose that's always the problem with bragging about something you worked on: you don't want to spend too much time explaining the ways in which it's still limited, in ways others might label as 'broken'. However, this functionality is copying work done in Java about 8 years ago, and that is in fact handled as a runtime flag, so I think one could be forgiven for assuming that a copycat implementation would have a degree of feature parity that it didn't deliver. And unfortunately any suggestions I can think of about scanning the arguments before starting and picking one binary or another don't scale past a single flag. |
Just wanted to express my thanks for introducing this feature. We are trying to trim our infra costs, and recompiling our Node.js image with pointer compression effectively halved our memory use. Thanks! |
@laurisvan how? Google is not forthcoming. |
@jdmarshall In our case, we build our own Alpine .apk package from our fork of the Alpine package. Our configure options are as follows:
Building the image is an art of its own (especially in our case of creating a customized Alpine .apk package). If you can provide a bit of detail about your Node.js runtime environment, I might be able to help a bit, or maybe write a Medium article with the details. |
@laurisvan I'd be interested in what environment you are deploying to, and what the use case is. |
@mhdawson We run Docker containers in a Kubernetes environment on AWS (EKS). We have a pretty heavy GraphQL API that may result in response payloads of ~10 MB, so the memory use is large. Unfortunately the payloads are unique, so caching the responses is a non-option. Luckily the usage is not huge - we have some ~300k API calls per day, but only a few of them are 10 MB monsters. |
@mhdawson what would it take to make pointer compression an official build? Could we do it on Linux alone initially since we already have the unofficial build? |
@cjihrig I've proposed additional builds a few times before. The discussion always comes down to the cost of building/testing/shipping releases. We don't have a surplus of people volunteering for the build work, and releasers already have a good amount of work for a release (such as the recent security release). To answer the question: what I think it would take is getting the release team and the build team to agree. Part of that is somebody volunteering to do all of the work required to add the builds, watch for failures on them, and jump in when there are problems. It might need people volunteering to do build and release work. If the case were made that this is critical to the project, that could change things; in the absence of people volunteering to help, or the project deciding it's critical, it is easy to assume it's a nice-to-have. |
IMHO variations in packaging could and should be pushed downstream - otherwise it only creates extra complexity that downstream maintainers would need to care for. For example, I (and I believe many others, too) am using Node.js distributed as an Alpine Linux .apk package. I would assume the same applies for Debian, Red Hat etc. too. These packages have separate maintainers and they know their platform specifics much better. For example, the Alpine .apk package is much smaller than the official Alpine build. Having more variability upstream would only cause headaches for downstream maintainers, as they would need to make further choices on what to use as a baseline - and still, in the end, end up with one. I believe what would help more is a well maintained recipe toolbox for "DIY Node.js builds", with potentially guides on how to test them; whoever does the packaging would take responsibility for checking that it actually works. |
@mhdawson unless I'm missing something, the main branch is looking promising at least on macOS:
Can anyone else confirm? |
Continued in nodejs/build#3204 |
The V8 JavaScript engine, which is used by Node.js and Google Chrome, has started to provide a feature called “pointer compression”. The following aims to explain the impact of this feature on Node.js, and the questions and challenges around implementing it on which the Node.js collaborator group is seeking feedback from the community.
What is pointer compression?
On 64-bit hardware, which includes most modern end-user devices, references such as those from one JavaScript object to another take up 8 bytes each. A large percentage (about 70 %) of the memory used by a JavaScript application consists of such references.
Pointer compression is a way to reduce the size of these references to 4 bytes each. This reduces memory usage significantly and improves performance, at the cost of limiting the size of the JavaScript memory (the “heap”) to 4 GB, which is equivalent to about 6–8 GB of uncompressed memory.
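As a rough back-of-the-envelope check of that estimate, using the ~70 % reference share quoted above (real heaps vary, so treat this as a sketch rather than a precise model):

```js
// If ~70 % of the heap consists of references and each reference shrinks from
// 8 bytes to 4, the compressed heap is about 65 % of the uncompressed size.
const pointerShare = 0.70;
const compressedRatio = 1 - pointerShare / 2; // 0.65
console.log(`4 GB compressed ≈ ${(4 / compressedRatio).toFixed(1)} GB uncompressed`); // ≈ 6.2
```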
Future changes could increase this limit to e.g. 8, 16, or 32 GB at some performance cost. This is not being worked on in V8 at this point.
Note that currently, the V8 engine limits the heap size to 4 GB as well unless this is explicitly overridden through command line flags (`--max-old-space-size`). `ArrayBuffer`s and some JavaScript strings are stored separately and do not count towards this limit.
For a more in-depth explanation, watch Benedikt Meurer’s NodeConf EU talk “What’s happening in V8? – Benedikt Meurer”.
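To check how these limits apply to a given process, here is a minimal sketch using the built-in `v8` module (the 8192 MB value in the comment is just an example):

```js
// Print the effective JS heap limit for this process. Run e.g.:
//   node heap-limit.js
//   node --max-old-space-size=8192 heap-limit.js
const v8 = require('v8');

const { heap_size_limit, used_heap_size } = v8.getHeapStatistics();
console.log('heap limit (GiB):', (heap_size_limit / 2 ** 30).toFixed(2));
console.log('heap used  (MiB):', (used_heap_size / 2 ** 20).toFixed(1));

// ArrayBuffers/Buffers are accounted for separately and sit outside this limit:
console.log('external   (MiB):', (process.memoryUsage().external / 2 ** 20).toFixed(1));
```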
How does pointer compression affect Node.js?
Unlike Chrome, Node.js currently does not enforce a hard limit on the size of the JavaScript heap, and so an application can generally use much more than 4 GB of memory on a 64-bit platform if configured to do so.
However, the memory footprint and performance improvements brought by pointer compression have a high potential of benefiting many Node.js users, and the need for large heap sizes is partially reduced through the memory footprint improvements which pointer compression yields.
Currently, enabling pointer compression breaks native addon compatibility for Node.js applications (unless N-API is being used), but this is likely going to change in the near future, making builds of Node.js with and without pointer compression almost fully interchangeable.
What questions do we need to answer?
Do we provide two different modes for Node.js?
Do we officially support and test Node.js builds both with pointer compression enabled and without? How would a higher heap limit at some performance cost affect the answer to this question?
Supporting two different build configurations allows Node.js users to make a choice that matches their needs best, although it is unclear what a typical maximum heap size would be and how important it is to provide Node.js builds supporting unusually large heap sizes.
Additionally, providing two different modes means that the non-default one will receive less usage and thus less overall test coverage.
The V8 team does not expect this to be an issue.
In the past, Node.js has also provided 32-bit binaries that would be usable on 64-bit systems. This option is still available when building Node.js from source, but release builds are not provided anymore. 32-bit binaries are incompatible with 64-bit native addons, but provide similar benefits as pointer compression does.
If we do support two different modes, how do we deliver them to users?
The pointer compression mode is hardcoded into the V8 build used by Node.js, bringing up the question of how to give users a choice if we decide to do so out of the box (as opposed to requiring users who pick the non-default option to build Node.js from source).
There are at least two main options for answering this:
- Provide two separate sets of official binaries, one built with pointer compression and one without.
- Ship a single binary that bundles both modes and selects one at startup (for example via a command line flag).
Both options would increase the load on the Node.js release and testing infrastructure non-trivially, increasing the amount of necessary testing and lengthening the release process.
Which one would be the default mode?
If we do provide the two different modes out of the box, which one should be the default? Are we assuming that the heap limit imposed by pointer compression would be sufficient? If there is an extension to the heap limit as suggested above, what heap limit should we pick as the one that Node.js builds and provides to users?