-
Notifications
You must be signed in to change notification settings - Fork 30k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use of function section reordering in node binary for performance #16131
Comments
I think this depends on whether/how many of the platforms we support are supported by hfsort. cc @nodejs/build |
Can you expand on what the gold linker is, and how it is different from the one we are using ? |
#1409 might be relevant, that was about PGO, profile-guided optimization. It stranded on a lack of representative benchmarks but that was before we had a benchmarking WG. |
I don't see a problem with adding it as long as:
If you could raise a PR that would be great. Once/if the PR lands we'd need to install the gold linker on test and release platforms, and make sure release builds are using it. |
@uttampawar but that's a build-time dependency right? Anyone downloading the binaries won't need this. |
@gibfahn Yes it is a build time dependency. To get the benefit of code-layout optimization, build needs to be specific for a workload so it means binary is for specific workload/benchmark. If can happen that the a binary is beneficial in general across most/all workloads or at least no observed regression then yes we can provide that binary. |
The changes are entirely in the gyp build files to use gold linker if it exists. Changes will be specific to linux-x64 OS. |
I'm afraid I'm not, people like @bnoordhuis @targos @addaleax @refack might know more. Speaking of which, we should probably have an @nodejs/gyp team for that kind of question, people normally cc/ nodejs/build, but it's not really our area of expertise. |
There are a couple of detections possible:
Also for /CCing on GYP and |
P.S. |
fwiw, I'm +1 on moving forward with this. Obviously I'd prefer to have something that could work across all of the platforms we support but if it's limited to linux then that's ok. I am also slightly concerned about making sure the training benchmark doesn't end up optimizing for a specific type of workload at the cost of regressing on others. I highly doubt that will be a problem but it would be good to have benchmarks that watch for that. In a chat with @uttampawar, he indicated that one option would be to expose a configure flag that allows folks to build with their own training data which would lessen the likelihood of issues there. As long as the process for capturing that training data is also documented, then I'm all for it. Would love to see a PR soon :-) |
Adding support for it in
So people would have to build their own versions of Node to fix performance regressions? If it actually does regress certain use cases I'm not sure "build it yourself" is a reasonable answer. |
Certainly. We should make sure this doesn't add regressions but we can only do so with use cases we can anticipate. |
@gibfahn If we can add this option and watch out for any regression for few use cases then that's great but as @jasnell said we can only do so for few cases. My suggestion is that in addition for every other use case out there we can still provide a build target (use of gold linker) for users who are willing to build, train and rebuild node.js for more performance just for their application. It's like supporting PGO compiler option in our build scripts. |
I'm going to close this out given the lack of movement in over a year and given that the associated PR had been unattended since February. |
Currently we don't use the gold linker with the node build system. The gold linker provides an option
--section-ordering-file
which gives an option to the user to provide the ordering of frequently used functions to the linker.The changes would be:
The hot-function-ordering file itself was generated using the hfsort tool which is part of the Facebook/HHVM project.
This tool uses the Linux perf cycle accurate profile and executes pettis-hansen ordering algorithm on it.
I tried this methodology with the Ghost.js benchmark (which I created) and the acme-air benchmark and saw a speed-up of 2.5% - 3.5 % in throughput.
I wanted to know if it would make sense to make this change in the build system ? I could send a PR and see if there are any adverse affects on the known set of benchmarks, I haven't seen any regression on the micro benchmarks so far.
The text was updated successfully, but these errors were encountered: