-
Notifications
You must be signed in to change notification settings - Fork 30k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fs: improve error performance of sync methods #49593
Conversation
Review requested:
|
0afabd5
to
80c845b
Compare
10cb7e5
to
8a748c4
Compare
4351a5e
to
6de6974
Compare
6de6974
to
6ff6a52
Compare
I wrote a small benchmark which simply loads a small text file: https://github.com/lemire/js-file-loading-benchmark I don't have (yet) the numbers under Linux with this PR, but bun is at least 2x faster than Node.js 20 under Linux. The difference is smaller under macOS. On my mac (ARM silicon), this PR does not seem to help:
|
Results under Linux. Note that Bun is actually 3x faster at loading a small file, so it is quite bad.
Benchmark: https://github.com/lemire/js-file-loading-benchmark |
hi daniel (@lemire). when i run your bench on my laptop (ubuntu 22.04, Intel(R) Core(TM) i7-6560U CPU @ 2.20GHz) i see something similar (but also different for bun). 8.7 µs for bun but the mitata bench seems to run for only 10 iterations, which seems an incredibly short time to get a meaningful result. when i run using my own benchmarking script i get a completely different result for node20 8.5 µs for bun is it possible there is some issue with mitata? or possibly not enough iterations for JIT/fastcalls to kick in? here is a gist with the code i am running. can you reproduce this on your linux machine? also, using my own custom v8-based runtime i can get ~7.8 µs per iteration for this test using v8 fastcalls, so it should in theory be possible to go even faster than bun. i am wondering where the 4.3 µs is coming from in your results above for bun - maybe it is using different syscalls on the linux you are running on? can you do an strace? |
Here are my results under Linux using your script.
strace: Node:
Bun:
|
Thanks @lemire, moving to file descriptors and replacing stat with fstat should improve the performance. I'll update the pull request when I have the chance. |
@lemire thanks for running. those results look odd. i will have a look again to make sure i gave you the right script. |
@billywhizz How do you leverage v8 fast api with non-flat strings? And returning a string is a limitation of the fast api. I'm extremely interested in learning your solution for this limitation. |
@billywhizz I can give you access to the hardware I am using if you'd like. Email me. (Email easy to find.) |
@anonrig i'll see if i can share the code for the runtime over next few days. it works differently to node.js. the code to read the file in JS looks like this. const { fs } = spin
const O_RDONLY = 0
const stat = new Uint8Array(160)
const stat32 = new Uint32Array(stat.buffer)
function readFile (path, flags = O_RDONLY) {
const fd = fs.open(path, flags)
assert(fd > 0)
let r = fs.fstat(fd, stat)
assert(r === 0)
const size = Number(stat32[12])
const buf = new Uint8Array(size)
let off = 0
let len = fs.read(fd, buf, size)
while (len > 0) {
off += len
if (off === size) break
len = fs.read(fd, buf, size)
}
off += len
r = fs.close(fd)
assert(r === 0)
assert(off >= size)
return buf
} and each of the calls to fs.open, fs.fstat, fs.read and fs.close is a fastcall. the overhead on a fastcall is only about 3-4 nanoseconds so it makes pushing a lot of these lower level calls into JS from C++ land feasible. i feel bad for you with all the sh*t you have taken on twitter since the bun release so hopefully i can help you out with some benchmarking and testing in coming weeks. please ignore the idiots and keep doing what you are doing. OSS for the win! 🙏 |
@lemire thanks! i will get in touch. i should have some time to help out over next few weeks. i'd be interested to see why your results are coming out different to mine. this is what i see running the code in that gist.
|
@anonrig @lemire @joyeecheung as you can see above, in all the benchmarks i have been doing bun/JSC is a real memory hog. will be interesting to see if that can be resolved easily. there is apparently a |
In our case, a whole lot of time is spent in UTF-8 processing within v8. I now realize that it is a fundamental difference between our two benchmarks. Append the following to your benchmark: function utf8fn() {
const buf = readFileSync('./README.md', 'UTF-8')
return buf.length
}
console.log(utf8fn())
run('readFileSync', utf8fn, 200000, 10) Here is what I get...
Notice how Node.js 20 does really, really poorly when loading UTF-8 data? |
@lemire that makes sense - my fault for missing that in your original post. sorry for the confusion. i actually did quite a bit of benchmarking on this last year. i'll see if i can dig out my notes. from memory, String::NewFromUtf8 is very slow compared to whatever bun/JSC is doing. i seem to remember String::NewFromTwoByte and String::NewFromOneByte are significantly faster. Maybe we can have a dig into the v8 source and see if a patch would be viable? more info: https://twitter.com/jarredsumner/status/1496455856224972808?lang=en this isn't going to be an easy one i think. |
i have pushed another benchmark to my gist with a specific test for utf8 decoding. it's kinda interesting. for small strings, bun is ~2.5x faster on my setup.
for large strings, it is also ~2.5x faster
but, notice the bun version uses ~10x the memory. when i run bun with
i think bun is doing something "tricky" here in order to win the benchmark game? i've always noticed jarred is very careful never to show memory usage in his benchmarks and from my testing (i've done a lot!) it's usually very high. i don't know much about JSC or Zig but wondering if this is some limitation/issue with JSC garbage collection for strings. if anyone can verify/check my work that would be great - i have kinda rushed through all this, but i think it's correct. i use taskset to pin the benchmark to a single core. if you don't use taskset then bun runs faster (8μs versus 20μs for the big string) but uses a lot more cpu (~1.8 cores on my setup) - i assume doing GC on another thread. my overall feeling with bun 1.0 is it is far away from being a 1.0 stable release and it all feels a little underhand/misleading to me. maybe i will write something up in coming days to this effect once i have verified these findings. |
Linux
|
I have now modified the benchmark so that the README.md file is not-ASCII. You can see Node's performance go down drastically...
|
I am getting that const file_content = readFileSync(filename).toString("utf-8"); is consistently faster than const file_content = readFileSync(filename, "utf-8"); in Node 20. (But not Bun.) Here are the results on my mac...
Here is the code that I am benchmarking... bench("readFileSync UTF-8", () => {
const file_content = readFileSync(filename, "utf-8");
return file_content.length;
});
bench("readFileSync to string", () => {
const file_content = readFileSync(filename).toString("utf-8");
return file_content.length;
}); |
@billywhizz Your benchmark does not hit the fast path. Particularly the second parameter of the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose the main goal of this PR is to optimize sync calls by avoiding a branch condition in existing CPP methods right?
The main goal is to separate sync/async/callback methods, open the room for optimizations, such as V8 Fast APIs, and to return the errors on C++ side instead of JS, and as a side effect improve the false paths. I'll follow up with more pull requests in the next couple of weeks to move all fs operations to C++ due to performance concerns. |
Landed in 7e12d0e |
I am probably too late here, while I agree with what it wants to do in general I think the particular approach used by this PR to achieve what it wants is problematic, and is leading a bunch of other PRs to amplify the problems..
I think we should move the JS code back to where they were, and introduce a new |
I opened #49913 to move the implementations back and also introduced some helpers that should simplify the fs sync error migration PRs a lot. |
PR-URL: #49593 Reviewed-By: Matteo Collina <matteo.collina@gmail.com> Reviewed-By: Robert Nagy <ronagy@icloud.com>
Using `await fs.access` has couple of downsides. It creates unnecessary async contexts where async scope can be removed. Also, it creates the possibility of race conditions such as `Time-of-Check to Time-of-Use`. It would be nice if someone can benchmark this. I'm rooting for a performance improvement. Some updates from Node.js land: - There is an open pull request to add V8 Fast API to `existsSync` method - nodejs/node#49893 - Non-existing `existsSync` executions became 30% faster - nodejs/node#49593 Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
PR-URL: nodejs#49593 Reviewed-By: Matteo Collina <matteo.collina@gmail.com> Reviewed-By: Robert Nagy <ronagy@icloud.com>
This is a work-in-progress pull request. Opened to receive feedback while I navigate through the issues.
It includes some interesting performance improvements. I'll update the following list while I add more functions.
Improvements
Fixes nodejs/performance#106
cc @nodejs/performance