
fs: improve error performance of sync methods #49593

Merged 1 commit into nodejs:main on Sep 17, 2023

Conversation

@anonrig (Member) commented Sep 10, 2023

This is a work-in-progress pull request, opened to receive feedback while I work through the issues.

It includes some interesting performance improvements. I'll update the following list as I add more functions.

Improvements

  • existsSync
fs/bench-existsSync.js n=1000000 type='existing'                 ***      2.45 %       ±1.36% ±1.81% ±2.36%
fs/bench-existsSync.js n=1000000 type='non-existing'             ***     30.63 %       ±1.49% ±1.99% ±2.59%
fs/bench-existsSync.js n=1000000 type='non-flat-existing'         **      2.57 %       ±1.52% ±2.02% ±2.64%
  • accessSync
fs/bench-accessSync.js n=100000 type='existing'                 ***      3.18 %       ±1.41% ±1.88% ±2.45%
fs/bench-accessSync.js n=100000 type='non-existing'             ***    100.88 %       ±3.34% ±4.48% ±5.90%
fs/bench-accessSync.js n=100000 type='non-flat-existing'        ***      3.29 %       ±1.48% ±1.97% ±2.56%
  • copyFileSync
fs/bench-copyFileSync.js n=10000 type='invalid'        ***     83.57 %       ±1.58% ±2.11% ±2.75%
fs/bench-copyFileSync.js n=10000 type='valid'                  -0.89 %       ±1.90% ±2.53% ±3.30%
  • readFileSync
fs/readFileSync.js n=600 path='existing' encoding='undefined'                    -0.48 %       ±2.03% ±2.70% ±3.51%
fs/readFileSync.js n=600 path='existing' encoding='utf8'                          1.38 %       ±1.62% ±2.16% ±2.82%
fs/readFileSync.js n=600 path='non-existing' encoding='undefined'                 1.40 %       ±1.72% ±2.29% ±3.01%
fs/readFileSync.js n=600 path='non-existing' encoding='utf8'             ***     71.66 %       ±2.50% ±3.35% ±4.40%
  • statSync
fs/bench-statSync-failure.js statSyncType='noThrow' n=1000000        ***     33.13 %       ±0.92% ±1.23% ±1.60%
fs/bench-statSync-failure.js statSyncType='throw' n=1000000          ***     89.53 %       ±0.98% ±1.31% ±1.72%
fs/bench-statSync.js statSyncType='fstatSync' n=1000000                      -0.67 %       ±3.56% ±4.75% ±6.19%
fs/bench-statSync.js statSyncType='lstatSync' n=1000000                       1.22 %       ±2.83% ±3.77% ±4.91%
fs/bench-statSync.js statSyncType='statSync' n=1000000                        0.52 %       ±2.62% ±3.49% ±4.54%
  • openSync
fs/bench-openSync.js n=100000 type='existing'                     0.30 %       ±1.45% ±1.94% ±2.53%
fs/bench-openSync.js n=100000 type='non-existing'        ***     92.76 %       ±4.05% ±5.42% ±7.12%

Fixes nodejs/performance#106

cc @nodejs/performance

@nodejs-github-bot (Collaborator)

Review requested:

  • @nodejs/gyp
  • @nodejs/startup

@nodejs-github-bot nodejs-github-bot added lib / src Issues and PRs related to general changes in the lib or src directory. needs-ci PRs that need a full CI run. labels Sep 10, 2023
@anonrig anonrig force-pushed the node-fs-sync-rewrite branch 2 times, most recently from 0afabd5 to 80c845b on September 10, 2023
@anonrig anonrig added the performance Issues and PRs related to the performance of Node.js. label Sep 10, 2023
Review threads on src/node_file_sync.cc (outdated, resolved)
@lemire (Member) commented Sep 11, 2023

I wrote a small benchmark which simply loads a small text file:

https://github.com/lemire/js-file-loading-benchmark

I don't yet have the numbers under Linux with this PR, but Bun is at least 2x faster than Node.js 20 there. The difference is smaller under macOS.

On my Mac (Apple silicon), this PR does not seem to help:

$ node bench_loadfile.js
cpu: Apple M2
runtime: node v20.6.1 (arm64-darwin)

benchmark      time (avg)             (min … max)       p75       p99      p995
------------------------------------------------- -----------------------------
README.md    8.34 µs/iter     (8.12 µs … 8.66 µs)   8.42 µs   8.66 µs   8.66 µs

$ ../node/node bench_loadfile.js
cpu: Apple M2
runtime: node v21.0.0-pre (arm64-darwin)

benchmark      time (avg)             (min … max)       p75       p99      p995
------------------------------------------------- -----------------------------
README.md    8.31 µs/iter     (8.04 µs … 9.03 µs)    8.4 µs   9.03 µs   9.03 µs

$ bun run bench_loadfile.js
cpu: Apple M2
runtime: bun 1.0.0 (arm64-darwin)

benchmark      time (avg)             (min … max)       p75       p99      p995
------------------------------------------------- -----------------------------
README.md    6.73 µs/iter     (6.55 µs … 7.05 µs)   6.76 µs   7.05 µs   7.05 µs

@lemire (Member) commented Sep 11, 2023

Results under Linux. Note that Bun is actually 3x faster at loading a small file here, which is quite bad for Node.


cpu: Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz
runtime: node v20.6.1 (x64-linux)
benchmark      time (avg)             (min … max)       p75       p99      p995
------------------------------------------------- -----------------------------
README.md   13.64 µs/iter     (11.49 µs … 1.1 ms)  13.06 µs  18.31 µs  29.59 µs


$ ../node/node bench_loadfile.js
cpu: Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz
runtime: node v21.0.0-pre (x64-linux)

benchmark      time (avg)             (min … max)       p75       p99      p995
------------------------------------------------- -----------------------------
README.md   13.79 µs/iter   (11.8 µs … 896.62 µs)  13.17 µs     20 µs   32.1 µs


cpu: Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz
runtime: bun 1.0.0 (x64-linux)

benchmark      time (avg)             (min … max)       p75       p99      p995
------------------------------------------------- -----------------------------
README.md    4.36 µs/iter     (4.18 µs … 5.33 µs)   4.35 µs   5.33 µs   5.33 µs

Benchmark: https://github.com/lemire/js-file-loading-benchmark

@billywhizz (Contributor) commented Sep 11, 2023

hi daniel (@lemire). when i run your bench on my laptop (ubuntu 22.04, Intel(R) Core(TM) i7-6560U CPU @ 2.20GHz) i see something similar (but also different for bun).

8.7 µs for bun
17 µs for node20

but the mitata bench seems to run for only 10 iterations, which seems an incredibly short time to get a meaningful result.

when i run using my own benchmarking script i get a completely different result for node20

8.5 µs for bun
9.4 µs for node20

is it possible there is some issue with mitata? or possibly not enough iterations for JIT/fastcalls to kick in? here is a gist with the code i am running. can you reproduce this on your linux machine?

also, using my own custom v8-based runtime i can get ~7.8 µs per iteration for this test using v8 fastcalls, so it should in theory be possible to go even faster than bun. i am wondering where the 4.3 µs is coming from in your results above for bun - maybe it is using different syscalls on the linux you are running on? can you do an strace?

@lemire (Member) commented Sep 11, 2023

@billywhizz

Here are my results under Linux using your script.

running node benchmark
Now using node v20.6.1 (npm v9.8.1)
1819
readFileSync        1220 ms rate     163898 μs/iter       6.10 rss 56072
readFileSync        1248 ms rate     160142 μs/iter       6.24 rss 55512
readFileSync        1164 ms rate     171777 μs/iter       5.82 rss 55948
readFileSync        1132 ms rate     176636 μs/iter       5.66 rss 56256
readFileSync         803 ms rate     248826 μs/iter       4.02 rss 56256
readFileSync         705 ms rate     283397 μs/iter       3.53 rss 56448
readFileSync         707 ms rate     282528 μs/iter       3.54 rss 56448
readFileSync         707 ms rate     282603 μs/iter       3.54 rss 56448
readFileSync         711 ms rate     281229 μs/iter       3.56 rss 56460
readFileSync         705 ms rate     283330 μs/iter       3.53 rss 56468

running bun benchmark

1819
readFileSync         786 ms rate     254297 μs/iter       3.93 rss 121616
readFileSync         667 ms rate     299512 μs/iter       3.34 rss 122852
readFileSync         811 ms rate     246346 μs/iter       4.06 rss 122584
readFileSync         795 ms rate     251526 μs/iter       3.98 rss 122540
readFileSync         807 ms rate     247624 μs/iter       4.04 rss 122460
readFileSync         803 ms rate     248877 μs/iter       4.02 rss 122428
readFileSync         782 ms rate     255430 μs/iter       3.91 rss 122520
readFileSync         542 ms rate     368534 μs/iter       2.71 rss 122640
readFileSync         515 ms rate     387738 μs/iter       2.58 rss 122420
readFileSync         519 ms rate     384778 μs/iter       2.60 rss 119948

strace:

Node:

openat(AT_FDCWD, "./README.md", O_RDONLY|O_CLOEXEC) = 19
statx(19, "", AT_STATX_SYNC_AS_STAT|AT_EMPTY_PATH, STATX_ALL, {stx_mask=STATX_ALL|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0644, stx_size=1819, ...}) = 0
read(19, "# File benchmarks in JavaScript\n"..., 1819) = 1819
close(19)

Bun:

openat(AT_FDCWD, "./README.md", O_RDONLY|O_NOCTTY) = 12
fstat(12, {st_mode=S_IFREG|0644, st_size=1819, ...}) = 0
read(12, "# File benchmarks in JavaScript\n"..., 1835) = 1819
read(12, "", 16)                        = 0
close(12)

@anonrig (Member, Author) commented Sep 11, 2023

Thanks @lemire. Moving to file descriptors and replacing stat with fstat should improve performance; I'll update the pull request when I have the chance.
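
For illustration, the shape of that change in userland terms: a minimal sketch assuming Node's public sync fs API (the PR itself works at the C++ binding layer, so the helper name here is only illustrative):

const fs = require('node:fs');

// Open once, then fstat the open descriptor instead of stat'ing the path again:
// one path resolution instead of two, and no TOCTOU window between the calls.
function readFileViaFd(path) {
  const fd = fs.openSync(path, 'r');
  try {
    const { size } = fs.fstatSync(fd);      // fstat(2) on the fd, like Bun's trace above
    const buf = Buffer.allocUnsafe(size);
    let off = 0;
    while (off < size) {
      const n = fs.readSync(fd, buf, off, size - off, off);
      if (n === 0) break;                   // unexpected EOF; return what was read
      off += n;
    }
    return off === size ? buf : buf.subarray(0, off);
  } finally {
    fs.closeSync(fd);
  }
}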

@billywhizz (Contributor) commented

@lemire thanks for running. those results look odd. i will have a look again to make sure i gave you the right script.

@anonrig (Member, Author) commented Sep 11, 2023

> @lemire thanks for running. those results look odd. i will have a look again to make sure i gave you the right script.

@billywhizz How do you leverage the V8 Fast API with non-flat strings? Returning a string is also a limitation of the Fast API; I'm extremely interested in your solution for that limitation.

@lemire (Member) commented Sep 11, 2023

@billywhizz I can give you access to the hardware I am using if you'd like. Email me. (Email easy to find.)

@billywhizz (Contributor) commented Sep 11, 2023

@anonrig i'll see if i can share the code for the runtime over next few days. it works differently to node.js. the code to read the file in JS looks like this.

const { fs } = spin // binding object exposed by the custom runtime

const O_RDONLY = 0
const stat = new Uint8Array(160)            // reusable buffer for the raw stat struct
const stat32 = new Uint32Array(stat.buffer) // 32-bit view onto the same bytes

function readFile (path, flags = O_RDONLY) {
  const fd = fs.open(path, flags)
  assert(fd > 0)
  let r = fs.fstat(fd, stat)      // fstat on the open fd, no second path lookup
  assert(r === 0)
  const size = Number(stat32[12]) // st_size field of the stat struct
  const buf = new Uint8Array(size)
  let off = 0
  let len = fs.read(fd, buf, size)
  while (len > 0) {
    off += len
    if (off === size) break       // whole file read
    len = fs.read(fd, buf, size)
  }
  r = fs.close(fd)
  assert(r === 0)
  assert(off >= size)
  return buf
}

and each of the calls to fs.open, fs.fstat, fs.read and fs.close is a fastcall. the overhead on a fastcall is only about 3-4 nanoseconds so it makes pushing a lot of these lower level calls into JS from C++ land feasible.

i feel bad for you with all the sh*t you have taken on twitter since the bun release so hopefully i can help you out with some benchmarking and testing in coming weeks. please ignore the idiots and keep doing what you are doing. OSS for the win! 🙏

@billywhizz (Contributor) commented

@lemire thanks! i will get in touch. i should have some time to help out over next few weeks. i'd be interested to see why your results are coming out different to mine. this is what i see running the code in that gist.

bun files.mjs 
1818
readFileSync        1826 ms rate     109487 μs/iter       9.13 rss 86356
readFileSync        1700 ms rate     117625 μs/iter       8.50 rss 89616
readFileSync        1689 ms rate     118350 μs/iter       8.45 rss 89372
readFileSync        1688 ms rate     118465 μs/iter       8.44 rss 89488
readFileSync        1693 ms rate     118106 μs/iter       8.47 rss 89712
readFileSync        1671 ms rate     119679 μs/iter       8.36 rss 89756
readFileSync        1682 ms rate     118865 μs/iter       8.41 rss 89548
readFileSync        1683 ms rate     118782 μs/iter       8.42 rss 87828
readFileSync        1685 ms rate     118686 μs/iter       8.43 rss 87544
readFileSync        1703 ms rate     117382 μs/iter       8.52 rss 87940
./node files.mjs 
1818
readFileSync        1921 ms rate     104073 μs/iter       9.61 rss 54540
readFileSync        1850 ms rate     108085 μs/iter       9.25 rss 54824
readFileSync        1810 ms rate     110463 μs/iter       9.05 rss 55464
readFileSync        1819 ms rate     109919 μs/iter       9.10 rss 55592
readFileSync        1823 ms rate     109707 μs/iter       9.12 rss 55592
readFileSync        1825 ms rate     109575 μs/iter       9.13 rss 55592
readFileSync        1820 ms rate     109871 μs/iter       9.10 rss 55592
readFileSync        1820 ms rate     109885 μs/iter       9.10 rss 55592
readFileSync        1815 ms rate     110150 μs/iter       9.08 rss 55592
readFileSync        1883 ms rate     106209 μs/iter       9.42 rss 55592

@billywhizz (Contributor) commented Sep 11, 2023

@anonrig @lemire @joyeecheung as you can see above, in all the benchmarks i have been doing bun/JSC is a real memory hog. will be interesting to see if that can be resolved easily. there is apparently a --smol flag you can pass bun to get it to gc more aggressively.

@lemire (Member) commented Sep 11, 2023

@billywhizz

> the code to read the file in JS looks like this.

In our case, a whole lot of time is spent in UTF-8 processing within v8. I now realize that it is a fundamental difference between our two benchmarks.

Append the following to your benchmark:

function utf8fn() {
  const buf = readFileSync('./README.md', 'UTF-8')
  return buf.length
}

console.log(utf8fn())

run('readFileSync', utf8fn, 200000, 10)

Here is what I get...

running node benchmark
Now using node v20.6.1 (npm v9.8.1)
1819
readFileSync        1219 ms rate     164003 μs/iter       6.10 rss 56064
readFileSync        1203 ms rate     166145 μs/iter       6.02 rss 55788
readFileSync         892 ms rate     224130 μs/iter       4.46 rss 56332
readFileSync         725 ms rate     275791 μs/iter       3.63 rss 56160
readFileSync         718 ms rate     278244 μs/iter       3.59 rss 56388
readFileSync         723 ms rate     276313 μs/iter       3.62 rss 56252
readFileSync         722 ms rate     276673 μs/iter       3.61 rss 54976
readFileSync         724 ms rate     275949 μs/iter       3.62 rss 56408
readFileSync         720 ms rate     277473 μs/iter       3.60 rss 56180
readFileSync         727 ms rate     274961 μs/iter       3.64 rss 56416
UTF-8:
1780
readFileSync        1957 ms rate     102145 μs/iter       9.79 rss 96960
readFileSync        1924 ms rate     103912 μs/iter       9.62 rss 95644
readFileSync        1936 ms rate     103281 μs/iter       9.68 rss 94544
readFileSync        2207 ms rate      90613 μs/iter      11.04 rss 94696
readFileSync        1911 ms rate     104644 μs/iter       9.56 rss 94656
readFileSync        1920 ms rate     104133 μs/iter       9.60 rss 108232
readFileSync        1918 ms rate     104238 μs/iter       9.59 rss 105836
readFileSync        1912 ms rate     104572 μs/iter       9.56 rss 103984
readFileSync        1911 ms rate     104638 μs/iter       9.56 rss 102228
readFileSync        1924 ms rate     103920 μs/iter       9.62 rss 100600

running bun benchmark

1819
readFileSync         746 ms rate     267817 μs/iter       3.73 rss 110968
readFileSync         501 ms rate     398930 μs/iter       2.51 rss 112756
readFileSync         493 ms rate     405312 μs/iter       2.47 rss 112736
readFileSync         495 ms rate     403799 μs/iter       2.48 rss 112544
readFileSync         492 ms rate     405919 μs/iter       2.46 rss 112460
readFileSync         504 ms rate     396424 μs/iter       2.52 rss 112476
readFileSync         489 ms rate     408277 μs/iter       2.45 rss 112492
readFileSync         490 ms rate     407824 μs/iter       2.45 rss 112396
readFileSync         489 ms rate     408351 μs/iter       2.45 rss 112284
readFileSync         490 ms rate     407714 μs/iter       2.45 rss 110044
UTF-8:
1780
readFileSync         545 ms rate     366522 μs/iter       2.73 rss 112012
readFileSync         539 ms rate     370412 μs/iter       2.70 rss 112004
readFileSync         540 ms rate     370237 μs/iter       2.70 rss 113392
readFileSync         541 ms rate     369662 μs/iter       2.71 rss 113392
readFileSync         539 ms rate     370719 μs/iter       2.70 rss 113372
readFileSync         539 ms rate     370848 μs/iter       2.70 rss 113376
readFileSync         538 ms rate     371312 μs/iter       2.69 rss 113432
readFileSync         539 ms rate     370824 μs/iter       2.70 rss 113332
readFileSync         540 ms rate     370163 μs/iter       2.70 rss 113408
readFileSync         543 ms rate     367987 μs/iter       2.72 rss 112912

Notice how Node.js 20 does really, really poorly when loading UTF-8 data?

@billywhizz (Contributor) commented Sep 11, 2023

@lemire that makes sense - my fault for missing that in your original post. sorry for the confusion.

i actually did quite a bit of benchmarking on this last year. i'll see if i can dig out my notes. from memory, String::NewFromUtf8 is very slow compared to whatever bun/JSC is doing. i seem to remember String::NewFromTwoByte and String::NewFromOneByte are significantly faster. Maybe we can have a dig into the v8 source and see if a patch would be viable?

more info: https://twitter.com/jarredsumner/status/1496455856224972808?lang=en

this isn't going to be an easy one i think.
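
A quick userland way to see the decode gap, assuming latin1 decoding approximates V8's one-byte string creation while utf8 goes through Utf8Decoder (a rough sketch, not a rigorous benchmark):

const { readFileSync } = require('node:fs');

const buf = readFileSync('./README.md'); // raw bytes, decoded repeatedly below
for (const enc of ['latin1', 'utf8']) {
  const start = process.hrtime.bigint();
  for (let i = 0; i < 100000; i++) buf.toString(enc);
  const ns = Number(process.hrtime.bigint() - start) / 100000;
  console.log(`${enc}: ${ns.toFixed(0)} ns/iter`);
}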

@billywhizz (Contributor) commented Sep 12, 2023

@anonrig @lemire @joyeecheung

i have pushed another benchmark to my gist with a specific test for utf8 decoding. it's kinda interesting.

for small strings, bun is ~2.5x faster on my setup.

taskset --cpu-list 0 bun decode.mjs

decodesmall         1175 ms rate    5106382 ns/iter     195.83 rss 547676
decodesmall         1164 ms rate    5154639 ns/iter     194.00 rss 547796
decodesmall         1172 ms rate    5119453 ns/iter     195.33 rss 547796
decodesmall         1187 ms rate    5054759 ns/iter     197.83 rss 547796
decodesmall         1173 ms rate    5115089 ns/iter     195.50 rss 547772
taskset --cpu-list 0 ./node decode.mjs

decodesmall         2902 ms rate    2067539 ns/iter     483.67 rss 47348
decodesmall         2903 ms rate    2066827 ns/iter     483.83 rss 47348
decodesmall         2891 ms rate    2075406 ns/iter     481.83 rss 47348
decodesmall         2935 ms rate    2044293 ns/iter     489.17 rss 47348
decodesmall         2918 ms rate    2056202 ns/iter     486.33 rss 47348

for large strings, it is also ~2.5x faster

taskset --cpu-list 0 bun decode.mjs

decodebig           1255 ms rate     398406 μs/iter       2.51 rss 571452
decodebig           1305 ms rate     383141 μs/iter       2.61 rss 576904
decodebig           1320 ms rate     378787 μs/iter       2.64 rss 580012
decodebig           1394 ms rate     358680 μs/iter       2.79 rss 580536
decodebig           1260 ms rate     396825 μs/iter       2.52 rss 580152
taskset --cpu-list 0 ./node decode.mjs

decodebig           3451 ms rate     144885 μs/iter       6.90 rss 47348
decodebig           3453 ms rate     144801 μs/iter       6.91 rss 47348
decodebig           3437 ms rate     145475 μs/iter       6.87 rss 47348
decodebig           3491 ms rate     143225 μs/iter       6.98 rss 47348
decodebig           3446 ms rate     145095 μs/iter       6.89 rss 47348

but, notice the bun version uses ~10x the memory.

when i run bun with the --smol flag to force more aggressive GC, bun is ~3x slower than node for large strings. 😕

taskset --cpu-list 0 bun --smol decode.mjs

decodesmall         1297 ms rate    4626060 ns/iter     216.17 rss 68276
decodesmall         1305 ms rate    4597701 ns/iter     217.50 rss 68276
decodesmall         1291 ms rate    4647560 ns/iter     215.17 rss 68276
decodesmall         1423 ms rate    4216444 ns/iter     237.17 rss 68276
decodesmall         1297 ms rate    4626060 ns/iter     216.17 rss 68276
decodebig          10100 ms rate      49504 μs/iter      20.20 rss 68276
decodebig          10036 ms rate      49820 μs/iter      20.07 rss 68232
decodebig          10071 ms rate      49647 μs/iter      20.14 rss 68096
decodebig           9885 ms rate      50581 μs/iter      19.77 rss 68224
decodebig           9955 ms rate      50226 μs/iter      19.91 rss 68224

i think bun is doing something "tricky" here in order to win the benchmark game? i've always noticed jarred is very careful never to show memory usage in his benchmarks and from my testing (i've done a lot!) it's usually very high. i don't know much about JSC or Zig but wondering if this is some limitation/issue with JSC garbage collection for strings.

if anyone can verify/check my work that would be great - i have kinda rushed through all this, but i think it's correct. i use taskset to pin the benchmark to a single core. if you don't use taskset then bun runs faster (8μs versus 20μs for the big string) but uses a lot more cpu (~1.8 cores on my setup) - i assume doing GC on another thread.

my overall feeling with bun 1.0 is it is far away from being a 1.0 stable release and it all feels a little underhand/misleading to me. maybe i will write something up in coming days to this effect once i have verified these findings.

@lemire (Member) commented Sep 12, 2023

Linux perf tells me that Node is spending a lot of time (about half its time in some cases) in a couple of hot functions when executing readFileSync(filename, "utf-8");.

  35.63%  node     node                  [.] v8::internal::Utf8Decoder::Utf8Decoder
  13.54%  node     node                  [.] v8::internal::Utf8Decoder::Decode<unsigned short>

@lemire (Member) commented Sep 12, 2023

I have now modified the benchmark so that the README.md file is non-ASCII. You can see Node's performance go down drastically...


UTF-8 load
Raw load
cpu: Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz
runtime: node v20.6.1 (x64-linux)

benchmark      time (avg)             (min … max)       p75       p99      p995
------------------------------------------------- -----------------------------
README.md   14.16 µs/iter  (12.14 µs … 583.17 µs)  13.61 µs  18.94 µs  32.96 µs
README.md    6.83 µs/iter      (5.24 µs … 1.2 ms)   6.99 µs  15.06 µs  21.52 µs

running bun benchmark

UTF-8 load
Raw load
cpu: Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz
runtime: bun 1.0.0 (x64-linux)

benchmark      time (avg)             (min … max)       p75       p99      p995
------------------------------------------------- -----------------------------
README.md    4.75 µs/iter      (4.6 µs … 5.57 µs)   4.75 µs   5.57 µs   5.57 µs
README.md    4.13 µs/iter     (3.84 µs … 4.79 µs)   4.22 µs   4.79 µs   4.79 µs

@lemire (Member) commented Sep 12, 2023

I am getting that

  const file_content = readFileSync(filename).toString("utf-8");

is consistently faster than

  const file_content = readFileSync(filename, "utf-8");

in Node 20. (But not Bun.)

Here are the results on my mac...

$ node bench_loadfile.js
cpu: Apple M2
runtime: node v19.4.0 (arm64-darwin)

benchmark                   time (avg)             (min … max)       p75       p99      p995
-------------------------------------------------------------- -----------------------------
readFileSync UTF-8         9.8 µs/iter    (8.58 µs … 735.5 µs)   9.75 µs  13.79 µs  15.96 µs
readFileSync to string    8.55 µs/iter     (8.46 µs … 8.73 µs)   8.55 µs   8.73 µs   8.73 µs


$ node bench_loadfile.js
cpu: Apple M2
runtime: node v19.4.0 (arm64-darwin)

benchmark                   time (avg)             (min … max)       p75       p99      p995
-------------------------------------------------------------- -----------------------------
readFileSync UTF-8        9.82 µs/iter   (8.58 µs … 768.21 µs)   9.71 µs  13.83 µs  16.25 µs
readFileSync to string    8.67 µs/iter     (8.47 µs … 9.39 µs)   8.71 µs   9.39 µs   9.39 µs

Here is the code that I am benchmarking...

bench("readFileSync UTF-8", () => {
  const file_content = readFileSync(filename, "utf-8");
  return file_content.length;
});
bench("readFileSync to string", () => {
  const file_content = readFileSync(filename).toString("utf-8");
  return file_content.length;
});

@anonrig (Member, Author) commented Sep 12, 2023

@billywhizz Your benchmark does not hit the fast path. Specifically, the second parameter of readFileSync should be lowercase in https://gist.github.com/billywhizz/6592c0489e030bc52941c8b933ce2dcd#file-files-mjs-L5.
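
A minimal illustration of that point, assuming the fast path keys on the exact lowercase spelling before falling back to full encoding normalization:

const { readFileSync } = require('node:fs');

// Both calls return the same string; only the spelling of the encoding differs.
const fast = readFileSync('./README.md', 'utf8');  // lowercase: matches the fast-path check
const slow = readFileSync('./README.md', 'UTF-8'); // uppercase: normalized first, misses the fast path
console.log(fast === slow); // true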

@anonrig anonrig requested a review from Qard September 17, 2023 03:12
@RafaelGSS (Member) left a comment

I suppose the main goal of this PR is to optimize sync calls by avoiding a branch condition in the existing C++ methods, right?

Review threads on benchmark/fs/bench-accessSync.js and benchmark/fs/bench-copyFileSync.js (resolved)
@anonrig (Member, Author) commented Sep 17, 2023

> I suppose the main goal of this PR is to optimize sync calls by avoiding a branch condition in the existing C++ methods, right?

The main goal is to separate the sync/async/callback methods, open room for optimizations such as V8 Fast APIs, and return errors on the C++ side instead of in JS; as a side effect, the failure paths get faster.

I'll follow up with more pull requests in the next couple of weeks to move all fs operations to C++ due to performance concerns.
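
A toy sketch of the error-path difference, with plain JS stand-ins rather than Node's actual internals (the function names here are invented for illustration):

// Old pattern: the binding reports failure through a context object
// that JS must allocate and inspect on every call.
function bindingOpenCtx(path, ctx) {
  if (path !== '/tmp/exists') { ctx.errno = -2; ctx.syscall = 'open'; return -1; }
  return 3;
}
function handleErrorFromBinding(ctx) {
  if (ctx.errno !== undefined) throw new Error(`${ctx.syscall} failed: errno ${ctx.errno}`);
}

// New pattern: the sync binding throws directly on failure, so the
// success path allocates and checks nothing extra.
function bindingOpenThrowing(path) {
  if (path !== '/tmp/exists') throw new Error('open failed: ENOENT'); // thrown "from C++"
  return 3;
}

const ctx = {};
bindingOpenCtx('/tmp/missing', ctx);
try { handleErrorFromBinding(ctx); } catch (e) { console.log('ctx path:', e.message); }
try { bindingOpenThrowing('/tmp/missing'); } catch (e) { console.log('throw path:', e.message); }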

@anonrig anonrig added the commit-queue Add this label to land a pull request using GitHub Actions. label Sep 17, 2023
@nodejs-github-bot nodejs-github-bot removed the commit-queue Add this label to land a pull request using GitHub Actions. label Sep 17, 2023
@nodejs-github-bot nodejs-github-bot merged commit 7e12d0e into nodejs:main Sep 17, 2023
33 checks passed
@nodejs-github-bot (Collaborator)

Landed in 7e12d0e

@joyeecheung (Member) commented Sep 27, 2023

I am probably too late here. While I agree with what this PR wants to do in general, I think the particular approach it takes is problematic, and it is leading a bunch of other PRs to amplify the problems.

  1. We do not need a new lib/internal/fs/sync.js. Every change in JS land can be done in the original file. Bulk-moving code to a new file when it's not really necessary is going to make backporting harder and make git blame harder to use.
  2. This unnecessarily introduces new binding functions by copy-pasting code from the original implementation to switch to the new helper, when the actual change is quite small. We can simply throw in C++ land by introducing a new SyncCall helper that throws instead of setting a context; then you basically need only 2-3 lines of changes in the sync branch of each original implementation, instead of a new sync implementation that copies most of its code from the original one, which already handles both. The new implementations also omit PrintSyncTrace(), which is going to break --trace-sync-io. The repetition can make the code harder to maintain: when we need to change the prologue of both the sync and the async version, we now have to change it in two places and make sure they stay in sync, even though they could simply share the code path, as they did in the first place.

I think we should move the JS code back to where it was, introduce a new SyncCall helper for throwing in C++ land, and simplify what this PR already landed before we move forward with other similar PRs. Otherwise we are heading towards more maintainability and backporting problems.

@joyeecheung (Member) commented

I opened #49913 to move the implementations back and also introduced some helpers that should simplify the fs sync error migration PRs a lot.

ruyadorno pushed a commit that referenced this pull request Sep 28, 2023
PR-URL: #49593
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Robert Nagy <ronagy@icloud.com>
huozhi pushed a commit to vercel/next.js that referenced this pull request Oct 3, 2023
Using `await fs.access` has a couple of downsides. It creates unnecessary async contexts where the async scope can be removed, and it opens the possibility of race conditions such as `Time-of-Check to Time-of-Use` (TOCTOU).

It would be nice if someone could benchmark this. I'm rooting for a performance improvement.

Some updates from Node.js land:

- There is an open pull request to add V8 Fast API to `existsSync`
method - nodejs/node#49893
- Non-existing `existsSync` executions became 30% faster -
nodejs/node#49593

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
alexfernandez pushed a commit to alexfernandez/node that referenced this pull request Nov 1, 2023
PR-URL: nodejs#49593
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Robert Nagy <ronagy@icloud.com>
Labels
  • author ready: PRs that have at least one approval, no pending requests for changes, and a CI started.
  • lib / src: Issues and PRs related to general changes in the lib or src directory.
  • needs-benchmark-ci: PRs that need a benchmark CI run.
  • needs-ci: PRs that need a full CI run.
  • performance: Issues and PRs related to the performance of Node.js.
  • review wanted: PRs that need reviews.
Closes nodejs/performance#106: Performance of node:fs