
readFileSync is 40% faster than readFile for large files #41435

Closed
mmomtchev opened this issue Jan 7, 2022 · 8 comments

Labels: feature request · fs · performance · stale

Comments

@mmomtchev
Contributor

What is the problem this feature will solve?

Make readFile as fast as readFileSync.

What is the feature you are proposing to solve the problem?

Read the whole file in one operation to avoid excessive context switching (a userland sketch of the idea follows below).

What alternatives have you considered?

No response
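
A minimal sketch of the idea in userland terms (the fs.promises calls are standard; readWholeFile itself is an illustrative name, not an existing Node.js API):

// Sketch: stat once, then pull the whole file with a single read()
// instead of 512 KB chunks.
const fsp = require('node:fs/promises');

async function readWholeFile(path) {
  const fh = await fsp.open(path, 'r');
  try {
    const { size } = await fh.stat();
    const buf = Buffer.alloc(size);
    await fh.read(buf, 0, size, 0); // one read for the entire file
    return buf;
  } finally {
    await fh.close();
  }
}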

@mmomtchev mmomtchev added the feature request Issues that request new features to be added to Node.js. label Jan 7, 2022
@Mesteery Mesteery added fs Issues and PRs related to the fs subsystem / file system. performance Issues and PRs related to the performance of Node.js. labels Jan 7, 2022
@mmomtchev
Contributor Author

Since the performance loss can appear huge when reading a file in 512 KB chunks, I spent some time analyzing where it comes from.

This is for reading 32 MB files. The larger the file, the larger the loss, asymptotically approaching 40%.

async: 44.707s

 Performance counter stats for 'node test2':

         45 291,93 msec task-clock                #    1,011 CPUs utilized          
           144 976      context-switches          #    0,003 M/sec                  
             1 913      cpu-migrations            #    0,042 K/sec                  
         8 201 447      page-faults               #    0,181 M/sec                  
    91 527 619 025      cycles                    #    2,021 GHz                      (49,63%)
     8 654 142 716      stalled-cycles-frontend   #    9,46% frontend cycles idle     (49,81%)
    57 156 542 570      stalled-cycles-backend    #   62,45% backend cycles idle      (50,46%)
    35 767 220 649      instructions              #    0,39  insn per cycle         
                                                  #    1,60  stalled cycles per insn  (50,42%)
     6 229 128 421      branches                  #  137,533 M/sec                    (50,22%)
       254 991 439      branch-misses             #    4,09% of all branches          (49,55%)

      44,780408214 seconds time elapsed

       4,371191000 seconds user
      43,098050000 seconds sys

sync: 34.222s

 Performance counter stats for 'node test3':

         34 753,79 msec task-clock                #    1,014 CPUs utilized          
            14 762      context-switches          #    0,425 K/sec                  
             1 717      cpu-migrations            #    0,049 K/sec                  
         8 199 240      page-faults               #    0,236 M/sec                  
    86 338 397 933      cycles                    #    2,484 GHz                      (49,96%)
     5 263 748 859      stalled-cycles-frontend   #    6,10% frontend cycles idle     (50,06%)
    58 468 528 969      stalled-cycles-backend    #   67,72% backend cycles idle      (49,90%)
    33 312 199 600      instructions              #    0,39  insn per cycle         
                                                  #    1,76  stalled cycles per insn  (50,06%)
     5 691 816 428      branches                  #  163,775 M/sec                    (49,95%)
       158 764 983      branch-misses             #    2,79% of all branches          (50,12%)

      34,283279957 seconds time elapsed

       0,579624000 seconds user
      33,803549000 seconds sys

  • The only truly significant difference is the number of context switches: a tenfold increase
  • Most of the total time, and most of the additional time, is spent in kernel mode, setting up the reads
  • The user-mode time also increases almost tenfold (setting up buffers and async tasks), but it is not significant

So why so many context switches? Reading a 32 MB file in 512 KB blocks means there will be 64 reads. Where do all those context switches come from?

Here are the context switches of one reading cycle (17 is the file, 16 is the eventfd, i.e. the libuv watcher.fd, 13 is the epoll fd, i.e. the libuv backend_fd):

* Worker thread comes back from read with 512KB
[pid 42759] read(17, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 524288) = 524288

* Worker thread sends its eventfd message
[pid 42759] write(16, "\1\0\0\0\0\0\0\0", 8 <unfinished ...>

* Main thread wakes up in the epoll
[pid 42751] <... epoll_wait resumed>[{EPOLLIN, {u32=16, u64=16}}], 1024, 7903) = 1

* Worker thread continues after write
[pid 42759] <... write resumed>)        = 8

* Main thread starts consuming the eventfd message
[pid 42751] read(16,  <unfinished ...>

* Worker thread goes to sleep on the condition variable
[pid 42759] futex(0x44f6848, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>

* Main thread consumes the eventfd message
[pid 42751] <... read resumed>"\1\0\0\0\0\0\0\0", 1024) = 8

* Main thread sends the buffer to JS, which asks for more, so it sets up another async job
      and wakes up a worker through the condition variable
[pid 42751] futex(0x44f684c, FUTEX_WAKE_PRIVATE, 1) = 1

* Worker wakes up in the condition variable (NOTE IT IS NOT THE SAME WORKER)
[pid 42760] <... futex resumed>)        = 0

* Main thread goes to sleep on epoll
[pid 42751] epoll_wait(13,  <unfinished ...>

* The worker consumes the work item and releases the main work mutex
     (which it acquired after returning from the condition variable)
[pid 42760] futex(0x44f67e0, FUTEX_WAKE_PRIVATE, 1) = 0

* This is the cost of UV_METRICS_IDLE_TIME
     (it happens in parallel with the read so it is not as bad, but still
      it is an extra syscall at every event loop iteration)
[pid 42751] <... epoll_wait resumed>[], 1024, 0) = 0

* The new worker starts its read
[pid 42760] read(17,  <unfinished ...>

* Main thread goes to sleep again, this time for real
[pid 42751] epoll_wait(13,  <unfinished ...>

* Worker thread finishes its read
[pid 42760] <... read resumed>"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 524288) = 524288

* The cycle starts all over
[pid 42760] write(16, "\1\0\0\0\0\0\0\0", 8 <unfinished ...>

Two things to note here:

  • Is the cost of UV_METRICS_IDLE_TIME expected?
  • The thread round-robin is actually free of charge because Linux steps in and makes them all run on the same CPU; see the cpu-migrations in the perf output
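
For context, that idle-time accounting presumably exists to back event-loop utilization measurements on the JS side; a minimal example using the standard perf_hooks API:

// Event-loop utilization, which UV_METRICS_IDLE_TIME feeds.
const { performance } = require('node:perf_hooks');

setTimeout(() => {
  const elu = performance.eventLoopUtilization();
  console.log(elu); // { idle, active, utilization }
}, 1000);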

@mmomtchev
Contributor Author

I wonder, wouldn't it be possible to continue reading without waking up the event loop? After all, the event loop is not needed until the final callback.

I have the same problem in gdal-async: I want to be able to schedule a whole lot of work units and run them one after another, preferably all on the same thread, and then report the work done at the end. I can't schedule them all at once, since the worker threads would pick them up; I want to schedule each unit after the previous one finishes.

The problem is that currently one cannot enqueue work from a worker thread; only the event loop can enqueue work.

This is definitely fixable.

In the case of readFile, one would automatically schedule the next read without waking up the event loop until EOF is reached.

This requires providing an alternative interface where one can pass either a finishing callback, which has to run on the main thread, or an intermediary callback, which will run in the worker thread.
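
A runnable userland approximation of that idea, using worker_threads rather than the internal libuv pool (the file name and chunk size are assumptions): every read happens on one worker thread, and the event loop is woken exactly once, at EOF.

// All chunked reads run on a single worker thread; the main thread (and
// thus the event loop) is only signalled once, when everything is read.
// './file.bin' and the 512 KB chunk size are illustrative.
const { Worker, isMainThread, parentPort, workerData } = require('node:worker_threads');
const fs = require('node:fs');

if (isMainThread) {
  const worker = new Worker(__filename, { workerData: './file.bin' });
  worker.once('message', (total) => console.log(`read ${total} bytes`)); // single wake-up
} else {
  const fd = fs.openSync(workerData, 'r');
  const buf = Buffer.alloc(512 * 1024);
  let total = 0;
  let n;
  while ((n = fs.readSync(fd, buf, 0, buf.length, null)) > 0) total += n; // no event-loop round trips
  fs.closeSync(fd);
  parentPort.postMessage(total);
}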

@worstpractice

@mmomtchev Wonderful research! Any way to get my hands on your testing script(s)? Would be interesting to see if I can replicate your findings. 😃

@mmomtchev
Contributor Author

It's nothing but reading a large file over and over in a loop.
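
Presumably something along these lines (a reconstruction; the original test2/test3 scripts were not posted, and the file name and iteration count are guesses):

// Hypothetical reconstruction of test2/test3: read the same 32 MB file
// over and over. FILE and N are guesses, not the original values.
const fs = require('node:fs');

const FILE = './32mb.bin';
const N = 1000;

if (process.argv[2] === 'sync') {
  for (let i = 0; i < N; i++) fs.readFileSync(FILE); // test3: readFileSync
} else {
  let left = N;
  (function next() { // test2: readFile, one read at a time
    fs.readFile(FILE, (err) => {
      if (err) throw err;
      if (--left > 0) next();
    });
  })();
}

Running each variant under perf stat should reproduce counters of the shape shown above.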

@Alf71

Alf71 commented Sep 26, 2022

Hi, I have one or two questions regarding this:
In Python we use a set to read the file and accelerate the process:

mylist = [line.split()[0] for line in open('file.txt', 'r')]
myset = set(mylist)

This works well as long as we have enough memory; 1 GB of text requires around 2.5 GB of RAM.

Now let's consider the same, but using JavaScript.
Does it simply work with

const fs = require('fs');

const data = fs.readFileSync('./file.txt');

?

No matter how big the file is?
I mean, the bigger the file to be read, the slower the script when processing it in a loop.
But can we change this behaviour?
Would this work with an even bigger file? 10 GB? 100 GB?
And what is the relation with RAM?
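
For reference, fs.readFileSync cannot read a file larger than buffer.constants.MAX_LENGTH into a single Buffer, so at 10-100 GB streaming is required regardless of speed. A streaming equivalent of the Python snippet might look like this (standard readline/stream APIs; the file name is taken from the question):

// Build the set line by line without loading the whole file into memory.
const fs = require('node:fs');
const readline = require('node:readline');

async function firstFields(path) {
  const rl = readline.createInterface({ input: fs.createReadStream(path) });
  const seen = new Set();
  for await (const line of rl) {
    const field = line.split(/\s+/)[0];
    if (field) seen.add(field); // first whitespace-separated field per line
  }
  return seen;
}

firstFields('./file.txt').then((s) => console.log(s.size));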

@bnoordhuis
Member

By some coincidence I was thinking about this earlier this week. The problem with fs.readFile() is that it round-trips through libuv's thread pool several times: open file, stat fd, read data, close fd.

For small files that adds considerable overhead. It's way more efficient to add a C++ ReadFile() method that does it all in one go through uv_queue_work(). Libuv's uv_fs_*() functions can be called synchronously and without an event loop.

(If and when libuv grows io_uring support, that might turn into a deoptimization again, but that's still in the future.)
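
In userland terms, the current shape of the problem looks roughly like this (a sketch, not Node's actual internals; each await is a separate task queued on the thread pool):

// Each await below is a separate round trip through the libuv thread pool:
// open, fstat, one read per chunk, close.
const fsp = require('node:fs/promises');

async function readFileManyTrips(path, chunk = 512 * 1024) {
  const fh = await fsp.open(path, 'r');   // trip 1: open
  try {
    const { size } = await fh.stat();     // trip 2: fstat
    const buf = Buffer.alloc(size);
    let pos = 0;
    while (pos < size) {                  // trips 3..N: one per chunk
      const { bytesRead } = await fh.read(buf, pos, Math.min(chunk, size - pos), pos);
      if (bytesRead === 0) break;
      pos += bytesRead;
    }
    return buf;
  } finally {
    await fh.close();                     // final trip: close
  }
}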

@github-actions
Contributor

There has been no activity on this feature request for 5 months and it is unlikely to be implemented. It will be closed 6 months after the last non-automated comment.

For more information on how the project manages feature requests, please consult the feature request management document.

@github-actions github-actions bot added the stale label Sep 24, 2023
@github-actions
Contributor

There has been no activity on this feature request and it is being closed. If you feel closing this issue is not the right thing to do, please leave a comment.

For more information on how the project manages feature requests, please consult the feature request management document.

@github-actions github-actions bot closed this as not planned Oct 25, 2023