Slow disk IO on MacOS (potentially Windows too) #35907

IanButterworth · 2020-05-15T23:14:53Z

Benchmarks reports from SystemBenchmark.jl that have been community submitted have shown that disk IO performance is impeded on MacOS and potentially Windows too.

Example 1: A same-hardware test on a 2015 Macbook Pro, booted into Arch Linux and MacOS shows mostly much worse disk IO in MacOS
The tests are in milliseconds here (except peakflops), and the factor is test / res, so higher numbers indicate that the test (MacOS) is running slower than the ref (Arch Linux)

│ Row │ cat         │ testname          │ units   │ ref_res                                   │ test_res                                  │ factor    │
│     │ String      │ String            │ String? │ Any                                       │ Any                                       │ Any       │
├─────┼─────────────┼───────────────────┼─────────┼───────────────────────────────────────────┼───────────────────────────────────────────┼───────────┤
│ 1   │ info        │ SysBenchVer       │ missing │ 0.2.0                                     │ 0.2.0                                     │ Equal     │
│ 2   │ info        │ JuliaVer          │ missing │ 1.4.1                                     │ 1.4.1                                     │ Equal     │
│ 3   │ info        │ OS                │ missing │ Linux (x86_64-pc-linux-gnu)               │ macOS (x86_64-apple-darwin18.7.0)         │ Not Equal │
│ 4   │ info        │ CPU               │ missing │ Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz │ Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz │ Equal     │
│ 5   │ info        │ WORD_SIZE         │ missing │ 64                                        │ 64                                        │ Equal     │
│ 6   │ info        │ LIBM              │ missing │ libopenlibm                               │ libopenlibm                               │ Equal     │
│ 7   │ info        │ LLVM              │ missing │ libLLVM-8.0.1 (ORCJIT, haswell)           │ libLLVM-8.0.1 (ORCJIT, haswell)           │ Equal     │
│ 8   │ info        │ GPU               │ missing │ missing                                   │ missing                                   │ Not Equal │
│ 9   │ cpu         │ FloatMul          │ missing │ 1.3759999999999998e-6                     │ 1.3879999999999999e-6                     │ 1.00872   │
│ 10  │ cpu         │ FusedMulAdd       │ missing │ 1.4e-8                                    │ 3.1e-8                                    │ 2.21429   │
│ 11  │ cpu         │ FloatSin          │ missing │ 5.1799999999999995e-6                     │ 5.191e-6                                  │ 1.00212   │
│ 12  │ cpu         │ VecMulBroad       │ missing │ 3.6742943548387095e-5                     │ 5.503947368421053e-5                      │ 1.49796   │
│ 13  │ cpu         │ CPUMatMul         │ missing │ 0.022055                                  │ 0.0228935                                 │ 1.03802   │
│ 14  │ cpu         │ MatMulBroad       │ missing │ 0.003428                                  │ 0.005100375                               │ 1.48786   │
│ 15  │ cpu         │ 3DMulBroad        │ missing │ 0.0013702                                 │ 0.0015412                                 │ 1.1248    │
│ 16  │ cpu         │ peakflops         │ missing │ 1.530257840890542e11                      │ 1.4811308237589566e11                     │ 0.967896  │
│ 17  │ cpu         │ FFMPEGH264Write   │ missing │ 151.23031                                 │ 161.079387                                │ 1.06513   │
│ 18  │ mem         │ DeepCopy          │ missing │ 0.00016894915254237288                    │ 0.00017180699481865284                    │ 1.01692   │

│ 19  │ diskio      │ DiskWrite1KB      │ missing │ 0.0297705                                 │ 0.11039                                   │ 3.70803   │ <<
│ 20  │ diskio      │ DiskWrite1MB      │ missing │ 0.871518                                  │ 0.3283265                                 │ 0.376729  │ <<
│ 21  │ diskio      │ DiskRead1KB       │ missing │ 0.00951925                                │ 0.0330435                                 │ 3.47123   │ <<
│ 22  │ diskio      │ DiskRead1MB       │ missing │ 0.1272875                                 │ 1.0485435                                 │ 8.2376    │ <<

│ 23  │ loading     │ JuliaLoad         │ missing │ 119.903458                                │ 180.141969                                │ 1.50239   │
│ 24  │ compilation │ compilecache      │ missing │ 320.3946125                               │ 344.544143                                │ 1.07537   │
│ 25  │ compilation │ create_expr_cache │ missing │ 1.053377                                  │ 7.869244                                  │ 7.47049   │

Example 2: A same hardware test on a 2018 Macbook Pro with Parallels-hosted Ubuntu 18.04 VM (Ref), vs. the native MacOS (Test)

On a general note, I was surprised how well the Ubuntu VM did here

32×6 DataFrame
│ Row │ cat         │ testname                  │ units   │ ref_res                                  │ test_res                                 │ factor    │
│     │ String      │ String                    │ String? │ Any                                      │ Any                                      │ Any       │
├─────┼─────────────┼───────────────────────────┼─────────┼──────────────────────────────────────────┼──────────────────────────────────────────┼───────────┤
│ 1   │ info        │ SysBenchVer               │ missing │ 0.3.0                                    │ 0.3.0                                    │ Equal     │
│ 2   │ info        │ JuliaVer                  │ missing │ 1.4.1                                    │ 1.4.1                                    │ Equal     │
│ 3   │ info        │ OS                        │ missing │ Linux (x86_64-pc-linux-gnu)              │ macOS (x86_64-apple-darwin18.7.0)        │ Not Equal │
│ 4   │ info        │ CPU                       │ missing │ Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz │ Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz │ Equal     │
│ 5   │ info        │ WORD_SIZE                 │ missing │ 64                                       │ 64                                       │ Equal     │
│ 6   │ info        │ LIBM                      │ missing │ libopenlibm                              │ libopenlibm                              │ Equal     │
│ 7   │ info        │ LLVM                      │ missing │ libLLVM-8.0.1 (ORCJIT, skylake)          │ libLLVM-8.0.1 (ORCJIT, skylake)          │ Equal     │
│ 8   │ info        │ GPU                       │ missing │ missing                                  │ missing                                  │ Not Equal │
│ 9   │ cpu         │ FloatMul                  │ ms      │ 1.7070000000000001e-6                    │ 1.7190000000000002e-6                    │ 1.00703   │
│ 10  │ cpu         │ FusedMulAdd               │ ms      │ 1.709e-6                                 │ 1.716e-6                                 │ 1.0041    │
│ 11  │ cpu         │ FloatSin                  │ ms      │ 5.093e-6                                 │ 4.927e-6                                 │ 0.967406  │
│ 12  │ cpu         │ VecMulBroad               │ ms      │ 3.742237903225806e-5                     │ 4.559270516717325e-5                     │ 1.21833   │
│ 13  │ cpu         │ CPUMatMul                 │ ms      │ 0.021145                                 │ 0.0367385                                │ 1.73746   │
│ 14  │ cpu         │ MatMulBroad               │ ms      │ 0.003607                                 │ 0.0189734375                             │ 5.26017   │
│ 15  │ cpu         │ 3DMulBroad                │ ms      │ 0.0014235                                │ 0.00169535                               │ 1.19097   │
│ 16  │ cpu         │ peakflops                 │ flops   │ 1.4902694486333694e11                    │ 2.6443904974091913e11                    │ 1.77444   │
│ 17  │ cpu         │ FFMPEGH264Write           │ ms      │ 121.60521                                │ 143.683998                               │ 1.18156   │
│ 18  │ mem         │ DeepCopy                  │ ms      │ 0.00019974039938556066                   │ 0.00019046474820143885                   │ 0.953561  │
│ 19  │ mem         │ Bandwidth10kB             │ MiB/s   │ 96002.55773217192                        │ 99035.61046884404                        │ 1.03159   │
│ 20  │ mem         │ Bandwidth100kB            │ MiB/s   │ 41807.6505372956                         │ 43489.184021444205                       │ 1.04022   │
│ 21  │ mem         │ Bandwidth1MB              │ MiB/s   │ 21869.752938891692                       │ 26635.227381825163                       │ 1.2179    │
│ 22  │ mem         │ Bandwidth10MB             │ MiB/s   │ 10084.215207687876                       │ 11624.277395933892                       │ 1.15272   │
│ 23  │ mem         │ Bandwidth100MB            │ MiB/s   │ 10347.835483172948                       │ 8848.556421771302                        │ 0.855112  │

│ 24  │ diskio      │ DiskWrite1KB              │ ms      │ 0.051837                                 │ 0.131                                    │ 2.52715   │ <<
│ 25  │ diskio      │ DiskWrite1MB              │ ms      │ 1.1210205                                │ 0.518321                                 │ 0.462365  │ <<
│ 26  │ diskio      │ DiskRead1KB               │ ms      │ 0.0114043                                │ 0.0964005                                │ 8.453     │ <<
│ 27  │ diskio      │ DiskRead1MB               │ ms      │ 0.166096                                 │ 1.016004                                 │ 6.11697   │ <<

│ 28  │ loading     │ JuliaLoad                 │ ms      │ 126.233121                               │ 216.478817                               │ 1.71491   │
│ 29  │ compilation │ compilecache              │ ms      │ 277.6027165                              │ 357.7933795                              │ 1.28887   │
│ 30  │ compilation │ success_create_expr_cache │ ms      │ 269.5016405                              │ 370.772621                               │ 1.37577   │
│ 31  │ compilation │ create_expr_cache         │ ms      │ 3.351107                                 │ 12.492654                                │ 3.72792   │
│ 32  │ compilation │ output-ji-substart        │ ms      │ 40.68284                                 │ 27.275530500000002                       │ 0.670443  │

Further, plotting the full results so far shows a fast clustering of linux disk io results, while MacOS & windows struggle (bottom left here). There are some Linux outliers which according to users are systems using slower storage devices such as SD cards.

Windows needs more testing. So if anyone with a windows machine with a linux VM could run this, it would be helpful:

using SystemBenchmark
res = runbenchmark()
savebenchmark("res_windows.txt", res)
... # repeat on the linux VM
comp = compare(readbenchmark("res_windows.txt),readbenchmark("res_linux.txt))
show(comp, allrows=true, allcols=true)

The text was updated successfully, but these errors were encountered:

JeffBezanson · 2020-05-18T17:05:04Z

It seems that stat calls are more expensive on the mac's filesystem; #35925 addresses that for some cases. But I believe many of these are filesystem differences not due to julia. We should drill down and find out.

IanButterworth · 2020-05-18T18:46:18Z

Curiously I just gave #35925 a go on MacOS and actually think I see a regression from master (note that this is a comparison to the built-in SystemBenchmark ref, linux 1.4.1):
macos_fasterread.txt

│ Row │ cat         │ testname                  │ units   │ ref_res                                  │ test_res                                 │ factor    │
│     │ String      │ String                    │ String? │ Any                                      │ Any                                      │ Any       │
├─────┼─────────────┼───────────────────────────┼─────────┼──────────────────────────────────────────┼──────────────────────────────────────────┼───────────┤
│ 1   │ info        │ SysBenchVer               │         │ 0.3.0                                    │ 0.3.2                                    │ Not Equal │
│ 2   │ info        │ JuliaVer                  │         │ 1.4.1                                    │ 1.6.0-DEV.60                             │ Not Equal │
│ 3   │ info        │ OS                        │         │ Linux (x86_64-pc-linux-gnu)              │ macOS (x86_64-apple-darwin19.4.0)        │ Not Equal │
...
| 24  │ diskio      │ DiskWrite1KB              │ ms      │ 0.031895                                 │ 0.123459                                 │ 3.87079   │
│ 25  │ diskio      │ DiskWrite1MB              │ ms      │ 0.920052                                 │ 0.394346                                 │ 0.428612  │
│ 26  │ diskio      │ DiskRead1KB               │ ms      │ 0.006683666666666667                     │ 0.0687755                                │ 10.2901   │
│ 27  │ diskio      │ DiskRead1MB               │ ms      │ 0.143755                                 │ 2.02386                                  │ 14.0785   │

Current master (nightly binary):
macos_master.txt

│ Row │ cat         │ testname                  │ units   │ ref_res                                  │ test_res                                 │ factor    │
│     │ String      │ String                    │ String? │ Any                                      │ Any                                      │ Any       │
├─────┼─────────────┼───────────────────────────┼─────────┼──────────────────────────────────────────┼──────────────────────────────────────────┼───────────┤
│ 1   │ info        │ SysBenchVer               │         │ 0.3.0                                    │ 0.3.2                                    │ Not Equal │
│ 2   │ info        │ JuliaVer                  │         │ 1.4.1                                    │ 1.6.0-DEV.58                             │ Not Equal │
│ 3   │ info        │ OS                        │         │ Linux (x86_64-pc-linux-gnu)              │ macOS (x86_64-apple-darwin19.4.0)        │ Not Equal │
...
│ 24  │ diskio      │ DiskWrite1KB              │ ms      │ 0.031895                                 │ 0.125512                                 │ 3.93516   │
│ 25  │ diskio      │ DiskWrite1MB              │ ms      │ 0.920052                                 │ 0.490967                                 │ 0.53363   │
│ 26  │ diskio      │ DiskRead1KB               │ ms      │ 0.006683666666666667                     │ 0.0232165                                │ 3.47362   │
│ 27  │ diskio      │ DiskRead1MB               │ ms      │ 0.143755                                 │ 0.97102                                  │ 6.75469   │

Comparison of the two (ref = macos with master, test = macos with #35925). Mostly seems like benchmark noise apart from DiskRead tests:

│ Row │ cat         │ testname                  │ units   │ ref_res                                  │ test_res                                 │ factor    │
│     │ String      │ String                    │ String? │ Any                                      │ Any                                      │ Any       │
├─────┼─────────────┼───────────────────────────┼─────────┼──────────────────────────────────────────┼──────────────────────────────────────────┼───────────┤
│ 1   │ info        │ SysBenchVer               │ missing │ 0.3.2                                    │ 0.3.2                                    │ Equal     │
│ 2   │ info        │ JuliaVer                  │ missing │ 1.6.0-DEV.58                             │ 1.6.0-DEV.60                             │ Not Equal │
│ 3   │ info        │ OS                        │ missing │ macOS (x86_64-apple-darwin19.4.0)        │ macOS (x86_64-apple-darwin19.3.0)        │ Not Equal │
│ 4   │ info        │ CPU                       │ missing │ Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz │ Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz │ Equal     │
│ 5   │ info        │ WORD_SIZE                 │ missing │ 64                                       │ 64                                       │ Equal     │
│ 6   │ info        │ LIBM                      │ missing │ libopenlibm                              │ libopenlibm                              │ Equal     │
│ 7   │ info        │ LLVM                      │ missing │ libLLVM-9.0.1 (ORCJIT, skylake)          │ libLLVM-9.0.1 (ORCJIT, skylake)          │ Equal     │
│ 8   │ info        │ GPU                       │ missing │ missing                                  │ missing                                  │ Not Equal │
│ 9   │ cpu         │ FloatMul                  │ ms      │ 1.71e-6                                  │ 1.716e-6                                 │ 1.00351   │
│ 10  │ cpu         │ FusedMulAdd               │ ms      │ 1.714e-6                                 │ 2.046e-6                                 │ 1.1937    │
│ 11  │ cpu         │ FloatSin                  │ ms      │ 5.371000000000001e-6                     │ 5.21e-6                                  │ 0.970024  │
│ 12  │ cpu         │ VecMulBroad               │ ms      │ 4.007758620689656e-5                     │ 4.53242147922999e-5                      │ 1.13091   │
│ 13  │ cpu         │ CPUMatMul                 │ ms      │ 0.045556                                 │ 0.043592                                 │ 0.956888  │
│ 14  │ cpu         │ MatMulBroad               │ ms      │ 0.0185805625                             │ 0.019206285714285714                     │ 1.03368   │
│ 15  │ cpu         │ 3DMulBroad                │ ms      │ 0.0020743000000000003                    │ 0.0017695                                │ 0.853059  │
│ 16  │ cpu         │ peakflops                 │ flops   │ 2.2755524810313144e11                    │ 2.0147029241838956e11                    │ 0.885369  │
│ 17  │ cpu         │ FFMPEGH264Write           │ ms      │ 142.476753                               │ 142.957275                               │ 1.00337   │
│ 18  │ mem         │ DeepCopy                  │ ms      │ 0.00019632103825136614                   │ 0.00020576475037821485                   │ 1.0481    │
│ 19  │ mem         │ Bandwidth10kB             │ MiB/s   │ 109367.65496137166                       │ 112044.56483827147                       │ 1.02448   │
│ 20  │ mem         │ Bandwidth100kB            │ MiB/s   │ 41919.7501717033                         │ 42626.125973550705                       │ 1.01685   │
│ 21  │ mem         │ Bandwidth1MB              │ MiB/s   │ 23320.07131449444                        │ 22672.522559167202                       │ 0.972232  │
│ 22  │ mem         │ Bandwidth10MB             │ MiB/s   │ 10291.390226598516                       │ 10298.525062970422                       │ 1.00069   │
│ 23  │ mem         │ Bandwidth100MB            │ MiB/s   │ 7989.830485750289                        │ 7947.260803726205                        │ 0.994672  │
│ 24  │ diskio      │ DiskWrite1KB              │ ms      │ 0.125512                                 │ 0.1218915                                │ 0.971154  │
│ 25  │ diskio      │ DiskWrite1MB              │ ms      │ 0.490967                                 │ 0.419084                                 │ 0.853589  │
│ 26  │ diskio      │ DiskRead1KB               │ ms      │ 0.0232165                                │ 0.0680975                                │ 2.93315   │ <<
│ 27  │ diskio      │ DiskRead1MB               │ ms      │ 0.97102                                  │ 2.022493                                 │ 2.08285   │ <<
│ 28  │ loading     │ JuliaLoad                 │ ms      │ 169.502149                               │ 161.599899                               │ 0.95338   │
│ 29  │ compilation │ compilecache              │ ms      │ 316.5481865                              │ 310.810209                               │ 0.981873  │
│ 30  │ compilation │ success_create_expr_cache │ ms      │ 314.877099                               │ 316.0820275                              │ 1.00383   │
│ 31  │ compilation │ create_expr_cache         │ ms      │ 9.0505545                                │ 8.917059                                 │ 0.98525   │
│ 32  │ compilation │ output-ji-substart        │ ms      │ 52.009300499999995                       │ 59.39701100000002                        │ 1.14205   │

It could always be that the benchmark designs aren't capturing the issue accurately?

IanButterworth · 2020-05-18T20:19:51Z

For a non-julia platform-level test, I ran a 1KB dd test with time sh -c "dd if=/dev/zero of=/tmp/testfile bs=1k count=1k && sync" on MacOS and an Ubuntu VM on the same machine.

The results seem more comparable.
MacOS x1.4 slower, compared to DiskRead1KB's x3.5-10 slower in julia

MacOS:
1048576 bytes transferred in 0.004472 secs (234,474,943 bytes/sec)
1048576 bytes transferred in 0.003677 secs (285,161,545 bytes/sec)
1048576 bytes transferred in 0.004267 secs (245,742,108 bytes/sec)

Ubuntu VM:
1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.00316243 s, 332 MB/s
1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.0033453 s, 313 MB/s
1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.00335024 s, 313 MB/s

Same, but 1MB time sh -c "dd if=/dev/zero of=/tmp/testfile bs=1000k count=1k && sync"
MacOS x1.2 slower, compared to DiskRead1MB's x6-14 slower in julia

MacOS:
1048576000 bytes transferred in 0.547662 secs (1,914,640,713 bytes/sec)
1048576000 bytes transferred in 0.579230 secs (1,810,292,756 bytes/sec)
1048576000 bytes transferred in 0.601566 secs (1,743,077,014 bytes/sec)

Ubuntu VM
1048576000 bytes (1.0 GB, 1000 MiB) copied, 0.457688 s, 2.3 GB/s
1048576000 bytes (1.0 GB, 1000 MiB) copied, 0.459572 s, 2.3 GB/s
1048576000 bytes (1.0 GB, 1000 MiB) copied, 0.459511 s, 2.3 GB/s

Also, if it's helpful, I ran #35925 on the reference linux machine, and compared it to its own 1.4.1 results. DiskRead1KB improves, DiskRead1MB looks like it's regressed.

│ Row │ cat         │ testname                  │ units   │ ref_res                                  │ test_res                                 │ factor    │
│     │ String      │ String                    │ String? │ Any                                      │ Any                                      │ Any       │
├─────┼─────────────┼───────────────────────────┼─────────┼──────────────────────────────────────────┼──────────────────────────────────────────┼───────────┤
│ 1   │ info        │ SysBenchVer               │         │ 0.3.0                                    │ 0.3.2                                    │ Not Equal │
│ 2   │ info        │ JuliaVer                  │         │ 1.4.1                                    │ 1.6.0-DEV.60                             │ Not Equal │
│ 3   │ info        │ OS                        │         │ Linux (x86_64-pc-linux-gnu)              │ Linux (x86_64-linux-gnu)                 │ Not Equal │
│ 4   │ info        │ CPU                       │         │ Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz │ Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz │ Equal     │
│ 5   │ info        │ WORD_SIZE                 │         │ 64                                       │ 64                                       │ Equal     │
│ 6   │ info        │ LIBM                      │         │ libopenlibm                              │ libopenlibm                              │ Equal     │
│ 7   │ info        │ LLVM                      │         │ libLLVM-8.0.1 (ORCJIT, skylake)          │ libLLVM-9.0.1 (ORCJIT, skylake)          │ Not Equal │
│ 8   │ info        │ GPU                       │         │ GeForce GTX 1650 with Max-Q Design       │ missing                                  │ Not Equal │
│ 9   │ cpu         │ FloatMul                  │ ms      │ 1.1339999999999999e-6                    │ 1.159e-6                                 │ 1.02205   │
│ 10  │ cpu         │ FusedMulAdd               │ ms      │ 1.1339999999999999e-6                    │ 1.159e-6                                 │ 1.02205   │
│ 11  │ cpu         │ FloatSin                  │ ms      │ 4.051e-6                                 │ 3.907e-6                                 │ 0.964453  │
│ 12  │ cpu         │ VecMulBroad               │ ms      │ 2.9823115577889445e-5                    │ 3.01367e-5                               │ 1.01051   │
│ 13  │ cpu         │ CPUMatMul                 │ ms      │ 0.018683                                 │ 0.0258885                                │ 1.38567   │
│ 14  │ cpu         │ MatMulBroad               │ ms      │ 0.0042223                                │ 0.00440328                               │ 1.04286   │
│ 15  │ cpu         │ 3DMulBroad                │ ms      │ 0.0010529                                │ 0.0011584                                │ 1.1002    │
│ 16  │ cpu         │ peakflops                 │ flops   │ 1.8061545733047305e11                    │ 1.79274e11                               │ 0.99257   │
│ 17  │ cpu         │ FFMPEGH264Write           │ ms      │ 107.751337                               │ 108.919                                  │ 1.01084   │
│ 18  │ mem         │ DeepCopy                  │ ms      │ 0.00018790257558790594                   │ 0.000181501                              │ 0.965929  │
│ 19  │ mem         │ Bandwidth10kB             │ MiB/s   │ 104871.77985698299                       │ 118339.0                                 │ 1.12841   │
│ 20  │ mem         │ Bandwidth100kB            │ MiB/s   │ 58579.50346475738                        │ 55572.2                                  │ 0.948663  │
│ 21  │ mem         │ Bandwidth1MB              │ MiB/s   │ 33439.963407070725                       │ 27033.1                                  │ 0.808408  │
│ 22  │ mem         │ Bandwidth10MB             │ MiB/s   │ 5551.476774539766                        │ 11678.0                                  │ 2.10358   │
│ 23  │ mem         │ Bandwidth100MB            │ MiB/s   │ 6910.9558841033295                       │ 5181.83                                  │ 0.749799  │
│ 24  │ diskio      │ DiskWrite1KB              │ ms      │ 0.031895                                 │ 0.033352                                 │ 1.04568   │ <<
│ 25  │ diskio      │ DiskWrite1MB              │ ms      │ 0.920052                                 │ 0.798928                                 │ 0.868351  │ <<
│ 26  │ diskio      │ DiskRead1KB               │ ms      │ 0.006683666666666667                     │ 0.00457438                               │ 0.684411  │ <<
│ 27  │ diskio      │ DiskRead1MB               │ ms      │ 0.143755                                 │ 0.411939                                 │ 2.86556   │ <<
│ 28  │ loading     │ JuliaLoad                 │ ms      │ 91.08045                                 │ 76.1643                                  │ 0.836231  │
│ 29  │ compilation │ compilecache              │ ms      │ 208.532099                               │ 210.482                                  │ 1.00935   │
│ 30  │ compilation │ success_create_expr_cache │ ms      │ 235.4855485                              │ 245.599                                  │ 1.04295   │
│ 31  │ compilation │ create_expr_cache         │ ms      │ 0.894126                                 │ 1.05864                                  │ 1.18399   │
│ 32  │ compilation │ output-ji-substart        │ ms      │ 32.639706000000004                       │ 46.8342                                  │ 1.43488   │

JeffBezanson · 2020-05-18T22:06:28Z

Thanks for checking that. I think I found my mistake.

JeffBezanson · 2020-05-18T22:24:03Z

Ok I've updated #35925; should be better now. But I'm a bit out of ideas; we're just doing open, read, and close. I don't know what would be several times slower.

IanButterworth · 2020-05-18T23:14:51Z

Ok. Tried again on macOS.

With the update to #35925 (relative to 1.4.1 linux reference, table below)

DiskRead1KB stayed at 10x slower (3-8x on 1.4.1)
DiskRead1MB down from 14x to 6x slower (6-8x on 1.4.1)

This is all puzzling..

The benchmark is just:

function tempwrite(x; delete::Bool=false, path = joinpath(@__DIR__, "testwrite.dat"))
    open(path, "w") do io
        write(io, x)
    end
    delete && rm(path)
    return path
end
function tempread(path)
    x = open(path) do io
        read(io)
    end
    return x
end
t = @benchmark tempwrite(x) setup=(x=rand(UInt8, 1000))
t = @benchmark tempwrite(x) setup=(x=rand(UInt8, 1000000))
t = @benchmark tempread(path) setup=(path = tempwrite(rand(UInt8, 1000), delete=false))
t = @benchmark tempread(path) setup=(path = tempwrite(rand(UInt8, 1000000), delete=false))

macos_fasterread2.txt

│ Row │ cat         │ testname                  │ units   │ ref_res                                  │ test_res                                 │ factor    │
│     │ String      │ String                    │ String? │ Any                                      │ Any                                      │ Any       │
├─────┼─────────────┼───────────────────────────┼─────────┼──────────────────────────────────────────┼──────────────────────────────────────────┼───────────┤
│ 1   │ info        │ SysBenchVer               │         │ 0.3.0                                    │ 0.3.2                                    │ Not Equal │
│ 2   │ info        │ JuliaVer                  │         │ 1.4.1                                    │ 1.6.0-DEV.60                             │ Not Equal │
│ 3   │ info        │ OS                        │         │ Linux (x86_64-pc-linux-gnu)              │ macOS (x86_64-apple-darwin19.3.0)        │ Not Equal │
│ 4   │ info        │ CPU                       │         │ Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz │ Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz │ Not Equal │
│ 5   │ info        │ WORD_SIZE                 │         │ 64                                       │ 64                                       │ Equal     │
│ 6   │ info        │ LIBM                      │         │ libopenlibm                              │ libopenlibm                              │ Equal     │
│ 7   │ info        │ LLVM                      │         │ libLLVM-8.0.1 (ORCJIT, skylake)          │ libLLVM-9.0.1 (ORCJIT, skylake)          │ Not Equal │
│ 8   │ info        │ GPU                       │         │ GeForce GTX 1650 with Max-Q Design       │ missing                                  │ Not Equal │
│ 9   │ cpu         │ FloatMul                  │ ms      │ 1.1339999999999999e-6                    │ 1.713e-6                                 │ 1.51058   │
│ 10  │ cpu         │ FusedMulAdd               │ ms      │ 1.1339999999999999e-6                    │ 1.711e-6                                 │ 1.50882   │
│ 11  │ cpu         │ FloatSin                  │ ms      │ 4.051e-6                                 │ 5.058e-6                                 │ 1.24858   │
│ 12  │ cpu         │ VecMulBroad               │ ms      │ 2.9823115577889445e-5                    │ 4.50622e-5                               │ 1.51098   │
│ 13  │ cpu         │ CPUMatMul                 │ ms      │ 0.018683                                 │ 0.045178                                 │ 2.41813   │
│ 14  │ cpu         │ MatMulBroad               │ ms      │ 0.0042223                                │ 0.018787                                 │ 4.44947   │
│ 15  │ cpu         │ 3DMulBroad                │ ms      │ 0.0010529                                │ 0.0018309                                │ 1.73891   │
│ 16  │ cpu         │ peakflops                 │ flops   │ 1.8061545733047305e11                    │ 2.45663e11                               │ 1.36014   │
│ 17  │ cpu         │ FFMPEGH264Write           │ ms      │ 107.751337                               │ 143.785                                  │ 1.33442   │
│ 18  │ mem         │ DeepCopy                  │ ms      │ 0.00018790257558790594                   │ 0.00019617                               │ 1.044     │
│ 19  │ mem         │ Bandwidth10kB             │ MiB/s   │ 104871.77985698299                       │ 109491.0                                 │ 1.04404   │
│ 20  │ mem         │ Bandwidth100kB            │ MiB/s   │ 58579.50346475738                        │ 42673.8                                  │ 0.728477  │
│ 21  │ mem         │ Bandwidth1MB              │ MiB/s   │ 33439.963407070725                       │ 22815.2                                  │ 0.682273  │
│ 22  │ mem         │ Bandwidth10MB             │ MiB/s   │ 5551.476774539766                        │ 10334.9                                  │ 1.86165   │
│ 23  │ mem         │ Bandwidth100MB            │ MiB/s   │ 6910.9558841033295                       │ 8071.05                                  │ 1.16786   │
│ 24  │ diskio      │ DiskWrite1KB              │ ms      │ 0.031895                                 │ 0.124188                                 │ 3.89367   │ <<
│ 25  │ diskio      │ DiskWrite1MB              │ ms      │ 0.920052                                 │ 0.408841                                 │ 0.444367  │ <<
│ 26  │ diskio      │ DiskRead1KB               │ ms      │ 0.006683666666666667                     │ 0.0686085                                │ 10.2651   │ <<
│ 27  │ diskio      │ DiskRead1MB               │ ms      │ 0.143755                                 │ 0.862652                                 │ 6.00085   │ <<
│ 28  │ loading     │ JuliaLoad                 │ ms      │ 91.08045                                 │ 156.032                                  │ 1.71313   │
│ 29  │ compilation │ compilecache              │ ms      │ 208.532099                               │ 311.432                                  │ 1.49345   │
│ 30  │ compilation │ success_create_expr_cache │ ms      │ 235.4855485                              │ 310.809                                  │ 1.31986   │
│ 31  │ compilation │ create_expr_cache         │ ms      │ 0.894126                                 │ 8.86578                                  │ 9.91558   │
│ 32  │ compilation │ output-ji-substart        │ ms      │ 32.639706000000004                       │ 55.4831                                  │ 1.69987   │

ViralBShah · 2020-05-23T17:54:05Z

I vaguely remember an old issue regarding buffer sizes that we use for read, and that larger buffer sizes helped make things faster. Is that somehow relevant here? I can't find the issue. Of course, that may not help with 1KB read tests.

JeffBezanson · 2020-05-23T19:02:25Z

In the 1MB case, we do two read calls: one to initially fill the 32kb buffer, then we copy that to the output, and do one more read call to read the entire rest of the file into the output at once. So we're not even really using buffering here.

IanButterworth · 2020-05-23T20:51:51Z

Two benchmarking details:

I realised I was using read(f) instead of read(f, UInt8). Both are presented below, read(f, UInt8) is faster, as to be expected I guess..
In SystemBenchmark, the setup phase of the benchmark is currently re-writing the test file immediately before each sample test. Perhaps MacOS is slower to release the file after closing, and the time includes waiting? Is that plausible?

Also, I thought more granularity might help. The test below shows MacOS is slower than an ubuntu VM on the same machine, across all file sizes. Though the degree slower is less than SystemBenchmark reported. Note that here, the test file written once per file size, not rewritten before each sample, like SystemBenchmark does.

using  BenchmarkTools
for zeros in 3:6
    for dig in 1:9
        bytes = parse(Int,"$(dig)$(repeat("0",zeros))")
        open("dummy.dat", "w") do io
           write(io, rand(UInt8,bytes))
        end;
        t = @benchmark open(f->read(f, UInt8), "dummy.dat")
        time_s = minimum(t).time / 1e9
        MiB_s = (bytes / time_s)  / (1024 * 1024)
        @info "File size: $(round(Int,bytes/1000)) KB. Read speed: $(round(MiB_s,digits=1)) MiB/s"
    end
end

read(f,UInt8) speed in MiB/s:

| File Size | MacOS    | Ubuntu VM on Mac | x faster |
|-----------|----------|------------------|----------|
| 1KB       |     99.9 |              349 |      3.5 |
| 2KB       |    201.4 |            690.3 |      3.4 |
| 3KB       |    255.9 |             1016 |      4.0 |
| 4KB       |    396.2 |           1306.5 |      3.3 |
| 5KB       |    483.7 |           1598.1 |      3.3 |
| 6KB       |    563.4 |           1876.9 |      3.3 |
| 7KB       |    670.1 |             2112 |      3.2 |
| 8KB       |    775.2 |           2375.2 |      3.1 |
| 9KB       |    855.8 |           2536.7 |      3.0 |
| 10KB      |      943 |           2476.9 |      2.6 |
| 20KB      |     1816 |           4289.9 |      2.4 |
| 30KB      |   2536.4 |           6190.9 |      2.4 |
| 40KB      |   3302.8 |           7591.9 |      2.3 |
| 50KB      |   3981.6 |           8316.7 |      2.1 |
| 60KB      |   4704.5 |           9331.8 |      2.0 |
| 70KB      |   5091.7 |           9511.3 |      1.9 |
| 80KB      |   5723.5 |          10168.5 |      1.8 |
| 90KB      |   6220.1 |          10744.1 |      1.7 |
| 100KB     |   6687.3 |          10977.3 |      1.6 |
| 200KB     |  11889.7 |          17958.3 |      1.5 |
| 300KB     |  17490.1 |          26661.3 |      1.5 |
| 400KB     |  24143.7 |          35190.9 |      1.5 |
| 500KB     |  29709.5 |          37537.4 |      1.3 |
| 600KB     |    35424 |          52864.4 |      1.5 |
| 700KB     |  42585.6 |          61875.2 |      1.5 |
| 800KB     |  48342.4 |          70388.4 |      1.5 |
| 900KB     |  55228.5 |          78801.6 |      1.4 |
| 1000KB    |  61317.7 |            87872 |      1.4 |
| 2000KB    | 118058.2 |         175727.7 |      1.5 |
| 3000KB    |   182254 |           263932 |      1.4 |
| 4000KB    | 235809.9 |         351617.4 |      1.5 |
| 5000KB    | 299370.4 |         441679.5 |      1.5 |
| 6000KB    | 363604.6 |         544231.1 |      1.5 |
| 7000KB    | 425774.6 |           615047 |      1.4 |
| 8000KB    | 483945.1 |         715300.4 |      1.5 |
| 9000KB    | 524477.2 |         765320.5 |      1.5 |

read(f) speed in MiB/s:

| File Size | MacOS  | Ubuntu VM on Mac | x faster |
|-----------|--------|------------------|----------|
| 1KB       |   78.7 |            199.3 |      2.5 |
| 2KB       |  157.3 |            380.7 |      2.4 |
| 3KB       |  230.7 |              538 |      2.3 |
| 4KB       |  303.7 |            703.8 |      2.3 |
| 5KB       |  373.2 |            850.6 |      2.3 |
| 6KB       |  438.9 |            996.1 |      2.3 |
| 7KB       |  509.1 |             1103 |      2.2 |
| 8KB       |  577.4 |           1180.4 |      2.0 |
| 9KB       |  641.1 |           1336.9 |      2.1 |
| 10KB      |  691.8 |           1303.9 |      1.9 |
| 20KB      | 1298.9 |           2137.1 |      1.6 |
| 30KB      | 1796.9 |           2728.4 |      1.5 |
| 40KB      | 1683.1 |           3347.7 |      2.0 |
| 50KB      | 1954.8 |           3910.1 |      2.0 |
| 60KB      |   2136 |           4479.1 |      2.1 |
| 70KB      |   2394 |           4605.9 |      1.9 |
| 80KB      | 2717.7 |           4203.1 |      1.5 |
| 90KB      | 2834.8 |           5056.6 |      1.8 |
| 100KB     | 3035.1 |           5247.8 |      1.7 |
| 200KB     |   4888 |           9630.2 |      2.0 |
| 300KB     | 5834.2 |           9311.1 |      1.6 |
| 400KB     | 6435.8 |          10906.6 |      1.7 |
| 500KB     | 6765.2 |          11350.6 |      1.7 |
| 600KB     | 7275.9 |          11575.1 |      1.6 |
| 700KB     | 7468.4 |          12004.5 |      1.6 |
| 800KB     | 7607.6 |          10325.9 |      1.4 |
| 900KB     | 7643.6 |          10249.3 |      1.3 |
| 1000KB    | 7845.5 |          10212.3 |      1.3 |
| 2000KB    | 7868.7 |           9680.5 |      1.2 |
| 3000KB    | 7596.8 |          10611.7 |      1.4 |
| 4000KB    | 6981.3 |           8572.7 |      1.2 |
| 5000KB    | 6477.1 |           8159.5 |      1.3 |
| 6000KB    | 6155.5 |           7901.6 |      1.3 |
| 7000KB    | 5839.3 |           7129.2 |      1.2 |
| 8000KB    | 5853.2 |           6915.7 |      1.2 |
| 9000KB    | 5716.1 |             6730 |      1.2 |

ViralBShah · 2020-05-26T01:06:55Z

I read somewhere that SIP on mac could perhaps slow things down. Maybe try disabling and seeing if it makes a difference for the benchmark?

https://apple.stackexchange.com/questions/208478/how-do-i-disable-system-integrity-protection-sip-aka-rootless-on-macos-os-x

IanButterworth · 2020-05-26T06:03:43Z

The tests above were actually with SIP disabled. I re-tried with SIP enabled and it is slower on the read(f,UInt8) tests.

read(f,UInt8) on MacOS with SIP enabled:

[ Info: File size: 1 KB. Read speed: 74.2 MiB/s
[ Info: File size: 2 KB. Read speed: 148.7 MiB/s
[ Info: File size: 3 KB. Read speed: 190.5 MiB/s
[ Info: File size: 4 KB. Read speed: 251.6 MiB/s
[ Info: File size: 5 KB. Read speed: 360.8 MiB/s
[ Info: File size: 6 KB. Read speed: 445.4 MiB/s
[ Info: File size: 7 KB. Read speed: 515.1 MiB/s
[ Info: File size: 8 KB. Read speed: 560.5 MiB/s
[ Info: File size: 9 KB. Read speed: 625.7 MiB/s
[ Info: File size: 10 KB. Read speed: 696.7 MiB/s
[ Info: File size: 20 KB. Read speed: 1373.9 MiB/s
[ Info: File size: 30 KB. Read speed: 1958.0 MiB/s
[ Info: File size: 40 KB. Read speed: 2526.3 MiB/s
[ Info: File size: 50 KB. Read speed: 3106.6 MiB/s
[ Info: File size: 60 KB. Read speed: 3648.8 MiB/s
[ Info: File size: 70 KB. Read speed: 4115.5 MiB/s
[ Info: File size: 80 KB. Read speed: 4517.4 MiB/s
[ Info: File size: 90 KB. Read speed: 5033.2 MiB/s
[ Info: File size: 100 KB. Read speed: 5379.2 MiB/s
[ Info: File size: 200 KB. Read speed: 9803.9 MiB/s
[ Info: File size: 300 KB. Read speed: 14824.7 MiB/s
[ Info: File size: 400 KB. Read speed: 19496.6 MiB/s
[ Info: File size: 500 KB. Read speed: 24720.7 MiB/s
[ Info: File size: 600 KB. Read speed: 24793.3 MiB/s
[ Info: File size: 700 KB. Read speed: 34898.4 MiB/s
[ Info: File size: 800 KB. Read speed: 41202.1 MiB/s
[ Info: File size: 900 KB. Read speed: 46207.6 MiB/s
[ Info: File size: 1000 KB. Read speed: 49461.9 MiB/s
[ Info: File size: 2000 KB. Read speed: 101741.5 MiB/s
[ Info: File size: 3000 KB. Read speed: 147894.7 MiB/s
[ Info: File size: 4000 KB. Read speed: 202371.2 MiB/s
[ Info: File size: 5000 KB. Read speed: 249183.3 MiB/s
[ Info: File size: 6000 KB. Read speed: 306335.8 MiB/s
[ Info: File size: 7000 KB. Read speed: 348874.8 MiB/s
[ Info: File size: 8000 KB. Read speed: 395305.4 MiB/s
[ Info: File size: 9000 KB. Read speed: 454202.7 MiB/s

IanButterworth · 2020-11-02T03:43:51Z

I just checked and this issue remains on master

UInt8 reading

using BenchmarkTools
path, _ = mktemp()
for zeros in 3:8
    bytes = 10^zeros
    open(path, "w") do io
        write(io, rand(UInt8,bytes))
    end
    t = @benchmark open(path) do io
        read(io, UInt8)
    end
    time_s = minimum(t).time / 1e9
    speed = Base.format_bytes(bytes / time_s)
    @info "File size: $(Base.format_bytes(bytes)). \tbtime (min): $(round(time_s * 1e6, digits=2))us. \tRead speed: $(speed)/s"
end

MacOS with SIP enabled

[ Info: File size: 1000 bytes. 	btime (min): 15.27us. 	Read speed: 62.458 MiB/s
[ Info: File size: 9.766 KiB. 	btime (min): 16.06us. 	Read speed: 593.968 MiB/s
[ Info: File size: 97.656 KiB. 	btime (min): 17.05us. 	Read speed: 5.463 GiB/s
[ Info: File size: 976.562 KiB. btime (min): 17.07us. 	Read speed: 54.559 GiB/s
[ Info: File size: 9.537 MiB. 	btime (min): 16.99us. 	Read speed: 548.224 GiB/s
[ Info: File size: 95.367 MiB. 	btime (min): 17.04us. 	Read speed: 5.336 TiB/s

MacOS with SIP disabled

[ Info: File size: 1000 bytes. 	btime (min): 12.91us. 	Read speed: 73.865 MiB/s
[ Info: File size: 9.766 KiB. 	btime (min): 13.51us. 	Read speed: 705.694 MiB/s
[ Info: File size: 97.656 KiB. 	btime (min): 15.49us. 	Read speed: 6.011 GiB/s
[ Info: File size: 976.562 KiB. btime (min): 14.71us. 	Read speed: 63.304 GiB/s
[ Info: File size: 9.537 MiB. 	btime (min): 14.24us. 	Read speed: 653.835 GiB/s
[ Info: File size: 95.367 MiB. 	btime (min): 14.91us. 	Read speed: 6.099 TiB/s

Ubuntu VM on same machine

[ Info: File size: 1000 bytes. 	btime (min): 3.31us. 	Read speed: 288.391 MiB/s
[ Info: File size: 9.766 KiB. 	btime (min): 3.89us. 	Read speed: 2.395 GiB/s
[ Info: File size: 97.656 KiB. 	btime (min): 5.16us. 	Read speed: 18.033 GiB/s
[ Info: File size: 976.562 KiB. btime (min): 5.19us. 	Read speed: 179.359 GiB/s
[ Info: File size: 9.537 MiB. 	btime (min): 5.21us. 	Read speed: 1.744 TiB/s
[ Info: File size: 95.367 MiB. 	btime (min): 5.12us. 	Read speed: 17.781 TiB/s

IanButterworth · 2020-11-02T15:06:19Z

Seems to be an overhead with file opening/closing
If you just do the file open/close:

t = @benchmark open(path) do io
end

MacOS, no SIP

[ Info: File size: 1000 bytes. 	btime (min): 11.98us.
[ Info: File size: 9.766 KiB. 	btime (min): 12.32us.
[ Info: File size: 97.656 KiB. 	btime (min): 11.95us.
[ Info: File size: 976.562 KiB. btime (min): 11.71us.
[ Info: File size: 9.537 MiB. 	btime (min): 11.51us.
[ Info: File size: 95.367 MiB. 	btime (min): 11.73us.

Ubuntu VM

[ Info: File size: 1000 bytes. 	btime (min): 2.53us.
[ Info: File size: 9.766 KiB. 	btime (min): 2.42us.
[ Info: File size: 97.656 KiB. 	btime (min): 2.41us.
[ Info: File size: 976.562 KiB. btime (min): 2.47us.
[ Info: File size: 9.537 MiB. 	btime (min): 2.47us.
[ Info: File size: 95.367 MiB. 	btime (min): 2.44us.

IanButterworth · 2020-11-02T15:16:19Z

Breaking it down into open / close

for i in 1:5
    @time io = open(path)
    @time close(io)
    println("--")
end

MacOS, no SIP

  0.001304 seconds (29 allocations: 2.141 KiB, 95.06% compilation time)
  0.000011 seconds
--
  0.000051 seconds (14 allocations: 1008 bytes)
  0.000005 seconds
--
  0.000030 seconds (14 allocations: 1008 bytes)
  0.000005 seconds
--
  0.000030 seconds (14 allocations: 1008 bytes)
  0.000005 seconds
--
  0.000025 seconds (14 allocations: 1008 bytes)
  0.000006 seconds
--

Ubuntu VM

  0.001820 seconds (27 allocations: 1.891 KiB, 98.12% compilation time)
  0.000006 seconds
--
  0.000007 seconds (12 allocations: 752 bytes)
  0.000002 seconds
--
  0.000005 seconds (12 allocations: 752 bytes)
  0.000002 seconds
--
  0.000005 seconds (12 allocations: 752 bytes)
  0.000001 seconds
--
  0.000004 seconds (12 allocations: 752 bytes)
  0.000002 seconds
--

StefanKarpinski · 2020-11-02T15:58:42Z

I'm marking this as a 1.6 release blocker.

KristofferC · 2020-11-02T16:02:35Z

Maybe I missed something but I thought this wasn't a regression. Why is it release-blocking suddenly?

IanButterworth · 2020-11-02T16:04:48Z

(I 👍-ed because it would be good to figure this out, but as far as I know it isn't a regression)

StefanKarpinski · 2020-11-02T16:22:19Z

My impression was that this was a regression. If it's not, we can make it non-blocking.

IanButterworth · 2020-11-02T16:29:45Z

If it's a regression, it may be pre 1.0.

MacOS official binaries:

1.0.4

  0.000027 seconds (4 allocations: 480 bytes)
  0.000005 seconds
--
  0.000020 seconds (4 allocations: 480 bytes)
  0.000005 seconds
--
  0.000028 seconds (4 allocations: 480 bytes)
  0.000006 seconds

1.1.1

  0.000025 seconds (12 allocations: 928 bytes)
  0.000006 seconds
--
  0.000021 seconds (12 allocations: 928 bytes)
  0.000004 seconds
--
  0.000019 seconds (12 allocations: 928 bytes)
  0.000004 seconds

1.2.0

  0.000032 seconds (11 allocations: 896 bytes)
  0.000006 seconds
--
  0.000026 seconds (11 allocations: 896 bytes)
  0.000005 seconds
--
  0.000030 seconds (11 allocations: 896 bytes)
  0.000003 seconds

1.3.1

  0.000025 seconds (16 allocations: 1024 bytes)
  0.000004 seconds
--
  0.000027 seconds (16 allocations: 1024 bytes)
  0.000004 seconds
--
  0.000023 seconds (16 allocations: 1024 bytes)
  0.000003 seconds

1.4.2

  0.000020 seconds (15 allocations: 1008 bytes)
  0.000006 seconds
--
  0.000032 seconds (15 allocations: 1008 bytes)
  0.000005 seconds
--
  0.000026 seconds (15 allocations: 1008 bytes)
  0.000005 seconds

1.5.2

  0.000024 seconds (14 allocations: 1008 bytes)
  0.000004 seconds
--
  0.000023 seconds (14 allocations: 1008 bytes)
  0.000002 seconds
--
  0.000019 seconds (14 allocations: 1008 bytes)
  0.000002 seconds

1.6.0-DEV.1221

  0.000019 seconds (14 allocations: 1008 bytes)
  0.000003 seconds
--
  0.000019 seconds (14 allocations: 1008 bytes)
  0.000003 seconds
--
  0.000029 seconds (14 allocations: 1008 bytes)
  0.000003 seconds

KristofferC · 2020-11-02T16:31:20Z

I took it off since I don't think it should block the release if it has been the same since 1.0.

IanButterworth · 2020-11-02T19:18:42Z

To rule out my VM setup doing something weird with filesystem caching, I ran this on a native ubuntu machine and the speed seems real (but the hardware is a little lower spec than my mac)

  0.000008 seconds (12 allocations: 752 bytes)
  0.000002 seconds
--
  0.000008 seconds (12 allocations: 752 bytes)
  0.000002 seconds
--
  0.000007 seconds (12 allocations: 752 bytes)
  0.000002 seconds

IanButterworth · 2020-11-02T20:27:41Z

@code_native is identical on the two

julia> @code_native open(path)
	.text
; ┌ @ iostream.jl:275 within `open'
	pushq	%rax
	movq	%rsi, (%rsp)
	movq	(%rsi), %rsi
; └
; ┌ @ iostream.jl:282 within `open'
	movabsq	$"#open#577", %rax
	movl	$1, %edi
	callq	*%rax
	popq	%rcx
	retq
; └

and the src handling from there onwards doesn't seem to treat MacOS and Linux differently, they both fall within this else

julia/src/support/ios.c

Line 952 in d33e7e0

#else

And lower down, I validated that O_CLOEXEC is defined and supported in MacOS for me

julia/src/support/ios.c

Line 910 in d33e7e0

#ifdef O_CLOEXEC

So code-wise, it seems identical. Could it be compiler settings? I tried with both official binaries and a local build, and they perform the same.

Chances of it being just a slower filesystem are increasing..

IanButterworth · 2020-11-03T02:46:28Z

I made a raw c test that tries to mimic julia's open_cloexec for opening an existent file as read only.

Oddly, both times are proportionally slower than the timing above, but MacOS is still 3-4x slower

#include <stdio.h>
#include <fcntl.h>
#include <time.h>
int main()
{
    struct timespec tstart={0,0}, tend={0,0};
    clock_gettime(CLOCK_MONOTONIC, &tstart);
    int fd = open("/path/to/testfile.txt", O_RDONLY | O_CLOEXEC, S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP | S_IROTH | S_IWOTH);
    clock_gettime(CLOCK_MONOTONIC, &tend);
    printf("%.0f µs\n", (((double)tend.tv_sec + 1.0e-9*tend.tv_nsec) - ((double)tstart.tv_sec + 1.0e-9*tstart.tv_nsec)) * 1.0e6);

   return 0;
}

MacOS, no SIP

51 µs
46 µs
49 µs
30 µs
46 µs

Ubuntu VM

12 µs
12 µs
12 µs
13 µs
13 µs

IanButterworth · 2020-11-05T20:54:18Z

I posted the above example on SO to collect any thoughts https://stackoverflow.com/questions/64656255/why-is-the-c-function-open-4x-slower-on-macos-vs-an-ubuntu-vm-on-the-same-ma/64656675#64656675

All signs are pointing to MacOS just having a slower filesystem

StefanKarpinski · 2020-11-19T21:33:07Z

libc does a lot of locking on I/O so you either need to opt out of that (there's some #define) or use one of the non-locking lower level I/O APIs (the ones that don't start with f and not the *printf family).

IanButterworth · 2022-02-17T14:10:21Z

The slowness and Linux VMs being faster for disk IO on a Mac than native MacOS appears to be explained by this golang/go#28739 (comment)

Just FWIW, this is a problem with Apple NVMe drives (T2 and M1 Macs alike). Their cache flush performance is abysmal. This affects both macOS and (native) Linux (presumably existing VM solutions fail to forward this as F_FULLSYNC if they run faster). You get about 46 IOPS of F_FULLSYNC on macOS on an M1 machine, and about the same on native Linux with fsync().

JeffBezanson added the performance Must go faster label May 18, 2020

ViralBShah added system:mac Affects only macOS system:windows Affects only Windows io Involving the I/O subsystem: libuv, read, write, etc. labels May 23, 2020

vtjnash mentioned this issue Jul 13, 2020

Disallow reading directories as files #36605

Open

IanButterworth mentioned this issue Aug 5, 2020

Some package load times affected drastically by current working directory #36911

Closed

StefanKarpinski added this to the 1.6 features milestone Nov 2, 2020

KristofferC removed this from the 1.6 features milestone Nov 2, 2020

IanButterworth mentioned this issue May 27, 2021

Remove all the rule overload generation stuff JuliaDiff/ChainRulesCore.jl#356

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Slow disk IO on MacOS (potentially Windows too) #35907

Slow disk IO on MacOS (potentially Windows too) #35907

IanButterworth commented May 15, 2020

JeffBezanson commented May 18, 2020

IanButterworth commented May 18, 2020 •

edited

Loading

IanButterworth commented May 18, 2020

JeffBezanson commented May 18, 2020

JeffBezanson commented May 18, 2020

IanButterworth commented May 18, 2020

ViralBShah commented May 23, 2020 •

edited

Loading

JeffBezanson commented May 23, 2020

IanButterworth commented May 23, 2020 •

edited

Loading

ViralBShah commented May 26, 2020

IanButterworth commented May 26, 2020

IanButterworth commented Nov 2, 2020 •

edited

Loading

IanButterworth commented Nov 2, 2020

IanButterworth commented Nov 2, 2020

StefanKarpinski commented Nov 2, 2020

KristofferC commented Nov 2, 2020

IanButterworth commented Nov 2, 2020

StefanKarpinski commented Nov 2, 2020

IanButterworth commented Nov 2, 2020

KristofferC commented Nov 2, 2020

IanButterworth commented Nov 2, 2020 •

edited

Loading

IanButterworth commented Nov 2, 2020

IanButterworth commented Nov 3, 2020 •

edited

Loading

IanButterworth commented Nov 5, 2020

StefanKarpinski commented Nov 19, 2020

IanButterworth commented Feb 17, 2022 •

edited

Loading

Slow disk IO on MacOS (potentially Windows too) #35907

Slow disk IO on MacOS (potentially Windows too) #35907

Comments

IanButterworth commented May 15, 2020

JeffBezanson commented May 18, 2020

IanButterworth commented May 18, 2020 • edited Loading

IanButterworth commented May 18, 2020

JeffBezanson commented May 18, 2020

JeffBezanson commented May 18, 2020

IanButterworth commented May 18, 2020

ViralBShah commented May 23, 2020 • edited Loading

JeffBezanson commented May 23, 2020

IanButterworth commented May 23, 2020 • edited Loading

ViralBShah commented May 26, 2020

IanButterworth commented May 26, 2020

IanButterworth commented Nov 2, 2020 • edited Loading

UInt8 reading

IanButterworth commented Nov 2, 2020

IanButterworth commented Nov 2, 2020

StefanKarpinski commented Nov 2, 2020

KristofferC commented Nov 2, 2020

IanButterworth commented Nov 2, 2020

StefanKarpinski commented Nov 2, 2020

IanButterworth commented Nov 2, 2020

KristofferC commented Nov 2, 2020

IanButterworth commented Nov 2, 2020 • edited Loading

IanButterworth commented Nov 2, 2020

IanButterworth commented Nov 3, 2020 • edited Loading

IanButterworth commented Nov 5, 2020

StefanKarpinski commented Nov 19, 2020

IanButterworth commented Feb 17, 2022 • edited Loading

IanButterworth commented May 18, 2020 •

edited

Loading

ViralBShah commented May 23, 2020 •

edited

Loading

IanButterworth commented May 23, 2020 •

edited

Loading

IanButterworth commented Nov 2, 2020 •

edited

Loading

IanButterworth commented Nov 2, 2020 •

edited

Loading

IanButterworth commented Nov 3, 2020 •

edited

Loading

IanButterworth commented Feb 17, 2022 •

edited

Loading