Slow disk IO on MacOS (potentially Windows too) #35907
Comments
It seems that stat calls are more expensive on the mac's filesystem; #35925 addresses that for some cases. But I believe many of these are filesystem differences not due to julia. We should drill down and find out. |
Curiously I just gave #35925 a go on MacOS and actually think I see a regression from master (note that this is a comparison to the built-in SystemBenchmark ref, linux 1.4.1):
Current master (nightly binary):
Comparison of the two (ref = macos with master, test = macos with #35925). Mostly seems like benchmark noise apart from
It could always be that the benchmark designs aren't capturing the issue accurately? |
For a non-julia platform-level test, I ran a 1KB The results seem more comparable.
Same, but 1MB
Also, if it's helpful, I ran #35925 on the reference linux machine, and compared it to its own 1.4.1 results.
|
Thanks for checking that. I think I found my mistake. |
Ok I've updated #35925; should be better now. But I'm a bit out of ideas; we're just doing open, read, and close. I don't know what would be several times slower. |
Ok. Tried again on macOS. With the update to #35925 (relative to 1.4.1 linux reference, table below)
This is all puzzling. The benchmark is just:
using BenchmarkTools

function tempwrite(x; delete::Bool=false, path = joinpath(@__DIR__, "testwrite.dat"))
    open(path, "w") do io
        write(io, x)
    end
    delete && rm(path)
    return path
end
function tempread(path)
    x = open(path) do io
        read(io)
    end
    return x
end
t = @benchmark tempwrite(x) setup=(x=rand(UInt8, 1000))
t = @benchmark tempwrite(x) setup=(x=rand(UInt8, 1000000))
t = @benchmark tempread(path) setup=(path = tempwrite(rand(UInt8, 1000), delete=false))
t = @benchmark tempread(path) setup=(path = tempwrite(rand(UInt8, 1000000), delete=false))
|
I vaguely remember an old issue regarding buffer sizes that we use for |
In the 1MB case, we do two read calls: one to initially fill the 32kb buffer, then we copy that to the output, and do one more read call to read the entire rest of the file into the output at once. So we're not even really using buffering here. |
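The two-read pattern described above can be sketched roughly as follows. This is only an illustration, not Base's actual implementation: `two_step_read` and its `bufsize` keyword are hypothetical names, the 32 KiB default is taken from the comment above, and the counter tracks Julia-level read calls rather than system calls. For simplicity the second read here goes through a temporary array, whereas the comment describes reading straight into the output.

```julia
# Hypothetical illustration of "fill a small buffer, copy it, then read the rest at once".
function two_step_read(path::AbstractString; bufsize::Integer = 32 * 1024)
    total = filesize(path)
    out = Vector{UInt8}(undef, total)
    nreads = 0
    open(path, "r") do io
        # First read: fill the small intermediate buffer, then copy it to the output.
        buf = Vector{UInt8}(undef, min(bufsize, total))
        n1 = readbytes!(io, buf, length(buf))
        nreads += 1
        copyto!(out, 1, buf, 1, n1)
        # Second read: fetch everything that remains in a single call.
        if n1 < total
            rest = read(io, total - n1)
            nreads += 1
            copyto!(out, n1 + 1, rest, 1, length(rest))
        end
    end
    return out, nreads
end

# Usage: write a 1 MB file and read it back.
path, io = mktemp()
data = rand(UInt8, 1_000_000)
write(io, data)
close(io)
out, nreads = two_step_read(path)
```

With only two reads for a 1 MB file, per-read buffering overhead is unlikely to be what separates the platforms, which matches the point made in the comment.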
Two benchmarking details:
Also, I thought more granularity might help. The test below shows MacOS is slower than an ubuntu VM on the same machine across all file sizes, though by a smaller factor than SystemBenchmark reported. Note that here the test file is written once per file size, not rewritten before each sample as SystemBenchmark does.
using BenchmarkTools
for zeros in 3:6
    for dig in 1:9
        bytes = parse(Int,"$(dig)$(repeat("0",zeros))")
        open("dummy.dat", "w") do io
            write(io, rand(UInt8,bytes))
        end
        # read(f, UInt8) reads a single byte, so each sample covers open, a one-byte read, and close
        t = @benchmark open(f->read(f, UInt8), "dummy.dat")
        time_s = minimum(t).time / 1e9
        MiB_s = (bytes / time_s) / (1024 * 1024)
        @info "File size: $(round(Int,bytes/1000)) KB. Read speed: $(round(MiB_s,digits=1)) MiB/s"
    end
end
|
I read somewhere that SIP on mac could perhaps slow things down. Maybe try disabling and seeing if it makes a difference for the benchmark? |
The tests above were actually with SIP disabled. I re-tried with SIP enabled and it is slower on the
|
I just checked and this issue remains on master.
UInt8 reading
using BenchmarkTools
path, _ = mktemp()
for zeros in 3:8
    bytes = 10^zeros
    open(path, "w") do io
        write(io, rand(UInt8,bytes))
    end
    t = @benchmark open(path) do io
        read(io, UInt8)
    end
    time_s = minimum(t).time / 1e9
    speed = Base.format_bytes(bytes / time_s)
    @info "File size: $(Base.format_bytes(bytes)). \tbtime (min): $(round(time_s * 1e6, digits=2))us. \tRead speed: $(speed)/s"
end
MacOS with SIP enabled
MacOS with SIP disabled
Ubuntu VM on same machine
|
Seems to be an overhead with file opening/closing
MacOS, no SIP
Ubuntu VM
|
Breaking it down into open / close:
for i in 1:5
    @time io = open(path)
    @time close(io)
    println("--")
end
MacOS, no SIP
Ubuntu VM
|
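Single `@time` samples of `open` and `close` can be noisy, so one option is to repeat the pair many times and keep the minimum of each, as the benchmarks earlier in the thread do with `minimum(t)`. The sketch below is a hedged illustration only: `time_open_close` and its `n` keyword are hypothetical names, not part of Base or SystemBenchmark.jl, and it measures wall-clock time with `time_ns()`.

```julia
# Hypothetical helper: time open and close separately over n repetitions.
function time_open_close(path::AbstractString; n::Integer = 1_000)
    # Warm-up call so compilation time is not included in the samples.
    close(open(path))
    open_ns = Vector{UInt64}(undef, n)
    close_ns = Vector{UInt64}(undef, n)
    for i in 1:n
        t0 = time_ns()
        io = open(path)
        t1 = time_ns()
        close(io)
        t2 = time_ns()
        open_ns[i] = t1 - t0
        close_ns[i] = t2 - t1
    end
    # Report the fastest sample of each, converted from nanoseconds to microseconds.
    return (open_us = minimum(open_ns) / 1e3, close_us = minimum(close_ns) / 1e3)
end

# Usage: time open/close on a small temporary file.
tmp, tmpio = mktemp()
write(tmpio, rand(UInt8, 1000))
close(tmpio)
r = time_open_close(tmp)
@info "open: $(r.open_us) us, close: $(r.close_us) us"
```

Running the same helper on both systems against a file on the same kind of storage would give per-call numbers that are directly comparable.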
I'm marking this as a 1.6 release blocker. |
Maybe I missed something but I thought this wasn't a regression. Why is it release-blocking suddenly? |
(I 👍-ed because it would be good to figure this out, but as far as I know it isn't a regression) |
My impression was that this was a regression. If it's not, we can make it non-blocking. |
If it's a regression, it may be pre 1.0.
MacOS official binaries:
1.0.4
1.1.1
1.2.0
1.3.1
1.4.2
1.5.2
1.6.0-DEV.1221
|
I took it off since I don't think it should block the release if it has been the same since 1.0. |
To rule out my VM setup doing something weird with filesystem caching, I ran this on a native ubuntu machine and the speed seems real (but the hardware is a little lower spec than my mac)
|
and the src handling from there onwards doesn't seem to treat MacOS and Linux differently, they both fall within this Line 952 in d33e7e0
And lower down, I validated that Line 910 in d33e7e0
So code-wise, it seems identical. Could it be compiler settings? I tried with both official binaries and a local build, and they perform the same. Chances of it being just a slower filesystem are increasing.. |
I made a raw c test that tries to mimic julia's Oddly, both times are proportionally slower than the timing above, but MacOS is still 3-4x slower:
#include <stdio.h>
#include <fcntl.h>
#include <time.h>
int main()
{
    struct timespec tstart={0,0}, tend={0,0};
    clock_gettime(CLOCK_MONOTONIC, &tstart);
    int fd = open("/path/to/testfile.txt", O_RDONLY | O_CLOEXEC, S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP | S_IROTH | S_IWOTH);
    clock_gettime(CLOCK_MONOTONIC, &tend);
    printf("%.0f µs\n", (((double)tend.tv_sec + 1.0e-9*tend.tv_nsec) - ((double)tstart.tv_sec + 1.0e-9*tstart.tv_nsec)) * 1.0e6);
    return 0;
}
MacOS, no SIP
Ubuntu VM
|
I posted the above example on SO to collect any thoughts: https://stackoverflow.com/questions/64656255/why-is-the-c-function-open-4x-slower-on-macos-vs-an-ubuntu-vm-on-the-same-ma/64656675#64656675
All signs are pointing to MacOS just having a slower filesystem. |
|
The slowness, and the fact that Linux VMs are faster for disk IO on a Mac than native MacOS, appears to be explained by this: golang/go#28739 (comment)
|
Community-submitted benchmark reports from SystemBenchmark.jl have shown that disk IO performance is worse on MacOS, and potentially on Windows too.
Example 1: A same-hardware test on a 2015 Macbook Pro, booted into Arch Linux and MacOS, shows mostly much worse disk IO in MacOS.
The tests are in milliseconds here (except peakflops), and the factor is test / res, so higher numbers indicate that the test (MacOS) is running slower than the ref (Arch Linux).
Example 2: A same-hardware test on a 2018 Macbook Pro with a Parallels-hosted Ubuntu 18.04 VM (Ref) vs. the native MacOS (Test)
On a general note, I was surprised how well the Ubuntu VM did here
Further, plotting the full results so far shows the Linux disk IO results clustering at the fast end, while MacOS and Windows struggle (bottom left here). There are some Linux outliers which, according to users, are systems using slower storage devices such as SD cards.
Windows needs more testing. So if anyone with a windows machine with a linux VM could run this, it would be helpful: