-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
std::io::copy performance may be 20x slower than it could be? #49921
Comments
You mention windows. This may be platform-specific behavior. On unix systems I would fadvise |
After a coarse overview, my intuition is that there is some funny business going on with the length determined by |
It’s possible. The actual times did increase dramatically from the std::io::copy default though - my suspicion is that there is probably some caching to RAM going on or something as cold runs were generally slower than hot runs. This was for some work code that I’m no longer actively developing. I’ll leave this email marked as unread for awhile in case I have some spare time to write a benchmark. I don’t think the other flag that was mentioned is accessible with the default open file api, but in any case it at least appears there’s some low hanging fruit wrt to std::io::copy perf |
I tried benchmarking FWIW, I'm providing the benchmark code below; it can easily be expanded to different file sizes:
|
4MB isn’t a whole lot. That’s well within the size of hard drive caches or possibly an OS memory-based cache. I was noticing the difference with files several gigabytes in size. As a sanity check, maybe look to see what the actual time is, not just the % difference between timings with different buffer sizes. The actual time may be wildly different than the actual steady state performance of the disk if it’s reading the data from a cache. It may also be possible at a low level to pass a flag to encourage the OS to disable or flush the cache, but I don’t know if there’s any way to instruct the hard drive to do the same. |
I only pasted the benchmark for 4MB as a reference - I tested files from 4KB up to 16MB. I'm not an expert with low-level hard drive stuff, just hoping my attempt can help someone else. |
Ideally you would test with a file approximately twice as big as you have RAM, so if you have 8GB of RAM you would want to test with a 16GB file. |
@spease extended open flags are accessible via https://doc.rust-lang.org/std/os/windows/fs/trait.OpenOptionsExt.html |
How come? |
But we also don't want to be allocating 4MB every time we do |
possible approaches:
|
I think some kind of specialization for As of today, if your source is loop {
let buf = reader.fill_buf()?;
if buf.is_empty() { break; }
let consumed = file_out.write(buf)?;
drop(buf);
reader.consume(consumed);
} |
@main-- a single heap allocation is entirely reasonable for a single invocation of copy of a lot of data. For many copies of a small amount of data, one ends up with a lot of allocations again (consider for instance copying data out of a tar, where we have views on the data and there's no actual synchronous IO etc happening. I like some of the suggested possibilities with specialisation. |
It's a nice general optimization, but not necesarily ideal if you use a Perhaps a writer-side optimization would be more flexible. Then you can specialize either for |
That sounds rather uncommon to me though. After all, |
yes, generic code taking a write was the concern. E.g. tar-rs has a tar archive Builder which takes a |
I had same kind of issue but with different library (indicatif). The copy rate was around 70 MB/s (inside single NVMe SSD). The problem was actually with running the debug version of my program. When I built and ran the release version, the speed jumped to over 1 GB/s. |
Let io::copy reuse BufWriter buffers This optimization will allow users to implicitly set the buffer size for io::copy by wrapping the writer into a `BufWriter` if the default block size is insufficient, which should fix rust-lang#49921 Due to min_specialization limitations this approach only works with `BufWriter` but not for `BufReader<R>` since `R` is unconstrained and thus the necessary specialization on `R: Read` is not always applicable. Once specialization becomes more powerful this optimization could be extended to look at the reader and writer side and use whichever buffer is larger.
With #78641 merged you can now (on next nightly) use |
I was using std::io::copy from an SSD to memory using ProgressBarReader, which reported an eye-raising 30 MB/s. Wrapping it in a BufReader (either as the outer or middle layer) made no real difference.
So I googled around a bit and found the recommended size was 64KB according to some random Microsoft article, so I decided to copy-paste the std::io::copy implementation and increase the buffer size. This got up to 600 MB/s at about 256KB which sounded a lot more reasonable...then it continued going upwards from there past the point of believability (unless the files were being pulled from RAM as a consequence of my repeated accesses). With a 10MB Vec, I got 228.36 GB/s 🙃.
Unrealistic values aside, has anybody checked the buffer value to make sure there isn't some huge low-hanging fruit for std::io::copy speeds?
The text was updated successfully, but these errors were encountered: