File IO with read_to_end and read_to_string is slower than possible #35823
Comments
I completely agree, but we can do better. We can just read directly into the string (given a bit of unsafe code).
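For context, here is a minimal sketch of the idea; this is my reconstruction, not the commenter's actual code. It sizes the buffer from `fstat` up front, reads the raw bytes into a `Vec<u8>`, and validates UTF-8 once at the end. The in-place `unsafe` approach the comment alludes to would write into the `String`'s own buffer instead, but `String::from_utf8` already avoids a copy because it takes ownership of the `Vec`:

```rust
use std::fs::File;
use std::io::{self, Read};

// Sketch only: size the buffer from the file's metadata so read_to_end
// does not have to grow it, then validate UTF-8 once at the end.
fn read_file_to_string(path: &str) -> io::Result<String> {
    let mut file = File::open(path)?;
    let size = file.metadata()?.len() as usize;
    let mut bytes = Vec::with_capacity(size + 1);
    file.read_to_end(&mut bytes)?;
    // No copy here: from_utf8 reuses the Vec's allocation.
    String::from_utf8(bytes)
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}
```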
The standard library already reads into the underlying …
This has been proposed before as well, and I've written up some concerns about this solution. It basically just boils down to the fact that if you expect …
Would it be unreasonable to ask that, if a buffer has been pre-allocated to the needed capacity, the buffer's capacity is used as a heuristic? This requires no extra syscalls. It also seems (though I may be wrong here) that this was the behavior before 240734c (PR #23847), after which the read size no longer used the underlying buffer's capacity as a baseline for how much to read.

I feel that while growing the underlying buffer's capacity without bounds isn't a good idea (at least by default), using the existing capacity still makes sense. Namely, if the buffer's capacity is not greater than the default buffer size (8KB), don't allocate more memory (this is already the case). However, if given a buffer with a greater capacity, attempt to read up to that capacity in one go. Some form of exponential (?) backing off could be implemented if the read returns less than expected.
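A rough sketch of that heuristic, to make it concrete (the function name and details are mine, not the PR's): treat whatever spare capacity the caller pre-allocated as the first read size, and fall back to the default chunked growth only once it is exhausted.

```rust
use std::io::{self, Read};

const DEFAULT_CHUNK: usize = 8 * 1024;

// Hypothetical helper illustrating the proposal: reads to EOF, issuing
// reads as large as the buffer's spare capacity allows, so a caller who
// pre-allocated via with_capacity gets one big read instead of 8KB chunks.
// (Real code would also retry on ErrorKind::Interrupted.)
fn read_to_end_with_capacity_hint<R: Read>(r: &mut R, buf: &mut Vec<u8>) -> io::Result<usize> {
    let start = buf.len();
    loop {
        // Only grow once the caller-provided capacity is used up.
        if buf.len() == buf.capacity() {
            buf.reserve(DEFAULT_CHUNK);
        }
        let old_len = buf.len();
        buf.resize(buf.capacity(), 0); // zero-fill; std avoids this with unsafe
        match r.read(&mut buf[old_len..]) {
            Ok(0) => {
                buf.truncate(old_len);
                return Ok(old_len - start);
            }
            Ok(n) => buf.truncate(old_len + n),
            Err(e) => {
                buf.truncate(old_len);
                return Err(e);
            }
        }
    }
}
```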
@Mark-Simulacrum hm, it should be the case right now that if you have a pre-allocated buffer (via …
This is the line in the source code for the … I can try to get the description I've listed above implemented, though I'm not sure I can run the tests locally with any success (in the past I've seen failures on master, so I suspect something in my environment may be wrong).
The current default implementation is at lines 345 to 374 in b30eff7.
Some types, however, opt into using …
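The function in question here is presumably `read_to_end_uninitialized`. For illustration, here is a rough sketch of that strategy; this is my reconstruction, not std's exact code. It hands the `Vec`'s uninitialized spare capacity straight to `read()` and bumps the length by however many bytes arrived, which is only sound for readers trusted never to inspect the buffer they are given — hence opt-in:

```rust
use std::io::{self, Read};

// Sketch of the opt-in "uninitialized buffer" strategy. UNSAFE: the spare
// capacity handed to read() is uninitialized, so this is only OK for
// readers (like File) that write into the buffer and never read from it.
unsafe fn read_to_end_uninit<R: Read>(r: &mut R, buf: &mut Vec<u8>) -> io::Result<usize> {
    let start = buf.len();
    loop {
        if buf.len() == buf.capacity() {
            buf.reserve(32); // the real helper grows more aggressively
        }
        let len = buf.len();
        // View the uninitialized spare capacity as a writable slice.
        let spare = std::slice::from_raw_parts_mut(
            buf.as_mut_ptr().add(len),
            buf.capacity() - len,
        );
        match r.read(spare) {
            Ok(0) => return Ok(buf.len() - start),
            Ok(n) => buf.set_len(len + n), // only count bytes actually written
            Err(ref e) if e.kind() == io::ErrorKind::Interrupted => {}
            Err(e) => return Err(e),
        }
    }
}
```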
These are the only uses of `read_to_end_uninitialized` that I could find aside from tests/benchmarks. `FileDesc` is used by `File` in src/libstd/sys/unix/fs.rs I think, but I don't understand where … Either way, it seems that the capacity-based reading either doesn't currently happen, or doesn't happen on Windows.
Yes, that's the generic implementation of …
Hmm, I'm not sure I'm doing this right, but with the code below and debugging in gdb, I'm hitting the `read_to_end` helper function, not `read_to_end_uninitialized`. So... not sure what's going on here. I'm on Unix (Ubuntu 16.04)...

```rust
let mut file = File::open(entry.path())?;
let mut file_contents = String::new();
file.read_to_string(&mut file_contents).unwrap();
```
Oh, looks like the implementation of …
I don't believe there's much more to do here, so I'm going to close. Benchmarks seemed to indicate this wasn't actually a win.
Rust's current implementation of `read_to_end` will read exponentially larger chunks into the buffer, but only up to 8192 bytes at a time. Counting by syscalls, this is slow when reading a file larger than 16KB.

NodeJS reads the entire file in one go, making four syscalls: open, fstat, read, and close.
Rust currently makes (for the same file) 58 syscalls, not counting mmap/madvise, which may be circumstantial (not related to IO). Specifically: open, ioctl, read * 37, mmap, madvise, read * 18, madvise * 14, close.
I would expect functions such as `read_to_end` and `read_to_string` to do the minimal work necessary to read the entire file.
The below diff is what it currently takes to get Rust down to five syscalls (open, fstat, read, read, close). I think it's reasonable that similar code could be included (perhaps with specialization) in the filesystem reading code.
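The diff itself is not preserved in this capture. As a hedged reconstruction of the idea (names and details are mine, and it assumes a capacity-aware `read_to_end` like the one discussed in the comments above): fstat the file, allocate the whole buffer once, and read until EOF, so the data arrives in one large read plus one short read that observes EOF.

```rust
use std::fs::File;
use std::io::{self, Read};

// Reconstruction of the approach, not the original diff: pre-size the
// buffer from fstat so the kernel is asked for everything at once.
fn read_file_fast(path: &str) -> io::Result<Vec<u8>> {
    let mut file = File::open(path)?;           // open
    let size = file.metadata()?.len() as usize; // fstat
    let mut buf = Vec::with_capacity(size + 1);
    // With a capacity-aware read_to_end this becomes: one read returning
    // `size` bytes, then one read returning 0 to confirm EOF.
    file.read_to_end(&mut buf)?;
    Ok(buf)                                     // close when `file` drops
}
```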
The difference between the two (with dropping caches):

fast version: …

current, slower version: …