Add specializations of read_to_end for Stdin and File using uninitialised buffers #26950

AlisdairO · 2015-07-10T21:41:30Z

In general, it's undesirable to have read_to_end use a buffer with uninitialized memory, as that could lead to undefined behaviour in the event of a bad Read implementation. Since we control the implementations of Read for Stdin and File, however, it should be okay for us to specialise them to improve performance. This PR is to do that!

Adds some unsafe code to deal with creating the buffers. Since the read_to_end function needed to be used from the io and fs crates, I moved it into a newly-created sys::common::io module. Alternatively we could expose the new read_to_end functions to allow people to create their own read_to_end implementations for code they trust.

Benchmarks:

Read a 2.5MB file:
sys_common::io::tests::bench_init_file ... bench: 27,473,317 ns/iter (+/- 2,490,767)
sys_common::io::tests::bench_uninit_file ... bench: 25,611,793 ns/iter (+/- 2,137,387)

Read a buffer full of constant values
sys_common::io::tests::bench_uninitialized ... bench: 12,877,645 ns/iter (+/- 931,025)
sys_common::io::tests::bench_zeroed ... bench: 18,581,082 ns/iter (+/- 1,541,108)

So, approx a 7% speedup for file reading, which I think is worthwhile.

rust-highfive · 2015-07-10T21:41:44Z

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @huonw (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. The way Github handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

AlisdairO · 2015-07-10T21:42:30Z

cc @bluss

AlisdairO · 2015-07-10T21:50:48Z

Ack, I'll tidy up the formatting issues in the morning - sorry!

bluss · 2015-07-11T12:44:44Z

We must audit this for panic safety: The vector is owned by the user. We simply assume that in case of panic, something will come along and read the vector as it is, so its length must never be set so that it exposes uninitialized memory, regardless of when or how we leave the function.

To be defensive, we should probably assume there exists a panic case somewhere in the Stdin and File code, instead of auditing it for not being present (an analysis that would go out of date anyway).

The way to do that would be (in the uninitialized case at least), to slice outside the length of the vector, and only update the vector length after the reader has written into that slice successfully.

bluss · 2015-07-11T12:49:29Z

src/libstd/sys/common/io.rs

+    unsafe {read_to_end(r, buf, true) }
+}
+
+pub unsafe fn read_to_end_uninitialized<R: Read + ?Sized>(r: &mut R, buf: &mut Vec<u8>) -> io::Result<usize> {


Can we afford to make this function non-generic and just use a &mut Read ?

Could do - is there an advantage to doing so?

libstd should be smaller if there's only one instantiation of this function. It's just a random idea, not so important.

AlisdairO · 2015-07-11T18:36:12Z

Good point on the panic safety! How do I get a slice outside the length of the vector? I'm not seeing an obvious way to do that on Vec itself, at least.

alexcrichton · 2015-07-11T20:38:49Z

src/libstd/sys/common/io.rs

+pub const DEFAULT_BUF_SIZE: usize = 64 * 1024;
+
+
+pub fn read_to_end_zeroed<R: Read + ?Sized>(r: &mut R, buf: &mut Vec<u8>) -> io::Result<usize> {


I agree with @bluss that taking &mut Read here is probably a good idea, there's no real need to instantiate these functions into downstream crates.

Unfortunately making read_to_end take a trait object doesn't currently appear to be possible: if I try I get the following error:

src/libstd/sys/common/io.rs:19:26: 19:27 error: the trait core::marker::Sized is not implemented for the type R [E0277]
src/libstd/sys/common/io.rs:19 unsafe { read_to_end(r, buf, true) }

Asked on IRC with a reduced example ( https://play.rust-lang.org/?code=use%20std%3A%3Aio%3A%3ARead%3B%0A%0Afn%20foo%28r%3A%20%26mut%20Read%29%20{%0A%0A}%0A%0Afn%20bar%3CR%3A%20Read%2B%3FSized%3E%28r%3A%20%26mut%20R%29%20{%0A%20%20%20%20foo%28r%29%3B%0A}%0A%0Afn%20main%28%29%20{%0A%20%20%20%20bar%28%26mut%20std%3A%3Aio%3A%3Astdin%28%29%29%3B%0A}&version=stable ) and nobody seemed quite sure why that wouldn't work, given that R is behind a reference.

AlisdairO · 2015-07-12T19:04:23Z

Okay, latest push addresses all comments aside from using a trait object for read_to_end, which (as far as I can see!) is not currently possible.

alexcrichton · 2015-07-12T23:04:19Z

src/libstd/io/util.rs

-        }
-        Ok(buf.len())
-    }
-}


I don't think this needs to move, you can access it through std::io instead of std::io::utils

Brainfart, apologies!

bors · 2015-07-14T17:24:23Z

☔ The latest upstream changes (presumably #27024) made this pull request unmergeable. Please resolve the merge conflicts.

AlisdairO · 2015-07-14T17:35:18Z

Thanks for the comments - I've moved read_to_end_uninitialized into its own function, with better growth handling - actually allowed me to undo a bunch of the other changes I made!

bluss · 2015-07-14T18:33:21Z

src/libstd/sys/common/io.rs

+    let mut len = start_len;
+    let mut increase =
+        if len == 0 {
+            1


The old version used size 16 for the first read, we should probably use that. The increase logic is a bit weird though, maybe we can use 16 if it's empty or less than 16 (do it outside the loop), and then just .reserve(1) in the loop. Since we're filled up, reserve(1) should double the vec's capacity (or use whatever growth strategy it prefers(?)).

I agree, and in addition to this I believe you don't need a separate len variable as it's the same as buf.len()

AlisdairO · 2015-07-15T09:38:44Z

OK, I've made the requested changes - thanks for your patience :-)

bluss · 2015-07-15T10:20:27Z

It looks beautiful to me 👍

If you time how this performs, I'd be curious on the effect on slurping that large file in the benchmark, or anything of larger size (> 100 MB).

AlisdairO · 2015-07-15T15:32:25Z

OK, I've used the 242MB file generated by the benchmark game fasta benchmark, and written a little application to read it in from stdin. Over 6 runs (averaged), the uninitialized version takes 0.37116s to read it, while the initialized version takes 0.34816s - so the initialized version is about 6.6% slower.

AlisdairO · 2015-07-15T15:40:20Z

Essentially all the remaining time is in je_huge_ralloc, read, je_huge_dalloc.

alexcrichton · 2015-07-15T16:23:44Z

src/libstd/sys/common/io.rs

+// Implementations using this method have to adhere to two guarantees:
+//  *  The implementation of read never reads the buffer provided.
+//  *  The implementation of read correctly reports how many bytes were written.
+pub unsafe fn read_to_end_uninitialized<R: Read + ?Sized>(mut r: &mut R, buf: &mut Vec<u8>)


Could you also update this to use &mut Read instead of a generic?

alexcrichton · 2015-07-15T16:28:28Z

Thanks again @AlisdairO! The only other main primitive I can think of which would benefit from this is perhaps TcpStream, so could you add an override there as well?

AlisdairO · 2015-07-15T17:04:36Z

That should do it I think!

alexcrichton · 2015-07-15T17:07:21Z

Looks good to me! r=me with a squash

allowing them to read into a buffer containing uninitialized data, rather than pay the cost of zeroing.

AlisdairO · 2015-07-15T20:32:38Z

OK, squashed. Thanks for the friendly/helpful reviews, I appreciate it!

alexcrichton · 2015-07-15T21:07:00Z

@bors: r+ 98f2872

Thanks again!

In general, it's undesirable to have read_to_end use a buffer with uninitialized memory, as that could lead to undefined behaviour in the event of a bad Read implementation. Since we control the implementations of Read for Stdin and File, however, it should be okay for us to specialise them to improve performance. This PR is to do that! Adds some unsafe code to deal with creating the buffers. Since the read_to_end function needed to be used from the io and fs crates, I moved it into a newly-created sys::common::io module. Alternatively we could expose the new read_to_end functions to allow people to create their own read_to_end implementations for code they trust. Benchmarks: Read a 2.5MB file: sys_common::io::tests::bench_init_file ... bench: 27,473,317 ns/iter (+/- 2,490,767) sys_common::io::tests::bench_uninit_file ... bench: 25,611,793 ns/iter (+/- 2,137,387) Read a buffer full of constant values sys_common::io::tests::bench_uninitialized ... bench: 12,877,645 ns/iter (+/- 931,025) sys_common::io::tests::bench_zeroed ... bench: 18,581,082 ns/iter (+/- 1,541,108) So, approx a 7% speedup for file reading, which I think is worthwhile.

rust-highfive assigned huonw Jul 10, 2015

bluss reviewed Jul 11, 2015
View reviewed changes

alexcrichton reviewed Jul 11, 2015
View reviewed changes

alexcrichton reviewed Jul 12, 2015
View reviewed changes

bluss reviewed Jul 14, 2015
View reviewed changes

alexcrichton reviewed Jul 15, 2015
View reviewed changes

Add specializations of read_to_end for Stdin, TcpStream and File,

98f2872

allowing them to read into a buffer containing uninitialized data, rather than pay the cost of zeroing.

AlisdairO force-pushed the memset branch from aa712f4 to 98f2872 Compare July 15, 2015 20:30

bors merged commit 98f2872 into rust-lang:master Jul 17, 2015

brson added the relnotes Marks issues that should be documented in the release notes of the next release. label Jul 17, 2015

Stebalien mentioned this pull request Jul 21, 2015

Safe uninitialized byte arrays. rust-lang/rfcs#1222

Closed

sfackler mentioned this pull request Nov 13, 2019

Proposing new AsyncRead / AsyncWrite traits tokio-rs/tokio#1744

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add specializations of read_to_end for Stdin and File using uninitialised buffers #26950

Add specializations of read_to_end for Stdin and File using uninitialised buffers #26950

AlisdairO commented Jul 10, 2015

rust-highfive commented Jul 10, 2015

AlisdairO commented Jul 10, 2015

AlisdairO commented Jul 10, 2015

bluss commented Jul 11, 2015

bluss Jul 11, 2015

AlisdairO Jul 11, 2015

bluss Jul 11, 2015

AlisdairO commented Jul 11, 2015

alexcrichton Jul 11, 2015

AlisdairO Jul 12, 2015

AlisdairO commented Jul 12, 2015

alexcrichton Jul 12, 2015

AlisdairO Jul 13, 2015

bors commented Jul 14, 2015

AlisdairO commented Jul 14, 2015

bluss Jul 14, 2015

alexcrichton Jul 14, 2015

AlisdairO commented Jul 15, 2015

bluss commented Jul 15, 2015

AlisdairO commented Jul 15, 2015

AlisdairO commented Jul 15, 2015

alexcrichton Jul 15, 2015

alexcrichton commented Jul 15, 2015

AlisdairO commented Jul 15, 2015

alexcrichton commented Jul 15, 2015

AlisdairO commented Jul 15, 2015

alexcrichton commented Jul 15, 2015

		pub const DEFAULT_BUF_SIZE: usize = 64 * 1024;


		pub fn read_to_end_zeroed<R: Read + ?Sized>(r: &mut R, buf: &mut Vec<u8>) -> io::Result<usize> {

Add specializations of read_to_end for Stdin and File using uninitialised buffers #26950

Add specializations of read_to_end for Stdin and File using uninitialised buffers #26950

Conversation

AlisdairO commented Jul 10, 2015

rust-highfive commented Jul 10, 2015

AlisdairO commented Jul 10, 2015

AlisdairO commented Jul 10, 2015

bluss commented Jul 11, 2015

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

AlisdairO commented Jul 11, 2015

Choose a reason for hiding this comment

Choose a reason for hiding this comment

AlisdairO commented Jul 12, 2015

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bors commented Jul 14, 2015

AlisdairO commented Jul 14, 2015

Choose a reason for hiding this comment

Choose a reason for hiding this comment

AlisdairO commented Jul 15, 2015

bluss commented Jul 15, 2015

AlisdairO commented Jul 15, 2015

AlisdairO commented Jul 15, 2015

Choose a reason for hiding this comment

alexcrichton commented Jul 15, 2015

AlisdairO commented Jul 15, 2015

alexcrichton commented Jul 15, 2015

AlisdairO commented Jul 15, 2015

alexcrichton commented Jul 15, 2015