Auto merge of #23820 - sfackler:fast_read_to_end, r=alexcrichton
with_end_to_cap is enormously expensive now that it's initializing memory, since it involves a 64k allocation + memset on every call. This is most noticeable when calling read_to_end on very small readers, where the new version is **4 orders of magnitude** faster.

BufReader also depended on with_end_to_cap, so I've rewritten it in its original form.

As a bonus, converted the buffered IO struct Debug impls to use the debug builders.

I first came across this in sfackler/rust-postgres#106, where a user reported a 10x performance regression. A call to read_to_end turned out to be the culprit: sfackler/rust-postgres@9cd413d.

The new version differs from the old in a couple of ways. The buffer size used is now adaptive: it starts at 32 bytes and doubles each time EOF hasn't been reached, up to a limit of 64k. In addition, the buffer is only truncated when EOF or an error has been reached, rather than after every call to `read` as was the case for the old implementation.

I wrote up a benchmark to compare the old version and new version: https://gist.github.com/sfackler/e979711b0ee2f2063462

It tests a couple of different cases: a high bandwidth reader, a low bandwidth reader, and a low bandwidth reader that won't return more than 10k per call to `read`. The high bandwidth reader should be analogous to use cases when reading from e.g. a `BufReader` or `Vec`, and the low bandwidth readers should be analogous to reading from something like a `TcpStream`.

Of special note, reads from a high bandwidth reader containing 4 bytes are now *4,495 times faster*.

```
~/foo ❯ cargo bench
   Compiling foo v0.0.1 (file:///home/sfackler/foo)
     Running target/release/foo-7498d7dd7faecf5c

running 13 tests
test test_new         ... ignored
test new_delay_4      ... bench:    230768 ns/iter (+/- 14812)
test new_delay_4_cap  ... bench:    231421 ns/iter (+/- 7211)
test new_delay_5m     ... bench:  14495370 ns/iter (+/- 4008648)
test new_delay_5m_cap ... bench:  73127954 ns/iter (+/- 59908587)
test new_nodelay_4    ... bench:        83 ns/iter (+/- 2)
test new_nodelay_5m   ... bench:  12527237 ns/iter (+/- 335243)
test std_delay_4      ... bench:    373095 ns/iter (+/- 12613)
test std_delay_4_cap  ... bench:    374190 ns/iter (+/- 19611)
test std_delay_5m     ... bench:  17356012 ns/iter (+/- 15906588)
test std_delay_5m_cap ... bench: 883555035 ns/iter (+/- 205559857)
test std_nodelay_4    ... bench:    144937 ns/iter (+/- 2448)
test std_nodelay_5m   ... bench:  16095893 ns/iter (+/- 3315116)

test result: ok. 0 passed; 0 failed; 1 ignored; 12 measured
```

r? @alexcrichton
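
For readers who want a concrete picture of the adaptive strategy described in the message, here is a minimal sketch. It is not the code from this PR: `read_to_end_adaptive`, `INITIAL_CHUNK`, and `MAX_CHUNK` are hypothetical names chosen for illustration, and the exact growth bookkeeping is an assumption based only on the description (start at 32 bytes, double up to 64k, truncate once at the end).

```rust
use std::io::{self, ErrorKind, Read};

// Assumed constants, taken from the commit description only.
const INITIAL_CHUNK: usize = 32;
const MAX_CHUNK: usize = 64 * 1024;

/// Hypothetical helper: appends everything `r` yields to `buf` and
/// returns the number of bytes appended.
fn read_to_end_adaptive<R: Read>(r: &mut R, buf: &mut Vec<u8>) -> io::Result<usize> {
    let start_len = buf.len();
    // `len` tracks how many bytes of `buf` hold real data; anything past
    // it is zero-initialized scratch space for the next `read` call.
    let mut len = start_len;
    let mut chunk = INITIAL_CHUNK;

    let result = loop {
        if len == buf.len() {
            // Scratch space is exhausted: grow by the current chunk size,
            // then double the chunk size for next time, capped at 64k.
            buf.resize(len + chunk, 0);
            if chunk < MAX_CHUNK {
                chunk *= 2;
            }
        }
        match r.read(&mut buf[len..]) {
            Ok(0) => break Ok(len - start_len), // EOF
            Ok(n) => len += n,
            Err(ref e) if e.kind() == ErrorKind::Interrupted => {} // retry
            Err(e) => break Err(e),
        }
    };

    // Truncate exactly once, on EOF or error, rather than after each read.
    buf.truncate(len);
    result
}
```

With a 4-byte in-memory reader such as `&b"abcd"[..]`, this sketch zeroes only 32 bytes before the first `read`, which is the kind of saving the small-reader benchmarks are measuring.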
Showing 2 changed files with 76 additions and 59 deletions.