Optimize BufWriter #79930
r? @m-ou-se (rust-highfive has picked a reviewer for you, use r? to override)
This improves Optimization was guided by rustc benchmarks. I have a local change that uses Performance was initially completely unworkable. Here is the effect of adding each optimization to the current BufWriter. Optimization 1: inline hot paths Realize that these are benchmarks of the entirety of compilation, not just Obviously optimization 2 is the most impactful, optimization 1 is good, and optimization 3 would be good any other day of the week, but after the other two, it's like, whatever. :) After all was said and done, the toll of using Still isn't quite where I want it to be though. See #79921 for details.
I'm aware that #78551 is also making changes in
Force-pushed from 5d0dbb4 to 0a8863b.
Out of curiosity, have you experimented with a smaller hammer by using
I have a concern that this optimization is unsound when
I don't think I did--definitely not Now that you mention it, I'm not quite sure what the right thing to do is here. If I don't separate the functions out, the rustc benchmarks take a pretty big hit. So I know that, for my modified rustc's use of But maybe I could just remove the annotations, as currently, the "cold" functions don't get inlined in rustc, even without Anyway, I'm open to suggestions.
Ah, thank you. Looks like there was a correctness issue already, if I understand correctly, but it didn't lead to unsoundness until my change. I'll fix the problem.
There is potential for overflow in Update again: disregard the strike-through text. I've learned that there's no guarantee yet that the maximum slice size won't increase beyond `isize::MAX` in the future. I've changed the code to safeguard against that possibility.
Force-pushed from ebdda3f to 6b68ae4.
I think I've addressed the possibility of overflow in
Force-pushed from 6b68ae4 to 157ede8.
I've been told by @RalfJung that there's no guarantee yet that the maximum slice size won't increase beyond
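As an illustration of the overflow concern discussed above, here is a minimal sketch, not the code from this PR. The function names (`fits_naive`, `fits_checked`, `saturated_total_len`) are hypothetical. The idea is to compare the incoming length against the remaining spare capacity instead of summing lengths, and to saturate when totalling several slice lengths, so that nothing depends on slice sizes staying below `isize::MAX`.

```rust
/// Naive check (hypothetical): sums the lengths first, so the sum can wrap.
/// `wrapping_add` models what `+` does when overflow checks are disabled.
pub fn fits_naive(buffered: usize, capacity: usize, incoming: usize) -> bool {
    buffered.wrapping_add(incoming) <= capacity
}

/// Safer check (hypothetical): relies on the invariant `buffered <= capacity`,
/// so the subtraction cannot underflow and no addition is needed.
pub fn fits_checked(buffered: usize, capacity: usize, incoming: usize) -> bool {
    debug_assert!(buffered <= capacity);
    incoming <= capacity - buffered
}

/// Total length of several slices, clamped at `usize::MAX` instead of wrapping.
pub fn saturated_total_len(lens: &[usize]) -> usize {
    lens.iter().fold(0usize, |acc, &len| acc.saturating_add(len))
}
```

With `buffered = 4` and `capacity = 8`, an `incoming` length of `usize::MAX` makes the naive check wrap around and report that the data fits, while the subtraction-based check rejects it.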
Force-pushed from 157ede8 to 36844de.
Btw, recombining the hot/cold code and annotating the branches wasn't helpful. And using
Or perhaps, just
Force-pushed from 36844de to 5bfbe41.
Okay, I've rebased, added a comment about the overflow edge case discussed above, and changed
Force-pushed from 5bfbe41 to a84518a.
Force-pushed from 9a5792c to 76094fa.
Rebased on latest master. Will address comments soon.
@rustbot label: +S-waiting-on-review -S-waiting-on-author
Ensure that `write` and `write_all` can be inlined and that their commonly executed fast paths can be as short as possible. `write_vectored` would likely benefit from the same optimization, but I omitted it because its implementation is more complex, and I don't have a benchmark on hand to guide its optimization.
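The hot/cold split described in this commit message can be sketched roughly as follows. This is a simplified illustration, not the standard library's implementation: `SketchWriter`, `write_all_cold`, and `flush_buf` are hypothetical names, and the fast path still uses `extend_from_slice` to keep the example in safe Rust.

```rust
use std::io::{self, Write};

/// Hypothetical buffered writer for illustration only; not std's `BufWriter`.
pub struct SketchWriter<W: Write> {
    inner: W,
    buf: Vec<u8>,
}

impl<W: Write> SketchWriter<W> {
    pub fn with_capacity(capacity: usize, inner: W) -> Self {
        SketchWriter { inner, buf: Vec::with_capacity(capacity) }
    }

    /// Hot path: one comparison and one copy, small enough to inline.
    #[inline]
    pub fn write_all(&mut self, data: &[u8]) -> io::Result<()> {
        if data.len() < self.buf.capacity() - self.buf.len() {
            self.buf.extend_from_slice(data);
            Ok(())
        } else {
            self.write_all_cold(data)
        }
    }

    /// Cold path: flushing and oversized writes, kept out of line so the
    /// inlined caller stays short.
    #[cold]
    #[inline(never)]
    fn write_all_cold(&mut self, data: &[u8]) -> io::Result<()> {
        if data.len() > self.buf.capacity() - self.buf.len() {
            self.flush_buf()?;
        }
        if data.len() >= self.buf.capacity() {
            // Too large to buffer: hand it to the inner writer directly.
            self.inner.write_all(data)
        } else {
            self.buf.extend_from_slice(data);
            Ok(())
        }
    }

    fn flush_buf(&mut self) -> io::Result<()> {
        self.inner.write_all(&self.buf)?;
        self.buf.clear();
        Ok(())
    }

    /// Flushes any buffered bytes and returns the inner writer.
    pub fn into_inner(mut self) -> io::Result<W> {
        self.flush_buf()?;
        Ok(self.inner)
    }
}
```

Whether the `#[cold]` and `#[inline(never)]` annotations help depends on the caller and the compiler's inlining decisions, so a design like this should be guided by benchmarks, as the PR was.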
We use a `Vec` as our internal, constant-sized buffer, but the overhead of using methods like `extend_from_slice` can be enormous, likely because they don't get inlined, because `Vec` has to repeat bounds checks that we've already done, and because it makes considerations for things like reallocating, even though they should never happen.
Optimize for the common case where the input write size is less than the buffer size. This slightly increases the cost for pathological write patterns that commonly fill the buffer exactly, but if a client is doing that frequently, they're already paying the cost of frequent flushing, etc., so the cost of this optimization to them is relatively small.
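The two commit messages above can be combined into one small sketch: a strict less-than check against the spare capacity, followed by a raw copy that skips `Vec`'s own growth logic. The helper name `try_buffer` is hypothetical, and this is an illustration under those assumptions rather than the code merged in this PR.

```rust
use std::ptr;

/// Hypothetical helper: copies `data` into the spare capacity of `buf`
/// without ever reallocating. Returns `false` (leaving `buf` untouched)
/// when `data` would fill the buffer exactly or overflow it, so the caller
/// can fall back to a slower flushing path.
pub fn try_buffer(buf: &mut Vec<u8>, data: &[u8]) -> bool {
    let spare = buf.capacity() - buf.len();
    // Strict `<`: a write that exactly fills the buffer takes the slow path.
    if data.len() < spare {
        let old_len = buf.len();
        // SAFETY: `data.len() < spare`, so the destination range lies inside
        // the allocation. `data` cannot overlap it, because `buf` is borrowed
        // mutably and its spare capacity is not reachable through `data`.
        unsafe {
            let dst = buf.as_mut_ptr().add(old_len);
            ptr::copy_nonoverlapping(data.as_ptr(), dst, data.len());
            buf.set_len(old_len + data.len());
        }
        true
    } else {
        false
    }
}
```

A caller would typically try this first and only flush or write through to the underlying writer when it returns `false`.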
Force-pushed from 76094fa to 01e7018.
In this case, I think it's easy enough to get by with a
let old_len = self.buf.len();
let buf_len = buf.len();
let src = buf.as_ptr();
let dst = self.buf.as_mut_ptr().add(old_len);
ptr::copy_nonoverlapping(src, dst, buf_len);
self.buf.set_len(old_len + buf_len);
(Note that you could also implement this with `self.buf.spare_capacity_mut()` and `MaybeUninit::write_slice`, which are both still unstable.)
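For comparison, here is a rough sketch of the `spare_capacity_mut` approach mentioned in the note. It assumes a toolchain where `Vec::spare_capacity_mut` and `MaybeUninit::write` are available, and it writes element by element instead of using `MaybeUninit::write_slice`. The function name `append_within_capacity` is hypothetical.

```rust
use std::mem::MaybeUninit;

/// Hypothetical helper: appends `data` only if it fits in the existing
/// spare capacity, so the vector never reallocates.
pub fn append_within_capacity(buf: &mut Vec<u8>, data: &[u8]) -> bool {
    let new_len = buf.len() + data.len();
    let spare: &mut [MaybeUninit<u8>] = buf.spare_capacity_mut();
    if data.len() > spare.len() {
        return false;
    }
    // Initialize the first `data.len()` spare slots with safe writes.
    for (slot, &byte) in spare.iter_mut().zip(data) {
        slot.write(byte);
    }
    // SAFETY: exactly `data.len()` elements past the old length were
    // initialized above, and `new_len` does not exceed the capacity.
    unsafe {
        buf.set_len(new_len);
    }
    true
}
```

Only the final `set_len` needs `unsafe` here; the trade-off is that the element-wise loop depends on the optimizer to turn it into a bulk copy.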
Thanks a lot for doing this! @bors r+
📌 Commit 01e7018 has been approved by
… r=m-ou-se Optimize BufWriter
@bors rollup=never - performance PR
☀️ Test successful - checks-actions