-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update VecDeque implementation to use head+len instead of head+tail #102991
Conversation
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @scottmcm (or someone else) soon. Please see the contribution instructions for more information. |
Hey! It looks like you've submitted a new PR for the library teams! If this PR contains changes to any Examples of
|
I'm trying to run the |
Is there any better way to run the tidy tool just for alloc, since I didn't change any other code? |
Ok this error is really weird, I got it locally too but didn't really think too much of it as I also got some weird errors due to winapi functions failing. I don't really know what this error is about and it doesn't give any stacktrace to help identify the issue unfortunately. |
Also, for some reason it's telling me that collectiontests is somehow causing a stack buffer overrun, and I'd be quite surprised if that came from my VecDeque implementation. |
This comment has been minimized.
This comment has been minimized.
When I was trying to rewrite |
Yea I read that somewhere that VecDeque is used in the testing framework, but the weird thing is that all the tests in the first test suite pass just fine, and also i don't see how my code could cause a stack buffer overrun. The only part i can think of where it reads memory from the stack directly would be the From<[T; N]> impl, and from what I can tell that shouldn't be able to overrun the buffer. |
You can enable debug asserts for the standard library in
We don't have 100% test coverage, so it might be some edge-case |
I think I finally got it to work, sorry for all that. |
I'm not familiar with the format of these .stderr files, what do I need to do to make it not fail there? |
See https://rustc-dev-guide.rust-lang.org/tests/running.html#editing-and-updating-the-reference-files
|
Oh boy, doing that does an LLVM build for me, which my laptop doesn't have the resources for. Edit: nevermind, figured out you can just set download-ci-llvm=true |
set |
Ugh, this is kinda frustrating. Problem is, I only have access to a pretty slow laptop, and so everything takes forever. Running just the ui tests took like 30 minutes, not including the compile time. |
Not sure about the current circumstance of this PR, but you can try to only run previously failing test locally or try to get the CI to help you run the test (since its much faster, but still takes 10+ mins to finishing the first run of test and 40+ mins to complete the CI). |
Phew, finally got the checks passing. I'm very sorry for all the hassle, this is my first time trying to contribute to the stdlib and it probably shows. |
That's fine, there will be many more failures (those are not even all the tests that have to pass 😁 ). Could you share the results of stdlib (or any other) microbenchmarks of the old and the new |
Running
And these are the results for the new implementation:
Some of these seem a bit weird to me, like |
Std benches are finnicky because they measure wall-time. Especially if you're on a laptop. This pending update to the std dev guide might help
It's copying 512 bytes into a pre-allocated vecdeque, that's 8 cache-lines worth of data and fits into the L1 cache. 20ns seems like the right order of magnitude, especially considering that a processor can execute a handful of instruction every nanosecond. 1000ns does not. |
Results from my laptop (Zen 3):
(I try to reduce variance by turning off hyper-threading, turbo-boost etc., but it's still quite noisy, as usually). It looks really good! 🚀 I should have tried this simple representation when I was trying to do what you have done, I tried the "unmasked indices" approach from https://www.snellman.net/blog/archive/2016-12-13-ring-buffers/, but it was super complicated. |
Seems like I messed up something then, if I had to guess it's a worse SpecExtend impl, I can look into it later. |
Alrighty, the new numbers on my laptop are looking like this:
|
Alright, at this point I'm fairly confident in the correctness of the code, I just did a final thorough pass and looked over all of it. I'd still like to try and fuzz it against my original deque rewrite, which I had already fuzzed against the old |
It would be good if you could clean up the git history (with
|
The test just failed because a |
This comment has been minimized.
This comment has been minimized.
352e5c6
to
b1c3c63
Compare
The tests all pass on windows now. I removed the commit that changed the CI, and I guess @scottmcm it's time for bors round 3. |
@bors r+ |
☀️ Test successful - checks-actions |
Finished benchmarking commit (69df0f2): comparison URL. Overall result: ❌✅ regressions and improvements - ACTION NEEDEDNext Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesThis benchmark run did not return any relevant results for this metric. |
Uhh what should I say |
@Sp00ph I'd just leave it for perf triage to look at and see if they have particular concerns. I think it's probably fine -- no primary regressions, a couple primary improvements, and mixed (albeit slightly negative overall) secondary results. For a fairly substantial change, I think it's good enough. We can always do more changes later should particular things be pointed out. |
Yeah, the number of perf changes is very small and it's a mix of small improvements and regressions. Not a concern. @rustbot label: +perf-regression-triaged |
Update VecDeque implementation to use head+len instead of head+tail (See rust-lang#99805) This changes `alloc::collections::VecDeque`'s internal representation from using head and tail indices to using a head index and a length field. It has a few advantages over the current design: * It allows the buffer to be of length 0, which means the `VecDeque::new` new longer has to allocate and could be changed to a `const fn` * It allows the `VecDeque` to fill the buffer completely, unlike the old implementation, which always had to leave a free space * It removes the restriction for the size to be a power of two, allowing it to properly `shrink_to_fit`, unlike the old `VecDeque` * The above points also combine to allow the `Vec<T> -> VecDeque<T>` conversion to be very cheap and guaranteed O(1). I mention this in the `From<Vec<T>>` impl, but it's not a strong guarantee just yet, as that would likely need some form of API change proposal. All the tests seem to pass for the new `VecDeque`, with some slight adjustments. r? `@scottmcm`
(See #99805)
This changes
alloc::collections::VecDeque
's internal representation from using head and tail indices to using a head index and a length field. It has a few advantages over the current design:VecDeque::new
new longer has to allocate and could be changed to aconst fn
VecDeque
to fill the buffer completely, unlike the old implementation, which always had to leave a free spaceshrink_to_fit
, unlike the oldVecDeque
Vec<T> -> VecDeque<T>
conversion to be very cheap and guaranteed O(1). I mention this in theFrom<Vec<T>>
impl, but it's not a strong guarantee just yet, as that would likely need some form of API change proposal.All the tests seem to pass for the new
VecDeque
, with some slight adjustments.r? @scottmcm