RFC for Vec::append_from_within() #2714

Shnatsel · 2019-06-21T20:16:32Z

This addition is quite trivial. However, in the pre-RFC I've learned that there are many approaches to solving this, so I'm opening an RFC to provide a clear rationale and get some input on the final design.

Prototype implementation in playground

SimonSapin · 2019-06-21T21:17:21Z

This feels very ad-hoc. I wonder if exposing some intermediate building block would help? Something like:

impl<T> Vec<T> {
    /// Returns the result of `self.as_mut_slice()` together with the rest of the capacity.
    ///
    /// The two slices have length `self.len()` and `self.capacity() - self.len()` respectively.
    pub fn with_uninit_capacity(&mut self) -> (&mut [T], &mut [MaybeUninit<T>]) {…}
}

impl<T> MaybeUninit<T> {
    /// Panic if the two slices have different lengths.
    pub fn copy_from_slice(this: &mut [Self], other: &[T]) where T: Copy {…}
    pub fn clone_from_slice(this: &mut [Self], other: &[T]) where T: Clone {…}
}

Then, append_from_within could be implemented outside of the standard library with the only unsafe code being a call to Vec::set_len.
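
As a rough illustration of that idea, the sketch below uses APIs that exist on stable Rust today (`Vec::spare_capacity_mut` and `MaybeUninit::write`) rather than the hypothetical `with_uninit_capacity` proposed above. The free-function form, its name, and its `Range<usize>` parameter are assumptions made for the example, not part of the RFC text.

```rust
use std::ops::Range;

/// Hypothetical free-function version of `append_from_within` for `Copy` types.
/// Panics if `src` is out of bounds for `vec`.
pub fn append_from_within<T: Copy>(vec: &mut Vec<T>, src: Range<usize>) {
    let len = vec.len();
    assert!(src.start <= src.end && src.end <= len, "source range out of bounds");
    let count = src.end - src.start;
    // After this call the spare capacity holds at least `count` slots.
    vec.reserve(count);
    for i in 0..count {
        // Read one initialized element (bounds-checked), then write it
        // into the uninitialized tail through a safe `MaybeUninit` API.
        let value = vec[src.start + i];
        vec.spare_capacity_mut()[i].write(value);
    }
    // SAFETY: `reserve` guaranteed room for `len + count` elements, and the
    // loop above initialized the first `count` slots of the spare capacity.
    unsafe { vec.set_len(len + count) };
}
```

The only `unsafe` operation left is the final `set_len`, which is the point made in the comment above; whether an element-by-element loop like this optimizes as well as a single `memcpy` is not something this sketch establishes.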

Shnatsel · 2019-06-21T22:09:04Z

I do not believe the MaybeUninit solution is optimal. For one, MaybeUninit::copy_from_slice actually guarantees that the buffer is initialized after its completion, but this is not reflected in the type system, which is why it still requires an unsafe set_len() operation on the vector. Also, the performance characteristics of this solution when used in a loop are exactly the same as for append_from_within. Finally, two similar but entirely safe solutions are already discussed in detail in the "Alternatives" section.

Note that the desired end result in all three motivating examples is actually repeat_tail(&mut self, elements: usize, times: usize), while append_from_within() is intended as a minimum viable safe building block for implementing the former.
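
To make the relationship concrete, here is a minimal sketch of `repeat_tail` written as a free function on top of `Vec::extend_from_within`, the name under which this functionality was later added to the standard library. The free-function form, the `Clone` bound, and the panic behavior are assumptions chosen for the example.

```rust
/// Hypothetical helper: appends `times` copies of the last `elements` items of `vec`.
/// Panics if `elements` exceeds the current length or the total size overflows `usize`.
pub fn repeat_tail<T: Clone>(vec: &mut Vec<T>, elements: usize, times: usize) {
    let len = vec.len();
    assert!(elements <= len, "cannot repeat more elements than the vector holds");
    if elements == 0 {
        // Nothing to repeat; avoid spinning through `times` empty iterations.
        return;
    }
    // Reserve once up front so the loop does not reallocate repeatedly.
    let total = elements.checked_mul(times).expect("capacity overflow");
    vec.reserve(total);
    let start = len - elements;
    for _ in 0..times {
        // The source range lies entirely inside the original contents,
        // so it stays valid while the vector grows.
        vec.extend_from_within(start..len);
    }
}
```

For instance, repeating the last two elements of `[1, 2, 3]` three times would yield `[1, 2, 3, 2, 3, 2, 3, 2, 3]`.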

While a hypothetical more general solution could serve more use cases (see the pre-RFC for a motivating example from Claxon for a more general abstraction), it would take a long time to design and implement, and it would not entirely supersede append_from_within(). This function is more discoverable and easier to apply than a generic abstraction by virtue of being much simpler and also by being the counterpart of an existing method on slice. And in this case having a low cognitive load is important: as illustrated by the motivating examples, people are repeatedly tempted to hand-roll this and keep getting it wrong.

Ixrec · 2019-06-22T14:26:34Z

Could we add a self-contained description of why this method is so difficult to implement correctly? Two of the three example links didn't seem to ever describe the bug, and the third explained only at the end of a very long post about a bunch of other security stuff that the broken code forgot to check if an argument was 0 (which imo is a perfectly good answer, if that is correct, it just needs to be easier to find).

Shnatsel · 2019-06-22T14:41:09Z

The code in rust stdlib was not checking for overflow in capacity calculation. Here's the fix.
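
For readers unfamiliar with this class of bug, the following is a minimal sketch of what an overflow-checked capacity calculation can look like. It is not the standard library's code or its actual fix; `repeat_checked` is a hypothetical helper invented for this illustration.

```rust
/// Hypothetical helper (not std's code): builds a vector holding `n` copies of `src`.
/// Returns `None` if `src.len() * n` would overflow `usize`.
pub fn repeat_checked<T: Copy>(src: &[T], n: usize) -> Option<Vec<T>> {
    if src.is_empty() {
        // Repeating nothing yields nothing, regardless of `n`.
        return Some(Vec::new());
    }
    // `checked_mul` turns a silent wrap-around into an explicit `None`.
    // An unchecked multiplication could wrap to a small value, so that
    // later writes would run past the end of a too-small allocation.
    let capacity = src.len().checked_mul(n)?;
    let mut out = Vec::with_capacity(capacity);
    for _ in 0..n {
        out.extend_from_slice(src);
    }
    Some(out)
}
```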

DEFLATE decoders forgot to check that one of the arguments is not 0. Here is a simplified version of the code they used that illustrates the problem:

/// Repeats `repeating_fragment_len` bytes at the end of the buffer
/// until an additional `num_bytes_to_fill` in the buffer is filled
pub fn decode_rle_vulnerable(buffer: &mut Vec<u8>, repeating_fragment_len: usize, num_bytes_to_fill: usize) {
    buffer.reserve(num_bytes_to_fill); // allocate required memory immediately, it's faster this way
    unsafe { // set length of the buffer up front so we can set elements in a slice instead of pushing, which is slow
        buffer.set_len(buffer.len() + num_bytes_to_fill);
    }
    for i in (buffer.len() - num_bytes_to_fill)..buffer.len() {
        buffer[i] = buffer[i - repeating_fragment_len];
    }
}

If you pass repeating_fragment_len set to 0 it will expose contents of uninitialized memory in the output. Both inflate and libflate crates have this bug even though they were implemented independently.
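
For contrast, here is a sketch of how the same routine could be written without `unsafe`, using `Vec::extend_from_within` (the name this method eventually received in the standard library). The function name `decode_rle_safe` and the choice to panic on invalid arguments are assumptions for the example; a real decoder would more likely return an error.

```rust
/// Repeats the last `repeating_fragment_len` bytes of `buffer`
/// until an additional `num_bytes_to_fill` bytes have been appended.
/// Panics if the fragment is empty or longer than the buffer.
pub fn decode_rle_safe(buffer: &mut Vec<u8>, repeating_fragment_len: usize, num_bytes_to_fill: usize) {
    // The check the vulnerable version forgot.
    assert!(repeating_fragment_len != 0, "repeating fragment must not be empty");
    assert!(repeating_fragment_len <= buffer.len(), "fragment longer than buffer");
    buffer.reserve(num_bytes_to_fill);
    // The fragment occupies the tail of the buffer as it is right now.
    let start = buffer.len() - repeating_fragment_len;
    let mut remaining = num_bytes_to_fill;
    while remaining > 0 {
        // Copy a whole fragment, or a prefix of it on the final iteration.
        let chunk = remaining.min(repeating_fragment_len);
        buffer.extend_from_within(start..start + chunk);
        remaining -= chunk;
    }
}
```

Because the length only ever grows by bytes that were actually copied, there is no window in which uninitialized memory is observable, even if an argument is wrong.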

Ixrec · 2019-06-22T15:50:35Z

Thanks, that makes perfect sense now.

burdges · 2019-06-22T19:35:23Z

At present, you'll need separate append_copies_from_within and append_clones_from_within, because the Copy version requires only Vec::set_len, while the Clone version requires first Vec::as_mut_ptr and then later Vec::set_len.

In the simple versions, if Vec<(F,Arc<T>)> panics midway, due to F::Clone panicking, then the Arc<T> gets left with an incorrect ref count, which sounds okay but annoying. If, on the other hand, Mutex<Vec<F>> panics, then PoisonError permits access while the Vec<F> contains uninitialized data. You could do only one Vec::set_len at the end in the Clone version, but doing them after each Clone works too.
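
One way to picture the "update the length after each clone" option is the sketch below, which sidesteps `set_len` entirely by pushing each clone as it is produced. `append_clones_from_within` is written here as a hypothetical free function; the name comes from this discussion, but the signature and body are assumptions.

```rust
use std::ops::Range;

/// Hypothetical Clone-based variant: appends clones of `vec[src]` to the end of `vec`.
/// Panics if `src` is out of bounds.
pub fn append_clones_from_within<T: Clone>(vec: &mut Vec<T>, src: Range<usize>) {
    assert!(src.start <= src.end && src.end <= vec.len(), "source range out of bounds");
    vec.reserve(src.end - src.start);
    for i in src {
        // If `clone` panics here, the vector still contains only fully
        // initialized elements: the ones cloned so far have been pushed,
        // and its length has never covered an unwritten slot.
        let item = vec[i].clone();
        vec.push(item);
    }
}
```

If a `clone` call panics partway through, the vector is left longer than it started but never exposes uninitialized elements.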

If you wait for specialization, then presumably append_from_within can be specialized for Copy types, so I'd kinda suggest waiting for specialization.

Oops! Just noticed you want this restricted to Copy types throughout.

Shnatsel · 2019-06-22T20:06:22Z

I was using slice::copy_within() as reference, which is restricted to Copy types and does not come with an equivalent function for Clone types.

I am open to adding append_clones_from_within() if there are use cases for it.

scottmcm · 2019-06-23T00:48:54Z

On copy-vs-clone, I think Copy is fine for now. It could be expanded compatibly to Clone in the future if desired, especially once we eventually get specialization and could thus promise that it's just a memcpy when the input is Copy.

> I was using slice::copy_within() as reference

I like this plan -- when I saw it get stabilized in 1.37 I also thought of this scenario, as it implicitly set a precedent for API design for things that treat part of themselves as input.

(Now that it's simplified so much and not blazing its own trail -- something like .as_fixed_capacity() was a much bigger addition -- I'd personally consider it small enough to just be a PR instead of an RFC, though of course I'm not on libs so my opinion isn't all that important here.)

Shnatsel · 2019-06-29T17:17:12Z

@WanzenBug and I have ported libflate to a 100% safe RLE implementation using this abstraction. That let us remove an unsound bespoke unsafe block.

It also improved end-to-end performance by 10% over the unsafe, unsound implementation, and by much more over the best safe implementation possible without append_from_within().

XAMPPRocky · 2019-07-11T11:56:08Z

From reading the RFC I found that Vec::repeat_part was a more clear description and API for the desired operation than append_from_within despite being more generic. That coupled with the potential to allow for eliding the checks makes it more compelling to me than append_from_within.

Bikeshed: I think switching the position of the parameters of repeat_part would better match the name, e.g. repeat_part(&mut self, times: usize, src: Range), as you first specify how many times you're going to repeat and then what part you want to repeat.

comex · 2019-07-11T22:46:34Z

I don't think the name repeat_part is clear at all; it sounds like it might return a repeated value without modifying the Vec, or perhaps replace the part with repeated copies of itself (as opposed to appending those copies at the end).

XAMPPRocky · 2019-07-12T07:05:41Z

So I went through Vec's current methods to see if there was existing terminology that would make more sense, and I think the extend terminology might suit this better. E.g. append_from_within -> extend_from_within, repeat_part -> extend_from_part.

alexcrichton · 2019-09-03T20:42:04Z

Thanks for the RFC here @Shnatsel and sorry for the delay!

FWIW the libs team tends to not require RFCs for API additions like this nowadays. This looks like a pretty reasonable feature addition to send in a PR, and we can have more discussion there if necessary (but it seems like a lot has happened here already!) and we can have another round of discussion before stabilization.

Does that sound alright with you? If so feel free to adapt the discussion here to a PR sent to libstd, and we can close this and open a tracking issue and copy over any remaining unresolved questions and such.

Shnatsel · 2019-09-03T20:45:06Z

Sounds good to me! I'll close this as soon as I create a PR.

Shnatsel · 2019-09-03T20:46:20Z

BTW: could we get the README in this repo updated? It states that additions to std require an RFC.

alexcrichton · 2019-09-04T16:42:06Z

Sure! Seems reasonable to update the README

Ixrec · 2019-10-11T12:10:37Z

@Shnatsel did you ever get a chance to make a PR?

Shnatsel · 2019-10-11T18:23:14Z

No, I didn't get around to it yet. I got sidetracked by other projects.

Shnatsel · 2019-10-26T23:42:11Z

For future reference, this would also eliminate the only unsafe block in ruzstd, the zstd decoder in Rust:

https://github.com/KillingSpark/zstd-rs/blob/e521dfdb4005b9d6d9556a8e3df9db445a5d038d/src/decoding/decodebuffer.rs#L135

WaffleLapkin · 2020-11-11T08:51:11Z

@Shnatsel am I right, that feature is still unimplemented? If so, I could make a PR.

Shnatsel · 2020-11-12T01:38:02Z

> am I right, that feature is still unimplemented?

Yes, that is correct.

After a bunch of experimentation with this in a crate I am convinced that the API described here is the right level of abstraction to expose.

The implementation is already written and tested, see https://github.com/WanzenBug/rle-decode-helper/blob/690742a0de158d391b7bde1a0c71cccfdad33ab3/src/lib.rs#L74

All you need to do is make a PR for the standard library adding this function. I'd very much appreciate this since I keep getting sidetracked by other projects.

Implement <rust-lang/rfcs#2714>, changes from the RFC:

- Rename the method `append_from_within` => `extend_from_within`
- Loosen :Copy bound => :Clone
- Specialize in case of :Copy

This commit also adds a private `Vec::split_at_spare` method and uses it to implement `Vec::spare_capacity_mut` and `Vec::extend_from_within`. This method returns 2 slices: initialized elements (same as `&mut vec[..]`) and uninitialized but allocated space (same as `vec.spare_capacity_mut()`).

…r=KodrAus

add `Vec::extend_from_within` method under `vec_extend_from_within` feature gate

Implement <rust-lang/rfcs#2714>

### tl;dr

This PR adds an `extend_from_within` method to `Vec` which allows copying elements from a range to the end:

```rust
#![feature(vec_extend_from_within)]

let mut vec = vec![0, 1, 2, 3, 4];

vec.extend_from_within(2..);
assert_eq!(vec, [0, 1, 2, 3, 4, 2, 3, 4]);

vec.extend_from_within(..2);
assert_eq!(vec, [0, 1, 2, 3, 4, 2, 3, 4, 0, 1]);

vec.extend_from_within(4..8);
assert_eq!(vec, [0, 1, 2, 3, 4, 2, 3, 4, 0, 1, 4, 2, 3, 4]);
```

### Implementation notes

Originally I copied `@Shnatsel`'s [implementation](https://github.com/WanzenBug/rle-decode-helper/blob/690742a0de158d391b7bde1a0c71cccfdad33ab3/src/lib.rs#L74) with some minor changes to support other ranges:

```rust
pub fn append_from_within<R>(&mut self, src: R)
where
    T: Copy,
    R: RangeBounds<usize>,
{
    let len = self.len();
    // Resolve the generic range into concrete bounds, panicking if it is out of range.
    let Range { start, end } = src.assert_len(len);

    let count = end - start;
    self.reserve(count);
    unsafe {
        // This is safe because `reserve()` above succeeded,
        // so `self.len() + count` did not overflow usize
        ptr::copy_nonoverlapping(
            self.get_unchecked(start),
            self.as_mut_ptr().add(len),
            count,
        );
        self.set_len(len + count);
    }
}
```

But then I realized that this duplicates most of the code from (private) `Vec::append_elements`, so I used it instead.

Then I applied `@KodrAus`'s suggestions from rust-lang#79015 (comment).

WaffleLapkin · 2021-02-02T13:01:11Z

Since rust-lang/rust#79015 was merged, I guess this PR can be closed?

Shnatsel added 2 commits June 21, 2019 22:08

Create 0000-vec-append-from-within.md

86791f7

Fill in PR number

c705940

Link to fix of stdlib vulnerability

9b2dc46

Link to a simplified example of bug in inflate

f1704f9

Fix self-link

5d1015c

scottmcm reviewed Jun 23, 2019

scottmcm added T-libs-api Relevant to the library API team, which will review and decide on the RFC. A-unsafe Unsafe related proposals & ideas A-collections Proposals about collection APIs labels Jun 23, 2019

Shnatsel added 2 commits June 23, 2019 05:37

Fix code examples to actually compile

08fdab8

Add code comments

866c6f9

Shnatsel mentioned this pull request Jun 23, 2019

Canvas unsafe code in the wild rust-lang/unsafe-code-guidelines#146

Closed

Shnatsel mentioned this pull request Jun 30, 2019

Switch to rle_decode_fast crate sile/libflate#38

Merged

mikeyhew reviewed Jul 5, 2019

Add one-sentence description of the function to guide-level explanation, as requested in a comment

cece87b

Shnatsel mentioned this pull request Jul 21, 2019

Audit libflate rust-secure-code/safety-dance#1

Closed

Shnatsel mentioned this pull request Sep 1, 2019

New lint: dangerous use of vec.set_len() rust-lang/rust-clippy#4483

Closed

This was referenced Sep 3, 2019

Document missing safe abstractions rust-secure-code/safety-dance#34

Open

RFC ideas rust-secure-code/wg#10

Open

KodrAus added the Libs-Tracked Libs issues that are tracked on the team's project board. label Jul 29, 2020

WaffleLapkin mentioned this pull request Nov 13, 2020

add Vec::extend_from_within method under vec_extend_from_within feature gate rust-lang/rust#79015

Merged

Shnatsel closed this Feb 2, 2021
