Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: From<&[T]> for Rc<[T]> + From<&str> for Rc<str> #1845

Merged
merged 10 commits into from
Mar 13, 2017
Merged

RFC: From<&[T]> for Rc<[T]> + From<&str> for Rc<str> #1845

merged 10 commits into from
Mar 13, 2017

Conversation

Centril
Copy link
Contributor

@Centril Centril commented Jan 5, 2017

This is the RFC for the issue #1844.

Rendered

A special thanks to @ubsan for helping me write my first RFC.

@nrc nrc added the T-libs-api Relevant to the library API team, which will review and decide on the RFC. label Jan 5, 2017
Centril added a commit to Centril/rust_strong_rc that referenced this pull request Jan 5, 2017
@strega-nil
Copy link

I don't like that From is used, and I'm of the opinion that rc_from_slice is overgeneralized.

IMO, the API should look something like:

impl<T: Clone> Rc<[T]> {
  fn from_slice(s: &[T]) -> Self;
}

impl Rc<str> {
  fn from_str(s: &str) -> Self;
}

@Centril
Copy link
Contributor Author

Centril commented Jan 6, 2017

@ubsan

(Maybe you didn't intend to discuss this, if so, ignore this paragraph...) The choice to de-generalize slice_to_rc and expand to use sites or not is not that important to me - that part can even be left unspecified as a decision for the person(s) involved in making the PR to the rust repo. However - I do think it is better this way since it clearly shows the differences between the 3 use sites in an overviewable way.

Regarding T: Clone and T: Copy, I think that both should be provided with a specialization for the latter since it's consistent with both clone_from_slice and copy_from_slice existing. In the latter case, it should always boil down to one memcpy even for mem::size_of::<T>() > mem::size_of::<u8>() by simply multiplying the length. However, if you can show that clone_from_slice will also do this, then specialization is needless.

Specialization immediately requires us to use From style API. But other than that, having one method, .into() to remember instead of 2 (or 3) makes you have to remember more special forms, having a mental cost. Furthermore, while most things in Rc use associated functions - they are mostly cases of already having an Rc and is done because: "This avoids conflicts with methods of the inner type T.". In using From, there's no risk of conflict.

Also, using From is consistent with:

  • impl<'a> From<&'a str> for String
  • impl<'a> From<&'a str> for Cow<'a, str>
  • impl<'a, T> From<&'a [T]> for Vec<T> where T: Clone
  • impl<'a, T: Clone> From<&'a [T]> for Vec<T>
  • a host of other conversions.

@strega-nil
Copy link

The examples of uses of From are a good argument. I revise what I'm saying then. I sould still really like there to be non-trait static methods, but with From in addition. slice_to_rc still feels like the wrong API to standardize, however, at least for now. Obviously there should be specialization, I was just speaking to user-facing API.

There isn't a risk of conflict with associated functions (except with user-implemented traits, I guess).

@Centril
Copy link
Contributor Author

Centril commented Jan 6, 2017

@ubsan

Adding non-static trait in addition would be TMTOWTDI but I don't mind it - to each their own I guess.

If you were referring to unsafe fn slice_to_rc<'a, T, U, W, C>(src: &'a [T], cast: C, write_elems: W) -> Rc<U>, then that was/is simply a way to be DRY. It's just an implementation detail, and not user facing, so that can always be changed for better performance, more/less safety, or less generic code, etc.

How do you feel about doing this for Arc ?

And while we're at it: should we do this for Box as well for the sake of uniformity ?

```rust

#[inline(always)]
unsafe fn slice_to_rc<'a, T, U, W, C>(src: &'a [T], cast: C, write_elems: W)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ubsan implementation detail, but: de-generify this / not ?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't understand the point of slice_to_rc; ignore that thing. That's fine.


Should the implementations use [`AsRef`][AsRef] or not? Might this be a case of making things a bit too ergonomic? This RFC currently leans towards not using it.

Should these trait implementations of [`From`][From] be added as functions on [`&str`][str] like `.into_rc_str()` and on [`&[T]`][slice] like `.into_rc_slice()`?
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ubsan Add associated functions on Rc to the list of unresolved questions?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, please. Thanks!

difference being that this version uses the [`From`][From] trait, and is
converted into a shared smart pointer instead.

Because of the similarities between the layout of [`Rc`][Rc] and [`Arc`][Arc], almost identical implementations could be added for `From<&[T]> for Arc<[T]>` and `From<&str> for Arc<str>`. However, in this case the synchronization overhead while doing `.clone()` is probably greater than the overhead of doing `Arc<Box<str>>`. But once the clones have been made, `Arc<str>` would probably be cheaper to dereference due to locality. While the motivating use cases for `Arc<str>` are probably fewer, one could perhaps use it for multi threaded interning with a type such as:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ubsan Include Arc (and Box?) or not ?

@strega-nil
Copy link

@Centril ah, I didn't get that slice_to_rc was an implementation detail. It may have been because originally I read it on my phone.

I'm personally of the opinion that including Arc is a good idea. Box is less important because it's easily implementable in non-std safe code.

@Centril
Copy link
Contributor Author

Centril commented Jan 7, 2017

@ubsan

Added stuff about Arc as well as associated functions.

What do we name the feature now if Arc is included?

@Centril
Copy link
Contributor Author

Centril commented Jan 10, 2017

Renamed feature to "shared_from_slice"

Also, libs team and @sfackler, thoughts on this RFC?

@abonander
Copy link

abonander commented Jan 10, 2017

I would personally prefer to see conversions for Vec<T> -> Rc<[T]> and String -> Rc<str>, as they are more generally useful than copying slices into a new Rc allocation. No requirement of Copy or Clone.

@Centril
Copy link
Contributor Author

Centril commented Jan 10, 2017

@abonander With the current layout of Rc, specifically the fact that:

struct RcBox<T: ?Sized> {
    strong: Cell<usize>,
    weak:   Cell<usize>,
    value:  T
}

is allocated together on the heap and not separately, one can't reuse the allocation of a String (= Vec<u8>), or Vec<T> for Rc<str> or Rc<[T]>. Therefore, you must allocate memory for the Rc<[T]> or Rc<str>. This could be changed if Rc used separate allocations, but that would add overhead to many other use cases and reduce data locality.

Given this, &[T] -> Rc<[T]> and &str -> Rc<str> are strictly more general, as you can get Vec<T> -> Rc<[T]> and String -> Rc<str> with .as_ref(), .borrow(), .deref() on Vec<T> and String. This also supports other owned forms of contiguous collections such as VecDeque<T>, SmallVec<T>, Cow, Tendril, ... In general, there should be a move towards making libstd less special.

@abonander
Copy link

abonander commented Jan 10, 2017

@Centril you can move the values from the Vec into a new allocation, that doesn't violate ownership rules.

By all means, have both kinds of conversions, but I don't think it's "more general" to restrict the element type to Clone or Copy. If you really wanted to be general, you could have a generic conversion for IntoIterator<Item = T> or IntoIterator<Item = [str, char]>, respectively, though you'd have to have a strategy for recovering from panics.

@seanmonstar
Copy link
Contributor

The point @abonander is making is that &[T] -> Rc<[T]> requires T: Clone, whereas Vec<T> -> Rc<[T]> does not.

@Centril
Copy link
Contributor Author

Centril commented Jan 10, 2017

@abonander @seanmonstar ah - now I get it =) In case I didn't get it fully: how would an implementation of Vec<T> -> Rc<[T]> look like?

Would IntoIterator<Item = T> -> Rc<[T]> work, or must we not know the exact .len() in order to allocate enough memory for the Rc<[T]> ?

@Centril
Copy link
Contributor Author

Centril commented Jan 10, 2017

I guess ExactSizeIterator could work ?

@eddyb
Copy link
Member

eddyb commented Jan 10, 2017

To repeat what I've said in the past: with custom allocators you could define an allocator wrapper, say ForRc (e.g. usage with Vec: Vec<T, ForRc<A>>) that always keep enough space before the data for the two usize values of Rc, allowing conversion to Rc<T, A>.
The advantage here is that now String<ForRc<A>> would work, and a bunch of other things, for free.
It also lets you use Vec's (optimized) implementation of FromIterator even in non-exact cases.

@abonander
Copy link

abonander commented Jan 10, 2017

@Centril

Avoiding the genericity of your suggested solution for simplicity:

// Technically this could even take `&mut Vec<T>`, since it doesn't need to consume the vector, just drain elements from it
unsafe fn vec_to_rc<T>(mut src: Vec<T>) -> Rc<[T]>{
    // Compute space to allocate for `RcBox<U>`.
    let susize = mem::size_of::<usize>();
    let num_elems = src.len();
    let aligned_len = 2 + (mem::size_of::<T>() * num_elems + susize - 1) / susize;

    // Allocate enough space for `RcBox<T>`.
    let vec = RawVec::<usize>::with_capacity(aligned_len);
    let ptr = vec.ptr();
    forget(vec);

    // Initialize fields of `RcBox<T>`.
    ptr::write(ptr,           1);         // strong: Cell::new(1)
    ptr::write(ptr.offset(1), 1);         // weak:   Cell::new(1)
    
    // This prevents `src` from trying to invoke the drop glue on the elements
    src.set_len(0);

    ptr::copy_nonoverlapping(src.as_ptr(), ptr.offset(2) as *mut T, num_elems);

    // Combine the allocation address and the slice length into a
    // fat pointer to `RcBox`.
    let rcbox_ptr = slice::from_raw_parts_mut(ptr, num_elems) 
                        // `as *mut [T]` may be redundant but removing it 
                        // triggers a timeout on the playpen
                        as *mut [usize] as *mut [T] as *mut RcBox<[T]>;

    debug_assert_eq!(aligned_len * susize, mem::size_of_val(&*rcbox_ptr));
    Rc { ptr: Shared::new(rcbox_ptr) }
}

In your suggested solution, using [T]::clone_from_slice() isn't a good idea because it assumes the destination slice contains valid elements which can be Droped before overwriting them, which isn't true in this case. You'd have to iterate manually, cloning elements onto the stack and then ptr::write()ing them into the slice.

As for coming from an iterator, you could round-drip through Vec<T> for the generic version, and then specialize for ExactSizeIterator, though you'd have to handle panics during iteration, naively by just letting the already written elements leak or ideally by having a panic guard to keep track of your progress and drop the elements that have already been written. You also shouldn't forget your RawVec until after that point so it can deallocate itself as well.

@Centril
Copy link
Contributor Author

Centril commented Jan 10, 2017

@eddyb that'd be slick! Tho, I take it that this would require an overhaul of stuff?

Would this change anything for &str -> Rc<str>, &[T], T: Copy/Clone -> Rc<[T]> ?

@abonander regarding [T]::clone_from_slice(), are you only referring to the Vec<T> -> Rc<[T]> case, or is &[T: Clone] -> Rc<[T]> with the suggested implementation problematic as well?

@abonander
Copy link

abonander commented Jan 10, 2017

@Centril

&[T: Clone] -> Rc<[T]> is problematic if you use [T]::clone_from_slice() because the destination slice must contain valid elements already. It essentially boils down to this:

for i in 0 .. dest.len() {
    dest[i] = src[i].clone();
}

And overwriting by assignment implicitly drops the existing value.

Vec<T> -> Rc<T> doesn't require Clone at all because it copies the elements byte-for-byte from the vector and then sets its length to 0, effectively making it think it's empty so it deallocates without attempting to drop its elements.

@eddyb
Copy link
Member

eddyb commented Jan 10, 2017

@Centril It'd let them be .to_string() and .to_vec() (followed by a noop conversion to Rc).

@Centril
Copy link
Contributor Author

Centril commented Jan 11, 2017

@abonander Ah, right =)

Anyways: I'm certainly amenable to adding the API:s you proposed in this RFC. Could you please specify the details so that I can add them?

I'll take a look at those tomorrow, after my exam =)

@eddyb that's neat - tho... could that be done as a separate RFC since it is probably a much larger and more-into-the-future change?

@abonander
Copy link

@Centril Since copying is necessary anyway, having just &str -> Rc<str> is probably fine for strings, and then Vec<T> -> Rc<[T]> and &[T: Clone] -> Rc<[T]>.

And then in a future RFC, maybe have impl<T> FromIterator<T> for Rc<[T]> and suggest specializing for ExactSizeIterator.

@ghost
Copy link

ghost commented Jan 11, 2017

Something I've been wondering for things like this: Is it ok to call {size|align}_of_val on a null pointer?

Then you could just use the easy way (which would also negate the need for the assertion at the end I think):

fn from_slice(value: &[T]) -> Rc<[T]> {
    let fake: *mut RcBox<[T]> = mem::transmute([0, value.len()]);
    let ptr = allocate(size_of_val(&*fake), align_of_val(&*fake));
    // ... etc

I also don't really understand why #[repr(C)] would be needed. Couldn't you cast the pointer earlier and set the values there by dereferencing? No knowledge of the field layout would be required for that I don't think

let rcbox_ptr: *mut RcBox<[T]> = mem::transmute([ptr as usize, value.len()]);
(*rcbox_ptr).strong = Cell::new(1);
(*rcbox_ptr).weak = Cell::new(1);
// ... etc

@abonander
Copy link

I agree, #[repr(C)] shouldn't be needed; in fact you can just call .set():

(*rcbox_ptr).strong.set(1);
(*rcbox_ptr).weak.set(1);

I'm more ambivalent about calling allocate() directly.

@alexcrichton
Copy link
Member

Thanks for the updates @Centril!

@alexcrichton alexcrichton self-assigned this Feb 28, 2017
@rfcbot
Copy link
Collaborator

rfcbot commented Feb 28, 2017

🔔 This is now entering its final comment period, as per the review above. 🔔

1 similar comment
@rfcbot
Copy link
Collaborator

rfcbot commented Feb 28, 2017

🔔 This is now entering its final comment period, as per the review above. 🔔

@rfcbot rfcbot added final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. labels Feb 28, 2017
@malbarbo
Copy link

malbarbo commented Mar 8, 2017

This will make easy to implement Rc::make_mut for unsized types. @Centril Please consider this change if you make the PR implementing this RFC.

@Centril
Copy link
Contributor Author

Centril commented Mar 8, 2017

@malbarbo Could you please clarify the signature of the function you want to add and how it would behave?

@malbarbo
Copy link

malbarbo commented Mar 8, 2017

@Centril I'm sorry for being lazy. The function Rc::make_mut already exists, but it is only defined for sized types (it is not possible to call Rc::make_mut(&mut x) for x: Rc<[T]> for example)

@rfcbot
Copy link
Collaborator

rfcbot commented Mar 10, 2017

The final comment period is now complete.

@Centril
Copy link
Contributor Author

Centril commented Mar 10, 2017

@alexcrichton Even tho the FCP is complete, I think @malbarbo raises something I (and you?) have to a address but haven't had time to do so...

I'll take a look at it today =)

@alexcrichton
Copy link
Member

@Centril sure yeah if you want to write something up in the RFC I'll merge right after.

My guess is that unfortunately we can't as [T] doesn't implement Clone, but we can likely explore further strategies in the future. (e.g. it's ok to be an "unresolved question") for now

@Centril
Copy link
Contributor Author

Centril commented Mar 10, 2017

@alexcrichton This might be wrong, but, ..., what if we do:

impl<T> Rc<[T]> where T: Clone {
    fn make_mut_slice(this: &mut Rc<[T]>) -> &mut [T]
}

This should be safe as long as the referent that got the &mut [T] is not able to deallocate parts of the slice...

@alexcrichton
Copy link
Member

Yeah I think that'd work, but we probably want to save that for a future PR perhaps?

@Centril
Copy link
Contributor Author

Centril commented Mar 10, 2017

@alexcrichton Yeah, seems reasonable =) Should I add that to the list of unresolved questions?

@alexcrichton
Copy link
Member

Yeah sounds good!

@Centril
Copy link
Contributor Author

Centril commented Mar 10, 2017

@alexcrichton Done... Ready to merge I guess =)

@alexcrichton
Copy link
Member

Thanks @Centril!

Tracking issue: rust-lang/rust#40475

@Centril Centril added A-slice Slice related proposals & ideas A-dst Proposals re. DSTs A-collections Proposals about collection APIs labels Nov 23, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-collections Proposals about collection APIs A-dst Proposals re. DSTs A-slice Slice related proposals & ideas final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. T-libs-api Relevant to the library API team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants