Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Raw Reform #365

Closed
wants to merge 7 commits into from
Closed

RFC: Raw Reform #365

wants to merge 7 commits into from

Conversation

Gankra
Copy link
Contributor

@Gankra Gankra commented Oct 7, 2014

  • Introduce RawSlice and RawMutSlice extension traits to be implemented on
    *const [T] and *mut [T] that provide parts of the slice and ptr API to better bridge the
    gap between the two.
  • Provide unsafe versions of the slicing and indexing operators on raw slices to provide ergonomic
    unchecked slice manipulation.
  • Provide unsafe versions of the addition, subtraction, and indexing operators on raw ptrs to
    provide more ergonomic ptr manipulation.
  • Deprecate some of the free functions in ptr, and duplicate the rest as convenience methods on
    the RawPtr extension traits.
  • Deprecate all unsafe methods on slices and slice::raw in favour of raw slices.
  • Add conversion methods to the three families of extension traits for conveniently moving between
    them.

Rendered

@tbu-
Copy link
Contributor

tbu- commented Oct 7, 2014

Given x: &[u8], y: *[u8], shouldn't x as *const [u8] and &*y work already?

@aturon
Copy link
Member

aturon commented Oct 7, 2014

@tbu-

Given x: &[u8], y: *[u8], shouldn't x as *const [u8] and &*y work already?

Yes, though at least for the first one, x.as_raw() is definitely nicer for method chaining.

@tbu-
Copy link
Contributor

tbu- commented Oct 7, 2014

Do we have this for normal raw pointers?

(I. e. x: &u8, x.as_raw() or similar.)

@arthurprs
Copy link

Unsafe addition, subtraction, and indexing of *const T and *mut T should also be provided with ints rather than uints. Unsafe addition and subtraction should be equivalent to calling offset on the ptr today. Indexing should have the same behaviour as indexing into a raw slice.

Doesn't this requires bounds checking? Thus effectively defeating the raw access advantage?

@Gankra
Copy link
Contributor Author

Gankra commented Oct 8, 2014

@glaebhoerl Good catch! I think I caught them all.

@Gankra
Copy link
Contributor Author

Gankra commented Oct 8, 2014

@arthurprs Perhaps I miscommunicated the intended behaviour, why do you expect bounds checking is required? Perhaps I didn't sufficiently clarify that raw slices are the *const [T]'s, which are unchecked.

@arthurprs
Copy link

@gankro nevermind, when I read the uint -> int change I had python indexes in mind (so -1 is the last element, -2 the before last ...)

@Gankra
Copy link
Contributor Author

Gankra commented Oct 8, 2014

I've started implementing these changes in a branch as a sanity check, and to demonstrate the simplicity. One thing I've run into:

Where previously you could call

ptr::copy_nonoverlapping_memory(&mut foo, &bar, 1)

Now you would need to do:

(&mut foo as *mut T).copy_nonoverlapping(&bar, 1)

Hmm... perhaps some of these would be better of as free fn's in mem, rather than methods on RawPtr.

@Gankra
Copy link
Contributor Author

Gankra commented Oct 8, 2014

Ack, and ptr::swap is actually subtly different, it allows the ptrs to point to the same location to permit some tricks to avoid branching. The free fn moving in ptr needs to be completely rethought.

@reem
Copy link

reem commented Oct 8, 2014

Generally I think it's a bad idea for pointers to have inherent methods if they can avoid it, and that those should usually be free functions in ptr or equivalent.

@mahkoh
Copy link
Contributor

mahkoh commented Oct 8, 2014

👍

More concise unsafe code means fewer bugs.

@arielb1
Copy link
Contributor

arielb1 commented Oct 8, 2014

insertion_sort could actually be done more cleanly as:

/// Rotates a slice one element to the right,
/// moving the last element to the first one and all other elements one place
/// forward. (i.e., [0,1,2,3] -> [3,0,1,2])
fn rotate_right<T>(s: &mut [T]) {
    let len = s.len();
    let s = s.as_raw_mut();
    if len == 0 { return; }

    unsafe {
        let first = s.read(len-1);
        s[1..].copy(s[..len-1]);
        s.write(0, first);
    }
}

fn insertion_sort<T>(v: &mut [T], compare: |&T, &T| -> Ordering) {
    let len = v.len();

    // 1 <= i < len;
    for i in range(1, len) {
        // j satisfies: 0 <= j <= i;
        let mut j = i;

        // `i` is in bounds.
        let read_ptr = unsafe { v.unsafe_get(i) };

        // find where to insert, we need to do strict <,
        // rather than <=, to maintain stability.
        // 0 <= j - 1 < len, so j - 1 is in bounds.
        while j > 0 && compare(read_ptr,
                               unsafe { v.unsafe_get(j - 1) }) == Less {
            j -= 1;
        }

        // `i` and `j` are in bounds, so [j, i+1) = [j, i] is valid.
        rotate_right(unsafe { v.unsafe_slice_mut(j, i+1) });
    }
}

– we should support this style (although unsafe { v.unsafe_XYZ(..) } is ugly).

@Gankra
Copy link
Contributor Author

Gankra commented Oct 8, 2014

@reem When I drafted this up I was trying to apply the lesson's I'd learned working on collections stuff. Namely, that it's generally easier and safer to work and reason with objects in "modes". With collections we have the "iterator" mode, and with maps we now have "entry" mode.

Unfortunately this translates poorly to references and rawptrs because we don't (and probably shouldn't ever) have .as_raw() on references to get raw ptrs, so you either have to do an awkward explicit cast to "as *const _" which then needs to be wrapped in parens to chain off of, or rely on automatic coercions, which of course don't (and probably shouldn't ever) work on method calls.

I still think it would be nicer if when you have a ptr foo, you can call foo.bar() on it rather than passing it in as bar(foo). Chaining is better than nesting IMO. That said, a lot of the time you have a reference, or even value, that you want to transiently convert into a ptr to do something unsafe real-quick.

I'm leaning towards having the methods and the free fns, so that people can just do whatever is most natural. Possibly moving most if not all of the free functions to mem instead, since these really aren't ptr functions, so much as raw memory manipulation functions. ptr::swap is an awkward one, though, since it has subtly different semantics to mem::swap. mem::swap_overlapping 😩 ?

@Gankra
Copy link
Contributor Author

Gankra commented Oct 8, 2014

@arielb1 I've definitely mulled over the possibility of offering safe rotate_left, and rotate_right methods, as well as possibly some shift_*_and_replace methods on mutable slices before. Where the _and_replace methods would pop the element that would be "rotated" and inserts a new value where it would have been rotated to. This would be useful for handling the wrapping case in e.g. RingBuf. If you're interested, I could help you write up a proposal for such methods if you want. I'm a bit swamped though, and I'm not sure if the win is big enough to justify it

Otherwise, I'm not clear what you're suggesting. Every single one of your slicing operations is unchecked, and consequently riddled with unsafe's. It seems much cleaner at that point to simply say "okay, this is all unsafe" and just use raw slices. It doesn't seem very helpful to say exactly this subexpression is unsafe. But maybe that's just me?

@aturon
Copy link
Member

aturon commented Oct 8, 2014

@gankro

Unfortunately this translates poorly to references and rawptrs because we don't (and probably shouldn't ever) have .as_raw() on references to get raw ptrs, so you either have to do an awkward explicit cast to "as *const _" which then needs to be wrapped in parens to chain off of, or rely on automatic coercions, which of course don't (and probably shouldn't ever) work on method calls.

I think the main reason not to have .as_raw() as a method on references is just that it would shadow any other method with that name on the pointed-to data.

Generally, we freely coerce from &T to *const T -- it's the deref that's unsafe. So I don't see why applying this coercion when calling a method taking *const self on a &T is problematic: assuming that the method actually derefs, it has to be unsafe. But maybe I'm missing something here?

@aturon
Copy link
Member

aturon commented Oct 8, 2014

@gankro Ah, I think I understand now: you're saying that we don't (and shouldn't) automatically make the various methods of RawPtr available on arbitrary references. Definitely agree.

@Gankra
Copy link
Contributor Author

Gankra commented Oct 8, 2014

@arielb1 Also, you can just do this, if you really want:


fn insertion_sort<T>(v: &mut [T], compare: |&T, &T| -> Ordering) {
    let len = v.len();

    // 1 <= i < len;
    for i in range(1, len) {
        // j satisfies: 0 <= j <= i;
        let mut j = i;

        // `i` is in bounds.
        let read_ptr = unsafe { &v.as_raw()[i] };

        // find where to insert, we need to do strict <,
        // rather than <=, to maintain stability.
        // 0 <= j - 1 < len, so j - 1 is in bounds.
        while j > 0 && compare(read_ptr,
                               unsafe { &v.as_raw()[j - 1] }) == Less {
            j -= 1;
        }

        // `i` and `j` are in bounds, so [j, i+1) = [j, i] is valid.
        rotate_right(unsafe { &mut*v.as_raw_mut()[j, i+1] });
    }
}

Which is basically the same code-wise. Just more sigily.

@aturon
Copy link
Member

aturon commented Oct 8, 2014

@gankro

I'm leaning towards having the methods and the free fns, so that people can just do whatever is most natural. Possibly moving most if not all of the free functions to mem instead, since these really aren't ptr functions, so much as raw memory manipulation functions. ptr::swap is an awkward one, though, since it has subtly different semantics to mem::swap. mem::swap_overlapping 😩 ?

Given the above discussion, providing both free functions and methods probably makes sense (and it explains why these weren't just methods in the first place).

I believe the motivation for having the ptr module was that the operations there essentially take raw pointers as their receiver; it's like the slice module in that it provides basic operations for a built-in type. Note that, by contrast, none of the (non-deprecated) functions in mem take raw pointers.

@Gankra
Copy link
Contributor Author

Gankra commented Oct 8, 2014

@aturon

I was discussing this with @kballard yesterday on IRC. They suggested that ptr::replace should also remain as a convenience for not doing mem::replace(&mut *foo, ...), largely for semantic reasons. Something along the lines of "you're working with raw ptrs, not references".

I'm not totally convinced by this argument, but I don't have particularly strong feelings. Removing it would reduce the API surface area with minimal ergonomic loss in my mind. It might also prevent novices from thinking ptr::replace is special, or from not realizing mem::replace exists at all.

Any thoughts?

@aturon
Copy link
Member

aturon commented Oct 8, 2014

@gankro

I was discussing this with @kballard yesterday on IRC. They suggested that ptr::replace should also remain as a convenience for not doing mem::replace(&mut *foo, ...), largely for semantic reasons. Something along the lines of "you're working with raw ptrs, not references".

I'm not totally convinced by this argument, but I don't have particularly strong feelings. Removing it would reduce the API surface area with minimal ergonomic loss in my mind. It might also prevent novices from thinking ptr::replace is special, or from not realizing mem::replace exists at all.

I think I agree with @kballard on this: &mut *foo is a bit of a pain, and while providing replace in ptr does increase the API surface, if is has the same semantics and basically an equivalent signature to mem then it seems like an easy-to-remember convenience.

I think the same is true for swap, and I think it can remain in ptr with its current name. The fact that the mem version doesn't allow overlapping follows directly from its type signature, since mutable refs are unaliasable.

@Gankra
Copy link
Contributor Author

Gankra commented Oct 8, 2014

Free fns are mostly all back in the latest draft. Fleshed out some other stuff too.

@arielb1
Copy link
Contributor

arielb1 commented Oct 9, 2014

@gankro

Wrapping the operations in individual unsafe-blocks is just a style thing – I put unsafe blocks around all places where safety invariants are temporarily being violated, so each occurrence unchecked indexing (which does not violate any invariant) gets placed in its own unsafe block.

The primary point of my example is that it mostly uses unchecked indexing, rather than playing with raw pointers (other than within rotate). Raw pointers lose the aliasing guarantees, which just adds unneeded unsafety. I just noticed your proposal didn't talk about unchecked indexing, which is what we want in this case.

Having official rotations (and "rotate-through-carry"-es) would be quite nice (should I post an RFC?).

@Gankra Gankra changed the title Raw Reform RFC: Raw Reform Oct 9, 2014
@Gankra
Copy link
Contributor Author

Gankra commented Oct 9, 2014

@arielb1: You're totally right. I strongly alluded to the fact that these operations would be unchecked in the earlier sections, but completely forgot to state this in the detailed design. I've now fixed this.

I've also added your lifetime concern as a drawback to the proposal, as it is a very legitimate one!

You can post an RFC, or prototype it out on discuss if you aren't totally comfortable with your current design.

@aturon
Copy link
Member

aturon commented Oct 9, 2014

@gankro This is one reason that I think leaving unsafe_get and unsafe_mut for slices is a good idea: avoiding a bounds check is a very common case, and these allow you to do it without jumping to a raw pointer. (With the new conventions, these will likely become get_unchecked and get_unchecked_mut -- easy to remember variants of the usual get methods.)

@Gankra
Copy link
Contributor Author

Gankra commented Oct 9, 2014

@arielb1 @aturon I've added back unsafe_get and unsafe_mut.

```
trait RawSlice<T> {
/// Gets the length of the rawslice
fn len(self) -> uint;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of putting len() in RawSlice<T>, you should just implement Collection on *const [T]/*mut [T].

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think Collection is going to survive #235, although that's a bit ambiguous.

@arielb1
Copy link
Contributor

arielb1 commented Oct 9, 2014

@gankro

What about slice_unchecked and slice_mut_unchecked?

@Gankra
Copy link
Contributor Author

Gankra commented Oct 9, 2014

@arielb1 Can you clarify?

@arielb1
Copy link
Contributor

arielb1 commented Oct 10, 2014

@gankro

Just noticed that the unsafe methods on &[T] aren't really part of this RFC. I was talking about slice_unchecked(self: &[T], from: uint, len: uint) -> &[T] and a mut equivalent.

@Gankra
Copy link
Contributor Author

Gankra commented Oct 13, 2014

Egh, just got reminded of str::raw. Might want to add some stuff about raw strs or something? That seems like a much more dubious motivation though.

@Gankra
Copy link
Contributor Author

Gankra commented Oct 23, 2014

Discussed with @pcwalton the viability of adding any lang stuff for new unsafe operators and it looks like that's a total no-go for 1.0; not out of the question for later, though! If this is indeed the case, then we need a migration plan.

I propose adding the desired operators as unsafe named methods marked #[experimental], although that doesn't really work because as I understand it 1.0+ apis need to be 100% stable. 😕

@Gankra
Copy link
Contributor Author

Gankra commented Oct 28, 2014

After some discussion with @aturon we've concluded that the most interesting bits can largely be hacked out in cargo while we wait for Rust to flesh out its operator overloading story. There's some useful ideas in here for stabilization, but we can revisit that with a different RFC.

@Gankra Gankra closed this Oct 28, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants