Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Improve the Send trait. #458

Merged
merged 4 commits into from
Feb 13, 2015
Merged

Conversation

pythonesque
Copy link
Contributor

Rendered view.

This RFC proposes extending the Send trait in some relatively small but backwards incompatible ways, to allow more safe behaviors to be exposed in Rust's type system. In particular, this involves removing the 'static bound from Send in a way that preserves thread safety.

@Ericson2314
Copy link
Contributor

Hmm, this sounds very nice! What does your fork/spawn signature look like (to ensure the child thread exits before the parent thread exits the lifetime the shared data is defined in)?

@pythonesque
Copy link
Contributor Author

@Ericson2314 It would be superficially similar to @nikomatsakis's existing rayon library, but it would likely use a Send bound instead of a Sync bound (for the reasons outlined in the RFC). There are some subtleties around task failure but they can be resolved with clever Drop trait implementations (I believe Servo does this currently).

@Ericson2314
Copy link
Contributor

Cool, I did not know about that library. So under your proposal, Future itself might as well take a lifetime so your code or Rayon's would not need the transmute?

@pythonesque
Copy link
Contributor Author

I'm not sure any code in the standard library is safe to use as is in a fork join context, but possibly. The RFC is more about making sure Rust the language supports such libraries properly in the future.

@Ericson2314
Copy link
Contributor

Gotcha. Sounds good either way.

@pythonesque
Copy link
Contributor Author

A user in IRC just ran into a situation where he or she wasn't able to box a type with a reference in it as an Error trait object because it requires a Send bound. The type would have been correct under the proposed system.

This is not something I mentioned in the issue, since I think it's a side benefit, but in general I think this provides some evidence that it's useful to be able to specify Send on types that aren't necessarily 'static, even if there's no intention of actually sending them between threads, since it is a bound that some libraries specify in order to be (potentially) threadsafe. Because any actual thread safety violations would still be caught when you tried to move a type into a thread without the appropriate lifetime, I suspect that with this change it would not as critical / useful as it currently is for these APIs to require 'static in addition to Send.


Another important note is that with this definition, ```Send``` will fulfill the proposed role of ```Sync``` in a fork-join concurrency library. At present, to use ```Sync``` in a fork-join library one must make the implicit assumption that if ```T``` is ```Sync```, ```T``` is ```Send```. One might be tempted to codify this by making ```Sync``` a subtype of ```Send```. Unfortunately, this is not always the case, though it should be most of the time. A type can be created with ```&mut``` methods that are not thread safe, but no ```&```-methods that are not thread safe. An example would be a version of ```Rc``` with a ```clone_mut()``` method that took ```&mut self``` and no other ```clone()``` method. It could be thread-safely shared provided that a ```&mut``` reference was not sent to another thread, since then it could only be cloned in its original thread and could not be dropped while shared (hence, it is ```Sync```), but a mutable reference could not, nor could it be moved into another thread (hence, it is not ```Send```). However, because ```&T``` is Send if ```T``` is Sync (per the new definition), adding a ```Send``` bound will guarantee that only shared pointers of this type are moved between threads.

Thirdly, we'd add an ```Own``` trait as specified above. This would be used mostly as a convenience in user code for the current cases where ```Send``` is being used as a proxy for ```Send + 'static```. We would probably still want to use ```Send + 'static``` as an explicit bound in most cases, just in case a user implemented ```Own``` but not ```Send```.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If Send + 'static is used as the explicit bound in most cases, where would Own be used? (I.e. is it useful?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Own wasn't part of the original proposal. I added it after Niko suggested it might be useful for, e.g., where Box<Send> is used today, but as I commented earlier today I'm not sure how useful this really is. I don't really see any harm in its existence either way.

@huonw
Copy link
Member

huonw commented Nov 13, 2014

Metapoint: you can use a single backtick for inline code foo which makes reading the raw markdown easier.

@pythonesque
Copy link
Contributor Author

I just played around with rustc a bit and you actually can get &'static mut values in static items:

static FOO: [uint, .. 2] = [0, 0];
static BAR: &'static mut [uint, .. 2] = & FOO;
static BAZ: &'static mut [uint, .. 2] = & FOO;
println!("{} == {}", BAR as *mut _, BAZ as *mut _);

I think this is actually a bug, since we have aliased &muts here (seemingly, anyway). On the other hand I was not able to figure out a way to actually mutate them or pass them to something else as mutable references. If it turns out that this is safe, it still doesn't really affect the correctness of the statement that safe types currently satisfying Send will satisfy Send + 'static under the new scheme (all three of these statics would) but that this is allowed at all surprises me.

@huonw
Copy link
Member

huonw commented Nov 14, 2014

I filed rust-lang/rust#18939 about part of that bug.

guaranteed to be threadsafe by Rust's type system.

Another important note is that with this definition, `Send` will fulfill the proposed role of `Sync` in a fork-join concurrency library. At present, to use `Sync` in a fork-join library one must make the implicit assumption that if `T` is `Sync`, `T` is `Send`. One might be tempted to codify this by making `Sync` a subtype of `Send`. Unfortunately, this is not always the case, though it should be most of the time. A type can be created with `&mut` methods that are not thread safe, but no `&`-methods that are not thread safe. An example would be a version of `Rc` with a `clone_mut()` method that took `&mut self` and no other `clone()` method. It could be thread-safely shared provided that a `&mut` reference was not sent to another thread, since then it could only be cloned in its original thread and could not be dropped while shared (hence, it is `Sync`), but a mutable reference could not, nor could it be moved into another thread (hence, it is not `Send`). However, because `&T` is Send if `T` is Sync (per the new definition), adding a `Send` bound will guarantee that only shared pointers of this type are moved between threads.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I've now managed to parse this correctly, but perhaps it could be reworded for greater clarity. In particular I was having trouble with "but a mutable reference could not" - could not what? (Be thread-safely shared.)

And "It could be thread-safely shared provided that a &mut reference was not sent to another thread" - but what if one was? (The answer is apparently that it couldn't have been, because &mut T is not Send. Again, this was stated, but it took me a while to understand.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify what's being asked here--do you want clarification on what this means from me, or just a change of wording in the RFC?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If my understanding (in parentheses) is correct, then just a change of wording.

@glaebhoerl
Copy link
Contributor

This seems like a worthwhile generalization as far as I understand it, but as noted above I don't understand it all the way to the finer points.

Could you try to give the one-sentence descriptions of what Send and Sync would mean under this plan, and in particular what the difference is? (It seems like in many places which one is appropriate comes down to nuance.)


Another important note is that with this definition, `Send` will fulfill the proposed role of `Sync` in a fork-join concurrency library. At present, to use `Sync` in a fork-join library one must make the implicit assumption that if `T` is `Sync`, `T` is `Send`. One might be tempted to codify this by making `Sync` a subtype of `Send`. Unfortunately, this is not always the case, though it should be most of the time. A type can be created with `&mut` methods that are not thread safe, but no `&`-methods that are not thread safe. An example would be a version of `Rc` with a `clone_mut()` method that took `&mut self` and no other `clone()` method. It could be thread-safely shared provided that a `&mut` reference was not sent to another thread, since then it could only be cloned in its original thread and could not be dropped while shared (hence, it is `Sync`), but a mutable reference could not, nor could it be moved into another thread (hence, it is not `Send`). However, because `&T` is Send if `T` is Sync (per the new definition), adding a `Send` bound will guarantee that only shared pointers of this type are moved between threads.

Thirdly, we'd add an `Own` trait as specified above. This would be used mostly as a convenience in user code for the current cases where `Send` is being used as a proxy for `Send + 'static`. We would probably still want to use `Send + 'static` as an explicit bound in most cases, just in case a user implemented `Own` but not `Send`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is that possible?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not actually sure whether it's possible or not--I might have meant to switch this (that is, it is possible that a user could have Send implemented, but not Own, not the other way around, in which case we'd still want to allow it to be sent across threads).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If Own is just defined literally as a synonym for Send + 'static, i.e.

trait Own: Send + 'static { }
impl<T: Send + 'static> Own for T { }

then I don't think either issue arises. (This is basically the definition from the RFC, but I'm not sure why we'd need unsafe?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're right and we don't need unsafe. That definition went through several iterations. I'll adjust the RFC. However, I still don't think we want to use Own as an explicit bound, because it is still possible that someone could deliberately write a negative impl for Own (I'm not sure why you would though, so maybe that's not really a case worth considering).

On second thoughts, we actually aren't using a default trait here so we can't write a negative impl. So I think you're right and this is just a nonissue. Maybe then Own would have some convenience in library types?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's what I would hope.

@pythonesque
Copy link
Contributor Author

Could you try to give the one-sentence descriptions of what Send and Sync would mean under this plan, and in particular what the difference is? (It seems like in many places which one is appropriate comes down to nuance.)

The differences are precisely stated in the introductory section, but essentially the meanings would be identical to what they are now--Send means a type T can be safely moved into a new thread, and Sync means &T can be safely moved into a new thread. The difference comes down to which types are automatically derived to be Send for threads with non-'static lifetimes (since the 'static bound has been removed) and the explanations for that take more than one line.

I was trying to look for a simple way to define Send (since I agree it's intuitively confusing, both now and with the new changes), and thought I had found one that was nicely analogous with Sync (T is Send iff &mut T can be safely moved into a new thread) but arielb1 pointed out some corner cases that would not satisfy this definition, which is what the "unusual types" section is about. That said, for most purposes you can think about it that way.

@glaebhoerl
Copy link
Contributor

I think I get it now, thanks.

fn xyz_play(s: *mut Xyz);
}

pub struct Xyz(marker::NoCopy);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this type, as written, is actually unsafe, as mem::swap can be applied to two instances of it. However, if there were some sort of marker::NoSized, this example should be safe.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just went through the original logs and this is still safe in this particular case since all the markers are non sized (I think).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

struct Xyz is zero-sized. You can swap around instances of it as much as you want – it won't do anything.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

* can still be used to get a Xyz rvalue here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In some sense, but in safe code it can't be copied, created (except via xyz_create()), moved out of, or dropped, so you can't actually do anything with it.

@Manishearth
Copy link
Member

The proposed RcMut would be quite useful for sending pinned DOM objects across tasks (only to be reused in the originating task)

@glaebhoerl
Copy link
Contributor

Hmm, I wonder if Send roughly corresponds to Qt's notion of "reentrant", and Sync to its notion of "thread safe"?

(I'm sure other libraries like std and boost have their own definitions, but Qt is what I'm more familiar with.)

@arielb1
Copy link
Contributor

arielb1 commented Nov 30, 2014

@glaebhoerl

Send doesn't directly correspond to reëntrancy, because reëntrancy traditionally applies on a per-context basis while Send works on a per-struct basis (e.g., it is unwise to concurrently access 2 XML elements that belong the same document, so they aren't Send, but it is completely fine to concurrently access elements from 2 different documents, so they are reëntrant).

However, Sync does seem to match to thread-safety well.

@glaebhoerl
Copy link
Contributor

Oh, right. Is there anything in safe Rust that's not reentrant, then?

@sfackler
Copy link
Member

You could create a non-reentrant function by using a static AtomicInt internally or something, but it'd be pretty silly.

```rust
impl<'a, T> !Send for &'a T {}

unsafe impl<'a, T> Send for &'a T where T: Sync + 'a {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this change also add corresponding impls for &mut T ? (Seeing how you can't send non-'static &mut T today either)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way default trait implementations work, Send will automatically be implemented for &mut T unless it explicitly opts out, so I did not include that in the pseudocode.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, so &mut T would just always implement Send, as opposed to today?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, subject to the rules outlined in the RFC (the default implementation only exists if T is already Send, since that's how default traits are proposed to work).

@nikomatsakis
Copy link
Contributor

So I gave some thought to this RFC. As you know I am in agreement that
factoring the lifetime bound out of Send is a cleaner approach and
more forward looking to the possibility of time-bound threads.

I have some comments. I'll try to separate them. This comment is
focused on the definition of Send and Sync and how they are
related. I think that we can still have Sync imply Send
without losing any generality. For example, the RcMut example
of having a ref-count that is only manipulated on the main task
can be addressed using a newtype wrapper (in fact, we can wrap
normal Rc values without any need for the RcMut infrastructure):

pub struct RcRef<'x,T> {
    r: &'x Rc<T> // private
}

// RcRef is safe to send/share and so forth
// so long as T is synchronous. This is because
// its API does not permit any action that would
// affect the ref-count of the inner `Rc`.
impl<'x,T:Sync> Send for RcRef<'x,T> { }
impl<'x,T:Sync> Sync for RcRef<'x,T> { }

// We can even copy/clone the `RcRef`,
// which still will not touch the reference count.
impl<'x,T> Copy for RcRef<'x,T> { }
impl<'x,T> Clone for RcRef<'x,T> {
    fn clone(&self) -> RcRef<'x,T> { *self }
}

impl Deref<T> for RcRef<'x,T> {
    fn deref(&self) -> &T {
        &*self.r
    }
}

So basically I think that while the RcMut type might indeed be a
case where Sync does not imply Send, the pattern you were
describing can still be expressed and hence we lose nothing. (Using a
newtype is arguably cleaner anyhow; it's a pretty subtle invariant
we're talking about.) I can't quite figure out what the Xyz example
is supposed to be saying, but from talking to @arielb1 on IRC it seems
like it doesn't create any problems for having Sync imply Send.

In talking things over with @aturon and @alexcrichton, we came to the
conclusion that we would like to define Send and Sync as follows:

Send : Immutable OR Synchronized OR Unique
Sync : Immutable OR Synchronized

Basically, you can implement Send if your type T maintains any one
of these three properties:

  • Immutable: memory reachable from T has no interior mutability
    (e.g., int, Vec<int>).
  • Synchronized: memory reachable from T permits only mutations
    synchronized with respect to all reads (e.g., AtomicInt,
    Arc<U>).
  • Unique: memory reachable from T is not reachable from elsewhere
    (e.g., Cell<int> but not Arc<T> or Rc<T>)

In all cases these are interior properties. So for example a
Vec<int> is immutable because the vector itself is not mutable
unless placed in a mutable slot. Similarly, a Cell<int> is unique
because to alias it you must put it into an aliasable type (like
&Cell<int> or Rc<int>).

Note that different bits of reachable memory can be justified under
different properties. So if I have a Arc<Vec<int>>, the Arc itself
is synchronized, but the Vec<int> is Sync because it is
immutable (and unique, actually).

It's nice that when you define a type you can decide whether it is
sendable and then, for some subset of those types, they are also
sharable between threads (Sync).

These definitions seem like things that data structure authors will be
able to reason about. They also tie in with Rust's general story about
aliasing and mutability being the root of all danger; when you
transition over into parallelism, you also have synchronization to
think about (in a sequential setting, of course, synchronization is
always maintained).

Anyway, what do you think of that?

@nikomatsakis
Copy link
Contributor

I realize now that my definitions were slightly off in that they fail to account for the API. That is, memory which is reachable but not accessible through the API doesn't count.

@nikomatsakis
Copy link
Contributor

@arielb1 yes. Also, I was thinking more and I also realized that while what I wrote is true regarding the ability to express a pattern like Rc, it is also true that the original RcMut is more composable. That is, structures built on RcMut can be used both from the main thread and shared amongst threads, whereas my newtype wrapper requires creating distinct "views" for sharing (presuming that the reference counting thing you want to share is embedded within something else).

@nikomatsakis
Copy link
Contributor

(That said, I think that "values reachable from" is still the right basis for a definition, but as I wrote above it really has to take into account the effects of operations that can be performed on those values.)

@nikomatsakis
Copy link
Contributor

But really what I'm most concerned with is whether Sync => Send is a good idea or not. I imagine we can have several definitions for Send/Sync ranging from less to more precise.

@nikomatsakis
Copy link
Contributor

There was one other thing that @aturon and @alexcrichton and I were talking about. We were thinking that in the standard library, at least, the Own trait would see very little (or no) use. Spawning a thread would require a F type bounded by FnOnce()+Send+'static, however most of the other types (ports, ARCs, queues, etc) would just require Send or Sync.

However one place where Own would be really useful is in object types. It'd be a drag to have to write something like Box<Writer+Send+'static> in place of Box<Writer+Send>. Box<Writer+Own> is ok but another possible solution here would be to adjust the defaults so that Box<Trait> defaults to 'static when no other bound can be derived. (Similarly, &'a Trait would default to &'a Trait+'a.) This really requires an RFC of its own. I've been going through scenarios for such defaults for a while now and I think I've reached the conclusion that this is the least bad. Regardless of the details, if we did adopt such a system, we could avoid having an Own trait -- this also avoids the need of bikeshedding the name.

Anyway, the question then is, beyond object types, what other places are there that might want Own vs Send?

@pythonesque
Copy link
Contributor Author

As you said, RcMut is easier to embed in another structure than Rc plus a wrapper. My current preference would be to leave that option available, even though I realize that complicates the rather lovely definition you gave above (and even though I was originally the one who wanted Sync => Send). The good news is that I think the number of somewhat useful types like RcMut is probably really small, so I doubt it will come up much in practice. That being said, I'm capable of being convinced that a wrapper is the way to go.

As far as Own goes, I honestly can't think of many uses for it outside of a few places that spawn threads in the standard library and object types. That was why I originally didn't include such a trait and added it after you mentioned the object types case. Perhaps someone else can think of some examples?

@pythonesque
Copy link
Contributor Author

Also, as a side note: it seems to me that the "immutable" in "immutable OR synchronized" is really redundant, because if something is immutable clearly all mutation of it satisfies any property you might choose. But I may be missing something there.

Edit: Oh right, "immutable" in this context doesn't precisely mean "immutable", but "lacks interior mutability." So never mind :)

@Ericson2314
Copy link
Contributor

I mentioned Futures earlier in this thread. Upon further reflection I think that just about all forking, spawning, and data parallelism can be done in terms of them.

impl<'a, A, F> Future<'a, A> where F: FnOnce(): Send + 'a -> A {
    /// Dropping the future thus constitutes a join
    fn spawn(blk: F) -> Future<'a, A>;

    ...
}

impl Future<'static, ()> {
   /// Drop the future without blocking on the child thread's termination
   fn forget(self);
}

spawn<'static, A> makes Futures work like today, and additionaly A = () plus a call to forget makes (thread) spawn.

@krdln
Copy link
Contributor

krdln commented Dec 14, 2014

@nikomatsakis I am opposing Send → Sync, because I don't see what bound would you give for T in following function:

fn parallel_inplace_map<T: ?, Closure: Fn(&mut T) + Sync>(closure: &Closure, &mut[T]);

I guess T: Sync isn't a good idea, because it would even prevent T from being Vec<int>. That would work for T: Send, but from my understanding it would still forbid eg. Vec<&str> and almost any type containing references. I wish for Send to be "defined" as maximal safe bound that you could use for T in this example.

And speaking of RcRef example, I think that allowing programmer to impl arbitrary bound (Send or Sync) for a type without using unsafe or markers (or any similar construct) is just unsafe and denies the purpose of such bounds.

Edit: Sorry for my confusion.

@arielb1
Copy link
Contributor

arielb1 commented Dec 14, 2014

@krdin

We are talking about Sync → Send, not Send → Sync (which is clearly false, e.g. because of Cell), and the bound you want is probably T: Sync, which Vec<int> and Vec<&str> certainly are (generally, "data structures" – as opposed to "objects" – are Sync and Send when their contents are).

@aturon
Copy link
Member

aturon commented Dec 29, 2014

@pythonesque Btw, I wanted to let you know that I'll be helping "shepherd" this RFC. The core team is extremely overloaded right now trying to get together the alpha (in two weeks), but I'm hoping we can make progress on the backwards-incompatible changes before then. I will be in touch again soon.

@pythonesque
Copy link
Contributor Author

_Edit:_ Don't mind me, pretty sure everything I said in this comment was the product of a brain fart :)

@aturon
Copy link
Member

aturon commented Jan 1, 2015

@pythonesque BTW at this stage it's looking unlikely that we'll be ready to make a change before alpha (Jan 9), but I will avoid stabilizing affected APIs. We'll need to come to a decision soon after alpha.

@nikomatsakis
Copy link
Contributor

@pythonesque can you adjust the RFC to remove mention of the Own trait? I think #599 offers a better alternative there.

@aturon
Copy link
Member

aturon commented Feb 13, 2015

I'm happy to report that this RFC has been merged a long last! The response has long been positive, and many have been looking forward to this change. Landing the RFC was previously blocked on default lifetime bounds, but the relevant RFC has been accepted and the change should be merged soon.

Implementing this change should be relatively straightforward, but the audit will take some time.

Tracking issue

huonw added a commit to huonw/rust that referenced this pull request Feb 18, 2015
Previously Send was defined as `trait Send: 'static {}`. As detailed in
rust-lang/rfcs#458, the `'static` bound is not
actually necessary for safety, we can use lifetimes to enforce that more
flexibly.

`unsafe` code that was previously relying on `Send` to insert a
`'static` bound now may allow incorrect patterns, and so should be
audited (a quick way to ensure safety immediately and postpone the audit
is to add an explicit `'static` bound to any uses of the `Send` type).

cc rust-lang#22251.
@Centril Centril added A-typesystem Type system related proposals & ideas A-sync Synchronization related proposals & ideas labels Nov 23, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sync Synchronization related proposals & ideas A-typesystem Type system related proposals & ideas
Projects
None yet
Development

Successfully merging this pull request may close these issues.