Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

First version of the Extending Safe Mutability proposal. #78

Closed
wants to merge 2 commits into from

Conversation

matthieu-m
Copy link

Here comes an improved version of the discussion started on reddit An alternative take on memory safety (mutability/aliasing).

I went down the rabbit hole, trying to dig up the fundamentals of memory safety at the lowest level available (ie, the C Standard level), hopefully, this work can always be reused later on in the event this RFC does not pan out.

I still feel that segregating mutability from aliasing is a worthy goal in and out of itself. For an extreme example, Concurrent Containers are generally shared and yet you might want to restrict mutation to only some references.

This proposal is thus about achieving safe mutability in the presence of aliasing:

  • with the mutability clearly documented in the type system
  • and opening the way to compiler-verified safely mutable types (and their methods)

This may be seen as adding significant complexity and it certainly adds complexity to the compiler. From a user point of view, however, the segregation makes both mutability and aliasing explicit and thus more easily searchable; with support from the compiler diagnostics (reporting all useful terms) it might actually end up being both more approachable and more flexible.

Finally, the appeal of immutability may be a sufficient reason in itself: it helps drawing all those users to which immutability has been taught as The Silver Bullet(tm) of programming.


*Note: why `mutable` and not `mut`? Just so we can easily identify snippets written in the current system from those written in the proposed system.*

It is trivially provable that mutation through `&exclusive T` or `&exclusive mutable T` is safe, although we disable the former. The borrow-checker current implementation is sufficient to enforce the `exclusive` rule, and therefore we can easily convert today's code:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nit: "disallow" might be more clear than "disable" here. Also "borrow-check" should probably be "borrow-checker's"

@thestinger
Copy link

I have a much simpler transitive mutability system to propose, although I don't really want the current system to change. The important thing to take into account is that mutability and aliasing are not orthogonal concepts. The useful definition of alias used by C and LLVM refers to a memory dependency. This means two pointers alias if one can observe a write through the other. If the pointers are both read-only, they do not alias. If they're guaranteed to point to immutability memory, they alias no other pointer.

There are 3 kinds of mutability in Rust:

  • external (observed) mutability (&mut)
  • internal (inherent) mutability (Cell)
  • inherited mutability

The mut keyword only refers to inherited mutability and the other two are always permitted. The proposal to introduce a third &uniq reference type simply takes &T and allows external mutability by borrowing the unique ownership of the value. Both would continue forbid inherited mutability and permit internal mutability. The only use case is avoiding some mut markers on locals, but with by-value captures for closures I think this will be so rare that it would be silly to special-case it.

In a system where Rust had no mutability markers for local variables (which is entirely a check of the programmer's intuition), it would only have the &T and &mut T pointer types. This was the alternative proposal by @nikomatsakis, although it was mixed with a renaming of the type. There would be no difference between &uniq T and the current &mut T type without immutable locals, so lets leave the naming issue aside for now.

If you want transitive immutability, then you don't want any changes to aliasing. The system you want is a distinction between three types of mutability:

  • inherited, external and internal mutability
  • external and internal mutability
  • transitive immutability

A fully backwards compatible example is the following:

  • &, x (internal mutability for both and external mutability for variables)
  • &mut, mut x (inherited, internal and external mutability)
  • &imm, imm x (transitive immutability) along with a reintroduction of Freeze

There are no real changes required to the type system or standard libraries to support this. The compiler is already fully aware of inherited (mut), internal (Unsafe<T>) and external mutability (&mut T). The feature would do nothing more than prevent coercion from &T to &imm T or &imm borrows when the type contains Unsafe<T> or &mut T. The Freeze trait would indicate a lack of internal Unsafe<T> and &mut T for use as a bound. I don't really see the value prospect, but that's all you need to extend inherited mutability to transitive immutability.

@thestinger thestinger mentioned this pull request May 17, 2014
@lilyball
Copy link
Contributor

I agree 100% with @thestinger. I don't think the current system should change, but if it should, adding &imm seems to be the only approach that actually does what all these various proposals are trying to do (which is, to provide a way to ensure deep immutability). I'd rather not have it, though, as I believe it's an unnecessary complication.


Let us define a model to talk about how types are laid out in memory.

For interoperability reasons, Rust lays its `struct`s out like C would. We will take the assumption that its `enum`s and arrays are laid out similarly.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe we make any guarantees about how enums are laid out. There's no FFI compatibility issue there (as our enum types don't exist in C). The only guarantee we make is for C-like enums, where we guarantee that the enum is represented by an integral value with the given discriminant (and the integral type can be controlled with the #[repr] attribute).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be unclear. The point I am trying to make is that much like struct, enum are (probably) laid out without concern for their names but only for their "shape". It might not hold for Option<~T> though, which is special cased (and I don't know if this special casing is done by name or shape).

@lilyball
Copy link
Contributor

You didn't make it entirely clear as to whether you can take a &mutable reference to any type, or just to types of SafelyMutable (you defined SafelyMutable types as being mutable through &mutable references, but didn't say whether the reference itself was legal for non-SafelyMutable types). I assume it has to be legal to take a &mutable reference to any type, as that's the only way that type can mutate any contained SafelyMutable fields.

In any case, your proposal here makes it impossible to implement Rc. Specifically, Rc needs to mutate its contained Cells from the clone() method, but clone() is defined by a trait and must take &self. It cannot take &mutable self in the trait as it needs to be callable on non-mutable values. But if it takes &self then Rc can't update its reference count.

I don't see any solution to this. The only thing that makes Rc work is the ability for Cell to be mutated via a & reference. Even if you try and inline the behavior of Cell into Rc, it has the same problem (where mutation must be able to happen from within a &self method).

- no `unsafe` code in `get` and `set`
- `set` now requires a `&mutable self` parameter

*Note: `noshare` should be unnecessary now, because the aliasing of this type is tracked properly.*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

noshare cannot be inferred, because that would prevent Mutex from being written. Mutex needs to be Share, but it also needs to be SafelyMutable. It's safe to be Share because the implementation of Mutex is designed to enforce this safety.

@lilyball
Copy link
Contributor

If this proposal were implemented, I could still write a Cell-like type that allows for mutation through & pointers. I think it's possible with TLS (if you're willing to accept that the Cell is non-Send), and I think it's also possible using a static mut (using a thread-safe data structure to actually contain the values). The Cell in this case would only contain an immutable key.

@matthieu-m
Copy link
Author

The proposal does not seek to eliminate all uses of internal mutability (such as Rc), but instead to introduce mutable to document (in the type signature) visible mutability such as that of Cell.

It then goes on to try and define helpful rules to diminish the use of unsafe code, which is a somewhat orthogonal goal, but I seriously doubt that it will ever be possible to remove all instances of unsafe and therefore we need to draw the line... somewhere.

I think that as long as visible mutability is clearly documented, we have won something. If it takes unsafe to implement methods on &mutable self, so be it.

There were holes and distractions in the previous iteration, however perfection is unnecessary so let's just aim for a step forward instead. Besides, the language will continue to evolve, and it will be easier to add it than to remove from it.

Notable changes:
- removed the distraction on layouts, Rust is nominally typed so let us focus on nominal typing.
- removed the vague "do not leak references", it is much too easy to accidentally introduce holes here.
- removed the example of RefCell, and focus on POD and Cell for now.
- reorganized the sections a bit, with more examples.
- added some more unresolved questions to ponder.
@lilyball
Copy link
Contributor

This still doesn't solve the Rc problem. The only way for Rc to work is if it uses unsafe to coerce a &self pointer to &mutable self within the clone() method. And as far as I can tell, the ability to do that contradicts the entire point of this proposal (which is to disallow "hidden" mutability).

@matthieu-m
Copy link
Author

The point of the proposal is not actually to disable hidden mutable. Or not exactly.

The point of the proposal is to propose a framework for annotating operations that mutate user-visible state:

  • regardless of whether the reference is exclusive or not, which is an incident property
  • and with no regard for user-invisible state (such as the state of the actual mutex in Mutex or the reference count in Rc)
  • and with no intention of replacing fully unsafe contexts, though I do wish to make a dent here

With this proposal, MutexGuard would implement DerefExclusive (fn deref<'a>(&'a self) -> &'a exclusive T) and DerefExclusiveMut (fn deref<'a>(&'a mutable self) -> &'a exclusive mutable T). It always guarantee non-aliasing and the programmer may choose whether mutation of the internal object is allowed or not.

@lilyball
Copy link
Contributor

If this proposal essentially allows opt-in mutability annotations, but doesn't actually prohibit internal mutability through a &self pointer, then it doesn't actually appear to solve any problem. I can already provide opt-in mutability annotations on methods by giving them more descriptive names. I thought the whole point was to require mutability annotations, because only then can you actually consider &T to be deeply immutable.

@ftxqxd
Copy link
Contributor

ftxqxd commented May 17, 2014

With this proposal, would there be any downside to simply using &mutable T pointers anywhere where we’d use &T today? As far as I can tell, they only add functionality to &T (they can mutate SafelyMutable types) and don’t restrict any (they’re still aliasable). Presumably using &mutable pointers everywhere would also require making all variables mutable as well, so would it be possible to combine Niko’s proposal with this one, remove mutability and have 2 pointer types: &mutable T and &exclusive mutable T (obviously with different names; perhaps &T and &my T respectively)?

@lilyball
Copy link
Contributor

@P1start That's functionally no different than just adopting Niko's proposal, because you've just gotten rid of &T and the only real purpose to this PR #78 is to establish the difference between &T and &mutable T (which of course requires &T to exist).

@ftxqxd
Copy link
Contributor

ftxqxd commented May 17, 2014

@kballard How does that make it no different? It allows one to have multiple mutable pointers to the same location (if it’s SafelyMutable). The following code would not work under Niko’s proposal:

let x = 4;
let y = &x;
let z = &x;
*y = 5; // under Niko’s proposal this wouldn’t work
*z = 6; // … nor this

It also allows for a safe Cell implementation (which I don’t think Niko’s proposal does).

@zwarich
Copy link

zwarich commented May 17, 2014

One correction to @thestinger's comment: aliasing in C and LLVM is not just about mutability.

If I have a fragment of LLVM IR with two loads from addresses with distinct TBAA tags, followed by a comparison of the two addresses, an optimization pass is allowed to optimize the comparison even though there is nothing involved with mutability. A more realistic example would be scalar replacement. If an alloca is potentially read from by a load that may read from multiple allocas depending on control flow, then that would block a scalar replacement pass from converting the alloca to an SSA variable (unless that pass is sufficiently advanced to rewrite the code to push the load up the control flow), and this has nothing to do with mutability. Another example is points-to analysis for devirtualization, which is nontrivial even without any mutation.

Aliasing is a dynamic property of variables and expressions in a language with a semantics based on memory locations. This dynamic property can be approximated with simple static analyses (e.g. comparing offsets on loads with a common base, looking for escaping pointers, etc.) and augmented with type systems. These static approximations of aliasing can then be used to create static approximations of dependence between memory operations, but aliasing is not defined in terms of memory dependence.

Aliasing is actually more important for safety of deallocation than safety of mutation. If code iterating over a fixed-size vector of integers has multiple mutable references into the vector, then no violation of memory safety can occur unless the vector's storage is deallocated. Of course, since Rust pervasively uses destructors, mutation often goes hand-in-hand with deallocation, and the removal of const means that POD and non-POD types now share the same rules, even though it is not required for memory safety.

@lilyball
Copy link
Contributor

@P1start Ok, yes, I forgot that this proposal defined primitives as SafelyMutable. So you're right it's not exactly the same thing.

But I think that making primitives mutable like this is actually a bad thing. Today, LLVM knows that any &T pointer, where T is not Unsafe (and the vast majority of types are not Unsafe), does not alias with any mutable pointer. This allows it to optimize away redundant reads, or to reorder reads with respect to other write operations. But once T is a type that may mutate (which is to say, that's Unsafe today, or SafelyMutable with this proposal), it loses that optimization opportunity.

@zwarich
Copy link

zwarich commented May 17, 2014

@kballard Today, LLVM does not know that &T does not alias with any mutable pointer. This property can not be expressed with LLVM's existing TBAA system, even disregarding Unsafe.

@lilyball
Copy link
Contributor

@zwarich Really? Darn, I thought it did know that. That's rather unfortunate.

Still, there was talk about writing a custom LLVM pass to teach it about alias information that Rust has that it can't express to LLVM today. Presumably such a pass would then enable LLVM to make the optimizations I just described.

@ftxqxd
Copy link
Contributor

ftxqxd commented May 17, 2014

@kballard Yes, I now see that it does seem somehow wrong to have almost no assertions about mutability anywhere. Although I’d probably personally find it easier to code with, all pointers being mutable seems like numerous optimisations could be made impossible, and just knowing that a reference or variable will not change can be useful when writing code.

@thestinger
Copy link

@matthieu-m: The mutation performed during clone for Rc is essentially user-visible. It can cause the destructor (fn drop(&mut self) { println!("side effect") }) of the contained value to be run later.

@thestinger
Copy link

@zwarich: That may apply to TBAA metadata, but I expect that we would be using a custom pass where NoAlias is documented as being entirely about memory dependency. The same set of rules is used for noalias parameters and return values, which we're already using.

@zwarich
Copy link

zwarich commented May 18, 2014

@thestinger There are other aliasing rules in the LLVM LangRef, e.g. noalias pointers can't alias pointers that are not derived from them, global variables and allocas don't alias other memory, etc. You are correct in asserting that despite the name, LLVM's AliasAnalysis interface isn't really an alias analysis interface; it's a memory dependence interface. We would probably need a custom system of TBAA metadata on memory operations, a replacement for the BasicAlias pass, and some modifications to the inliner (to ensure that the must-not-alias constraints of a &mut inlined twice don't overlap). Hopefully we can ride on support for the existing TBAA metadata so that existing passes that copy that metadata will just copy ours, and the inliner will be the only existing pass we need to modify.

@matthieu-m The compiler doesn't need to analyze T in depth in order to determine whether &T aliases another mutable reference. Any load or store that violates this aliasing rule will have to occur behind an unsafe block, and the compiler will just have to mark those memory operations as potentially aliasing anything.

There is a bit of a problem with this approach: since transitive unsafety is quite pervasive (so many basic data structures are implemented in terms of unsafe), many operations will ultimately expand to an inlined unsafe block. At the moment, the compiler has to basically assume that unsafe code can do anything, a raw pointer used in an unsafe block has to be assumed to alias any other pointer. The only way to improve this situation would be to have some way to distinguish between type-unsafe and lifetime-unsafe. A lifetime-unsafe use of a T* would still be seen as only aliasing other pointers to T, whereas a type-unsafe use of T* would be seen as potentially aliasing any other pointer.

@huon What is gained by making modifications through &T in an unsafe block be undefined behavior?


## 1. Defining safe mutation

Rust is nominally typed, and therefore it seems dangerous to allow a `&A` to point to an instance of anything else than a `A`. Not only methods could be called that would be unexpected, but it would possibly confuse the Type-Based Alias Analysis in LLVM.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rust doesn't really make any guarantees about this right now.

@thestinger
Copy link

@zwarich: The current rules require that internal mutability happens through an Unsafe<T> wrapper type, so the compiler is aware of types where unsafe is used for that purpose.

@zwarich
Copy link

zwarich commented May 18, 2014

@thestinger: That rule is frequently violated in Rust and Servo code, so I consider it more wishful thinking at the moment. I'm also not entirely convinced it's sound, although it's difficult to say since it's never even been precisely stated.

@huonw
Copy link
Member

huonw commented May 18, 2014

The statement is exactly that: the only legal way to mutate aliased data is by using the Unsafe type to perform the mutation.

RefCell, Cell, Mutex, RWLock and the atomics all use Unsafe internally. (There's only a few remaining instances of non-Unsafe mutation in Rust; at least, I removed the obvious ones when I removed the old transmute_mut function and, iirc, the only ones I didn't migrate to Unsafe were those where the & references didn't alias with anything. I left fixmes on them too.)

The advantage of this rule is &T references are known to be NoAlias except if T contains Unsafe.

@nrc
Copy link
Member

nrc commented May 18, 2014

@huonw by 'legal', do you mean by convention or that the compiler enforces it?

@huonw
Copy link
Member

huonw commented May 18, 2014

I mean: not using Unsafe is undefined behaviour (in the technical sense, like dereferencing null or signed integer overflow in C). That is, the compiler is allowed to assume that no-one ever mutates behind & without using Unsafe and is allowed to optimise based on it never happening, which may break any program that does invoke the bad behaviour.

@thestinger
Copy link

@zwarich: The manual has stated that this is undefined behaviour for quite some time. The standard library types with inherited mutability used to rely on a #[no_freeze] marker, followed by NoFreeze and now Unsafe<T>. If Servo is relying on this, I think that's an issue for Servo to fix rather than one rustc has to be concerned with.

@rkjnsn
Copy link
Contributor

rkjnsn commented May 21, 2014

I'm pretty new to Rust, but I thought I'd drop my two cents.

To me, uniqueness and mutability seem to be distinct, though related, things, and expressing them independently, as in this proposal, makes the most intuitive sense.

Uniqueness (or exclusiveness) is a property of the reference, is strongly enforced by the compiler, is required to provide compile-time memory safety, and is one of the features that makes Rust so exciting.

Mutability, on the other hand, I feel is more an expression of programmer intent, and helps with reading and reasoning about code. Having it in the type system is not required for safety, but including it allows the compiler to catch programmer mistakes, which is nice. (E.g., you tried to mutate a value you said you wouldn't (error), or you didn't mutate a value you said you would (warning/lint)).

Because mutability would be a statement of programmer intent, it makes sense to me that it would be concerned with "logical" mutability. In other words, any mutation to a non-mutable object would be limited to private members and would not be readily apparent to the user of the object. This would allow an implementation to be changed to, e.g., use caching for better performance without changing the interface.

While the additional possibilities for reference types introduced by this proposal can be seen as adding complexity, I feel separating exclusivity and mutability actually makes things conceptually simpler and easier to reason about. For example, changing the value in a Cell would require mutability but not exclusiveness, which I find easier to understand/explain than the current situation. I also know that a non-mutable (exclusive or not) reference shouldn't change in a way I have to worry about as a consumer of the type. I think all four combinations are useful and worth having.

All that said, I'm not as convinced about SafelyMutable. It seems like a fair amount of complexity, and I'm not sure what it actually gets you. It seems to have a lot of overlap with the capabilities of Cell.

@brson
Copy link
Contributor

brson commented Jun 19, 2014

Discussed at https://github.com/rust-lang/rust/wiki/Meeting-RFC-triage-2014-06-19. This is a large and complex proposal, containing major changes to language semantics. At this stage in development we don't want to consider such a major overhaul. Closing. Thank you.

@brson brson closed this Jun 19, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants