RFC for an operator to take a raw reference #2582
I've not been following Rust closely enough lately to judge this RFC from the technical side, but from the usability/ergonomics side, I think we should hide these semantics behind a magic function and/or macro, then tell people to use the macro. I think it's a bad idea to have a language construct where whether or not part of an expression is stored in a variable turns defined behavior into undefined behavior. In this case I understand why we want these semantics. What I'm saying is: do both, put the semantics in the Rustonomicon or somewhere else suitably advanced, and direct people to the macro instead.
MIR is not a part of the language, and therefore this does not require an RFC, afaik. It's simply a way of encoding the semantics we want into the compiler.
@ubsan From my reading of this RFC, this affects Rust's abstract machine (in the sense that it makes some things defined behavior and some things explicitly UB...) and makes certain things that weren't hard errors into hard errors. Thus, it requires an RFC.
@Centril I disagree - it was always the case that we wanted
I might not be able to keep track of the discussion, but I should at least note down two things I thought of:

    let r = &*p;
    let p2 = r as *const _;
    let p3 = r as *const _;

    let p2 = &*p as *const _;
    let p3 = &*p as *const _;

A more realistic scenario that this could enable is

We might even be able to relax this so other operations involving the reference are also allowed, but I haven't fully considered the implications.
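To make the two snippets concrete, here is a minimal sketch, assuming `p` is valid and using the stable `std::ptr::addr_of!` macro as the spelling of the raw-borrow operation (on Rust 1.82 and later the same thing can be written `&raw const *p`). The function names are hypothetical. In the first function the reference `r` exists and is cast twice; in the second, each pointer is taken directly from the place `*p`, so no `&i32` is ever created.

```rust
use std::ptr;

/// Hypothetical helper mirroring the first snippet: one reference `r`, cast twice.
/// Creating `r` asserts that `*p` is valid for a `&i32` (non-null, aligned, initialized).
pub fn casts_through_reference(p: *const i32) -> (*const i32, *const i32) {
    // SAFETY (assumed): `p` points to a live, aligned, initialized i32.
    let r: &i32 = unsafe { &*p };
    let p2 = r as *const i32;
    let p3 = r as *const i32;
    (p2, p3)
}

/// Hypothetical helper: the same two pointers, each taken with a raw borrow,
/// so no intermediate `&i32` is created.
pub fn raw_borrows(p: *const i32) -> (*const i32, *const i32) {
    // SAFETY (assumed): `p` is derived from a live allocation; only the address is computed.
    let p2 = unsafe { ptr::addr_of!(*p) };
    let p3 = unsafe { ptr::addr_of!(*p) };
    (p2, p3)
}
```

For a valid pointer both functions return the same addresses; the difference lies only in which validity requirements are asserted along the way.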
I do believe an RFC is appropriate here -- but perhaps I mistake the purpose of the RFC. I presumed that the RFC is more about the surface Rust syntax that generates this new MIR operator than the operator itself. It seems to me that there are four main alternatives here:
Of these, I currently prefer the first. @RalfJung, am I correct that this would leave us room in the future to extend the rules to more cases (e.g., if coerced etc.)? One open question for me is how best to manage the design process here. This applies to all the efforts of the Unsafe Code Guidelines work, in my view. I will leave a separate comment on how I think that should work, actually.
Regarding the meta question of how the UCG group should be approaching RFCs. The challenge is that we have a lot of "little pieces" that have to be put together that affect one another. I don't think we should be working out a huge "all or nothing" proposal, but I also think it's hard to reason about random RFCs for one tiny piece without seeing the whole picture. I think I'd like to compromise somehow but having
I'm not entirely sure where these RFCs should be opened. I'd like to try out the "phase RFC" procedure, which I think basically implies that we are opening up RFCs on this repo as we move between "phases" -- so this RFC in particular might correspond to moving from the "proposal" (née spitballing) phase to "prototyping" -- so it's not really a final decision. But anyway I suppose this isn't the right forum for this comment. Perhaps I should open an internals thread to talk about it. I think it'd be good to figure this out, though, as it seems relevant to a lot of things the UCG team will be producing.
I would say the best thing would probably be that the result of the

    fn foo() -> &T {
        &<invalid lvalue expression>
    }

    let x = &<invalid lvalue expression>;

    fn bar(x: &T) {}
    bar(&<invalid lvalue expression>); // note: this is UB at the call site, not in bar

    &<invalid lvalue expression>;
    // this is equivalent to `let __tmp = &t; drop(__tmp);`

    &<invalid lvalue expression> as &T as *const T
    // this creates a value of reference type with the `as &T`
This RFC is mostly about changing/specifying those semantics. MIR is a good way to do that, as it provides a much clearer language to talk about what happens when a Rust program gets executed than if we tried to do the same thing in the surface language. This is not the first RFC to talk about MIR or use MIR as means of specification, either.
Which RFC/reference is saying that?
I do not understand. This RFC is not primarily about the "cast" part of "take-ref-then-cast", it is about the "take-ref" part. Or are you saying that we should also compile
This is about defining the semantics, which starts with the syntax.
This RFC is first and foremost about adding such a primitive operation to the MIR -- acknowledging the need for it. Secondly, it is about how the user can write code that ends up generating this operation. It would certainly be easy to later add more cases where we take a raw reference instead of a safe one, that can only make more code defined.
I felt this is somewhat different from the kind of "let us decide about this type's layout/invariant"-RFC that we (UCG) will likely be producing eventually: The invariant this refers to is already encoded in rustc, and the RFC proposes to add a new ingredient to our "toolbox for defining surface language semantics", namely taking a raw borrow.
The RFC and I agree these should all be UB. More interesting would be some examples you think should not be UB that the RFC leaves UB.
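One example where a reference is a problem but a raw borrow is fine is a field of a `#[repr(packed)]` struct: the field may be misaligned, so `&s.value` cannot be a valid `&u32` (recent compilers reject it outright). A sketch, with a hypothetical `Packed` type and `read_value` helper, using `std::ptr::addr_of!` (the macro form of `&raw const`) together with `read_unaligned`:

```rust
use std::ptr;

/// Hypothetical wire-format type; `packed` places `value` at offset 1,
/// which is misaligned for a `u32`.
#[repr(C, packed)]
pub struct Packed {
    pub tag: u8,
    pub value: u32,
}

pub fn read_value(p: &Packed) -> u32 {
    // `&p.value` would be a misaligned `&u32`.
    // The raw borrow only computes the field's address; no reference is created.
    let raw: *const u32 = ptr::addr_of!(p.value);
    // SAFETY: `raw` points to 4 initialized bytes inside `*p`,
    // and `read_unaligned` does not require alignment.
    unsafe { raw.read_unaligned() }
}
```

Reading `s.value` by value (a copy) is fine; only taking a reference to the misaligned field is the problem.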
Another possibility, just so that it doesn't go unmentioned: Currently the This might be a way to codify the "result of the It's not obvious to me whether there would be backwards compatibility edge cases to worry about. (To be clear, this neither would nor could entirely replace the reference-to-pointer coercion, as that applies to any expression, whereas this only applies to literals.)
This seems really important to have. I even have a crate which already relies on this behaviour. One thing I am concerned about is the implicitness - it would be very easy for someone not familiar with this corner case to come along and accidentally refactor the code in a way which "broke" the guarantee, even with changes that would in most languages be semantically identical. As a way to solve this, would it make sense to relax the constraint on values being initialised if they are only used in this way? E.g. start allowing:

    fn main() {
        let x: u32;
        let y = &x as *const _;
        println!("{:?}", y);
    }

This would allow writing code that would turn into a hard error if you somehow tried to actually do something with the value, and would also make it possible to calculate field offsets from completely safe code: we would no longer have to do hacks with
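The field-offset use case can be sketched with a raw borrow into a `MaybeUninit`, which avoids both reading uninitialized memory and creating a reference to it. Unlike the proposal above, this still needs `unsafe` (it dereferences a raw pointer). The `Header` type and `offset_of_len` function are hypothetical; newer Rust versions also provide `std::mem::offset_of!` for exactly this purpose.

```rust
use std::mem::MaybeUninit;
use std::ptr;

/// Hypothetical example type; `repr(C)` fixes field order and padding.
#[repr(C)]
pub struct Header {
    pub magic: u16,
    pub len: u32,
    pub flags: u8,
}

/// Offset of `Header::len`, computed without reading or referencing
/// any (uninitialized) field.
pub fn offset_of_len() -> usize {
    let uninit = MaybeUninit::<Header>::uninit();
    let base: *const Header = uninit.as_ptr();
    // SAFETY: `base` points to a live (if uninitialized) `Header`;
    // `addr_of!` computes the field address without creating a `&u32`.
    let field: *const u32 = unsafe { ptr::addr_of!((*base).len) };
    field as usize - base as usize
}
```

With `repr(C)`, `magic` occupies bytes 0..2, two bytes of padding follow, and `len` starts at offset 4.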
But the only cases where we'd not treat a borrow as being of a reference type are those where it's only ever used as a raw pointer, in my value-use-based model. Anything else (including calling a function with the reference as an argument) would still impose reference requirements. I actually think @glaebhoerl's formulation of a polymorphic borrow operator is equivalent to mine in behavior, but it might be easier to implement it as an analysis in order to construct the MIR (as I've suggested) instead of during type-checking.
@RalfJung yeah, alright, that makes sense. I don't believe there are any examples on which the RFC and I disagree, although if you can think of any, that'd be useful to know. I would like to see that list put in the RFC, if you don't mind, since I feel it gives a nice overview from a Rust point of view; I also don't feel the transform is defined well enough. I think discussing "binding to reference values" is probably more useful than talking about "the same statement", since that would make your

    let x: *const T = {
        let y: &T = &*null;
        y as *const T
    };

since

It's also important to think about when the coercion actually occurs - for example, does

    let x: *const T = {&*null};

translate to

    let x: *const T = {&*null} as *const T;
    // or
    let x: *const T = {(&*null) as *const T};

If the former, it should be UB. If the latter, it should not be.
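For comparison, if the raw borrow is written explicitly inside the block, no reference exists at any point, so the question of where the `&T` to `*const T` coercion happens does not arise. A minimal generic sketch (the name `raw_in_block` is hypothetical), assuming `p` comes from a live allocation; it says nothing about how the RFC itself treats `{&*null}`:

```rust
use std::ptr;

/// Hypothetical helper: the block evaluates directly to a raw pointer,
/// so there is no intermediate `&T` to coerce.
pub fn raw_in_block<T>(p: *const T) -> *const T {
    let x: *const T = {
        // SAFETY (assumed): `p` is derived from a live allocation of a `T`;
        // `addr_of!` only computes the address of the place `*p`.
        unsafe { ptr::addr_of!(*p) }
    };
    x
}
```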
Certainly we could not today just make I do sort of like the idea of making |
Yes. I'm not sure I agree about the relative importance of those two points, but it doesn't really matter very much. I would argue that simply from a backwards compatibility point of view we really want to make There seem to be two "basic ways" we can make such code work:
I don't think anybody has a good idea how to make the dynamic idea work. It seems "imaginable", though. To start, however, we'd probably have to remove the various annotations we give to LLVM. Presuming that we are going to go with the static option, then we return to: what is this static subset? As I wrote above, I think that for backwards compat reasons it basically has to include cases where the It is conceivable that we might go further and add an explicit Rust syntax for this. I had a strawman proposal of Does all that make sense? Do we all agree that
@nikomatsakis I would argue that it should be valid for all (possibly-invalid) lvalue expressions, since we have no guarantees on raw pointers - i.e.,
But in
It seems to me that my proposal is a prerequisite for yours. You are also suggesting that there be a way to create a raw pointer to a field without creating an intermediate reference. We need a way to represent your inference after some kind of desugaring -- we need a primitive operation to "take a raw reference". You are just going further than I did in terms of when we use that operation, i.e. when we take a raw reference vs a safe one. I will add a remark to the RFC saying that we might want to use the new operation for more cases. But I do not see a way to realize any of the proposals (by @glaebhoerl and @eddyb) without having this new operation that is distinct from any operation we can express so far; and if we do have such an operation it should explicitly show up in the MIR. Making such things explicit is part of what MIR is about.
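As a sketch of the "raw pointer to a field without an intermediate reference" operation in practice, here is field-by-field initialization through `MaybeUninit`, using `std::ptr::addr_of_mut!` (the macro form of `&raw mut`). `Point` and `make_point` are hypothetical; the point is that no `&mut Point` to uninitialized memory is ever created.

```rust
use std::mem::MaybeUninit;
use std::ptr;

/// Hypothetical type used for illustration.
pub struct Point {
    pub x: i32,
    pub y: i32,
}

pub fn make_point(x: i32, y: i32) -> Point {
    let mut slot = MaybeUninit::<Point>::uninit();
    let p: *mut Point = slot.as_mut_ptr();
    // SAFETY: `addr_of_mut!` yields a `*mut i32` to each field without creating
    // a reference to uninitialized memory, and `write` neither reads nor drops
    // the old contents. Both fields are written before `assume_init`.
    unsafe {
        ptr::addr_of_mut!((*p).x).write(x);
        ptr::addr_of_mut!((*p).y).write(y);
        slot.assume_init()
    }
}
```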
My problem with this stronger inference proposed by @glaebhoerl is that if someone relies on this behavior, there is a danger of accidentally adding a non-raw-ptr use to a reference, which would then rather subtly make the program have UB. If we say that you have to cast immediately, things cannot be correct for "subtle" reasons. But I guess we could have a lint against any "taking a raw reference" that is not immediately followed by a cast: Then, more existing code works (because we take raw references in more cases), but it is less likely that people will accidentally break their code because they relied on this behavior. We might even make this an err-by-default lint after some transition period?
I never said that the new operation is used when the cast happens in the same statement. I said that it is used when the reference is "immediately cast [...] to a raw pointer":
Anyway, I will add some examples.
That is an interesting example indeed. During my experiments, I noticed a similar problem with implicit coercions, namely coercion
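For reference, these are the ways a `&u8` can turn into a `*const u8` today: an implicit coercion at a typed `let`, an explicit `as` cast, a coercion at a call site, and (for comparison) a raw borrow of the place `*byte` via `std::ptr::addr_of!`. A small sketch with hypothetical helper names; for a valid reference all four yield the same address, so the interesting question is only which of them go through an intermediate reference.

```rust
use std::ptr;

/// Hypothetical helper that accepts a raw pointer argument.
fn address_of(p: *const u8) -> usize {
    p as usize
}

/// Hypothetical check that all conversion forms agree on the address.
pub fn all_forms_agree(byte: &u8) -> bool {
    let coerced: *const u8 = byte; // implicit `&u8` -> `*const u8` coercion
    let cast = byte as *const u8; // explicit cast
    let raw = ptr::addr_of!(*byte); // raw borrow of the place `*byte`
    let via_arg = address_of(byte); // coercion at a call site
    coerced == cast && cast == raw && via_arg == raw as usize
}
```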
Not sure which part of @nikomatsakis' posts you are referring to here, but assuming
@nikomatsakis Very well put, I do not have anything to add. :)
@RalfJung What I'm suggesting is a pre-MIR analysis, not considering those copies. Alternatively, a dataflow analysis on MIR, where copies are considered noops, and which rewrites the MIR to "weaken" borrows, as needed. Something I have not considered is interaction with mutable state, but I suspect dataflow analysis would be able to understand that.
Yes, I never said anything about the MIR operation not being needed, but rather that the syntactic condition for producing it could be relaxed to something more general.
I see, okay. I do not fully understand which syntactic condition you have in mind, but if the result is that at some point (pre-borrowck, I would guess) we have the result of this inference encoded explicitly in the MIR, then I think I am fine. I'd prefer this condition to be as simple and hence predictable as possible, but I'd be basically satisfied with any syntactic condition.
I didn't mean to imply otherwise :)
It seems we are concerned about different things, both of which however seem worth being concerned about:
I guess the latter is easier to lint against than the former. But otherwise as long as we wish to re-use the
(Once again, this is in the spirit of "so that the option doesn't go unmentioned", and I'm not sure how well I like it.) How about, as a (drawback) slightly jarring but otherwise (advantage) extremely simple and straightforward alternative: It is slightly jarring because of course
As a non-expert on all of these issues who's hoping for Rust to become the first language with UB where non-wizards can actually figure out whether or not they're invoking UB, I'm strongly in favor of only making the
The final comment period, with a disposition to merge, as per the review above, is now complete. As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed. The RFC will be merged soon.
🎉 Huzzah! This RFC has been merged! Tracking issue: rust-lang/rust#64490
Sorry for arriving late to the party (and arguably just adding bikeshedding), but I haven't seen this proposal discussed above, and I think it might make the syntax more intuitive. Put simply, use
This means that there's a clear split between
My natural read of Disadvantage:
This has definitely been considered in the RFC and by the language team -- we rejected this proposal due to the disadvantage you mention.
The RFC mentions using
@mb64 There is no UB-free equivalent to
What is the status of rvalue to lvalue promotion in
The code compiles without warnings, but it doesn't modify Does it make sense to prohibit such promotions in
@red75prime promotion being confusing here seems mostly orthogonal to raw-vs-ref, doesn't it? If no promotion happens with raw ptrs, what would you expect to happen instead? The only alternative I see is for this code to be UB, because without promotion the temporary result of the cast is deallocated before However, whether we do promotion on raw refs or not is a good question indeed. I would expect
We should probably continue over at the tracking issue tho. :)
Introduce new variants of the `&` operator: `&raw mut <place>` to create a `*mut <T>`, and `&raw const <place>` to create a `*const <T>`. This creates a raw pointer directly, as opposed to the already existing `&mut <place> as *mut _`/`&<place> as *const _`, which create a temporary reference and then cast that to a raw pointer. As a consequence, the existing expressions `<term> as *mut <T>` and `<term> as *const <T>` where `<term>` has reference type are equivalent to `&raw mut *<term>` and `&raw const *<term>`, respectively. Moreover, add a lint to existing code that could use the new operator, and treat existing code that creates a reference and immediately casts or coerces it to a raw pointer as if it had been written with the new syntax.

As an option, we could treat `&mut <place> as *mut _`/`&<place> as *const _` as if they had been written with `&raw` to avoid creating temporary references when that was likely not the intention.

Rendered
Tracking issue
The RFC got half-rewritten; the post-rewrite discussion begins partway through the thread.
Cc @Centril @rust-lang/wg-unsafe-code-guidelines
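To make the RFC summary concrete for the mutable case, here is a minimal sketch (the name `both_pointers` is hypothetical) that takes a pointer to the same place both ways: the pre-RFC `&mut <place> as *mut _` and the raw borrow, spelled with `std::ptr::addr_of_mut!` (on Rust 1.82 and later, `&raw mut *slot`). Both name the same address; only the first goes through a temporary `&mut u64`.

```rust
use std::ptr;

/// Hypothetical helper returning both kinds of pointer to the same place.
pub fn both_pointers(slot: &mut u64) -> (*mut u64, *mut u64) {
    // Pre-RFC form: a temporary `&mut u64` is created, then cast.
    let via_cast = &mut *slot as *mut u64;
    // Raw-borrow form: no intermediate reference is created.
    let via_raw = ptr::addr_of_mut!(*slot);
    (via_cast, via_raw)
}
```

A caller can compare the two pointers and then write through the raw-borrow one (e.g. `unsafe { b.write(5) }`), as long as the original `&mut` is not used in between.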