RFC: Rust Has Provenance #3559
Thanks for writing this! I think it's a good idea to set in stone this fact :)
The above is true. Is this RFC declaring only that Rust's abstract machine includes a notion of provenance, or is it also declaring that "provenance" is going to be the public-facing term? (I think "place" has been a valuable improvement on "lvalue", so maybe there's an opportunity to make a similar improvement here.)
The RFC is primarily concerned with the Abstract Machine.
Thanks for writing this up! It was a good read. After the design meeting we had about this, I agree that there's no practical alternative to having provenance in Rust, and thus we ought to acknowledge that fact in a way that's visible like this. Let's see if others agree:
Team member @scottmcm has proposed to merge this. The next step is review by the rest of the tagged team members: No concerns currently listed. Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up! cc @rust-lang/lang-advisors: FCP proposed for lang, please feel free to register concerns.
text/0000-rust-has-provenance.md
Outdated
[^erase]: It is of course still possible to erase provenance during compilation, *if* the target that we are compiling to does not actually do the access checks that the abstract machine does. What is not safe is having a language operation that strips provenance, and inserting that in arbitrary places in the program.

*Historical note:* The author assumes that provenance in C started out purely descriptively.
However, the moment compilers started doing optimizations that exploit undefined behavior depending on the provenance of a pointer, provenance became prescriptive.
This reads strangely. Per your definition, if it was always undefined behaviour, it was always prescriptive, regardless of whether implementations exploited it. If it was not always undefined behaviour, the important thing is that it became undefined, not that implementations exploited it. The way you've written it reads to me like you're saying it was always undefined behaviour, but wasn't prescriptive until compilers exploited it for optimisations?
Standard C was never worded precisely enough to really say whether this alias analysis is allowed. The wording about one-past-the-end pointers hints at prescriptive provenance, but it's not sufficiently clear.
So as is often the case with C, we have to look at implementations to see how the standard is interpreted.
In de-facto-C, I think this was descriptive for a while, but at some point (decades ago) alias analysis made it prescriptive. The example from the RFC shows prescriptive provenance on the oldest version of clang for which godbolt can still execute code (clang 3.4.1). I couldn't find an exact release date but it seems to be from around 2013/2014.
When C was standardized, they didn't really consider provenance, but many basic optimizations that C compilers do, such as register allocating local variables, both seem "obviously" correct to everyone and imply provenance.
Register allocation of variables that never have their address taken, or where no information about the address is ever used to influence what happens with other unrelated pointers, is fine without provenance under the non-deterministic-allocation model.
I think it's useful to add alternative names for the term to the RFC:
🔔 This is now entering its final comment period, as per the review above. 🔔
As a title for the RFC, "Rust Has Provenance" is a good and loud name, but as the name of a language feature it is self-referential, trivial, and meaningless at the same time.
The "point" of the RFC, at least in the initial draft that I sent to Ralf, and I think it's also fair to say in Ralf's draft that is now in FCP here, is that we're not actually adding anything. We're just admitting, concretely, what is already the case. Merging the RFC doesn't change the language, it just focuses future discussions about the development of the language (which otherwise often stall out).
A point that came out of a zulip conversation that I felt worth lifting up to this less transient forum: The sentence "Most of the rest of the details, such as a specific provenance model, are intentionally left unspecified" is carrying a lot of weight. In particular, one cannot immediately generalize the given (This is all consistent with the text of the RFC, since all the RFC is saying is that "Provenance Exists", and is not making statements about what precise effect that has on the rest of the language; more precise statements are left for future work in defining the specific provenance (+ memory) model.)
Rollup merge of rust-lang#120449 - udoprog:document-unsized-rc-arc-from-raw, r=m-ou-se Document requirements for unsized {Rc,Arc}::from_raw This seems to be implied due to these types supporting operation-less unsized coercions. Taken together with the [established behavior of a wide to thin pointer cast](rust-lang/reference#1451) it would enable unsafe downcasting of these containers. Note that the term "data pointer" is adopted from rust-lang/rfcs#3559 See also this [internals thread](https://internals.rust-lang.org/t/can-unsafe-smart-pointer-downcasts-be-correct/20229/2).
The final comment period, with a disposition to merge, as per the review above, is now complete. As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed. This will be merged soon.
Huzzah! The @rust-lang/lang team has decided to accept this RFC. Thank you @RalfJung and everyone who contributed! To track further discussion, subscribe to the tracking issue here:
Just got around to reading this. It's very similar to the Annelid tool @nnethercote and I did for Valgrind, where pointers were tagged by a "segment" (eg a segment corresponding to a malloc), and all memory load/stores were checked against the pointer's segment. In practice this didn't work all that well, as it was trying to track the segment through arbitrary raw assembler instructions which often entangled pointers from different segments (eg xor pointer swap), but it was very interesting to play with. Discussed in Nick's thesis https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-606.pdf
* The pointer's "address" says where in memory the pointer is currently pointing.
* The pointer's "provenance" says where and when the pointer is allowed to access memory.

(This is disregarding any "metadata" that may come with wide pointers, it only talks about thin pointers / the data part of a wide pointer.)
"The data part of a wide pointer" seems ambiguous - I read this as referring to the metadata, but from context I assume it means the address/pointer part of it.
I think it is the usual way to refer to the address part (at least in Rust). For example, `from_raw_parts`:
pub fn from_raw_parts<T>(
data_address: *const (),
metadata: <T as Pointee>::Metadata
) -> *const T
where
T: ?Sized,
"data" refers to the fact that the "data" address points to the data. This is in contrast to the metadata which for trait objects is a pointer pointing to the vtable.
I guess, but I think "address" is the significant part of that name (though I realize the whole point of the RFC is to make a clear distinction between "address" and "pointer").
When we have "metadata", usually the thing that this is "meta about" is the "data". I have not seen any context where one would abbreviate "metadata" as "data"; that is losing the key distinction of it being "meta"!
Calling it an "address" definitely does not work in the context of this RFC. On nightly, that function signature has been changed:
pub fn from_raw_parts<T>(
data_pointer: *const (),
metadata: <T as Pointee>::Metadata
) -> *const T
where
T: ?Sized,
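To make the "data pointer" versus "metadata" split concrete on stable Rust, here is a minimal sketch using a slice pointer, whose metadata is its length. It deliberately avoids the nightly-only `from_raw_parts` quoted above and uses the stable `std::ptr::slice_from_raw_parts` instead; the helper name `split_and_rebuild` is made up for illustration and is not part of the RFC or the standard library.

```rust
use std::ptr;

/// Splits a wide slice pointer into its two parts and puts them back together.
/// (`split_and_rebuild` is a hypothetical helper, not a standard library API.)
fn split_and_rebuild(values: &[u16]) -> &[u16] {
    // A wide pointer: data pointer plus metadata (here, the element count).
    let wide: *const [u16] = values;

    // Casting wide -> thin keeps the data pointer and drops the metadata.
    let data_pointer: *const u16 = wide as *const u16;
    let metadata: usize = values.len();

    // Recombine the two parts into a wide pointer again.
    let rebuilt: *const [u16] = ptr::slice_from_raw_parts(data_pointer, metadata);

    // SAFETY: `rebuilt` has the same data pointer (derived from `values`) and
    // the same length as `values`, which is live for the returned lifetime.
    unsafe { &*rebuilt }
}

fn main() {
    let values = [1u16, 2, 3];
    assert_eq!(split_and_rebuild(&values), &values[..]);
    println!("{:?}", split_and_rebuild(&values));
}
```

The thin `data_pointer` is the part that has an address and a provenance in the sense of this RFC; the length is plain metadata that travels alongside it.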
That's cool, I didn't know about this tool! The key difference is that here we are making this extra information (tag/provenance/segment) part of the spec, and in the spec we have all the information we need to define how it propagates. XOR of pointers is one of the very few patterns that need "exposed" pointer semantics; luckily, it is very rare. Specifying this fully is still not a solved problem, but this is "just" part of figuring out how to make int2ptr casts work in a world with provenance. Doing this kind of tracking in valgrind after the compiler erased all information is of course a lot harder. @pnkfelix's valgrind tool for Stacked Borrows must be doing something similar.
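Pointers (this includes values of reference type) in Rust have two components.
* The pointer's "address" says where in memory the pointer is currently pointing.
* The pointer's "provenance" says where and when the pointer is allowed to access memory.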
(This is disregarding any "metadata" that may come with wide pointers, it only talks about thin pointers / the data part of a wide pointer.)
Whether a memory access with a given pointer causes undefined behavior (UB) depends on both the address and the provenance:
the same address may be fine to access with one provenance, and UB to access with another provenance.
In contrast, integers do not have a provenance component.
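As a rough illustration of the sentences above (not an example taken from the RFC), the sketch below builds two pointers that may end up with the same address but are derived from different allocations. It only ever dereferences the pointer whose provenance covers the memory being read. The function name `compare_addresses_only` is hypothetical, and whether the two locals are actually adjacent in memory is left to the compiler.

```rust
/// Returns the value read through `y`'s own pointer, plus whether the
/// one-past-the-end pointer of `x` happened to have the same address.
/// (`compare_addresses_only` is a hypothetical name for this sketch.)
fn compare_addresses_only() -> (u8, bool) {
    let x = [1u8; 4];
    let y = [2u8; 4];

    // Address: one past the end of `x`. Provenance: still tied to `x`.
    let x_end: *const u8 = x.as_ptr().wrapping_add(x.len());
    // Address: start of `y`. Provenance: tied to `y`.
    let y_start: *const u8 = y.as_ptr();

    // Integers carry no provenance, so this compares addresses only.
    // Whether the two locals end up adjacent is up to the compiler.
    let same_address = x_end as usize == y_start as usize;

    // SAFETY: `y_start` was derived from `y`, is in bounds, and `y` is live.
    let value = unsafe { *y_start };

    // Deliberately NOT done: `unsafe { *x_end }`. Even if `same_address` is
    // true, `x_end` only has provenance for `x`, so that read would be UB.
    (value, same_address)
}

fn main() {
    let (value, same_address) = compare_addresses_only();
    println!("read {value} through y's pointer; addresses equal: {same_address}");
}
```

The point of returning `same_address` separately is that equal addresses say nothing about which accesses are allowed; that is decided by the provenance each pointer carries.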
Most of the rest of the details, such as a specific provenance model, are intentionally left unspecified.
This RFC very deliberately aims to be as minimal as possible, to just get the entire Rust Project on the "same page" about the long-term future development of the language.