RFC: Rust Has Provenance #3559

Merged
merged 15 commits into from
Feb 17, 2024

RalfJung
Member

@RalfJung RalfJung commented Jan 25, 2024

Pointers (this includes values of reference type) in Rust have two components.

  • The pointer's "address" says where in memory the pointer is currently pointing.
  • The pointer's "provenance" says where and when the pointer is allowed to access memory.

(This is disregarding any "metadata" that may come with wide pointers, it only talks about thin pointers / the data part of a wide pointer.)

Whether a memory access with a given pointer causes undefined behavior (UB) depends on both the address and the provenance:
the same address may be fine to access with one provenance, and UB to access with another provenance.

In contrast, integers do not have a provenance component.

Most of the rest of the details, such as a specific provenance model, are intentionally left unspecified.

This RFC very deliberately aims to be as minimal as possible, to just get the entire Rust Project on the "same page" about the long-term future development of the language.
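To make the address/provenance split concrete, here is a minimal sketch. It is not taken from the RFC text, and the function name `same_address_different_provenance` is made up for illustration. It builds two raw pointers to the same `i32`, one derived from a shared reference and one from the mutable place itself, and checks that they agree once reduced to integers. The write that would be UB is left as a comment and is never executed.

```rust
/// Hypothetical helper: reports whether the two pointers share an address,
/// together with the value of the local after a permitted write.
fn same_address_different_provenance() -> (bool, i32) {
    let mut x = 0i32;

    // Derived from a shared reference: its provenance only permits reads.
    let shrptr = &x as *const i32 as *mut i32;
    // Derived directly from the mutable place: reads and writes are permitted.
    let mutptr = std::ptr::addr_of_mut!(x);

    // Casting to an integer keeps only the address component.
    let same_addr = shrptr as usize == mutptr as usize;

    // Fine: `mutptr` is allowed to write to `x`.
    unsafe { mutptr.write(42) };
    // UB if uncommented: same address, but `shrptr` must not be used to write
    // to a plain `i32` behind a shared reference.
    // unsafe { shrptr.write(7) };

    (same_addr, x)
}

fn main() {
    let (same_addr, value) = same_address_different_provenance();
    println!("same address: {same_addr}, value: {value}");
}
```

The integer comparison cannot tell the two pointers apart, which is the sense in which integers carry no provenance.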

📖 RFC book

Rendered

Tracking issue

@RalfJung RalfJung changed the title Rust Has Provenance RFC: Rust Has Provenance Jan 25, 2024
Co-authored-by: lcnr <rust@lcnr.de>
Member

@WaffleLapkin WaffleLapkin left a comment

Thanks for writing this! I think it's a good idea to set in stone this fact :)

@ehuss ehuss added the T-lang Relevant to the language team, which will review and decide on the RFC. label Jan 25, 2024
@mattheww

> Note that "provenance" is a somewhat unfortunate term.

The above is true.

Is this RFC declaring only that Rust's abstract machine includes a notion of provenance, or is it also declaring that "provenance" is going to be the public-facing term?

(I think "place" has been a valuable improvement on "lvalue", so maybe there's an opportunity to make a similar improvement here.)

@RalfJung
Member Author

> Is this RFC declaring only that Rust's abstract machine includes a notion of provenance, or is it also declaring that "provenance" is going to be the public-facing term?

The RFC is primarily concerned with the Abstract Machine.
However, it is also meant to pave the way for the stabilization of the strict provenance APIs, which would obviously use the term in public ways (and may even include it in the names of some functions).

@scottmcm
Member

Thanks for writing this up! It was a good read.

After the design meeting we had about this, I agree that there's no practical alternative to having provenance in Rust, and thus we ought to acknowledge that fact in a way that's visible like this. Let's see if others agree:
@rfcbot fcp merge

@rfcbot
Collaborator

rfcbot commented Jan 25, 2024

Team member @scottmcm has proposed to merge this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

cc @rust-lang/lang-advisors: FCP proposed for lang, please feel free to register concerns.
See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. disposition-merge This RFC is in PFCP or FCP with a disposition to merge it. labels Jan 25, 2024
@tmandry
Member

tmandry commented Jan 25, 2024

+1 to what @scottmcm said and thanks for writing this up, @RalfJung. This reflects the understanding I took away from the lang design meeting last year and I think the RFC speaks for itself in terms of motivation.

@rfcbot reviewed

@traviscross
Contributor

traviscross commented Jan 25, 2024

@rustbot labels +I-lang-nominated

Even though we've already proposed FCP merge here, let's nominate so we discuss it in one meeting for at least general awareness.

Huge thanks to @RalfJung for writing this up. I earlier reviewed this at the draft stage and am +1 on it.

@rustbot rustbot added the I-lang-nominated Indicates that an issue has been nominated for prioritizing at the next lang team meeting. label Jan 25, 2024
Co-authored-by: Ruby Lazuli <general@patchmixolydic.com>
[^erase]: It is of course still possible to erase provenance during compilation, *if* the target that we are compiling to does not actually do the access checks that the abstract machine does. What is not safe is having a language operation that strips provenance, and inserting that in arbitrary places in the program.

*Historical note:* The author assumes that provenance in C started out purely descriptively.
However, the moment compilers started doing optimizations that exploit undefined behavior depending on the provenance of a pointer, provenance became prescriptive.
This reads strangely. Per your definition, if it was always undefined behaviour, it was always prescriptive, regardless of whether implementations exploited it. If it was not always undefined behaviour, the important thing is that it became undefined, not that implementations exploited it. The way you've written it reads to me like you're saying it was always undefined behaviour, but was descriptive until compilers exploited it for optimisations?

Member Author

@RalfJung RalfJung Jan 26, 2024

Standard C was never worded precisely enough to really say whether this alias analysis is allowed. The wording about one-past-the-end pointers hints at prescriptive provenance, but it's not sufficiently clear.

So as is often the case with C, we have to look at implementations to see how the standard is interpreted.
In de-facto-C, I think this was descriptive for a while, but at some point (decades ago) alias analysis made it prescriptive. The example from the RFC shows prescriptive provenance on the oldest version of clang for which godbolt can still execute code (clang 3.4.1). I couldn't find an exact release date but it seems to be from around 2013/2014.

When C was standardized, they didn't really consider provenance, but many basic optimizations that C compilers do, such as register allocating local variables, both seem "obviously" correct to everyone and imply provenance.

Member Author

Register allocation of variables that never have their address taken, or where no information about the address is ever used to influence what happens with other unrelated pointers, is fine without provenance under the non-deterministic-allocation model.

@RalfJung RalfJung force-pushed the provenance branch 5 times, most recently from 001d4c5 to 0c165b0 Compare January 26, 2024 10:34
@VitWW

VitWW commented Jan 26, 2024

I think it's useful to add alternative names for the term to the RFC:

  • provenance
  • place
  • entity
  • gist
  • responsibility
  • conformity
  • association
  • essence
  • ...

@rfcbot
Collaborator

rfcbot commented Feb 7, 2024

🔔 This is now entering its final comment period, as per the review above. 🔔

@rfcbot rfcbot removed the proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. label Feb 7, 2024
@VitWW

VitWW commented Feb 7, 2024

As a title for an RFC topic, "Rust Has Provenance" is a good and loud name, but as the name of a language feature this title is self-referencing, trivial and meaningless at the same time.
It would be better to change the title into something more meaningful before merging, like "Adding Provenance to Pointer Model" or "Pointer's Provenance".

@Lokathor
Contributor

Lokathor commented Feb 7, 2024

The "point" of the RFC, at least in the initial draft that I sent to Ralf, and I think it's also fair to say in Ralf's draft that is now in FCP here, is that we're not actually adding anything.

We're just admitting, concretely, what is already the case.

Merging the RFC doesn't change the language, it just focuses future discussions about the development of the language (which otherwise often stall out).

@traviscross
Contributor

traviscross commented Feb 14, 2024

@rustbot labels -I-lang-nominated

We mentioned this for awareness in the meeting last week, and it's now in FCP, so we can unnominate.

Thanks to @RalfJung for pushing this forward, and to @Lokathor for the initial draft work.

@rustbot rustbot removed the I-lang-nominated Indicates that an issue has been nominated for prioritizing at the next lang team meeting. label Feb 14, 2024
@pnkfelix
Member

A point that came out of a zulip conversation that I felt worth lifting up to this less transient forum: The sentence "Most of the rest of the details, such as a specific provenance model, are intentionally left unspecified" is carrying a lot of weight.

In particular, one cannot immediately generalize the given shrptr example to other arbitrary *mut T types and draw the same conclusion, for all choices of T, that an analogous shrptr.write(...) line for that other T is also UB. There are some choices for T where, under all memory models currently under consideration, we would almost certainly consider the behavior to be well-defined.

(This is all consistent with the text of the RFC, since all the RFC is saying is that "Provenance Exists", and is not making statements about what precise effect that has on the rest of the language; more precise statements are left for future work in defining the specific provenance (+ memory) model.)
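The comment above does not name the types it has in mind. Assuming it refers to types with interior mutability, a hedged sketch of such a case could look like the following, where `T` is `Cell<i32>` and the helper name `write_through_shared` is made up for illustration. The pointer is again derived from a shared reference, but because the pointee is wrapped in `UnsafeCell` (via `Cell`), writing through it is generally accepted as well-defined in single-threaded code like this.

```rust
use std::cell::Cell;

/// Illustrative helper: overwrite a `Cell<i32>` through a raw pointer that was
/// derived from a shared reference.
fn write_through_shared(cell: &Cell<i32>, value: i32) {
    let shrptr = cell as *const Cell<i32> as *mut Cell<i32>;
    // `Cell` stores its contents in an `UnsafeCell`, so mutation behind a
    // shared reference is permitted here. `Cell<i32>` has no destructor, so
    // overwriting without dropping the old value loses nothing.
    unsafe { shrptr.write(Cell::new(value)) };
}

fn main() {
    let cell = Cell::new(0);
    write_through_shared(&cell, 42);
    println!("{}", cell.get());
}
```

In ordinary code `cell.set(42)` does the same job safely; the raw-pointer version only exists to mirror the shape of the shrptr example.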

GuillaumeGomez added a commit to GuillaumeGomez/rust that referenced this pull request Feb 15, 2024
…om-raw, r=m-ou-se

Document requirements for unsized {Rc,Arc}::from_raw

This seems to be implied due to these types supporting operation-less unsized coercions. Taken together with the [established behavior of a wide to thin pointer cast](rust-lang/reference#1451) it would enable unsafe downcasting of these containers.

Note that the term "data pointer" is adopted from rust-lang/rfcs#3559

See also this [internals thread](https://internals.rust-lang.org/t/can-unsafe-smart-pointer-downcasts-be-correct/20229/2).
rust-timer added a commit to rust-lang-ci/rust that referenced this pull request Feb 15, 2024
Rollup merge of rust-lang#120449 - udoprog:document-unsized-rc-arc-from-raw, r=m-ou-se

@rfcbot rfcbot added finished-final-comment-period The final comment period is finished for this RFC. to-announce and removed final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. labels Feb 17, 2024
@rfcbot
Collaborator

rfcbot commented Feb 17, 2024

The final comment period, with a disposition to merge, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

This will be merged soon.

@scottmcm scottmcm merged commit 29645fe into rust-lang:master Feb 17, 2024
@scottmcm
Member

Huzzah! The @rust-lang/lang team has decided to accept this RFC.

Thank you @RalfJung and everyone who contributed!

To track further discussion, subscribe to the tracking issue here:
rust-lang/rust#121243

@RalfJung RalfJung deleted the provenance branch February 18, 2024 07:51
@jsgf

jsgf commented Mar 8, 2024

Just got around to reading this. It's very similar to the Annelid tool @nnethercote and I did for Valgrind, where pointers were tagged by a "segment" (eg a segment corresponding to a malloc), and all memory load/stores were checked against the pointer's segment. In practice this didn't work all that well, as it was trying to track the segment through arbitrary raw assembler instructions which often entangled pointers from different segments (eg xor pointer swap), but it was very interesting to play with.

Discussed in Nick's thesis https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-606.pdf

* The pointer's "address" says where in memory the pointer is currently pointing.
* The pointer's "provenance" says where and when the pointer is allowed to access memory.

(This is disregarding any "metadata" that may come with wide pointers, it only talks about thin pointers / the data part of a wide pointer.)
"The data part of a wide pointer" seems ambiguous - I read this as referring to the metadata, but from context I assume it means the address/pointer part of it.

Member

I think it is the usual way to refer to the address part (at least in Rust). For example from_raw_parts:

pub fn from_raw_parts<T>(
    data_address: *const (),
    metadata: <T as Pointee>::Metadata
) -> *const T
where
    T: ?Sized,

"data" refers to the fact that the "data" address points to the data. This is in contrast to the metadata which for trait objects is a pointer pointing to the vtable.

I guess, but I think "address" is the significant part of that name (though I realize the whole point of the RFC is to make a clear distinction between "address" and "pointer").

Member Author

@RalfJung RalfJung Mar 9, 2024

When we have "metadata", usually the things that this is "meta about" is the "data". I have not seen any context where one would abbreviate "metadata" as "data", that is losing the key distinction of it being "meta"!

Calling it an "address" definitely does not work in the context of this RFC. On nightly, that function signature has been changed:

pub fn from_raw_parts<T>(
    data_pointer: *const (),
    metadata: <T as Pointee>::Metadata
) -> *const T
where
    T: ?Sized,
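
The `from_raw_parts` function quoted above is unstable, so the following sketch uses only stable APIs to illustrate the same "data pointer plus metadata" split for a slice pointer. The helper name `rebuild` is made up for illustration. A wide `*const [u16]` is cast to a thin `*const u16` (the data pointer), the length plays the role of the metadata, and `std::ptr::slice_from_raw_parts` puts the two back together.

```rust
/// Illustrative helper: take a slice apart into data pointer and metadata,
/// then reassemble an equivalent slice from those two parts.
fn rebuild(values: &[u16]) -> &[u16] {
    let wide: *const [u16] = values;
    // Wide-to-thin cast: drops the metadata, keeps the data pointer.
    let data_pointer = wide as *const u16;
    // For slices, the metadata is the element count.
    let metadata = values.len();
    let rebuilt: *const [u16] = std::ptr::slice_from_raw_parts(data_pointer, metadata);
    // Sound here because `rebuilt` covers exactly the memory `values` borrows.
    unsafe { &*rebuilt }
}

fn main() {
    let values = [1u16, 2, 3];
    println!("{:?}", rebuild(&values));
}
```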

@RalfJung
Member Author

RalfJung commented Mar 9, 2024

> Just got around to reading this. It's very similar to the Annelid tool @nnethercote and I did for Valgrind, where pointers were tagged by a "segment" (eg a segment corresponding to a malloc), and all memory load/stores were checked against the pointer's segment. In practice this didn't work all that well, as it was trying to track the segment through arbitrary raw assembler instructions which often entangled pointers from different segments (eg xor pointer swap), but it was very interesting to play with.

That's cool, I didn't know about this tool!

The key difference is that here we are making this extra information (tag/provenance/segment) part of the spec, and in the spec we have all the information we need to define how it propagates. XOR of pointers is one of the very few patterns that need "exposed" pointer semantics; luckily, it is very rare. Specifying this fully is still not a solved problem, but this is "just" part of figuring out how to make int2ptr casts work in a world with provenance.

Doing this kind of tracking in valgrind after the compiler erased all information is of course a lot harder. @pnkfelix' valgrind tool for Stacked Borrows must be doing something similar.
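
For readers unfamiliar with the XOR pointer swap mentioned above, here is a rough sketch of the pattern using plain `as` casts. The helper names are made up for illustration. The pointers are turned into integers, mixed with XOR, and turned back into pointers, so the resulting pointers have to recover their provenance from the earlier pointer-to-integer casts. How exactly that recovery is specified is the open question referred to in the comment, so treat this as an illustration of the pattern rather than a statement about its final semantics.

```rust
/// Illustrative helper: swap two pointers with the classic XOR trick.
fn xor_swap_ptrs(a: &mut *mut i32, b: &mut *mut i32) {
    // `as usize` yields only the address; the integers carry no provenance.
    let mut x = *a as usize;
    let mut y = *b as usize;
    x ^= y;
    y ^= x;
    x ^= y;
    // Integer-to-pointer casts: the provenance must come from somewhere,
    // namely from the pointers that were cast to integers above.
    *a = x as *mut i32;
    *b = y as *mut i32;
}

/// Swaps pointers to two locals and reads through the swapped pointers.
fn demo_xor_swap() -> (i32, i32) {
    let mut first = 1;
    let mut second = 2;
    let mut p: *mut i32 = &mut first;
    let mut q: *mut i32 = &mut second;
    xor_swap_ptrs(&mut p, &mut q);
    // After the swap, `p` points at `second` and `q` at `first`.
    unsafe { (*p, *q) }
}

fn main() {
    println!("{:?}", demo_xor_swap());
}
```

A plain `std::mem::swap(&mut p, &mut q)` avoids the integer round trip entirely and keeps each pointer's provenance intact.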

lnicola pushed a commit to lnicola/rust-analyzer that referenced this pull request Apr 7, 2024
…=m-ou-se

RalfJung pushed a commit to RalfJung/rust-analyzer that referenced this pull request Apr 27, 2024
…=m-ou-se
