clarify UB for raw ptr deref #1000
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -23,7 +23,8 @@ code. | |
</div> | ||
|
||
* Data races. | ||
* Dereferencing (using the `*` operator on) a dangling or unaligned raw pointer. | ||
* Evaluating a dereference [place expression] (`*expr`) on a raw pointer that is | ||
[dangling] or unaligned. | ||
I think this was raised in the Zulip conversation, but I'll nevertheless raise it here: I wonder if a word other than e-valu-ating could be used when talking about a place expression (although I couldn't come up with one satisfactory enough, so feel free to disregard this nit).
Clearly the right term is "placating". On a more serious note, this is one reason why I argued fiercely against calling these things "values"/"value expressions", but I was unable to convince enough people to make a difference... |
||
* Breaking the [pointer aliasing rules]. `&mut T` and `&T` follow LLVM’s scoped | ||
[noalias] model, except if the `&T` contains an [`UnsafeCell<U>`]. | ||
* Mutating immutable data. All data inside a [`const`] item is immutable. Moreover, all | ||
|
@@ -45,7 +46,7 @@ code. | |
* A `!` (all values are invalid for this type). | ||
* An integer (`i*`/`u*`), floating point value (`f*`), or raw pointer obtained | ||
from [uninitialized memory][undef], or uninitialized memory in a `str`. | ||
* A reference or `Box<T>` that is dangling, unaligned, or points to an invalid value. | ||
* A reference or `Box<T>` that is [dangling], unaligned, or points to an invalid value. | ||
* Invalid metadata in a wide reference, `Box<T>`, or raw pointer: | ||
* `dyn Trait` metadata is invalid if it is not a pointer to a vtable for | ||
`Trait` that matches the actual dynamic trait the pointer or reference points to. | ||
|
@@ -62,6 +63,17 @@ a restricted set of valid values. In other words, the only cases in which | |
reading uninitialized memory is permitted are inside `union`s and in "padding" | ||
(the gaps between the fields/elements of a type). | ||
|
||
> **Note**: Undefined behavior affects the entire program. For example, calling | ||
> a function in C that exhibits undefined behavior of C means your entire | ||
> program contains undefined behaviour that can also affect the Rust code. And | ||
> vice versa, undefined behavior in Rust can cause adverse affects on code | ||
> executed by any FFI calls to other languages. | ||
|
||
[place expression]: expressions.md#place-expressions-and-value-expressions | ||
RalfJung marked this conversation as resolved.
|
||
|
||
### Dangling pointers | ||
[dangling]: #dangling-pointers | ||
|
||
A reference/pointer is "dangling" if it is null or not all of the bytes it | ||
points to are part of the same allocation (so in particular they all have to be | ||
part of *some* allocation). The span of bytes it points to is determined by the | ||
|
@@ -71,12 +83,6 @@ that slices and strings point to their entire range, so it is important that the | |
metadata is never too large. In particular, allocations and therefore slices and strings | ||
cannot be bigger than `isize::MAX` bytes. | ||
|
||
> **Note**: Undefined behavior affects the entire program. For example, calling | ||
> a function in C that exhibits undefined behavior of C means your entire | ||
> program contains undefined behaviour that can also affect the Rust code. And | ||
> vice versa, undefined behavior in Rust can cause adverse affects on code | ||
> executed by any FFI calls to other languages. | ||
|
||
[`bool`]: types/boolean.md | ||
[`const`]: items/constant-items.md | ||
[noalias]: http://llvm.org/docs/LangRef.html#noalias | ||
|
This feels weird to me, because a dereference is always a place expression.
Well, it can be used as a value expression as in
Arguably, here it is correct to say that `*expr` is (used as) a value expression.
We call that a "value expression context" (and "place expression context" for where a place is wanted). Is it only UB to evaluate an unaligned/dangling pointer in place expression context, or always?
"value expression contexts" are a superset of "place expression contexts". IOW, the code above is sugar for
Evaluating a place expression in value expression context consists of first evaluating the place expression as normal, and then performing place-to-value conversion.
So, it is impossible to write `*expr` anywhere without it also being a place expression, but sometimes it is both a place expression and a value expression (or really, `place2value(*expr)` is the value expression, but since we have no syntax for this, people tend to say that `*expr` is [used as] a value expression).
That's the way I am thinking about this, anyway.
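The distinction described in this comment can be sketched in Rust as follows. This is only an illustration under stated assumptions: the helper name `place_vs_value` is hypothetical, and the pointer is deliberately derived from a live `&mut u32` so that every `*p` below is evaluated on a valid, aligned pointer.

```rust
use std::ptr;

/// Hypothetical helper: returns whether `addr_of!(*p)` yields the same address
/// as `p`, together with the value read back through the pointer.
fn place_vs_value(x: &mut u32) -> (bool, u32) {
    let p: *mut u32 = x;
    // SAFETY: `p` comes from a live `&mut u32`, so it is non-null, aligned,
    // and points into a live allocation.
    unsafe {
        // Place expression context: `*p` only names a place; nothing is read.
        let q: *const u32 = ptr::addr_of!(*p);
        // Place expression context: `*p` is the assignee of an assignment.
        *p = 8;
        // Value expression context: the place `*p` is evaluated first, then a
        // place-to-value conversion reads the `u32` stored there.
        let v: u32 = *p;
        (q == p as *const u32, v)
    }
}

fn main() {
    let mut x = 7;
    let (same_address, v) = place_vs_value(&mut x);
    assert!(same_address);
    assert_eq!(v, 8);
    assert_eq!(x, 8);
}
```

In all three uses the place expression `*p` is evaluated; only the last one additionally performs the read.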
So, the answer is that it is always UB to evaluate an unaligned/dangling `*expr`, since doing so always evaluates the place expression -- and then sometimes goes on performing place-to-value conversion. The important point that I hope to clarify in the docs is that it is the place expression evaluation, and not the place-to-value conversion, that is causing the UB.
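One practical consequence, sketched here rather than taken from the PR: when a pointer may be unaligned, code can avoid writing `*expr` at all and pass the raw pointer to `std::ptr::read_unaligned`, which has no alignment requirement. The helper name `read_u32_at` and the byte buffer are assumptions made for illustration. The bounds check is what keeps the pointer from being dangling in the sense discussed in this PR.

```rust
use std::ptr;

/// Hypothetical helper: reads a native-endian `u32` starting at byte `offset`
/// of `bytes`, without ever evaluating a `*p` place expression.
fn read_u32_at(bytes: &[u8], offset: usize) -> Option<u32> {
    // Make sure all 4 bytes lie inside the slice, so the pointer is not dangling.
    let end = offset.checked_add(4)?;
    if end > bytes.len() {
        return None;
    }
    // This pointer may well be unaligned for `u32`.
    let p = bytes[offset..].as_ptr().cast::<u32>();
    // SAFETY: `p` points to 4 initialized, readable bytes inside `bytes`, and
    // `read_unaligned` does not require `p` to be aligned.
    Some(unsafe { ptr::read_unaligned(p) })
}

fn main() {
    let bytes = [0u8, 1, 2, 3, 4, 5, 6, 7];
    assert_eq!(read_u32_at(&bytes, 1), Some(u32::from_ne_bytes([1, 2, 3, 4])));
    // Only 2 bytes remain at offset 6, so the read is refused.
    assert_eq!(read_u32_at(&bytes, 6), None);
}
```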
We have a difference of perspectives here; these two categories are distinct and split the set of expression contexts. A place expression in value expression context is allowed and has the `place2value` effect, but it's still always a place expression.
So, under my perspective, the `[place expression]` should be, even in [place expression contexts], .
I like this sentence. I think I'll try and get it into the PR I submitted last night (#1003).
So, more like this?
@digama0 I'd be interested in your opinion here, is this current wording sufficiently clear? (I know you'd like the rules to be more relaxed; so do I. That is a longer process. For now the goal is to make sure that the rules as they currently are are described unambiguously.)
LGTM