clarify UB for raw ptr deref #1000

Merged
merged 3 commits into from
Apr 7, 2021
Changes from 1 commit
22 changes: 14 additions & 8 deletions src/behavior-considered-undefined.md
@@ -23,7 +23,8 @@ code.
</div>

* Data races.
* Dereferencing (using the `*` operator on) a dangling or unaligned raw pointer.
* Evaluating a dereference [place expression] (`*expr`) on a raw pointer that is
Contributor

This feels weird to me, because a dereference is always a place expression.

Member Author

Well, it can be used as a value expression as in

let x = *expr;

Arguably, here it is correct to say that *expr is (used as) a value expression.

Contributor

We call that a "value expression context" (and "place expression context" for where a place is wanted). Is it only UB to evaluate an unaligned/dangling pointer in place expression context, or always?

Member Author

@RalfJung RalfJung Apr 6, 2021

"value expression contexts" are a superset of "place expression contexts". IOW, the code above is sugar for

let x = place2value(*expr);

Evaluating a place expression in value expression context consists of first evaluating the place expression as normal, and then performing place-to-value conversion.

So, it is impossible to write *expr anywhere without it being also a place expression, but sometimes, it is both a place expression and a value expression (or really, place2value(*expr) is the value expression, but since we have no syntax for this, people tend to say that *expr is [used as] a value expression).

That's the way I am thinking about this, anyway.

So, the answer is that it is always UB to evaluate an unaligned/dangling *expr since doing so always evaluates the place expression -- and then sometimes goes on performing place-to-value conversion. The important point that I hope to clarify in the docs is that it is the place expression evaluation, and not the place-to-value conversion, that is causing the UB.
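
A minimal sketch of the distinction described above, assuming a valid, aligned pointer so that nothing here is undefined behavior. The helper names `place_only` and `place_then_value` are hypothetical and chosen for illustration. The first evaluates `*ptr` only as a place (via `addr_of!`); the second also performs the place-to-value conversion.

```rust
use std::ptr::addr_of;

/// Evaluates `*ptr` as a place only: no place-to-value conversion,
/// so the pointee is not read.
///
/// # Safety
/// `ptr` must be non-dangling and aligned; under the rule discussed here,
/// evaluating the place `*ptr` is already undefined behavior otherwise.
unsafe fn place_only(ptr: *const u32) -> *const u32 {
    addr_of!(*ptr)
}

/// Evaluates `*ptr` as a place and then converts that place to a value,
/// which reads the `u32` behind the pointer.
///
/// # Safety
/// `ptr` must be non-dangling, aligned, and point to an initialized `u32`.
unsafe fn place_then_value(ptr: *const u32) -> u32 {
    *ptr
}

fn main() {
    let value: u32 = 42;
    let ptr: *const u32 = &value; // valid, aligned, non-dangling

    // Place expression context: we get back the same address.
    let same = unsafe { place_only(ptr) };
    assert_eq!(same, ptr);

    // Value expression context: the place is evaluated, then loaded from.
    let x = unsafe { place_then_value(ptr) };
    assert_eq!(x, 42);
}
```

Both helpers are `unsafe` for the same reason: the requirement on `ptr` attaches to evaluating the place `*ptr`, which happens in either case.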

Contributor

> "value expression contexts" are a superset of "place expression contexts"

We have a difference of perspectives here; these two categories are distinct and split the set of expression contexts. A place expression in value expression context is allowed and has the place2value effect, but it's still always a place expression.

So, under my perspective, the `[place expression]` should be `, even in [place expression contexts],`.


> Evaluating a place expression in value expression context consists of first evaluating the place expression as normal, and then performing place-to-value conversion.

I like this sentence. I think I'll try and get it into the PR I submitted last night (#1003).

Member Author

So, more like this?

@digama0 I'd be interested in your opinion here, is this current wording sufficiently clear? (I know you'd like the rules to be more relaxed; so do I. That is a longer process. For now the goal is to make sure that the rules as they currently are are described unambiguously.)

LGTM

[dangling] or unaligned.
Contributor

I think this was raised in the Zulip conversation, but I'll nevertheless raise it here: I wonder if a word other than e-valu-ating could be used when talking about a place expression (although I couldn't come up with one satisfactory enough, so feel free to disregard this nit).

Member Author

@RalfJung RalfJung Apr 3, 2021

Clearly the right term is "placating".

On a more serious note, this is one reason why I argued fiercely against calling these things "values"/"value expressions", but I was unable to convince enough people to make a difference...

* Breaking the [pointer aliasing rules]. `&mut T` and `&T` follow LLVM’s scoped
[noalias] model, except if the `&T` contains an [`UnsafeCell<U>`].
* Mutating immutable data. All data inside a [`const`] item is immutable. Moreover, all
@@ -45,7 +46,7 @@ code.
* A `!` (all values are invalid for this type).
* An integer (`i*`/`u*`), floating point value (`f*`), or raw pointer obtained
from [uninitialized memory][undef], or uninitialized memory in a `str`.
* A reference or `Box<T>` that is dangling, unaligned, or points to an invalid value.
* A reference or `Box<T>` that is [dangling], unaligned, or points to an invalid value.
* Invalid metadata in a wide reference, `Box<T>`, or raw pointer:
* `dyn Trait` metadata is invalid if it is not a pointer to a vtable for
`Trait` that matches the actual dynamic trait the pointer or reference points to.
@@ -62,6 +63,17 @@ a restricted set of valid values. In other words, the only cases in which
reading uninitialized memory is permitted are inside `union`s and in "padding"
(the gaps between the fields/elements of a type).

> **Note**: Undefined behavior affects the entire program. For example, calling
> a function in C that exhibits undefined behavior of C means your entire
> program contains undefined behaviour that can also affect the Rust code. And
> vice versa, undefined behavior in Rust can cause adverse affects on code
> executed by any FFI calls to other languages.

[place expression]: expressions.md#place-expressions-and-value-expressions
RalfJung marked this conversation as resolved.

### Dangling pointers
[dangling]: #dangling-pointers

A reference/pointer is "dangling" if it is null or not all of the bytes it
points to are part of the same allocation (so in particular they all have to be
part of *some* allocation). The span of bytes it points to is determined by the
@@ -71,12 +83,6 @@ that slices and strings point to their entire range, so it is important that the
metadata is never too large. In particular, allocations and therefore slices and strings
cannot be bigger than `isize::MAX` bytes.

> **Note**: Undefined behavior affects the entire program. For example, calling
> a function in C that exhibits undefined behavior of C means your entire
> program contains undefined behaviour that can also affect the Rust code. And
> vice versa, undefined behavior in Rust can cause adverse affects on code
> executed by any FFI calls to other languages.

[`bool`]: types/boolean.md
[`const`]: items/constant-items.md
[noalias]: http://llvm.org/docs/LangRef.html#noalias
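
The "dangling" and "unaligned" conditions in the diff above can be illustrated with a small sketch. It only computes and inspects pointers, never dereferencing a dangling or unaligned one, so it stays within defined behavior. The helper `is_non_null_and_aligned` is hypothetical and checks only part of the requirement: it cannot tell whether the bytes a pointer covers lie inside a single live allocation.

```rust
use std::mem::align_of;

/// Hypothetical helper: true if `ptr` is non-null and aligned for `T`.
/// This is necessary but not sufficient for `*ptr` to be allowed; it says
/// nothing about whether the pointee bytes are part of one allocation.
fn is_non_null_and_aligned<T>(ptr: *const T) -> bool {
    !ptr.is_null() && (ptr as usize) % align_of::<T>() == 0
}

fn main() {
    let data: [u32; 4] = [1, 2, 3, 4];
    let first: *const u32 = data.as_ptr();
    assert!(is_non_null_and_aligned(first));

    // A null pointer is dangling by definition.
    assert!(!is_non_null_and_aligned(std::ptr::null::<u32>()));

    // One past the end: fine to compute, and it passes the alignment check,
    // but the four bytes a `*const u32` would cover here lie outside the
    // array, so it is dangling and must not be dereferenced.
    let one_past_end = first.wrapping_add(data.len());
    assert!(is_non_null_and_aligned(one_past_end));

    // One byte into the array: in bounds, but not aligned for `u32`.
    let misaligned = (first as *const u8).wrapping_add(1) as *const u32;
    assert!(!is_non_null_and_aligned(misaligned));

    // `read_unaligned` has no alignment requirement; bytes 1..5 are inside
    // the array, so this read is fine (the value depends on endianness).
    let _bytes: u32 = unsafe { misaligned.read_unaligned() };
}
```

The one-past-the-end case shows why an address check alone is not enough: the pointer is non-null and aligned, yet it is still dangling under the definition quoted in the diff.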