Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

null trait object raw pointers are UB #63851

Open
nico-abram opened this issue Aug 24, 2019 · 25 comments
Open

null trait object raw pointers are UB #63851

nico-abram opened this issue Aug 24, 2019 · 25 comments
Labels
A-raw-pointers Area: raw pointers, MaybeUninit, NonNull A-trait-objects Area: trait objects, vtable layout C-discussion Category: Discussion or questions that doesn't represent real issues. T-lang Relevant to the language team, which will review and decide on the PR/issue. T-opsem Relevant to the opsem team

Comments

@nico-abram
Copy link
Contributor

Opening after some discussion in rust-lang/miri#918 (comment)

This playground snippet crashes on an illegal instruction on release: https://play.rust-lang.org/?version=stable&mode=release&edition=2018&gist=7c69493026add62256996d204e1278c0
Are vtable ptrs in trait object ptrs NonNull? Is that intended? This (2 year old) comment would suggest so rust-lang/rfcs#433 (comment)

CC @RalfJung

@bjorn3
Copy link
Member

bjorn3 commented Aug 24, 2019

trait T { }

fn main() {
    a(unsafe { std::mem::zeroed() });
}

fn a(_: *mut dyn T) {}
; ...
    call void @_ZN10playground1a17hbc18b5e956a8e54dE({}* %1, [3 x i64]* noalias readonly align 8 dereferenceable(24) %2), !dbg !145
; ...

The vtable is marked as dereferencable(24).

@RalfJung
Copy link
Member

Thanks for reporting!

In rust-lang/unsafe-code-guidelines#166 we have been discussing, rather theoretically, the validity requirements for wide raw pointers. We were not actually aware that there already are validity requirements in place for them. That seems very surprising to me. Was this ever discussed in depth, or did it "just happen"?

Cc @rust-lang/wg-unsafe-code-guidelines @eddyb

@jonas-schievink jonas-schievink added the T-lang Relevant to the language team, which will review and decide on the PR/issue. label Aug 24, 2019
@RalfJung
Copy link
Member

@bjorn3 on top of that, *const dyn Send has size 16 -- so, we do enum niche optimizations.

@RalfJung RalfJung changed the title Possible null trait object pointer misscompolation null trait object raw pointers lead to SIGILL Aug 24, 2019
@RalfJung
Copy link
Member

If we decide this is not a bug, we'd have to adjust https://doc.rust-lang.org/nightly/nomicon/what-unsafe-does.html (and also the reference). It would be a very special case to require anything for raw pointers that are not dereferenced.

@eddyb
Copy link
Member

eddyb commented Aug 24, 2019

The assumption is that the vtable is a valid reference, so that all raw pointers have valid pointee types.

*mut dyn Trait corresponds to (pseudo-notation) dyn T: Trait.*mut T or, further expanded, exists T.(*mut T, &<T as Trait>::vtable).

IMO, if the information associated with that T is invalid, or doesn't require a T to exist in the first place, that complicates the typesystem (or rather, the supposed typesystem that may have never been fully formally modeled - I've been meaning to do a writeup on dyn for a while now).

In custom DST terms, we could treat *mut T as (*mut T::Data, MaybeUninit<T::Meta>) (whereas &T would always have an initialized T::Meta), so overall maybe the cost of supporting this difference between raw pointers and references might not be that large.

There's also something to be said about having a null vtable, that Go people have some experience with (but I don't, so I'm not the best person to comment on it).

@RalfJung
Copy link
Member

The assumption is that the vtable is a valid reference, so that all raw pointers have valid pointee types.

The curious part here is that no one involved in the UCG realized that this is the current status.

IMO, if the information associated with that T is invalid, or doesn't require a T to exist in the first place, that complicates the typesystem

Where is there a complication here? I mean not in terms of the implementation, that shouldn't be a primary concern in terms of language design, but in terms of documentation and teaching.

Requiring vtables of raw pointers to be valid complicates the language as well, it would be an exception from the principle to not require things off of raw pointers.

In custom DST terms, we could treat *mut T as (*mut T::Data, MaybeUninit<T::Meta>)

I was more thinking of *mut dyn Trait as (usize, usize). But your proposal scales better to more generalized metadata, indeed. I like it.

@eddyb
Copy link
Member

eddyb commented Aug 24, 2019

I don't even think I'm the first to come up with that, but I can't remember who did.

In the end, I guess it's a tradeoff between:

  • safe size_of_val for *mut T, even for T: !Sized
  • ptr::null::<T>(), even for T: !Sized
    • I think this was discussed before, since I remember Go being brought up

(these are not the only outcomes, but they should be somewhat representative)

@strega-nil
Copy link
Contributor

If one reads the custom DST proposal, there's a reason that we expect raw pointers to DSTs to have valid metadata. The rawness of the pointer is unrelated to the rawness of the metadata, imo -- even if you have *mut [T], you expect the length to be a usize.

@comex
Copy link
Contributor

comex commented Aug 24, 2019

  • safe size_of_val for *mut T, even for T: !Sized

There's also the ability for methods taking self: *const Self to be object safe. For now, such methods can be written and are treated as object safe, but the functionality is guarded by the arbitrary_self_types feature. The tracking issue for that feature has some discussion of whether raw pointer methods should be object safe.

(It claims that "even today you could UB in safe code with mem::size_of_val(foo)", where foo: *const dyn Foo, but I don't think that's true? size_of_val doesn't currently work on raw pointers.)

@scottmcm
Copy link
Member

It seems like this is at least somewhat expected, because if all pointers could be zeroed then we wouldn't have had so many conversations about how to make ptr::null() work with extern type, and would have just changed its implementation to unsafe { mem::zeroed() }.

Also, do any of the choices here have any impacts to the optimizations to never reload vtables when multiple calls get made?

@hanna-kruppe
Copy link
Contributor

FWIW, ptr::null being restricted to Sized types is not just because of trait object vtables, there's also concerns about interaction with custom DSTs whose metadata may be a fair bit more complicated than just an usize or a pointer.

@RalfJung
Copy link
Member

safe size_of_val for *mut T, even for T: !Sized

That won't hold up with custom DST anyway, e.g. if T is a "thin" trait object.
So, if we want to be future-compatible with custom DST, I think requiring the metadata to be valid for raw pointers buys us pretty much nothing?

There's also the ability for methods taking self: *const Self to be object safe.

Isn't the issue there just that calling methods with a "raw receiver type" will be unsafe?

@eddyb
Copy link
Member

eddyb commented Aug 25, 2019

there's also concerns about interaction with custom DSTs whose metadata may be a fair bit more complicated than just an usize or a pointer.

But if we go for MaybeUninit<T::Meta>, then that's uniform across all types.

That won't hold up with custom DST anyway, e.g. if T is a "thin" trait object.

Yeah it would be nice to not tie anything to the T::Meta but always require a reference, since you wouldn't need several traits, just something like DynSized. I wonder what we lose in that case.

Isn't the issue there just that calling methods with a "raw receiver type" will be unsafe?

So "it's UB to call with invalid Self::Meta"? Sounds reasonable.


Overall I feel like we should look into this a bit more, but I can see how the "low-level compromise" lets us focus on &T and not worry about raw pointers (e.g. regarding custom DSTs).

It's almost like raw pointers are union { &UnsafeCell<T>, usize }, huh.

@RalfJung
Copy link
Member

RalfJung commented Aug 26, 2019

Turns out that for raw slices, safe code can cause them to have invalid metadata:

123456789 as *const [(); usize::max_value()] as *const [()] as *const [u64]

Ouch.

@eddyb
Copy link
Member

eddyb commented Aug 26, 2019

@RalfJung Wow, so we couldn't actually have <[T]>::Meta = (T::Meta, usize), it would have to be something opaque that depends on the size of T (which in the future might get very complicated with T: !Sized slice elements).

@RalfJung
Copy link
Member

Looks like probably it should be just usize, and the max-size comes in via the requirement that references (including slices) must point to memory that is allocated in a single allocation -- which for LLVM means it cannot be bigger than isize::MAX.

So, I was just interpreting that length requirement the wrong way so far. I'll adjust the various PRs (Reference, Nomicon, Miri) accordingly.

@strega-nil
Copy link
Contributor

strega-nil commented Aug 26, 2019

@RalfJung The issue is not invalid metadata -- invalid metadata is fine. The big thing that you want is initialized metadata.

*const UnsizedType should contain <UnsizedType>::Meta no matter what, but there's no guarantee on the validity of that metadata for the pointed to object.

@RalfJung
Copy link
Member

"initialized" as a term turned out to be pretty unsuited. I mean "valid" specifically as in "validity invariant".

@strega-nil
Copy link
Contributor

Sure, whatever. It needs to be a valid value of type <T>::Meta

@eddyb
Copy link
Member

eddyb commented Aug 29, 2019

@ubsan You can circumvent that by defining raw pointers as having MaybeUninit<T::Meta> instead of T::Meta directly.

@strega-nil
Copy link
Contributor

@eddyb you can, but I don't think there's a reasonable argument for it. Creating and destructuring raw pointers safely seems really nice to me.

@RalfJung
Copy link
Member

"really nice" is IMO not a sufficient argument for introducing this kind of UB. We should have some optimization that's worth it, ir some amount of spec simplification that's worth it.

The "default" answer should be "let's avoid UB, to make unsafe code author's lives easier".

@strega-nil
Copy link
Contributor

strega-nil commented Aug 29, 2019

@RalfJung how does one introduce UB in this case? Are people generally returning mem::zeroed::<*mut Trait>()?

I would argue that this is removing UB that people get when calling methods through a *mut Trait

@RalfJung
Copy link
Member

how does one introduce UB in this case? Are people generally returning mem::zeroed::<*mut Trait>()?

That would be an example of an operation that becomes UB under your semantics. Doesn't seem too strange to me to zero-initialize a struct with a wide raw pointer field.

I would argue that this is removing UB that people get when calling methods through a *mut Trait

There is literally no program that is UB under my semantics but not under yours.

This is not about the safety invariant, it is about the validity invariant.

@strega-nil
Copy link
Contributor

strega-nil commented Aug 29, 2019

@RalfJung shrug

I don't have a horse in this race. I just think they're nicer semantics. If you think it's more important to allow mem::zeroed than access to the metadata being safe, that's your prerogative.

@RalfJung RalfJung changed the title null trait object raw pointers lead to SIGILL null trait object raw pointers are UB Feb 29, 2020
@fmease fmease added A-raw-pointers Area: raw pointers, MaybeUninit, NonNull T-opsem Relevant to the opsem team A-trait-objects Area: trait objects, vtable layout C-discussion Category: Discussion or questions that doesn't represent real issues. labels Dec 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-raw-pointers Area: raw pointers, MaybeUninit, NonNull A-trait-objects Area: trait objects, vtable layout C-discussion Category: Discussion or questions that doesn't represent real issues. T-lang Relevant to the language team, which will review and decide on the PR/issue. T-opsem Relevant to the opsem team
Projects
None yet
Development

No branches or pull requests

10 participants