Transmute between functions with different return types, where one function never returns #266

Open
Michael-F-Bryan opened this issue Dec 15, 2020 · 34 comments

Comments

@Michael-F-Bryan

A question came up on the community Discord today and I'd like to know if my reasoning was correct.

This was the original question:

Why is it impossible to cast fn() -> ! to fn() and would it be safe to perform the conversion with a transmute?

And the following conversation ensued:

Michael-F-Bryan: I think the only valid casts/coercions for a function are extending/shrinking lifetimes (e.g. between fn(&'static str) and for<'a> fn(&'a str)) and non-capturing closures to function pointers.

#![cfg_attr(goat, no_std)]: yes, they are covariant over the return type
! is a subtype of all types, isn't it

Michael-F-Bryan: transmuting between them could be unsound because of things like Return Value Optimisation... Imagine you had a fn() -> [u8; 256] and LLVM decided to apply RVO, turning it into fn(&mut [u8; 256]) to elide the copy. Now you would be transmuting a function which accepts no arguments (fn() -> !) into a function that accepts one argument, and that may mess up your stack/registers depending on your calling convention.
just hypothesizing

#![cfg_attr(goat, no_std)]: the return type is ()
it's not subject to RVO
besides I think it is safe to ignore parameters in most if not all calling conventions
since you never return, you never observe side effects of failed RVO

(I then posted a playground example which runs the transmute with Miri, and Miri complains that it's UB)

#![cfg_attr(goat, no_std)]
oh
so ! is not memory layout compatible with any type

Michael-F-Bryan: I believe it's because the general case of transmuting between functions that return different things is unsound unless the returned value is a pointer
which is why it's valid for compilers to apply things like RVO
Miri was happy for me to turn a fn() -> *mut u8 into a fn() -> *mut String

Is my reasoning correct that, generally, optimisations like RVO being sound is what makes this transmute UB?
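
For reference, the original question can be reduced to a small sketch. The names `diverge` and `reinterpret` are hypothetical and not taken from the playground example mentioned above. The sketch only shows that the coercion is rejected while the transmute itself compiles; it deliberately never calls the transmuted pointer, since whether that call is defined is exactly what this issue asks.

```rust
use std::mem;

// Hypothetical diverging function, used only for illustration.
fn diverge() -> ! {
    panic!("diverge never returns")
}

// Reinterprets a `fn() -> !` pointer as a `fn()` pointer without calling it.
fn reinterpret(f: fn() -> !) -> fn() {
    // let g: fn() = f; // rejected by the compiler: mismatched types

    // SAFETY: both types are function pointers of the same size, so the
    // transmute itself is accepted. This sketch never calls the result,
    // because the validity of that call is the open question here.
    unsafe { mem::transmute::<fn() -> !, fn()>(f) }
}
```

Passing `diverge` to `reinterpret` yields a pointer to the same address under a different type; nothing about the function's code changes.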

@bjorn3
Member

bjorn3 commented Dec 15, 2020

Imagine you had a fn() -> [u8; 256] and LLVM decided to apply RVO, turning it into fn(&mut [u8; 256]) to elide the copy.

That is actually already done at the ABI level, without involving any optimizations. The x86_64 System V ABI, among others, passes a return value pointer to the function when the return value can't be stored in registers. This extra argument always comes first. If it came last, it wouldn't really be a problem, at least for most ABIs. https://github.com/rust-lang/rust/blob/e261649593cf9c2707f7b30a61c46c4469c67ebb/compiler/rustc_codegen_ssa/src/mir/block.rs#L623
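
A rough source-level picture of that lowering, as a sketch only: `Big`, `make_big`, and `make_big_lowered` are hypothetical names, and the second function is a hand-written approximation of what the ABI does, not what rustc literally emits.

```rust
// Hypothetical 256-byte payload, too large to be returned in registers on
// common C ABIs such as x86_64 System V.
#[repr(C)]
#[derive(Clone, Copy)]
struct Big {
    bytes: [u8; 256],
}

// Source-level signature: the value is returned directly.
extern "C" fn make_big() -> Big {
    Big { bytes: [7; 256] }
}

// Approximately the shape the ABI gives the function above: the caller
// passes a pointer to the return slot as an extra first argument.
unsafe extern "C" fn make_big_lowered(out: *mut Big) {
    // SAFETY: the caller must pass a pointer valid for writing one `Big`.
    unsafe { out.write(Big { bytes: [7; 256] }) }
}
```

Both functions produce the same bytes, but they take a different number of arguments at the machine level, which is why treating a no-argument function as one with such a return slot (or the reverse) can go wrong.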

@digama0

digama0 commented Dec 15, 2020

For the case of extern "C" fn(T...) -> ! and extern "C" fn(T...), at least, I would assume that the calling conventions are in a subtype relationship, because a never-returning function uses the same ABI as a unit returning function (in particular it does not pass the return outparam in the first argument). For repr(Rust) I doubt we promise that, but I don't see a particular reason not to, and more to the point it should not be UB if you guess correctly (although I can imagine Miri would have a hard time checking ABI compatibility of functions with distinct types).

@Michael-F-Bryan
Author

I would assume that the calling conventions are in a subtype relationship, because a never-returning function uses the same ABI as a unit returning function ... (although I can imagine Miri would have a hard time checking ABI compatibility of functions with distinct types)

It looks like Miri doesn't care about particular ABIs or subtyping and only lets you change the return type if it is a pointer. An example was made where we transmute from an extern "C" fn() -> u8 to an extern "C" fn() -> i8, which should have the same ABI shape, and Miri called it UB.

https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=2813c647acecdbd5ea0b2dbe29f5cb29

The x86_64 System V ABI, among others, passes a return value pointer to the function when the return value can't be stored in registers. This extra argument always comes first. If it came last, it wouldn't really be a problem.

Thanks. I had a feeling something like this could happen, but wasn't sure of the specifics.

For the case of extern "C" fn(T...) -> ! and extern "C" fn(T...), at least, I would assume that the calling conventions are in a subtype relationship

Is the overall area of subtyping function arguments/return values something that is worth specifying for people writing unsafe code? Or would we just leave it as the status quo, "it's UB to transmute between anything other than lifetimes (respecting variance) and pointer types"?

@digama0

digama0 commented Dec 15, 2020

I would assume that the calling conventions are in a subtype relationship, because a never-returning function uses the same ABI as a unit returning function ... (although I can imagine Miri would have a hard time checking ABI compatibility of functions with distinct types)

It looks like Miri doesn't care about particular ABIs or subtyping and only lets you change the return type if it is a pointer. An example was made where we transmute from an extern "C" fn() -> u8 to an extern "C" fn() -> i8, which should have the same ABI shape, and Miri called it UB.

I just tried that with Wrapper<u8> where #[repr(C)] struct Wrapper<T>(T); and got the same error. I think the argument for the well-definedness of this transmutation is pretty strong, so I would say that Miri is simply being too conservative here.

For the case of extern "C" fn(T...) -> ! and extern "C" fn(T...), at least, I would assume that the calling conventions are in a subtype relationship

Is the overall area of subtyping function arguments/return values something that is worth specifying for people writing unsafe code? Or would we just leave it as the status quo, "it's UB to transmute between anything other than lifetimes (respecting variance) and pointer types"?

Here I mean subtyping in the dynamic sense, i.e. a function pointer of one kind can be used in all contexts where the other one could be used. This is really what unsafe authors care about, and in fact it may not be possible for the abstract machine to be able to tell the difference between a type and a transparent wrapper type around it, so I don't think it is possible for this to be declared as UB even if we wanted to. In particular, lifetimes don't actually exist in the rust abstract machine, just "stacked borrows" stored in the memory, so you can't use any wording like "lifetimes (respecting variance)" in the definition of whether this is UB.

As a more positive proposal, I would suggest not saying anything about function pointer casting being UB; all transmutes are okay, but calling a function requires that the input ABI match the ABI of the function, and returning from a function requires that the output ABI matches as well. This may have the effect of transmuting the function parameters if the ABI matches but the types don't, and this is not UB until the transmuted parameter is used, assuming the types don't match up.
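
Under that proposal the transmute itself is never the problem, only a mismatched call. A minimal sketch of the uncontroversial half, using hypothetical names `add_one` and `round_trip`: the pointer is stored under a different signature and restored before it is called, so the call always goes through the original type.

```rust
use std::mem;

fn add_one(x: u8) -> u8 {
    x.wrapping_add(1)
}

// Stores a function pointer under a different signature and restores it
// before use, so any later call goes through the original type.
fn round_trip(f: fn(u8) -> u8) -> fn(u8) -> u8 {
    // SAFETY: function pointers all have the same size; the value is only
    // reinterpreted here and never called through the `fn(i8) -> i8` type.
    let erased: fn(i8) -> i8 = unsafe { mem::transmute(f) };
    // SAFETY: this restores the pointer's original type.
    unsafe { mem::transmute(erased) }
}
```

Calling `erased` directly would be the case the proposal makes conditional on the ABIs matching; this sketch avoids taking a position on that.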

@chorman0773
Contributor

I just tried that with Wrapper<u8> where #[repr(C)] struct Wrapper<T>(T);

The ABI considerations of those two types may be different.

I think that the original fn() -> ! to fn() -> () conversion should be permitted, as I believe it's noted that an fn() -> () pointer is ABI compatible with the C function type void(void), and that fn() -> ! is equivalent to _Noreturn void(void) (and because in C, _Noreturn does not affect the function type, those signatures must be compatible in C).

@Michael-F-Bryan
Author

Michael-F-Bryan commented Dec 15, 2020

I would suggest not saying anything about function pointer casting being UB; all transmutes are okay, but calling a function requires that the input ABI match the ABI of the function, and returning from a function requires that the output ABI matches as well.

This sounds like a pretty reasonable rule-of-thumb 👍

It's also easier to specify what is valid instead of listing a bunch of exceptions where something isn't valid.

Here I mean subtyping in the dynamic sense, i.e. a function pointer of one kind can be used in all contexts where the other one could be used. This is really what unsafe authors care about, and in fact it may not be possible for the abstract machine to be able to tell the difference between a type and a transparent wrapper type around it, so I don't think it is possible for this to be declared as UB even if we wanted to.

This would be a big ticket item for me.

For example, I'd really like to declare FFI bindings like fn foo(Option<NonNull<Bar>>) instead of fn foo(*mut Bar), and that would only be possible if you can guarantee Option<NonNull<Bar>> is a transparent wrapper around *mut Bar.

You can do that in FFI code at the moment and it works because of #[repr(transparent)] and niche optimisation, but whenever I've talked to people they haven't been able to confidently say it's correct and guaranteed behaviour with respect to the Rust abstract machine.

@chorman0773
Contributor

For example, I'd really like to declare FFI bindings like fn foo(Option<NonNull<Bar>>) instead of fn foo(*mut Bar), and that would only be possible if you can guarantee Option<NonNull<Bar>> is a transparent wrapper around *mut Bar.

This is already guaranteed, as part of Option.
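
A small sketch of what that guarantee looks like in practice. `Bar` here is a hypothetical opaque type, and `from_raw`/`to_raw` are illustrative helpers rather than anything from the standard library; the size check relies on the documented null-pointer optimisation for `Option<NonNull<T>>`.

```rust
use std::mem;
use std::ptr::NonNull;

// Hypothetical opaque FFI type standing in for `Bar`.
#[repr(C)]
struct Bar {
    _private: [u8; 0],
}

// Compile-time check that the nullable wrapper is pointer-sized.
const _: () = assert!(mem::size_of::<Option<NonNull<Bar>>>() == mem::size_of::<*mut Bar>());

// Converts the nullable raw pointer used on the C side into the
// `Option<NonNull<Bar>>` form a Rust binding might prefer.
fn from_raw(p: *mut Bar) -> Option<NonNull<Bar>> {
    NonNull::new(p)
}

// Converts back; `None` becomes the null pointer.
fn to_raw(p: Option<NonNull<Bar>>) -> *mut Bar {
    p.map_or(std::ptr::null_mut(), NonNull::as_ptr)
}
```

The explicit conversions show the intended correspondence (null to `None`, non-null to `Some`) without relying on any transmute.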

@digama0

digama0 commented Dec 15, 2020

I just tried that with Wrapper<u8> where #[repr(C)] struct Wrapper<T>(T);

The ABI considerations of those two types may be different.

I think that the original fn() -> ! to fn() -> () conversion should be permitted, as I believe it's noted that an fn() -> () pointer is ABI compatible with the C function type void(void), and that fn() -> ! is equivalent to _Noreturn void(void) (and because in C, _Noreturn does not affect the function type, those signatures must be compatible in C).

I think you can substitute other types that are as layout equivalent as we have promises for, for example fn() -> ZST vs fn() -> (). If we consider all of these as distinct (following Miri's lead) then even things like fn(Option<NonNull<Bar>>) vs fn(*mut Bar) are UB, because we need a rule that says that fn(A, B, ...) -> R and fn(A', B', ...) -> R' are compatible if A,A', B,B', ..., R,R' are (for an appropriate sense of "compatible"), and it seems we don't have that.

@Nemo157
Member

Nemo157 commented Dec 15, 2020

#[repr(C)] struct Wrapper<T>(T); specifically is not ABI compatible with T though, you probably want #[repr(transparent)] struct Wrapper<T>(T);
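
To make the distinction concrete, here is a hedged sketch with hypothetical names (`Transparent`, `CWrapper`, `answer`). Both wrappers have the same size as `u8`, but only the `#[repr(transparent)]` one is assumed here to share the call ABI of its field, so only that one is used behind a transmuted function pointer.

```rust
use std::mem;

#[repr(transparent)]
struct Transparent<T>(T);

#[repr(C)]
struct CWrapper<T>(T);

extern "C" fn answer() -> u8 {
    42
}

// Sizes of `u8` and both wrappers; equal sizes alone do not imply that the
// types are passed the same way in a call.
fn sizes() -> (usize, usize, usize) {
    (
        mem::size_of::<u8>(),
        mem::size_of::<Transparent<u8>>(),
        mem::size_of::<CWrapper<u8>>(),
    )
}

// Calls `answer` through a pointer whose return type is the transparent wrapper.
fn call_as_transparent() -> u8 {
    // SAFETY (assumption): relies on `#[repr(transparent)]` giving the
    // wrapper the same call ABI as its single field.
    let f: extern "C" fn() -> Transparent<u8> =
        unsafe { mem::transmute(answer as extern "C" fn() -> u8) };
    f().0
}
```

No corresponding call is made with `CWrapper<u8>`, because a `repr(C)` struct with one field may be classified differently from the bare field by some C ABIs.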

@burdges

burdges commented Dec 15, 2020

I'd expect fn() -> [T] becomes a generator eventually ala rust-lang/rfcs#2884 but presumably via being sugar so the type itself becomes a generator and not problematic here.

@RalfJung
Member

(I then posted a playground example which runs the transmute with Miri, and Miri complains that it's UB)

Note that the Miri checks for this are just "whatever seemed most sensible to me at the time". I tried to make them restrictive enough so that any code that is allowed is unambiguously correct, but it is very possible that the checks could be relaxed. This requires more knowledge about calling conventions and ABIs than I have, though. ;)

@chorman0773
Contributor

It may be reasonable to adopt the C rules, which must work on any platform where the underlying ABI is used by C (and if there is no platform C ABI, Rust can do whatever it wants for extern "C"). Whether or not this would extend to ABIs other than extern "C", and, in particular, whether it extends to extern "Rust", would be up for debate.
C allows you to call a function via a function pointer, as long as the type of the function is compatible with a function pointer. Note that this definition determines whether the call can be performed at all, and would not determine library invariants, or permit violations of any validity invariants.
Compatibility is determined as follows (where the following describes an equivalence relation):

  • A type T is compatible with itself
  • In the return type position only, () is compatible with !. In all other positions they are not compatible (could also be unspecified).
  • A signed integer type is compatible with its corresponding unsigned integer type
  • A type T is compatible with the type Transparent<T> where Transparent<T> is any type which is transparent over T.
  • All Inhabited Zero-sized types with alignment 1 are mutually compatible.
  • All of the following types are mutually compatible: NonNull<T>, Box<T>, &T, &mut T, an Option around any of those types, *mut T, and *const T.
  • Two pointer types are compatible if both pointees are Sized, or the pointee types are compatible
  • Two slice types are compatible if the elements are compatible
  • Two structure types are compatible if they are both repr(C) structs, with the same number of members, and
  • Two union types are compatible if they are both repr(C) unions, with the same number of members, and each corresponding member has a compatible type. Note - the order of declaration of union fields is not considered for compatibility - End Note
  • A fundamental integer type T is compatible with NonZero<T>, where NonZero<T> is the corresponding NonZeroU* or NonZeroI* declared in core::num, and with Option<NonZero<T>>
  • bool is compatible with u8
  • An empty enum is compatible with !
  • Two function pointer types are compatible if both have the same ABI, both have the same number of parameters, where the corresponding parameters in declaration order are compatible, and the return types are compatible.
  • A function pointer type F is compatible with Option<F>.
  • An array type of length 1 is compatible with its element type.
  • Two types T and U are compatible if there exists a type S, such that T is compatible with S, and S is compatible with U.
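
Several pairs from the list above can at least be checked for matching size and alignment. This is a sketch with hypothetical helper names (`same_layout`, `check_some_pairs`), and it tests a necessary condition only: agreeing on size and alignment does not establish that two types are passed the same way in a function call.

```rust
use std::mem::{align_of, size_of};
use std::num::NonZeroU32;

// True when two types agree on size and alignment. This does not by itself
// prove that the two types use the same calling convention.
fn same_layout<A, B>() -> bool {
    size_of::<A>() == size_of::<B>() && align_of::<A>() == align_of::<B>()
}

// Checks a handful of the pairs named in the list.
fn check_some_pairs() -> bool {
    same_layout::<u32, i32>()
        && same_layout::<bool, u8>()
        && same_layout::<u32, NonZeroU32>()
        && same_layout::<u32, Option<NonZeroU32>>()
        && same_layout::<fn(), Option<fn()>>()
        && same_layout::<[u16; 1], u16>()
        && same_layout::<*const u8, Option<&u8>>()
}
```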

@RalfJung
Member

RalfJung commented Dec 19, 2020

@chorman0773 that list is a great starting point, thanks!

In the return type position only, () is compatible with !. In all other positions they are not compatible(could also be unspecified).

C doesn't really have uninhabited types, so I am not entirely sure about this. You talked about _Noreturn above, but I am not sure if that is a fully adequate model of !.

By "in return type position", do you also mean nested positions in the return type? Like, is (!, ()) compatible with ((),()) as a return type?

A type T is compatible with the type Transparent<T> where Transparent<T> is any type which is transparent over T.

This seems to be doing a lot, and is getting close to the discussions around Wrapper<T> vs Wrapper<U>. In particular, "transparent over" is not defined.


I should also point at rust-lang/rust#56166: rustc's FnType encodes a lot (all?) of the information that is relevant for ABIs, so ideally it would be possible to have a check that is based solely on that. Having to do a type-based comparison would be somewhat unprecedented (except for validity, which is all about the type); I feel like this is closer to layouts and transmute, where everything Miri needs to know is determined by rustc's TyAndLayout.

@chorman0773
Contributor

You talked about _Noreturn above, but I am not sure if that is a fully adequate model of !.

Well, in the return type position, I believe they are fundamentally equivalent. A function returning ! does not return (because it cannot return a value of type !, as no such value exists). Likewise, a function that is _Noreturn in C does not return, for a different reason (the fact that returning from a _Noreturn function is UB). I know that I have, in the past, relied on ->! being the same as _Noreturn void(..)/__attribute__((noreturn)) void(..). I'm sure others have as well.

By "in return type position", do you also mean nested positions in the return type? Like, is (!, ()) compatible with ((),()) as a return type?

Honestly, I do not know. My intention is only the top-level return-type position, that is, fn()->() and fn()->!. However, it may be reasonable for things like fn()->UnsafeCell<()> and fn()->UnsafeCell<!> to be compatible. Right now, the example you provided is not compatible, because no compatibility is given to tuples (intentionally). One thing I want to do is exclude compatibility in the parameter position, so that a function that accepts a parameter of an uninhabited type can be converted into an uncallable function (which has a completely distinct ABI from one where all occurrences of ! were replaced with ()).

This seems to be doing a lot, and is getting close to the discussions around Wrapper<T> vs Wrapper<U>. In particular, "transparent over" is not defined.

This rule is derived; it's just written for completeness. By definition, a transparent type has the same ABI as its transparent field, so they must be compatible types. For "transparent over" that's fine, it can be defined as the field of the transparent struct which is not a 1-ZST, or () if no such field exists. Additionally, MaybeUninit<T> is transparent over T by definition as well.

I feel like this is closer to layouts and transmute, where everything Miri needs to know is determined by rustc's TyAndLayout.

This was not an accident. The assumption is that if you pass in a parameter which is the wrong type, it gets transmuted into the correct one.

@RalfJung
Member

This was not an accident. The assumption is that if you pass in a parameter which is the wrong type, it gets transmuted into the correct one.

Oh, sure. I am just saying, instead of having to implement your list, maybe it would also work for Miri to just check if the mode and size of all arguments and the return type match.

This rule is derived; it's just written for completeness. By definition, a transparent type has the same ABI as its transparent field

Well, a Transparent<T> could still be something like

#[repr(transparent)]
struct Transparent<T: Trait>(T::Assoc);

But I think this is just a matter of notation -- your use of generic notation <...> here seems suboptimal, but if all you want to say is that repr(transparent) structs are compatible with the type of their non-ZST field, that makes sense.

@chorman0773
Contributor

chorman0773 commented Dec 19, 2020

Well, a Transparent<T> could still be something like

#[repr(transparent)]
struct Transparent<T: Trait>(T::Assoc);

That is true, though the definition implies that Transparent would have to be

#[repr(transparent)]
struct Transparent<T: Trait>(T);

with optional 1-ZST fields. It can also be a concrete type. As mentioned it's a matter of notation

instead of having to implement your list, maybe it would also work for Miri to just check if the mode and size of all arguments and the return type match.

Possibly, that could work, though it may not work out using extern "C" on all platforms. That check would be a superset of the rules in the compatibility list: types that are compatible would have the same size and alignment requirement, and would use the same ABI mode. This list could be constructed and implemented in a way that it is guaranteed to work correctly on any platform with a C target, as I mentioned.
It could also be noted that the list is non-exhaustive, and that other unspecified pairs can be included in the list. Though likely Miri would want to go with the strict interpretation of whatever is chosen, certainly at least the rules that rustc itself would choose (so then if rustc chooses that rule, Miri could then implement only that rule).

@RalfJung
Member

It could also be noted that the list is non-exhaustive, and that other unspecified pairs can be included in the list. Though likely Miri would want to go with the strict interpretation of whatever is chosen, certainly at least the rules that rustc itself would choose (so then if rustc chooses that rule, Miri could then implement only that rule).

For type layouts, the stance I took with Miri is that it will use whatever layout rustc happens to pick. IOW, Miri cannot be used to determine if a program (incorrectly) relies on unspecified layout details. I think it makes sense to do the same for ABI concerns: if, with whatever ABI rustc happens to pick, the call is well-defined, then Miri will let the program pass.

@chorman0773
Contributor

chorman0773 commented Dec 20, 2020

For type layouts, the stance I took with Miri is that it will use whatever layout rustc happens to pick

Makes sense. In that case, we could state that (in the absence of a preceding rule) type compatibility is unspecified. Under this, it would be valid to rely only on the listed rules (which can be added to), and Miri can just check what rustc does.

Adding to this, should enums have compatibility guarantees at all? One thing that should definitely be the case is that a repr(Int) enum with only unit variants is compatible with Int, and a repr(C) enum with only unit variants is compatible with an implementation-defined integer type.
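
As an illustration of the repr(Int) case, here is a minimal sketch with a hypothetical `Color` enum. It shows the layout side (same size as `u8`, discriminant readable with `as`) and a checked conversion back, since the correspondence only holds for values that are valid discriminants.

```rust
// Hypothetical fieldless enum with an explicit integer representation.
#[repr(u8)]
#[derive(Clone, Copy, Debug, PartialEq)]
enum Color {
    Red = 0,
    Green = 1,
    Blue = 2,
}

// Every `Color` is a valid `u8`.
fn to_int(c: Color) -> u8 {
    c as u8
}

// Checked conversion back: not every `u8` is a valid `Color`.
fn from_int(v: u8) -> Option<Color> {
    match v {
        0 => Some(Color::Red),
        1 => Some(Color::Green),
        2 => Some(Color::Blue),
        _ => None,
    }
}
```

This asymmetry is the validity-invariant caveat raised earlier: matching ABI would let the call happen, but a `u8` of 3 arriving as a `Color` would still be invalid.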

@digama0

digama0 commented Dec 20, 2020

I think these questions are already covered in other issues, and the UCG document. If there is layout compatibility, I don't see any reason to prevent indirectly transmuting parameters by calling a transmuted function pointer with a compatible ABI. We shouldn't be inventing rules beyond the already relatively well understood rules for layout compatibility (or lack thereof).

@chorman0773
Contributor

chorman0773 commented Dec 20, 2020 via email

@digama0

digama0 commented Dec 20, 2020

I think that would be covered by ABI compatibility of the calls. If one function passes its 32 bit value in register EDI and the other passes it in a stack slot or puts a pointer to the data in EDI then that's an ABI incompatibility, and the function call is UB.

@chorman0773
Contributor

My rules cover both, which would give users the benefit of knowing what is not UB. Types which are compatible are guaranteed to be both layout compatible (as far as raw data layout is concerned, disregarding validity invariants) and ABI compatible.

@digama0

digama0 commented Dec 21, 2020

I don't know what the rules are on ABI compatibility for rust-call functions, but I think it needs its own issue.

@chorman0773
Contributor

I don't know what the rules are on ABI compatibility for rust-call functions

This is why I left extern "Rust" functions aside for these purposes, and explicitly targeted extern "C". Other C ABIs could likewise use these rules.

@Michael-F-Bryan
Author

Two pointer types are compatible if both pointees are Sized, or the pointee types are compatible

Does this need to have an "and" instead of "or"?

Or would accepting a pointer of a different type be equivalent to a normal pointer cast, with all the usual requirements that entails (alignment, out-of-bounds access, etc.)?

@chorman0773
Contributor

chorman0773 commented Dec 21, 2020 via email

@comex

comex commented Dec 23, 2020

Just be careful about guaranteeing compatibility between two function types if the function signatures as translated to C are different, and not guaranteed compatible by the C standard.

This could break under any control-flow integrity scheme that verifies at runtime, before each call to a function pointer, that the called function has a compatible type. Clang supports this in software, and it could also be implemented on top of ARMv8.3 hardware pointer authentication. (That said, Apple's ARM pointer authentication implementation at least does not verify types for C function pointers, since doing so breaks a lot of real code. But that might change in the future. It does already verify types for C++ vtable calls.)

Of course, this isn't a problem for types that only exist in Rust, such as ! and #[repr(transparent)] structs, where rustc can unilaterally guarantee that it will treat them, for code generation purposes, as if they were some canonical C type.

@chorman0773
Contributor

chorman0773 commented Dec 23, 2020 via email

@digama0

digama0 commented Dec 23, 2020

This could break under any control-flow integrity scheme that verifies at runtime, before each call to a function pointer, that the called function has a compatible type.

In order for something like this to work, I assume that we have to say that the function being used has a type, though, with at least enough specificity for this check to be able to work with it. I think it is reasonable to consider that a separate calling convention of its own, possibly extern "C", and any function with that calling convention has to stick to the C rules for compatibility

@comex

comex commented Dec 24, 2020

@chorman0773
Your list goes beyond C in some places, though – e.g. specifying that a signed integer type is compatible with its corresponding unsigned type, or that all pointer types are compatible as long as they point to Sized types.

And my point is that even if two types are similar enough that 'any sane ABI' would treat them the same way, if they're different C types and not C-spec compatible, then replacing one with the other in a function signature may actually cause incompatibility in practice – in the presence of a control-flow integrity system which verifies function signature compatibility on all calls through function pointers.

I don't think there is anything currently guaranteed by Rust that would prevent it from working with such a control-flow integrity system. Rust does make stronger guarantees than C about in-memory layout, and I think most if not all of your list would be justified there, but not for function calling ABI.

(Also, even without CFI, the standard 32-bit ARM ABI sometimes represents a length-1 array differently from its element type.)

@comex

comex commented Dec 24, 2020

In order for something like this to work, I assume that we have to say that the function being used has a type, though, with at least enough specificity for this check to be able to work with it. I think it is reasonable to consider that a separate calling convention of its own, possibly extern "C", and any function with that calling convention has to stick to the C rules for compatibility

(Sorry for the double post.)

Yes, my point about CFI is mainly referring to extern "C". In principle, extern "Rust" could be more permissive. Intuitively, though, I would expect extern "Rust" to guarantee less than extern "C" about compatibility between not-quite-the-same declarations, just as repr(Rust) does compared to repr(C).

@digama0

digama0 commented Dec 24, 2020

I agree, or rather I think this is an unfortunate overlapping of the meanings of extern "C" as "stable ABI" vs "what C does". I don't think we can really appropriate extern "C" to mean anything other than "what C does" because it's in the name, but this makes me wonder about some of the more unusual aspects of C/C++ types, like object lifetime. I guess FFI is still somewhat unspecced but it's really not obvious to me what happens to the identity of C types if they pass through Rust with its flat memory model. But I guess that's more to do with actual execution behavior once you enter the function, whereas CFI would interfere with the call itself.

Regarding extern "Rust" functions, I think the analogous approach to layout compatibility would be to say that as long as the ABIs match the call will transmute the parameters as I mentioned earlier, however we provide no (or extremely limited) stable guarantees about ABI matching for functions with different signatures unless they only differ up to lifetime args (which I think is what Miri currently tests for).
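
The lifetime-only case mentioned here does not even need a transmute in one direction. A minimal sketch with hypothetical names `str_len` and `narrow`: a pointer that accepts any lifetime is accepted where one taking only `'static` is expected, through ordinary subtyping.

```rust
fn str_len(s: &str) -> usize {
    s.len()
}

// A pointer that accepts any lifetime can be used where one accepting only
// `'static` is expected; this is ordinary subtyping, no transmute needed.
fn narrow(f: for<'a> fn(&'a str) -> usize) -> fn(&'static str) -> usize {
    f
}
```

The reverse direction (from `fn(&'static str)` to `for<'a> fn(&'a str)`) is not accepted by the type checker and would need a transmute, which is where the question of what is guaranteed starts.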

@chorman0773
Contributor

Your list goes beyond C in some places

Actually, signed and unsigned integers are compatible, IIRC.
The pointer rule is fair, though it's really sort of implied by Rust. It could probably be reduced to "pointer types are compatible if their pointees are compatible".

@chorman0773
Contributor

the standard 32-bit ARM ABI sometimes represents a length-1 array differently from its element type

I'm not particularly attached to the [T;1]=>T rule, or the *mut T=>*mut U general rule. They just seem like the kind of rule that makes sense for Rust.


8 participants