Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

repr(int) fieldless enums are ABI-compatible with int #128600

Closed
wants to merge 2 commits into from

Conversation

RalfJung
Copy link
Member

@RalfJung RalfJung commented Aug 3, 2024

Given that we already allow mixing char and u32, as well as usize/isize and the corresponding fixed-size type, it seems odd to not also do this for enums. However, if we declare this legal, then Rust code may start relying on it, so arguably CFI should also allow this, which could miss some real bugs where the mismatch is accidental.

So we have to decide -- do we say this is ABI-compatible, or do we declare this to be "erroneous behavior": similar to overflow in (non-wrapping) arithmetic, such behavior is always a bug, but implementations are not required to detect the bug (with a panic or abort), they can also continue execution in some well-defined way. (For the ABI question, that well-defined way is to transmute the argument from the caller type to the callee type. This is well-defined in the sense that we say exactly what happens, but it may be UB if the caller uses e.g. u32 and the callee expects a repr(u32) enum for which the chosen argument value is not valid.)

I am not sure if we are already guaranteeing anything like this somewhere. (The nomicon talks about these repr flags affecting the ABI, but the nomicon is not normative, and also it's not entirely clear whether ABI here also includes function call concerns.)
Cc @rust-lang/lang @rust-lang/opsem

This is part of a wider set of discussions around CFI vs ABI compatibility:

@rustbot
Copy link
Collaborator

rustbot commented Aug 3, 2024

r? @fee1-dead

rustbot has assigned @fee1-dead.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Aug 3, 2024
@rustbot
Copy link
Collaborator

rustbot commented Aug 3, 2024

The Miri subtree was changed

cc @rust-lang/miri

Some changes occurred to the CTFE / Miri engine

cc @rust-lang/miri

@RalfJung RalfJung added the I-lang-nominated Nominated for discussion during a lang team meeting. label Aug 3, 2024
@RalfJung
Copy link
Member Author

RalfJung commented Aug 3, 2024

@maurer do you have any concerns with this guarantee from a CFI perspective? Does CFI accept or reject calls where caller and callee disagree on whether to use a repr(iN) enum or the corresponding integer type?

@maurer
Copy link
Contributor

maurer commented Aug 5, 2024

Your change as it stands won't break anything, but it also won't make CFI across ABI-compatible enums succeed, so it would have the downside that there would be more behavior that is "legal Rust but not CFI legal", which we generally want to minimize as much as possible.

This does go against how CFI works in C/C++ - there, it's not legal to perform the function pointer casts you're legalizing here, so it would be different. It also does lessen the protections somewhat, as distinct enums with the same #[repr] will not be considered different type sfor defense purposes.

The tag for fn foo(x: MyEnum);, will currently encode MyEnum as an opaque type, so it'll be the equivalent to the C signature void (*)(MyEnum); and be processed that way.

If we want this ABI compatibility to extend to CFI, we should add a case to the transform which changes #[repr({u,i}N)] enums into {u,i}N during the type transformation before consumption.

@RalfJung
Copy link
Member Author

RalfJung commented Aug 5, 2024

Thanks!

Given that I've not touched the CFI code before, I'd prefer if you could take care of that -- either as a parallel PR (I could then cherry-pick your patches) or as a follow-up.

@maurer
Copy link
Contributor

maurer commented Aug 5, 2024

OK. This does technically weaken CFI, as it means that in this case:

#[repr(u8)]
enum A {
  X = 0,
}

#[repr(u8)]
enum B {
  Y = 1,
}

fn foo(x: A) {
// ...
}
fn bar(x: B) {
// ...
}

fn baz(f: fn(x: A)) {
  f(A::X)
}

If an attacker got control over the baz parameter, they could send it to bar instead of foo. Prior to this change, CFI could prevent this.

I'm a firm believer that CFI should only enforce the language rules, so if we're going to flatten enums like this for purposes of function pointer casts, we'll need to weaken CFI like this, but I do want to point out that it will weaken CFI in Rust. It might weaken it enough that it might make sense to have this behavior flagged.

@maurer
Copy link
Contributor

maurer commented Aug 5, 2024

cc @rcvalle to avoid any surprise conflicts when it comes to implementation.

The tl;dr is that Ralf is legalizing this cast:

#[repr(i8)]
enum A {}
#[repr(i8)]
enum B {X = 0}
fn f(x: fn(A)) {
   let y: fn(B) = unsafe { std::mem::transmute(x) }
   y(B::X)
}

and we'd have to weaken (possibly behind a flag) the Rust CFI support so that it doesn't trigger on it.

@maurer
Copy link
Contributor

maurer commented Aug 5, 2024

Without my CFI hat on, what is your plan for the semantics of such casts in the event that they receive an out-of-range value for the enum?

@RalfJung
Copy link
Member Author

RalfJung commented Aug 5, 2024

I'm a firm believer that CFI should only enforce the language rules, so if we're going to flatten enums like this for purposes of function pointer casts, we'll need to weaken CFI like this, but I do want to point out that it will weaken CFI in Rust. It might weaken it enough that it might make sense to have this behavior flagged.

Right, that's why this is flagged for t-lang discussion and it is why I asked for your input for that discussion. :) We could also declare this to be "erroneous behavior", where we say that if this happens it is considered a bug, but detecting the bug is not required by implementations. If the bug is not detected, the call proceeds as-if there was a transmute.

@chorman0773 suggested this should be documented as guaranteed; not sure if they have a concrete usecase in mind or whether it just seems "obvious". Note that we already declare NonZeroI32 and i32 to be compatible which can also lead to UB when someone passes zero that way.

Without my CFI hat on, what is your plan for the semantics of such casts in the event that they receive an out-of-range value for the enum?

That's already documented as part of the general ABI compatibility rules: it acts like a transmute, so for out-of-range values the call is still UB.

@rcvalle
Copy link
Member

rcvalle commented Aug 5, 2024

As with char and u32 we could add this to the integer normalization option for CFI, but for non extern "C" functions only; otherwise, it'd break cross-language CFI support. @RalfJung does the ABI provide any guarantees about cross-language compatibility of those?

@maurer
Copy link
Contributor

maurer commented Aug 5, 2024

Shouldn't x-lang be a non-issue, as those all need to be #[repr(C)] if the enum is present in C, which is excluded from this flattening?

@RalfJung
Copy link
Member Author

RalfJung commented Aug 5, 2024

The ABI compatibility rules apply to all ABIs, including extern "C". So this should be accepted for extern "C" as well if the PR lands. (And the existing compatible pairs like char-u32 or i32-NonZeroI32, should already be accepted over extern "C".)

@RalfJung
Copy link
Member Author

RalfJung commented Aug 5, 2024

does the ABI provide any guarantees about cross-language compatibility of those?

AFAIK we guarantee that i32 is ABI-compatible with int32_t etc, though I do not know where this is documented exactly. By transitivity this makes a repr(i32) enum (and NonZeroI32) ABI-compatible with int32_t (for any ABI string).

@rcvalle
Copy link
Member

rcvalle commented Aug 5, 2024

Shouldn't x-lang be a non-issue, as those all need to be #[repr(C)] if the enum is present in C, which is excluded from this flattening?

It seems these reprs aren't mutually exclusive (see https://doc.rust-lang.org/reference/type-layout.html#combining-primitive-representations-of-enums-with-fields-and-reprc); otherwise yes, we could add it to the integer normalization option for CFI for non #[repr(C)] enums only instead, but it doesn't seem we could here.

@rcvalle
Copy link
Member

rcvalle commented Aug 5, 2024

@RalfJung would, for example, #[repr(C, u8)] and #[repr(u8)] be ABI compatible?

@RalfJung
Copy link
Member Author

RalfJung commented Aug 5, 2024

@RalfJung would, for example, #[repr(C, u8)] and #[repr(u8)] be ABI compatible?

That is an odd combination. The docs say

For field-less enums, the C representation has the size and alignment of the default enum size and alignment for the target platform's C ABI.

but for repr(C, u8) this can't be right. So seems to me like the C is basically a NOP here so I would not expect it to affect layout or ABI in any way. (We are only talking about fieldless enums here.)

@maurer
Copy link
Contributor

maurer commented Aug 5, 2024

If #[repr(C, u8)] is legal, I think it would be correct to treat the C as a suppression indicator for this - if the other language is C++, two distinct enum types which have the same representation are still assumed non-aliasing when behind a pointer, so flattening them to make them compatible seems weird.

@RalfJung
Copy link
Member Author

RalfJung commented Aug 5, 2024

I don't see how the C/C++ pointer aliasing rules should be relevant here. We aren't even talking about pointer types here, and furthermore Rust deliberately does not want C/C++-style "strict aliasing" rules, so Rust-to-Rust calls should not be subject to such rules even when they go via extern "C". (Rust also already declares all pointer types to be mutually ABI compatible.)

This PR is entirely about by-values passing of enums, let's stick to that in the discussion here.

@rcvalle
Copy link
Member

rcvalle commented Aug 5, 2024

but for repr(C, u8) this can't be right. So seems to me like the C is basically a NOP here so I would not expect it to affect layout or ABI in any way. (We are only talking about fieldless enums here.)

If that is the case, I think we could use @maurer's suggestion and add it to the integer normalization option for CFI for non #[repr(C)] enums only.

@RalfJung
Copy link
Member Author

RalfJung commented Aug 5, 2024

I said the C should be a NOP, i.e. both repr(u8) and repr(C, u8) should be compatible with u8 if the enum is fieidless. So, they should both be integer normalized.

@maurer
Copy link
Contributor

maurer commented Aug 5, 2024

Aliasing is relevant here because CFI is essentially about labeling function pointers based on aliasing rules.

If I make a function pointer out of a Rust extern "C" fn foo(x: SomeEnum) and SomeEnum is flattened despite being #[repr(C)], and pass the result to C++, and C++ expects that function pointer to take a SomeEnum, when C++ tries to call in, it'll fail.

I'm not suggesting we add strict-aliasing semantics to Rust, but make it possible for Rust to communicate with a language which assumes it.

@rcvalle
Copy link
Member

rcvalle commented Aug 5, 2024

Could we filter out by #[repr(C], and non #[repr(C, X)] and fieldless? Is #[repr(C, X)] and fieldless expected to be used cross-language (i.e., with FFI or across the FFI boundary)?

@chorman0773
Copy link
Contributor

@RalfJung I was mostly asking a general question as I was formalizing the abi compat rules - whether or not #[repr(Int)] enum Foo { Var0 = 1, ...} (where Foo has no fields) is #[repr(C)] struct Foo(Int);, #[repr(transparent)] struct Foo(Int); or something else entirely for abi purposes is unclear.
My more general use is compatibility between a fieldless enum and the appropriate C/C++ enum type, which I do rely on when writing C++ bindings, and currently writing my crabi++ library in C++. I believe that, except for CFI, this matches the underlying type on all abis I'm aware of (in particular, it doesn't have aggregate abi on arm, where that matters).

@fee1-dead
Copy link
Member

The implementation looks fine, though I'm also curious on the answer to

Without my CFI hat on, what is your plan for the semantics of such casts in the event that they receive an out-of-range value for the enum?

On the compiler side, we seem to separate the caller ty and the callee ty, so it would be possible to make this one-direction only. (i.e. legalize only fn(i32) casted to fn(A) where A is repr(i32) but not the other way around)

@RalfJung
Copy link
Member Author

RalfJung commented Aug 6, 2024

The implementation looks fine, though I'm also curious on the answer to

I have already answered this above and it is answered in the stable docs:

If the declared signature and the signature of the function pointer are ABI-compatible, then the function call behaves as if every argument was transmuted from the type in the function pointer to the type at the function declaration, and the return value is transmuted from the type in the declaration to the type in the pointer. All the usual caveats and concerns around transmutation apply; for instance, if the function expects a NonZero and the function pointer uses the ABI-compatible type Option<NonZero>, and the value used for the argument is None, then this call is Undefined Behavior since transmuting None::<NonZero> to NonZero violates the non-zero requirement.

@RalfJung
Copy link
Member Author

RalfJung commented Aug 6, 2024

Could we filter out by #[repr(C], and non #[repr(C, X)] and fieldless? Is #[repr(C, X)] and fieldless expected to be used cross-language (i.e., with FFI or across the FFI boundary)?

I don't understand what you mean by "filter out".

However, when I try to use this repr I can an error:

error[E0566]: conflicting representation hints
  --> tests/pass/function_calls/abi_compat.rs:83:12
   |
LL |     #[repr(C, u64)]
   |            ^  ^^^
   |
   = warning: this was previously accepted by the compiler but is being phased out; it will become a hard error in a future release!
   = note: for more information, see issue #68585 <https://github.com/rust-lang/rust/issues/68585>
   = note: `#[deny(conflicting_repr_hints)]` on by default

So I think we do not have to worry about repr(C, u64).

@RalfJung RalfJung force-pushed the fieldless-enum-abi-compat branch from f5b183f to 6e42535 Compare August 6, 2024 09:11
@RalfJung
Copy link
Member Author

RalfJung commented Aug 6, 2024

My more general use is compatibility between a fieldless enum and the appropriate C/C++ enum type

That seems like something to be added to some sort of FFI documentation explaining our cross-language ABI compatibility rules.

@bors
Copy link
Contributor

bors commented Aug 7, 2024

☔ The latest upstream changes (presumably #128761) made this pull request unmergeable. Please resolve the merge conflicts.

@RalfJung RalfJung force-pushed the fieldless-enum-abi-compat branch from 6e42535 to 82ee3cd Compare August 7, 2024 09:51
@RalfJung
Copy link
Member Author

RalfJung commented Aug 7, 2024

If I make a function pointer out of a Rust extern "C" fn foo(x: SomeEnum) and SomeEnum is flattened despite being #[repr(C)], and pass the result to C++, and C++ expects that function pointer to take a SomeEnum, when C++ tries to call in, it'll fail.

Isn't this a problem with all kinds of "normalization", e.g. int vs int32_t being the same type in Rust but potentially considered different types in C++ that must not be mixed up across function call boundaries? Or am I misunderstanding something?

@fee1-dead
Copy link
Member

Marking as waiting on team.

@fee1-dead fee1-dead added S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Aug 8, 2024
@CAD97
Copy link
Contributor

CAD97 commented Aug 8, 2024

I'm in favor of #[repr(int)] enum being ABI/CFI compatible with int. Imo if transmute-by-abi is allowed for #[repr(transparent)], it only makes sense to also allow it for fieldless #[repr(int)]. You can even make a decent argument that dataful #[repr(int)] have their ABI structurally defined as #[repr(C)] union per the repr RFC.

C says that enumerated types are compatible with their underlying type. That would suggest that we should make repr(int) ABI/CFI compatible with int. However, C's concept of type compatibility is not transitive. So C allows the declaration void f(enum E) to link to the definition void f(int), but not to the definition void f(enum Q).

I'm also not sure exactly how compatibility is determined between C and Rust. Do we ignore the tag equivalence requirement so that enum E can be compatible with enum _RNtC3lib1E? Since Rust doesn't have #[no_mangle] for types, the only answer I can come up with is that Rust uses a separate tag namespace from C and a tag in one namespace is compatible with all tag names1 in the other namespace.

At an absolute minimum, C enum needs to be CFI compatible with Rust iN. Given a C enum that is used as bitflags, the proper translation into Rust isn't #[repr(int)] enum E { … } but #[repr(transparent)] struct E(int); so that all values are valid, not just the enumerated ones.

Footnotes

  1. Note that I'm using tag name to refer to only the name; types must also be structurally compatible to be considered compatible.

@traviscross traviscross added the T-lang Relevant to the language team, which will review and decide on the PR/issue. label Aug 8, 2024
@RalfJung
Copy link
Member Author

I'm going to close this PR... it probably doesn't make a lot of sense to discuss this in isolation; instead, there is a wider discussion to be had about CFI vs ABI compatibility. Also see:

This probably warrants a t-lang design meeting, but I don't currently have the capacity to drive something like that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
I-lang-nominated Nominated for discussion during a lang team meeting. S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-lang Relevant to the language team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants