Multiple incompatible function declarations #198
Comments
Interaction with C quickly gets messy to consider (and we can probably put this under "FFI is unsafe"), but what seems worrisome to me is the safe pure Rust version of this: #[no_mangle]
/* extern "Rust" */ // this is implicit
unsafe fn foo(x: *mut i32, y: *mut i32) -> i32 {
*x = 42;
*y = 13;
return *x;
}
fn main() {
let mut x = 0;
let raw = &mut x as *mut _;
let r = unsafe { foo(raw, raw) };
println!("{}", r);
}
#[cfg(bad)]
mod bad_but_unused {
extern "Rust" { fn foo(x: &mut i32, y: &mut i32); }
} Without That seems like a problem. |
Maybe we should just not add |
Yes, agreed.
No, I wasn't able to come up with an example where that happened. What LLVM currently does is merge the attributes of declarations "somehow". So here: mod bad0 { extern "C" { fn foo(x: NonNull<i32>); } }
mod bad1 { extern "C" { fn foo(x: *mut i32); } } it will make the argument of The moment you add a definition, a different kind of merging happens, but I wasn't able to get LLVM to propagate attributes from declarations to definitions, only from definitions to declarations. So here: extern "C" fn foo(x: NonNull<i32>) {}
mod bad1 { extern "C" { fn foo(x: *mut i32); } }
extern "C" fn foo(x: *mut i32) {}
mod bad1 { extern "C" { fn foo(x: NonNull<i32>); } } That is, there,
The question that we should resolve is what transformations are valid for LLVM-IR ? What you propose might work as long as it is invalid for LLVM to propagate attributes from a declaration to a definition. It wouldn't be great if adding a declaration without Alternatively, maybe there is a pass that merges function declarations, and we can just tune that? (e.g. not run the pass). |
I see. So what we really need is a clear statement from the LLVM devs about propagation of attributes between declaration(s) and definition of a function.
It wouldn't be great but it also wouldn't be UB. |
Obvious in hindsight, but this also applies to function attributes like #![feature(unwind_attributes)]
pub mod bad0 {
extern "Rust" { /* #[unwind(allowed)] */ fn foo(); }
pub unsafe fn call() -> usize { foo as usize }
}
pub mod bad1 {
extern "Rust" { #[unwind(allowed)] fn foo(); }
pub unsafe fn call() { foo() }
} where because |
An alternative here might be to have slightly more complex rules. For example, providing "incorrect" declarations is ok, as long as no declaration of the same symbol is called in the whole program. That is, safe Rust code that only provides declarations would be UB-free. The moment one declaration is called, the We would then need to define what "incorrect" is, potentially on an attribute-by-attribute basis. For example, if a function can unwind, adding a declaration that's This would potentially result in action at a distance. For example, a crate like this: // crate A
extern "C" { fn foo(); } contains no unsafe code and has no UB. This other crate contains no UB either: // crate B
extern "C" { #[unwind(allowed)] fn foo(); }
pub unsafe fn bar() { foo() }
fn main() { unsafe { bar() } } However, if Maybe we could track all declarations in the dependency graph, metadata of rlibs, static libs, etc. and make sure that compilation fails if two mismatching declarations exist anywhere ? This doesn't solve the problem if xLTO with C though. |
Yes, that's kind of how I think about it: when you call the function, you are actually asserting that all the declarations that exist are valid, not just the one this gets resolved to. Not pretty. We can maybe help with a lint as long as everything is Rust code, but it'd be a cross-crate lint..
Are these attributes sufficiently well-behaved that we can take their "conjunction" and then say that that defines the actual effective attributes that the call must abide to? |
LLVM doesn't say, and I don't think we can say anything for them. We probably would need to open an LLVM bug report, and ask them to document the behavior or explicitly say that it is not allowed. Whether they provide some rule for all attributes, or whether they do this on an attribute-by-attribute basis, is kind of up to them. |
This would be really hard to check. For example: // Rust:
extern "C" {
fn foo(); // nounwind
#[unwinds] fn bar();
}
pub fn api() { unsafe { bar() } } // c++ (in C++ `extern "C" == #[no_mangle]` only)
extern "C" void foo() { throw; } // can unwind
extern "C" void bar() { foo(); } // can unwind Here the Rust |
True. But given the behavior at hand, I don't see a better way. Maybe we can get an (optional) pass into LLVM that makes it a hard error when attributes conflict. Or something that uses the least restrictive set of attributes when multiple declarations coexist. But with the current behavior, I don't see a better spec that we could write. |
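To make the two merging strategies concrete, here is a minimal sketch that models a handful of parameter attributes as bit flags. The `ParamAttrs` type and the `merge_strongest` / `merge_weakest` helpers are hypothetical names invented for illustration; they are not LLVM or rustc APIs, and the model ignores everything except attribute presence.

```rust
/// Hypothetical model of a few pointer-parameter attributes as bit flags.
/// This is an illustration only, not how LLVM represents attributes.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct ParamAttrs(u8);

impl ParamAttrs {
    const NONE: ParamAttrs = ParamAttrs(0);
    const NONNULL: ParamAttrs = ParamAttrs(1 << 0);
    const NOALIAS: ParamAttrs = ParamAttrs(1 << 1);
    const DEREFERENCEABLE: ParamAttrs = ParamAttrs(1 << 2);

    /// Attributes present in either set.
    fn union(self, other: ParamAttrs) -> ParamAttrs {
        ParamAttrs(self.0 | other.0)
    }

    /// Attributes present in both sets.
    fn intersection(self, other: ParamAttrs) -> ParamAttrs {
        ParamAttrs(self.0 & other.0)
    }

    /// True if every attribute in `other` is also in `self`.
    fn contains(self, other: ParamAttrs) -> bool {
        self.0 & other.0 == other.0
    }
}

/// "Strongest" merge: every promise made by any declaration is assumed.
fn merge_strongest(decls: &[ParamAttrs]) -> ParamAttrs {
    decls.iter().fold(ParamAttrs::NONE, |acc, d| acc.union(*d))
}

/// "Weakest" merge: only promises made by all declarations are assumed.
/// With no declarations at all, nothing is assumed.
fn merge_weakest(decls: &[ParamAttrs]) -> ParamAttrs {
    let mut iter = decls.iter().copied();
    match iter.next() {
        None => ParamAttrs::NONE,
        Some(first) => iter.fold(first, |acc, d| acc.intersection(d)),
    }
}
```

Under this model, a `&mut i32` declaration combined with a `*mut i32` declaration yields all three attributes under the strongest merge and none under the weakest merge, which is the "least restrictive set" behavior suggested above.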
This safe Rust program came up on Zulip: extern "C" { static FOO: &'static u8; }
fn main() { } the value of I think this is at least tangentially related to this issue, because for |
Re-reading my last comment, I think the most appealing solution for me would be to make |
If this is UB, it's a bit of a problem for rust/objc FFI, since even the stdlib uses this to call The alternative is explicitly transmuting the function, which I had always assumed was equivalent. |
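For reference, the "explicitly transmuting the function" alternative could look roughly like the sketch below. It assumes a single type-erased function pointer that is cast to the real signature at each call site; `add`, `Erased`, `erase`, and `call_add` are hypothetical names for illustration and do not correspond to any real FFI binding.

```rust
use std::mem;

/// Hypothetical target function standing in for a foreign entry point.
extern "C" fn add(a: i32, b: i32) -> i32 {
    a + b
}

/// A type-erased function pointer: it says nothing about the real signature.
type Erased = unsafe extern "C" fn();

/// Forget the real signature. Creating this pointer is fine; calling it
/// through the `Erased` type directly would not be.
fn erase(f: extern "C" fn(i32, i32) -> i32) -> Erased {
    // Function pointers all have the same size, so the transmute compiles.
    unsafe { mem::transmute::<extern "C" fn(i32, i32) -> i32, Erased>(f) }
}

/// Recover the real signature at the call site, then call through it.
/// Assumption: the caller knows `f` really has this signature.
fn call_add(f: Erased, a: i32, b: i32) -> i32 {
    let typed: extern "C" fn(i32, i32) -> i32 = unsafe { mem::transmute(f) };
    typed(a, b)
}
```

With this shape there is only one declaration of the symbol, and the signature-specific assumptions live at each call site rather than in competing declarations.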
I don't think the One Definition Rule applies here. The One Definition Rule is a C++ rule that forbids multiple definitions, not declarations. I don't think LLVM is applying C++ rules to non-C++ programs that are incompatible with C rules. ISO C says about declarations "All declarations that refer to the same object or function shall have compatible type; otherwise, the behavior is undefined." The ISO C spec emphasizes that "compatible" doesn't mean "identical" and gives rules for when certain different type declarations must be considered identical. I think the discussion above, and common sense, indicates that LLVM tries to implement rules that are at least close to ISO C's compatibility rules. For example, "For two array types to be compatible, both shall have compatible element types, and if both size specifiers are present, and are integer constant expressions, then both size specifiers shall have the same constant value." This means declarations Note that MSVC has a quite powerful declaration enrichment called SAL; see https://learn.microsoft.com/en-us/cpp/code-quality/annotating-function-parameters-and-return-values?view=msvc-170 to see the extensive set of enrichment features it provides for function declarations. We should be striving to have the same--better--in Rust. And to the extent that multiple incompatible declarations would cause UB, it's the Rust compiler's job to reject at compile-time programs that have incompatible declarations. In other words, we shouldn't settle for whatever we think LLVM might currently implement. If LLVM needs to be improved in this area then we should do that, to get safe semantics. |
I doubt LLVM will consider
Fully agreed. Someone just needs to figure out what semantics we want, and do the work of implementing that. :) |
ISO C says However, there's in general no way to write a Rust function declaration that has the same qualifiers as a C function. For example: " void *memcpy(void * restrict s1, const void * restrict s2, size_t n); and the musl libc declaration:
Here's the Rust libc declaration:
So we are relying on LLVM (and really all toolchains) considering non- |
ISO C also has the concept of "composite type," which is basically the union of all the restrictions imposed by all compatible declarations. It gives an example:
This concept of composite type seems useful here. |
Here is one way to express this in C:
ISO C also says "A declaration of a parameter as 'array of type' shall be adjusted to 'qualified pointer to type', and Note that totally different language features (Rust: For |
We are relying on the fact that as long as every caller satisfies the preconditions implied by every declaration, LLVM considers everything to be fine. The situation that must not happen is that This does sound a lot like that notion of "composite type" indeed! |
rust-lang/rust#46188 seems to be saying that just the presence of the declaration, which is never used, affected the compilation. |
Yes, that is implied by what I said. Every declaration must be satisfied, whether it is used or not. |
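As a rough illustration of "every declaration must be satisfied", the sketch below reuses the raw-pointer `foo` from the first example in this thread and checks, before calling it, the preconditions that a `&mut i32, &mut i32` declaration would additionally imply. The helper names `satisfies_reference_declaration` and `call_foo_checked` are hypothetical, and the check is deliberately partial: it covers non-null, alignment, and non-aliasing, while validity of the pointed-to memory remains the caller's obligation.

```rust
use std::mem::align_of;

/// Stand-in for the raw-pointer definition from the first example.
///
/// # Safety
/// Both pointers must be valid for reads and writes of an `i32`.
unsafe fn foo(x: *mut i32, y: *mut i32) -> i32 {
    unsafe {
        *x = 42;
        *y = 13;
        *x
    }
}

/// Checks the extra preconditions a `(&mut i32, &mut i32)` declaration of
/// `foo` would imply: non-null, aligned, and not aliasing each other.
/// Two aligned `i32` pointers overlap only if they are equal.
fn satisfies_reference_declaration(x: *mut i32, y: *mut i32) -> bool {
    let aligned = |p: *mut i32| (p as usize) % align_of::<i32>() == 0;
    !x.is_null() && !y.is_null() && aligned(x) && aligned(y) && x != y
}

/// Calls `foo` only if the call would also be fine under the stricter
/// reference-based declaration.
///
/// # Safety
/// Non-null pointers passed here must be valid for reads and writes.
unsafe fn call_foo_checked(x: *mut i32, y: *mut i32) -> Option<i32> {
    if satisfies_reference_declaration(x, y) {
        Some(unsafe { foo(x, y) })
    } else {
        None
    }
}
```

The aliasing call `foo(raw, raw)` from the first example is exactly the case this check rejects: it is fine for the raw-pointer signature but violates the reference-based one.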
Yes. Note though that every caller can satisfy the preconditions of all the declarations, but things can still go wrong, because LLVM might assume stronger postconditions than what the function actually provides; e.g. the non-null constraint on the result of |
Oh sure, but that is a separate issue -- all these declarations refer to the same implementation after all, so as long as each declaration individually can ensure that the linked implementation satisfies the given postconditions, we are fine on the postcondition side. Though I guess this could become interesting when a postcondition holds for all the ways that one declaration is called, but not for others... silly example: consider a function |
This entire thread was based on the assumption that there's UB in C here and LLVM is in its right to inherit that UB. (Though AFAIK we never tried to convince them to not do that.) However, turns out that this actually is not UB in C. So we should possibly go back to square one here. Can we convince LLVM to not do this kind of attribute merging? Can we make it use the weakest attributes from all declarations, rather than the strongest? That would nicely resolve this entire mess without having to burden our users with new hard-to-check safety requirements. Cc @nikic for some LLVM expertise. The issue is about what happens when there are multiple function declarations with different attributes in their signatures. It should be enough to read the first 2 messages in this issue and this one. |
@RalfJung Can you share an example of C code that gets miscompiled by Clang? It's been a while since I last looked at this topic, but it was my understanding that this is more about rustc behavior than LLVM behavior. I believe that Clang does not convert Within a single module, there is no such thing as "multiple declarations" from an LLVM perspective. How multiple source language declarations/definitions are mapped onto a single LLVM declaration/definition is entirely up to the frontend. There may be some LTO considerations here, but as far as I know all the examples discussed are within a single module. I believe the solution we have discussed in the past (in one of these threads...) is to place the attributes only on calls and definitions, rather than declarations. |
I don't think we have a miscompilation example, maybe @gnzlbg has one. To reconstruct this in C or C++ we need some functions with interesting attributes. In C I only know of In C you can't even have multiple declarations of the same function and then "call one of them" for lack of namespaces. With C++, I think C++ references may generate Are there clang C++ extensions for attributes on pointers, so that we could declare a function with and without |
Based on further discussion in rust-lang/rust#46188, it turns out this is actually more of a rustc issue than an LLVM issue. Also see the discussion here. |
Does the following example Rust code have undefined behavior?
Note: the #[no_mangle] definition is used for exposition, and we can replace it with the following C function definition in a TU that's linked with that Rust program:
There is a lot of Rust code using Rust types to "enrich" C APIs, the unsafe keyword does not appear anywhere in the example, and the equivalent C code would have undefined behavior: it is instant undefined behavior in C to provide a function declaration whose types do not "properly" match the function definition. LLVM uses this information to optimize the LLVM-IR that we currently produce for this example, and ends up adding noalias, nonnull, dereferenceable, etc. to the maybe_bad2::foo declaration whose type actually matches the definition. This is an instance of rust-lang/rust#46188. It is unclear whether it would be a legal optimization on LLVM-IR to propagate such attributes to the function definition, which would result in severe miscompilations. This interacts with LTO and therefore probably with xLTO as well (e.g. a Rust declaration can probably propagate attributes to a C declaration when xLTO is involved).
In C, declarations are not only unsafe to call, but also unsafe to declare. I don't think we can do that in Rust, since that would break pretty much all existing FFI code, and it would not allow users to use Rust types to better express the API of C function declarations.
So AFAICT, we have to rule the examples above as correct, and implement them in such a way that does not cause miscompilations - that is, we could close this and just handle this by fixing: rust-lang/rust#46188 , and maybe adding a PR to the reference explaining that this is explicitly ok.