-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Inlining causes miscompilation of code that mixes target features #116573
Comments
This still miscompiles with |
Okay so it's an LLVM bug then it seems. Cc @nikic |
Is there a way to reproduce this without (Incidentally, there was an attempt to not do that in LLVM 17, but this was reverted due to the large amount of regressions it caused. People rely on that a lot, including in Rust.) |
Are you saying |
Yeah that is no good, we can't have (safe!) attributes just override checks which are needed for soundness. I don't know a reproducer without |
Perf regressions are acceptable when fixing soundness bugs. We then have to see how much of the perf we can get back without compromising soundness. |
There wouldn't necessarily need to be a perf regression. I would expect it to compile down to the code that would exist as if the intermediate function were not there: use std::mem::transmute;
#[cfg(target_arch = "x86")]
use std::arch::x86::*;
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;
extern "C" fn no_target_feature(_dummy: f32, x: __m256) {
let val = unsafe { transmute::<_, [u32; 8]>(x) };
dbg!(val);
}
#[target_feature(enable = "avx")]
unsafe fn with_target_feature(x: __m256) {
// Critical call: caller and callee have different target features.
// The compiler needs to deal with the ABI transition here.
no_target_feature(0.0, x);
}
fn main() {
assert!(is_x86_feature_detected!("avx"));
// SAFETY: we checked that the `avx` feature is present.
unsafe {
with_target_feature(transmute([1; 8]));
}
} I would expect that a function, when inlined, doesn't have effects on ABI issues in itself. BTW, I hope to be writing code just like this very soon, but instead of |
That's what it does, and that's the bug. That code is wrong, see #116558. Basically LLVM tied together flags affecting ABI and flags relevant for codegen, and I think that was a huge mistake. This issue and #116558 show why.
This issue affects all non-"Rust" ABIs. |
If |
It's a bit more complex than that. Examples of regressions this caused are:
You might call this "just a perf issue", but inlining of platform vector intrinsics is an important part of their semantics. They are useless if this does not happen reliably. These issues are not fundamental, but caused by target feature checks being too conservative, especially for non-X86 targets. The semantics of always_inline can be changed, but it would require some work to make sure we have at least somewhat accurate compatibility checks across targets. I believe something that was discussed in the past but never happened, is that we should add a lint for calling a function with less target features, while passing vector values to it. Independent of the soundness issues discussed here, the lack of inlining makes this a performance footgun, and it's almost certainly not what people want to do. |
Pretty much anybody who would write code like the above would very much appreciate at least a warning if that is going to happen. If/when I see such a warning I would remove the "intermediate" wrapper. Then I would rewrite the code into the form I shared.
Then how is this particular issue a distinct bug from #116558, especially considering that nobody wants their |
Completely understandable. We should design a lint that will fire on all cases here.
Sounds good to me.
And yeah, nikic is right here. We might have to hack in an |
This is a clear soundness bug IMO, #116558 is "just" very odd semantics and ABI footguns. I think we should resolve #116558 by refusing to compile the example there but I'm not convinced that will suffice to fix this soundness bug.
If the caller had the target feature, they should still get inlined, no? And if someone calls an AVX2 intrinsic from a function that doesn't have the AVX2 feature then surely exploding that code is fine, it should probably not even compile... |
People use dynamic feature dispatch, however? |
The relevant case is more along the lines of: The caller has features +a,+b and the platform intrinsic has +a. LLVM refuses to inline because this is potentially unsafe. LLVM's default assumption about what is safe to inline are very conservative. If the target doesn't tell it that e.g. subset inlining is always safe, it's only going to inline if the target features are exactly the same. Not all targets implement the necessary hook to provide a more precise compatibility check. Or to give a less obvious example, you have a function with |
Inlining across Arm "major versions" is honestly pretty dangerous because they routinely retire older instructions on the majors. |
I am trying to catch LLVM in the act of moving a Here's another version, still doesn't get inlined though. |
I think I finally got it. Not sure what is different about this than my previous attempts... |
Here's an LLVM issue for the problem: llvm/llvm-project#70563 |
i found an example that doesn't use extern "C" it should print use core::arch::x86_64::__m256i;
use core::hint::black_box;
use core::mem::transmute;
#[allow(non_camel_case_types)]
#[derive(Copy, Clone, Debug)]
pub struct u64x4(u64, u64, u64, u64);
#[inline(never)]
#[target_feature(enable = "avx")]
unsafe fn return_as_is_avx(a: __m256i) -> __m256i {
a
}
#[inline(never)]
unsafe fn return_as_is(a: u64x4) -> u64x4 {
transmute(return_as_is_avx(transmute(a)))
}
#[target_feature(enable = "avx")]
#[inline]
unsafe fn imbue_avx<F: Fn()>(f: F) -> F::Output {
f()
}
pub unsafe fn buggy() {
imbue_avx(
#[inline(always)]
|| {
dbg!(return_as_is(black_box(u64x4(0, 1, 2, 3))));
},
);
}
pub fn main() {
assert!(is_x86_feature_detected!("avx"));
unsafe {
buggy();
}
} |
On Zulip, someone suggested this might be due to LLVM turning a (Please mention such observations when carrying issues from Zulip to Github, or else people will have to waste time re-discovering the same thing!) |
im not sure if that's what's causing the issue. even when passing the argument with multiple indirections and black_boxing the reference so it doesn't get promoted, i still get the same issue https://godbolt.org/z/EaGxGjWhT #[inline(never)]
#[target_feature(enable = "avx")]
unsafe fn return_as_is_avx(a: &&__m256i) -> u64x4 {
transmute(**black_box(a))
}
#[inline(never)]
unsafe fn return_as_is(a: u64x4) -> u64x4 {
return_as_is_avx(&&transmute(a))
} this is the asm for example::return_as_is_avx:
mov qword ptr [rsp - 8], rsi
lea rax, [rsp - 8]
mov rax, qword ptr [rsp - 8]
mov rax, qword ptr [rax]
vmovaps ymm0, ymmword ptr [rax]
vmovups ymmword ptr [rdi], ymm0
vzeroupper
ret output
|
this part looks suspicious to me i might be misreading this, but it looks like example::return_as_is:
push rbp
mov rbp, rsp
and rsp, -32
sub rsp, 96
movaps xmmword ptr [rsp + 48], xmm1 // <--
movaps xmmword ptr [rsp + 32], xmm0 // <--
lea rax, [rsp + 32]
mov qword ptr [rsp + 24], rax
lea rsi, [rsp + 24]
call example::return_as_is_avx
mov rsp, rbp
pop rbp
ret but in example::imbue_avx:
push r14
push rbx
sub rsp, 168
vmovaps ymm0, ymmword ptr [rip + .LCPI6_0]
vmovups ymmword ptr [rsp + 80], ymm0
lea r14, [rsp + 80]
vmovups ymm0, ymmword ptr [rsp + 80] // <--
lea rbx, [rsp + 136]
mov rdi, rbx
call example::return_as_is |
Hm, strange. Maybe the ABI for closures is buggy and doesn't do the "ptr" indirection the way our normal ABI does.
|
i don't think it's a closure issue, still happens if i get rid of it https://godbolt.org/z/cW9GdPWdM |
Then the only other idea I have is that LLVM tries to optimize passing Interestingly, one can even remove the target-feature from |
could you post an example? i can't reproduce this |
Here you go: https://godbolt.org/z/nqf8Ee9PM |
thanks! i tried reproducing the issue outside of godbolt/playground and i noticed an interesting pattern. this is the project structure // src/lib.rs
use std::arch::x86_64::__m256i;
use std::hint::black_box;
use std::mem::transmute;
#[allow(non_camel_case_types)]
#[derive(Copy, Clone)]
pub struct u64x4(u64, u64, u64, u64);
#[inline(never)]
pub unsafe fn return_as_is_avx(a: &&__m256i) -> u64x4 {
transmute(**black_box(a))
}
#[inline(never)]
pub unsafe fn return_as_is(a: u64x4) -> u64x4 {
return_as_is_avx(&&transmute(a))
}
#[inline(always)]
pub unsafe fn buggy_intermediate() {
let result = return_as_is(black_box(u64x4(13, 14, 15, 16)));
println!("({}, {}, {}, {})", result.0, result.1, result.2, result.3)
}
#[target_feature(enable = "avx")]
#[inline(never)]
pub unsafe fn buggy_avx() {
buggy_intermediate();
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
pub fn test_inner() {
if !is_x86_feature_detected!("avx") {
return;
}
unsafe { buggy_avx() };
}
} // tests/bug.rs
use abi_bug::*;
#[test]
pub fn test_outer() {
if !is_x86_feature_detected!("avx") {
return;
}
unsafe { buggy_avx() };
}
after disassembling the test binaries, it looks like the one in so this might be a bug with |
here's an example of the buggy llvm-ir (i think, im not very familiar with llvm) |
actually, i don't think im not sure how it's being promoted to |
I'm pretty sure rustc doesn't automatically do such transformations, so it's likely an LLVM optimization. |
looks like you're right, with |
#127731 will make the original example not compile any more, and thus fix the easiest way to hit the soundness bug. The "obvious" way to still reproduce the issue involves swapping the role of which functions have target features and which don't. However, Rust rejects having Now I wonder, how does Basically the plan would be to compile one crate with #[inline(never)]
unsafe extern "C" fn with_target_feature(_dummy: f32, x: __m256) {
let val = unsafe { transmute::<_, [u32; 8]>(x) };
dbg!(val);
}
#[inline(always)]
unsafe fn with_target_feature_intermediate(dummy: f32, x: __m256) {
with_target_feature(dummy, x);
} And then another crate without #[inline(never)]
unsafe fn no_target_feature(x: __m256) {
assert!(is_x86_feature_detected!("avx"));
with_target_feature_intermediate(0.0, x);
}
fn main() {
no_target_feature(transmute([1; 8]));
} Now if the target feature information is preserved per-function, LLVM should be inlining But is there a way to get rustc + LLVM to actually do that? |
Assuming that If we had |
The original example seems to work correctly with rustc 1.80.1:
|
@nikic any idea if something changed on the LLVM side that would explain why this does not reproduce any more? |
Ah, try running the example without optimizations. That still reproduces it on the playground.
|
To be on the safer side, it can be useful to add
|
Okay, so the issue still exists. But the question remains whether there's a reproducer that works even with #127731, i.e. a reproducer that only uses vector types on The original reproducer was a "no target feature" function taking a vector and having its call site incorrectly inlined into a "with target feature" function. That can't happen any more, so we need a "with target feature" function and coerce LLVM into inlining its call site into a "no target feature" function. It will only do that when the "with target feature" function has |
The following code ought to be completely fine and UB-free:
There's some unsafe going on, but the safety comment explains why that is okay. We are even taking care to follow the target-feature related ABI rules (see #115476); all calls between functions with different target-features use the "Rust" ABI.
And yet, this prints (when built without optimizations)
The value got clobbered while being passed through the various functions.
Replacing
inline(always)
byinline(never)
makes the issue disappear. Butinline
attributes must never cause miscompilation, so there's still a soundness bug here.I don't know if this is the MIR inliner (Cc @rust-lang/wg-mir-opt) or the LLVM inliner going wrong.
Here's an LLVM issue for the problem: llvm/llvm-project#70563
The text was updated successfully, but these errors were encountered: