LLVM ERROR with avx2 instructions #253

Cocalus · 2017-12-28T01:43:04Z

when running "cargo test" on the following code with "rustc 1.24.0-nightly (1abeb436d 2017-12-27)" I get
LLVM ERROR: Do not know how to split the result of this operator!

extern crate stdsimd;

use stdsimd::vendor::*;
use stdsimd::simd::u32x8;

//Find minimum val within 3 registers and splat it to a new register
pub fn splat_min(v1: u32x8, v2: u32x8, v3: u32x8) -> u32x8 {
    unsafe {
        //across registers
        let mut min = _mm256_min_epu32(_mm256_min_epu32(v1, v2), v3);

        //within 128 bit halfs
        let mut tmp = _mm256_shuffle_epi32(min.into(), 0x4E).into();
        min = _mm256_min_epu32(min, tmp);
        tmp = _mm256_shuffle_epi32(min.into(), 0xB1).into();
        min = _mm256_min_epu32(min, tmp);


        //across 128 bit halfs
        tmp = _mm256_permute2f128_si256(min.into(), min.into(), 1).into();
        _mm256_min_epu32(min, tmp)
    }
}


#[cfg(test)]
mod tests {
    use super::*;
    #[test]
    fn test() {
        let v1 = u32x8::new(0,1,2,3,4,5,6,7);
        let v2 = u32x8::new(8,9,10,11,12,13,14,15);
        let v3 = u32x8::new(16,17,18,19,20,21,22,23);

        assert_eq!(splat_min(v1,v2,v3), u32x8::splat(0));
    }
}

AdamNiederer · 2017-12-28T02:59:53Z

I'm having a similar issue with non-avx2 instructions - I think the issue is caused by calling an intrinsic in an assert! macro, but I've only came across the issue in my doctests.

See here for another example of this issue.

Cocalus · 2017-12-28T04:01:07Z

I still get the error even if I remove the assert

#[cfg(test)]
mod tests {
    use super::*;
    #[test]
    fn test() {
        let v1 = u32x8::new(0,1,2,3,4,5,6,7);
        let v2 = u32x8::new(8,9,10,11,12,13,14,15);
        let v3 = u32x8::new(16,17,18,19,20,21,22,23);

        splat_min(v1,v2,v3);
    }
}

Cocalus · 2017-12-28T04:25:41Z

Here's a smaller example which I copied from the stdsimd test for _mm256_permute2f128_si256

If get the error with
cargo run

but not with
cargo run --release

adding
#[target_feature = "+avx"]
just before main, also fixes it

Which mean's it's probably not going to compile to the right instruction anyway. But it would nice if it returned a more useful error if it shouldn't compile.

extern crate stdsimd;

use stdsimd::vendor::_mm256_permute2f128_si256;
use stdsimd::simd::i32x8;

fn main() {
    unsafe {
        let a = i32x8::new(1, 2, 3, 4, 1, 2, 3, 4);
        let b = i32x8::new(5, 6, 7, 8, 5, 6, 7, 8);
        let r = _mm256_permute2f128_si256(a, b, 0x20);
        let e = i32x8::new(1, 2, 3, 4, 5, 6, 7, 8);
        assert_eq!(r, e);
    }
}

gnzlbg · 2017-12-29T09:12:30Z

@Cocalus last example is compiling main with SSE, but unconditionally calling an AVX2 function in it so unless it is compiled with -C target-feature=+avx2 this example has undefined behavior. In --release LLVM probably just deduces that fn main() can only run in AVX2 targets (it's UB otherwise) and just compiles it with AVX2 (which will work on AVX2 hosts, and crash otherwise).

In debug mode, main is compiled with SSE, and because i32x8 has a different ABI in SSE than in AVX2 this should trigger: rust-lang/rust#44367 , why it triggers the LLVM error instead, I don't know (sometimes this works and crashes, sometimes it doesn't compile; once the rust issue is fixed the ABIs will be automatically converted).

However, I don't understand why using #[target_feature = "+avx"] fn main() { ... } fixes this issue in debug builds. It should not: i32x8 also has a different ABI in AVX than in AVX2, which should again trigger: rust-lang/rust#44367

@alexcrichton ?

Cocalus · 2017-12-29T16:19:19Z

I was wondering why all the vendor functions were marked unsafe, I guess the requirement on the programmer is that they are only used a section of code where the relevant target features are enabled. It seems like it should be possible to statically check that _mm256_permute2f128_si256 is only called in a an avx enabled section of code, then a lot of unsafe blocks could be avoided. I'm might be missing other reasons the vendor functions are tagged as unsafe.

In my original code I had the splat_min as an inner function and I expected the target_feature to apply to everything within the function, including inner functions and closures. Which is apparently not the case. I haven't figured out the incantation to get closures to work. Though this could also be related to repr(simd) issue.

Using cargo run on this only produces the right answer for good. Notable there is no unsafe here.

#![feature(target_feature)]

extern crate stdsimd;

use stdsimd::vendor::*;
use stdsimd::simd::i32x8;

#[target_feature = "+avx,+avx2"]
fn main() {
    let x = i32x8::splat(0);
    let y = i32x8::splat(1);

    let bad = |a,b| {a+b};

    fn alsobad(a: i32x8, b: i32x8) -> i32x8{
        a+b
    }

    #[target_feature = "+avx,+avx2"]
    fn good(a: i32x8, b: i32x8) -> i32x8{
        a+b
    }

    #[target_feature = "+avx,+avx2"]
    let stillbad = |a,b| {a+b};


    let alsostillbad = |a,b| {
        #[target_feature = "+avx,+avx2"]
        {a+b}
    };


    println!("{:?}", bad(x,y));
    println!("{:?}", alsobad(x,y));
    println!("{:?}", good(x,y));
    println!("{:?}", stillbad(x,y));
    println!("{:?}", alsostillbad(x,y));
}

alexcrichton · 2017-12-29T16:47:12Z

In general these are probably either LLVM bugs or otherwise consequences of rust-lang/rust#44367 where we're mismatching enabled target features per function.

gnzlbg · 2017-12-29T17:38:54Z

It seems like it should be possible to statically check that _mm256_permute2f128_si256 is only called in a an avx enabled section of code, then a lot of unsafe blocks could be avoided.

@Cocalus There is an RFC for that: rust-lang/rfcs#2212

This commit blanket changes all `#[inline(always)]` annotations to `#[inline]`. Fear not though, this should not be a regression! To clarify, though, this change is done out of correctness to ensure that we don't hit stray LLVM errors. Most of the LLVM intrinsics and various LLVM functions we actually lower down to only work correctly if they are invoked from a function with an appropriate target feature set. For example if we were to out-of-the-blue invoke an AVX intrinsic then we get a [codegen error][avx-error]. This error comes about because the surrounding function isn't enabling the AVX feature. Now in general we don't have a lot of control over how this crate is consumed by downstream crates. It'd be a pretty bad mistake if all mistakes showed up as scary un-debuggable codegen errors in LLVM! On the other side of this issue *we* as the invokers of these intrinsics are "doing the right thing". All our functions in this crate are tagged appropriately with target features to be codegen'd correctly. Indeed we have plenty of tests asserting that we can codegen everything across multiple platforms! The error comes about here because of precisely the `#[inline(always)]` attribute. Typically LLVM *won't* inline functions across target feature sets. For example if you have a normal function which calls a function that enables AVX2, then the target, no matter how small, won't be inlined into the caller. This is done for correctness (register preserving and all that) but is also how these codegen errors are prevented in practice. Now we as stdsimd, however, are currently tagging all functions with "always inline this, no matter what". That ends up, apparently, bypassing the logic of "is this even possible to inline". In turn we start inlining things like AVX intrinsics into functions that can't actually call AVX intrinsics, creating codegen errors at compile time. So with all that motivation, this commit switches to the normal inline hints for these functions, just `#[inline]`, instead of `#[inline(always)]`. Now for the stdsimd crate it is absolutely critical that all functions are inlined to have good performance. Using `#[inline]`, however, shouldn't hamper that! The compiler will recognize the `#[inline]` attribute and make sure that each of these functions is *candidate* to being inlined into any and all downstream codegen units. (aka if we were missing `#[inline]` then LLVM wouldn't even know the definition to inline most of the time). After that, though, we're relying on LLVM to naturally inline these functions as opposed to forcing it to do so. Typically, however, these intrinsics are one-liners and are trivially inlineable, so I'd imagine that LLVM will go ahead and inline everything all over the place. All in all this change is brought about by rust-lang#253 which noticed various codegen errors. I originally thought it was due to ABI issues but turned out to be wrong! (although that was also a bug which has since been resolved). In any case after this change I was able to get the example in rust-lang#253 to execute in both release and debug mode. Closes rust-lang#253 [avx-error]: https://play.rust-lang.org/?gist=50cb08f1e2242e22109a6d69318bd112&version=nightly

alexcrichton · 2018-01-29T04:58:47Z

I've opened #306 to close out this issue as my suspected cause, rust-lang/rust#44367 , has since been fixed and it unfortunately didn't fix this issue!

* Move from #[inline(always)] to #[inline] This commit blanket changes all `#[inline(always)]` annotations to `#[inline]`. Fear not though, this should not be a regression! To clarify, though, this change is done out of correctness to ensure that we don't hit stray LLVM errors. Most of the LLVM intrinsics and various LLVM functions we actually lower down to only work correctly if they are invoked from a function with an appropriate target feature set. For example if we were to out-of-the-blue invoke an AVX intrinsic then we get a [codegen error][avx-error]. This error comes about because the surrounding function isn't enabling the AVX feature. Now in general we don't have a lot of control over how this crate is consumed by downstream crates. It'd be a pretty bad mistake if all mistakes showed up as scary un-debuggable codegen errors in LLVM! On the other side of this issue *we* as the invokers of these intrinsics are "doing the right thing". All our functions in this crate are tagged appropriately with target features to be codegen'd correctly. Indeed we have plenty of tests asserting that we can codegen everything across multiple platforms! The error comes about here because of precisely the `#[inline(always)]` attribute. Typically LLVM *won't* inline functions across target feature sets. For example if you have a normal function which calls a function that enables AVX2, then the target, no matter how small, won't be inlined into the caller. This is done for correctness (register preserving and all that) but is also how these codegen errors are prevented in practice. Now we as stdsimd, however, are currently tagging all functions with "always inline this, no matter what". That ends up, apparently, bypassing the logic of "is this even possible to inline". In turn we start inlining things like AVX intrinsics into functions that can't actually call AVX intrinsics, creating codegen errors at compile time. So with all that motivation, this commit switches to the normal inline hints for these functions, just `#[inline]`, instead of `#[inline(always)]`. Now for the stdsimd crate it is absolutely critical that all functions are inlined to have good performance. Using `#[inline]`, however, shouldn't hamper that! The compiler will recognize the `#[inline]` attribute and make sure that each of these functions is *candidate* to being inlined into any and all downstream codegen units. (aka if we were missing `#[inline]` then LLVM wouldn't even know the definition to inline most of the time). After that, though, we're relying on LLVM to naturally inline these functions as opposed to forcing it to do so. Typically, however, these intrinsics are one-liners and are trivially inlineable, so I'd imagine that LLVM will go ahead and inline everything all over the place. All in all this change is brought about by #253 which noticed various codegen errors. I originally thought it was due to ABI issues but turned out to be wrong! (although that was also a bug which has since been resolved). In any case after this change I was able to get the example in #253 to execute in both release and debug mode. Closes #253 [avx-error]: https://play.rust-lang.org/?gist=50cb08f1e2242e22109a6d69318bd112&version=nightly * Add inline(always) on eflags intrinsics Their ABI actually relies on it! * Leave #[inline(always)] on portable types They're causing test failures on ARM, let's investigate later.

alexcrichton added A-llvm A-x86 labels Jan 29, 2018

alexcrichton mentioned this issue Jan 29, 2018

Move from #[inline(always)] to #[inline] #306

Merged

alexcrichton closed this as completed in #306 Jan 29, 2018

dignifiedquire mentioned this issue Apr 17, 2019

Implement AVX-512 intrinsics #310

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

LLVM ERROR with avx2 instructions #253

LLVM ERROR with avx2 instructions #253

Cocalus commented Dec 28, 2017

AdamNiederer commented Dec 28, 2017 •

edited

Loading

Cocalus commented Dec 28, 2017

Cocalus commented Dec 28, 2017 •

edited

Loading

gnzlbg commented Dec 29, 2017 •

edited

Loading

Cocalus commented Dec 29, 2017

alexcrichton commented Dec 29, 2017

gnzlbg commented Dec 29, 2017 •

edited

Loading

alexcrichton commented Jan 29, 2018

LLVM ERROR with avx2 instructions #253

LLVM ERROR with avx2 instructions #253

Comments

Cocalus commented Dec 28, 2017

AdamNiederer commented Dec 28, 2017 • edited Loading

Cocalus commented Dec 28, 2017

Cocalus commented Dec 28, 2017 • edited Loading

gnzlbg commented Dec 29, 2017 • edited Loading

Cocalus commented Dec 29, 2017

alexcrichton commented Dec 29, 2017

gnzlbg commented Dec 29, 2017 • edited Loading

alexcrichton commented Jan 29, 2018

AdamNiederer commented Dec 28, 2017 •

edited

Loading

Cocalus commented Dec 28, 2017 •

edited

Loading

gnzlbg commented Dec 29, 2017 •

edited

Loading

gnzlbg commented Dec 29, 2017 •

edited

Loading