-
Notifications
You must be signed in to change notification settings - Fork 275
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avx512f #917
Avx512f #917
Conversation
merge from base
merge base
…ps512, castps256_ps512
…pd512, castpd256_pd512
…astsi512_ps, castsi512_pd
r? @Amanieu (rust_highfive has picked a reviewer for you, use r? to override) |
For boradcast_f32x4,i32x4,f64x4,i64x4 |
Add |
crates/core_arch/src/x86/avx512f.rs
Outdated
#[target_feature(enable = "avx512f")] | ||
#[cfg_attr(test, assert_instr(vbroadcas))] //should be vpbroadcastq | ||
pub unsafe fn _mm512_broadcastq_epi64(a: __m128i) -> __m512i { | ||
simd_shuffle8(a, a, [1, 1, 1, 1, 1, 1, 1, 1]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be [0, 0, 0, 0, 0, 0, 0, 0]
? The documentation says the low element is broadcast.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be
[0, 0, 0, 0, 0, 0, 0, 0]
? The documentation says the low element is broadcast.
Yes. You are correct. Thanks.
crates/core_arch/src/x86/avx512f.rs
Outdated
#[target_feature(enable = "avx512f")] | ||
#[cfg_attr(test, assert_instr(vperm))] //should be vpunpckhqdq | ||
pub unsafe fn _mm512_unpackhi_epi64(a: __m512i, b: __m512i) -> __m512i { | ||
simd_shuffle8(a, b, [2, 10, 3, 11, 2 + 4, 10 + 4, 3 + 4, 11 + 4]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The correct shuffle should be [1, 9, 3, 11, 5, 13, 7, 15]
.
You can get this by compiling the intrinsic with Clang and extracting the LLVM IR: https://godbolt.org/z/PzoKPz
b, | ||
[0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 12, 13, 14, 15], | ||
), | ||
_ => simd_shuffle16(a, b, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 16, 17, 18, 19]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you change this to panic if an invalid imm8
is passed in? Same for all the others.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you change this to panic if an invalid
imm8
is passed in? Same for all the others.
As an example _mm512_inserti32x4(a, b, imm8)
In clang, it will check if imm8 is [0..3] in compile time. But in Rust we need to use constify_imm2 to check it in the run time.
But, constify_imm2 uses & 0b11 to make everything pass.
macro_rules! constify_imm2 {
($imm8:expr, $expand:ident) => {
#[allow(overflowing_literals)]
match ($imm8) & 0b11 {
0 => $expand!(0),
1 => $expand!(1),
2 => $expand!(2),
_ => $expand!(3),
}
};
}
So I need to create something like constify_imm2_sae?
match ($imm8)
0=>...
1=>...
...
_=> panic!(xxxxx)?
In x86_64: avx. Clang will check if index is [0..3], so we need to add the invalid check on this?
pub unsafe fn _mm256_insert_epi64(a: __m256i, i: i64, index: i32) -> __m256i {
let a = a.as_i64x4();
match index & 3 {
0 => transmute(simd_insert(a, 0, i)),
1 => transmute(simd_insert(a, 1, i)),
2 => transmute(simd_insert(a, 2, i)),
_ => transmute(simd_insert(a, 3, i)),
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or we just use "assert!(imm8 >= 0 && imm8 <= 255)"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally the compiler would reject out-of-range values at compile-time, but rustc doesn't support this yet. So instead we should panic if an out-of-range value is passed. This allows us to reject out-of-range values in the future once rustc has that functionality.
You can either do the same as consitfy_imm2_sae
or just assert!(imm8 <= 3)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. I will do assert!(imm8 <= 3) and update a new version.
crates/core_arch/src/x86/avx512f.rs
Outdated
#[target_feature(enable = "avx512f")] | ||
#[cfg_attr(test, assert_instr(vperm))] //should be vunpackhpd | ||
pub unsafe fn _mm512_unpackhi_pd(a: __m512d, b: __m512d) -> __m512d { | ||
simd_shuffle8(a, b, [2, 10, 3, 11, 2 + 4, 10 + 4, 3 + 4, 11 + 4]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The correct shuffle here should be [1, 9, 3, 11, 5, 13, 7, 15]
.
/// [Intel's documentation](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=512_mask_broadcastd_epi32&expand=546) | ||
#[inline] | ||
#[target_feature(enable = "avx512f")] | ||
#[cfg_attr(test, assert_instr(vpbroadcast))] //should be vpbroadcastd |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This and the one below generate the correct vpbroadcastd
instruction. The comment is wrong.
crates/core_arch/src/x86/avx512f.rs
Outdated
/// [Intel's documentation](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=512_broadcast_i32x4&expand=510) | ||
#[inline] | ||
#[target_feature(enable = "avx512f")] | ||
//#[cfg_attr(test, assert_instr(vbroadcast))] should be vbroadcasti32x4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a comment explaining that LLVM doesn't generate a vbroadcast
instruction for this intrinsic. Please don't leave commented-out code lying around.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a comment explaining that LLVM doesn't generate a
vbroadcast
instruction for this intrinsic. Please don't leave commented-out code lying around.
Yes, For boradcast_f32x4,i32x4,f64x4,i64x4
The instruction generated from Linux and msvc are totally different. (shuffle vs broadcast).
So I remove the test and add a comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you can remove the test and add a comment.
The primary purpose is to get the fixes from rust-lang/stdarch#920 and rust-lang/stdarch#922. The other changes included are rust-lang/stdarch#917 and rust-lang/stdarch#919.
castps128_512, castps_pd, castps_si512, castps512_ps256, castps512_ps128, castps128_ps512, castps256_ps512
castpd_ps, castpd_si512, castpd512_pd256, castpd512_pd128, castpd128_pd512, castpd256_pd512
castsi512_si128, castsi512_si256, castsi128_si512, castsi256_si512, castsi512_ps, castsi512_pd
broadcastsd_pd, broadcastss_ps, broadcastd_epi32, broadcastq_epi64
broadcast_i32x4, broadcast_i64x4, broadcast_f32x4, broadcast_f64x4
addnot: epi32, epi64
insertf32x4, insertf64x4, inserti32x4, inserti64x4
mask_blend: epi32,epi64,ps,pd
unpackhi: epi32,epi64,ps,pd
unpacklo: epi32,epi64,ps,pd