[X86][AVX] lowerShuffleAsLanePermuteAndPermute - incomplete lane shuffle mask #40076
assigned to @RKSimon
The issue looks like it first appeared in rL344446, so it is a regression in the 8.0 branch.
https://gcc.godbolt.org/z/9BAQca contains the generic shuffle as well as the bugged constant-folding version.
Test case added at rL354034.
That should've been: .LCPI0_0:
Fixed in trunk at rL354117. @Hans - please give it a while and then cherry-pick r354034 + r354117.
Thanks! Merged them together in r354260. Please let me know if there are any follow-ups.
Extended Description
https://gcc.godbolt.org/z/tLJJE0
define <8 x i32> @shuffle_v8i32_0dcd3f14(<8 x i32> %a, <8 x i32> %b) {
%shuffle = shufflevector <8 x i32> %a, <8 x i32> %b, <8 x i32> <i32 0, i32 13, i32 12, i32 13, i32 3, i32 15, i32 1, i32 4>
ret <8 x i32> %shuffle
}
define <8 x i32> @shuffle_v8i32_0dcd3f14_constant(<8 x i32> %a0) {
%res = shufflevector <8 x i32> %a0, <8 x i32> <i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16>, <8 x i32> <i32 0, i32 13, i32 12, i32 13, i32 3, i32 15, i32 1, i32 4>
ret <8 x i32> %res
}
When the shuffle gets lowered, the constant argument is incorrectly folded. This appears to be because the shuffle mask that (correctly) lowers to vperm2f128 contains undef elements in the wrong places, allowing undef propagation that leads to incorrect constant folding.
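For reference, the shufflevector semantics for the constant test case can be sketched as follows (this is an illustrative Python model, not LLVM code): mask elements 0-7 select from %a0 and 8-15 select from the constant second operand, so it shows exactly which constants a correct fold must produce.

```python
# Model of shufflevector for shuffle_v8i32_0dcd3f14_constant.
# Lanes < 8 come from the unknown %a0; lanes >= 8 from the constant vector.
b = [9, 10, 11, 12, 13, 14, 15, 16]      # constant second operand
mask = [0, 13, 12, 13, 3, 15, 1, 4]      # shuffle mask from the IR

def lane(i):
    # Symbolic placeholder for %a0 lanes, concrete value for constant lanes.
    return f"a0[{i}]" if i < 8 else b[i - 8]

result = [lane(i) for i in mask]
print(result)  # ['a0[0]', 14, 13, 14, 'a0[3]', 16, 'a0[1]', 'a0[4]']
```

In particular, result lane 5 must be the constant 16 (mask index 15 selects element 7 of the second operand), which is the lane the miscompiled constant pool gets wrong below.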
shuffle_v8i32_0dcd3f14:
vextractf128 $1, %ymm0, %xmm2
vblendps $1, %xmm2, %xmm0, %xmm2 # xmm2 = xmm2[0],xmm0[1,2,3]
vpermilps $23, %xmm2, %xmm2 # xmm2 = xmm2[3,1,1,0]
vinsertf128 $1, %xmm2, %ymm0, %ymm0
vperm2f128 $17, %ymm0, %ymm1, %ymm1 # ymm1 = ymm1[2,3,2,3]
vpermilpd $4, %ymm1, %ymm1 # ymm1 = ymm1[0,0,3,2]
vblendps $209, %ymm0, %ymm1, %ymm0 # ymm0 = ymm0[0],ymm1[1,2,3],ymm0[4],ymm1[5],ymm0[6,7]
retq
.LCPI0_0:
.quad 60129542157 # 0x0000000E0000000D
.quad 60129542157 # 0x0000000E0000000D
.zero 8 <-- INCORRECT - should be 0x0000000F00000000
.quad 60129542157 # 0x0000000E0000000D
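To see the miscompile concretely, the constant pool can be decoded into 32-bit lanes and checked against the blend mask (again a hedged Python sketch, not LLVM code): vblendps $46 takes lanes 1, 2, 3 and 5 from the memory operand, and lane 5 lands in the ".zero 8" quad, so the result gets 0 where the original shufflevector requires the constant 16.

```python
# Decode the folded .LCPI0_0 constant pool (little-endian: the low dword
# of each quad is the even lane) and apply the vblendps $46 lane selection.
quads = [0x0000000E0000000D, 0x0000000E0000000D, 0x0, 0x0000000E0000000D]
lanes = []
for q in quads:
    lanes += [q & 0xFFFFFFFF, q >> 32]
print(lanes)        # [13, 14, 13, 14, 0, 0, 13, 14]

imm = 46            # 0b00101110 -> blend takes lanes 1, 2, 3, 5 from memory
mem_lanes = [i for i in range(8) if imm & (1 << i)]
print(mem_lanes)    # [1, 2, 3, 5]
print(lanes[5])     # 0 -- but the IR requires 16 in this lane
```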
shuffle_v8i32_0dcd3f14_constant:
vextractf128 $1, %ymm0, %xmm1
vblendps $1, %xmm1, %xmm0, %xmm1 # xmm1 = xmm1[0],xmm0[1,2,3]
vpermilps $23, %xmm1, %xmm1 # xmm1 = xmm1[3,1,1,0]
vinsertf128 $1, %xmm1, %ymm0, %ymm0
vblendps $46, .LCPI0_0(%rip), %ymm0, %ymm0 # ymm0 = ymm0[0],mem[1,2,3],ymm0[4],mem[5],ymm0[6,7]
retq
Reduced from an internal fuzz test.