[X86][AVX] lowerShuffleAsLanePermuteAndPermute - incomplete lane shuffle mask #40076

Closed
RKSimon opened this issue Feb 14, 2019 · 8 comments
Labels
backend:X86 bugzilla Issues migrated from bugzilla

@RKSimon
Collaborator

RKSimon commented Feb 14, 2019

Bugzilla Link: 40730
Resolution: FIXED
Resolved on: Feb 18, 2019 03:22
Version: trunk
OS: Windows NT
Blocks: #39678
CC: @topperc, @zmodem, @RKSimon, @rotateright
Fixed by commit(s): r354034, r354117

Extended Description

https://gcc.godbolt.org/z/tLJJE0

define <8 x i32> @shuffle_v8i32_0dcd3f14(<8 x i32> %a, <8 x i32> %b) {
  %shuffle = shufflevector <8 x i32> %a, <8 x i32> %b, <8 x i32> <i32 0, i32 13, i32 12, i32 13, i32 3, i32 15, i32 1, i32 4>
  ret <8 x i32> %shuffle
}

define <8 x i32> @shuffle_v8i32_0dcd3f14_constant(<8 x i32> %a0) {
  %res = shufflevector <8 x i32> %a0, <8 x i32> <i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16>, <8 x i32> <i32 0, i32 13, i32 12, i32 13, i32 3, i32 15, i32 1, i32 4>
  ret <8 x i32> %res
}

When the shuffle is lowered, the constant operand is folded incorrectly. The cause appears to be the shuffle mask for the step that (correctly) lowers to vperm2f128: it contains undef elements in the wrong places, which allows undef to propagate and leads to the incorrect constant fold.
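
To make the expected result concrete, here is a minimal Python sketch of what the constant test case should produce. It only models the lane selection of shufflevector (indices 0-7 pick from the first operand, 8-15 from the second); the helper name and the symbolic "a0[i]" strings are illustrative assumptions and not part of LLVM.

```python
def shufflevector(a, b, mask):
    """Model shufflevector lane selection: index i picks from a (i < len(a)) or b."""
    both = list(a) + list(b)
    return [both[i] for i in mask]

# Symbolic lanes for the variable operand, literal lanes for the constant operand.
a0 = [f"a0[{i}]" for i in range(8)]
const = [9, 10, 11, 12, 13, 14, 15, 16]
mask = [0, 13, 12, 13, 3, 15, 1, 4]

expected = shufflevector(a0, const, mask)
print(expected)
# ['a0[0]', 14, 13, 14, 'a0[3]', 16, 'a0[1]', 'a0[4]']
```

Lanes 1, 2, 3 and 5 of the result are constants, so any folded constant vector has to hold 14, 13, 14 and 16 in those positions.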

shuffle_v8i32_0dcd3f14:
  vextractf128 $1, %ymm0, %xmm2
  vblendps $1, %xmm2, %xmm0, %xmm2 # xmm2 = xmm2[0],xmm0[1,2,3]
  vpermilps $23, %xmm2, %xmm2 # xmm2 = xmm2[3,1,1,0]
  vinsertf128 $1, %xmm2, %ymm0, %ymm0
  vperm2f128 $17, %ymm0, %ymm1, %ymm1 # ymm1 = ymm1[2,3,2,3]
  vpermilpd $4, %ymm1, %ymm1 # ymm1 = ymm1[0,0,3,2]
  vblendps $209, %ymm0, %ymm1, %ymm0 # ymm0 = ymm0[0],ymm1[1,2,3],ymm0[4],ymm1[5],ymm0[6,7]
  retq
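
For readers checking the immediates against the lane comments in these listings, the following Python sketch decodes them. It is only a rough model of the immediate encodings for vblendps, vpermilps and vpermilpd, and the function names are made up for illustration.

```python
def blend_lanes(imm, lanes):
    """Lanes whose immediate bit is set. In the AT&T listings here, these lanes
    come from the source operand written first after the immediate."""
    return [i for i in range(lanes) if (imm >> i) & 1]

def vpermilps_indices(imm):
    """Four 2-bit selectors, applied within each 128-bit lane."""
    return [(imm >> (2 * i)) & 3 for i in range(4)]

def vpermilpd_indices(imm, elems=4):
    """One selector bit per 64-bit element: low or high half of its own 128-bit lane."""
    return [(i & ~1) | ((imm >> i) & 1) for i in range(elems)]

print(blend_lanes(209, 8))     # [0, 4, 6, 7]
print(blend_lanes(46, 8))      # [1, 2, 3, 5]
print(vpermilps_indices(23))   # [3, 1, 1, 0]
print(vpermilpd_indices(4))    # [0, 0, 3, 2]
```

The masks 209 and 46 are bitwise complements over 8 lanes, which matches the constant version blending with its operands in the opposite roles.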

.LCPI0_0:
  .quad 60129542157 # 0x0000000E0000000D
  .quad 60129542157 # 0x0000000E0000000D
  .zero 8 <-- INCORRECT - should be 0x0000000F00000000
  .quad 60129542157 # 0x0000000E0000000D
shuffle_v8i32_0dcd3f14_constant:
  vextractf128 $1, %ymm0, %xmm1
  vblendps $1, %xmm1, %xmm0, %xmm1 # xmm1 = xmm1[0],xmm0[1,2,3]
  vpermilps $23, %xmm1, %xmm1 # xmm1 = xmm1[3,1,1,0]
  vinsertf128 $1, %xmm1, %ymm0, %ymm0
  vblendps $46, .LCPI0_0(%rip), %ymm0, %ymm0 # ymm0 = ymm0[0],mem[1,2,3],ymm0[4],mem[5],ymm0[6,7]
  retq

Reduced from an internal fuzz test.

@RKSimon
Collaborator Author

RKSimon commented Feb 14, 2019

assigned to @RKSimon

@RKSimon
Collaborator Author

RKSimon commented Feb 14, 2019

The issue looks like it first appeared in rL344446, so it is a regression in the 8.0 branch.

@RKSimon
Collaborator Author

RKSimon commented Feb 14, 2019

https://gcc.godbolt.org/z/9BAQca contains the generic shuffle as well as the version that hits the constant-folding bug.

@RKSimon
Collaborator Author

RKSimon commented Feb 14, 2019

Test case added at rL354034

@RKSimon
Collaborator Author

RKSimon commented Feb 14, 2019

That should've been:

.LCPI0_0:
  .quad 60129542157 # 0x0000000E0000000D
  .quad 60129542157 # 0x0000000E0000000D
  .zero 8 <-- INCORRECT - should be 0x0000001000000000
  .quad 60129542157 # 0x0000000E0000000D
shuffle_v8i32_0dcd3f14_constant:
  vextractf128 $1, %ymm0, %xmm1
  vblendps $1, %xmm1, %xmm0, %xmm1 # xmm1 = xmm1[0],xmm0[1,2,3]
  vpermilps $23, %xmm1, %xmm1 # xmm1 = xmm1[3,1,1,0]
  vinsertf128 $1, %xmm1, %ymm0, %ymm0
  vblendps $46, .LCPI0_0(%rip), %ymm0, %ymm0 # ymm0 = ymm0[0],mem[1,2,3],ymm0[4],mem[5],ymm0[6,7]
  retq
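
The corrected value can be cross-checked with a small Python sketch that unpacks the emitted constant pool into 32-bit lanes and compares the lanes read by the $46 blend against the lanes the shuffle mask asks for. The variable names are illustrative, and treating .zero 8 as a zero quad is a modelling assumption.

```python
import struct

# Constant pool as emitted: three identical quads plus ".zero 8" (modelled as 0).
pool_quads = [60129542157, 60129542157, 0, 60129542157]
pool = list(struct.unpack("<8I", struct.pack("<4Q", *pool_quads)))

# Lanes that vblendps $46 takes from memory (bits set in the immediate).
mem_lanes = [i for i in range(8) if (46 >> i) & 1]

# Lanes the IR asks for: mask indices >= 8 select from the constant operand.
const = [9, 10, 11, 12, 13, 14, 15, 16]
mask = [0, 13, 12, 13, 3, 15, 1, 4]
wanted = {i: const[mask[i] - 8] for i in mem_lanes}

mismatches = {i: (pool[i], wanted[i]) for i in mem_lanes if pool[i] != wanted[i]}
print(mismatches)  # {5: (0, 16)}

# Lane 5 is the high half of the third quad; lane 4 is not read, so leave it 0.
fixed_quad_hex = f"{wanted[5] << 32:#018x}"
print(fixed_quad_hex)  # 0x0000001000000000
```

Only lane 5 disagrees: the pool supplies 0 where the shuffle needs 16, which matches the corrected annotation above.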

@RKSimon
Collaborator Author

RKSimon commented Feb 14, 2019

@RKSimon
Collaborator Author

RKSimon commented Feb 15, 2019

Fixed in trunk at rL354117

@Hans - please give it a while and then cherry pick r354034 + r354117

@zmodem
Collaborator

zmodem commented Feb 18, 2019

> Fixed in trunk at rL354117
>
> @Hans - please give it a while and then cherry pick r354034 + r354117

Thanks! Merged them together in r354260. Please let me know if there are any follow-ups.

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021
This issue was closed.