The need for integer absolute value instructions was suggested by @jan-wassenberg in the recent SIMD WG sync, and in #176. As no one has voiced either support or critique for the proposal in this PR, I'd like to explicitly put it to a vote:
Thanks @Maratyszcza for adding this poll. I voted, and I am also explicitly in favor of adding this set of operations to the proposal, as it benefits different sets of applications and maps to one instruction on most relevant platforms. As the votes look favorable, this will be merged after a reasonable time to vote. Please respond here with any concerns or objections to including the integer absolute value operations in the current SIMD proposal.
Thank you for this PR.
@AlphaHot as the dissenting vote, would you like to share why you object to adding these operations to the proposal?
Summary: These were merged to the SIMD proposal in WebAssembly/simd#128. Depends on D76397 to avoid merge conflicts. Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76399
Adds full support for the {i8x16,i16x8,i32x4}.abs instructions merged to the SIMD proposal in WebAssembly/simd#128 as well as the {i8x16,i16x8,i32x4}.bitmask instructions proposed in WebAssembly/simd#201.
The illustrated i32x4.abs lowering for SSE2 is incorrect.
@aqrit Thanks for reporting. Fixed.
Introduction
Integer absolute value instructions are well supported on the most popular architectures (on x86 since SSSE3, on ARM since the first version of NEON), and naturally complement the floating-point absolute value instructions already present in WebAssembly SIMD.
This PR introduces three new WebAssembly instructions for integer absolute value operations, i8x16.abs, i16x8.abs, and i32x4.abs, which operate on vectors of 8-bit, 16-bit, and 32-bit integers, respectively. The 64-bit version is omitted due to lack of support in common SIMD instruction sets.
Mapping to Common Instruction Sets
This section illustrates how the new WebAssembly instructions can be lowered on common instruction sets. These patterns are provided only for convenience; compliant WebAssembly implementations do not have to follow the same code generation patterns.
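As a reference point for the per-architecture patterns, the lane-wise behaviour can be sketched with a small scalar model. This is only an illustrative sketch: the helper names (`to_signed`, `lane_abs`, `i8x16_abs`, `i16x8_abs`, `i32x4_abs`) are hypothetical, and it assumes wrapping two's-complement semantics, so the most negative value of each lane width maps to itself, which is what the single-instruction lowerings (for example PABSB or ABS) produce.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def lane_abs(x, bits):
    """Wrapping absolute value of one signed lane of the given width."""
    mask = (1 << bits) - 1
    # abs() of the most negative value does not fit in the lane, so it wraps.
    return to_signed(abs(x) & mask, bits)


def i8x16_abs(v):
    assert len(v) == 16
    return [lane_abs(x, 8) for x in v]


def i16x8_abs(v):
    assert len(v) == 8
    return [lane_abs(x, 16) for x in v]


def i32x4_abs(v):
    assert len(v) == 4
    return [lane_abs(x, 32) for x in v]
```

Under this model the only surprising lane value is the minimum of each width (for example -128 for 8-bit lanes), which is returned unchanged.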
x86/x86-64 processors with AVX instruction set
y = i8x16.abs(x) is lowered to VPABSB xmm_y, xmm_x
y = i16x8.abs(x) is lowered to VPABSW xmm_y, xmm_x
y = i32x4.abs(x) is lowered to VPABSD xmm_y, xmm_x
x86/x86-64 processors with SSSE3 instruction set
y = i8x16.abs(x) is lowered to PABSB xmm_y, xmm_x
y = i16x8.abs(x) is lowered to PABSW xmm_y, xmm_x
y = i32x4.abs(x) is lowered to PABSD xmm_y, xmm_x
x86/x86-64 processors with SSE2 instruction set
x = i8x16.abs(x) is lowered to PXOR xmm_tmp, xmm_tmp + PSUBB xmm_tmp, xmm_x + PMINUB xmm_x, xmm_tmp
y = i8x16.abs(x) is lowered to PXOR xmm_y, xmm_y + PSUBB xmm_y, xmm_x + PMINUB xmm_y, xmm_x
x = i16x8.abs(x) is lowered to PXOR xmm_tmp, xmm_tmp + PSUBW xmm_tmp, xmm_x + PMAXSW xmm_x, xmm_tmp
y = i16x8.abs(x) is lowered to PXOR xmm_y, xmm_y + PSUBW xmm_y, xmm_x + PMAXSW xmm_y, xmm_x
y = i32x4.abs(x) is lowered to:
PXOR xmm_tmp, xmm_tmp
PCMPGTD xmm_tmp, xmm_x
MOVDQA xmm_y, xmm_x
PXOR xmm_y, xmm_tmp
PSUBD xmm_y, xmm_tmp
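The three SSE2 sequences rely on different tricks: an unsigned minimum of x and 0 - x for 8-bit lanes, a signed maximum of x and 0 - x for 16-bit lanes, and a sign mask combined with XOR and subtract for 32-bit lanes. The following single-lane model is a rough sketch for checking that each trick matches the wrapping absolute value. The function names are hypothetical, lanes are represented as raw unsigned bit patterns, and the compare step is assumed to be the signed doubleword compare (PCMPGTD).

```python
def wrap(value, bits):
    """Keep only the low `bits` bits, as a lane register would."""
    return value & ((1 << bits) - 1)


def to_signed(value, bits):
    value = wrap(value, bits)
    return value - (1 << bits) if value >> (bits - 1) else value


def expected_abs(x, bits):
    """Wrapping absolute value, returned as a raw bit pattern."""
    return wrap(abs(to_signed(x, bits)), bits)


def abs_i8_sse2(x):
    neg = wrap(0 - x, 8)           # PXOR zeroes the register, PSUBB computes 0 - x
    return min(x, neg)             # PMINUB: unsigned minimum


def abs_i16_sse2(x):
    neg = wrap(0 - x, 16)          # PXOR + PSUBW
    # PMAXSW: signed maximum of x and 0 - x
    return wrap(max(to_signed(x, 16), to_signed(neg, 16)), 16)


def abs_i32_sse2(x):
    # PXOR + signed compare: all ones where 0 > x, else zero
    mask = 0xFFFFFFFF if to_signed(x, 32) < 0 else 0
    y = x ^ mask                   # MOVDQA + PXOR: bitwise NOT of negative lanes
    return wrap(y - mask, 32)      # PSUBD: subtracting -1 adds the missing one
```

The 8-bit case uses the unsigned minimum because a negative lane has a large unsigned bit pattern while its negation is small, so the smaller of the two patterns is always the absolute value.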
ARM64 processors
y = i8x16.abs(x) is lowered to ABS Vy.16B, Vx.16B
y = i16x8.abs(x) is lowered to ABS Vy.8H, Vx.8H
y = i32x4.abs(x) is lowered to ABS Vy.4S, Vx.4S
ARMv7 processors with NEON instruction set
y = i8x16.abs(x) is lowered to VABS.S8 Qy, Qx
y = i16x8.abs(x) is lowered to VABS.S16 Qy, Qx
y = i32x4.abs(x) is lowered to VABS.S32 Qy, Qx
POWER processors with VMX (Altivec) instruction set
y = i8x16.abs(x) is lowered to VXOR VRtmp, VRtmp, VRtmp + VSUBUBM VRtmp, VRtmp, VRx + VMAXSB VRy, VRx, VRtmp
y = i16x8.abs(x) is lowered to VXOR VRtmp, VRtmp, VRtmp + VSUBUHM VRtmp, VRtmp, VRx + VMAXSH VRy, VRx, VRtmp
y = i32x4.abs(x) is lowered to VXOR VRtmp, VRtmp, VRtmp + VSUBUWM VRtmp, VRtmp, VRx + VMAXSW VRy, VRx, VRtmp
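The VMX pattern is the same at every lane width: zero a temporary, subtract x from it modulo the lane size, then take the signed maximum of x and its negation. A whole-vector model of the three-operand form might look like the sketch below. The function names (`vxor`, `vsubum`, `vmaxs`, `vmx_abs`) are hypothetical stand-ins for the instructions, not real APIs, and registers are modelled as lists of raw unsigned lane values.

```python
def _signed(value, bits):
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def vxor(a, b):
    """Lane-wise XOR (stand-in for VXOR)."""
    return [p ^ q for p, q in zip(a, b)]


def vsubum(a, b, bits):
    """Lane-wise modulo subtract (stand-in for VSUBUBM / VSUBUHM / VSUBUWM)."""
    m = (1 << bits) - 1
    return [(p - q) & m for p, q in zip(a, b)]


def vmaxs(a, b, bits):
    """Lane-wise signed maximum (stand-in for VMAXSB / VMAXSH / VMAXSW)."""
    m = (1 << bits) - 1
    return [max(_signed(p, bits), _signed(q, bits)) & m for p, q in zip(a, b)]


def vmx_abs(vrx, bits):
    # XOR of a register with itself is zero regardless of its previous contents.
    vrtmp = vxor(vrx, vrx)
    vrtmp = vsubum(vrtmp, vrx, bits)   # VRtmp = 0 - VRx
    return vmaxs(vrx, vrtmp, bits)     # VRy = max(VRx, -VRx)
```

For example, `vmx_abs([0xFFFFFFF9, 0x80000000, 7, 0], 32)` models four 32-bit lanes holding -7, the minimum value, 7, and 0.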
MIPS processors with MSA instruction set
y = i8x16.abs(x) is lowered to LDI.B Wtmp, 0 + ASUB_S.B Wy, Wx, Wtmp
y = i16x8.abs(x) is lowered to LDI.H Wtmp, 0 + ASUB_S.H Wy, Wx, Wtmp
y = i32x4.abs(x) is lowered to LDI.W Wtmp, 0 + ASUB_S.W Wy, Wx, Wtmp