This repository has been archived by the owner on Dec 22, 2021. It is now read-only.

Integer absolute value instructions #128

Merged (1 commit) on Feb 11, 2020


@Maratyszcza (Contributor) commented Oct 28, 2019

Introduction

Integer absolute value instructions are well supported on the most popular architectures (on x86 since SSSE3, and on ARM since the first version of NEON), and they naturally complement the floating-point absolute value instructions that already exist in WebAssembly SIMD.

This PR introduces three new WebAssembly instructions for integer absolute value operations, i8x16.abs, i16x8.abs, and i32x4.abs, which operate on vectors of 8-bit, 16-bit, and 32-bit integers, respectively. A 64-bit version is omitted due to lack of support in common SIMD instruction sets (for example, x86 only provides a native 64-bit absolute value instruction, VPABSQ, with AVX-512).
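As a point of reference, the lane-wise semantics can be modeled in scalar code. The sketch below is an illustrative Python model, not part of the proposal text. It assumes wrap-around behavior for the most negative lane value (for example, -128 in i8x16.abs), which has no positive counterpart and maps to itself; this matches what every lowering listed below produces. The names `wrap_signed` and `simd_abs` are hypothetical.

```python
def wrap_signed(value: int, bits: int) -> int:
    """Reduce an arbitrary Python int to a two's-complement lane of `bits` width."""
    value &= (1 << bits) - 1
    # Reinterpret the top bit as the sign bit.
    return value - (1 << bits) if value >> (bits - 1) else value


def simd_abs(lanes: list[int], bits: int) -> list[int]:
    """Lane-wise absolute value with wrap-around (hypothetical reference model).

    bits=8 models i8x16.abs, bits=16 models i16x8.abs, bits=32 models i32x4.abs.
    """
    lane_count = 128 // bits
    if len(lanes) != lane_count:
        raise ValueError(f"expected {lane_count} lanes for a {bits}-bit lane type")
    # abs() is computed exactly, then truncated back to the lane width,
    # so the minimum value (e.g. -128 for 8-bit lanes) stays negative.
    return [wrap_signed(abs(x), bits) for x in lanes]


if __name__ == "__main__":
    v = [-128, -127, -1, 0, 1, 127] + [0] * 10
    print(simd_abs(v, 8)[:6])  # [-128, 127, 1, 0, 1, 127]
```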

Mapping to Common Instruction Sets

This section illustrates how the new WebAssembly instructions can be lowered on common instruction sets. These patterns are provided only for convenience; compliant WebAssembly implementations do not have to follow the same code generation patterns.

x86/x86-64 processors with AVX instruction set

  • i8x16.abs
    • y = i8x16.abs(x) is lowered to VPABSB xmm_y, xmm_x
  • i16x8.abs
    • y = i16x8.abs(x) is lowered to VPABSW xmm_y, xmm_x
  • i32x4.abs
    • y = i32x4.abs(x) is lowered to VPABSD xmm_y, xmm_x

x86/x86-64 processors with SSSE3 instruction set

  • i8x16.abs
    • y = i8x16.abs(x) is lowered to PABSB xmm_y, xmm_x
  • i16x8.abs
    • y = i16x8.abs(x) is lowered to PABSW xmm_y, xmm_x
  • i32x4.abs
    • y = i32x4.abs(x) is lowered to PABSD xmm_y, xmm_x

x86/x86-64 processors with SSE2 instruction set

  • i8x16.abs
    • x = i8x16.abs(x) is lowered to PXOR xmm_tmp, xmm_tmp + PSUBB xmm_tmp, xmm_x + PMINUB xmm_x, xmm_tmp
    • y = i8x16.abs(x) is lowered to PXOR xmm_y, xmm_y + PSUBB xmm_y, xmm_x + PMINUB xmm_y, xmm_x
  • i16x8.abs
    • x = i16x8.abs(x) is lowered to PXOR xmm_tmp, xmm_tmp + PSUBW xmm_tmp, xmm_x + PMAXSW xmm_x, xmm_tmp
    • y = i16x8.abs(x) is lowered to PXOR xmm_y, xmm_y + PSUBW xmm_y, xmm_x + PMAXSW xmm_y, xmm_x
  • i32x4.abs
    • y = i32x4.abs(x) is lowered to:
      • PXOR xmm_tmp, xmm_tmp
      • PCMPGTD xmm_tmp, xmm_x
      • MOVDQA xmm_y, xmm_x
      • PXOR xmm_y, xmm_tmp
      • PSUBD xmm_y, xmm_tmp
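The SSE2 sequences above rely on three different identities: for 8-bit lanes, the unsigned minimum of x and -x; for 16-bit lanes, the signed maximum of x and -x; and for 32-bit lanes, the sign-mask identity (x ^ m) - m, where m is 0 or -1 per lane. The Python sketch below emulates the instructions on a single lane to check these identities exhaustively for 8-bit and 16-bit lanes and on sample 32-bit values. The helper names are hypothetical and only mirror the mnemonics; this models the arithmetic, not the x86 encoding.

```python
def to_unsigned(x: int, bits: int) -> int:
    return x & ((1 << bits) - 1)


def to_signed(x: int, bits: int) -> int:
    x = to_unsigned(x, bits)
    return x - (1 << bits) if x >> (bits - 1) else x


def reference_abs(x: int, bits: int) -> int:
    # Wrap-around absolute value: the minimum lane value maps to itself.
    return to_signed(abs(x), bits)


def sse2_abs_i8(x: int) -> int:
    tmp = to_signed(0 - x, 8)                         # PXOR tmp, tmp + PSUBB tmp, x
    lo = min(to_unsigned(x, 8), to_unsigned(tmp, 8))  # PMINUB: unsigned minimum
    return to_signed(lo, 8)


def sse2_abs_i16(x: int) -> int:
    tmp = to_signed(0 - x, 16)                        # PXOR tmp, tmp + PSUBW tmp, x
    return max(x, tmp)                                # PMAXSW: signed maximum


def sse2_abs_i32(x: int) -> int:
    mask = -1 if 0 > x else 0                         # PXOR tmp, tmp + PCMPGTD tmp, x
    y = x ^ mask                                      # MOVDQA y, x + PXOR y, tmp
    return to_signed(y - mask, 32)                    # PSUBD y, tmp


if __name__ == "__main__":
    assert all(sse2_abs_i8(x) == reference_abs(x, 8) for x in range(-128, 128))
    assert all(sse2_abs_i16(x) == reference_abs(x, 16) for x in range(-32768, 32768))
    samples = [-(2**31), -(2**31) + 1, -1, 0, 1, 2**31 - 1]
    assert all(sse2_abs_i32(x) == reference_abs(x, 32) for x in samples)
    print("SSE2 lowerings match the reference model")
```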

ARM64 processors

  • i8x16.abs
    • y = i8x16.abs(x) is lowered to ABS Vy.16B, Vx.16B
  • i16x8.abs
    • y = i16x8.abs(x) is lowered to ABS Vy.8H, Vx.8H
  • i32x4.abs
    • y = i32x4.abs(x) is lowered to ABS Vy.4S, Vx.4S

ARMv7 processors with NEON instruction set

  • i8x16.abs
    • y = i8x16.abs(x) is lowered to VABS.S8 Qy, Qx
  • i16x8.abs
    • y = i16x8.abs(x) is lowered to VABS.S16 Qy, Qx
  • i32x4.abs
    • y = i32x4.abs(x) is lowered to VABS.S32 Qy, Qx

POWER processors with VMX (Altivec) instruction set

  • i8x16.abs
    • y = i8x16.abs(x) is lowered to VXOR VRtmp, VRtmp, VRtmp + VSUBUBM VRtmp, VRtmp, VRx + VMAXSB VRy, VRx, VRtmp
  • i16x8.abs
    • y = i16x8.abs(x) is lowered to VXOR VRtmp, VRtmp, VRtmp + VSUBUHM VRtmp, VRtmp, VRx + VMAXSH VRy, VRx, VRtmp
  • i32x4.abs
    • y = i32x4.abs(x) is lowered to VXOR VRtmp, VRtmp, VRtmp + VSUBUWM VRtmp, VRtmp, VRx + VMAXSW VRy, VRx, VRtmp
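The VMX sequence applies the same idea at every lane width: zero a register, subtract modulo 2^n (VSUBUBM, VSUBUHM, VSUBUWM), then take the signed maximum (VMAXSB, VMAXSH, VMAXSW). A Python sketch of one lane, parameterized by lane width; the function name `vmx_abs` is hypothetical and models only the arithmetic:

```python
def to_signed(x: int, bits: int) -> int:
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x


def vmx_abs(x: int, bits: int) -> int:
    tmp = 0                          # VXOR VRtmp, VRtmp, VRtmp
    tmp = to_signed(tmp - x, bits)   # VSUBUBM / VSUBUHM / VSUBUWM: modular subtract
    return max(x, tmp)               # VMAXSB / VMAXSH / VMAXSW: signed maximum


if __name__ == "__main__":
    for bits in (8, 16, 32):
        lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
        assert vmx_abs(lo, bits) == lo      # the minimum value wraps to itself
        assert vmx_abs(hi, bits) == hi
        assert vmx_abs(lo + 1, bits) == hi
        assert vmx_abs(-1, bits) == 1
    print("VMX lowering behaves as expected")
```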

MIPS processors with MSA instruction set

  • i8x16.abs
    • y = i8x16.abs(x) is lowered to LDI.B Wtmp, 0 + ASUB_S.B Wy, Wx, Wtmp
  • i16x8.abs
    • y = i16x8.abs(x) is lowered to LDI.H Wtmp, 0 + ASUB_S.H Wy, Wx, Wtmp
  • i32x4.abs
    • y = i32x4.abs(x) is lowered to LDI.W Wtmp, 0 + ASUB_S.W Wy, Wx, Wtmp
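ASUB_S computes the absolute value of the exact signed difference of its operands and keeps the low n bits of the result. With the second operand zeroed by LDI, this reduces to |x| truncated to the lane width, which again maps the minimum value to itself. A Python sketch of one lane, based on our reading of the MSA description of ASUB_S; the names `msa_asub_s` and `msa_abs` are hypothetical:

```python
def to_signed(x: int, bits: int) -> int:
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x


def msa_asub_s(a: int, b: int, bits: int) -> int:
    # Absolute difference of two signed lanes, computed exactly,
    # then truncated to the lane width (reinterpreted here as signed).
    return to_signed(abs(a - b), bits)


def msa_abs(x: int, bits: int) -> int:
    zero = 0                                 # LDI.B / LDI.H / LDI.W Wtmp, 0
    return msa_asub_s(x, zero, bits)         # ASUB_S.B / .H / .W Wy, Wx, Wtmp


if __name__ == "__main__":
    assert msa_abs(-128, 8) == -128          # 128 does not fit, wraps to 0x80
    assert msa_abs(-5, 8) == 5
    assert msa_abs(-(2**15), 16) == -(2**15)
    assert msa_abs(-(2**31) + 1, 32) == 2**31 - 1
    print("MSA lowering behaves as expected")
```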

@Maratyszcza (Contributor Author)

The need for integer absolute value instructions was suggested by @jan-wassenberg in the recent SIMD WG sync and in #176. As no one has voiced either support or critique for the proposal in this PR, I'd like to explicitly put it to a vote:

  • In favor of including Integer Absolute Value ops in the current proposal, please respond with 👍
  • Against including Integer Absolute Value ops in the current proposal, please respond with 👎

@dtig (Member) commented Jan 21, 2020

Thanks @Maratyszcza for adding this poll. I voted, but I am also explicitly in favor of adding this set of operations to the proposal, as it benefits different sets of applications and maps to a single instruction on most relevant platforms. As the votes look in favor, this will be merged after waiting a reasonable time for votes; please respond here with any concerns or objections to including the integer absolute value operations in the current SIMD proposal.

@munrocket

Thank you for this PR.

@ngzhian ngzhian requested a review from dtig February 6, 2020 18:53
@dtig (Member) commented Feb 7, 2020

@AlphaHot, as the dissenting voter, would you like to share why you object to adding these operations to the proposal?

@dtig dtig merged commit 77e7fda into WebAssembly:master Feb 11, 2020
tlively added a commit to llvm/llvm-project that referenced this pull request Mar 20, 2020
Summary:
These were merged to the SIMD proposal in
WebAssembly/simd#128.

Depends on D76397 to avoid merge conflicts.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76399
tlively added a commit to tlively/binaryen that referenced this pull request Mar 20, 2020
Adds full support for the {i8x16,i16x8,i32x4}.abs instructions merged
to the SIMD proposal in WebAssembly/simd#128
as well as the {i8x16,i16x8,i32x4}.bitmask instructions proposed in
WebAssembly/simd#201.
tlively added a commit to WebAssembly/binaryen that referenced this pull request Mar 20, 2020
arichardson pushed a commit to arichardson/llvm-project that referenced this pull request Apr 2, 2020
@aqrit commented Feb 19, 2021

The illustrated i32x4.abs lowering for SSE2 is incorrect.

(x - y) ^ y should have been (x ^ y) - y or (x + y) ^ y, where y is the sign mask.
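A quick Python check of the three expressions on 32-bit lanes, taking y as the per-lane mask PCMPGTD produces against zero (0 for non-negative lanes, -1 for negative ones). The helper names are hypothetical:

```python
def to_signed32(x: int) -> int:
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x >> 31 else x


def sign_mask(x: int) -> int:
    return -1 if x < 0 else 0          # what PCMPGTD against zero produces


def wrong(x: int) -> int:
    y = sign_mask(x)
    return to_signed32((x - y) ^ y)    # the originally published sequence


def fixed_xor_sub(x: int) -> int:
    y = sign_mask(x)
    return to_signed32((x ^ y) - y)


def fixed_add_xor(x: int) -> int:
    y = sign_mask(x)
    return to_signed32((x + y) ^ y)


if __name__ == "__main__":
    for x in (-(2**31), -5, -1, 0, 1, 5, 2**31 - 1):
        expected = to_signed32(abs(x))
        assert fixed_xor_sub(x) == expected
        assert fixed_add_xor(x) == expected
    print(wrong(-5))                   # 3, not 5
```

For negative lanes the wrong form computes ~(x + 1) = -x - 2, so it is off by two, while both corrected forms agree with the wrap-around absolute value.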

@Maratyszcza (Contributor Author)

@aqrit Thanks for reporting. Fixed.


4 participants