Relaxed Rounding Q-format Multiplication #40

Maratyszcza · 2021-10-01T21:39:43Z

What are the instructions being proposed?

I propose a relaxed version of the Saturating Rounding Q-format Multiplication i16x8.q15mulr_sat_s introduced in WebAssembly/simd#365. I suggest i16x8.q15mulr_s as the tentative name for the relaxed instruction.

What are the semantics of these instructions?

i16x8.q15mulr_sat_s implements multiplication of fixed-point numbers in Q15 format (see WebAssembly/simd#365 for details). The multiplication overflows if and only if both inputs are INT16_MIN, and the x86 SSSE3 and ARM NEON instructions handle this case differently: the x86 version wraps around, while the ARM version saturates. The WebAssembly SIMD instruction i16x8.q15mulr_sat_s standardized on the ARM overflow semantics, which requires additional overflow checks on x86. However, the case of both inputs being INT16_MIN is rare, and the higher-level structure of an algorithm can often guarantee that it never happens, so a relaxed version that allows either overflow behavior would help performance on x86.

The proposed i16x8.q15mulr_s Relaxed SIMD instruction computes the lane-wise rounded multiplication of Q15 numbers, and allows for either saturation or wrap-around behavior in the overflow case (where both inputs are INT16_MIN).
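To make the two permitted outcomes concrete, here is a minimal scalar sketch of one lane in C. The function names are hypothetical and chosen for illustration; the rounding formula `(a * b + 0x4000) >> 15` is the one I understand i16x8.q15mulr_sat_s to use. The sketch assumes arithmetic right shift of negative values and two's-complement narrowing, which mainstream compilers provide.

```c
#include <assert.h>
#include <stdint.h>

/* Rounded Q15 product before narrowing to 16 bits: (a*b + 2^14) >> 15.
   Assumes >> on a negative int32_t is an arithmetic shift. */
int32_t q15mulr_wide(int16_t a, int16_t b) {
    return ((int32_t)a * (int32_t)b + 0x4000) >> 15;
}

/* Saturating outcome (ARM, and i16x8.q15mulr_sat_s): clamp to INT16_MAX.
   The wide result can never go below INT16_MIN, so one check suffices. */
int16_t q15mulr_saturating(int16_t a, int16_t b) {
    int32_t r = q15mulr_wide(a, b);
    return (r > INT16_MAX) ? INT16_MAX : (int16_t)r;
}

/* Wrapping outcome (x86 PMULHRSW): keep only the low 16 bits.
   Assumes two's-complement conversion from uint16_t to int16_t. */
int16_t q15mulr_wrapping(int16_t a, int16_t b) {
    return (int16_t)(uint16_t)q15mulr_wide(a, b);
}

/* Illustrative self-check of the cases discussed above. */
void check_q15mulr_outcomes(void) {
    /* 0.5 * 0.5 = 0.25 in Q15: both variants agree. */
    assert(q15mulr_saturating(16384, 16384) == 8192);
    assert(q15mulr_wrapping(16384, 16384) == 8192);
    /* The single overflow case: the variants differ. */
    assert(q15mulr_saturating(INT16_MIN, INT16_MIN) == INT16_MAX);
    assert(q15mulr_wrapping(INT16_MIN, INT16_MIN) == INT16_MIN);
}
```

A relaxed implementation may return either result in the overflow case, so portable code should not depend on which one it gets.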

How will these instructions be implemented?

x86/x86-64 processors with AVX instruction set

y = i16x8.q15mulr_s(a, b) is lowered to VPMULHRSW xmm_y, xmm_a, xmm_b

x86/x86-64 processors with SSSE3 instruction set

y = i16x8.q15mulr_s(a, b) is lowered to MOVDQA xmm_y, xmm_a + PMULHRSW xmm_y, xmm_b

x86/x86-64 processors with SSE2 instruction set

y = i16x8.q15mulr_s(a, b) (y is NOT a and y is NOT b) is lowered to
- MOVDQA xmm_y, xmm_a
- MOVDQA xmm_tmp, xmm_a
- PMULLW xmm_y, xmm_b
- PMULHW xmm_tmp, xmm_b
- PSRLW xmm_y, 14
- PADDW xmm_tmp, xmm_tmp
- PAVGW xmm_y, wasm_i16x8_splat(0)
- PADDW xmm_y, xmm_tmp
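The SSE2 sequence above can be checked one lane at a time with a scalar emulation. The sketch below is an illustration rather than part of the proposal: the function names are hypothetical, each statement mirrors one instruction of the sequence, and the result is compared against the wrapping formula `(a * b + 0x4000) >> 15` truncated to 16 bits. It assumes arithmetic right shift of negative values and two's-complement narrowing.

```c
#include <assert.h>
#include <stdint.h>

/* Wrapping reference: low 16 bits of (a*b + 2^14) >> 15. */
int16_t q15mulr_ref_wrapping(int16_t a, int16_t b) {
    int32_t r = ((int32_t)a * (int32_t)b + 0x4000) >> 15;
    return (int16_t)(uint16_t)r;
}

/* One 16-bit lane of the SSE2 lowering, step by step. */
int16_t q15mulr_sse2_lane(int16_t a, int16_t b) {
    int32_t p = (int32_t)a * (int32_t)b;
    uint16_t y = (uint16_t)p;                      /* PMULLW: low half of product   */
    uint16_t tmp = (uint16_t)((uint32_t)p >> 16);  /* PMULHW: high half of product  */
    y = (uint16_t)(y >> 14);                       /* PSRLW y, 14: top 2 low bits   */
    tmp = (uint16_t)(tmp + tmp);                   /* PADDW tmp, tmp: high half * 2 */
    y = (uint16_t)((y + 0 + 1) >> 1);              /* PAVGW y, zero: rounding carry */
    return (int16_t)(uint16_t)(y + tmp);           /* PADDW y, tmp: wraps on overflow */
}

/* Counts disagreements with the reference. `stride` thins out the values
   of `a` so the check runs quickly; stride 1 is exhaustive. */
long sse2_lowering_mismatches(int stride) {
    long mismatches = 0;
    for (int a = INT16_MIN; a <= INT16_MAX; a += stride) {
        for (int b = INT16_MIN; b <= INT16_MAX; b++) {
            if (q15mulr_sse2_lane((int16_t)a, (int16_t)b) !=
                q15mulr_ref_wrapping((int16_t)a, (int16_t)b)) {
                mismatches++;
            }
        }
    }
    return mismatches;
}

/* Illustrative self-check on a sampled set of inputs. */
void check_sse2_lowering(void) {
    assert(sse2_lowering_mismatches(257) == 0);
}
```

The emulation wraps in the overflow case, so the SSE2 path yields INT16_MIN when both inputs are INT16_MIN, matching PMULHRSW.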

ARM64 processors

y = i16x8.q15mulr_s(a, b) is lowered to SQRDMULH Vy.8H, Va.8H, Vb.8H

ARMv7 processors with NEON instruction set

y = i16x8.q15mulr_s(a, b) is lowered to VQRDMULH.S16 Qy, Qa, Qb

Reference lowering through the WAsm SIMD128 instruction set

y = i16x8.q15mulr_s(a, b) is lowered as y = i16x8.q15mulr_sat_s(a, b)

How does behavior differ across processors? What new fingerprinting surfaces will be exposed?

When both inputs are INT16_MIN, x86/x86-64 will produce INT16_MIN result while ARM/ARM64 will produce INT16_MAX result. x86/x86-64 can already be distinguished from ARM/ARM64 based on NaN behavior, so this instruction doesn't add any new fingerprinting surfaces.

What use cases are there?

ngzhian · 2022-02-18T18:24:36Z

Instruction LGTM, please leave comments or thumbs up. I will add this to overview some time next week.

Maratyszcza added the instruction-proposal label Oct 1, 2021

dtig mentioned this issue Feb 17, 2022

SIMD subgroup meeting on 2022-02-18 #50

Closed

ngzhian added the "outstanding instruction" label (proposed instructions not yet added to overview) Feb 18, 2022

ngzhian mentioned this issue Mar 1, 2022

Add Relaxed Rounding Q-format Multiplication to overview #59

Merged

ngzhian added the "in-overview" label (Instruction has been added to Overview.md) and removed the "outstanding instruction" label (proposed instructions not yet added to overview) Mar 7, 2022

tlively added a commit to WebAssembly/binaryen that referenced this issue Apr 7, 2022

Implement i16x8.relaxed_q15mulr_s

33e5943

As proposed in WebAssembly/relaxed-simd#40.

tlively mentioned this issue Apr 7, 2022

Implement i16x8.relaxed_q15mulr_s WebAssembly/binaryen#4583

Merged

tlively added a commit to WebAssembly/binaryen that referenced this issue Apr 7, 2022

Implement i16x8.relaxed_q15mulr_s (#4583)

094deb0

As proposed in WebAssembly/relaxed-simd#40.

dtig mentioned this issue Jun 16, 2022

WebAssembly Relaxed SIMD mozilla/standards-positions#651

Open

mr-c mentioned this issue Nov 21, 2023

Relaxed SIMD support simd-everywhere/simde#856

Open

3 tasks
