wasm: prefer pmin/pmax #361

Merged
merged 1 commit into from
Dec 2, 2023
Conversation

myfreeer (Contributor) commented Dec 2, 2023

According to [emscripten](https://emscripten.org/docs/porting/simd.html) and [v8](https://github.com/v8/v8/blob/b6520eda5eafc3b007a5641b37136dfc9d92f63d/src/compiler/backend/x64/code-generator-x64.cc#L2661-L2699), `[f32x4|f64x2].[min|max]` compiles to many more instructions than `[f32x4|f64x2].[pmin|pmax]`.
The [spec](https://github.com/WebAssembly/spec/blob/main/proposals/simd/SIMD.md#floating-point-min-and-max) defines the difference between pmin/pmax and min/max as their NaN-propagating behavior, and in [v8](https://github.com/v8/v8/blob/b6520eda5eafc3b007a5641b37136dfc9d92f63d/src/compiler/backend/x64/code-generator-x64.cc#L2740-L2747) the equivalent of the x86 `_mm_min_ps`/`_mm_max_ps` is pmin/pmax.
This should make functions using min/max faster on WebAssembly and align with the existing x86 SSE behavior.
@recp recp merged commit 9b26aff into recp:master Dec 2, 2023
33 checks passed
recp (Owner) commented Dec 2, 2023

@myfreeer many thanks for your contributions to WASM support and its stability; the PR is merged 🚀

@myfreeer myfreeer deleted the myfreeer-patch-4 branch December 2, 2023 09:22
gottfriedleibniz commented Dec 20, 2023

Should we aim for consistent NaN-propagating behaviour here? The Intel ISA states:

If only one value is a NaN (SNaN or QNaN) for this instruction, the second operand (source operand), either a NaN or a valid floating-point value, is written to the result.

From the V8 link above:

The maxps instruction doesn't propagate NaNs and +0's in its first operand

Therefore, to match `_mm_max_ps(a, b)`, `wasm_f32x4_pmax(a, b)` should be `wasm_f32x4_pmax(b, a)`. For example, consider:

```c
vec4 a, b, c;

glm_vec4_broadcast(NAN, a);
glm_vec4_broadcast(0.0f, b);
glm_vec4_maxv(a, b, c);

printf("%f %f %f %f\n", c[0], c[1], c[2], c[3]);

// SSE:  0.000000 0.000000 0.000000 0.000000
// WASM: nan nan nan nan
// ARM:  another can of worms
```
Generalizing the above: should these functions follow x86 behaviour, IEEE behaviour, or simply aim for maximum performance, ignoring these edge cases?

recp (Owner) commented Dec 22, 2023

@gottfriedleibniz many thanks for catching edge cases like this and helping to fix them.

I think we can wrap min/max in glmm_ helpers, e.g. glmm_min() and glmm_max(), to make the behaviors identical.
