🚧 POC - support NaNs for SSE & AVX2 f32 #18

jvdd · 2023-02-04T23:20:18Z

This PR aims at handling Nans, #16.

For the scalar implementation I check whether the scalar value is not equal to itself. This check will only be true if the scalar value is NAN, as the following is correct in Rust:

let v = f32::NAN;
assert!(v != v);

For the SIMD implementation I used a transformation similar to the one used in #1 - this transformation projects the NANs to integer values that are either higher / lower than the "real" floating point values. The transformation leverages the 2-complement https://observablehq.com/@rreusser/half-precision-floating-point-visualized

Some remarks:

+ inf & - inf will get projected as well
- validate if these are inbetween the NaNs & the "real" floating point values (pretty sure this is the case)
  => this is indeed the case - see plot ⬇️
there are a plethora of ways to represent NaNs in floating point representation: exponent should be 11111 and the fraction should be non-zero. Thus the sign bit may be 1 or 0 -> resulting in half of the NaNs getting projected above and the other half below the "real" floating point values. Thus, only 1 of the 2 checks (either > or <) will fire & thus detect the NaN. This might be a problem when we want to implement argmin and argmax as separate functions..

Paths I looked into but did not seem worthwhile:

I tried to handle NaNs with SIMD instructions, but the SIMD instruction to check whether there are a NaNs in the register turns out to be rather computationally expense

codspeed-hq · 2023-02-04T23:23:20Z

CodSpeed Performance Report

Merging #18 NaNs (df40466) will degrade performances by 2.4%.

Summary

🔥 6 improvements
❌ 5 regressions
✅ 33 untouched benchmarks

🆕 0 new benchmarks
⁉️ 0 dropped benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

	Benchmark	`main`	`NaNs`	Change
❌	`scalar_random_long_f32`	1.7 ms	2.2 ms	-29.87%
❌	`sse_random_long_f32`	712.1 µs	946.8 µs	-32.96%
❌	`avx_random_long_f32`	402.9 µs	467.1 µs	-15.91%
❌	`impl_random_long_f32`	403.2 µs	467.2 µs	-15.88%
🔥	`sse_random_long_i32`	797.3 µs	776 µs	2.68%
❌	`scalar_random_long_f64`	1.9 ms	2.4 ms	-27.06%
🔥	`avx2_random_long_i64`	868.4 µs	847.1 µs	2.45%
🔥	`impl_random_long_i64`	868.6 µs	847.3 µs	2.45%
🔥	`sse_random_long_u32`	861.4 µs	840 µs	2.48%
🔥	`avx2_random_long_u64`	868.4 µs	847.1 µs	2.45%
🔥	`impl_random_long_u64`	868.7 µs	847.3 µs	2.46%

jvdd · 2023-02-04T23:48:36Z

@varon I just wrote out very quickly what I pursued in this PR (is more a POC than a proper PR).
I'm interested in hearing your feedback on this! You are - of course - free to start pursuing totally different paths on how we can tackle this issue (& thus ignore what I just tried here).

P.S.: I'll improve the phrasing tomorrow - just wanted to quickly push & share the code 🙃

…min / max

jvdd · 2023-02-06T09:45:21Z

CodSpeed Performance Report

| ❌ | sse_random_long_f32 | 712.1 µs | 946.8 µs | -32.96% |

When I run the benchmarks on my local machine I notice only a 3-4% regression 🤔

-> main branch - ignores NaN unless NaN present in first lane
sse_random_long_f32     time:   [41.074 µs 41.109 µs 41.148 µs]
-> this branch - returns NaN index if NaN present
sse_random_long_f32     time:   [42.572 µs 42.660 µs 42.816 µs]

varon · 2023-02-17T19:41:42Z

Is this superseded by #21 ?

jvdd · 2023-02-17T19:59:25Z

Is this superseded by #21 ?

Yes! This was some sort of a proof of concept, showcasing the utility of ord_transform (ordinal transformation of float to int) for NaN handling.
Will close this PR now.

🚧 partially supported NaNs for SSE f32

6ffb4db

varon mentioned this pull request Feb 5, 2023

Add tests for NaN handling #19

Closed

jvdd added 3 commits February 5, 2023 17:56

🧹 cleanup code + return NaN index twice if only identified as either …

188635f

…min / max

🧹 formatting

7ac2504

🔍 enable lto

bf1725b

jvdd added 6 commits February 6, 2023 12:55

🤔 implement avx2 as well

0a66127

🙈 check for avx2 instead of avx

08588bb

🐛 check for avx2 instead of avx in benches

2bd0f8e

🙈 remove avx from runtime feature detection checks

faa3f56

🙈

50ee420

🖊️ undo rename so that codspeed can compare

4f9b767

jvdd changed the title ~~🚧 partially supported NaNs for SSE f32~~ 🚧 suppor NaNs for SSE & AVX2 f32 Feb 6, 2023

jvdd changed the title ~~🚧 suppor NaNs for SSE & AVX2 f32~~ 🚧 support NaNs for SSE & AVX2 f32 Feb 6, 2023

jvdd mentioned this pull request Feb 6, 2023

💪 handle NaNs #16

Closed

1 task

jvdd added 2 commits February 7, 2023 09:43

🐛 load unaligned

bae6139

⚡ use faster SIMD intrinsics

df40466

jvdd mentioned this pull request Feb 12, 2023

🚀 float NaN handling #21

Merged

23 tasks

jvdd changed the title ~~🚧 support NaNs for SSE & AVX2 f32~~ 🚧 POC - support NaNs for SSE & AVX2 f32 Feb 17, 2023

jvdd closed this Feb 17, 2023

jvdd deleted the NaNs branch February 28, 2023 21:17

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

🚧 POC - support NaNs for SSE & AVX2 f32 #18

🚧 POC - support NaNs for SSE & AVX2 f32 #18

jvdd commented Feb 4, 2023 •

edited

Loading

codspeed-hq bot commented Feb 4, 2023 •

edited

Loading

jvdd commented Feb 4, 2023

jvdd commented Feb 6, 2023 •

edited

Loading

CodSpeed Performance Report

varon commented Feb 17, 2023

jvdd commented Feb 17, 2023

🚧 POC - support NaNs for SSE & AVX2 f32 #18

🚧 POC - support NaNs for SSE & AVX2 f32 #18

Conversation

jvdd commented Feb 4, 2023 • edited Loading

codspeed-hq bot commented Feb 4, 2023 • edited Loading

CodSpeed Performance Report

Summary

Benchmarks breakdown

jvdd commented Feb 4, 2023

jvdd commented Feb 6, 2023 • edited Loading

CodSpeed Performance Report

varon commented Feb 17, 2023

jvdd commented Feb 17, 2023

jvdd commented Feb 4, 2023 •

edited

Loading

codspeed-hq bot commented Feb 4, 2023 •

edited

Loading

jvdd commented Feb 6, 2023 •

edited

Loading