Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🚧 POC - support NaNs for SSE & AVX2 f32 #18

Closed
wants to merge 12 commits into from
Closed

🚧 POC - support NaNs for SSE & AVX2 f32 #18

wants to merge 12 commits into from

Conversation

jvdd
Copy link
Owner

@jvdd jvdd commented Feb 4, 2023

This PR aims at handling Nans, #16.

For the scalar implementation I check whether the scalar value is not equal to itself. This check will only be true if the scalar value is NAN, as the following is correct in Rust:

let v = f32::NAN;
assert!(v != v);

For the SIMD implementation I used a transformation similar to the one used in #1 - this transformation projects the NANs to integer values that are either higher / lower than the "real" floating point values. The transformation leverages the 2-complement https://observablehq.com/@rreusser/half-precision-floating-point-visualized

Some remarks:

  • + inf & - inf will get projected as well
    • validate if these are inbetween the NaNs & the "real" floating point values (pretty sure this is the case)
      => this is indeed the case - see plot ⬇️
      image
  • there are a plethora of ways to represent NaNs in floating point representation: exponent should be 11111 and the fraction should be non-zero. Thus the sign bit may be 1 or 0 -> resulting in half of the NaNs getting projected above and the other half below the "real" floating point values. Thus, only 1 of the 2 checks (either > or <) will fire & thus detect the NaN. This might be a problem when we want to implement argmin and argmax as separate functions..

Paths I looked into but did not seem worthwhile:

@codspeed-hq
Copy link

codspeed-hq bot commented Feb 4, 2023

CodSpeed Performance Report

Merging #18 NaNs (df40466) will degrade performances by 2.4%.

Summary

🔥 6 improvements
❌ 5 regressions
✅ 33 untouched benchmarks

🆕 0 new benchmarks
⁉️ 0 dropped benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Benchmark main NaNs Change
scalar_random_long_f32 1.7 ms 2.2 ms -29.87%
sse_random_long_f32 712.1 µs 946.8 µs -32.96%
avx_random_long_f32 402.9 µs 467.1 µs -15.91%
impl_random_long_f32 403.2 µs 467.2 µs -15.88%
🔥 sse_random_long_i32 797.3 µs 776 µs 2.68%
scalar_random_long_f64 1.9 ms 2.4 ms -27.06%
🔥 avx2_random_long_i64 868.4 µs 847.1 µs 2.45%
🔥 impl_random_long_i64 868.6 µs 847.3 µs 2.45%
🔥 sse_random_long_u32 861.4 µs 840 µs 2.48%
🔥 avx2_random_long_u64 868.4 µs 847.1 µs 2.45%
🔥 impl_random_long_u64 868.7 µs 847.3 µs 2.46%

@jvdd
Copy link
Owner Author

jvdd commented Feb 4, 2023

@varon I just wrote out very quickly what I pursued in this PR (is more a POC than a proper PR).
I'm interested in hearing your feedback on this! You are - of course - free to start pursuing totally different paths on how we can tackle this issue (& thus ignore what I just tried here).

P.S.: I'll improve the phrasing tomorrow - just wanted to quickly push & share the code 🙃

@varon varon mentioned this pull request Feb 5, 2023
@jvdd
Copy link
Owner Author

jvdd commented Feb 6, 2023

CodSpeed Performance Report

| ❌ | sse_random_long_f32 | 712.1 µs | 946.8 µs | -32.96% |

When I run the benchmarks on my local machine I notice only a 3-4% regression 🤔

-> main branch - ignores NaN unless NaN present in first lane
sse_random_long_f32     time:   [41.074 µs 41.109 µs 41.148 µs]
-> this branch - returns NaN index if NaN present
sse_random_long_f32     time:   [42.572 µs 42.660 µs 42.816 µs]

@jvdd jvdd changed the title 🚧 partially supported NaNs for SSE f32 🚧 suppor NaNs for SSE & AVX2 f32 Feb 6, 2023
@jvdd jvdd changed the title 🚧 suppor NaNs for SSE & AVX2 f32 🚧 support NaNs for SSE & AVX2 f32 Feb 6, 2023
@jvdd jvdd mentioned this pull request Feb 6, 2023
1 task
@jvdd jvdd mentioned this pull request Feb 12, 2023
23 tasks
@varon
Copy link
Contributor

varon commented Feb 17, 2023

Is this superseded by #21 ?

@jvdd jvdd changed the title 🚧 support NaNs for SSE & AVX2 f32 🚧 POC - support NaNs for SSE & AVX2 f32 Feb 17, 2023
@jvdd
Copy link
Owner Author

jvdd commented Feb 17, 2023

Is this superseded by #21 ?

Yes! This was some sort of a proof of concept, showcasing the utility of ord_transform (ordinal transformation of float to int) for NaN handling.
Will close this PR now.

@jvdd jvdd closed this Feb 17, 2023
@jvdd jvdd deleted the NaNs branch February 28, 2023 21:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants