
_mm_rsqrt_ss not matching simde_mm_rsqrt_ss fail #1222

Open
YileKu opened this issue Sep 15, 2024 · 8 comments

Comments


YileKu commented Sep 15, 2024

A0: 00 00 40 40 00 00 00 00 00 00 00 00 00 00 00 00
B0: 00 00 80 3F 00 00 00 00 00 00 00 00 00 00 00 00

   auto mul_A0 = _mm_mul_ss(A0, A0);
   auto mul_B0 = _mm_mul_ss(B0, B0);
   auto add_ss = _mm_add_ss(mul_A0, mul_B0);

mul_a0: 00 00 10 41 00 00 00 00 00 00 00 00 00 00 00 00
mul_b0: 00 00 80 3F 00 00 00 00 00 00 00 00 00 00 00 00
add_ss: 00 00 20 41 00 00 00 00 00 00 00 00 00 00 00 00

   auto root = _mm_rsqrt_ss( add_ss );

root: 00 E0 A1 2E 00 00 00 00 00 00 00 00 00 00 00 00

On a Cortex-A72 using simde_mm_rsqrt_ss:

A0: 00 00 40 40 00 00 00 00 00 00 00 00 00 00 00 00
B0: 00 00 80 3F 00 00 00 00 00 00 00 00 00 00 00 00
add_ss: 00 00 20 41 00 00 00 00 00 00 00 00 00 00 00 00

root: 00 80 A1 3E 00 00 00 00 00 00 00 00 00 00 00 00
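
For readers following the dumps: each line shows the 16 bytes of an __m128 in memory order, so on these little-endian targets the first four bytes are lane 0 as an IEEE-754 single-precision value. A minimal sketch of decoding them (the helper name `first_lane` is hypothetical, not part of SIMDe or the report):

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Hypothetical helper: reinterpret the first four bytes of a dump line
// (little-endian, in the order printed in the report) as a binary32 float.
float first_lane(const std::array<std::uint8_t, 4>& bytes) {
    const std::uint32_t bits = static_cast<std::uint32_t>(bytes[0])
                             | static_cast<std::uint32_t>(bytes[1]) << 8
                             | static_cast<std::uint32_t>(bytes[2]) << 16
                             | static_cast<std::uint32_t>(bytes[3]) << 24;
    float value;
    std::memcpy(&value, &bits, sizeof value);  // bit-exact copy, no conversion
    return value;
}
```

Decoded this way, A0 is 3.0, B0 is 1.0, add_ss is 10.0, and the Cortex-A72 root (00 80 A1 3E) is 0.3154296875, against an exact 1/sqrt(10) of roughly 0.31623.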

YileKu (Author) commented Sep 16, 2024

This code gives different results when run natively on Intel and when run on a Cortex-A72 with the SIMDe (simd-everywhere) headers:

void ldump_debug (const char *t, void *_d, int len)
{
    // Print the label, then each byte of the buffer as two hex digits.
    fprintf(stdout, "%s: ", t);
    unsigned char *cp = (unsigned char *)_d;
    for (int i = 0; i < len; i++, cp++)
        fprintf(stdout, "%02X ", *cp);
    fprintf(stdout, "\n");
}

__m128 t = { 0x00002041, 00, 00, 00 } ;
auto out = _mm_rsqrt_ss(t);
ldump_debug("LOCAL", &out, sizeof(out));

On Cortex-A72: LOCAL: 00 00 34 3C 00 00 00 00 00 00 00 00 00 00 00 00
On Intel: LOCAL: 00 48 34 3C 00 00 00 00 00.....
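
A side note on the initializer, offered as a likely reading rather than a confirmed one: with GCC/Clang vector types, a brace initializer converts each element by value, so `0x00002041` would become the float 8257.0f rather than a bit pattern. Both reported outputs are close to 1/sqrt(8257), about 0.011005, which fits that reading. A scalar sketch of the distinction (the function names are hypothetical):

```cpp
#include <cstdint>
#include <cstring>

// Value conversion: the integer 0x2041 (8257) becomes the float 8257.0f.
// This is assumed to be what the brace initializer does per element.
float from_value(std::uint32_t v) {
    return static_cast<float>(v);
}

// Bit reinterpretation: the 32 bits are copied unchanged into a float.
float from_bits(std::uint32_t bits) {
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}
```

For example, `from_value(0x00002041)` yields 8257.0f, while `from_bits(0x41200000)` yields 10.0f, the add_ss value from the first report.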

mr-c (Collaborator) commented Sep 17, 2024

Hello @YileKu, and thank you for your report.

Did you try compiling with -DSIMDE_ACCURACY_PREFERENCE=2, or adding #define SIMDE_ACCURACY_PREFERENCE 2 before including the SIMDe header in your application?

YileKu (Author) commented Sep 18, 2024 via email

YileKu (Author) commented Sep 18, 2024 via email

mr-c (Collaborator) commented Sep 18, 2024

> So isn’t the precision implicit in the API?
> Are there other AVX APIs that need clarification when being mapped to NEON?

That's a good question. I didn't write this code. I think https://github.com/simd-everywhere/simde?tab=readme-ov-file#caveats should be updated with this information.

YileKu (Author) commented Sep 19, 2024

Tried with the #define above and it still didn't work.

nemequ (Member) commented Sep 26, 2024

The rsqrt instructions are interesting. They're not actually specified to require bit-accurate implementations, but are instead specified as being mathematically accurate to a given precision. See the Intel API docs:

The maximum relative error for this approximation is less than 1.5*2^-12.

The instructions aren't even bit-compatible across CPU manufacturers… Intel and AMD return different values.

I'm not saying the implementation is perfect, only that bit-accurate results are not expected. It's possible some implementations have a higher maximum relative error than specified, but they should be pretty comparable, at least with a higher accuracy preference selected.
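
To make the point concrete, a comparison by relative error rather than by bits could look roughly like this. It is a sketch only: the helper names are hypothetical, the reference value is a double-precision 1/sqrt(x), and the bound is the 1.5*2^-12 figure quoted above from the Intel docs.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Reinterpret a 32-bit pattern as an IEEE-754 binary32 value.
float float_from_bits(std::uint32_t bits) {
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Relative error of an rsqrt approximation against double-precision 1/sqrt(x).
double rsqrt_relative_error(float approx, float x) {
    const double exact = 1.0 / std::sqrt(static_cast<double>(x));
    return std::fabs(static_cast<double>(approx) - exact) / exact;
}

// Bound quoted from the Intel documentation: 1.5 * 2^-12.
const double kIntelRsqrtBound = std::ldexp(1.5, -12);

// True when the approximation is within the documented Intel bound.
bool within_intel_bound(float approx, float x) {
    return rsqrt_relative_error(approx, x) < kIntelRsqrtBound;
}
```

Assuming the input in the second report was effectively 8257.0f, the Intel result (bytes 00 48 34 3C, i.e. 0x3C344800) passes this check, while the Cortex-A72 result (0x3C340000) does not. That would be consistent with some implementations having a higher maximum relative error than specified.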

YileKu (Author) commented Sep 26, 2024 via email
