add arm/aarch64 table lookup and vector combine intrinsics #546

gnzlbg · 2018-08-01T12:53:24Z

No description provided.

gnzlbg · 2018-08-01T14:16:46Z

@Amanieu please do a review of this when you have time just in case, I can fix things in a subsequent PR.

Amanieu · 2018-08-01T15:44:17Z

One thing that concerns me is whether these intrinsics work correctly on big-endian targets (see this link).

For example, Clang's arm-neon.h has the following code:

#ifdef __LITTLE_ENDIAN__
__ai int8x8_t vtbl1_s8(int8x8_t __p0, int8x8_t __p1) {
  int8x8_t __ret;
  __ret = (int8x8_t) __builtin_neon_vtbl1_v((int8x8_t)__p0, (int8x8_t)__p1, 0);
  return __ret;
}
#else
__ai int8x8_t vtbl1_s8(int8x8_t __p0, int8x8_t __p1) {
  int8x8_t __rev0;  __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
  int8x8_t __rev1;  __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
  int8x8_t __ret;
  __ret = (int8x8_t) __builtin_neon_vtbl1_v((int8x8_t)__rev0, (int8x8_t)__rev1, 0);
  __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
  return __ret;
}
#endif

gnzlbg · 2018-08-01T16:08:07Z

@Amanieu they definitely won't work correctly in big endian systems :/

Do we have any big endian arm and aarch64 targets in rustc ? I can't find any. If one can target these via xargo we should probably add them to rustup so that we can set them up in CI here.

Without a system to test on, this would be something that I cannot really fix.

Amanieu · 2018-08-01T16:09:30Z

coresimd/aarch64/neon.rs

+        int8x16x2_t(vcombine_s8(b.0, b.1), vcombine_s8(b.2, ::mem::zeroed())),
+        ::mem::transmute(c),
+    );
+    let m: int8x8_t = simd_lt(c, ::mem::transmute(i8x8::splat(24)));


Should this be an unsigned comparison instead of a signed one? Negative index values are still out of range.

Yes. This bug affects the arm and aarch64 vtbx{N}_s8 intrinsics - the tests should be updated to test these with a negative index value.

So I think that this doesn't really matter.

Note how the signed integer vector is transmuted above into an unsigned integer vector when calling the vqbtx aarch64 wider intrinsic. The negative indices will be out-of-bounds, and will fetch the appropriate value into the result. Only those integers larger than the original vector size and less than twice the original vector size will have an incorrect value, which is what has to be corrected with the mask.

Amanieu · 2018-08-01T16:15:40Z

You are right that there are no big-endian targets at the moment, and it is somewhat a hassle to set one up since big-endian ARM systems are very rare.

So we don't forget about this in the future, maybe use compile_error with a #[cfg] to indicate that these intrinsics are broken on big-endian.

gnzlbg · 2018-08-01T16:23:27Z

You are right that there are no big-endian targets at the moment, and it is somewhat a hassle to set one up since big-endian ARM systems are very rare.

I think we should be able to set up one on qemu as long as we had a proper target right ?

So we don't forget about this in the future, maybe use compile_error with a #[cfg] to indicate that these intrinsics are broken on big-endian.

I am not sure if preventing core from being compiled on arm/aarch64 + big endian is the best approach. Whoever is attempting to target these systems is going to be having a pretty bad day already, and having to fix or comment out dozens of intrinsics that they might not be using won't make it better.

Panicking isn't a good idea either, so.. ideally, I think we should emit a warning if somebody tries to use these while targetting big-endian.

Maybe I could just use the deprecated attribute behind a cfg on arm/aarch64+big endian to warn that these are broken? Or should I use compile error for this and prevent core from being compiled at all?

cc @alexcrichton thoughts?

Amanieu · 2018-08-01T16:29:30Z

I think we should be able to set up one on qemu as long as we had a proper target right ?

The tricky part is getting a big-endian toolchain (GCC and libc) to build libstd and libtest. These aren't available in distro repositories (AFAIK) but you can probably use one from Linaro.

alexcrichton · 2018-08-01T17:24:32Z

Elsewhere in the standard library for modules like this we'd just omit the intrinsics, so could the ones only available on little-endian only be available for little-endian targets?

gnzlbg · 2018-08-01T18:56:05Z

That sounds good, I'll send a PR to put them behind a cfg macro for little endian.

add arm/aarch64 table lookup and vector combine intrinsics

b72014d

gnzlbg requested review from alexcrichton and Amanieu August 1, 2018 12:53

alexcrichton merged commit dac28e1 into rust-lang:master Aug 1, 2018

Amanieu reviewed Aug 1, 2018

View reviewed changes

rubdos mentioned this pull request Dec 5, 2022

Move vector combine intrisics to arm/neon.rs #1363

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add arm/aarch64 table lookup and vector combine intrinsics #546

add arm/aarch64 table lookup and vector combine intrinsics #546

gnzlbg commented Aug 1, 2018

gnzlbg commented Aug 1, 2018

Amanieu commented Aug 1, 2018

gnzlbg commented Aug 1, 2018

Amanieu Aug 1, 2018

gnzlbg Aug 1, 2018

gnzlbg Aug 1, 2018

Amanieu commented Aug 1, 2018

gnzlbg commented Aug 1, 2018

Amanieu commented Aug 1, 2018

alexcrichton commented Aug 1, 2018

gnzlbg commented Aug 1, 2018

add arm/aarch64 table lookup and vector combine intrinsics #546

add arm/aarch64 table lookup and vector combine intrinsics #546

Conversation

gnzlbg commented Aug 1, 2018

gnzlbg commented Aug 1, 2018

Amanieu commented Aug 1, 2018

gnzlbg commented Aug 1, 2018

Amanieu Aug 1, 2018

Choose a reason for hiding this comment

gnzlbg Aug 1, 2018

Choose a reason for hiding this comment

gnzlbg Aug 1, 2018

Choose a reason for hiding this comment

Amanieu commented Aug 1, 2018

gnzlbg commented Aug 1, 2018

Amanieu commented Aug 1, 2018

alexcrichton commented Aug 1, 2018

gnzlbg commented Aug 1, 2018