Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add arm/aarch64 table lookup and vector combine intrinsics #546

Merged
merged 1 commit into from
Aug 1, 2018

Conversation

gnzlbg
Copy link
Contributor

@gnzlbg gnzlbg commented Aug 1, 2018

No description provided.

@gnzlbg gnzlbg requested review from alexcrichton and Amanieu August 1, 2018 12:53
@alexcrichton alexcrichton merged commit dac28e1 into rust-lang:master Aug 1, 2018
@gnzlbg
Copy link
Contributor Author

gnzlbg commented Aug 1, 2018

@Amanieu please do a review of this when you have time just in case, I can fix things in a subsequent PR.

@Amanieu
Copy link
Member

Amanieu commented Aug 1, 2018

One thing that concerns me is whether these intrinsics work correctly on big-endian targets (see this link).

For example, Clang's arm-neon.h has the following code:

#ifdef __LITTLE_ENDIAN__
__ai int8x8_t vtbl1_s8(int8x8_t __p0, int8x8_t __p1) {
  int8x8_t __ret;
  __ret = (int8x8_t) __builtin_neon_vtbl1_v((int8x8_t)__p0, (int8x8_t)__p1, 0);
  return __ret;
}
#else
__ai int8x8_t vtbl1_s8(int8x8_t __p0, int8x8_t __p1) {
  int8x8_t __rev0;  __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
  int8x8_t __rev1;  __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
  int8x8_t __ret;
  __ret = (int8x8_t) __builtin_neon_vtbl1_v((int8x8_t)__rev0, (int8x8_t)__rev1, 0);
  __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
  return __ret;
}
#endif

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Aug 1, 2018

@Amanieu they definitely won't work correctly in big endian systems :/

Do we have any big endian arm and aarch64 targets in rustc ? I can't find any. If one can target these via xargo we should probably add them to rustup so that we can set them up in CI here.

Without a system to test on, this would be something that I cannot really fix.

int8x16x2_t(vcombine_s8(b.0, b.1), vcombine_s8(b.2, ::mem::zeroed())),
::mem::transmute(c),
);
let m: int8x8_t = simd_lt(c, ::mem::transmute(i8x8::splat(24)));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be an unsigned comparison instead of a signed one? Negative index values are still out of range.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. This bug affects the arm and aarch64 vtbx{N}_s8 intrinsics - the tests should be updated to test these with a negative index value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I think that this doesn't really matter.

Note how the signed integer vector is transmuted above into an unsigned integer vector when calling the vqbtx aarch64 wider intrinsic. The negative indices will be out-of-bounds, and will fetch the appropriate value into the result. Only those integers larger than the original vector size and less than twice the original vector size will have an incorrect value, which is what has to be corrected with the mask.

@Amanieu
Copy link
Member

Amanieu commented Aug 1, 2018

You are right that there are no big-endian targets at the moment, and it is somewhat a hassle to set one up since big-endian ARM systems are very rare.

So we don't forget about this in the future, maybe use compile_error with a #[cfg] to indicate that these intrinsics are broken on big-endian.

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Aug 1, 2018

You are right that there are no big-endian targets at the moment, and it is somewhat a hassle to set one up since big-endian ARM systems are very rare.

I think we should be able to set up one on qemu as long as we had a proper target right ?

So we don't forget about this in the future, maybe use compile_error with a #[cfg] to indicate that these intrinsics are broken on big-endian.

I am not sure if preventing core from being compiled on arm/aarch64 + big endian is the best approach. Whoever is attempting to target these systems is going to be having a pretty bad day already, and having to fix or comment out dozens of intrinsics that they might not be using won't make it better.

Panicking isn't a good idea either, so.. ideally, I think we should emit a warning if somebody tries to use these while targetting big-endian.

Maybe I could just use the deprecated attribute behind a cfg on arm/aarch64+big endian to warn that these are broken? Or should I use compile error for this and prevent core from being compiled at all?

cc @alexcrichton thoughts?

@Amanieu
Copy link
Member

Amanieu commented Aug 1, 2018

I think we should be able to set up one on qemu as long as we had a proper target right ?

The tricky part is getting a big-endian toolchain (GCC and libc) to build libstd and libtest. These aren't available in distro repositories (AFAIK) but you can probably use one from Linaro.

@alexcrichton
Copy link
Member

Elsewhere in the standard library for modules like this we'd just omit the intrinsics, so could the ones only available on little-endian only be available for little-endian targets?

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Aug 1, 2018

That sounds good, I'll send a PR to put them behind a cfg macro for little endian.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants