Are the alignr/alignl simd functions planned? #78

Open
Kerollmops opened this issue Feb 23, 2021 · 12 comments
Labels
C-feature-request Category: a feature request, i.e. not implemented / a PR

Comments

@Kerollmops

Kerollmops commented Feb 23, 2021

Hello!

I am very interested in this library, as I know a lot of other people are, and I was wondering whether there are plans to add the _mm_alignr_epi8 family of functions. I have found this issue that lists the next features to add, but mine was not listed. Could it maybe be added as a next step?

Thank you very much!

@Kerollmops Kerollmops added the C-feature-request Category: a feature request, i.e. not implemented / a PR label Feb 23, 2021
@Lokathor
Contributor

Seems reasonable to put in. I'm not sure how people would want to define it for sizes other than 128 bits, but I guess a general byte rotation might be fine.

@bjorn3
Member

bjorn3 commented Feb 23, 2021

This is a highly specialized instruction that is only available on x86, which makes it a bad fit for stdsimd. Stdsimd is supposed to be roughly the greatest common denominator of all platforms supported by Rust. Of course, LLVM is allowed to optimize a sequence of functions that behaves identically to that intrinsic into a single instruction.

@Lokathor
Contributor

Naw, it's got very clear semantics though, "rotate the value by N bytes", which makes it at worst a slightly odd shuffle. It's a reasonable helper method to have, I think.

@bjorn3
Member

bjorn3 commented Feb 23, 2021

@Lokathor It isn't a byte rotate at all as far as I know. It concatenates blocks from both arguments, shifts a given amount and then takes the lower half of each block.

@thomcc
Member

thomcc commented Feb 23, 2021

Yeah, they're not really rotate. They're really useful where available though... I called it out a long time ago as the kind of instruction that would be useful to support but might be hard to describe semantically...

@Lokathor
Contributor

ah my mistake, i remember now, it's only a rotate if you pass the same register as both arguments.

the general two-arg form might be weird enough to be very low priority or even out of scope.
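
To illustrate the "rotate only when both arguments are the same" point, here is a minimal sketch that models the 128-bit operation on plain `u128` values. The name `alignr_u128` is hypothetical and not part of any library; the model treats `a` as the high half and `b` as the low half of the concatenation, and takes the shift as a byte count `n`.

```rust
/// Hypothetical scalar model of the 128-bit alignr operation:
/// concatenate `a` (high) and `b` (low), shift right by `n` bytes,
/// and keep the low 128 bits.
fn alignr_u128(a: u128, b: u128, n: u32) -> u128 {
    match n {
        // No shift: the low half is `b` unchanged.
        0 => b,
        // The low bytes of `a` slide into the top of the result.
        1..=15 => (b >> (8 * n)) | (a << (128 - 8 * n)),
        // A full 16-byte shift leaves exactly `a`.
        16 => a,
        // Past 16 bytes only the upper bytes of `a` remain, with zeros shifted in.
        17..=31 => a >> (8 * (n - 16)),
        // Everything has been shifted out.
        _ => 0,
    }
}
```

With the same value passed twice, `alignr_u128(x, x, n)` matches `x.rotate_right(8 * n)` for `n` in `0..=16`; with two different values it is a funnel shift rather than a rotate.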

@thomcc
Member

thomcc commented Feb 23, 2021

This kind of thing is why I was hoping we'd land on some generalization of permutation, which would handle a lot of these styles of intrinsics... but I don't really know what that would look like.

@workingjubilee
Member

workingjubilee commented Feb 23, 2021

@Kerollmops No x86 intrinsic per se will be "added", so in a strict sense, the answer is simply No.

...but we will probably offer general APIs that do similar things. The result may be less terse: for example, it is quite likely we will offer safe transmutation functions that let you use to_ne_bytes, do the byte rotation (and then interleaving) on your own, and then cast back with from_ne_bytes, and hopefully LLVM will optimize that correctly. Honestly, there is not a whole lot we can do if it doesn't, as we have a fairly limited amount of power over codegen on this end.

A generalized byte permutation in a single function seems plausible but that's going to take Some Design, especially given the obstacles we already have w/r/t shuffle APIs.

Also that intrinsic is already supported in core::arch and this sort of request reinforces why we will allow people to cast into hardware types and use such intrinsics if they need that kind of optimization.
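
For reference, a minimal sketch of calling the existing intrinsic through `core::arch` today. The wrapper name `alignr4` and the fixed shift of 4 bytes are illustrative assumptions; the sketch assumes a toolchain where `_mm_alignr_epi8` takes its immediate as a const generic, and it returns `None` when SSSE3 is not available.

```rust
/// Hypothetical wrapper: runs the SSSE3 alignr with a byte shift of 4,
/// or returns `None` when the running CPU does not support SSSE3.
#[cfg(target_arch = "x86_64")]
fn alignr4(a: [u8; 16], b: [u8; 16]) -> Option<[u8; 16]> {
    use core::arch::x86_64::{__m128i, _mm_alignr_epi8};
    use core::mem::transmute;

    #[target_feature(enable = "ssse3")]
    unsafe fn inner(a: [u8; 16], b: [u8; 16]) -> [u8; 16] {
        unsafe {
            // Cast the plain byte arrays into the hardware vector type.
            let va: __m128i = transmute(a);
            let vb: __m128i = transmute(b);
            // The shift amount is a compile-time immediate.
            transmute(_mm_alignr_epi8::<4>(va, vb))
        }
    }

    if std::is_x86_feature_detected!("ssse3") {
        // Sound because SSSE3 support was checked at runtime just above.
        Some(unsafe { inner(a, b) })
    } else {
        None
    }
}

/// Fallback for targets without the x86 intrinsic.
#[cfg(not(target_arch = "x86_64"))]
fn alignr4(_a: [u8; 16], _b: [u8; 16]) -> Option<[u8; 16]> {
    None
}
```

Because the immediate must be a constant, a runtime shift amount would need a dispatch over the possible values, which is part of why this stays in the platform-specific layer.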

@thomcc
Member

thomcc commented Feb 24, 2021

It's not bytewise, it's bitwise. to/from_ne_bytes doesn't really help.

@Lokathor
Contributor

the intel guide says

Operation
tmp[255:0] := ((a[127:0] << 128)[255:0] OR b[127:0]) >> (imm8*8)
dst[127:0] := tmp[127:0]

which seems byte-wise to me.
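
A byte-level reading of that pseudocode can be sketched in portable Rust. This is only a model under stated assumptions: `alignr_model` is a hypothetical name, and the arrays are treated as little-endian, so index 0 is the least significant byte.

```rust
/// Hypothetical scalar model of the quoted operation:
/// tmp = (a << 128) | b; dst = low 128 bits of (tmp >> (imm8 * 8)).
fn alignr_model(a: [u8; 16], b: [u8; 16], imm8: u8) -> [u8; 16] {
    // Build the 32-byte temporary: `b` is the low half, `a` is the high half.
    let mut tmp = [0u8; 32];
    tmp[..16].copy_from_slice(&b);
    tmp[16..].copy_from_slice(&a);

    // Shifting right by `imm8` bytes means dst[i] reads tmp[i + imm8];
    // positions past the end of `tmp` read as zero.
    let shift = imm8 as usize;
    let mut dst = [0u8; 16];
    for (i, out) in dst.iter_mut().enumerate() {
        if let Some(&byte) = tmp.get(i + shift) {
            *out = byte;
        }
    }
    dst
}
```

Every step moves whole bytes, which matches the byte-wise reading: the only arithmetic on the immediate is the multiplication by 8.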

@thomcc
Member

thomcc commented Feb 24, 2021

Ah, right, hmm, my bad. There are some bitwise permutation operations but I'm mistaken here.

@Kerollmops
Author

Kerollmops commented Feb 24, 2021

Thank you very much for all your fast answers, I wasn't expecting this amount of interest here 😄

The fact that we will rely on the LLVM codegen suits me and, as you say, I can use the core::arch intrinsic on x86.

5 participants