Are there any performance bugs that prevent the usage of shuffle! in the AVX2 backend? #181

Closed
gnzlbg opened this issue Aug 2, 2018 · 4 comments

@gnzlbg

gnzlbg commented Aug 2, 2018

I just read a blog post about this library and was wondering why the AVX2 backend does not use the shuffle! macro instead of the many other intrinsics for re-ordering vector elements.

If there are any performance issues with it, it would really help if bugs could be filed upstream in packed_simd.

@gnzlbg
Author

gnzlbg commented Aug 2, 2018

Duh... the shuffle! macro was never merged into std::simd because of problems with exporting macros from core/std... so I guess that's the reason.

@hdevalence
Contributor

Thanks for pointing this out! There's no reason other than that when I wrote the original code last November, the intrinsics were using the packed vector types (which were later moved to packed_simd), and I thought these were on track for stabilization, while the shuffle! macro wasn't.

Later on, the intrinsics were changed to use the bag-of-bits types __m256i and friends, but all the code was already using u32x8 and similar types and applying operators like + to them. All of these would have had to be replaced by add intrinsics and the like, so I used the unstable std::simd stuff instead of staying with just the std::arch intrinsics.

Now, there's no reason not to use the shuffle! macro. In fact, the existing code already relies on the general AVX2 vector permute intrinsic being constant-folded into an LLVM shuffle and then lowered to a faster shuffle-by-immediate instruction.
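
To make the permute-versus-immediate point concrete, here is a minimal sketch using the stable std::arch intrinsics. It is not the code from this crate: the function names and the swap-adjacent-lanes pattern are hypothetical, chosen only because that pattern can be expressed both as a general permute with a constant index vector and as a shuffle-by-immediate. Whether the compiler actually rewrites the first form into the second depends on the LLVM version and optimization level; the sketch only checks that the two forms produce the same lanes.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// General permute (vpermd): the lane indices are passed in a vector register.
/// Here the index vector is a compile-time constant.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn swap_pairs_permute(v: __m256i) -> __m256i {
    let idx = _mm256_setr_epi32(1, 0, 3, 2, 5, 4, 7, 6);
    _mm256_permutevar8x32_epi32(v, idx)
}

/// Shuffle-by-immediate (vpshufd): the same pattern encoded in an 8-bit
/// immediate and applied within each 128-bit half.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn swap_pairs_immediate(v: __m256i) -> __m256i {
    // Two bits per output lane, lowest lane first: 1, 0, 3, 2.
    _mm256_shuffle_epi32::<0b10_11_00_01>(v)
}

/// Runs both variants on the same input.
/// Returns None when the CPU (or target) has no AVX2 support.
pub fn swap_pairs_both(input: [u32; 8]) -> Option<([u32; 8], [u32; 8])> {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            let mut a = [0u32; 8];
            let mut b = [0u32; 8];
            unsafe {
                let v = _mm256_loadu_si256(input.as_ptr() as *const __m256i);
                _mm256_storeu_si256(a.as_mut_ptr() as *mut __m256i, swap_pairs_permute(v));
                _mm256_storeu_si256(b.as_mut_ptr() as *mut __m256i, swap_pairs_immediate(v));
            }
            return Some((a, b));
        }
    }
    None
}
```

Calling swap_pairs_both with eight distinct lane values and comparing the two returned arrays is enough to confirm the equivalence on a machine with AVX2.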

Using the shuffle! macro also seems like it would be helpful on NEON (#147, cc @isislovecruft), and it would be great to be able to help upstream.

@gnzlbg
Author

gnzlbg commented Aug 2, 2018

Looking at the code, the shuffle macro just takes a constant vector of indices.

I think (not sure, haven't tried) that you might be able to keep using the AAAA notation with shuffle! by doing something like:

const AAAA: [i32; 4] = [0, 0, 0, 0];
shuffle!(vec, AAAA); 
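
As a rough illustration of the idea (not the packed_simd macro itself, whose exact argument rules are untested here), the sketch below models a shuffle as a plain scalar function over arrays, with the lane patterns kept as named constants. The helper shuffle_model and the BADC pattern are hypothetical names introduced only for this example; AAAA follows the snippet above, using usize indices so they can index an array directly.

```rust
/// Scalar model of a shuffle: output lane i takes input lane idx[i].
/// This only mirrors the semantics; it does not emit SIMD instructions.
fn shuffle_model<const N: usize>(v: [u32; N], idx: [usize; N]) -> [u32; N] {
    let mut out = [0u32; N];
    for i in 0..N {
        out[i] = v[idx[i]];
    }
    out
}

// Named lane patterns, with lanes labelled A, B, C, D = 0, 1, 2, 3.
const AAAA: [usize; 4] = [0, 0, 0, 0]; // broadcast lane A to every lane
const BADC: [usize; 4] = [1, 0, 3, 2]; // hypothetical: swap adjacent lanes
```

With these definitions, shuffle_model(v, AAAA) fills every lane with v[0], and shuffle_model(v, BADC) swaps each adjacent pair, so the named-pattern notation survives even though the indices are ordinary constants.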

@hdevalence
Contributor

Closing this for now just because I don't have any plans to refactor the AVX2 backend right now; it could be reopened later.
