Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

provide Standard for x86 __m128/256i on stable Rust, add 128xN/sizexN SIMD types #1162

Closed
wants to merge 4 commits into from

Conversation

TheIronBorn
Copy link
Contributor

No description provided.

@vks
Copy link
Collaborator

vks commented Aug 23, 2021

Looks good! I wonder how to document this, it is not very discoverable at the moment. Should we document this in the README?

@TheIronBorn
Copy link
Contributor Author

Hmm. We don't have any documentation on Standard for SIMD stuff. We could probably mention the features there. Perhaps Uniform as well. There's also the experimental rustdoc cfg stuff of course

Copy link
Member

@dhardy dhardy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The use of unsafe needs attention; after that I'd like to do another review.

src/distributions/integer.rs Outdated Show resolved Hide resolved
src/distributions/integer.rs Outdated Show resolved Hide resolved
@newpavlov
Copy link
Member

newpavlov commented Aug 24, 2021

I don't think we need the SampleNativeEndian trait. Instead it should be enough to directly implement intrinsic_native_le_impl by simply copying simd_impl without the to_le call. Amount of code duplication will be minuscule.

@TheIronBorn
Copy link
Contributor Author

The extra internal NE trait might reduce code duplication for future architectures, but it's still probably minimal

@TheIronBorn
Copy link
Contributor Author

packed_simd failure on latest nightly. Exactly why this is useful

@TheIronBorn
Copy link
Contributor Author

Noticed we mention SIMD Standard and Uniform here https://rust-random.github.io/book/guide-dist.html#uniform-sampling-by-type

Comment on lines 13 to 14
#[cfg(target_arch = "x86")] use core::arch::x86::*;
#[cfg(target_arch = "x86_64")] use core::arch::x86_64::*;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only want two items, right? I'm not so keen on using glob imports.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4 items now. Added 2 setzero intrinsics

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, though if you make the change below those will go away.

Comment on lines 168 to 169
(__m128i, _mm_setzero_si128),
(__m256i, _mm256_setzero_si256)
Copy link
Member

@dhardy dhardy Sep 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm baffled: (1) the types exist without additional target features while the constructors require (sse2 / avx), and (2) the constructors are unsafe. Maybe I should learn a little more about SIMD here...

Stupid questions, but:

  1. This code will fail to compile without sse2 / avx, right?
  2. Is there a reason we shouldn't simply transmute an array with suitable alignment? Especially since we're mostly doing that with the pointer-cast anyway.

Copy link
Member

@newpavlov newpavlov Sep 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK there are no dedicated instructions for the setzero intrinsics. Usually they get compiled either down to XORing the same register or to writing zero bytes to memory. I am also a bit surprised that they are gated on sse2/avx, while types themselves are not.

I agree that transmuting arrays would be a simpler solution, but instead of creating an array with proper alignment I think it will be easier to write something like this:

let mut buf = [0u8; mem::size_of::<$ty>()];
rng.fill_bytes(&mut buf);
unsafe {  mem::transmute_copy(&buf) }

transmute_copy will handle the alignment requirements and in practice should be properly optimized out by compiler.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will compile just fine but without see/avx it will fail to run

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TheIronBorn
Using intrinsics without properly checking required target features (either at compile or at run time) is considered UB.

@dhardy dhardy added the D-changes Do: changes requested label Sep 13, 2021
@TheIronBorn TheIronBorn changed the title provide Standard for x86 __m128/256i on stable Rust, add 128xN SIMD types provide Standard for x86 __m128/256i on stable Rust, add 128xN/sizexN SIMD types Sep 23, 2021
Comment on lines +118 to +124
let mut vec: $ty = <$ty>::default();
unsafe {
let ptr = &mut vec;
let b_ptr = &mut *(ptr as *mut $ty as *mut [u8; mem::size_of::<$ty>()]);
rng.fill_bytes(b_ptr);
}
vec.to_le()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is correct, but we should really use from_bits like the old code to avoid unsafe (but do use fill_bytes instead of gen).

Unfortunately from_bits is not documented on docs.rs; I just dropped a PR for that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused by this. Do you mean use fill_bytes on a regular array and then from_slice_unaligned? That would avoid all unsafe.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I hadn't figured on Simd<[u8; 2]> etc. being hard to construct from an array. Maybe my suggestion doesn't make sense then.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could do something like

let mut bytes = [0_u8; mem::size_of::<$ty>()];
rng.fill_bytes(&mut bytes);
let vec = $ty::from_bits($u8xN::from_slice_unaligned(&bytes));
vec.to_le()

but usizexN don't have from_bits,

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
D-changes Do: changes requested
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants