
SIMD distributions aren't well documented #1227

Closed
clarfonthey opened this issue Apr 17, 2022 · 6 comments

@clarfonthey
Contributor

clarfonthey commented Apr 17, 2022

Right now, the exact definitions of the distributions for SIMD types aren't documented at all, and I personally had to delve into the source code to discover (after much difficulty) which implementation is actually provided. I'm going to separate my thoughts into sections so this is a bit easier to read.

Uniform distributions

Uniform distributions for SIMD types have two natural definitions: either as a linear interpolation between the two values, or as a random value inside the box enclosed by the two provided points.

For floating-point SIMD vectors, both definitions would make logical sense to include somewhere, even though the latter can easily be computed from existing float distributions, with one caveat: does "excluding" the end point from the distribution mean that all the facets around the final point are excluded (no component may equal the corresponding component of the final point), or only that the single final point itself is excluded? The former is easily composable from existing distributions; the latter is not, although I fail to find a case where the latter is that useful.

For integers, the bounding-box definition is still natural, but the linear interpolation is not, since it would require computing the distance between the points and removing common factors from all components before adding. And unlike with floats, where precision can be lost, the linear interpolation could be done losslessly.

As expected, integers use the bounding-box approach. However, floats use the linear-interpolation approach, at least from what I can see. While the linear interpolation is (IMHO) the objectively more useful version to provide, I think that both could be useful in their own right, and there's not really any documentation anywhere describing which version is provided, whereas the scalar float distributions have entire pages written about what they do.
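To make the two candidate definitions concrete, here is a minimal sketch using plain arrays in place of SIMD vectors (so it runs on stable Rust). The function names are hypothetical illustrations, not rand API:

```rust
// Hypothetical sketch of the two candidate "uniform" definitions for a
// 4-component vector, using arrays so no nightly features are needed.

/// Linear interpolation: a single parameter t picks one point on the
/// segment between a and b.
fn lerp4(a: [f64; 4], b: [f64; 4], t: f64) -> [f64; 4] {
    std::array::from_fn(|i| a[i] + (b[i] - a[i]) * t)
}

/// Bounding box: each component gets its own independent parameter, so the
/// result can be any point inside the axis-aligned box spanned by a and b.
fn box_sample4(a: [f64; 4], b: [f64; 4], ts: [f64; 4]) -> [f64; 4] {
    std::array::from_fn(|i| a[i] + (b[i] - a[i]) * ts[i])
}

fn main() {
    let a = [0.0; 4];
    let b = [1.0, 2.0, 3.0, 4.0];
    // Midpoint of the segment from a to b.
    println!("{:?}", lerp4(a, b, 0.5));
    // A point inside the box [0,1] x [0,2] x [0,3] x [0,4].
    println!("{:?}", box_sample4(a, b, [0.5, 0.25, 0.5, 0.75]));
}
```

In a real sampler `t` and `ts` would come from an RNG; the sketch only shows how the two definitions differ.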

Standard distributions

While it seems obvious, for floats the standard distribution (and the open and open-closed distributions) simply generates a random number for each lane of the vector. However, a mathematically useful alternative could be generating a random point inside a unit hypersphere, which is not trivial, but definitely a reasonable distribution to add.
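For reference, one common way to sample inside a unit ball is rejection sampling from the enclosing cube. A stand-alone sketch for the 3-dimensional case, using a hand-rolled LCG so the example needs no external crates (this is an illustration of the technique, not rand's or rand_distr's algorithm):

```rust
// Tiny deterministic LCG (constants from Knuth's MMIX) so the example is
// self-contained; a real program would use a proper RNG.
struct Lcg(u64);

impl Lcg {
    fn next_f64(&mut self) -> f64 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        // Keep the top 53 bits and scale to [0, 1).
        (self.0 >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Rejection sampling: draw points uniformly in the cube [-1, 1)^3 and keep
/// the first one that lands inside the unit ball.
fn unit_ball_point(rng: &mut Lcg) -> [f64; 3] {
    loop {
        let p = [
            rng.next_f64() * 2.0 - 1.0,
            rng.next_f64() * 2.0 - 1.0,
            rng.next_f64() * 2.0 - 1.0,
        ];
        if p.iter().map(|x| x * x).sum::<f64>() <= 1.0 {
            return p;
        }
    }
}

fn main() {
    let mut rng = Lcg(42);
    let p = unit_ball_point(&mut rng);
    assert!(p.iter().map(|x| x * x).sum::<f64>() <= 1.0);
    println!("{p:?}");
}
```

Note that the acceptance rate of rejection sampling falls off quickly with dimension, which is part of why this is "not trivial" in general.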

Const generics

One thing that would be useful alongside the SIMD types' distributions is const-generic versions of the existing gen_iter methods that provide a set number of values. This could potentially mitigate concerns about the distributions' implementations as well, by allowing users to generate values as f64x4::from_array(rng.gen_array::<4>()). While not absolutely required for SIMD support, it's something else I'd like to see that's probably mentioned in another issue, so I won't focus too much on it.

Non-SIMD versions

And finally… it would be useful to have these distributions available outside of the SIMD implementations. Although it would require extra computation, being able to sample uniformly along a line without the accuracy loss that multiplication can accumulate would be valuable. I'm not sure what this API would look like, but IMHO the SIMD versions should just be wrappers around these distributions, where the distributions themselves might be optimised to use SIMD operations internally regardless of the end result (once SIMD support is stabilised, that is).
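As an aside on the accuracy point: the common one-multiply lerp `a + (b - a) * t` can fail to reproduce the endpoint exactly when `b - a` rounds, which is one reason a dedicated distribution could help. A small stand-alone illustration (hypothetical function names, not rand API):

```rust
// One-multiply lerp: cheap, but `b - a` may round, so t = 1.0 need not
// return b exactly.
fn lerp_one_mul(a: f64, b: f64, t: f64) -> f64 {
    a + (b - a) * t
}

// Two-multiply lerp: exact at both endpoints (t = 0 gives a, t = 1 gives b).
fn lerp_two_mul(a: f64, b: f64, t: f64) -> f64 {
    a * (1.0 - t) + b * t
}

fn main() {
    let (a, b, t) = (-1.0e17_f64, 1.0_f64, 1.0_f64);
    // b - a rounds to 1.0e17, so the one-multiply form lands on 0.0, not b.
    assert_ne!(lerp_one_mul(a, b, t), b);
    // The two-multiply form hits the endpoint exactly.
    assert_eq!(lerp_two_mul(a, b, t), b);
    println!("one-mul: {}, two-mul: {}", lerp_one_mul(a, b, t), lerp_two_mul(a, b, t));
}
```

Neither form is claimed to be what rand does; the point is only that the naive formulation loses accuracy in ways a purpose-built sampler could avoid.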

@clarfonthey
Contributor Author

This also might be related to #496 but that issue hasn't had any comments for 4 years and a lot has been done since then, so, I'm not sure what the status of that is.

@dhardy
Member

dhardy commented Apr 18, 2022

Thanks for the detailed issue. I won't read into the specifics now, but do plan to go over SIMD Uniform distributions as part of #1196 (and would welcome relevant comments on the report there).

@clarfonthey
Contributor Author

I should also add that before I wrote this, I didn't know about the rand_distr crate, which does have unit ball distributions. So I guess there is precedent that this kind of distribution wouldn't go in the rand crate, but elsewhere.

@dhardy
Member

dhardy commented Feb 6, 2023

I really should have read this earlier, apologies.

either as a linear interpolation between the two values

This is a line inside the plane (or space). I'm surprised you'd think we might have a sampler for that, and especially surprised you might think SIMD algorithms would do that. Granted, we do have UnitCircle, which samples from a line in the plane, but... SIMD is Single Instruction Multiple Data, not interpolation. You can think of a SIMD sampler as sampling in the (hyper-)box defined by two points, or as independently sampling N values from N independent ranges. This applies to both the int and float variants.

SIMD stuff isn't well documented because it's experimental and not even stable really, but... maybe we should add some basic docs.

generating a random point inside a unit hypersphere

This is a different problem solved by UnitBall.

Const generics

gen_iter was removed a long time ago. In its place there is sample_iter, e.g. rng.sample_iter(Standard). As here, it can output arrays just fine. I guess you could use this to construct SIMD values, though you can sample them directly too.

This works (using rand master since the previous release used packed_simd_2):

Cargo.toml:

```toml
[package]
name = "simd-tests"
version = "0.1.0"
edition = "2021"

[dependencies.rand]
git = "https://github.com/rust-random/rand.git"
rev = "7d73990096890960dbc086e5ad93c453e4435b25"
features = ["simd_support"]
```

src/main.rs:

```rust
#![feature(portable_simd)]

use rand::prelude::*;
use std::simd::Simd;

fn main() {
    let mut rng = rand::thread_rng();

    let x: Simd<i8, 4> = rng.gen();
    println!("x = {x:?}");

    let y: Simd<f32, 4> = rng.gen_range(
        Simd::<f32, 4>::splat(0.0) ..=
        Simd::from_array([1.0, 2.0, 3.0, 1.0])
    );
    println!("y = {y:?}");

    let z: Simd<i32, 4> = rng.gen_range(
        Simd::<i32, 4>::splat(0) ..=
        Simd::from_array([10, 100, 1000, 10000])
    );
    println!("z = {z:?}");
}
```

Non-SIMD versions

Sounds like you are talking about UnitBall etc. again.


Action

We should add a little documentation clarifying exactly what SIMD stuff is good for.

Maybe we should also support something like rng.gen_range([0u8; 4] ..= [255u8; 4]); I don't know.

Problem: rng.gen() is generic, but should have stable output. It generates tuple and array output by calling rng.gen() for each element; we can't exactly optimise this properly. The same would be true of rng.gen_range using arrays as above.
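The value-stability concern above can be sketched without rand: if array output is defined as one RNG call per element in index order, the output stream is pinned down, and a vectorised implementation that fills all lanes from one wide draw would produce different (though equally valid) values. A toy illustration with a hand-rolled LCG standing in for the real generator:

```rust
// Minimal deterministic LCG standing in for a real RNG; the constants are
// Knuth's MMIX multiplier/increment.
struct Lcg(u64);

impl Lcg {
    fn next_u32(&mut self) -> u32 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        (self.0 >> 32) as u32
    }
}

fn main() {
    let mut rng = Lcg(1);
    // Array output defined as "one RNG call per element", in index order.
    // This fixes both the values and the number of underlying draws, so any
    // SIMD implementation filling four lanes from one wide draw would break
    // reproducibility even though its output is equally uniform.
    let a: [u32; 4] = std::array::from_fn(|_| rng.next_u32());
    println!("{a:?}");
}
```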

@TheIronBorn
Contributor

When I ported to std::simd I added a small note in Standard's doc:

```rust
/// * SIMD types like x86's [`__m128i`], `std::simd`'s [`u32x4`]/[`f32x4`]/
/// [`mask32x4`] (requires [`simd_support`]), where each lane is distributed
/// like their scalar `Standard` variants. See the list of `Standard`
/// implementations for more.
```

Though this issue still deserves more thought.

@dhardy
Member

dhardy commented Nov 20, 2024

Some doc was added in #1526

@dhardy dhardy closed this as completed Nov 20, 2024