SIMD Support #15

Closed
wants to merge 3 commits into from

Aatch
Contributor

@Aatch Aatch commented Mar 19, 2014

RFC for improving SIMD support.

First time I've done something like this, so apologies for anything I may have gotten wrong.

Acknowledgements to @jensnockert's post here: http://blog.aventine.se/2013/07/16/my-vision-for-rust-simd.html for forming the basis of this RFC

As such, these would be used like so:

```rust
fn make_vec3() -> simd![f32,..3] {
Member

Syntax extensions cannot currently expand into types, FYI. I don't believe that it would be a hard change to support it, but it's something that should probably at least be discussed.

Contributor Author

One point on this that I wasn't sure how to fit into the RFC itself is that simd types could be special-cased in the parser, so they aren't actually macros there. That's what I've done in my current branch.

@huonw
Member

huonw commented Mar 19, 2014

cc @jensnockert @cartazio @sanxiyn


# Unresolved questions

1. Syntax - should it stay as proposed or is there a better alternative?
Member

The one you proposed in IRC is also reasonable: let v: <f32, .. 4> = <0.0, 1.0, 2.0, 3.0> + <0.0, .. 4>;.

(Although I'm mildly worried about some subtle grammar interaction appearing.)

Member

+1 for the simd![] syntax

I really like the <f32, ..4> syntax, but on the other hand, we may want to save special syntax for things that appear more often in normal code. The simd! one could just be a normal syntax extension, and then people could add more crazy types that they need. (MIPS accumulators come to mind.)

If we decide that cornering the dense linear algebra/DSP/GPGPU market is important for Rust, then really slick SIMD syntax might be worth it.

@cartazio

I like the skeleton I see here.

There's still the need for a type-safe way of expressing shuffles (but nothing in this proposal precludes figuring out that support later). Not just the "llvm shuffle", but also the various CPU-target, microarchitecture-specific shuffles. This is loosely related to "compile time evaluation", but here you really, really need those arguments to be fixed at compile time.

@cartazio

I kinda like the simd![ ] type syntax. It's like putting a warning on the tin that simd vectors != normal rust vectors/arrays. (And I think that's a very valuable thing!)

# Unresolved questions

1. Syntax - should it stay as proposed or is there a better alternative?
2. Shuffle support via field access. It's a nice feature that I would like to

This would essentially be sugar for the llvm_shuffle, right?

Aatch added 2 commits March 20, 2014 16:46
Clarify repeat syntax requirements.
Clear up section on comparisons.
Fix error in example code.
Elaborate on similarity to Open CL Vector support
Clarify link between element access and shuffle syntax
Add more detail to the "shuffle assign" operation.
Remove unresolved question regarding shuffling. It's here to stay,
even if not in the current format.
@Aatch
Contributor Author

Aatch commented Mar 23, 2014

I've implemented a fair amount of this RFC here: https://github.com/Aatch/rust/tree/simd-support

I'd also like some feedback on the syntax. I'm not too bothered as to what it actually is, but I want it resolved quickly.

@brendanzab
Member

+1 on the syntax for me. Looks quite nice, and fits in with the fixed length vector syntax.

@Aatch
Contributor Author

Aatch commented Mar 24, 2014

By syntax, I'm also including the field access syntax. As strange as it may seem, it's actually simpler to implement most of the features this way, instead of potentially adding yet another expression node and all that.

let x = v + u
```

will be quivalent to:
Member

nit: "equivalent"

@nikomatsakis
Contributor

I feel a bit confused as to what the alternatives are. I'd like to know how this might look if it were more purely macro- or library-based. (For example, the swizzle ops might operate over any type T that implements a SIMD trait and so on).

I'd also like to know what the path is for extending the set of operations beyond "the basics" -- i.e., to cover things like Neon's fancy table lookup instructions and so on. Ideally we'd be able to smoothly extend the set of ops.

I'm surprised that 3-vectors work, but I guess LLVM handles that.

@auroranockert

3-vectors in LLVM are 4-vectors due to OpenCL, iirc.

EDIT: I was wrong on the reason… the new rules for type legalization are described at http://blog.llvm.org/2011/12/llvm-31-vector-changes.html

@Aatch
Contributor Author

Aatch commented Mar 25, 2014

@nikomatsakis LLVM handles every size vector you throw at it. I had a test function that swizzled a 3-element vector up to a 6-element vector.

I'm really, really reluctant to even start writing up what it would look like as a library type without associated items. Hell, there's a ton of stuff I want to do with this implementation that won't really work without it. We need associated items in order for any API around this to be at all palatable without compiler integration. There is no way to avoid massive API explosion otherwise, traits or not. With a stronger type system than C/C++, we would end up needing a frankly obscene number of types and intrinsics to make this work. It's why nobody has done anything with the minimal support we have now!

@pnkfelix
Member

@Aatch on the flip side, how reasonable does a library type look once one adds associated items?

I'm not really opposed to adding something (as long as it looks reasonable) now under a feature guard, but if there's a chance that there's a cleaner variant that makes use of something like associated items that I'm pretty sure we'll want to put in post-1.0, then that affects the decision about what to do now (namely in terms of what to identify as "interim syntax until other language features land" versus "syntactic forms we all stand behind as being good regardless of what other features are likely to land in the future").

@Aatch
Contributor Author

Aatch commented Mar 25, 2014

@pnkfelix it does look better with associated items. However, there is technically no new syntax here, except I suppose the macro-in-type-position, but I don't think that should count. That was a part of my reasoning for it, as well. I guess the only major issue is the swizzle/shuffling syntax, which does require specific support from the compiler, however it is syntactically just field access. I'd be sad to see it go, though.
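
For readers wondering what a library type built on associated items might look like, here is a minimal sketch in present-day Rust. The trait `SimdVector`, the type `F32x4`, and the helper `horizontal_sum` are hypothetical names invented for illustration and are not part of the RFC or the branch. The array-backed type only models the shape of the API and is not guaranteed to compile down to SIMD instructions.

```rust
/// Hypothetical trait: the element type and lane count are associated items,
/// so generic code does not need one intrinsic per (type, width) pair.
pub trait SimdVector: Copy {
    type Elem: Copy;
    const LANES: usize;

    /// Builds a vector with every lane set to `value`.
    fn splat(value: Self::Elem) -> Self;
    /// Reads a single lane.
    fn lane(self, index: usize) -> Self::Elem;
}

/// Illustrative 4-lane f32 vector backed by a plain array.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct F32x4(pub [f32; 4]);

impl SimdVector for F32x4 {
    type Elem = f32;
    const LANES: usize = 4;

    fn splat(value: f32) -> Self {
        F32x4([value; 4])
    }

    fn lane(self, index: usize) -> f32 {
        self.0[index]
    }
}

/// Written once against the trait, usable with any f32 vector width.
pub fn horizontal_sum<V: SimdVector<Elem = f32>>(v: V) -> f32 {
    (0..V::LANES).map(|i| v.lane(i)).sum()
}
```

With this shape, `horizontal_sum(F32x4::splat(1.5))` sums four lanes, and a wider type would only need its own `impl SimdVector`.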

@pnkfelix
Member

@Aatch well, since you brought it up, regarding the field access syntax for swizzle/shuffle, I think part of my problem there is that it is a conceptual mismatch. You cannot take the address of a shuffle's field nor assign into it, right? (That was my motivation for suggesting the use of method invocations rather than field accesses for swizzles/shuffles in the team meeting.)

Is there some way in which field access is the "right thing" here? Or are you merely trying to avoid putting a trailing () (i.e. for the method invocation syntax) on a bunch of swizzle/shuffles?


Update: Sorry, I overlooked line 142 in the RFC: "The same field accessor syntax may be used to set arbitrary components of a vector all at once:".

Still not sure I actually like it, nonetheless. (But I can at least see why this is more appealing than a method.) E.g. you still hit the problem that you cannot take the address of these "fields".
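
To make the trade-off concrete, here is a small sketch of the method-based alternative, written in present-day Rust. `Vec4`, `xy`, `wzyx`, and `set_xy` are hypothetical names chosen for illustration, not the RFC's syntax. The read-only swizzles return fresh values, so the question of taking their address never arises, and "shuffle assign" becomes an explicit setter.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Vec4 {
    pub x: f32,
    pub y: f32,
    pub z: f32,
    pub w: f32,
}

impl Vec4 {
    /// Read-only swizzle: returns a new value rather than a place.
    pub fn xy(self) -> [f32; 2] {
        [self.x, self.y]
    }

    /// Reverses the component order.
    pub fn wzyx(self) -> Vec4 {
        Vec4 { x: self.w, y: self.z, z: self.y, w: self.x }
    }

    /// "Shuffle assign" expressed as a method instead of `v.xy = ...`.
    pub fn set_xy(&mut self, value: [f32; 2]) {
        self.x = value[0];
        self.y = value[1];
    }
}
```

The cost is the trailing `()` on every swizzle and one setter per assignable combination, which is the verbosity the field-access syntax tries to avoid.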

@cartazio

https://github.com/mozilla/rust/wiki/Meeting-weekly-2014-03-25#simd better link for those trying to look up the thread via email

@cartazio

I'm not sure how the shuffle complexity can be punted to macros (cf. the meeting notes); it would be interesting to see such a design.

@dobkeratops

Very interesting stuff. I agree with the final point in the original linked post: it would be nice to have direct overrides to avoid relying on compiler optimization.

Q1) Would you consider supporting comparison operations that generate select masks, i.e. a comparison yielding a vectorized bool, plus a way to do selects? I guess many of the operators would just be vectorized. Would you catch the additional cases with more ! directives, or start adding intrinsic functions as in C? Rust safety being what it is, how might you represent it in the type system? For example, with a: simd![f32,..4] and b: simd![f32,..4], you would have cmp: simd![bool32,..4] = a > b; and then c = simd_sel(cmp, d, e), where d and e are two other unrelated vectors. This could also have been written c = simd_sel(a > b, d, e).

Q2) Would you be supporting Intel Haswell gather (SIMD indexing)? It's less important to me since it's not widespread across platforms. (I think the vload/vstore are just misaligned loads?) Perhaps it would actually make sense to allow passing a vector index into a [] operator on a vector, yielding a vector, e.g. foo[index_vector] = simd![foo[index_vector[0]], foo[index_vector[1]], foo[index_vector[2]], foo[index_vector[3]]]. Or perhaps that's best left to an intrinsic :)

Q3) Tangential question: I had been wondering if you had considered a simple space in the type for [T,..N], e.g. [T N], or is that too ambiguous or just not clear? I realise why [T*N] didn't stay.

Q4) Would you have monstrous u128, u256, ... types that these can be coerced to? (I can see you wouldn't want to call them int/unsigned int, since they won't support int arithmetic; they'd just be raw bit patterns.) I suppose you'd coerce to simd![u32,..4] for vectorized bitwise operations, fancy compression/conversions, and so on, but you want to say it's an operation on the bit pattern, not a converted float.

I'm sure the idea of arbitrary data in raw bit fields is going to raise alarm bells, but sometimes data is flipped (AoS<->SoA): loaded from standard non-SIMD structs, permuted into SIMD vectors, worked on, permuted back and stored. In the middle, you're basically saying you want to just transpose 4x4 blocks of memory.
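
As a rough illustration of the gather semantics asked about in Q2, the following sketch models the operation on plain arrays in present-day Rust. The function name `gather` is hypothetical, and this is a scalar model of the behaviour only; it says nothing about whether a hardware gather instruction would be emitted.

```rust
/// Scalar model of a gather: lane `i` of the result is `data[indices[i]]`.
/// Panics if any index is out of bounds for `data`.
pub fn gather<T: Copy, const N: usize>(data: &[T], indices: [usize; N]) -> [T; N] {
    std::array::from_fn(|lane| data[indices[lane]])
}
```

For example, gathering indices `[4, 0, 2, 2]` from a five-element slice picks the last, first, and (twice) the middle element.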

@Aatch
Contributor Author

Aatch commented Mar 27, 2014

@dobkeratops

  • A1 To fit in with the type system, equality operations yielding vectors would be intrinsics something like fn simd_eq<T, n>(v1: simd![T,..n], v2: simd![T,..n]) -> simd![bool,..n]. In LLVM IR, the result of a comparison is a vector of i1. Those vectors are what LLVM expects to be passed to its select instruction that is used for generating blend instructions and what any intrinsic expecting a mask would take too.
  • A2 Again, LLVM supports this via vectors of pointers, which is something I am currently undecided on supporting. I certainly see the utility, however I would have to think about the best way to present the functionality. I have been careful to try to avoid limiting future extensions to SIMD support, though.
  • A3 I'm not sure what the question is here...
  • A4 Again, I'm not sure what the question is here. However, I would not support coercion, in fact I removed a similar coercion from this RFC after finding that it didn't feel at all natural in the rest of Rust, which largely avoids most coercions.
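
The comparison-then-select pattern from Q1 and A1 can be sketched on plain arrays in present-day Rust. This is a scalar model under stated assumptions: `simd_eq` follows the signature suggested in A1, while `simd_gt` and `simd_sel` are hypothetical companions named after the question, and `[bool; N]` stands in for LLVM's vector of i1.

```rust
use std::array::from_fn;

/// Lane-wise equality, mirroring the `simd_eq` signature suggested above.
pub fn simd_eq<T: PartialEq, const N: usize>(a: [T; N], b: [T; N]) -> [bool; N] {
    from_fn(|i| a[i] == b[i])
}

/// Lane-wise greater-than, producing a mask in the same form.
pub fn simd_gt<T: PartialOrd, const N: usize>(a: [T; N], b: [T; N]) -> [bool; N] {
    from_fn(|i| a[i] > b[i])
}

/// Select (blend): takes the lane from `if_true` where the mask is set,
/// otherwise from `if_false`.
pub fn simd_sel<T: Copy, const N: usize>(
    mask: [bool; N],
    if_true: [T; N],
    if_false: [T; N],
) -> [T; N] {
    from_fn(|i| if mask[i] { if_true[i] } else { if_false[i] })
}
```

Note that the mask's lane count is tied to the operands through `N`, so mixing a 4-lane mask with 8-lane operands is rejected at compile time.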

@cartazio

  1. Certain classes of shuffle masks have to be fixed at compile time, or the CPU and LLVM will both barf. This isn't always true, but on most archs it is.

  2. That should be an intrinsic.

  3. No clue/opinion.

  4. I'd be slightly meh about that, though LLVM does have "target lowering" that can do this. But this would have a lowering that could be very, very bad depending on the -march=FOO settings.
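
One way to express "the shuffle mask must be fixed at compile time" in present-day Rust is to pass the lane indices as const generic parameters. The sketch below is an assumption-laden illustration: `shuffle2` and `LaneInRange` are hypothetical names, the vector is modelled as a plain array, and nothing here guarantees that a hardware shuffle is emitted.

```rust
/// Helper that turns an out-of-range lane index into a compile-time error
/// when the shuffle is instantiated.
struct LaneInRange<const I: usize, const N: usize>;

impl<const I: usize, const N: usize> LaneInRange<I, N> {
    const CHECK: () = assert!(I < N, "shuffle index out of range");
}

/// Two-lane shuffle of a four-lane vector. The indices are const generic
/// parameters, so they cannot be runtime values.
pub fn shuffle2<const I0: usize, const I1: usize>(v: [f32; 4]) -> [f32; 2] {
    // Referencing the constants forces the range checks to be evaluated.
    let () = LaneInRange::<I0, 4>::CHECK;
    let () = LaneInRange::<I1, 4>::CHECK;
    [v[I0], v[I1]]
}
```

A call such as `shuffle2::<3, 0>(v)` compiles, whereas `shuffle2::<4, 0>(v)` is rejected at build time rather than at run time.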

@dobkeratops

@Aatch @cartazio let me clarify Q4 with some C pseudocode.
This is a technique that was efficient for some cases on the PS3 Cell:

struct Foo { f32 x; f32 y; f32 z; i32 flags;
/* some packed integer control data. could be anything; the point is this is 3 floats, not 4.
   could have been {posx,posy,posz, velx,vely,velz, flags, pad}.. whatever.
 */ };

void process_foo_x4(Foo* f0, Foo* f1, Foo* f2, Foo* f3 /*, ..outputs.. */) {
     // reinterpret each 16-byte struct as one raw 128-bit SIMD register
     raw128simd_t    r0=(raw128simd_t&) *f0;
     raw128simd_t    r1=(raw128simd_t&) *f1;
     raw128simd_t    r2=(raw128simd_t&) *f2;
     raw128simd_t    r3=(raw128simd_t&) *f3;
     raw128simd_t x0123, y0123, z0123, flags0123;
     transpose_4x4(/*input:*/ r0,r1,r2,r3,   /*output:*/ &x0123, &y0123, &z0123, &flags0123); // just transposes 4x4 32-bit values, opaquely
     // now operate on 4 elements, e.g. squared lengths.
     raw128simd_t len_squared_0123 = x0123*x0123 + y0123*y0123 + z0123*z0123;
     // store squared lengths in output

     // could have been anything, e.g. doing some update on x/y/z, permuting it back to write out.
}

I guess you might call the inner loop "AOSOA4" or something: batching scalar structs in 4s with permuting.
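
The same AoS-to-SoA idea can be sketched in present-day Rust with plain arrays. This is a typed variant written for illustration: `FooX4`, `transpose_4x4`, and `len_squared_x4` are hypothetical names, and unlike the pseudocode above it keeps the `flags` lanes as integers instead of reinterpreting each struct as one raw 128-bit value.

```rust
#[derive(Clone, Copy, Debug)]
pub struct Foo {
    pub x: f32,
    pub y: f32,
    pub z: f32,
    pub flags: i32,
}

/// Four `Foo` values regrouped so that each field forms one 4-lane vector.
#[derive(Debug, PartialEq)]
pub struct FooX4 {
    pub x: [f32; 4],
    pub y: [f32; 4],
    pub z: [f32; 4],
    pub flags: [i32; 4],
}

/// Transposes four structs (rows) into four per-field lanes (columns).
pub fn transpose_4x4(f: [&Foo; 4]) -> FooX4 {
    FooX4 {
        x: f.map(|foo| foo.x),
        y: f.map(|foo| foo.y),
        z: f.map(|foo| foo.z),
        flags: f.map(|foo| foo.flags),
    }
}

/// Squared length of all four inputs at once, one lane per input.
pub fn len_squared_x4(v: &FooX4) -> [f32; 4] {
    std::array::from_fn(|i| v.x[i] * v.x[i] + v.y[i] * v.y[i] + v.z[i] * v.z[i])
}
```

After the transpose, every arithmetic step works on whole lanes, which is the layout a vectorizer or explicit SIMD type would want.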

@dobkeratops

r.e. the vectorized comparison, intrinsics would be perfectly ok and maybe even preferable at this level of granularity.

(I have seen people use operator overloading in C++ for this... but I was never so keen on that. At that level C++ was a hazard, not a benefit; you're probably writing something where you want to reason about the asm more.)

@cartazio

I agree that having the proper simd vector immediate type is a good idea, just that it shouldn't be conflated with some "really really wide (un)signed int" type.

pcwalton added a commit that referenced this pull request May 2, 2014
@dobkeratops

Is it complex to allow SIMD support for (T,T,T,T) tuples and struct {T x, T y, T z, T w} as well? Sometimes these are just more appealing than [T,..4], e.g. reserving bracket syntax for larger collections: [i].x vs [i][0]. This is just an issue of personal style, I know.

@DiamondLovesYou

@dobkeratops Already done: use #[simd] on your type declarations.

@pczarn

pczarn commented May 15, 2014

A distant idea of mine is

type UInt<N: static uint = 64, V: static uint = 1>
type Float<N: static uint, V: static uint = 1>

@sparrisable

I came here from the reddit discussion: http://www.reddit.com/r/rust/comments/25mdvz/is_it_time_to_integrate_vectors_like_vec4f_as/

boost.simd is mentioned in there and another possible inspiration might be sierra for c++ : http://www.cdl.uni-saarland.de/papers/lhh14.pdf

I don't know if it is realistic to include a general abstract simd implementation in the scope of this rfc but I thought it would be nice to have the paper referenced.

@brson
Contributor

brson commented Jun 5, 2014

I would love to make progress on SIMD, but this is an incredibly important subject that we don't want to risk doing wrong while we're focused on more high-priority tasks.

Right now, the most promising way forward is to get an RFC that has minimal language impact, and that lets authors experiment with SIMD libraries out-of-tree. If somebody wants to keep pushing SIMD forward please propose an RFC that does nothing but add experimental intrinsics, no new types.

Closing.

@brson brson closed this Jun 5, 2014
@brson brson added the postponed label Jun 5, 2014
@pnkfelix
Member

pnkfelix commented Jun 5, 2014

FYI, just to follow up on some discussion of swizzle syntax as proposed here, I made some macros for generating the full set of swizzle accessors in my rust-glm port, see here for example usage: https://github.com/pnkfelix/rust-glm/blob/17c4e4de2cc59b174ef644a3beeb1eca38365933/src/vector.rs#L1110

I haven't tried making mutators yet (i.e. a method for assigning to a swizzled name), but IMO I am not convinced that one really needs to use field access syntax to express this.
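
For a sense of how macro-generated swizzle accessors can work, here is a minimal sketch in present-day Rust. It is not the code from the rust-glm port linked above: `Vec3`, the `swizzle2!` macro, and the chosen accessor names are hypothetical and only cover a few two-component swizzles.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Vec3 {
    pub x: f32,
    pub y: f32,
    pub z: f32,
}

/// Generates one accessor method per listed swizzle name.
/// Each entry has the form `name => first_field, second_field;`.
macro_rules! swizzle2 {
    ($($name:ident => $a:ident, $b:ident;)*) => {
        impl Vec3 {
            $(
                pub fn $name(self) -> [f32; 2] {
                    [self.$a, self.$b]
                }
            )*
        }
    };
}

swizzle2! {
    xy => x, y;
    yx => y, x;
    zz => z, z;
}
```

Listing every combination by hand does not scale, so a fuller version would generate the names as well; this sketch only shows that plain methods plus a macro can cover the read side without compiler support.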

@rust-highfive rust-highfive mentioned this pull request Sep 24, 2014
@petrochenkov petrochenkov added the T-lang label and removed the postponed label Feb 24, 2018
wycats pushed a commit to wycats/rust-rfcs that referenced this pull request Mar 5, 2019