Minimal path to stabilization #159
cc @withoutboats (typo in issue) |
I agree 100% with everything you said. I'd like to add one thing:
I think we should prototype a "super portable SIMD library" that uses I've started already to do this and will commit it as an Just to be clear: I think a portable SIMD library does not belong in And I have one nitpick:
I think that inserting shims is a great idea! The shims are a zero-cost abstraction, that is, if you want to do this there is no more efficient way of doing it. However, if I ever want to avoid the cost of the shims, I'd like to have enough control to do that. So I would prefer if the compiler would hard error when this happens (so that I know where these cases are), and we would provide some functionality to manually tell the compiler "insert shims here". We can always make it automatic in the future without breaking backward compatibility. Anyhow, I don't want to derail this thread with the nitpick, we can discuss that somewhere else. |
@alexcrichton This plan sounds solid to me. 👍 In terms of ordering work, does anything block putting the intrinsics into libcore behind a feature flag right now? While all of these items need to be done before stabilizing anything, I think they could be done in any order.
Since we're talking about just exposing the lowest-level platform intrinsics, I don't see a lot of choices between what paths we're on. This proposal seems like stabilizing just the things there are no alternatives between. Do you have concrete examples of things you think are uncertain now and that you don't want to lock down through stabilization?
Isn't this supported by using the |
@withoutboats check out the For example, a safe SIMD library can do the runtime detection on each intrinsic call, but that's basically adding an atomic load to each simd operation in the library. Ideally, the dispatch would happen at a higher level (maybe even at the level of
By cost, I meant the cost of converting e.g. two 128-bit-wide registers into a 256-bit one and vice versa. The shims do this conversion, which is not free. This conversion can be done manually today, e.g., by manually writing the shims. The question is whether we want to induce this cost "silently" or whether we instead want to offer a way to automatically generate these shims, but such that the user has to opt into them. I'd rather start conservatively, but either way we are moving in the right direction. Once the shim generation is implemented we can revise whether we want to generate them automatically, or error and let the user decide (ideally in such a way that a library that does it automatically can be built on top of it). FWIW, this means that we can stabilize the intrinsics, the vector types, and |
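The cost being described can be made concrete. A sketch using today's `std::arch` names (the function `add_as_256` and its values are illustrative, not stdsimd API): two 128-bit halves are packed into a 256-bit register, operated on once, and split back. The pack and unpack steps are exactly the overhead a compiler-inserted shim would pay at a 128-bit/256-bit boundary.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

// Manually "widen": combine two 128-bit vectors into one 256-bit vector,
// operate once, then split the result back into two 128-bit halves.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn add_as_256(lo: __m128i, hi: __m128i, addend: __m256i) -> (__m128i, __m128i) {
    let wide = _mm256_set_m128i(hi, lo);      // 2 x 128 bits -> 256 bits (the shim cost)
    let sum = _mm256_add_epi32(wide, addend); // a single 256-bit operation
    (
        _mm256_castsi256_si128(sum),          // low 128 bits (a no-op cast)
        _mm256_extracti128_si256::<1>(sum),   // high 128 bits (one instruction)
    )
}
```

The `#[target_feature(enable = "avx2")]` attribute makes the function compile with AVX2 enabled regardless of the crate-wide target, so it must only be called after verifying AVX2 is available at runtime.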
@withoutboats correct yeah, I think we could make inroads immediately on including this library into libcore. I'd agree that we should play around with this interface some more! That's sort of what we expect to naturally happen as this becomes more broadly available on nightly (e.g. included into libcore). I'm a little skeptical like @withoutboats, however, that this will turn up a whole lot and I wouldn't want to necessarily block on a proof-of-concept library (depending on the circumstances) In my mind the shims are basically necessary for this to be a "reasonable" feature in terms of usability and expectations of how Rust code works "in the large". I'd imagine that the impact of such shims would show up very rarely in practice. |
Yes, I don't think we should block moving this into
Yes, I also expect that these shims will be inserted very rarely in practice, and when they are, what most people will want is for the shims to be generated anyway. |
@alexcrichton Thanks for this proposal! Broadly speaking, I'm OK with this. I would however like to note some costs so that we're all on the same page. In particular, if we omit everything except the type definitions themselves, then I think the following is true:
I do think this is a fine first step though! |
Yeah - I was porting StreamVByte from 32 to 64 bit values. The encode kernel interprets its vectors as all of the u8x32, u32x8, i32x8, and u64x4 types. I have to convert at least one of the vector arguments to every single intrinsic call. I don't know how common that style of SIMD workload is, but the type safety was a net negative. If the Into impls went away it'd be even worse. |
@BurntSushi excellent points! I'm just trying to thread the needle as "quickly as possible" in the sense of building up a conservative base that can be experimented with while ensuring we can continue to add features. I'd hope enabling the @sfackler an interesting case study! Did the original code rely on implicit conversions done by the C compiler? I think we're slightly more principled than |
So FWIW I wanted to reimplement simd_cast on top of stdsimd because I have the same problem as @sfackler, and without the proper as methods and From trait impls this is a pain to do, because 1) there are no intrinsics for all conversions and 2) to offer portability I need to use the LLVM simd cast anyway. Adding enough as methods and From impls should fix this, so I am hopeful that this will not be a major issue to overcome with the current approach.
|
@alexcrichton From what I can tell, the C code is able to work entirely with |
My understanding is that many of the From casts currently in the stdsimd crate are already effectively transmutes/bitcasts -- for instance, i32x8 vs i64x4. Is this correct? Does this mean that the question isn't whether people will transmute or not, but just whether they'll do it using stdsimd-provided impls? |
There are two things people want/need:
- 1) element-wise as cast: f32x4 as f64x4
- 2) same vector width transmutes: u32x4 as u64x2
We currently offer the as casts to do both, plus the from implementations. I think we should be more principled here, but this is something that we'll definitely nail down before stabilization.
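The two conversions can be sketched with plain arrays standing in for the vector types (the `f32x4`/`u32x4`/`u64x2` names are stdsimd's; these helper functions are illustrative):

```rust
// (1) Element-wise `as` cast: every lane is value-converted; the lane
// count stays the same, and the vector may change width (128 -> 256 bits).
fn cast_f32x4_to_f64x4(v: [f32; 4]) -> [f64; 4] {
    [v[0] as f64, v[1] as f64, v[2] as f64, v[3] as f64]
}

// (2) Same-width transmute: the 128 bits are reinterpreted in place; the
// lane count changes but no lane value is converted.
fn transmute_u32x4_to_u64x2(v: [u32; 4]) -> [u64; 2] {
    // Sound because the sizes match: 4 * 4 bytes == 2 * 8 bytes.
    unsafe { std::mem::transmute(v) }
}
```

The first changes values while preserving lane indices; the second preserves bits while changing the lane interpretation, which is why conflating them under one `as`/`From` surface is the concern raised above.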
|
In general I just wanted to make a proposal which had stabilization as an option in the near future. We can of course add more requirements to that, for example auditing the current API of |
@gnzlbg I'm not sure I understand, what do you mean by more principled? |
@BurntSushi @gnzlbg do y'all have thoughts specifically on @sfackler's concern above about how the "type safety" on the Intel intrinsics is making them less ergonomic? The alternative being that we switch to precisely the signatures that Intel publishes and then somehow provide constructors via a different mechanism. |
The other (much more minor) concern I'd have with type-safe Intel intrinsics is around figuring out what those types actually are - for things like shuffles we use unsigned types, where I think Clang uses signed types for example. |
My thought is that @sfackler is correct: the type safety makes the intrinsic less ergonomic to use since now you have to write My experience is, however, completely different from @sfackler's, but that is because my code never uses The alternatives I see are to either kill I'll be fine either way (we can build higher-level wrappers either way) but I prefer the ARM types (e.g.
Could you elaborate? What is it about these types that you found hard to figure out? By more principled I meant that we can use |
I think my preference is to initially try using more granular types, even for the Intel intrinsics. If more reports like @sfackler's come up before stabilization, then perhaps we can consider an alternative approach. Additionally, I am sympathetic to @gnzlbg's reasoning of keeping the units explicit. That feels like a good thing to me. |
Yeah, in most cases, the types are obvious, but you're right that in some cases it's less clear, especially signedness. Some intrinsics also operate on "128 bit vectors" without any specific regard to lane size, and we currently use |
What's the rationale for this? Why wouldn't functions that are well-behaved with all inputs be safe?
As noted in the thread on the internals forum, basic bitwise ops, basic lane-wise arithmetic and basic lane-wise comparisons are already provided by LLVM in a cross-ISA way, so pretending that they, too, are vendor-dependent makes things more difficult on every layer. Since it appears that Additionally, it would be very practical for the initial stable API to provide functions for checking that a vector whose lanes logically represent booleans (i.e. the vector is the output of basic comparison operation) has all lanes true or at least one lane true. These functions have trivial but ISA-specific implementations on SSE2 and Aarch64 but non-trivial implementations on ARMv7+NEON. The standard library should shield Rust programmers from having to implement these for ARMv7+NEON on their own. There's a problem though in the context of the current The From the point of view of wishing to use SIMD functionality and currently using the I'd much prefer the cut-off be set according to what have been solved problems for a couple of years in the So, concretely, I propose the following to be the items in the initial stabilization:
(P.S. If |
For example, we use It doesn't actually matter since the instruction doesn't depend on the signedness of the values but it's a bit inconsistent. Having to look in clang codegen tests for what the vector types should be is a bit weird.
The functions are only well-defined if they're running on hardware that implements them. There's a pretty extensive discussion of this on the internals thread: https://internals.rust-lang.org/t/getting-explicit-simd-on-stable-rust/4380. |
Any more specific reference to the internals thread? It seems to me that e.g. ARM vendor functions should be unavailable when compiling for an x86 target instead of being |
@hsivonen Compile time isn't the issue. Runtime is. Comparing ARM with x86 isn't the interesting case. The And yes, we all think it's unfortunate. |
@hsivonen to be a bit more concrete, when it comes to simd x86 is a lowest common denominator; most x86 simd instructions are not available on all x86 CPUs because they've been added in more recent iterations. For authors distributing binaries (e.g. ripgrep) it would be infeasible to ask them to distribute a different binary for every set of simd instructions a machine could have (and to ask their users to determine which simd instructions their own machine has). |
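That single-binary scenario is exactly what the runtime-detection pattern addresses: compile the kernel with the feature enabled, but only call it after checking the CPU. A sketch with the `std::arch` machinery (the `sum` functions are illustrative; a real kernel would use wide loads):

```rust
// One binary, many CPUs: the AVX2 path is compiled unconditionally but
// only *called* after a runtime check. Calling it on a CPU without AVX2
// would be undefined behavior, which is why the check is mandatory.
fn sum(xs: &[i32]) -> i32 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // Sound only because we just verified the feature at runtime.
            return unsafe { sum_avx2(xs) };
        }
    }
    xs.iter().sum() // portable fallback for every other CPU
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn sum_avx2(xs: &[i32]) -> i32 {
    // Body kept scalar for brevity; LLVM is free to auto-vectorize it
    // using AVX2 because the feature is statically enabled here.
    xs.iter().sum()
}
```

The detection result is typically cached, but as noted above the ideal is to hoist the check well above the innermost loop rather than pay it per intrinsic call.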
That is LLVM, this is clang: static __m128i _mm_shuffle_epi8(__m128i __a, __m128i __b); Note that The ARM NEON API is (IMO) a bit better and requires us to provide the strongly-typed vector types. Since we need these anyways, the cost of providing them for X86 as well is low. Whether we should do it is another matter, but if we decide to do it we are necessarily going to introduce some friction with the C API of the x86 intrinsics because it is very close to being completely untyped.
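One way to get both, sketched here as an illustration rather than a stdsimd proposal: keep the untyped Intel signature and layer a zero-cost, lane-typed newtype over it (the `U8x16` type below is hypothetical):

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

// Hypothetical lane-typed wrapper: the intrinsic keeps Intel's untyped
// __m128i -> __m128i signature, and the newtype records the byte-lane
// interpretation at zero runtime cost.
#[cfg(target_arch = "x86_64")]
#[derive(Clone, Copy)]
struct U8x16(__m128i);

#[cfg(target_arch = "x86_64")]
impl U8x16 {
    fn new(bytes: [u8; 16]) -> Self {
        // Same size and no invalid bit patterns, so the transmute is sound.
        U8x16(unsafe { std::mem::transmute(bytes) })
    }

    // SSSE3 byte shuffle; the safety contract is inherited from the intrinsic.
    #[target_feature(enable = "ssse3")]
    unsafe fn shuffle(self, idx: U8x16) -> U8x16 {
        U8x16(_mm_shuffle_epi8(self.0, idx.0))
    }

    fn to_array(self) -> [u8; 16] {
        unsafe { std::mem::transmute(self.0) }
    }
}
```

Because the wrapper is a `repr`-compatible tuple struct around `__m128i`, it compiles away entirely; the friction is confined to the constructors and accessors.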
There are many ways to implement a higher-level SIMD library. Instead of trying to push a particular controversial solution through the RFC process, we are trying to find the minimal solution that allows building zero-cost, efficient, portable, and easy to use higher-level SIMD libraries on stable Rust. |
The RFC says "(see the "On the unsafety of #[target_feature]" section).", but there's no such section, so it's still unclear to me why executing an illegal instruction needs to be UB and can't be trusted to cause program termination that's as safe as panic=abort. (If this is answered in the internals forum thread, it's really hard to find, because Discourse 1) breaks ctrl-F and 2) even when using the browser Find feature via the menu, it doesn't work, because Discourse doesn't put the whole discussion in the DOM at the same time.)
I think putting the types in the standard library is the right call, but it also means that trait implementations for things like
I think it's very unfortunate if controversy avoidance is prioritized ahead of making good use of what LLVM provides (cross-ISA basic arithmetic, bitwise ops and comparisons) or what has been shown to be good use of the Rust type system for a couple of years by the simd crate (boolean vectors as a way to signal that a vector promises to have all bits of each lane either set or unset). (In terms of type system usage, I think the design should optimize for the future of writing new cross-ISA portable Rust code over ease of porting pre-existing ISA-specific C code.) I think it's a bad outcome for Rust if solutions to solved problems are rejected. Controversy avoidance isn't a great way to arrive at good design. |
The RFC says "(see the "On the unsafety of #[target_feature]" section).", but there's no such section
The section was moved to the alternatives to streamline the RFC. It is explained in:
- Make #[target_feature] safe
- Guarantee no segfaults from unsafe code
I'll PR the RFC repo with a link. It really sucks that Markdown has no cross-referencing features for sections, examples, bibliography, ...
|
I think putting the types in the standard library is the right call, but it also means that trait implementations for things like Add and BitOr belong there and can't be provided by a separate higher-level SIMD library
Sure, those libraries will need to wrap the std library types. That's what tuple structs are for. Whether higher-level libraries will use simple wrappers, or const generics once they are available, remains to be seen. People will want to play with both.
Note that if stdsimd succeeds, the simd crate in the nursery can be moved to use stable Rust, and you will be able to use that in the stable channel.
|
This is incorrect. The risk is not that a Quoting the RFC (which I recommend you read):
|
@hsivonen which particular portable SIMD intrinsics would you like the |
OK. I covered that scenario in my previous comment and my conclusion stands: The
The For the non-shuffle parts, it would be preferable for them to be exposed as methods (trait implementations where possible) on the SIMD types (as seen in the The shuffles (and To be clear: While I'd prefer there not to be vendor functions for the operations that are covered by the non-shuffle, non-cast operations in Additionally, methods (Considering that |
But those are already exposed... or which one is not exposed?
Isn't this already the case? (modulo bugs)
Do you want these methods for other types beyond boolean vectors? |
They are exposed in the
No. Just boolean vectors. AFAICT, the reason for boolean vector types to exist is to indicate conformance to the precondition of these two methods. Trying to formulate meaning for these methods without the precondition would make them less efficient, at least in the SSE2 case. |
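On SSE2 both reductions compile down to a single movemask plus a compare, which is exactly why the all-bits-set precondition matters. A sketch (the `all`/`any` names mirror the simd crate's API; these free functions are illustrative):

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

// SSE2 boolean-vector reductions: _mm_movemask_epi8 gathers the top bit
// of each of the 16 bytes into a 16-bit integer. If every lane is all-ones
// or all-zeros (the boolean-vector precondition), that mask is exact.
#[cfg(target_arch = "x86_64")] // SSE2 is baseline on x86_64
fn all(mask: __m128i) -> bool {
    unsafe { _mm_movemask_epi8(mask) == 0xFFFF }
}

#[cfg(target_arch = "x86_64")]
fn any(mask: __m128i) -> bool {
    unsafe { _mm_movemask_epi8(mask) != 0 }
}
```

Without the precondition, a lane with only some bits set could make `movemask`-based `all`/`any` disagree with a lane-wise definition, forcing a slower per-lane check.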
@hsivonen are there any reasons, beyond convenience, for boolean vectors to be part of the |
The comparison methods on the types that should be part of |
I wrote a draft for this in RFC form. |
Ok so for an update on the x86 side of things (not the portable side of things yet) I believe we're currently at:
- Tons of intrinsics (all?) implemented.
- All intrinsics take the vendor types (__m128i and such) rather than the portable types.
- The vendor types (on x86) have a quite minimal API, basically just giving you Debug as a "nice to have".
In that sense the major missing piece is still "this all needs to be sound" with the shim insertion by rustc. Some notes I'd like to take down about the state of x86 as well and its future prospects:
- All implemented intrinsics are verified against Intel's documentation to have the same signature. There are exceptions <https://github.com/rust-lang-nursery/stdsimd/blob/df54eacccbdc1751cbf5c5b75ee9e27778003541/stdsimd-verify/tests/x86-intel.rs#L273-L295>, however, along with the interpretation of void* as *mut u8 <https://github.com/rust-lang-nursery/stdsimd/blob/df54eacccbdc1751cbf5c5b75ee9e27778003541/stdsimd-verify/tests/x86-intel.rs#L253-L254>.
- Tons of intrinsics have #[assert_instr] to prevent them from regressing over time. Unfortunately some are missing #[assert_instr] when Intel lists an instruction, and others assert a different instruction than what Intel specifies. Note that we *think* at least that all intrinsics are behaviorally the same.
- Intrinsics are currently grouped into i586/i686/x86_64 groups largely. This division is somewhat arbitrary, though. For example, x86_64 intrinsics often refer to a 64-bit integer argument, and the corresponding instruction can't work on x86 because there are no 64-bit integer registers. There are intrinsics, though, that are in i586 which reference i64 arguments. Additionally, the i586/i686 distinction has to do with the two targets, but in theory I believe this distinction shouldn't exist, because that's what stdsimd is all about: layering target features.
I'd be relatively confident about stabilizing these intrinsics in that it's (a) easy to fix bugs, (b) easy to resolve disputes by looking at Intel's docs, and (c) runs a pretty low risk of us making a mistake we can't correct (API wrong, etc.). The most worrisome part to me is the availability of intrinsics across targets. For example, why are some intrinsics using i64 available on i686 but some aren't? All in all, the most worrisome part to me is pretty low risk in terms of stabilization. |
The intrinsics being grouped into i686, i586, etc. is mostly my fault. The objective was to have the platform macros in one place, so these modules are basically just hiding the platform macros around all those intrinsics. It would be good to have some more "sound" definition of when we export an intrinsic. I've tried to export them for the lowest platform on which they work on Travis... I don't know if this is good enough.
Also, Rust's definition of i686 is basically any x86 target with SSE2 support. This is weird, in particular because many SSE2 intrinsics also work on x86. I guess we could try to do it on a best-effort basis. And if somebody complains that some intrinsic is not available for a lower platform, moving the intrinsic to a lower platform (e.g. from i686 to i586) is backwards compatible, if that could work.
So if anything I would put the intrinsics in "higher" platforms when in doubt, unless they really do pass tests on lower ones.
|
@gnzlbg oh no worries I'm not trying to place blame or anything, I'm still quite a fan of the current organization! It definitely cuts down on
So for me there's sort of two action items for my "main concern" about the platform availability. One would be to discuss the distinction between x86/x86_64. Do we want to forbid intrinsics on x86 if Intel documents them as using 64-bit registers? I'm personally leaning towards "no" here because it seems that C typically has them on both 32 and 64-bit platforms (presumably with emulation on 32?). That, coupled with the fact that Intel doesn't seem to consistently document intrinsics as needing 64-bit registers or not, makes me think that it's probably worthwhile to not draw the distinction. The next item would be to investigate allowing the Overall that would actually remove the distinction between i586/i686/x86_64, and that'd mean at least that all the Intel/AMD vendor intrinsics would be available unconditionally on all x86 platforms (naming is hard). That's nice at least from a stabilization point of view, but I'm curious what others think! |
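For reference, the compile-time half of that distinction already exists as `cfg(target_feature = ...)`, which gates items on the features the crate was compiled with (as opposed to the runtime detection discussed earlier). A minimal sketch:

```rust
// Statically gated: only one of these two definitions exists in a given
// build, depending on e.g. RUSTFLAGS="-C target-feature=+avx2".
#[cfg(target_feature = "avx2")]
fn compiled_with_avx2() -> bool {
    true
}

#[cfg(not(target_feature = "avx2"))]
fn compiled_with_avx2() -> bool {
    false
}
```

The `cfg!(target_feature = "...")` macro gives the same answer as an expression, which is what makes this usable inside libcore once the attribute is understood there.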
Maybe this would be better in another issue rather than in this mega-thread: is there already discussion about the best Rust type signatures for the Intel intrinsics? I looked in the |
Why? Not going with lane-aware types (and boolean vector types even) will make things harder in the future (yet another layer of wrappers). (See my previous comments here and my #Rust2018 post.) This is a case where taking incremental steps makes things harder overall than going directly to what the (To be clear, I want to have the intrinsics, too, but I'm really worried about them not using the lane-aware types and about having them before having the portable part of the |
@hdevalence we had a short-lived issue at #263 for moving to the Intel types, although I'm not aware of another location for ongoing discussion about the best types otherwise. @hsivonen I think #263 should have some more rationale, but does that help? |
The discussion I was looking for is in #251 (linked from #263), and after reading it I think that the decision to match the upstream definitions exactly is the right one. I don't think that it's a good idea to block stabilizing vendor intrinsics until the development of a "portable" SIMD library happens. In fact I don't think they're connected at all and that it's a mistake to try to couple them together. |
It helps me to understand the decision, yes. Thank you. I'm still unhappy about it, though, because even if lane-aware wrappers are developed eventually, having the non-lane-aware versions around makes it more likely that there's going to be vendor-specific code even for stuff that could be written portably using lane-aware types and the features the That is, I believe lane-aware types nudge towards portability for the parts that can be portable, so you end up with ISA-specific divergence for small building blocks instead of whole algorithms having totally separate code for different ISAs.
I might agree if the simd crate didn't exist, but when it exists and it's working so well for me, it's *so* frustrating that it doesn't get rubber-stamped for stable (with the vendor-specific bits that overlap with the systematic vendor intrinsic support effort removed or hidden).
Having non-portable SIMD on stable and portable SIMD on nightly puts pressure on crates to go with non-portable code in order to have SIMD on stable. It's not good for the ecosystem to steer crates to be Intel-specific when they could be supporting other ISAs, most notably the ARM ones.
The portable stuff has real uses. Today, in the Firefox context on x86_64, I measured a 97.54% execution time reduction for our "Does this UTF-16 string contain right-to-left characters?" function with Thai input from replacing C++ that didn't use explicit SIMD with portable simd crate-using Rust code. The difference between explicit SIMD and code written as ALU code is less impressive on aarch64, but it's a great thing that aarch64 support came as an essentially free by-product. (And ARMv7+NEON will get automatically fixed once any() and all() for ARMv7+NEON in the simd crate are fixed. Once we get Ion support for aarch64 in SpiderMonkey, we might actually ship the aarch64 code.) |
Sorry for a bikesheddy question (and it probably was discussed before), but why is the crate named "simd" and not "vendor_intrinsics" or something, if it's supposed to give access to intrinsics in general, not just SIMD-related ones, especially given that interfaces are not even |
@petrochenkov Because the purpose of the crate has evolved independently of the name. When I first started this, the intent was to provide both vendor intrinsics and a minimal platform independent SIMD API. Given the evolution of purpose, a name change is likely a good idea. |
Henri, there are over 1000 x86 SIMD intrinsics without counting AVX-512 (and many more if you count PowerPC, ARM, MIPS, SPARC, etc.).
An RFC that proposes to stabilize the simd crate, or something similar where the intrinsics have a different signature than the spec, will need to motivate the new signature of each intrinsic on a 1:1 basis, because in some cases at least multiple possibilities exist (the x86 spec is untyped and some intrinsics exploit this). Note that the simd crate only supports x86 and some of ARM.
I think pushing such an RFC discussing each intrinsic is an inhuman amount of work, and that my time is better spent writing a trivial RFC where we provide intrinsics that follow the spec 1:1 and that allow such a crate to be built on stable Rust without having to write an RFC for it.
It would make me very happy (and many other people as well) if we ever have a strongly typed SIMD solution available in std. The only thing preventing that from happening is somebody writing an RFC for it. Nothing more, nothing less. Nobody has volunteered to do this yet, and those who have tried (because we tried: the stdsimd crate used to be strongly typed) found it an impractical thing to do right now. Yet what is being pursued does not prevent doing this later.
…On Wed 24. Jan 2018 at 19:46, Henri Sivonen ***@***.***> wrote:
I think #263 <#263>
should have some more rationale, but does that help?
It helps me to understand the decision, yes. Thank you.
I'm still unhappy about it, though, because even if lane-aware wrappers
are developed eventually, having the non-lane-aware versions around makes
it more likely that there's going to be vendor-specific code even for stuff
that could be written portably using lane-aware types and the features the
simd crate exposes.
That is, I believe lane-aware types nudge towards portability for the
parts that can be portable, so you end up with ISA-specific divergence for
small building blocks instead of whole algorithms having totally separate
code for different ISAs.
I don't think that it's a good idea to block stabilizing vendor intrinsics
until the development of a "portable" SIMD library happens. In fact I don't
think they're connected at all and that it's a mistake to try to couple
them together.
I might agree if the simd crate didn't exist, but when it exists and it's
working so well for me, it's *so* frustrating that it doesn't get
rubber-stamped for stable (with the vendor-specific bits that overlap with
the systematic vendor intrinsic support effort removed or hidden).
Having non-portable SIMD on stable and portable SIMD on nightly puts
pressure <hsivonen/encoding_rs#23> on crates to
go with non-portable code in order to have SIMD on stable. It's not good
for the ecosystem to steer crates to be Intel-specific when they could be
supporting other ISAs, most notably the ARM ones.
The portable stuff has real uses. Today, in the Firefox context on x86_64,
I measured a 97.54% execution time reduction for our "Does this UTF-16
string contain right-to-left characters?" function with Thai input from
replacing C++ that didn't use explicit SIMD with portable simd
crate-using Rust code. The difference between explicit SIMD and code
written as ALU code is less impressive on aarch64, but it's a great thing
that aarch64 support came as an essentially free by-product. (And
ARMv7+NEON will get automatically fixed once any() and all() for
ARMv7+NEON in the simd crate are fixed
<hsivonen/simd#12>. Once we get Ion
support for aarch64 in SpiderMonkey, we might actually ship the aarch64
code.)
|
Like I said, I now understand the rationale for the approach even though I'm not happy about it. Obviously, researching the lane configuration for everything is a big task, so there's the matter of prioritization: making maximal intrinsic exposure the priority, or making it the priority to do the commonality (i.e. 128-bit vectors) really well typing-wise up front. At this point, I don't think it's productive for me to oppose the prioritization that has been chosen.
So we could have a subset (vendor-specific methods removed) of the |
There's something I'm missing in this discussion. If the vendor intrinsics are stabilized (using the vendor-specific types), can't the |
@scottlamb The reason is because the vendor intrinsics are themselves already implemented in terms of LLVM's vector types. For instance, the |
Ok cool, thanks for looking over the discussion! Thanks for the thoughts! I'd at least personally still like to have some discussion of the portable types before we stabilize anything, to see how they fit in the bigger picture and all, but for me at least all the vendor intrinsics are currently in the state that they "should" be in (vendor types, defined by Intel, automatically verified, etc.). I think I may be less certain than @gnzlbg about an RFC here (but it probably can't hurt!). Would y'all perhaps be up for discussion on an issue solely focused around "What would a first pass of stabilization look like for portable types?" The motivation you outlined, @hsivonen, with the risk of code commonly being "accidentally less portable", is pretty convincing to me in terms of considering at least a minimal-ish subset of APIs for stabilization. (That being said, I'm also still not certain how we'd want to schedule that with stabilizing the vendor intrinsics, but there's time yet still!) Although this may also be left to in-person discussion amongst stakeholders rather than an issue. I hope to do that when we're a little closer with the technical hurdles taken care of. So maybe not an issue just yet, but a discussion soon!
The |
I've submitted rust-lang/rust#47743 to solve the shim/ABI problem we've been experiencing. We're... edging ever closer to removing blockers for stabilization! |
@hsivonen As I see it, the path that has the most chances of success is:

* stabilizing the vendor intrinsics, with signatures that follow the vendor specs 1:1, and
* stabilizing `#[target_feature]` and `cfg!(target_feature)`.

Once we have these two things stable, a mid-level library like the `simd` crate can be re-implemented on top of them on stable Rust. If someone is interested in working on this, I think that writing a single monolithic RFC for it right now is not the best idea, because that RFC would be huge and mix way too many controversial and "in progress" things, so progress and discussion would be very slow. OTOH, writing an RFC / crate proposing an API for portable vector types (e.g. boolean types, reductions, conversions, etc.) is something that I think we want to stabilize soonish anyways; it can be done right now in parallel to all other efforts, and it is something that any future SIMD mid-layer can build upon. |
Why re-implementing? Would it not be OK to put the existing |
You can try to write an RFC for putting the |
Ok! I'm going to close this now in favor of rust-lang/rfcs#2325, but thanks for the discussion everyone! |
I wanted to open a sort of tracking issue for getting stdsimd into the standard library and on the path to stable Rust. That's sort of what we've wanted all along! What I propose here may seem a bit radical, but hear me out!
First and foremost I see the purpose of the `stdsimd` crate to be providing definitions of the vendor intrinsics. These functions are all `unsafe` as previously discussed and use `#[target_feature]` to implement themselves. Vendors (like Intel) define how many functions are here and what their names are. The important thing about these intrinsics is that how they're defined is compiler-dependent. This, in turn, forces them to be implemented with unstable functionality; namely, we must move this to the standard library in one form or another to be stable.

Next up, if we have vendor intrinsics we also have the types! Each vendor intrinsic has types associated with it, although we are thinking of differing from the upstream vendors with more "principled" types like `u8x16` instead of `__m128i`. As a result these types are also defined in the standard library.

After this, however, I'd propose we stop. This issue is all about the minimal path to stabilization (while still being useful), and I believe that type definitions plus intrinsics is sufficient. I say this namely because that's all C/C++ have: type definitions and intrinsics. I personally very much want to stabilize more convenient names like `f32x4::splat` or `u16x8::extract` where I don't have to remember janky names from Intel, but this is always backwards compatible to add at a later date. I think we have a lot more to gain from a little bit of SIMD now rather than "super portable SIMD right out the gate" much later. I hope soon after the first pass of stabilization to start looking at stabilizing the more convenient and powerful APIs (like `Add for u32x8`).

Once we have this then I think we also need to stabilize the `#[target_feature]` attribute and its associated features. This namely involves finishing implementing RFC 2045, a slight tweak to `#[target_feature]` syntax.

Up to this point I believe the last remaining issue is "what happens on the classical ABI mismatch issue". Namely, what does rustc do with this function:
(or something like that)
Notably here the `bar` function does not have the `avx` feature enabled, so the ABI it will invoke `foo` with differs from the ABI that `foo` is expecting, hence causing a mismatch. I would propose a different strategy from RFC 2045, talked about at Rust Belt Rust with some others, namely inserting shims. This I believe solves the function pointer problem as well: when coercing `foo` to a function pointer above you would actually get a `foo_shim` function pointer, regardless of where the function pointer is created.

When this is all assembled I believe we've got a very short path to stable SIMD on Rust today. This path should be sound so long as you have the right CPU feature checks in place, extensible in that more portable extensions can be added in the future, and complete in that we have accommodated the existence of all vendor intrinsics.
So in summary, I believe we have three blockers for stabilization:

* type definitions and vendor intrinsics living in the standard library
* the `#[target_feature]` attribute as specified in RFC 2045
* a solution (such as the shims above) to the ABI mismatch problem
src/stdsimd
submodule pointing to this repository, adding apub mod simd
tosrc/libcore/lib.rs
which points to this crate's root (probably setting some sort of#[cfg]
along the way so this crate knows what's happening), and then finally adding a bunch of stability annotations all throughout the crate.Ok so that may be a bit much, but I'm curious what others think!
cc @gnzlbg
cc @BurntSushi
cc @nikomatsakis
cc @withoutboats
cc @hdevalence