Impl special functions for SIMD #14
I don't believe we have non-overflowing/non-wrapping ops actually. That is, we only have the wrapping version.
Having ops that panic on overflow (like Rust's standard integer ops in debug mode) seems like something that would be useful for debugging, even if it has a runtime penalty. It could be disabled in release mode, like usual.
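For illustration, a minimal sketch of what that debug-only check could look like on top of the current nightly std::simd API (the name add_or_panic and the fixed u32x4 type are made up for this example; nothing like this exists in the API yet):

```rust
#![feature(portable_simd)] // nightly-only portable SIMD
use std::simd::prelude::*;

/// Hypothetical op with the semantics discussed above: wraps in release
/// builds, panics in debug builds if any lane overflows (mirroring scalar `+`).
fn add_or_panic(a: Simd<u32, 4>, b: Simd<u32, 4>) -> Simd<u32, 4> {
    // Portable-SIMD integer `+` always wraps.
    let sum = a + b;
    // For unsigned lanes, a wrapped lane is exactly one where sum < a.
    debug_assert!(
        !sum.simd_lt(a).any(),
        "SIMD add overflowed in at least one lane"
    );
    sum
}
```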
I would like to add I think we need to be careful with
how about naming them
Update: removing
So... is there a reason that these are considered required rather than nice to have? Are there architectures that offer this? I'm not opposed to it (I was working on an SSE cbrt yesterday, so I agree these aren't useless), but there's also a lot of work, and users quite reasonably might want to make different performance/accuracy tradeoffs here. Also, god, properly supporting rounding modes in these is a whole damn can of worms, but hopefully we'll just continue with the good ol' Rust standby of pretending the rounding mode can never change. Anyway, if we're going for the IEEE 754 recommended operations, there are some missing from the recommended set as of 754-2019. I've attached a screenshot of the relevant table. Note:
I just went down the list of functions on Libre-SOC may potentially provide vector instructions for all of the functions you mentioned; we are almost certainly providing instructions for
Hm, okay. Some concerns I'd have, mostly since you mentioned GPUs (which tend to answer these questions by picking whatever is fastest — and honestly somewhat fairly, a lot of these are super expensive to handle correctly in SIMD code):
And if not, what do we do? Also relevant to our fallback: I don't think I've ever seen SIMD implementations of this stuff for which all of this actually holds. The vectorclass code linked elsewhere appears not to handle all of it (but I didn't look too closely, and perhaps it's doing it by structuring the code so it's handled automatically), and IIRC libm doesn't get all of it right either. And to be clear, I'm not saying our fallback implementation has to handle these issues (although certainly we would in an ideal world), but if it doesn't, that should be intentional. Also, I guess the fallback could just be extracting each lane and calling libm on it (although this would either require Rust's libm, which is pretty slow, or force this stuff into libstd).

* Regarding 4, I vaguely remember hearing it was UB in Rust to change the float env? Possibly because LLVM can't fully handle it, or constant propagation, or who knows. Perhaps we don't really need to handle this if that's the case. I also don't know if this is actually true.
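For reference, the "extract each lane and call libm on it" fallback mentioned above is roughly the following. This is a sketch against the current nightly std::simd API; simd_exp_fallback is a made-up name, and a real fallback would care a lot more about performance:

```rust
#![feature(portable_simd)]
use std::simd::Simd;

/// Naive scalar fallback: pull each lane out, call the scalar function
/// (which bottoms out in the platform's libm), and rebuild the vector.
fn simd_exp_fallback(v: Simd<f32, 4>) -> Simd<f32, 4> {
    let mut lanes = v.to_array();
    for lane in &mut lanes {
        *lane = lane.exp();
    }
    Simd::from_array(lanes)
}
```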
Yeah, LLVM currently ignores the floating-point environment during optimization, so if we do anything other than the same thing, we get code that changes based on optimization level, which is classic UB. They're developing alternative LLVM IR that would let you follow the FP environment, but it's not ready yet (last I heard, around the start of the year).
Personally, IME changing the FP environment is a huge headache and you're better off structuring your code so that it's not needed, even if that means you have to do some computations negated or whatever. Of all of these, this is the one I'm least willing to go to bat for as something we should support at all (in truth, I'd be happy for someone to tell me it's totally unsupported and code can assume the default rounding mode; that certainly makes the implementation of these functions simpler and easier to test). That said, IDK, the Rust ... Also, I just realized I forgot to mention FP status registers and triggering the right FP exceptions, if relevant. Anyway, just assume that list of concerns is
Oh, libm is just wrong in that area. Most of our libm code is just blindly copied from C. The thing is that
AMDGPU supports infinities, NaNs (though I don't know which values it produces), signed zeros, and different rounding modes. It has 1 ULP accuracy for exp2 and log2; the other exp/log instructions are implemented in terms of those. Libre-SOC will have at least two modes: one that is only as accurate as Vulkan requires (though if we can provide more than that without much more hardware, we probably will), and one that is supposed to provide the correctly-rounded results specified by IEEE 754 for all supported rounding modes. The second mode may just trap to a software implementation for some of the more complex instructions, though, so it could be very slow. We haven't decided yet.
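For context on "implemented in terms of those": deriving exp and ln from exp2/log2 is just a constant multiply, as in the scalar sketch below (this illustrates the identity, not any particular hardware's exact behavior):

```rust
use std::f32::consts::{LN_2, LOG2_E};

/// exp(x) = 2^(x * log2(e)); ln(x) = log2(x) * ln(2).
/// Accuracy is bounded by the underlying exp2/log2 plus the extra rounding
/// from the multiply, which is why the hardware exp2/log2 accuracy matters.
fn exp_via_exp2(x: f32) -> f32 {
    (x * LOG2_E).exp2()
}

fn ln_via_log2(x: f32) -> f32 {
    x.log2() * LN_2
}
```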
I think it makes sense right now to say that exposing special float ops on SIMD types should come with a relatively strong statement of "you probably can't beat this speed/accuracy tradeoff", and then implementing the rest (and weighing different speed/accuracy tradeoffs) can be its own ongoing/extended discussion. So if all the relevant vector processors reasonably consistently provide fast and accurate exp/log functions, then we want to expose those right away, and set aside the other things we know will require more thought.
I was not able to find integral
I think we should have Pow on the extended list, wherever that is, even if it is always "library provided" and never actually hardware.
Yes, for the sake of prioritization it would be useful to carve things up between what we can expect efficient/fast hardware acceleration for and what is reasonable but software-only.
Why were wrap_* removed here? In my opinion, wrap_* should always do an overflow check and return an Option<Simd<_>>, which is different from the behavior of the primitive ops (no check in release; check, and possibly panic, in debug).
|
(I didn't find a tracking issue for these.) It is quite reasonably expected that such checked ops are cheap. E.g. for addition, checked_add(x, y) should only perform an estimated ~3-5 extra operations, given that the branch statement is
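A sketch of the Option-returning shape being discussed, against the current nightly std::simd API (checked_add here is a free function with a made-up name, not an existing method). The cost is roughly one wrapping add, one lane-wise compare, one mask reduction, and one branch, which lines up with the "few extra operations" estimate above:

```rust
#![feature(portable_simd)]
use std::simd::prelude::*;

/// Hypothetical checked SIMD add: returns None if any lane would overflow.
fn checked_add(a: Simd<u32, 4>, b: Simd<u32, 4>) -> Option<Simd<u32, 4>> {
    let sum = a + b;                 // wrapping lane-wise add
    let overflowed = sum.simd_lt(a); // unsigned overflow <=> wrapped sum < a
    if overflowed.any() { None } else { Some(sum) }
}
```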
Bitwise rotate left/right came up in #328 (comment) (actually most of that issue was discussing rotations rather than chacha20).
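There is no rotate on Simd yet, but the shape it would take is clear from the existing shift and bitwise ops. A sketch under those assumptions (the function name is hypothetical, and it deliberately only handles rotation amounts strictly between 0 and the lane width, which covers the fixed rotations ChaCha-style code uses):

```rust
#![feature(portable_simd)]
use std::simd::Simd;

/// Lane-wise rotate-left built from the shifts `core::simd` already exposes.
/// Only valid for 0 < amount < 32; a real `rotate_left` would handle the
/// edge cases (and ideally lower to hardware rotate instructions).
fn rotate_left_u32x4(x: Simd<u32, 4>, amount: u32) -> Simd<u32, 4> {
    debug_assert!(amount > 0 && amount < 32);
    (x << Simd::splat(amount)) | (x >> Simd::splat(32 - amount))
}
```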
Any updates on this issue? I would offer to help, but I suspect it's above my skill level.
No updates--the place to start will be adding more intrinsics to the compiler and then using them in the
Can't promise anything, but I'll take a look at what's currently done, and if it seems achievable I'll have a go.
Need all of:

- wrapping_add/wrapping_sub/wrapping_mul/wrapping_pow
- wrapping_div/wrapping_rem/wrapping_div_euclid/wrapping_rem_euclid
- wrapping_neg/wrapping_abs

for integers:

- pow
- overflowing_pow
- saturating_pow
- wrapping_shl/wrapping_shr

for floats:

- trig

for core::simd:

- #6

for signed integers and floats:

See also #109