Impl special functions for SIMD #14

Open
27 of 42 tasks
programmerjake opened this issue Sep 30, 2020 · 23 comments
Labels
A-floating-point Area: Floating point numbers and arithmetic C-feature-request Category: a feature request, i.e. not implemented / a PR

Comments

@programmerjake
Member

programmerjake commented Sep 30, 2020

Need all of:

  • div_euclid/rem_euclid
  • clamp
  • max/min
  • rotate_left/rotate_right
  • swap_bytes/reverse_bits
  • saturating_add/saturating_sub
  • saturating_neg/saturating_abs
  • saturating_mul
  • wrapping_add/wrapping_sub/wrapping_mul/wrapping_pow
  • wrapping_div/wrapping_rem/wrapping_div_euclid/wrapping_rem_euclid
  • wrapping_neg/wrapping_abs
  • overflowing_add/overflowing_sub
  • overflowing_mul
  • overflowing_div/overflowing_div_euclid
  • overflowing_rem/overflowing_rem_euclid
  • overflowing_neg/overflowing_abs
  • overflowing_shl/overflowing_shr
  • from_be/from_le/to_be/to_le
  • to_be_bytes/to_le_bytes/from_be_bytes/from_le_bytes
  • {to,from}_ne_bytes
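Since div_euclid/rem_euclid head the list and are easy to confuse with plain `/` and `%`, here is a scalar sketch of the std behavior the SIMD versions would presumably mirror element-wise:

```rust
// How Euclidean division differs from truncating division for
// negative operands (scalar std behavior; sketch only).
fn main() {
    assert_eq!(-7i32 / 2, -3);             // `/` truncates toward zero
    assert_eq!(-7i32 % 2, -1);             // `%` keeps the dividend's sign
    assert_eq!((-7i32).div_euclid(2), -4); // floors so the remainder is >= 0
    assert_eq!((-7i32).rem_euclid(2), 1);
    println!("ok");
}
```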

for integers:

  • leading_zeros/trailing_zeros
  • leading_ones/trailing_ones
  • count_ones/count_zeros
  • pow
  • overflowing_pow
  • saturating_pow
  • wrapping_shl/wrapping_shr
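For the bit-counting items, a quick scalar sanity check of the semantics the SIMD versions would apply per element (sketch only):

```rust
// Scalar semantics of leading/trailing zeros and popcount.
fn main() {
    let x = 0b0001_0110u8;
    assert_eq!(x.leading_zeros(), 3);
    assert_eq!(x.trailing_zeros(), 1);
    assert_eq!(x.count_ones(), 3);
    assert_eq!(x.count_zeros(), 5);
    println!("ok");
}
```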

for floats:

  • trig./hyperbolic functions: impl trig for core::simd #6
  • recip
  • mul_add
  • powi/powf
  • to_int_unchecked
  • to_degrees/to_radians
  • sqrt
  • cbrt
  • hypot
  • exp/exp2/ln/log/log2/log10
  • exp_m1/ln_1p
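exp_m1/ln_1p are on the list because the naive forms exp(x) - 1 and ln(1 + x) lose precision for tiny x; a scalar sketch of the semantics the SIMD versions would mirror element-wise:

```rust
// For tiny x, exp_m1(x) and ln_1p(x) stay accurate where the naive
// forms would cancel catastrophically (scalar sketch).
fn main() {
    let x = 1e-10f64;
    assert!((x.exp_m1() - x).abs() < 1e-18); // ≈ x + x²/2, fully precise
    assert!((x.ln_1p() - x).abs() < 1e-18);  // ≈ x - x²/2, fully precise
    println!("ok");
}
```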

for signed integers and floats:

  • abs
  • signum
  • copysign
  • is_positive/is_negative

See also #109
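Scalar behavior of a few items from the signed-integer/float list, which the SIMD versions would apply per element (sketch only):

```rust
// abs/signum/is_negative on signed integers, copysign on floats.
fn main() {
    assert_eq!((-3i32).abs(), 3);
    assert_eq!((-3i32).signum(), -1);
    assert!((-3i32).is_negative());
    assert_eq!(3.5f32.copysign(-1.0), -3.5);
    println!("ok");
}
```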

@Lokathor
Contributor

Lokathor commented Sep 30, 2020

I don't believe we have non-overflowing/non-wrapping ops actually.

That is, we only have the wrapping version.

@programmerjake
Member Author

Having ops that panic on overflow (like Rust's standard integer ops in debug mode) seems like something that would be useful for debugging, even if it has a runtime penalty. It could be disabled by Release mode, like usual.
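The per-element overflow information such a debug check would branch on already exists on the scalar types; a sketch using `overflowing_add` (the SIMD analogue would produce a mask instead of a single bool):

```rust
// Scalar overflowing_add reports whether the result wrapped, which is
// the kind of flag a debug-mode overflow check could panic on.
fn main() {
    assert_eq!(100u8.overflowing_add(100), (200, false));
    assert_eq!(200u8.overflowing_add(100), (44, true)); // wrapped
    println!("ok");
}
```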

@workingjubilee workingjubilee added A-floating-point Area: Floating point numbers and arithmetic C-feature-request Category: a feature request, i.e. not implemented / a PR labels Sep 30, 2020
@calebzulawski
Member

I would like to add as_slice and as_array functions to this list.

I think we need to be careful with rotate_left and rotate_right: it's unfortunate that std uses the same name for rotating slice elements and rotating bits (both of these cases apply to SIMD vectors)

@programmerjake
Member Author

How about naming them rotate_lanes_left/right and rotate_bits_left/right?
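The distinction motivating the two names can already be seen on scalars and arrays today (a sketch; rotate_lanes_* / rotate_bits_* are the proposed names, not an existing API):

```rust
// Two different "rotate" operations that share a name in std.
fn main() {
    // Bitwise rotation, applied per element (u32::rotate_left):
    let v = [0x8000_0001u32, 0x0000_00FFu32];
    let bits: Vec<u32> = v.iter().map(|x| x.rotate_left(1)).collect();
    assert_eq!(bits, vec![0x0000_0003, 0x0000_01FE]);

    // Lane rotation, moving whole elements (slice::rotate_left):
    let mut lanes = [1u32, 2, 3, 4];
    lanes.rotate_left(1);
    assert_eq!(lanes, [2, 3, 4, 1]);
    println!("ok");
}
```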

@Lokathor
Contributor

Lokathor commented Oct 5, 2020

Update: removing floor/ceil/round/trunc/fract from the list, opened #23 instead.

@thomcc
Member

thomcc commented Oct 6, 2020

for floats:

* [ ]  trig./hyperbolic functions: #6

* ...

* [ ]  cbrt

* ...

* [ ]  exp/exp2/ln/log/log2/log10

* [ ]  exp_m1/ln_1p

So... Is there a reason that these are considered required rather than nice to have? Are there architectures that offer this?

I'm not opposed to it (I was working on an SSE cbrt yesterday, so I agree these aren't useless), but it's also a lot of work, and users quite reasonably might want to make different performance/accuracy tradeoffs here. Also, properly supporting rounding modes in these is a whole can of worms, but hopefully we'll just continue with the good ol' Rust standby of pretending the rounding mode can never change.

Anyway, if we're going for ieee754 recommended operations, there are some missing from the recommended set as of 754-2019. I've attached a screenshot of the relevant table.

[Screenshot of IEEE 754-2019 Table 9.1, "Additional mathematical operations" (the additional recommended operations), plus its continuation.]

Note: rSqrt there is the accurately-rounded version of inverse sqrt. Specifically, it is not equivalent to _mmN_rsqrt_ps (it is equivalent to the _mmN_invsqrt_ps you can get in some places), which is approximated. But we should still expose an approximate rsqrt, since e.g. Intel supports it and inverse sqrt is a very common operation in some areas.
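The usual way to bridge the gap between an approximate hardware rsqrt and an accurate one is a Newton-Raphson refinement step, y' = y * (1.5 - 0.5 * x * y * y). A scalar sketch (the deliberately rough starting guess stands in for a hardware estimate like the ~12-bit result of _mm_rsqrt_ps):

```rust
// One Newton-Raphson step refining an approximate 1/sqrt(x).
fn refine_rsqrt(x: f32, approx: f32) -> f32 {
    approx * (1.5 - 0.5 * x * approx * approx)
}

fn main() {
    let x = 2.0f32;
    let rough = 0.7f32; // pretend hardware estimate of 1/sqrt(2) ≈ 0.7071
    let better = refine_rsqrt(x, rough);
    let exact = 1.0 / x.sqrt();
    // One step shrinks the error from ~7e-3 to ~1e-4:
    assert!((better - exact).abs() < (rough - exact).abs());
    assert!((better - exact).abs() < 1e-3);
    println!("ok");
}
```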

@programmerjake
Member Author

for floats:

* [ ]  trig./hyperbolic functions: #6

* ...

* [ ]  cbrt

* ...

* [ ]  exp/exp2/ln/log/log2/log10

* [ ]  exp_m1/ln_1p

So... Is there a reason that these are considered required rather than nice to have? Are there architectures that offer this?

I just went down the list of functions on f32.

Libre-SOC may potentially provide vector instructions for all of the functions you mentioned, we are almost certainly providing instructions for exp, exp2, ln, log2. IIRC AMDGPU provides some exponential and logarithm functions.

@thomcc
Member

thomcc commented Oct 6, 2020

Hm, okay. Some concerns I'd have, mostly since you mentioned GPUs (which tend to answer these questions by picking whatever is fastest, and honestly somewhat fairly: a lot of these are super expensive to handle correctly in SIMD code):

  1. Are non-finite inputs handled properly? If not, how improper?

    • -ffast-math-style UB?
    • consistent-but-garbage results?
    • consistent-but-fixable results? (e.g. wrong sign when returning nan or whatever)
  2. Ditto, but for other out-of-domain inputs, like negative inputs to sqrt.

  3. Are denormals (other than zero) handled properly?

    • Here proper just means "correct result".
    • I'm only excluding zero because it's unfathomable that it would be broken on 0.0 (assuming 0.0 is part of the function's domain).
  4. Is the current rounding mode respected?

    • If applicable, are other relevant aspects of the fp env respected?
    • Note: This is probably not relevant on GPUs, but it is for us (I think? *).
  5. Does the function produce a precise (max error within 1ulp) result, or is it approximated?

And if not, what do we do?

Also relevant to our fallback: I don't think I've ever seen SIMD implementations of this stuff that actually get all of these right. The vectorclass code linked elsewhere appears not to handle all of this (though I didn't look too closely, and perhaps it's structured so it's handled automatically), and IIRC sleef didn't used to, but maybe it does now.

And to be clear, I'm not saying our fallback implementation has to handle these issues (although certainly we would in an ideal world), but if it doesn't that should be intentional.

Also, I guess the fallback could just be extracting each lane and calling libm on it (although this would either require rust libm, which is pretty slow, or force this stuff into libstd).

* Regarding 4, I vaguely remember hearing it was UB in rust to change the float env? Possibly because LLVM can't fully handle it, or constant propagation, or who knows. Perhaps we don't really need to handle this if that's the case. I also don't know if this is actually true.

@Lokathor
Contributor

Lokathor commented Oct 6, 2020

Yeah, LLVM currently ignores the floating-point environment during optimization, so if we do anything other than the same, we get code whose behavior changes with optimization level, which is classic UB.

They're developing alternative LLVM IR that would let you respect the fp environment, but it's not ready yet (last I heard, around the start of the year).

@thomcc
Member

thomcc commented Oct 6, 2020

Personally, IME changing fpenv is a huge headache and you're better off structuring your code so that it's not needed, even if that means you have to do some computations negated or whatever.

Of all of these, this is the one I'm least willing to go to bat for as something we should support at all (in truth, I'd be happy for someone to tell me it's totally unsupported and code can assume the default rounding mode). That would certainly make the impl of these functions simpler and easier to test.

That said IDK, the Rust libm seems to handle it... I assume we need to also. (And I mean, it might be a part of floating point I don't like, but it is a part of it)

... Also, I just realized I forgot to mention fp status registers and triggering the right fp exceptions, if relevant. Anyway, just assume that list of concerns is #[non_exhaustive]

@Lokathor
Contributor

Lokathor commented Oct 6, 2020

Oh, libm is just wrong in that area. Most of our libm code is just blindly copied from C. The thing is that libm gets too little attention for anyone to care, so oh well.

@programmerjake
Member Author

AMDGPU supports infinities, NaNs (though I don't know which values it produces), signed zeros, and different rounding modes. It has 1 ULP accuracy for exp2 and log2. Other exp/log instructions are implemented in terms of those.

Libre-SOC will have at least 2 modes, one which is only as accurate as Vulkan requires (though if we can provide more than that without much more hardware, we probably will), and one which is supposed to provide the correctly-rounded results specified by IEEE 754 for all supported rounding modes. The second mode may just trap to a software implementation for some of the more complex instructions though, so could be very slow. We haven't decided yet.
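The "implemented in terms of exp2/log2" reduction above rests on standard identities, exp(x) = exp2(x · log2(e)) and ln(x) = log2(x) / log2(e); a scalar sketch checking them (illustration only, not the AMDGPU codegen):

```rust
// Deriving exp/ln from exp2/log2 via change of base.
fn main() {
    let x = 1.7f64;
    let log2_e = std::f64::consts::LOG2_E;
    assert!((x.exp() - (x * log2_e).exp2()).abs() < 1e-12);
    assert!((x.ln() - x.log2() / log2_e).abs() < 1e-12);
    println!("ok");
}
```

(The same change-of-base trick covers log10 via log2(x) / log2(10).)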

@workingjubilee
Member

I think it makes sense right now to say that exposing special float ops on SIMD types should be a relatively strong statement of "you probably can't beat this speed/accuracy tradeoff", and that implementing the rest (and weighing different speed/accuracy tradeoffs) can be its own ongoing/extended discussion.

So if all the relevant vector processors reasonably consistently provide fast and accurate exp/log functions, then we want to expose those right away, and start to set aside other things we know will require more thought.

@workingjubilee
Member

I was not able to find integral pow functions on Intel or Arm intrinsic lists, and so have struck them from the lists.
There are hardware-accelerated floating-point operations for this, of course.

@Lokathor
Contributor

Lokathor commented May 1, 2021

I think we should have Pow on the extended list, wherever that is, even if it is always "library provided" and never actually hardware.

@workingjubilee
Member

workingjubilee commented May 2, 2021

It would be useful to carve things up between what we can expect to have efficient/fast hardware acceleration for and what is reasonable but software-only, yes, for the sake of prioritization.

workingjubilee added a commit that referenced this issue Jun 23, 2021
Add various fns
- Sum/Product traits
- recip/to_degrees/to_radians/min/max/clamp/signum/copysign; #14
- mul_add: #14, fixes #102
@TennyZhuang

Why were the wrap_* ops removed here? In my opinion, wrap_* should always do an overflow check and return an Option<Simd<_>>, which is different from the behavior of the primitive ops (no check in release; check and possibly panic in debug).

@workingjubilee
Member

Simd<T, N> is implicitly Simd<Wrapping<T>, N>. What you describe is the behavior of the checked_* ops.
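The distinction being drawn can be shown with the scalar counterparts on stable Rust (a sketch; `Wrapping<T>` stands in for the always-wrapping Simd element ops):

```rust
// Wrapping<T> always wraps and never panics; checked_* is the
// Option-returning variant the previous comment was describing.
use std::num::Wrapping;

fn main() {
    let a = Wrapping(u8::MAX);
    assert_eq!((a + Wrapping(1)).0, 0); // wraps silently

    assert_eq!(u8::MAX.checked_add(1), None);      // overflow -> None
    assert_eq!(250u8.checked_add(5), Some(255));   // in range -> Some
    println!("ok");
}
```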

@ghost

ghost commented Jul 14, 2022

(I didn't find a tracking issue for checked_*, which is where I would have commented; should you open one?)

It is quite reasonably expected that checked_* operations would be slower than the wrapping equivalents, but I'm not sure what implementations you all have in mind for most checked_* operations?

E.g. for addition, checked_add(x, y) should only cost an estimated ~3-5 extra operations, given that the overflow check is essentially if SIMD::saturating_add(x, y) == x + y?
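The trick being proposed, sketched on scalars (an add overflows exactly when the saturating result differs from the wrapping one; a SIMD version would compare whole vectors and reduce the mismatch mask):

```rust
// checked_add built from saturating_add + wrapping_add, as the
// comment suggests (scalar sketch, not the std implementation).
fn checked_add_via_saturating(x: i32, y: i32) -> Option<i32> {
    let wrapped = x.wrapping_add(y);
    if x.saturating_add(y) == wrapped {
        Some(wrapped)
    } else {
        None
    }
}

fn main() {
    assert_eq!(checked_add_via_saturating(1, 2), Some(3));
    assert_eq!(checked_add_via_saturating(i32::MAX, 1), None);
    assert_eq!(checked_add_via_saturating(i32::MIN, -1), None);
    println!("ok");
}
```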

@programmerjake
Member Author

bitwise rotate left/right came up in #328 (comment) (actually most of that issue was discussing rotations rather than chacha20)

@avhz

avhz commented Feb 25, 2024

Any updates on this issue? I would offer to help, but I suspect it's above my skill level.
But I'm particularly interested in using special functions for SIMD floats (exp, log, etc).

@calebzulawski
Member

No updates. The place to start will be adding more intrinsics to the compiler and then using them in the StdFloat trait.

@avhz

avhz commented Feb 26, 2024

Can't promise anything, but I'll take a look at what's currently done, and if it seems achievable I'll have a go.
