Support for Fused Multiply-Add (FMA) #102
Comments
Yes, it's simply not implemented yet, but is planned. It's easy enough to add so I'll do it when I get a chance |
Well, I'm looking at making a PR for it right now. Currently it looks like this:
/// Performs `self * b + c` as a single operation.
#[inline]
pub fn fma(self, b: Self, c: Self) -> Self {
    unsafe { crate::intrinsics::simd_fma(self, b, c) }
}
But maybe it should be something like |
we also will want something like |
Yeah, I've just tested the Presumably both options should be made available, as I understand that fused multiply add normally comes with the guarantee of machine precision for the whole operation, whereas Doesn't look like a platform intrinsic for fmuladd is exposed yet. |
I believe the function should be called mul_add to match the equivalent scalar functions in std. Exposing fast versions of intrinsics is probably a separate issue and definitely affects other functions too |
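For reference, the scalar functions mentioned here are `f32::mul_add` and `f64::mul_add` in std. A minimal sketch of what a lane-wise `mul_add` would compute, using plain arrays as a stand-in for the SIMD types; the helper name `mul_add_lanes` is hypothetical and is not part of this crate:

```rust
/// Hypothetical stand-in for a SIMD `mul_add`: computes `a[i] * b[i] + c[i]`
/// for every lane, each with a single rounding via the scalar `f32::mul_add`.
fn mul_add_lanes<const N: usize>(a: [f32; N], b: [f32; N], c: [f32; N]) -> [f32; N] {
    let mut out = [0.0f32; N];
    for i in 0..N {
        // Same argument order as std: self * b + c.
        out[i] = a[i].mul_add(b[i], c[i]);
    }
    out
}
```

Calling `mul_add_lanes([1.0, 2.0, 3.0, 4.0], [2.0; 4], [0.5; 4])` yields `[2.5, 4.5, 6.5, 8.5]`. A real implementation would presumably lower to a single vector intrinsic rather than a scalar loop.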
Sounds good, with appropriate tests! I would rather not have two functions that do the same thing, nor another f32 type (SimdFastF32? what?). Resolving this is beyond the scope of this issue, however. |
yeah people know the term "multiply add" and |
maybe:
impl<const N: usize> f32x<N> {
    pub fn mul_add<const STRICT: bool = true>(self, multiplier: Self, addend: Self) -> Self {
        if STRICT {
            // llvm.fma.*
        } else {
            // llvm.fmuladd.*
        }
    }
} |
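A compilable scalar sketch of this idea, under a few assumptions: it works on `f32` rather than a SIMD type, the free function `mul_add` is hypothetical, and the non-strict branch only approximates "compiler's choice" by checking the compile-time `fma` target feature (an x86 feature name), since `llvm.fmuladd.*` is not exposed on stable Rust. Note that Rust does not currently accept default values for const parameters on functions, so the `= true` default cannot be written as shown in the proposal.

```rust
/// Hypothetical scalar sketch of the `STRICT` switch proposed above.
fn mul_add<const STRICT: bool>(a: f32, b: f32, c: f32) -> f32 {
    if STRICT || cfg!(target_feature = "fma") {
        // Fused: one rounding for the whole operation (llvm.fma.* semantics).
        a.mul_add(b, c)
    } else {
        // Separate multiply and add: two roundings.
        a * b + c
    }
}
```

Because `STRICT` does not appear in the argument types, it cannot be inferred, and every call site has to spell it out, for example `mul_add::<true>(x, y, z)` or `mul_add::<false>(x, y, z)`.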
Since we're not tied to being an exact hardware impl, I think we should just define it from the start to always let llvm pick what's best. If people absolutely want strict ops they can transmute the value to a platform version and use stdarch. Particularly, I think that const-generics in methods and functions that aren't simply passed in from the type are very bad ergonomically at this time, and should be avoided if possible. |
IEEE 754-2008 specifies FMA to have a single rounding step. |
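A small scalar illustration of why the single rounding step matters, using std's `f32::mul_add`. The helper name `rounding_demo` and the input values are chosen for illustration only; the inputs are picked so that the exact product falls halfway between two `f32` values.

```rust
/// Returns `(unfused, fused)` for `x * x + c`, with inputs chosen so that
/// rounding the intermediate product changes the final result.
fn rounding_demo() -> (f32, f32) {
    let x: f32 = 1.0 + 1.0 / 4096.0; // 1 + 2^-12
    let c: f32 = -(1.0 + 1.0 / 2048.0); // -(1 + 2^-11)

    // Exact product: 1 + 2^-11 + 2^-24. Rounded to f32 (ties to even) it
    // becomes 1 + 2^-11, so adding `c` afterwards gives exactly 0.
    let unfused = x * x + c;

    // Fused: the exact product is added to `c` before the only rounding,
    // leaving 2^-24, which is representable in f32.
    let fused = x.mul_add(x, c);

    (unfused, fused)
}
```

Here `rounding_demo()` returns `(0.0, 5.9604645e-8)`, where the second value is 2^-24. This relies on Rust not contracting `x * x + c` into an FMA on its own, which matches the observation in the original report.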
I can't tell if that means that you're for or against having strict fma |
I'm for having strict fma by default, alongside another function with semantics of: use separate mul & add (correctly rounded) or use fma (correctly rounded) at compiler's choice -- no other options (in particular the |
@dylanede happy to test your PR out if you've got something that hangs together? I am using |
ping @dylanede to check if you're still interested in followup or have any questions ❤️ |
@dylanede if you have something halfway but don't have time to finish it feel free to pop the branch on your fork and I can have a crack at finishing it. |
From my limited testing, multiplying and then adding SimdF32/SimdF64 does not result in FMA instructions, and rightly so. However, it would be useful to have a method on these types and a corresponding platform intrinsic to generate calls to the LLVM llvm.fma.* intrinsics, which support vector types.
Edit: In fact it looks like the simd_fma platform intrinsic is already available, just not used by the crate?