Add `fmae` which uses `fmuladd` #116

GabrielMajeri · 2018-09-01T11:28:22Z

Fixes #112.

Performance-wise, my updated stencil benchmark runs at the same speed with avx2 enabled. With avx2 disabled, the avx fallback is now much faster (before this pull request, I saw roughly a 10x slowdown on AVX 1).

So there are no downsides to this change, only performance improvements.

Also added some documentation for crate users.

I didn't rename the extern "C" fns, only changed the link_name attribute.

gnzlbg · 2018-09-03T09:50:16Z

> So there are no downsides to this change,

That isn't exactly accurate. IIUC, the reason performance is worse without this change is that LLVM has to emulate the infinite precision of the fma instruction on platforms that do not have it, which is expensive.

First, thank you for doing this: I mainly test on an AVX1 machine and this is exactly what I was seeing. Now we know that fmuladd is an intrinsic worth adding, since users cannot easily implement it on their own and it has a very big impact on performance on some targets. I think we should do the following:

- keep `::fma` as is, guaranteeing precision, and add some tests (e.g. to the ./test directory) that check this (I don't know for which triple of values lower precision is observable, but maybe we can find something).
- add an fma-estimate method, e.g. `::fmae`, that uses fmuladd to produce an "estimate" of the infinite-precision fma. On platforms that support fma, it is exact; on platforms that do not, it just computes `a * b + c` instead of trying to emulate the infinite precision.
- finally, update the aobench and other benchmarks to use `::fmae`.
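To make the distinction in the list above concrete, here is a minimal scalar sketch. It is not the crate's vector code: `fused`, `unfused`, `fmae_sketch`, and `witness` are hypothetical names, `f32::mul_add` stands in for `::fma`, and the `cfg!` check only approximates the choice that `fmuladd` leaves to LLVM. The triple returned by `witness` is one where the single rounding should be observable, assuming the target evaluates `f32` arithmetic in `f32` precision and does not contract `a * b + c` on its own.

```rust
/// Fused multiply-add: the product is not rounded before the addition,
/// so the whole expression `a * b + c` is rounded only once.
pub fn fused(a: f32, b: f32, c: f32) -> f32 {
    a.mul_add(b, c)
}

/// Plain multiply followed by add: the product is rounded first,
/// then the sum is rounded again.
pub fn unfused(a: f32, b: f32, c: f32) -> f32 {
    a * b + c
}

/// Hypothetical scalar stand-in for the proposed "estimate": fused when
/// the compilation target has FMA hardware, a plain `a * b + c` otherwise.
/// (Assumption: this only mimics what `fmuladd` allows the backend to do.)
pub fn fmae_sketch(a: f32, b: f32, c: f32) -> f32 {
    if cfg!(target_feature = "fma") {
        fused(a, b, c)
    } else {
        unfused(a, b, c)
    }
}

/// A triple for which the intermediate rounding loses information.
/// With a = b = 1 + 2^-12, the exact product is 1 + 2^-11 + 2^-24.
/// The 2^-24 term is exactly half an ulp, so rounding the product to f32
/// (ties to even) drops it; c = -(1 + 2^-11) then cancels what is left.
pub fn witness() -> (f32, f32, f32) {
    let a = 1.0_f32 + 1.0 / 4096.0; // 1 + 2^-12, exactly representable
    let c = -(1.0_f32 + 1.0 / 2048.0); // -(1 + 2^-11), exactly representable
    (a, a, c)
}
```

With this triple, `unfused` yields `0.0` while `fused` yields `2^-24` (that is, `f32::EPSILON / 2.0`), so an assertion on the fused result is one possible shape for the precision test suggested for `::fma`.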

GabrielMajeri · 2018-09-03T10:19:31Z

@gnzlbg I've changed this PR as you said and implemented the fmae function. I haven't added the tests for the existing fma because I don't know which numbers to use when testing either.

gnzlbg · 2018-09-03T11:06:28Z

src/api/math/float/fmae.rs

-            /// fused multiply subtract (`self * y - z`).
-            /// Simply negating the second parameter of this function
-            /// will make the compiler generate it.
+            /// While fused-multiply add ([fma]) has infinite precision,


I think you need to wrap fma in backticks (``) for the doc link to work properly.

gnzlbg · 2018-09-03T11:44:53Z

> I haven't added the tests for the existing fma because I don't know which numbers to use when testing either.

I've been fuzzing fma and a * b + c for a while and I haven't managed to get them to produce different results on my machine :/
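For reference, a scalar version of such a search might look like the sketch below. The thread does not say how the original fuzzing was set up, so everything here is an assumption: `XorShift32` and `count_mismatches` are hypothetical names, `f32::mul_add` stands in for `fma`, and the inputs are restricted to magnitudes in [1, 2) with a negative `c` so that cancellation exposes the product's rounding error.

```rust
/// Tiny xorshift32 generator so the sketch needs no external crates.
struct XorShift32(u32);

impl XorShift32 {
    fn next_u32(&mut self) -> u32 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
        self.0 = x;
        x
    }

    /// An f32 in [1, 2): fixed exponent bits, pseudo-random 23-bit mantissa.
    fn next_f32(&mut self) -> f32 {
        f32::from_bits(0x3f80_0000 | (self.next_u32() >> 9))
    }
}

/// Counts how many of `trials` pseudo-random triples give different
/// results for the fused and the unfused computation.
pub fn count_mismatches(seed: u32, trials: u32) -> u32 {
    // The xorshift state must be non-zero, so a zero seed is bumped to 1.
    let mut rng = XorShift32(seed.max(1));
    let mut mismatches = 0;
    for _ in 0..trials {
        let a = rng.next_f32();
        let b = rng.next_f32();
        let c = -rng.next_f32(); // opposite sign, so a * b + c cancels
        if a.mul_add(b, c) != a * b + c {
            mismatches += 1;
        }
    }
    mismatches
}
```

Calling `count_mismatches(1, 10_000)` on a target that does not contract `a * b + c` by itself would be expected to report a non-zero count. If it reports zero, that hints that both expressions are being compiled to the same operation.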

gnzlbg · 2018-09-03T12:23:38Z

It seems that rustdoc trips on trying to generate the links - let's just remove the [ ] so that we can merge this. I've opened a PR to fix the clippy bot.

GabrielMajeri · 2018-09-03T12:45:08Z

@gnzlbg Alright, the links shouldn't be a problem since rustdoc will place these functions next to one another in the end.

I have removed the mention of fmsub from fma's documentation, because it didn't tell the full story anyway. x86 has additional intrinsics for advanced FMA stuff like fmaddsub, masked fma, etc.

LLVM will simply do its best to pick the right instruction, based on how fma is used.

Also, some bikeshedding: Rust's f32 primitive type has a mul_add function which (contrary to its name) will always use FMA (like the current fma function). Any reason for not choosing that name?

gnzlbg · 2018-09-03T13:57:53Z

> Any reason for not choosing that name?

Oversight probably. Please open an issue about this. The floating-point methods are not in the initial round for stabilization so not much discussion has been going on about these.

GabrielMajeri mentioned this pull request Sep 3, 2018

aobench: cannot use sleef-sys #118

Closed

GabrielMajeri force-pushed the fmuladd branch from 3d7712b to b719209 on September 3, 2018 10:17

GabrielMajeri changed the title ~~Use fmuladd for fma and document this behavior~~ Add fmae which uses fmuladd Sep 3, 2018

gnzlbg reviewed Sep 3, 2018

gnzlbg reviewed Sep 3, 2018

GabrielMajeri force-pushed the fmuladd branch from b6d5cd4 to 8ad06a1 on September 3, 2018 11:07

Add fma estimate which uses fmuladd

0230670

GabrielMajeri force-pushed the fmuladd branch from 8ad06a1 to 0230670 on September 3, 2018 12:43

gnzlbg merged commit 3f2d3b2 into rust-lang:master Sep 3, 2018

GabrielMajeri deleted the fmuladd branch September 3, 2018 14:08
